Flashcards for topic Lambdas and Streams
What fundamental problem does the lambda expression solve compared to anonymous classes in Java, and how does the syntax differ?
Lambdas solve the verbosity problem of anonymous classes, making functional programming practical in Java:
// Anonymous class (verbose)
Collections.sort(words, new Comparator<String>() {
    public int compare(String s1, String s2) {
        return Integer.compare(s1.length(), s2.length());
    }
});

// Lambda (concise)
Collections.sort(words, (s1, s2) -> Integer.compare(s1.length(), s2.length()));
Best practice: Omit parameter types in lambdas unless their presence makes your program clearer.
What are the six basic functional interfaces in java.util.function, their purpose, and variants for handling primitives?
The six basic functional interfaces in Java 8 form the foundation of the functional programming API:
UnaryOperator<T>
  Signature: T apply(T t)
  Example: String::toLowerCase
  Primitive variants: IntUnaryOperator, LongUnaryOperator, DoubleUnaryOperator

BinaryOperator<T>
  Signature: T apply(T t1, T t2)
  Example: BigInteger::add
  Primitive variants: IntBinaryOperator, LongBinaryOperator, DoubleBinaryOperator

Predicate<T>
  Signature: boolean test(T t)
  Example: Collection::isEmpty
  Primitive variants: IntPredicate, LongPredicate, DoublePredicate

Function<T,R>
  Signature: R apply(T t)
  Example: Arrays::asList
  Primitive variants: ToIntFunction<T>, ToLongFunction<T>, ToDoubleFunction<T>; IntFunction<R>, LongFunction<R>, DoubleFunction<R>; IntToLongFunction, IntToDoubleFunction, etc.

Supplier<T>
  Signature: T get()
  Example: Instant::now
  Primitive variants: IntSupplier, LongSupplier, DoubleSupplier, BooleanSupplier

Consumer<T>
  Signature: void accept(T t)
  Example: System.out::println
  Primitive variants: IntConsumer, LongConsumer, DoubleConsumer
Key point: There are 43 interfaces in java.util.function, but most are variations of these six basic interfaces. When designing APIs that accept functional objects, prefer these standard interfaces over creating custom functional interfaces.
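A minimal sketch of this advice (the select helper and its name are hypothetical, not from the source): by accepting the standard Predicate<T> rather than declaring a custom single-method interface, the method interoperates with any existing lambda, method reference, or Predicate combinator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class SelectDemo {
    // Hypothetical API method: taking the standard Predicate<T> means
    // callers get lambdas, method references, and the built-in
    // Predicate.and()/or()/negate() combinators for free.
    static <T> List<T> select(List<T> list, Predicate<T> p) {
        List<T> result = new ArrayList<>();
        for (T t : list) {
            if (p.test(t)) result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("", "stream", "", "lambda");
        // A method reference and its negation both work unchanged
        System.out.println(select(words, String::isEmpty).size());         // 2
        System.out.println(select(words, Predicate.not(String::isEmpty))); // [stream, lambda]
    }
}
```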
What crucial performance issue arises when using basic functional interfaces with primitive types instead of specialized primitive functional interfaces?
Using basic functional interfaces with boxed primitives instead of primitive-specific functional interfaces can have severe performance consequences: every invocation silently autoboxes and unboxes values, and in a bulk operation that cost is paid once per element. This violates the advice to "prefer primitive types to boxed primitives" for performance-critical operations.
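A small sketch of the difference (the loop and counts are illustrative, not from the source): the boxed Predicate<Integer> allocates an Integer on every call, while the primitive IntPredicate works on int directly.

```java
import java.util.function.IntPredicate;
import java.util.function.Predicate;

public class BoxingDemo {
    public static void main(String[] args) {
        // Boxed version: each test() call autoboxes the int argument
        Predicate<Integer> boxedIsEven = i -> i % 2 == 0;
        // Primitive version: no boxing, operates on int directly
        IntPredicate isEven = i -> i % 2 == 0;

        long boxedCount = 0, primitiveCount = 0;
        for (int i = 0; i < 1_000_000; i++) {
            if (boxedIsEven.test(i)) boxedCount++; // i boxed to Integer here
            if (isEven.test(i)) primitiveCount++;  // no boxing
        }
        // Identical results; very different per-call cost in a hot loop
        System.out.println(boxedCount == primitiveCount); // true
    }
}
```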
What issue exists with the seemingly intuitive code "Hello world!".chars().forEach(System.out::print), and how should it be fixed?
Issue: This code prints 721011081081113211911111410810033 (the integer values) instead of "Hello world!" because:
- The chars() method returns an IntStream of int values representing the string's characters, not a stream of char values
- When System.out::print is passed to forEach, the int overload of print is therefore invoked, not the char overload

Fix option 1: Use a cast in a lambda expression to force the correct overload:
"Hello world!".chars().forEach(x -> System.out.print((char) x));
Fix option 2 (preferred): Avoid using streams for char processing altogether, due to Java's lack of proper support for primitive char streams.
What should you consider when choosing between Stream.iterate() and traditional iteration, using the example of generating Mersenne primes?
When choosing between Stream.iterate() and traditional iteration:
// TWO and ONE are the BigInteger constants
static Stream<BigInteger> primes() {
    return Stream.iterate(TWO, BigInteger::nextProbablePrime);
}

// Using the stream to find Mersenne primes
primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
        .filter(mersenne -> mersenne.isProbablePrime(50))
        .limit(20)
        .forEach(mp -> System.out.println(mp.bitLength() + ": " + mp));
Choose based on which approach makes your specific algorithm most readable and maintainable.
What is wrong with using the forEach terminal operation as the main computation method in streams, and what is the correct approach?
Problems with using forEach for main computation:
- forEach is explicitly iterative, so it forfeits the benefits of the streams paradigm
- Lambdas that mutate external state (as below) are not safe under parallel execution
- The result is a loop disguised as a stream: harder to read and no faster

Correct approach: let the pipeline itself compute the result, typically via a collector, and reserve forEach for reporting the result of a computation.
Example of improper use:
// BAD: Using streams API but not the paradigm
Map<String, Long> freq = new HashMap<>();
words.forEach(word -> {
    freq.merge(word.toLowerCase(), 1L, Long::sum); // Side effect!
});
Correct approach:
// GOOD: Proper use of streams
Map<String, Long> freq = words
    .collect(groupingBy(String::toLowerCase, counting()));
Occasionally, forEach can be used for other purposes like adding stream results to a pre-existing collection.
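A brief sketch of that legitimate use (the names here are illustrative): the computation is done by stream operations, and forEach merely transfers the results into a pre-existing collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ForEachDemo {
    public static void main(String[] args) {
        // Pre-existing collection that must be appended to
        List<String> existing = new ArrayList<>(List.of("alpha"));

        // The real work (filtering) happens in the pipeline;
        // forEach only reports results into the collection
        Stream.of("beta", "gamma", "b-side")
              .filter(s -> s.startsWith("b"))
              .forEach(existing::add);

        System.out.println(existing); // [alpha, beta, b-side]
    }
}
```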
What are the collectors that should never be used directly on a Stream (only as downstream collectors), and why?
Collectors that should only be used as downstream collectors:
- counting()
- summingInt/Long/Double(), averagingInt/Long/Double(), summarizingInt/Long/Double()
- filtering(), mapping(), flatMapping(), collectingAndThen()
Reason: These collectors duplicate functionality that's already available directly on Stream. Using them as top-level collectors would be redundant and potentially less efficient.
Example of incorrect usage:
// Incorrect: Using counting() directly
long count = stream.collect(counting());
Correct alternatives:
// Correct: Using Stream.count() directly
long count = stream.count();

// Correct: Using counting() as a downstream collector
Map<Category, Long> countsByCategory = stream
    .collect(groupingBy(Item::getCategory, counting()));
These methods exist primarily to support the collector API's role in downstream operations, allowing "mini-streams" within the larger collection operation.
How do you use the maxBy and minBy collectors? What alternatives exist for finding maximum and minimum elements in streams?
Using maxBy and minBy collectors:
// Find the best-selling album for each artist
// (BinaryOperator.maxBy used as the merge function of toMap)
Map<Artist, Album> topHits = albums.collect(
    toMap(Album::artist, a -> a, maxBy(comparing(Album::sales))));

// Find the cheapest product in each category
// (Collectors.minBy as a downstream collector; note the Optional values)
Map<Category, Optional<Product>> cheapestByCategory = products.collect(
    groupingBy(Product::getCategory, minBy(comparing(Product::getPrice))));
// Using max() on the stream directly
Optional<Album> bestSeller = albums.max(comparing(Album::sales));

// Using min() on the stream directly
Optional<Product> cheapest = products.min(comparing(Product::getPrice));
Key differences:
- Stream.max() and Stream.min() return an Optional directly and are the simplest choice for a single result from the whole stream
- Collectors.maxBy() and Collectors.minBy() also wrap their result in Optional and are intended for downstream use inside groupingBy and similar collectors
- BinaryOperator.maxBy() and BinaryOperator.minBy() return plain BinaryOperator functions, useful as toMap merge functions or in reduce()

Implementation detail:
// How BinaryOperator.maxBy turns a comparator into a reduction function
BinaryOperator<Album> findBestSeller = BinaryOperator.maxBy(comparing(Album::sales));

// This can be used in reduction operations
Optional<Album> bestSeller = albums.reduce(findBestSeller);
Note: maxBy and minBy exist both as static methods on BinaryOperator (returning functions) and on Collectors (returning collectors), while max and min are methods on Stream. Always choose the most direct approach for your context.
What is the "locality of reference" concept and why is it critical for effective parallelization of Java streams?
Locality of reference:
Definition: The property where data elements that are accessed together are also stored physically close together in memory.
Why it's critical for parallel streams:
- Memory access efficiency: when consecutive elements sit in adjacent memory, each cache-line fetch brings in several useful elements; scattered elements force threads to stall waiting for main memory
- Impact on parallelization: parallel pipelines are often memory-bound, so worker threads that spend their time on cache misses erase the gains of running on multiple cores
- Data structures with best locality: primitive arrays (the data itself is stored contiguously), plus array-backed sources such as ArrayList and int/long ranges; linked structures such as LinkedList have poor locality and also split poorly
- Practical implications: prefer array- and range-backed stream sources when you intend to call parallel()
- Optimizing for locality: keep hot data in primitive arrays where possible and avoid pointer-heavy structures in parallel pipelines
Good locality of reference can often be the difference between significant speedups and disappointing slowdowns when parallelizing streams.
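A small sketch of the contrast (the sizes and names are illustrative, and no timings are claimed): both sources yield the same sum, but the array is contiguous and splits cleanly, while the LinkedList scatters nodes across the heap and decomposes poorly.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.stream.IntStream;

public class LocalityDemo {
    public static void main(String[] args) {
        int n = 100_000;

        // Good locality: an int[] stores elements contiguously, so parallel
        // workers read memory sequentially (cache- and prefetch-friendly)
        int[] array = IntStream.range(0, n).toArray();
        long arraySum = IntStream.range(0, n).parallel()
                                 .mapToLong(i -> array[i]).sum();

        // Poor locality: each LinkedList element access chases a pointer to
        // a node somewhere on the heap, and the list also splits poorly
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < n; i++) linked.add(i);
        long linkedSum = linked.parallelStream()
                               .mapToLong(Integer::longValue).sum();

        // Same answer; very different memory behavior under parallel()
        System.out.println(arraySum == linkedSum); // true
    }
}
```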
What specific implementation details make the prime-counting example (pi(n)) particularly well-suited for parallelization?
Implementation details making the prime-counting example ideal for parallelization:
- Computationally intensive core operation: isProbablePrime(50) is CPU-bound and expensive
- Perfect independence: testing whether one number is prime has no effect on other numbers
- Naturally partitionable input: LongStream.rangeClosed(2, n) divides perfectly into subranges
- Simple reduction: the count() operation combines results with minimal overhead
- Uniform workload distribution: while primality testing gets more expensive for larger numbers, the range is wide enough to average out
- No ordering requirements: order of processing doesn't matter when counting primes
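The properties above can be seen together in a sketch of the pi(n) pipeline (a plausible reconstruction of the example this card describes):

```java
import java.math.BigInteger;
import java.util.stream.LongStream;

public class Pi {
    // Counts the primes <= n
    static long pi(long n) {
        return LongStream.rangeClosed(2, n)      // naturally partitionable range
                .parallel()                      // safe: elements are independent
                .mapToObj(BigInteger::valueOf)
                .filter(i -> i.isProbablePrime(50)) // expensive, CPU-bound test
                .count();                        // trivial reduction, no ordering needed
    }

    public static void main(String[] args) {
        System.out.println(pi(100)); // 25 primes up to 100
    }
}
```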