Introduction
Throughout this Stream API series, we’ve used many built-in collectors:
- toList()
- toSet()
- groupingBy()
- partitioningBy()
- mapping()
- joining()
- counting()
- summingInt()
- averagingDouble()
- maxBy()
- minBy()
These collectors solve most day-to-day programming problems.
But have you ever wondered:
- How does Collectors.toList() work?
- How does Java know how to accumulate Stream elements?
- How does groupingBy() create maps?
- How are partial results merged when using Parallel Streams?
- Can we create our own collectors?
The answer is yes.
Every collector in the JDK is built using the same mechanism available to application developers.
Understanding this mechanism will fundamentally change how you think about the Stream API.
By the end of this article, you’ll understand how collectors work internally and be able to build your own production-ready collectors.
Learning Objectives
By the end of this article, you will be able to:
- Understand the Collector interface.
- Learn the lifecycle of a Collector.
- Implement custom collectors.
- Understand mutable reduction.
- Learn the roles of Supplier, Accumulator, Combiner, Finisher, and Characteristics.
- Build enterprise-grade collectors.
- Understand how Parallel Streams merge partial results.
Why Build a Custom Collector?
Most enterprise applications eventually require aggregations that are not provided by the JDK.
Examples include:
- Building invoice summaries
- Customer dashboards
- Financial statements
- Regulatory reports
- Risk calculations
- Custom statistical models
- Domain-specific DTO generation
Instead of writing multiple loops, we can encapsulate the aggregation logic inside reusable collectors.
The Collector Interface
At the heart of the framework is:
Collector<T, A, R>
Many developers find these type parameters intimidating.
Let’s break them down.
| Type | Meaning |
|---|---|
| T | Stream element type |
| A | Mutable accumulation container |
| R | Final result type |
Example:
Collector<Employee, List<Employee>, List<Employee>>
- The Stream contains Employee elements.
- The accumulation container is List<Employee>.
- The final result is List<Employee>.
A collector may also transform the accumulated container into a different result.
Example:
Collector<Employee, List<Employee>, Integer>
Here, employees are accumulated in a list and the final result is the size of that list.
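A minimal sketch of this second example, assuming a hypothetical Employee class: the JDK factory method Collector.of accepts the supplier, accumulator, combiner, and finisher as arguments, so the three type parameters can be read directly off the declaration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class EmployeeCountDemo {

    // Hypothetical minimal Employee type, only for this illustration
    static class Employee {
        private final String name;

        Employee(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // T = Employee, A = List<Employee>, R = Integer
    static Collector<Employee, List<Employee>, Integer> countViaList() {
        return Collector.of(
                ArrayList::new,                                        // supplier: create the container
                List::add,                                             // accumulator: add each employee
                (left, right) -> { left.addAll(right); return left; }, // combiner: merge partial lists
                List::size);                                           // finisher: List<Employee> -> Integer
    }

    static int countEmployees(List<Employee> employees) {
        return employees.stream().collect(countViaList());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee("Alice"), new Employee("Bob"), new Employee("Carol"));
        System.out.println(countEmployees(employees)); // prints 3
    }
}
```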
Collector Lifecycle
Every collector follows the same lifecycle.
Create Container
│
▼
Receive Stream Element
│
▼
Accumulate
│
▼
More Elements? (yes: back to Receive Stream Element)
│
▼
Combine Partial Results (Parallel)
│
▼
Finish
Let’s examine each stage.
Supplier
Creates the initial mutable container.
Example:
ArrayList::new
Equivalent lambda:
() -> new ArrayList<>()
The supplier function is invoked once for each accumulation container. A sequential Stream uses a single container, while a parallel Stream may create several, typically one per subtask.
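To see the supplier being called once per container, the sketch below counts supplier invocations with an AtomicInteger. The class and method names are hypothetical; in the parallel case the number of containers depends on how the framework splits the work, so only the sequential count is predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class SupplierCallsDemo {

    // Returns how many containers the framework requested from the supplier
    static int containersCreated(boolean parallel, int size) {
        AtomicInteger supplierCalls = new AtomicInteger();
        Collector<Integer, List<Integer>, List<Integer>> listCollector = Collector.of(
                () -> {                               // supplier: runs once per container
                    supplierCalls.incrementAndGet();
                    return new ArrayList<>();
                },
                List::add,
                (left, right) -> { left.addAll(right); return left; });
        IntStream stream = IntStream.range(0, size);
        if (parallel) {
            stream = stream.parallel();
        }
        stream.boxed().collect(listCollector);
        return supplierCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + containersCreated(false, 10_000)); // prints sequential: 1
        System.out.println("parallel: " + containersCreated(true, 10_000));    // usually more than 1
    }
}
```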
Accumulator
Processes each Stream element.
Example:
(list, employee) -> list.add(employee)
Every element passes through the accumulator.
Visual representation:
Employee 1
↓
Add to List
↓
Employee 2
↓
Add to List
↓
Employee 3
↓
Add to List
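The following sketch (hypothetical names) counts accumulator calls during a parallel collection. However the work is split across threads, each element is accumulated exactly once, so the count matches the number of elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class AccumulatorCallsDemo {

    // Returns how many times the accumulator ran while collecting `size` elements in parallel
    static int accumulatorCalls(int size) {
        AtomicInteger calls = new AtomicInteger(); // thread-safe: accumulators run on several threads
        Collector<Integer, List<Integer>, List<Integer>> collector = Collector.of(
                ArrayList::new,
                (list, value) -> {                 // accumulator: invoked once per element
                    calls.incrementAndGet();
                    list.add(value);
                },
                (left, right) -> { left.addAll(right); return left; });
        IntStream.range(0, size).parallel().boxed().collect(collector);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(accumulatorCalls(10_000)); // prints 10000
    }
}
```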
Combiner
The combiner becomes important when processing Parallel Streams.
Imagine two worker threads, each processing part of the Stream.
Thread A: Alice, Bob
Thread B: Carol, David
Each thread builds its own partial result.
The combiner merges them in encounter order:
Alice, Bob, Carol, David
Without a combiner, Parallel Streams could not produce a single result.
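The sketch below replays the two-thread scenario by hand: it obtains the collector's supplier, accumulator, and combiner functions and calls them the way the framework would. It is a simplified illustration with hypothetical names, not the framework's actual scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class CombinerDemo {

    static final Collector<String, List<String>, List<String>> NAMES = Collector.of(
            ArrayList::new,
            List::add,
            (left, right) -> { left.addAll(right); return left; }); // left partial result stays first

    // Simulates two worker threads, then merges their partial results
    static List<String> simulateTwoThreads() {
        Supplier<List<String>> supplier = NAMES.supplier();
        BiConsumer<List<String>, String> accumulator = NAMES.accumulator();
        BinaryOperator<List<String>> combiner = NAMES.combiner();

        List<String> threadA = supplier.get();   // Thread A's container
        accumulator.accept(threadA, "Alice");
        accumulator.accept(threadA, "Bob");

        List<String> threadB = supplier.get();   // Thread B's container
        accumulator.accept(threadB, "Carol");
        accumulator.accept(threadB, "David");

        return combiner.apply(threadA, threadB); // merge in encounter order
    }

    public static void main(String[] args) {
        System.out.println(simulateTwoThreads()); // prints [Alice, Bob, Carol, David]
    }
}
```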
Finisher
Sometimes the accumulated container is already the final result.
Example:
List<Employee>
No conversion is necessary.
Other times, a transformation is required.
Example:
List<Employee>
↓
EmployeeSummary
The finisher performs this final conversion.
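The JDK exposes a finisher hook on top of an existing collector through Collectors.collectingAndThen. This sketch uses it to turn a List into a summary object; EmployeeSummary is a hypothetical class, and employees are represented by their names to keep the example short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FinisherDemo {

    // Hypothetical summary type produced by the finisher
    static final class EmployeeSummary {
        final int headcount;
        final String names;

        EmployeeSummary(int headcount, String names) {
            this.headcount = headcount;
            this.names = names;
        }
    }

    static EmployeeSummary summarize(List<String> employeeNames) {
        return employeeNames.stream()
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(),                        // accumulate into a List<String>
                        (List<String> list) -> new EmployeeSummary( // finisher: List -> EmployeeSummary
                                list.size(), String.join(", ", list))));
    }

    public static void main(String[] args) {
        EmployeeSummary summary = summarize(Arrays.asList("Alice", "Bob", "Carol"));
        System.out.println(summary.headcount + " employees: " + summary.names);
        // prints 3 employees: Alice, Bob, Carol
    }
}
```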
Characteristics
Collectors expose optimization hints to the Stream framework.
The Collector.Characteristics enum defines three of them:
IDENTITY_FINISH
The accumulation container is already the final result.
No finishing transformation is required.
CONCURRENT
Multiple threads can safely accumulate into the same container.
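This characteristic is useful only when the collector and its container are designed for concurrent updates (for example, a ConcurrentHashMap). If a CONCURRENT collector is not also UNORDERED, it is evaluated concurrently only on an unordered data source.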
UNORDERED
The collector does not rely on encounter order.
This allows additional optimization opportunities during parallel execution.
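A small sketch showing how characteristics are declared and inspected (field names are hypothetical). Collector.of without a finisher automatically adds IDENTITY_FINISH, while the overload that takes a finisher reports only the characteristics you pass in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;

public class CharacteristicsDemo {

    // No finisher: Collector.of adds IDENTITY_FINISH; we also declare UNORDERED
    static final Collector<String, List<String>, List<String>> NO_FINISHER = Collector.of(
            ArrayList::new,
            List::add,
            (left, right) -> { left.addAll(right); return left; },
            Characteristics.UNORDERED);

    // With a finisher and no declared characteristics: the set is empty
    static final Collector<String, List<String>, Integer> WITH_FINISHER = Collector.of(
            ArrayList::new,
            List::add,
            (left, right) -> { left.addAll(right); return left; },
            List::size);

    public static void main(String[] args) {
        System.out.println(NO_FINISHER.characteristics().contains(Characteristics.IDENTITY_FINISH)); // prints true
        System.out.println(WITH_FINISHER.characteristics().isEmpty());                              // prints true
    }
}
```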
Building Our First Custom Collector
Suppose we want to collect employee names into a single string separated by " | ".
Instead of using Collectors.joining(), we’ll build it ourselves.
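import java.util.Collections;
import java.util.Set;
import java.util.StringJoiner;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Assumes an Employee class that exposes getName()
public class EmployeeNameCollector
        implements Collector<Employee, StringJoiner, String> {

    // Supplier: one StringJoiner per accumulation container
    @Override
    public Supplier<StringJoiner> supplier() {
        return () -> new StringJoiner(" | ");
    }

    // Accumulator: append each employee's name
    @Override
    public BiConsumer<StringJoiner, Employee> accumulator() {
        return (joiner, employee) -> joiner.add(employee.getName());
    }

    // Combiner: append the right partial result to the left one
    @Override
    public BinaryOperator<StringJoiner> combiner() {
        return StringJoiner::merge;
    }

    // Finisher: convert the joiner into the final String
    @Override
    public Function<StringJoiner, String> finisher() {
        return StringJoiner::toString;
    }

    // No special characteristics: a finisher is needed and encounter order matters
    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();
    }
}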
Usage:
String names = employees.stream()
.collect(new EmployeeNameCollector());
Although Collectors.joining() already exists, this example demonstrates the complete Collector lifecycle.
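The same lifecycle can be expressed more compactly with the Collector.of factory. The sketch below assumes a hypothetical Employee class with getName() and checks that a parallel run produces the same string as Collectors.joining(" | ").

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class NameCollectorDemo {

    // Hypothetical Employee with only the field this example needs
    static class Employee {
        private final String name;

        Employee(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Same four functions as EmployeeNameCollector, passed to Collector.of
    static final Collector<Employee, StringJoiner, String> NAMES = Collector.of(
            () -> new StringJoiner(" | "),                        // supplier
            (joiner, employee) -> joiner.add(employee.getName()), // accumulator
            StringJoiner::merge,                                  // combiner
            StringJoiner::toString);                              // finisher

    static String viaCustom(List<Employee> employees) {
        return employees.parallelStream().collect(NAMES);
    }

    static String viaJoining(List<Employee> employees) {
        return employees.stream().map(Employee::getName).collect(Collectors.joining(" | "));
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee("Alice"), new Employee("Bob"), new Employee("Carol"));
        System.out.println(viaCustom(employees)); // prints Alice | Bob | Carol
    }
}
```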
Enterprise Example – Customer Summary Collector
Suppose a banking system needs a summary object containing:
- Total customers
- Total balance
- Average balance
- Highest balance
- Lowest balance
Instead of multiple traversals, a custom collector can accumulate all statistics in one pass.
CustomerSummary
↓
count
totalBalance
averageBalance
highestBalance
lowestBalance
This approach is efficient and encapsulates reporting logic in a reusable component.
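The article does not show CustomerSummaryCollector itself, so here is one possible sketch. It assumes a hypothetical Customer class with a double getBalance() (matching the Java 7 loop in this article), uses a small mutable Accumulator class as the container, and returns an immutable CustomerSummary; the factory method customerSummary() is also a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class CustomerSummaryDemo {

    // Hypothetical Customer; balances are doubles for simplicity
    static class Customer {
        private final double balance;

        Customer(double balance) {
            this.balance = balance;
        }

        double getBalance() {
            return balance;
        }
    }

    // Immutable result produced by the finisher
    static final class CustomerSummary {
        final long count;
        final double totalBalance;
        final double averageBalance;
        final double highestBalance;
        final double lowestBalance;

        CustomerSummary(long count, double totalBalance, double averageBalance,
                        double highestBalance, double lowestBalance) {
            this.count = count;
            this.totalBalance = totalBalance;
            this.averageBalance = averageBalance;
            this.highestBalance = highestBalance;
            this.lowestBalance = lowestBalance;
        }
    }

    // Mutable accumulation container (the A type parameter)
    static final class Accumulator {
        private long count;
        private double total;
        private double highest = Double.NEGATIVE_INFINITY;
        private double lowest = Double.POSITIVE_INFINITY;

        void add(Customer customer) {             // accumulator step
            double balance = customer.getBalance();
            count++;
            total += balance;
            highest = Math.max(highest, balance);
            lowest = Math.min(lowest, balance);
        }

        Accumulator merge(Accumulator other) {    // combiner step
            count += other.count;
            total += other.total;
            highest = Math.max(highest, other.highest);
            lowest = Math.min(lowest, other.lowest);
            return this;
        }

        CustomerSummary finish() {                // finisher step
            if (count == 0) {
                return new CustomerSummary(0, 0.0, 0.0, 0.0, 0.0); // no customers: hide the infinity sentinels
            }
            return new CustomerSummary(count, total, total / count, highest, lowest);
        }
    }

    // One pass over the data: supplier, accumulator, combiner, finisher
    static Collector<Customer, Accumulator, CustomerSummary> customerSummary() {
        return Collector.of(Accumulator::new, Accumulator::add, Accumulator::merge, Accumulator::finish);
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer(100.0), new Customer(250.0), new Customer(50.0));
        CustomerSummary summary = customers.stream().collect(customerSummary());
        System.out.println(summary.count + " customers, total " + summary.totalBalance
                + ", highest " + summary.highestBalance + ", lowest " + summary.lowestBalance);
        // prints 3 customers, total 400.0, highest 250.0, lowest 50.0
    }
}
```

For the purely numeric fields, the JDK's Collectors.summarizingDouble already returns count, sum, average, min, and max in a DoubleSummaryStatistics. A custom collector like this one starts to pay off once domain-specific fields, such as a premium-customer count, are added.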
Spring Boot Integration
Imagine a reporting service.
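// Assumes a CustomerRepository (for example, a Spring Data repository) whose findAll() returns List<Customer>
@Service
public class CustomerReportService {
    private final CustomerRepository repository;

    // Constructor injection of the repository
    public CustomerReportService(CustomerRepository repository) {
        this.repository = repository;
    }

    public CustomerSummary buildSummary() {
        return repository.findAll()
                .stream()
                .collect(new CustomerSummaryCollector());
    }
}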
The service layer remains clean, while the aggregation logic resides in a dedicated collector.
Migration from Java 7
Java 7
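int count = 0;
double total = 0;
double max = Double.NEGATIVE_INFINITY; // not Double.MIN_VALUE, which is the smallest positive double
double min = Double.POSITIVE_INFINITY;
for (Customer customer : customers) {
    count++;
    total += customer.getBalance();
    max = Math.max(max, customer.getBalance());
    min = Math.min(min, customer.getBalance());
}
double average = count == 0 ? 0 : total / count;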
Java 8+
CustomerSummary summary =
customers.stream()
.collect(new CustomerSummaryCollector());
The custom collector hides the implementation details and promotes reuse.
Parallel Stream Behavior
When a Parallel Stream executes:
Source
↓
Split
↓
Thread 1
Thread 2
Thread 3
↓
Partial Results
↓
Combiner
↓
Final Result
A correctly implemented combiner ensures that partial results are merged without losing data.
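One practical way to check a combiner is to collect the same data sequentially and in parallel and compare the results. The helper below is a hypothetical sketch of that check, applied to a list-building collector whose combiner appends the right partial result to the left one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelCheckDemo {

    // Collects the same data sequentially and in parallel; true if the results are equal
    static <T, R> boolean sameResult(List<T> data, Collector<T, ?, R> collector) {
        R sequential = data.stream().collect(collector);
        R parallel = data.parallelStream().collect(collector);
        return Objects.equals(sequential, parallel);
    }

    static final Collector<Integer, List<Integer>, List<Integer>> CORRECT = Collector.of(
            ArrayList::new,
            List::add,
            (left, right) -> { left.addAll(right); return left; }); // keeps encounter order

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());
        System.out.println(sameResult(data, CORRECT)); // prints true
    }
}
```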
Performance Considerations
- Prefer built-in collectors when they satisfy your requirements.
- Create custom collectors only for domain-specific aggregation.
- Ensure accumulator and combiner implementations are efficient.
- Design collectors to minimize object creation.
- Thoroughly test collectors before using them with Parallel Streams.
Common Mistakes
Ignoring the Combiner
A collector that works with sequential Streams may fail with Parallel Streams if the combiner is incorrect.
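The sketch below contrasts a combiner that drops the right partial result with a correct one (names are hypothetical). According to the Collector documentation, a sequential reduction uses a single result container, so the broken version only loses data when the Stream runs in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class BrokenCombinerDemo {

    // Deliberately wrong: keeps the left partial result and silently drops the right one
    static final BinaryOperator<List<Integer>> BROKEN = (left, right) -> left;

    // Correct: appends the right partial result to the left one
    static final BinaryOperator<List<Integer>> CORRECT = (left, right) -> {
        left.addAll(right);
        return left;
    };

    // Applies a combiner to two partial results, as the framework would after a split
    static List<Integer> combineHalves(BinaryOperator<List<Integer>> combiner) {
        List<Integer> fromThreadA = new ArrayList<>(Arrays.asList(1, 2));
        List<Integer> fromThreadB = new ArrayList<>(Arrays.asList(3, 4));
        return combiner.apply(fromThreadA, fromThreadB);
    }

    // Collects 0..size-1 with the given combiner, sequentially or in parallel
    static int collectedSize(BinaryOperator<List<Integer>> combiner, int size, boolean parallel) {
        Collector<Integer, List<Integer>, List<Integer>> collector =
                Collector.of(ArrayList::new, List::add, combiner);
        IntStream source = parallel ? IntStream.range(0, size).parallel() : IntStream.range(0, size);
        return source.boxed().collect(collector).size();
    }

    public static void main(String[] args) {
        System.out.println(combineHalves(BROKEN));                 // prints [1, 2]
        System.out.println(combineHalves(CORRECT));                // prints [1, 2, 3, 4]
        System.out.println(collectedSize(BROKEN, 100_000, false)); // prints 100000: the bug stays hidden
        System.out.println(collectedSize(BROKEN, 100_000, true));  // typically far fewer than 100000
    }
}
```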
Returning Mutable Internal State
Avoid exposing mutable accumulation containers directly when the final result should be immutable.
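A minimal sketch with hypothetical names: the container is a private ArrayList, and the finisher wraps it with Collections.unmodifiableList before handing it to callers. On Java 10 and later, the built-in Collectors.toUnmodifiableList() covers this common case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class ReadOnlyResultDemo {

    // Accumulates into a private ArrayList but returns a read-only view of it
    static <T> Collector<T, List<T>, List<T>> toReadOnlyList() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> { left.addAll(right); return left; },
                Collections::unmodifiableList); // finisher wraps the container
    }

    public static void main(String[] args) {
        List<String> names = Stream.of("Alice", "Bob").collect(toReadOnlyList());
        try {
            names.add("Mallory");
        } catch (UnsupportedOperationException e) {
            System.out.println("result is read-only"); // printed because add() is rejected
        }
    }
}
```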
Marking CONCURRENT Incorrectly
Only use the CONCURRENT characteristic when the accumulator supports safe concurrent updates and the collection semantics allow it.
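A sketch of a collector where CONCURRENT is justified, assuming we want to count occurrences of department names (hypothetical data and names). The shared container is a ConcurrentHashMap whose merge method updates each key atomically, and UNORDERED is declared because the counts do not depend on encounter order. The combiner is still supplied because the Collector contract requires one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;

public class ConcurrentCountDemo {

    // All threads may accumulate into one shared map; merge() is atomic per key
    static final Collector<String, ConcurrentHashMap<String, Integer>, ConcurrentHashMap<String, Integer>>
            COUNT_BY_KEY = Collector.of(
                    ConcurrentHashMap::new,
                    (map, key) -> map.merge(key, 1, Integer::sum),
                    (left, right) -> {               // used only when containers are not shared
                        right.forEach((k, v) -> left.merge(k, v, Integer::sum));
                        return left;
                    },
                    Characteristics.CONCURRENT,
                    Characteristics.UNORDERED);

    static ConcurrentHashMap<String, Integer> countDepartments(List<String> departments) {
        return departments.parallelStream().collect(COUNT_BY_KEY);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = countDepartments(Arrays.asList("IT", "HR", "IT"));
        System.out.println(counts.get("IT")); // prints 2
    }
}
```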
Best Practices
- Keep collectors focused on a single responsibility.
- Prefer immutable result objects.
- Reuse collectors across services.
- Unit test both sequential and parallel execution paths.
- Document collector behavior, especially if it will be shared across teams.
Interview Questions
What is the purpose of the accumulator?
It processes each Stream element and updates the mutable accumulation container.
Why is the combiner required?
It merges partial results produced during parallel execution.
What does IDENTITY_FINISH mean?
The accumulation container is already the final result, so the finisher is an identity function and the framework can skip calling it.
Should custom collectors replace built-in collectors?
No. Built-in collectors should be preferred whenever they satisfy the business requirement. Custom collectors are most valuable for domain-specific aggregation.
Hands-On Exercise
Build a Spring Boot application that implements a custom CustomerSummaryCollector to produce a report containing:
- Total customers.
- Total account balance.
- Average balance.
- Highest balance.
- Lowest balance.
- Number of premium customers.
Expose the summary through a REST endpoint and compare the implementation with a Java 7 solution using multiple loops.
Summary
Custom collectors allow developers to encapsulate complex aggregation logic in reusable, testable components. By understanding the Collector interface and its lifecycle—Supplier, Accumulator, Combiner, Finisher, and Characteristics—you gain insight into how the JDK itself implements the Collectors framework.
This knowledge is particularly valuable when building enterprise reporting engines, financial summaries, and analytics pipelines that go beyond the capabilities of the standard collectors.
Coming Up Next
Part 12 – Parallel Streams and Spliterator: How the Stream API Executes Work Across Multiple Cores
We’ll look beneath the Stream API to understand how data is split, how tasks are distributed across threads, how the ForkJoinPool coordinates parallel execution, and when parallel Streams improve performance—or make it worse.