Introduction

Throughout this Stream API series, we’ve used many built-in collectors:

toList()
toSet()
groupingBy()
partitioningBy()
mapping()
joining()
counting()
summingInt()
averagingDouble()
maxBy()
minBy()

These collectors solve most day-to-day programming problems.

But have you ever wondered:

How does Collectors.toList() work?
How does Java know how to accumulate Stream elements?
How does groupingBy() create maps?
How are partial results merged when using Parallel Streams?
Can we create our own collectors?

The answer to that last question is yes, and working through it answers the others too.

Every collector in the JDK is built using the same mechanism available to application developers.

Understanding this mechanism will fundamentally change how you think about the Stream API.

By the end of this article, you’ll understand how collectors work internally and be able to build your own production-ready collectors.

Learning Objectives

By the end of this article, you will be able to:

Explain the Collector interface and its type parameters.
Describe the lifecycle of a Collector.
Implement custom collectors.
Explain mutable reduction.
Describe the roles of Supplier, Accumulator, Combiner, Finisher, and Characteristics.
Build enterprise-grade collectors.
Explain how Parallel Streams merge partial results.

Why Build a Custom Collector?

Most enterprise applications eventually require aggregations that are not provided by the JDK.

Examples include:

Building invoice summaries
Customer dashboards
Financial statements
Regulatory reports
Risk calculations
Custom statistical models
Domain-specific DTO generation

Instead of writing multiple loops, we can encapsulate the aggregation logic inside reusable collectors.

The Collector Interface

At the heart of the framework is:

Collector<T, A, R>

Many developers find these type parameters intimidating.

Let’s break them down.

Type	Meaning
`T`	Stream element type
`A`	Mutable accumulation container
`R`	Final result type

Example:

Collector<Employee,
          List<Employee>,
          List<Employee>>

The Stream contains Employee elements (T)
The accumulation container is List<Employee> (A)
The final result is List<Employee> (R)

A collector may also transform the accumulated container into a different result.

Example:

Collector<Employee,
          List<Employee>,
          Integer>

Here, employees are accumulated in a list and the final result is the size of that list.
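To make the three type parameters concrete, here is a minimal sketch of that second collector built with the JDK's Collector.of factory. The Employee class, the countingViaList name, and the count helper are assumptions made for illustration only; in real code Collectors.counting() would be the simpler choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class CountViaListExample {

    // Hypothetical minimal Employee type, defined only for this sketch.
    static class Employee {
        private final String name;
        Employee(String name) { this.name = name; }
        String getName() { return name; }
    }

    // T = Employee, A = List<Employee>, R = Integer
    static Collector<Employee, List<Employee>, Integer> countingViaList() {
        return Collector.of(
                ArrayList::new,                  // supplier: creates the container (A)
                List::add,                       // accumulator: folds one T into A
                (left, right) -> {               // combiner: merges two containers
                    left.addAll(right);
                    return left;
                },
                List::size);                     // finisher: converts A into R
    }

    static int count(List<Employee> employees) {
        return employees.stream().collect(countingViaList());
    }
}
```

The container type A never leaks to the caller: only the Integer produced by the finisher is returned from collect().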

Collector Lifecycle

Every collector follows the same lifecycle.

Create Container
        │
        ▼
Receive Stream Element
        │
        ▼
Accumulate
        │
        ▼
More Elements? (yes: back to Receive Stream Element)
        │
        ▼
Combine Partial Results (Parallel)
        │
        ▼
Finish

Let’s examine each stage.

Supplier

Creates the initial mutable container.

Example:

ArrayList::new

Equivalent Java:

() -> new ArrayList<>()

The supplier is invoked once for each accumulation container. A sequential Stream needs only one container, while a parallel Stream may create one for each chunk of work.

Accumulator

Processes each Stream element.

Example:

(list, employee) -> list.add(employee)

Every element passes through the accumulator.

Visual representation:

Employee 1

↓

Add to List

↓

Employee 2

↓

Add to List

↓

Employee 3

↓

Add to List

Combiner

The combiner becomes important when processing Parallel Streams.

Imagine two worker threads.

Thread A

Alice

Bob

Thread B

Carol

David

Each thread builds its own partial result.

The combiner merges them.

Alice

Bob

Carol

David

Without a combiner, Parallel Streams could not produce a single result.
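The Thread A / Thread B picture can be reproduced by calling the supplier, accumulator, and combiner by hand. This is only a single-threaded simulation of what the framework does, and the class and method names are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

class ManualMergeExample {

    static List<String> simulateTwoThreads() {
        Supplier<List<String>> supplier = ArrayList::new;
        BiConsumer<List<String>, String> accumulator = List::add;
        BinaryOperator<List<String>> combiner = (left, right) -> {
            left.addAll(right);
            return left;
        };

        // "Thread A" fills its own container.
        List<String> partialA = supplier.get();
        accumulator.accept(partialA, "Alice");
        accumulator.accept(partialA, "Bob");

        // "Thread B" fills a separate container.
        List<String> partialB = supplier.get();
        accumulator.accept(partialB, "Carol");
        accumulator.accept(partialB, "David");

        // The combiner merges the two partial results into one.
        return combiner.apply(partialA, partialB);
    }
}
```

The merged list contains Alice, Bob, Carol, and David in that order, because the left-hand partial result is kept first.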

Finisher

Sometimes the accumulated container is already the final result.

Example:

List<Employee>

No conversion is necessary.

Other times, a transformation is required.

Example:

List<Employee>

↓

EmployeeSummary

The finisher performs this final conversion.

Characteristics

Collectors expose optimization hints to the Stream framework.

The Collector.Characteristics enum defines three:

`IDENTITY_FINISH`

The accumulation container is already the final result.

No finishing transformation is required.

`CONCURRENT`

Multiple threads can safely accumulate into the same container.

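This characteristic is useful only when the collector and the data structure are designed for concurrent updates. The framework performs a concurrent reduction only when the Stream is parallel and either the Stream is unordered or the collector is also marked UNORDERED.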

`UNORDERED`

The collector does not rely on encounter order.

This allows additional optimization opportunities during parallel execution.
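One way to get a feel for these hints is to ask existing collectors what they report. The sketch below queries two built-in collectors whose Javadoc documents them as unordered or concurrent, plus a collector created with the finisher-less Collector.of overload, which adds IDENTITY_FINISH on its own. The class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

class CharacteristicsExample {

    // Documented as an unordered collector.
    static Set<Characteristics> ofToSet() {
        return Collectors.toSet().characteristics();
    }

    // Documented as a concurrent and unordered collector.
    static Set<Characteristics> ofGroupingByConcurrent() {
        Collector<String, ?, ?> collector =
                Collectors.groupingByConcurrent(String::length);
        return collector.characteristics();
    }

    // No finisher supplied, so Collector.of marks the result IDENTITY_FINISH.
    static Set<Characteristics> ofIdentityCollector() {
        Collector<String, List<String>, List<String>> collector = Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> {
                    left.addAll(right);
                    return left;
                });
        return collector.characteristics();
    }
}
```

Printing the three sets is a quick way to see which hints a collector actually advertises before relying on them.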

Building Our First Custom Collector

Suppose we want to collect employee names into a single string separated by " | ".

Instead of using Collectors.joining(), we’ll build it ourselves.

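import java.util.Collections;
import java.util.Set;
import java.util.StringJoiner;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class EmployeeNameCollector
        implements Collector<Employee,
                             StringJoiner,
                             String> {

    // Supplier: a fresh StringJoiner for each accumulation container.
    @Override
    public Supplier<StringJoiner> supplier() {

        return () -> new StringJoiner(" | ");

    }

    // Accumulator: adds one employee name to the container.
    @Override
    public BiConsumer<StringJoiner, Employee> accumulator() {

        return (joiner, employee) ->
                joiner.add(employee.getName());

    }

    // Combiner: appends the right-hand partial result to the left-hand one.
    @Override
    public BinaryOperator<StringJoiner> combiner() {

        return StringJoiner::merge;

    }

    // Finisher: converts the StringJoiner into the final String.
    @Override
    public Function<StringJoiner, String> finisher() {

        return StringJoiner::toString;

    }

    // No characteristics: the finisher is required and encounter order matters.
    @Override
    public Set<Characteristics> characteristics() {

        return Collections.emptySet();

    }

}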

Usage:

String names = employees.stream()
        .collect(new EmployeeNameCollector());

Although Collectors.joining() already exists, this example demonstrates the complete Collector lifecycle.
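For small collectors, the same four functions can be passed to Collector.of instead of writing a class. The sketch below works on plain strings rather than Employee objects so that it is self-contained; the class and method names are illustrative assumptions.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

class NameJoiningExample {

    // Same supplier, accumulator, combiner, and finisher as the class-based version.
    static Collector<String, StringJoiner, String> pipeJoining() {
        return Collector.of(
                () -> new StringJoiner(" | "),
                StringJoiner::add,
                StringJoiner::merge,
                StringJoiner::toString);
    }

    static String joinSequential(List<String> names) {
        return names.stream().collect(pipeJoining());
    }

    static String joinParallel(List<String> names) {
        return names.parallelStream().collect(pipeJoining());
    }
}
```

For the names Alice, Bob, and Carol, both methods return "Alice | Bob | Carol", which is also what Collectors.joining(" | ") produces.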

Enterprise Example – Customer Summary Collector

Suppose a banking system needs a summary object containing:

Total customers
Total balance
Average balance
Highest balance
Lowest balance

Instead of multiple traversals, a custom collector can accumulate all statistics in one pass.

CustomerSummary

↓

count

totalBalance

averageBalance

highestBalance

lowestBalance

This approach is efficient and encapsulates reporting logic in a reusable component.
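One possible shape for such a collector is sketched below. Only getBalance() comes from this article; the Customer constructor, the CustomerStats container, and the CustomerSummary field names are assumptions, and balances are kept as double to match the Java 7 loop shown later (a production system would more likely use BigDecimal).

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical domain type for this sketch.
class Customer {
    private final double balance;
    Customer(double balance) { this.balance = balance; }
    double getBalance() { return balance; }
}

// Mutable accumulation container; one instance per partial result.
final class CustomerStats {
    long count;
    double total;
    double highest = Double.NEGATIVE_INFINITY;
    double lowest = Double.POSITIVE_INFINITY;
}

// Immutable result object (field names are assumptions).
final class CustomerSummary {
    final long count;
    final double totalBalance;
    final double averageBalance;
    final double highestBalance;
    final double lowestBalance;

    CustomerSummary(long count, double total, double highest, double lowest) {
        this.count = count;
        this.totalBalance = total;
        this.averageBalance = count == 0 ? 0.0 : total / count;
        this.highestBalance = highest;
        this.lowestBalance = lowest;
    }
}

class CustomerSummaryCollector
        implements Collector<Customer, CustomerStats, CustomerSummary> {

    @Override
    public Supplier<CustomerStats> supplier() {
        return CustomerStats::new;
    }

    @Override
    public BiConsumer<CustomerStats, Customer> accumulator() {
        return (stats, customer) -> {
            double balance = customer.getBalance();
            stats.count++;
            stats.total += balance;
            stats.highest = Math.max(stats.highest, balance);
            stats.lowest = Math.min(stats.lowest, balance);
        };
    }

    @Override
    public BinaryOperator<CustomerStats> combiner() {
        // Every field must be merged, otherwise parallel runs lose data.
        return (left, right) -> {
            left.count += right.count;
            left.total += right.total;
            left.highest = Math.max(left.highest, right.highest);
            left.lowest = Math.min(left.lowest, right.lowest);
            return left;
        };
    }

    @Override
    public Function<CustomerStats, CustomerSummary> finisher() {
        return s -> new CustomerSummary(s.count, s.total, s.highest, s.lowest);
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();
    }
}
```

In this sketch an empty Stream yields a count of 0, an average of 0.0, and infinite highest and lowest values; whether that is acceptable, or whether an Optional-style result is better, is a design decision for the real domain.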

Spring Boot Integration

Imagine a reporting service.

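import org.springframework.stereotype.Service;

@Service
public class CustomerReportService {

    // CustomerRepository is assumed to be a Spring Data repository whose
    // findAll() returns a List<Customer> (for example, a JpaRepository).
    private final CustomerRepository repository;

    public CustomerReportService(CustomerRepository repository) {
        this.repository = repository;
    }

    public CustomerSummary buildSummary() {

        return repository.findAll()

                .stream()

                .collect(new CustomerSummaryCollector());

    }

}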

The service layer remains clean, while the aggregation logic resides in a dedicated collector.

Migration from Java 7

Java 7

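long count = 0;
double total = 0;
// Double.MIN_VALUE is the smallest positive double, so it is not a safe starting value for max.
double max = Double.NEGATIVE_INFINITY;
double min = Double.POSITIVE_INFINITY;

for (Customer customer : customers) {

    count++;

    total += customer.getBalance();

    max = Math.max(max, customer.getBalance());

    min = Math.min(min, customer.getBalance());
}

double average = count == 0 ? 0 : total / count;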

Java 8+

CustomerSummary summary =
        customers.stream()
                 .collect(new CustomerSummaryCollector());

The custom collector hides the implementation details and promotes reuse.

Parallel Stream Behavior

When a Parallel Stream executes:

Source

↓

Split

↓

Thread 1

Thread 2

Thread 3

↓

Partial Results

↓

Combiner

↓

Final Result

A correctly implemented combiner ensures that partial results are merged without losing data.

Performance Considerations

Prefer built-in collectors when they satisfy your requirements.
Create custom collectors only for domain-specific aggregation.
Ensure accumulator and combiner implementations are efficient.
Design collectors to minimize object creation.
Thoroughly test collectors before using them with Parallel Streams.
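As an illustration of the first point, count, sum, average, minimum, and maximum of a numeric property are already covered by Collectors.summarizingDouble. The sketch below uses a Stream of Double values to stay self-contained; the class and method names are assumptions.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BuiltInSummaryExample {

    // One pass over the data yields count, sum, average, min, and max.
    static DoubleSummaryStatistics summarize(Stream<Double> balances) {
        return balances.collect(Collectors.summarizingDouble(Double::doubleValue));
    }
}
```

The returned object exposes getCount(), getSum(), getAverage(), getMin(), and getMax(). A custom collector becomes worthwhile once the result needs domain-specific fields that this class cannot carry.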

Common Mistakes

Ignoring the Combiner

A collector that works with sequential Streams may fail with Parallel Streams if the combiner is incorrect.
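The sketch below contrasts a correct combiner with a deliberately broken one that discards the right-hand partial result. All names are hypothetical, and how much data the broken version loses depends on how the framework happens to split the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class BrokenCombinerExample {

    // Correct: merges the right-hand partial result into the left-hand one.
    static Collector<Integer, List<Integer>, List<Integer>> correct() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> {
                    left.addAll(right);
                    return left;
                });
    }

    // Broken on purpose: silently drops everything collected on the right.
    static Collector<Integer, List<Integer>, List<Integer>> broken() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> left);
    }

    static List<Integer> numbers() {
        return IntStream.range(0, 1_000).boxed().collect(Collectors.toList());
    }
}
```

With a sequential Stream the combiner is not needed, so numbers().stream().collect(broken()) still returns all 1,000 elements. The same collector on numbers().parallelStream() will usually return fewer, which is exactly the kind of bug that only shows up once someone switches to parallel execution.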

Returning Mutable Internal State

Avoid exposing mutable accumulation containers directly when the final result should be immutable.
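One common way to do this is to wrap an existing collector with Collectors.collectingAndThen so that the finisher hands out an unmodifiable view. The helper name below is an assumption; on Java 10 and later, Collectors.toUnmodifiableList() covers the same case out of the box.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class UnmodifiableResultExample {

    // Accumulates into a normal list, then wraps it before returning it to the caller.
    static <T> Collector<T, ?, List<T>> toUnmodifiableListCompat() {
        return Collectors.collectingAndThen(
                Collectors.toList(),
                Collections::unmodifiableList);
    }
}
```

Callers that try to add to or remove from the returned list get an UnsupportedOperationException, so the collected data cannot be changed behind the collector's back.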

Marking `CONCURRENT` Incorrectly

Only use the CONCURRENT characteristic when the accumulator supports safe concurrent updates and the collection semantics allow it.
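As a sketch of a collector for which CONCURRENT is a reasonable claim, the example below counts word occurrences in a ConcurrentHashMap, whose merge method updates each key atomically. The collector is also marked UNORDERED because counting does not depend on encounter order. The class and method names are assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;

class ConcurrentCountExample {

    static Collector<String, ConcurrentMap<String, Long>, ConcurrentMap<String, Long>>
            countingConcurrently() {
        return Collector.of(
                ConcurrentHashMap::new,
                // Safe to call from several threads on the same map.
                (map, word) -> map.merge(word, 1L, Long::sum),
                // Still required by the interface; used when containers are not shared.
                (left, right) -> {
                    right.forEach((word, n) -> left.merge(word, n, Long::sum));
                    return left;
                },
                Characteristics.CONCURRENT,
                Characteristics.UNORDERED);
    }
}
```

The same accumulator written against a plain HashMap would not be safe to mark CONCURRENT, because several threads could then update the shared map at the same time.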

Best Practices

Keep collectors focused on a single responsibility.
Prefer immutable result objects.
Reuse collectors across services.
Unit test both sequential and parallel execution paths.
Document collector behavior, especially if it will be shared across teams.
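For the testing point, a small helper that runs the same collector both ways and compares the results is often enough. This is a framework-neutral sketch with assumed names; in a real project the comparison would normally live inside a JUnit or similar test, and the result type needs a meaningful equals().

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collector;

class CollectorChecks {

    // Returns true when sequential and parallel execution agree on the result.
    static <T, R> boolean sameSequentialAndParallel(
            List<T> input, Collector<? super T, ?, R> collector) {
        R sequential = input.stream().collect(collector);
        R parallel = input.parallelStream().collect(collector);
        return Objects.equals(sequential, parallel);
    }
}
```

Running the check with a reasonably large input makes it more likely that the parallel path actually splits the work and exercises the combiner.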

Interview Questions

What is the purpose of the accumulator?

It processes each Stream element and updates the mutable accumulation container.

Why is the combiner required?

It merges partial results produced during parallel execution.

What does `IDENTITY_FINISH` mean?

The accumulation container is already the final result, so no finishing transformation is necessary.

Should custom collectors replace built-in collectors?

No. Built-in collectors should be preferred whenever they satisfy the business requirement. Custom collectors are most valuable for domain-specific aggregation.

Hands-On Exercise

Build a Spring Boot application that implements a custom CustomerSummaryCollector to produce a report containing:

Total customers.
Total account balance.
Average balance.
Highest balance.
Lowest balance.
Number of premium customers.

Expose the summary through a REST endpoint and compare the implementation with a Java 7 solution using multiple loops.

Summary

Custom collectors allow developers to encapsulate complex aggregation logic in reusable, testable components. By understanding the Collector interface and its lifecycle—Supplier, Accumulator, Combiner, Finisher, and Characteristics—you gain insight into how the JDK itself implements the Collectors framework.

This knowledge is particularly valuable when building enterprise reporting engines, financial summaries, and analytics pipelines that go beyond the capabilities of the standard collectors.

Coming Up Next

Part 12 – Parallel Streams and Spliterator: How the Stream API Executes Work Across Multiple Cores

We’ll look beneath the Stream API to understand how data is split, how tasks are distributed across threads, how the ForkJoinPool coordinates parallel execution, and when parallel Streams improve performance—or make it worse.

Part 11: Custom Collectors – Building Your Own Aggregation Engine
