Introduction

One of the most impressive capabilities introduced in Java 8 was the ability to process collections in parallel with almost no changes to application code.

Changing this:

employees.stream()

to this:

employees.parallelStream()

or, equivalently, this:

employees.stream().parallel()

appears almost magical.

Suddenly, multiple CPU cores begin processing data simultaneously.

But what actually happens?

How does Java decide which thread processes which elements?

How are millions of records divided between CPU cores?

How are the partial results merged together?

The answer lies in one of the least discussed but most important interfaces in the Stream API:

Spliterator

Without Spliterator, there would be no Parallel Streams.

Without Parallel Streams, using today’s multi-core processors from Java would require far more hand-written threading code.

In this article, we’ll explore how Parallel Streams work internally, understand the role of Spliterator, examine the ForkJoin framework, and learn when parallel execution improves performance—and when it doesn’t.

Learning Objectives

By the end of this article, you will be able to:

Understand the difference between sequential and parallel Streams.
Learn what a Spliterator is.
Understand how Streams divide work.
Learn how the ForkJoinPool executes Stream operations.
Understand work stealing.
Recognize when Parallel Streams improve performance.
Avoid common mistakes with Parallel Streams.

The Evolution of CPU Hardware

For many years, processor performance improved mainly by increasing clock speed, roughly along these lines:

2000

1 GHz

↓

2002

2 GHz

↓

2004

3 GHz

Eventually, physical and thermal limitations made further large increases in clock speed impractical.

Hardware manufacturers adopted a different strategy:

CPU

↓

Core 1

Core 2

Core 3

Core 4

...

Core 32

Instead of making a single core faster, processors gained more cores.

To benefit from this hardware, software needed to execute work concurrently.

Sequential Streams

A normal Stream processes elements one after another.

employees.stream()
         .map(Employee::getName)
         .toList();

Execution:

Employee 1

↓

Employee 2

↓

Employee 3

↓

Employee 4

Only one thread performs the work.

This is often the right choice for:

Small collections.
Lightweight operations.
Operations that depend on encounter order.
Code that interacts with shared mutable state.

Parallel Streams

A Parallel Stream divides the workload across multiple threads.

employees.parallelStream()
         .map(Employee::getName)
         .toList();

Execution:

Employee Collection

        │

────────┼────────

│       │       │

▼       ▼       ▼

Thread A

Thread B

Thread C

        │

────────┼────────

        ▼

Merged Result

Notice that your code does not explicitly create or manage threads.

The Stream framework handles the complexity.
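To see which threads actually do the work, the following minimal sketch records the name of every thread that touches an element. The class name `ThreadNamesDemo` and the helper `threadsUsed` are illustrative and not part of the article's codebase. The side effect inside `forEach` is there only for diagnostics, and it writes to a thread-safe set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class ThreadNamesDemo {

    // Collects the names of every thread that processed at least one element.
    static Set<String> threadsUsed(int elementCount) {
        // A concurrent set is required because several threads add to it at once.
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elementCount)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // The printed set differs between machines and runs.
        System.out.println(threadsUsed(1_000_000));
    }
}
```

On a multi-core machine the output usually contains the calling thread plus several ForkJoinPool worker threads, although the exact names and their number are not guaranteed.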

What Is a Spliterator?

The name Spliterator combines two words:

Split
Iterator

A traditional Iterator only moves forward.

Employee 1

↓

Employee 2

↓

Employee 3

A Spliterator can do something an Iterator cannot:

It can divide its work into smaller pieces.

How Spliterator Works

Suppose we have:

16 Employees

Initial Spliterator:

1 - 16

First split:

1 - 8

9 - 16

Second split:

1 - 4

5 - 8

9 - 12

13 - 16

Each piece can now be processed independently by a different worker thread.
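The halving described above can be observed directly by calling `trySplit()` on an `ArrayList` spliterator. This is a small sketch under the assumption that employees are represented by their IDs 1 to 16; `SplitDemo`, `drain`, and `splitTwice` are illustrative names. `trySplit()` hands off a prefix of the remaining elements as a new Spliterator, and the original Spliterator keeps the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class SplitDemo {

    // Drains a Spliterator into a list so its contents can be inspected.
    static List<Integer> drain(Spliterator<Integer> spliterator) {
        List<Integer> out = new ArrayList<>();
        spliterator.forEachRemaining(out::add);
        return out;
    }

    // Splits the IDs 1..count into four pieces by halving twice.
    // Assumes count >= 4; trySplit() returns null when it cannot split.
    static List<List<Integer>> splitTwice(int count) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= count; id++) {
            ids.add(id);
        }
        Spliterator<Integer> second = ids.spliterator();   // covers everything
        Spliterator<Integer> first = second.trySplit();    // first half is handed off
        Spliterator<Integer> firstA = first.trySplit();    // halve the first half
        Spliterator<Integer> secondA = second.trySplit();  // halve the second half
        return Arrays.asList(drain(firstA), drain(first), drain(secondA), drain(second));
    }

    public static void main(String[] args) {
        // With 16 IDs in an ArrayList this prints:
        // [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
        System.out.println(splitTwice(16));
    }
}
```

Other sources split differently. A `LinkedList`, for example, cannot jump to its midpoint, so its splits are typically less even than those of an array-backed list.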

Spliterator Characteristics

Every Spliterator exposes characteristics that help the Stream framework optimize execution.

Some of the most important are:

`ORDERED`

Elements have a defined encounter order.

Examples:

List
LinkedList

`DISTINCT`

No duplicate elements.

Example:

Set

`SORTED`

Elements are already sorted.

`SIZED`

The exact number of elements is known.

Example:

ArrayList

`IMMUTABLE`

The underlying data source cannot be structurally modified: elements cannot be added, replaced, or removed, so no such change can occur during traversal.

`CONCURRENT`

The data source can be safely modified by multiple threads during traversal without external synchronization.

Example:

ConcurrentHashMap

These characteristics allow the Stream framework to make optimization decisions.
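The characteristics listed above can be queried at runtime with `Spliterator.hasCharacteristics(int)`. The sketch below checks a few standard JDK collections; the class `CharacteristicsDemo` and its helper `has` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class CharacteristicsDemo {

    // True if the collection's Spliterator reports the given characteristic.
    static boolean has(Collection<?> source, int characteristic) {
        return source.spliterator().hasCharacteristics(characteristic);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        Set<Integer> hashSet = new HashSet<>(list);
        Set<Integer> treeSet = new TreeSet<>(list);
        Set<Integer> concurrentSet = ConcurrentHashMap.newKeySet();
        concurrentSet.addAll(list);

        System.out.println("ArrayList ORDERED:        " + has(list, Spliterator.ORDERED));
        System.out.println("ArrayList SIZED:          " + has(list, Spliterator.SIZED));
        System.out.println("HashSet DISTINCT:         " + has(hashSet, Spliterator.DISTINCT));
        System.out.println("HashSet ORDERED:          " + has(hashSet, Spliterator.ORDERED));
        System.out.println("TreeSet SORTED:           " + has(treeSet, Spliterator.SORTED));
        System.out.println("Concurrent set CONCURRENT: " + has(concurrentSet, Spliterator.CONCURRENT));
    }
}
```

A source that reports `SIZED`, such as `ArrayList`, lets the framework know the element count before traversal starts, which helps it divide the work evenly.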

The ForkJoinPool

Parallel Streams do not create a new thread for every element.

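Instead, by default they run their tasks on the shared common ForkJoinPool (ForkJoinPool.commonPool()), with the calling thread taking part in the work as well.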

Conceptually:

ForkJoinPool

↓

Worker 1

Worker 2

Worker 3

Worker 4

Each worker receives a portion of the data from the Spliterator.
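The size of the common pool can be inspected at runtime. By default its parallelism is one less than the number of available processors, because the calling thread also participates, and it can be overridden with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property. The class name `CommonPoolInfo` below is illustrative.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {

    // Target parallelism of the pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

Because the common pool is shared by the whole JVM, a long-running parallel pipeline in one part of an application can delay parallel work elsewhere.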

Fork Phase

The work is recursively divided.

Employees

↓

Split

↓

Split Again

↓

Split Again

This continues until the tasks become small enough to process efficiently.
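The same divide-until-small-enough pattern can be written by hand with a `RecursiveTask`. This is a sketch of the general fork/join idiom, not the JDK's internal Stream implementation; the class `SumTask` and its `THRESHOLD` of 1,000 elements are assumptions chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {

    // Assumed cutoff: ranges at or below this size are summed directly.
    private static final int THRESHOLD = 1_000;

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(values, from, mid);
        SumTask right = new SumTask(values, mid, to);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // process the right half in this thread
        return left.join() + rightSum;    // wait for the left half and merge
    }

    static long sum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }

    public static void main(String[] args) {
        long[] values = new long[10_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sum(values)); // 1 + 2 + ... + 10000 = 50005000
    }
}
```

Choosing the threshold is a trade-off: too small and scheduling overhead dominates, too large and some cores stay idle.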

Join Phase

After processing:

Result A

Result B

Result C

↓

Merge

↓

Final Result

The Stream framework combines the partial results using collector combiners or reduction logic.
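The role of the combiner becomes visible in the three-argument form of `Stream.collect`, where the supplier, accumulator, and combiner are passed separately. In the sketch below, `CombinerDemo` and `upperCaseNames` are illustrative names: each chunk of the stream fills its own `ArrayList`, and the combiner merges two partial lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombinerDemo {

    static List<String> upperCaseNames(List<String> names) {
        ArrayList<String> result = names.parallelStream().collect(
                () -> new ArrayList<String>(),                // supplier: one container per chunk
                (list, name) -> list.add(name.toUpperCase()), // accumulator: add one element
                (left, right) -> left.addAll(right));         // combiner: merge two partial results
        return result;
    }

    public static void main(String[] args) {
        System.out.println(upperCaseNames(Arrays.asList("alice", "bob", "carol")));
    }
}
```

For an ordered source such as a `List`, the partial results are combined in encounter order, so the merged list keeps the order of the input.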

Work Stealing

One of the most powerful features of the ForkJoinPool is work stealing.

Suppose:

Worker 1

Finished

Worker 2

Still Busy

Instead of remaining idle, Worker 1 steals tasks that are still waiting in Worker 2’s queue.

This keeps CPU utilization high and improves throughput.
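`ForkJoinPool` exposes a rough indicator of this behaviour through `getStealCount()`. The sketch below reads the counter of the common pool before and after a parallel pipeline; the class `StealCountDemo` and the sum-of-squares workload are illustrative assumptions. The counter is only an estimate, and the difference varies from run to run.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

class StealCountDemo {

    // CPU-bound sample workload: 1^2 + 2^2 + ... + n^2.
    static long sumOfSquares(int n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()
                         .map(x -> x * x)
                         .sum();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = ForkJoinPool.commonPool();
        long before = pool.getStealCount();
        long result = sumOfSquares(1_000_000);
        long after = pool.getStealCount();

        System.out.println("Result: " + result);
        // An estimate only; other work running on the common pool is counted too.
        System.out.println("Steals during the run (estimate): " + (after - before));
    }
}
```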

Enterprise Example

Suppose a reporting service calculates yearly statistics for 20 million transactions.

Sequential processing:

20 Million

↓

One Thread

Parallel processing:

20 Million

↓

8 Worker Threads

↓

Merged Report

For CPU-intensive calculations, this can significantly reduce execution time.
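A scaled-down sketch of such a report is shown below. It assumes transactions are reduced to amounts in cents held in a `long[]`; `YearlyStats` and `summarize` are hypothetical names, and the data is synthetic. The array size is 2 million rather than 20 million to keep memory use modest.

```java
import java.util.Arrays;
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

class YearlyStats {

    // Computes count, sum, min, max, and average in a single pass.
    static LongSummaryStatistics summarize(long[] amountsInCents, boolean parallel) {
        LongStream amounts = Arrays.stream(amountsInCents);
        if (parallel) {
            amounts = amounts.parallel();
        }
        return amounts.summaryStatistics();
    }

    public static void main(String[] args) {
        long[] amounts = new long[2_000_000];
        for (int i = 0; i < amounts.length; i++) {
            amounts[i] = (i % 10_000) + 1; // synthetic amounts between 1 and 10,000 cents
        }
        System.out.println(summarize(amounts, false));
        System.out.println(summarize(amounts, true));
    }
}
```

Both calls return the same statistics. Whether the parallel version is faster depends on how much work each element requires; for a plain sum the gain may be small.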

When Parallel Streams Work Well

Parallel Streams are generally beneficial when:

Processing large collections.
Operations are CPU-intensive.
Each element can be processed independently.
The workload is balanced.
The cost of splitting and merging is small relative to the work performed.

Examples include:

Financial calculations
Image processing
Scientific computations
Large-scale reporting
Data analytics

When Parallel Streams Are a Poor Choice

Parallel Streams can perform worse when:

Collections are very small.
Tasks perform blocking I/O (database calls, REST calls, file operations).
The workload per element is trivial.
Operations rely heavily on encounter order.
The pipeline mutates shared state.

In these situations, the overhead of parallelization may outweigh any benefit.
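The cost of depending on encounter order can be illustrated by comparing `forEach` with `forEachOrdered` on a parallel stream. In this sketch, `OrderDemo` and its two helper methods are illustrative names. `forEachOrdered` guarantees the source order but gives up much of the parallelism for that terminal step, while `forEach` makes no ordering promise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderDemo {

    // Visits elements in whatever order the worker threads reach them.
    static List<Integer> visitUnordered(List<Integer> source) {
        List<Integer> visited = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(visited::add);
        return visited;
    }

    // Visits elements strictly in the encounter order of the source.
    static List<Integer> visitOrdered(List<Integer> source) {
        List<Integer> visited = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(visited::add);
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println("forEach:        " + visitUnordered(source)); // order may vary
        System.out.println("forEachOrdered: " + visitOrdered(source));   // same order as source
    }
}
```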

Spring Boot Considerations

Avoid using Parallel Streams inside request-processing code simply to “speed things up.”

For example:

orders.parallelStream()
      .map(this::loadCustomerFromDatabase)
      .toList();

Each element performs a database call.

This is an I/O-bound workload, not a CPU-bound one.

Using Parallel Streams here can:

Increase database contention.
Exhaust connection pools.
Reduce throughput under load.

For asynchronous I/O operations, technologies such as CompletableFuture, structured concurrency (a preview API in Java 21), or reactive programming may be more appropriate depending on the use case.
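One possible shape of the `CompletableFuture` alternative is sketched below. It runs the blocking calls on a dedicated, bounded executor so the number of concurrent database calls is explicit and the common pool stays free for CPU-bound work. Everything here is an assumption for illustration: `CustomerLoader`, the simulated `loadCustomerFromDatabase`, and the `maxConcurrentCalls` parameter are hypothetical, and a real service would normally reuse a managed executor instead of creating one per call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class CustomerLoader {

    // Hypothetical stand-in for a blocking database call.
    static String loadCustomerFromDatabase(int orderId) {
        try {
            Thread.sleep(10); // simulate I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "customer-" + orderId;
    }

    static List<String> loadAll(List<Integer> orderIds, int maxConcurrentCalls) {
        // Bounded pool: at most maxConcurrentCalls lookups run at the same time.
        ExecutorService ioPool = Executors.newFixedThreadPool(maxConcurrentCalls);
        try {
            // Start all lookups first, then wait for them in a second pass.
            List<CompletableFuture<String>> futures = orderIds.stream()
                    .map(id -> CompletableFuture.supplyAsync(
                            () -> loadCustomerFromDatabase(id), ioPool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAll(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```

Sizing the pool to match the database connection pool is one way to avoid exhausting connections under load.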

Migration from Java 7

Java 7

ExecutorService executor =
        Executors.newFixedThreadPool(8);

// Submit tasks manually
// Wait for completion
// Merge results
// Handle exceptions
// Shutdown executor

Java 8+

employees.parallelStream()
         .map(Employee::calculateBonus)
         .collect(Collectors.toList()); // Stream.toList() requires Java 16+

The Stream framework manages task partitioning, scheduling, and result merging automatically.
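For comparison, the manual steps listed in the Java 7 comments could be sketched roughly as follows, using only Java 7 syntax. The class `ManualBonusCalculation`, the `long[]` of salaries, and the 10% bonus rule are assumptions made for this example; they are not the article's `Employee` model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ManualBonusCalculation {

    static long totalBonus(final long[] salaries, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Partition the input into roughly equal chunks, one task per chunk.
            int chunk = (salaries.length + threads - 1) / threads;
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            for (int start = 0; start < salaries.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, salaries.length);
                parts.add(executor.submit(new Callable<Long>() {
                    @Override
                    public Long call() {
                        long sum = 0;
                        for (int i = from; i < to; i++) {
                            sum += salaries[i] / 10; // assumed rule: bonus is 10% of salary
                        }
                        return sum;
                    }
                }));
            }
            // Wait for completion and merge the partial results.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A bonus task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalBonus(new long[] {1000, 2000, 3000, 4000}, 2)); // 1000
    }
}
```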

Performance Considerations

Measure before introducing Parallel Streams.
Avoid blocking operations inside parallel pipelines.
Prefer immutable data structures.
Ensure collector combiners are efficient.
Understand that parallel execution introduces overhead for splitting, scheduling, and merging work.

Common Mistakes

Assuming Parallel Is Always Faster

It isn’t.

Parallel execution introduces coordination overhead.

Always benchmark representative workloads.
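As a first rough check, the same CPU-bound task can be timed in both modes with `System.nanoTime()`. The sketch below counts primes by trial division; `NaiveTiming` and its methods are illustrative names, and the single warm-up pass is a simplification. Numbers from a hand-rolled timer like this are only indicative, and a harness such as JMH gives more trustworthy results.

```java
import java.util.stream.LongStream;

class NaiveTiming {

    // Deliberately simple, CPU-bound primality test (trial division).
    static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    static long countPrimes(long limit, boolean parallel) {
        LongStream candidates = LongStream.rangeClosed(2, limit);
        if (parallel) {
            candidates = candidates.parallel();
        }
        return candidates.filter(NaiveTiming::isPrime).count();
    }

    static long elapsedMillis(long limit, boolean parallel) {
        long start = System.nanoTime();
        countPrimes(limit, parallel);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long limit = 2_000_000;
        // One warm-up pass per mode so the JIT compiler has a chance to optimize.
        countPrimes(limit, false);
        countPrimes(limit, true);
        System.out.println("Sequential: " + elapsedMillis(limit, false) + " ms");
        System.out.println("Parallel:   " + elapsedMillis(limit, true) + " ms");
    }
}
```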

Using Shared Mutable Objects

Bad example:

List<String> names = new ArrayList<>();

employees.parallelStream()
         .forEach(employee -> names.add(employee.getName()));

This is not thread-safe.

Instead, collect the results:

List<String> names = employees.parallelStream()
        .map(Employee::getName)
        .toList();

Parallelizing Database Calls

Parallel Streams do not magically improve database throughput.

Treat database access as an I/O-bound problem, not a CPU-bound one.

Best Practices

Use Parallel Streams for CPU-bound work on sufficiently large datasets.
Keep Stream operations stateless and free of side effects.
Avoid blocking operations inside parallel pipelines.
Benchmark using representative data volumes.
Prefer readability over premature optimization.

Interview Questions

What is a Spliterator?

Spliterator is an interface for traversing and partitioning the elements of a data source, which is what makes efficient parallel processing possible.

Why do Parallel Streams use a ForkJoinPool?

The ForkJoinPool provides efficient task scheduling, recursive task splitting, and work stealing for CPU-intensive workloads.

What is work stealing?

Idle worker threads take unfinished tasks from busy workers, improving CPU utilization and load balancing.

Are Parallel Streams always faster?

No. They are most effective for large, CPU-bound workloads where the benefits of parallel execution outweigh the overhead.

Should Parallel Streams be used for REST calls or database access?

Generally no. Those operations are I/O-bound and are usually better handled using asynchronous or reactive approaches rather than Parallel Streams.

Hands-On Exercise

Build a Spring Boot application that:

Calculates yearly account interest for one million in-memory accounts using a sequential Stream.
Calculates the same result using a parallel Stream.
Measures execution time with System.nanoTime() or, preferably, a benchmarking framework such as JMH.

Then compare throughput, CPU utilization, and code readability, and explain why the results differ.

Summary

Parallel Streams make it remarkably easy to leverage modern multi-core processors, but the simplicity of the API hides a sophisticated execution model built on Spliterators and the ForkJoinPool.

Understanding how Streams partition work, distribute tasks, and merge results is essential for making informed performance decisions. Parallel execution is a powerful tool, but like any optimization, it should be applied deliberately and validated with measurement.

In the next article, we’ll go even deeper by implementing our own Spliterator, giving us a firsthand understanding of how the JDK partitions work for parallel processing.

Coming Up Next

Part 13 – Writing Your Own Spliterator: Building a Parallel Processing Engine

We’ll implement a custom Spliterator, explore the tryAdvance() and trySplit() methods, understand how the Stream framework partitions data, and build a production-ready example for processing large CSV files in parallel.

