Introduction
One of the most impressive capabilities introduced in Java 8 was the ability to process collections in parallel with almost no changes to application code.
Changing this:
employees.stream()
to this:
employees.parallelStream()
or
employees.stream().parallel()
appears almost magical.
Suddenly, multiple CPU cores begin processing data simultaneously.
But what actually happens?
How does Java decide which thread processes which elements?
How are millions of records divided between CPU cores?
How are the partial results merged together?
The answer lies in one of the least discussed but most important interfaces in the Stream API:
Spliterator
Without Spliterator, there would be no Parallel Streams.
And without Parallel Streams, taking advantage of today’s multi-core processors from Java would require considerably more hand-written concurrency code.
In this article, we’ll explore how Parallel Streams work internally, understand the role of Spliterator, examine the ForkJoin framework, and learn when parallel execution improves performance—and when it doesn’t.
Learning Objectives
By the end of this article, you will be able to:
- Understand the difference between sequential and parallel Streams.
- Learn what a Spliterator is.
- Understand how Streams divide work.
- Learn how the ForkJoinPool executes Stream operations.
- Understand work stealing.
- Recognize when Parallel Streams improve performance.
- Avoid common mistakes with Parallel Streams.
The Evolution of CPU Hardware
For many years, processor performance improved by increasing clock speed.
2000
1 GHz
↓
2002
2 GHz
↓
2004
3 GHz
Eventually, physical and thermal limitations prevented further increases.
Hardware manufacturers adopted a different strategy:
CPU
↓
Core 1
Core 2
Core 3
Core 4
...
Core 32
Instead of making a single core faster, processors gained more cores.
To benefit from this hardware, software needed to execute work concurrently.
Sequential Streams
A normal Stream processes elements one after another.
employees.stream()
.map(Employee::getName)
.toList();
Execution:
Employee 1
↓
Employee 2
↓
Employee 3
↓
Employee 4
Only one thread performs the work.
This is often the right choice for:
- Small collections.
- Lightweight operations.
- Operations that depend on encounter order.
- Code that interacts with shared mutable state.
Parallel Streams
A Parallel Stream divides the workload across multiple threads.
employees.parallelStream()
.map(Employee::getName)
.toList();
Execution:
Employee Collection
↓
Thread A
Thread B
Thread C
↓
Merged Result
Notice that your code does not explicitly create or manage threads.
The Stream framework handles the complexity.
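To make this visible, here is a minimal sketch that runs the same pipeline sequentially and in parallel and records which threads did the work. The class name ParallelThreadsDemo, the helper upperCase, and the sample names are assumptions made for illustration. The thread-recording side effect exists only so the behavior can be observed, and it writes to a concurrent set to stay thread-safe.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ParallelThreadsDemo {

    // Upper-cases each name and records the name of the thread that handled it.
    static List<String> upperCase(List<String> names, boolean parallel, Set<String> threadsSeen) {
        Stream<String> stream = parallel ? names.parallelStream() : names.stream();
        return stream
                .map(name -> {
                    // Observation only: threadsSeen must be a thread-safe set.
                    threadsSeen.add(Thread.currentThread().getName());
                    return name.toUpperCase();
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = IntStream.rangeClosed(1, 10_000)
                .mapToObj(i -> "employee-" + i)
                .collect(Collectors.toList());

        Set<String> sequentialThreads = ConcurrentHashMap.newKeySet();
        Set<String> parallelThreads = ConcurrentHashMap.newKeySet();

        List<String> sequential = upperCase(names, false, sequentialThreads);
        List<String> parallel = upperCase(names, true, parallelThreads);

        System.out.println("Same result, same order: " + sequential.equals(parallel));
        System.out.println("Sequential threads: " + sequentialThreads);
        System.out.println("Parallel threads:   " + parallelThreads);
    }
}
```

On a multi-core machine the parallel run usually reports the calling thread plus several ForkJoinPool worker threads, although the exact set varies from run to run. The collected list keeps the encounter order of the source in both cases.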
What Is a Spliterator?
The name Spliterator combines two words:
- Split
- Iterator
A traditional Iterator only moves forward.
Employee 1
↓
Employee 2
↓
Employee 3
A Spliterator can do something an Iterator cannot:
It can divide its work into smaller pieces.
How Spliterator Works
Suppose we have:
16 Employees
Initial Spliterator:
1 - 16
First split:
1 - 8
9 - 16
Second split:
1-4
5-8
9-12
13-16
Each split can now be processed independently by different worker threads.
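The 16-employee walkthrough can be reproduced by calling trySplit() by hand. This is a small sketch, assuming the employees are represented by their numbers in an ArrayList; the class name SplitDemo and the method splitIntoQuarters are made up for illustration. The Spliterator contract only promises that an ordered source hands off a prefix of its elements, so the even halving shown here is a property of ArrayList's implementation rather than of every source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

class SplitDemo {

    // Splits a 16-element ArrayList twice and returns the four chunks in encounter order.
    static List<List<Integer>> splitIntoQuarters() {
        List<Integer> employees = new ArrayList<>();
        for (int id = 1; id <= 16; id++) {
            employees.add(id);
        }

        // trySplit() hands off the first part and keeps the rest in the original spliterator.
        Spliterator<Integer> second = employees.spliterator(); // covers 1-16
        Spliterator<Integer> first = second.trySplit();        // first: 1-8,  second: 9-16
        Spliterator<Integer> q1 = first.trySplit();            // q1: 1-4,     first: 5-8
        Spliterator<Integer> q3 = second.trySplit();           // q3: 9-12,    second: 13-16

        List<List<Integer>> chunks = new ArrayList<>();
        for (Spliterator<Integer> part : List.of(q1, first, q3, second)) {
            List<Integer> chunk = new ArrayList<>();
            part.forEachRemaining(chunk::add); // traverse whatever this piece still covers
            chunks.add(chunk);
        }
        return chunks;
    }

    public static void main(String[] args) {
        splitIntoQuarters().forEach(System.out::println);
    }
}
```

With the JDK's ArrayList this prints the four ranges 1-4, 5-8, 9-12, and 13-16, one list per line.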
Spliterator Characteristics
Every Spliterator exposes characteristics that help the Stream framework optimize execution.
Some of the most important are:
ORDERED
Elements have a defined encounter order.
Examples:
List, LinkedList
DISTINCT
No duplicate elements.
Example:
Set
SORTED
Elements are already sorted.
SIZED
The exact number of elements is known.
Example:
ArrayList
IMMUTABLE
The underlying data source cannot change during traversal.
CONCURRENT
The data source supports concurrent modification.
Example:
ConcurrentHashMap
These characteristics allow the Stream framework to make optimization decisions.
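The characteristics of a given source can be checked with hasCharacteristics(). The following sketch prints the flags discussed above for a few standard collections; the class name CharacteristicsDemo and the helper describe are illustrative assumptions, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class CharacteristicsDemo {

    private static final int[] FLAGS = {
        Spliterator.ORDERED, Spliterator.DISTINCT, Spliterator.SORTED,
        Spliterator.SIZED, Spliterator.IMMUTABLE, Spliterator.CONCURRENT
    };
    private static final String[] NAMES = {
        "ORDERED", "DISTINCT", "SORTED", "SIZED", "IMMUTABLE", "CONCURRENT"
    };

    // Returns the names of the characteristics (from the list above) that the spliterator reports.
    static String describe(Spliterator<?> spliterator) {
        List<String> reported = new ArrayList<>();
        for (int i = 0; i < FLAGS.length; i++) {
            if (spliterator.hasCharacteristics(FLAGS[i])) {
                reported.add(NAMES[i]);
            }
        }
        return String.join(" ", reported);
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(3, 1, 2);

        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        map.put(1, "a");

        System.out.println("ArrayList:              " + describe(new ArrayList<>(data).spliterator()));
        System.out.println("HashSet:                " + describe(new HashSet<>(data).spliterator()));
        System.out.println("TreeSet:                " + describe(new TreeSet<>(data).spliterator()));
        System.out.println("ConcurrentHashMap keys: " + describe(map.keySet().spliterator()));
    }
}
```

According to the collection Javadoc, ArrayList reports ORDERED and SIZED, HashSet reports DISTINCT and SIZED, TreeSet adds SORTED and ORDERED, and the key view of a ConcurrentHashMap reports DISTINCT and CONCURRENT.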
The ForkJoinPool
Parallel Streams do not create a new thread for every element.
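Instead, by default they run on the common ForkJoinPool, a pool of worker threads shared across the JVM. The thread that starts the pipeline also takes part in the work.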
Conceptually:
ForkJoinPool
↓
Worker 1
Worker 2
Worker 3
Worker 4
Each worker runs tasks, and each task processes the portion of the data covered by its own Spliterator.
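The size of the common pool can be inspected at runtime. A short sketch, with the class name CommonPoolInfo chosen for illustration: by default the parallelism is typically one less than the number of available processors, because the calling thread also participates, and it can be overridden with the system property java.util.concurrent.ForkJoinPool.common.parallelism.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {

    // Target parallelism of the shared pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

Because the pool is shared, a long-running parallel pipeline in one part of an application can delay parallel pipelines elsewhere in the same JVM.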
Fork Phase
The work is recursively divided.
Employees
↓
Split
↓
Split Again
↓
Split Again
This continues until the tasks become small enough to process efficiently, or until the Spliterator declines to split further (trySplit() returns null).
Join Phase
After processing:
Result A
Result B
Result C
↓
Merge
↓
Final Result
The Stream framework combines the partial results using collector combiners or reduction logic.
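The fork and join phases can be imitated with a hand-written RecursiveTask that drives a Spliterator directly. This is a conceptual sketch, not the JDK's actual implementation: the class SplitSumTask and its THRESHOLD of 1,000 elements are assumptions chosen for illustration, and the real framework picks its own task sizes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SplitSumTask extends RecursiveTask<Long> {

    private static final long THRESHOLD = 1_000; // illustrative leaf size, not the JDK's value

    private final Spliterator<Integer> source;

    SplitSumTask(Spliterator<Integer> source) {
        this.source = source;
    }

    @Override
    protected Long compute() {
        // Fork phase: split while the remaining chunk is large and the source agrees to split.
        Spliterator<Integer> prefix = null;
        if (source.estimateSize() > THRESHOLD) {
            prefix = source.trySplit();
        }
        if (prefix == null) {
            // Leaf: process the remaining elements sequentially.
            long[] sum = {0};
            source.forEachRemaining(value -> sum[0] += value);
            return sum[0];
        }
        SplitSumTask left = new SplitSumTask(prefix);
        left.fork();                                         // schedule the prefix asynchronously
        long right = new SplitSumTask(source).compute();     // handle the remainder in this thread
        // Join phase: combine the two partial results.
        return left.join() + right;
    }

    static long sum(List<Integer> values) {
        return ForkJoinPool.commonPool().invoke(new SplitSumTask(values.spliterator()));
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= 100_000; i++) {
            values.add(i);
        }
        System.out.println(sum(values)); // 1 + 2 + ... + 100000 = 5000050000
    }
}
```

The addition in the last line of compute() plays the role that a collector's combiner or a reduction's combining function plays in a real Stream pipeline.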
Work Stealing
One of the most powerful features of the ForkJoinPool is work stealing.
Suppose:
Worker 1
Finished
Worker 2
Still Busy
Instead of remaining idle, Worker 1 steals tasks that are still waiting in Worker 2's queue.
This keeps CPU utilization high and improves throughput.
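Work stealing is easiest to observe with a deliberately unbalanced workload. The sketch below is an assumption-laden experiment rather than a benchmark: the class UnevenWorkTask, the artificial cost function, and the range of 256 items are all invented for illustration. It uses a dedicated ForkJoinPool so that getStealCount(), which the Javadoc describes as an estimate, reflects only this task.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class UnevenWorkTask extends RecursiveTask<Long> {

    private final int from; // inclusive
    private final int to;   // exclusive

    UnevenWorkTask(int from, int to) {
        this.from = from;
        this.to = to;
    }

    // Artificial CPU-bound work whose cost grows with n, so later items are more expensive.
    static long cost(int n) {
        long acc = 0;
        for (int i = 0; i < n * 1_000; i++) {
            acc += i % 7;
        }
        return acc;
    }

    @Override
    protected Long compute() {
        if (to - from <= 4) {
            long sum = 0;
            for (int n = from; n < to; n++) {
                sum += cost(n);
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        UnevenWorkTask left = new UnevenWorkTask(from, mid);
        left.fork(); // queued on this worker's deque, where an idle worker may steal it
        long right = new UnevenWorkTask(mid, to).compute();
        return left.join() + right;
    }

    // Runs the task in a dedicated pool and returns {result, estimated steal count}.
    static long[] run(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            long result = pool.invoke(new UnevenWorkTask(0, 256));
            return new long[] {result, pool.getStealCount()};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] outcome = run(4);
        System.out.println("Result:      " + outcome[0]);
        System.out.println("Steal count: " + outcome[1]);
    }
}
```

The result is the same for any parallelism level, while the steal count varies between runs and machines.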
Enterprise Example
Suppose a reporting service calculates yearly statistics for 20 million transactions.
Sequential processing:
20 Million
↓
One Thread
Parallel processing:
20 Million
↓
8 Worker Threads
↓
Merged Report
For CPU-intensive calculations, this can significantly reduce execution time.
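A reduced version of such a report might look like the sketch below. Everything about the data model is hypothetical: the Transaction class, its year and amountCents fields, and the generated sample data stand in for whatever the real reporting service uses. Amounts are kept as whole cents in a long so that the parallel and sequential totals are exactly equal regardless of the order in which partial sums are merged.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class YearlyReport {

    // Hypothetical transaction type, for illustration only.
    static final class Transaction {
        private final int year;
        private final long amountCents;

        Transaction(int year, long amountCents) {
            this.year = year;
            this.amountCents = amountCents;
        }

        int year() { return year; }
        long amountCents() { return amountCents; }
    }

    // Generates deterministic sample data spread over the years 2020-2024.
    static List<Transaction> sampleTransactions(int count) {
        return IntStream.range(0, count)
                .mapToObj(i -> new Transaction(2020 + i % 5, 100 + i % 1_000))
                .collect(Collectors.toList());
    }

    // Sums the amounts per year; in parallel mode each task builds a partial map
    // and the collector merges the partial maps into the final result.
    static Map<Integer, Long> totalsByYear(List<Transaction> transactions, boolean parallel) {
        Stream<Transaction> stream = parallel ? transactions.parallelStream() : transactions.stream();
        return stream.collect(Collectors.groupingBy(
                Transaction::year,
                Collectors.summingLong(Transaction::amountCents)));
    }

    public static void main(String[] args) {
        List<Transaction> transactions = sampleTransactions(2_000_000);
        Map<Integer, Long> sequential = totalsByYear(transactions, false);
        Map<Integer, Long> parallel = totalsByYear(transactions, true);
        System.out.println("Totals match: " + sequential.equals(parallel));
        System.out.println(parallel);
    }
}
```

Whether the parallel version is actually faster depends on the data volume, the cost per element, and the cost of merging the partial maps, so it should be measured rather than assumed.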
When Parallel Streams Work Well
Parallel Streams are generally beneficial when:
- Processing large collections.
- Operations are CPU-intensive.
- Each element can be processed independently.
- The workload is balanced.
- The cost of splitting and merging is small relative to the work performed.
Examples include:
- Financial calculations
- Image processing
- Scientific computations
- Large-scale reporting
- Data analytics
When Parallel Streams Are a Poor Choice
Parallel Streams can perform worse when:
- Collections are very small.
- Tasks perform blocking I/O (database calls, REST calls, file operations).
- The workload per element is trivial.
- Operations rely heavily on encounter order.
- The pipeline mutates shared state.
In these situations, the overhead of parallelization may outweigh any benefit.
Spring Boot Considerations
Avoid using Parallel Streams inside request-processing code simply to “speed things up.”
For example:
orders.parallelStream()
.map(this::loadCustomerFromDatabase)
.toList();
Each element performs a database call.
This is an I/O-bound workload, not a CPU-bound one.
Using Parallel Streams here can:
- Increase database contention.
- Exhaust connection pools.
- Reduce throughput under load.
For asynchronous I/O operations, technologies such as CompletableFuture, structured concurrency (a preview API as of Java 21), or reactive programming may be more appropriate depending on the use case.
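As one possible alternative, the database lookups can be submitted to a dedicated, bounded executor through CompletableFuture, which keeps blocking calls off the common ForkJoinPool. This is a minimal sketch under stated assumptions: the class CustomerLoader is invented, loadCustomerFromDatabase is simulated with a short sleep, and the pool size of 4 is arbitrary and would normally be aligned with the connection pool.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class CustomerLoader {

    // Stand-in for a blocking database call (hypothetical; simulated with a sleep).
    static String loadCustomerFromDatabase(int orderId) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "customer-for-order-" + orderId;
    }

    // Starts all lookups on the given I/O pool, then waits for them in encounter order.
    static List<String> loadAll(List<Integer> orderIds, ExecutorService ioPool) {
        List<CompletableFuture<String>> futures = orderIds.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> loadCustomerFromDatabase(id), ioPool))
                .collect(Collectors.toList());

        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newFixedThreadPool(4); // bounded on purpose
        try {
            System.out.println(loadAll(List.of(1, 2, 3, 4, 5, 6, 7, 8), ioPool));
        } finally {
            ioPool.shutdown();
        }
    }
}
```

Collecting the futures into a list before joining matters: joining inside the first pipeline would wait for each lookup before starting the next one.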
Migration from Java 7
Java 7
ExecutorService executor =
Executors.newFixedThreadPool(8);
// Submit tasks manually
// Wait for completion
// Merge results
// Handle exceptions
// Shutdown executor
Java 8+
employees.parallelStream()
.map(Employee::calculateBonus)
.collect(Collectors.toList()); // Stream.toList() is only available from Java 16
The Stream framework manages task partitioning, scheduling, and result merging automatically.
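For comparison, the manual steps listed under Java 7 might be written out roughly as follows. This is one possible shape, not a canonical one: the class ManualBonusCalculator, the 10% bonus rule, and the representation of employees as a list of salaries are assumptions made to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ManualBonusCalculator {

    // Hypothetical bonus rule: 10% of the salary.
    static long calculateBonus(long salary) {
        return salary / 10;
    }

    static List<Long> bonuses(List<Long> salaries, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Partition the input into roughly equal chunks, one task per chunk.
            int chunkSize = (salaries.size() + threads - 1) / threads;
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int start = 0; start < salaries.size(); start += chunkSize) {
                final List<Long> chunk =
                        salaries.subList(start, Math.min(start + chunkSize, salaries.size()));
                futures.add(executor.submit(new Callable<List<Long>>() {
                    @Override
                    public List<Long> call() {
                        List<Long> part = new ArrayList<>();
                        for (Long salary : chunk) {
                            part.add(calculateBonus(salary));
                        }
                        return part;
                    }
                }));
            }
            // Wait for completion and merge the partial results in submission order.
            List<Long> result = new ArrayList<>();
            for (Future<List<Long>> future : futures) {
                result.addAll(future.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            // Handle exceptions: simplified here to a single unchecked wrapper.
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(bonuses(List.of(1_000L, 2_000L, 3_000L, 4_000L, 5_000L), 2));
    }
}
```

Unlike the ForkJoinPool, this fixed partitioning has no work stealing, so a thread that finishes its chunk early simply waits.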
Performance Considerations
- Measure before introducing Parallel Streams.
- Avoid blocking operations inside parallel pipelines.
- Prefer immutable data structures.
- Ensure collector combiners are efficient.
- Understand that parallel execution introduces overhead for splitting, scheduling, and merging work.
Common Mistakes
Assuming Parallel Is Always Faster
It isn’t.
Parallel execution introduces coordination overhead.
Always benchmark representative workloads.
Using Shared Mutable Objects
Bad example:
List<String> names = new ArrayList<>();
employees.parallelStream()
.forEach(employee -> names.add(employee.getName()));
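This is not thread-safe. ArrayList is not synchronized, so concurrent add() calls can lose elements or throw an ArrayIndexOutOfBoundsException.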
Instead, collect the results:
List<String> names = employees.parallelStream()
.map(Employee::getName)
.toList();
Parallelizing Database Calls
Parallel Streams do not magically improve database throughput.
Treat database access as an I/O-bound problem, not a CPU-bound one.
Best Practices
- Use Parallel Streams for CPU-bound work on sufficiently large datasets.
- Keep Stream operations stateless and free of side effects.
- Avoid blocking operations inside parallel pipelines.
- Benchmark using representative data volumes.
- Prefer readability over premature optimization.
Interview Questions
What is a Spliterator?
A Spliterator is an interface that traverses and can partition elements from a data source, enabling efficient parallel processing.
Why do Parallel Streams use a ForkJoinPool?
The ForkJoinPool provides efficient task scheduling, recursive task splitting, and work stealing for CPU-intensive workloads.
What is work stealing?
Idle worker threads take unfinished tasks from busy workers, improving CPU utilization and load balancing.
Are Parallel Streams always faster?
No. They are most effective for large, CPU-bound workloads where the benefits of parallel execution outweigh the overhead.
Should Parallel Streams be used for REST calls or database access?
Generally no. Those operations are I/O-bound and are usually better handled using asynchronous or reactive approaches rather than Parallel Streams.
Hands-On Exercise
Build a Spring Boot application that:
- Calculates yearly account interest for one million in-memory accounts using a sequential Stream.
- Calculates the same result using a parallel Stream.
- Measures execution time with System.nanoTime() or, preferably, a benchmarking framework such as JMH.
- Compare throughput, CPU utilization, and code readability.
- Explain why the results differ.
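A possible starting point for the exercise is sketched below as plain Java, without the Spring Boot wiring. The class InterestBenchmark, the 3.50% rate, and the generated balances are assumptions for illustration. A single System.nanoTime() measurement without warm-up is crude and mainly useful for a first impression, which is one reason a JMH benchmark is the better tool.

```java
import java.util.Arrays;
import java.util.stream.LongStream;

class InterestBenchmark {

    static final int RATE_BASIS_POINTS = 350; // hypothetical yearly rate of 3.50%

    // Deterministic sample balances in cents, between 100.00 and 5,099.00.
    static long[] sampleBalancesCents(int count) {
        return LongStream.range(0, count)
                .map(i -> 10_000 + (i % 5_000) * 100)
                .toArray();
    }

    // Total yearly interest in cents; integer arithmetic keeps both modes exactly equal.
    static long totalInterest(long[] balancesCents, boolean parallel) {
        LongStream stream = Arrays.stream(balancesCents);
        if (parallel) {
            stream = stream.parallel();
        }
        return stream.map(balance -> balance * RATE_BASIS_POINTS / 10_000).sum();
    }

    public static void main(String[] args) {
        long[] balances = sampleBalancesCents(1_000_000);
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long total = totalInterest(balances, parallel);
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.println((parallel ? "parallel:   " : "sequential: ")
                    + total + " cents in " + elapsedMicros + " us");
        }
    }
}
```

Because the per-account calculation here is very cheap, the parallel run may not be faster at all, which is a useful observation for the final step of the exercise.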
Summary
Parallel Streams make it remarkably easy to leverage modern multi-core processors, but the simplicity of the API hides a sophisticated execution model built on Spliterators and the ForkJoinPool.
Understanding how Streams partition work, distribute tasks, and merge results is essential for making informed performance decisions. Parallel execution is a powerful tool, but like any optimization, it should be applied deliberately and validated with measurement.
In the next article, we’ll go even deeper by implementing our own Spliterator, giving us a firsthand understanding of how the JDK partitions work for parallel processing.
Coming Up Next
Part 13 – Writing Your Own Spliterator: Building a Parallel Processing Engine
We’ll implement a custom Spliterator, explore the tryAdvance() and trySplit() methods, understand how the Stream framework partitions data, and build a production-ready example for processing large CSV files in parallel.