Part 12: Parallel Streams and Spliterator – Understanding How the Stream API Executes Work

Introduction

One of the most impressive capabilities introduced in Java 8 was the ability to process collections in parallel with almost no changes to application code.

Changing this:

employees.stream()

to this:

employees.parallelStream()

or

employees.stream().parallel()

appears almost magical.

Suddenly, multiple CPU cores begin processing data simultaneously.

But what actually happens?

How does Java decide which thread processes which elements?

How are millions of records divided between CPU cores?

How are the partial results merged together?

The answer lies in one of the least discussed but most important interfaces in the Stream API:

Spliterator

Without Spliterator, there would be no Parallel Streams.

Without Parallel Streams, using today's multi-core processors for bulk data processing would require writing and managing thread-based code by hand.

In this article, we’ll explore how Parallel Streams work internally, understand the role of Spliterator, examine the ForkJoin framework, and learn when parallel execution improves performance—and when it doesn’t.


Learning Objectives

By the end of this article, you will be able to:

  • Understand the difference between sequential and parallel Streams.
  • Learn what a Spliterator is.
  • Understand how Streams divide work.
  • Learn how the ForkJoinPool executes Stream operations.
  • Understand work stealing.
  • Recognize when Parallel Streams improve performance.
  • Avoid common mistakes with Parallel Streams.

The Evolution of CPU Hardware

For many years, processor performance improved by increasing clock speed.

2000: 1 GHz
↓
2001: 2 GHz
↓
2002: 3 GHz

By the mid-2000s, power consumption and heat made further large increases in clock speed impractical.

Hardware manufacturers adopted a different strategy:

CPU
↓
Core 1    Core 2    Core 3    Core 4    ...    Core 32

Instead of making a single core faster, processors gained more cores.

To benefit from this hardware, software needed to execute work concurrently.


Sequential Streams

A normal Stream processes elements one after another.

employees.stream()
         .map(Employee::getName)
         .toList();

Execution:

Employee 1
↓
Employee 2
↓
Employee 3
↓
Employee 4

Only one thread performs the work.

This is often the right choice for:

  • Small collections.
  • Lightweight operations.
  • Operations that depend on encounter order.
  • Code that interacts with shared mutable state.

Parallel Streams

A Parallel Stream divides the workload across multiple threads.

employees.parallelStream()
         .map(Employee::getName)
         .toList();

Execution:

              Employee Collection
                       │
           ┌───────────┼───────────┐
           ▼           ▼           ▼
       Thread A    Thread B    Thread C
           │           │           │
           └───────────┼───────────┘
                       ▼
                 Merged Result

Notice that your code does not explicitly create or manage threads.

The Stream framework handles the complexity.
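To see this for yourself, the sketch below runs the same map() step sequentially and in parallel and records which threads execute it. The nested Employee record is a hypothetical stand-in for the article's Employee class, and recording thread names inside map() is a side effect used purely for observation. Thread names and counts depend on your machine; what should hold is that the sequential run uses a single thread and that both runs return the names in the same order, because the source List is ordered.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ThreadObservationDemo {

    // Hypothetical minimal stand-in for the article's Employee class.
    record Employee(String name) {}

    // Maps employees to names, recording the name of every thread that runs the map step.
    static List<String> mapNames(List<Employee> employees, boolean parallel, Set<String> threadsSeen) {
        Stream<Employee> stream = parallel ? employees.parallelStream() : employees.stream();
        return stream
                .map(e -> {
                    threadsSeen.add(Thread.currentThread().getName()); // observation only
                    return e.name();
                })
                .toList();
    }

    public static void main(String[] args) {
        List<Employee> employees = IntStream.rangeClosed(1, 100_000)
                .mapToObj(i -> new Employee("Employee " + i))
                .toList();

        Set<String> sequentialThreads = ConcurrentHashMap.newKeySet();
        Set<String> parallelThreads = ConcurrentHashMap.newKeySet();

        List<String> sequential = mapNames(employees, false, sequentialThreads);
        List<String> parallel = mapNames(employees, true, parallelThreads);

        System.out.println("Sequential ran on: " + sequentialThreads);
        System.out.println("Parallel ran on:   " + parallelThreads);
        System.out.println("Same names, same order: " + sequential.equals(parallel));
    }
}
```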


What Is a Spliterator?

The name Spliterator combines two words:

  • Split
  • Iterator

A traditional Iterator only moves forward.

Employee 1
↓
Employee 2
↓
Employee 3

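A Spliterator can do something an Iterator cannot:

It can split off part of its remaining elements into a new Spliterator (through its trySplit() method), so that each part can be traversed independently, potentially on a different thread.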
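Traversal itself goes through tryAdvance(), which does the job of an Iterator's hasNext()/next() pair in a single call: it passes the next element to an action and reports whether there was one. This minimal sketch drains a Spliterator that way; the names are arbitrary sample data.

```java
import java.util.List;
import java.util.Spliterator;

class TryAdvanceDemo {

    // Drains a Spliterator one element at a time and returns how many elements it saw.
    static int countWithTryAdvance(Spliterator<String> spliterator) {
        int[] count = {0};
        // tryAdvance() passes the next element to the action and returns true,
        // or returns false once no elements remain.
        while (spliterator.tryAdvance(name -> {
            System.out.println("Visited " + name);
            count[0]++;
        })) {
            // all per-element work happens inside the action above
        }
        return count[0];
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Carol");
        System.out.println("Total: " + countWithTryAdvance(names.spliterator())); // Total: 3
    }
}
```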


How Spliterator Works

Suppose we have:

16 Employees

Initial Spliterator:   1-16

First split:           1-8 | 9-16

Second split:          1-4 | 5-8 | 9-12 | 13-16

Each split can now be processed independently by different worker threads.
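The split sequence above can be reproduced with trySplit(). For an ORDERED source, the Spliterator contract requires the returned piece to cover a prefix of the elements while the original keeps the rest; exactly where the split falls is up to the implementation. The expected outputs below assume OpenJDK's ArrayList, which splits its remaining index range at the midpoint, and integers 1 to 16 stand in for the 16 employees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class TrySplitDemo {

    // Consumes a Spliterator and returns the elements it covered.
    static List<Integer> drain(Spliterator<Integer> spliterator) {
        List<Integer> out = new ArrayList<>();
        spliterator.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // Integers 1-16 stand in for the 16 employees.
        ArrayList<Integer> employees = IntStream.rangeClosed(1, 16)
                .boxed()
                .collect(Collectors.toCollection(ArrayList::new));

        Spliterator<Integer> upper = employees.spliterator(); // initially covers 1-16
        Spliterator<Integer> lower = upper.trySplit();        // first split: lower = 1-8, upper keeps 9-16

        Spliterator<Integer> q1 = lower.trySplit();           // second split: q1 = 1-4, lower keeps 5-8
        Spliterator<Integer> q3 = upper.trySplit();           // second split: q3 = 9-12, upper keeps 13-16

        System.out.println(drain(q1));    // [1, 2, 3, 4]
        System.out.println(drain(lower)); // [5, 6, 7, 8]
        System.out.println(drain(q3));    // [9, 10, 11, 12]
        System.out.println(drain(upper)); // [13, 14, 15, 16]
    }
}
```

When a piece becomes too small to divide, trySplit() returns null, and that piece is simply traversed as a whole.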


Spliterator Characteristics

Every Spliterator exposes characteristics that help the Stream framework optimize execution.

Some of the most important are:

ORDERED

Elements have a defined encounter order.

Examples:

  • ArrayList
  • LinkedList

DISTINCT

No duplicate elements.

Example:

  • Set

SORTED

Elements are already sorted.

Example:

TreeSet


SIZED

The exact number of elements is known.

Example:

ArrayList

IMMUTABLE

The underlying data source cannot change during traversal.


CONCURRENT

The data source supports concurrent modification.

Example:

ConcurrentHashMap

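These characteristics allow the Stream framework to make optimization decisions. For example, if the source reports DISTINCT, a later distinct() step can be skipped, and if it reports SORTED in natural order, a later sorted() step can be skipped.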
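A quick way to see these flags is to ask real collections. The helper below reports which of the characteristics listed above a Spliterator advertises. According to the JDK documentation, ArrayList's spliterator reports ORDERED and SIZED (plus SUBSIZED), HashSet's reports SIZED and DISTINCT, TreeSet's reports SIZED, DISTINCT, SORTED, and ORDERED, and a ConcurrentHashMap key set's reports CONCURRENT and DISTINCT (plus NONNULL).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class CharacteristicsDemo {

    // Lists which of the characteristics discussed above a Spliterator reports.
    static String describe(Spliterator<?> s) {
        List<String> flags = new ArrayList<>();
        if (s.hasCharacteristics(Spliterator.ORDERED))    flags.add("ORDERED");
        if (s.hasCharacteristics(Spliterator.DISTINCT))   flags.add("DISTINCT");
        if (s.hasCharacteristics(Spliterator.SORTED))     flags.add("SORTED");
        if (s.hasCharacteristics(Spliterator.SIZED))      flags.add("SIZED");
        if (s.hasCharacteristics(Spliterator.IMMUTABLE))  flags.add("IMMUTABLE");
        if (s.hasCharacteristics(Spliterator.CONCURRENT)) flags.add("CONCURRENT");
        return flags.toString();
    }

    public static void main(String[] args) {
        System.out.println("ArrayList:         " + describe(new ArrayList<>(List.of(3, 1, 2)).spliterator()));
        System.out.println("HashSet:           " + describe(new HashSet<>(List.of(3, 1, 2)).spliterator()));
        System.out.println("TreeSet:           " + describe(new TreeSet<>(List.of(3, 1, 2)).spliterator()));

        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        System.out.println("ConcurrentHashMap: " + describe(map.keySet().spliterator()));
    }
}
```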


The ForkJoinPool

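Parallel Streams do not create a new thread for every element.

Instead, they run as tasks in the shared common ForkJoinPool (ForkJoinPool.commonPool()). By default, its parallelism is one less than the number of available processors, and the thread that starts the terminal operation also takes part in the work.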

Conceptually:

ForkJoinPool
↓
Worker 1    Worker 2    Worker 3    Worker 4

Each worker receives a portion of the data from the Spliterator.
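You can inspect the pool that parallel streams use by default. This sketch prints the processor count and the common pool's parallelism, then runs a simple parallel count. The parallelism can be overridden at JVM startup with the system property java.util.concurrent.ForkJoinPool.common.parallelism, so the values you see depend on your machine and JVM settings.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class CommonPoolDemo {

    // Counts even numbers below n with a parallel stream (executed in the common pool).
    static long countEvens(int n) {
        return IntStream.range(0, n).parallel().filter(i -> i % 2 == 0).count();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("Even numbers below 1,000,000: " + countEvens(1_000_000)); // 500000
    }
}
```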


Fork Phase

The work is recursively divided.

Employees
↓
Split
↓
Split Again
↓
Split Again

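This continues until trySplit() can no longer divide the data or the pieces are small enough that further splitting would cost more than it saves. In OpenJDK, the target is roughly four leaf tasks per unit of common-pool parallelism.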


Join Phase

After processing:

Result A    Result B    Result C
↓
Merge
↓
Final Result

The Stream framework combines the partial results using collector combiners or reduction logic.
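The Stream framework performs this fork/join cycle internally, so you never write it yourself when using parallel streams. To make the two phases concrete, here is the same pattern written by hand with RecursiveTask. It is not how the Stream API is implemented, but it follows the same idea: split until a piece is small, compute pieces independently, then combine the partial results. THRESHOLD is an arbitrary illustrative value, not a tuned one.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

// Sums a long[] with an explicit fork phase (split) and join phase (merge).
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // illustrative cut-off, not a tuned value

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {            // small enough: process sequentially
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(values, from, mid);
        SumTask right = new SumTask(values, mid, to);
        left.fork();                             // fork: schedule the left half; other workers may pick it up
        long rightSum = right.compute();         // keep working on the right half in this thread
        return left.join() + rightSum;           // join: wait for the left half and merge
    }

    static long parallelSum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }

    public static void main(String[] args) {
        long[] values = LongStream.rangeClosed(1, 1_000_000).toArray();
        System.out.println(parallelSum(values)); // 500000500000
    }
}
```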


Work Stealing

One of the most powerful features of the ForkJoinPool is work stealing.

Suppose:

Worker 1: Finished
Worker 2: Still busy

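Instead of remaining idle, Worker 1 steals unfinished work from Worker 2. Each worker keeps its tasks in a double-ended queue: the owner takes new work from one end, while idle workers steal from the other end, where the older and typically larger subtasks wait.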

This keeps CPU utilization high and improves throughput.


Enterprise Example

Suppose a reporting service calculates yearly statistics for 20 million transactions.

Sequential processing:

20 Million
↓
One Thread

Parallel processing:

20 Million
↓
8 Worker Threads
↓
Merged Report

For CPU-intensive calculations, this can significantly reduce execution time.
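A minimal sketch of such a report, assuming a hypothetical Transaction record with a year and an amount in cents, is shown below. It groups by year and summarizes amounts with standard collectors, and the parallel and sequential versions differ only in how the stream is created. The data set is scaled down to 2 million synthetic transactions, and summing amounts is cheap per element, so treat any timing you observe as a starting point for measurement rather than evidence of a speed-up.

```java
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class YearlyReport {

    // Hypothetical transaction shape; amounts are in cents to keep the arithmetic exact.
    record Transaction(int year, long amountCents) {}

    // Groups transactions by year and summarizes amounts (count, sum, min, max, average).
    static Map<Integer, LongSummaryStatistics> statsByYear(List<Transaction> txs, boolean parallel) {
        Stream<Transaction> stream = parallel ? txs.parallelStream() : txs.stream();
        return stream.collect(Collectors.groupingBy(
                Transaction::year,
                TreeMap::new,
                Collectors.summarizingLong(Transaction::amountCents)));
    }

    public static void main(String[] args) {
        // Synthetic data: 2 million transactions spread over 2022-2024.
        List<Transaction> txs = IntStream.range(0, 2_000_000)
                .mapToObj(i -> new Transaction(2022 + i % 3, (i % 10_000) + 1L))
                .toList();

        Map<Integer, LongSummaryStatistics> sequential = statsByYear(txs, false);
        Map<Integer, LongSummaryStatistics> parallel = statsByYear(txs, true);

        parallel.forEach((year, stats) -> System.out.printf(
                "%d: count=%d sum=%d matchesSequential=%b%n",
                year, stats.getCount(), stats.getSum(),
                stats.getSum() == sequential.get(year).getSum()));
    }
}
```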


When Parallel Streams Work Well

Parallel Streams are generally beneficial when:

  • Processing large collections.
  • Operations are CPU-intensive.
  • Each element can be processed independently.
  • The workload is balanced.
  • The cost of splitting and merging is small relative to the work performed.

Examples include:

  • Financial calculations
  • Image processing
  • Scientific computations
  • Large-scale reporting
  • Data analytics

When Parallel Streams Are a Poor Choice

Parallel Streams can perform worse when:

  • Collections are very small.
  • Tasks perform blocking I/O (database calls, REST calls, file operations).
  • The workload per element is trivial.
  • Operations rely heavily on encounter order.
  • The pipeline mutates shared state.

In these situations, the overhead of parallelization may outweigh any benefit.
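Encounter order is a good example of where parallel execution changes behavior. The sketch below contrasts forEach(), which on a parallel stream may hand elements to the action in any order, with forEachOrdered(), which respects encounter order but gives up some of the parallel benefit. The synchronized list exists only so this demonstration is thread-safe; it is not a pattern to copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

class EncounterOrderDemo {

    // Records the order in which a parallel stream hands elements 1-20 to the terminal action.
    static List<Integer> visitOrder(boolean respectEncounterOrder) {
        // Synchronized only so this demonstration is thread-safe; not a pattern to copy.
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        if (respectEncounterOrder) {
            IntStream.rangeClosed(1, 20).parallel().forEachOrdered(seen::add);
        } else {
            IntStream.rangeClosed(1, 20).parallel().forEach(seen::add);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("forEach:        " + visitOrder(false)); // order may vary between runs
        System.out.println("forEachOrdered: " + visitOrder(true));  // always 1 to 20 in order
    }
}
```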


Spring Boot Considerations

Avoid using Parallel Streams inside request-processing code simply to “speed things up.”

For example:

orders.parallelStream()
      .map(this::loadCustomerFromDatabase)
      .toList();

Each element performs a database call.

This is an I/O-bound workload, not a CPU-bound one.

Using Parallel Streams here can:

  • Increase database contention.
  • Exhaust connection pools.
  • Tie up common ForkJoinPool workers that every other parallel stream in the JVM also relies on.
  • Reduce throughput under load.

For I/O-bound work, approaches such as CompletableFuture with a dedicated executor, structured concurrency (a preview API in Java 21), or reactive programming may be more appropriate, depending on the use case.
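One alternative for the lookup above, sketched under stated assumptions: run each blocking call on a dedicated, bounded executor with CompletableFuture, so the common ForkJoinPool is not tied up and concurrency is capped explicitly. loadCustomer is a hypothetical stand-in that simulates latency with Thread.sleep(), and the pool size of 10 is an assumption you would align with your connection pool.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class IoBoundLookupDemo {

    // Hypothetical stand-in for a blocking repository or REST call (latency simulated with sleep).
    static String loadCustomer(int orderId) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "customer-for-order-" + orderId;
    }

    // Starts every lookup on the dedicated executor, then waits for all of them.
    static List<String> loadAll(List<Integer> orderIds, ExecutorService ioExecutor) {
        // toList() here forces all futures to start before the first join().
        List<CompletableFuture<String>> futures = orderIds.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> loadCustomer(id), ioExecutor))
                .toList();
        return futures.stream().map(CompletableFuture::join).toList(); // keeps input order
    }

    public static void main(String[] args) {
        // Pool size is an assumption; in practice, align it with the database connection pool.
        ExecutorService ioExecutor = Executors.newFixedThreadPool(10);
        try {
            System.out.println(loadAll(List.of(1, 2, 3, 4, 5), ioExecutor));
        } finally {
            ioExecutor.shutdown();
        }
    }
}
```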


Migration from Java 7

Java 7

ExecutorService executor =
        Executors.newFixedThreadPool(8);

// Submit tasks manually
// Wait for completion
// Merge results
// Handle exceptions
// Shutdown executor

Java 8+

employees.parallelStream()
         .map(Employee::calculateBonus)
         .collect(Collectors.toList()); // on Java 16+, .toList() also works

The Stream framework manages task partitioning, scheduling, and result merging automatically.
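For comparison, the commented Java 7 steps above expand to roughly the following. calculateBonus is a hypothetical 10% rule applied to plain salary values instead of the article's Employee class, and the tasks are written as lambdas for brevity, where Java 7 would need anonymous Callable classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ManualBonusCalculation {

    // Hypothetical bonus rule: 10% of salary.
    static long calculateBonus(long salary) {
        return salary / 10;
    }

    // The Java 7-era approach: partition, submit, wait, merge, handle errors, shut down.
    static List<Long> bonuses(List<Long> salaries, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Submit tasks manually: one task per contiguous chunk.
            int chunk = Math.max(1, (salaries.size() + threads - 1) / threads);
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int start = 0; start < salaries.size(); start += chunk) {
                List<Long> slice = salaries.subList(start, Math.min(start + chunk, salaries.size()));
                Callable<List<Long>> task = () -> {
                    List<Long> out = new ArrayList<>();
                    for (long salary : slice) {
                        out.add(calculateBonus(salary));
                    }
                    return out;
                };
                futures.add(executor.submit(task));
            }
            // Wait for completion and merge results in submission order.
            List<Long> result = new ArrayList<>();
            for (Future<List<Long>> future : futures) {
                try {
                    result.addAll(future.get());
                } catch (ExecutionException e) {          // handle task failures
                    throw new IllegalStateException("Bonus calculation failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for results", e);
                }
            }
            return result;
        } finally {
            executor.shutdown();                          // shut down the executor
        }
    }

    public static void main(String[] args) {
        List<Long> salaries = List.of(50_000L, 60_000L, 70_000L, 80_000L, 90_000L);
        System.out.println(bonuses(salaries, 2)); // [5000, 6000, 7000, 8000, 9000]
        // The parallel-stream equivalent:
        System.out.println(salaries.parallelStream().map(ManualBonusCalculation::calculateBonus).toList());
    }
}
```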


Performance Considerations

  • Measure before introducing Parallel Streams.
  • Avoid blocking operations inside parallel pipelines.
  • Prefer immutable data structures.
  • Ensure collector combiners are efficient.
  • Understand that parallel execution introduces overhead for splitting, scheduling, and merging work.
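To show where a combiner fits, the hand-written collector below computes an average salary: each parallel chunk gets its own Totals accumulator, and the combiner merges two partial totals in constant time. The JDK already offers Collectors.averagingLong() for this; the custom version only illustrates the supplier, accumulator, combiner, and finisher roles.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class CombinerDemo {

    // Mutable per-chunk accumulator: running total and count.
    static final class Totals {
        long sum;
        long count;
    }

    // supplier: one Totals per chunk; accumulator: fold in one value;
    // combiner: merge two partial Totals in O(1); finisher: compute the average.
    static final Collector<Long, Totals, Double> AVERAGE = Collector.of(
            Totals::new,
            (t, value) -> { t.sum += value; t.count++; },
            (a, b) -> { a.sum += b.sum; a.count += b.count; return a; },
            t -> t.count == 0 ? 0.0 : (double) t.sum / t.count);

    public static void main(String[] args) {
        List<Long> salaries = IntStream.rangeClosed(1, 100_000)
                .mapToObj(i -> (long) i)
                .toList();
        System.out.println(salaries.parallelStream().collect(AVERAGE)); // 50000.5
    }
}
```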

Common Mistakes

Assuming Parallel Is Always Faster

It isn’t.

Parallel execution introduces coordination overhead.

Always benchmark representative workloads.


Using Shared Mutable Objects

Bad example:

List<String> names = new ArrayList<>();

employees.parallelStream()
         .forEach(employee -> names.add(employee.getName()));

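This is not thread-safe: several threads call ArrayList.add() at the same time, so elements can be lost or the call can fail with an exception.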

Instead, collect the results:

List<String> names = employees.parallelStream()
        .map(Employee::getName)
        .toList();
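The difference is easy to observe. This sketch runs the unsafe pattern and the collecting pattern on the same range of integers, which stand in for employee names. The unsafe result varies from run to run: it may lose elements or throw an exception, and occasionally it may even appear correct, which is what makes this bug easy to miss.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class SharedMutableStateDemo {

    // UNSAFE: many threads call ArrayList.add() at once. Elements can be lost,
    // null slots can appear, or add() can throw an exception.
    static String unsafeCollect(int n) {
        List<Integer> results = new ArrayList<>();
        try {
            IntStream.range(0, n).parallel().forEach(results::add);
            return "size=" + results.size() + " (expected " + n + ")";
        } catch (RuntimeException e) {
            return "failed: " + e;
        }
    }

    // SAFE: the stream framework gathers each chunk's results and assembles
    // the final list in encounter order; our code mutates no shared list.
    static List<Integer> safeCollect(int n) {
        return IntStream.range(0, n).parallel().boxed().toList();
    }

    public static void main(String[] args) {
        System.out.println("Unsafe: " + unsafeCollect(100_000));
        System.out.println("Safe:   size=" + safeCollect(100_000).size());
    }
}
```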

Parallelizing Database Calls

Parallel Streams do not magically improve database throughput.

Treat database access as an I/O-bound problem, not a CPU-bound one.


Best Practices

  • Use Parallel Streams for CPU-bound work on sufficiently large datasets.
  • Keep Stream operations stateless and free of side effects.
  • Avoid blocking operations inside parallel pipelines.
  • Benchmark using representative data volumes.
  • Prefer readability over premature optimization.

Interview Questions

What is a Spliterator?

A Spliterator is an interface that traverses and can partition elements from a data source, enabling efficient parallel processing.


Why do Parallel Streams use a ForkJoinPool?

The ForkJoinPool provides efficient task scheduling, recursive task splitting, and work stealing for CPU-intensive workloads.


What is work stealing?

Idle worker threads take unfinished tasks from busy workers, improving CPU utilization and load balancing.


Are Parallel Streams always faster?

No. They are most effective for large, CPU-bound workloads where the benefits of parallel execution outweigh the overhead.


Should Parallel Streams be used for REST calls or database access?

Generally no. Those operations are I/O-bound and are usually better handled using asynchronous or reactive approaches rather than Parallel Streams.


Hands-On Exercise

Build a Spring Boot application and use it to:

  1. Calculate yearly account interest for one million in-memory accounts using a sequential Stream.
  2. Calculate the same result using a parallel Stream.
  3. Measure execution time with System.nanoTime() or, preferably, a benchmarking framework such as JMH.
  4. Compare throughput, CPU utilization, and code readability.
  5. Explain why the results differ.
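A possible starting point, written as plain Java rather than a Spring Boot application: a hypothetical Account record, monthly compounding as the per-account calculation, and a naive single-shot timing loop over synthetic data. Because there is no warm-up, the timings are only indicative, and the two floating-point totals may differ in the last digits, since the additions happen in a different order.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class InterestBenchmarkSketch {

    // Hypothetical account shape for the exercise.
    record Account(long id, double balance, double annualRate) {}

    // Monthly compounding over one year; returns the interest earned.
    static double yearlyInterest(Account a) {
        double balance = a.balance();
        double monthlyRate = a.annualRate() / 12;
        for (int month = 0; month < 12; month++) {
            balance += balance * monthlyRate;
        }
        return balance - a.balance();
    }

    static double totalInterest(List<Account> accounts, boolean parallel) {
        Stream<Account> stream = parallel ? accounts.parallelStream() : accounts.stream();
        return stream.mapToDouble(InterestBenchmarkSketch::yearlyInterest).sum();
    }

    public static void main(String[] args) {
        List<Account> accounts = IntStream.range(0, 1_000_000)
                .mapToObj(i -> new Account(i, 1_000 + (i % 5_000), 0.03 + (i % 10) / 1_000.0))
                .toList();

        // Naive timing: no warm-up, so JIT compilation and GC skew the numbers.
        // Use JMH for conclusions you intend to act on.
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            double total = totalInterest(accounts, parallel);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s: total=%.2f in %d ms%n",
                    parallel ? "parallel" : "sequential", total, millis);
        }
    }
}
```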

Summary

Parallel Streams make it remarkably easy to leverage modern multi-core processors, but the simplicity of the API hides a sophisticated execution model built on Spliterators and the ForkJoinPool.

Understanding how Streams partition work, distribute tasks, and merge results is essential for making informed performance decisions. Parallel execution is a powerful tool, but like any optimization, it should be applied deliberately and validated with measurement.

In the next article, we’ll go even deeper by implementing our own Spliterator, giving us a firsthand understanding of how the JDK partitions work for parallel processing.


Coming Up Next

Part 13 – Writing Your Own Spliterator: Building a Parallel Processing Engine

We’ll implement a custom Spliterator, explore the tryAdvance() and trySplit() methods, understand how the Stream framework partitions data, and build a production-ready example for processing large CSV files in parallel.
