Grouping Operations

In Java's Stream API, grouping operations organize stream elements into categories, typically through the Collectors.groupingBy collector. Elements are grouped by a classification function, much like SQL's GROUP BY. The result is a Map whose keys are the categories and whose values are either the elements in each category (a List by default) or an aggregate computed by a downstream collector.

Important Concepts:

  • groupingBy: A collector that groups elements by a classifier function, producing a Map.
  • Classifier: A function that determines the key for each element (e.g., String::length).
  • Downstream Collectors: Optional collectors applied to grouped elements (e.g., toList(), counting()).
  • Multi-level Grouping: Nesting groupingBy for hierarchical grouping.
  • Partitioning: A special case of grouping with a boolean classifier, using partitioningBy.

Collectors.groupingBy Variants

The groupingBy collector has three overloads:

  1. groupingBy(classifier): Groups elements into a Map<K, List<T>>.
  2. groupingBy(classifier, downstream): Groups elements and applies a downstream collector to each group.
  3. groupingBy(classifier, mapFactory, downstream): Specifies a custom Map implementation (e.g., TreeMap).

Basic Grouping Examples

Group by a Property (Single-Level)

Groups strings by their length, with each value being a List of strings:

List<String> names = List.of("Alice", "Bob", "Charlie", "David");
Map<Integer, List<String>> namesByLength = names.stream()
    .collect(Collectors.groupingBy(String::length));
// Result: {3=[Bob], 5=[Alice, David], 7=[Charlie]}

Group with a Downstream Collector

Counts the number of strings per length instead of listing them:

Map<Integer, Long> countByLength = names.stream()
    .collect(Collectors.groupingBy(String::length, Collectors.counting()));
// Result: {3=1, 5=2, 7=1}

Custom Map Implementation

Uses a TreeMap to sort keys (lengths):

Map<Integer, List<String>> sortedNamesByLength = names.stream()
    .collect(Collectors.groupingBy(
        String::length,
        TreeMap::new,
        Collectors.toList()
    ));
// Result: {3=[Bob], 5=[Alice, David], 7=[Charlie]} (keys sorted)
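
A side effect of the map-factory overload is that the result can be declared with the concrete map type, which exposes that type's own methods. The following is a minimal sketch, assuming a TreeMap result is wanted; the class name TypedMapFactory and the method name groupSorted are illustrative only.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TypedMapFactory {

    // Declaring the return type as TreeMap keeps navigation methods
    // such as firstKey() and headMap() available to the caller.
    static TreeMap<Integer, List<String>> groupSorted(List<String> names) {
        return names.stream()
            .collect(Collectors.groupingBy(
                String::length,
                TreeMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        TreeMap<Integer, List<String>> byLength =
            groupSorted(List.of("Alice", "Bob", "Charlie", "David"));
        System.out.println(byLength.firstKey());              // 3
        System.out.println(byLength.lastEntry().getValue());  // [Charlie]
        System.out.println(byLength.headMap(7));              // {3=[Bob], 5=[Alice, David]}
    }
}
```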

Multi-Level Grouping

You can nest groupingBy for hierarchical grouping. For example, group by length, then by first letter:

List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Amy");
Map<Integer, Map<Character, List<String>>> namesByLengthThenFirstLetter = names.stream()
    .collect(Collectors.groupingBy(
        String::length,
        Collectors.groupingBy(name -> name.charAt(0))
    ));
// Result: {3={A=[Amy], B=[Bob]}, 5={A=[Alice], D=[David]}, 7={C=[Charlie]}}
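
groupingBy makes no guarantee about the type of Map it returns by default, so the printed key order of a nested result should not be relied on. The following is a minimal sketch, assuming sorted output is wanted at both levels, that passes TreeMap::new to the outer and the inner groupingBy. The class name SortedNestedGrouping and the method name groupSorted are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedNestedGrouping {

    // Groups names by length, then by first letter, using a TreeMap at
    // both levels so that keys iterate in ascending order.
    static Map<Integer, Map<Character, List<String>>> groupSorted(List<String> names) {
        return names.stream()
            .collect(Collectors.groupingBy(
                String::length,
                TreeMap::new,
                Collectors.groupingBy(
                    name -> name.charAt(0),
                    TreeMap::new,
                    Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Amy");
        // Prints {3={A=[Amy], B=[Bob]}, 5={A=[Alice], D=[David]}, 7={C=[Charlie]}}
        System.out.println(groupSorted(names));
    }
}
```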

Common Downstream Collectors

Downstream collectors transform the grouped elements. Popular ones include:

  • toList(): Collects elements into a List (default).
  • toSet(): Collects elements into a Set (removes duplicates).
  • counting(): Counts elements in each group.
  • mapping(): Transforms elements before collecting (e.g., extract a property).
  • reducing(): Reduces elements in each group (e.g., sum, min, max).
  • summarizingInt/Long/Double(): Computes statistics (e.g., sum, average, min, max).
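
To make the difference between two of these collectors concrete, here is a minimal sketch that runs counting() and toSet() over the same input, which deliberately contains a duplicate. The class name DownstreamComparison and both method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class DownstreamComparison {

    // counting() tallies every element in a group, duplicates included.
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    // toSet() keeps each distinct element once per group.
    static Map<Integer, Set<String>> distinctByLength(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(String::length, Collectors.toSet()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("Bob", "Amy", "Bob", "Alice");
        System.out.println(countByLength(words));    // length 3 -> 3, length 5 -> 1
        System.out.println(distinctByLength(words)); // length 3 -> Bob and Amy, length 5 -> Alice
    }
}
```

Set iteration order is unspecified, so the second line may print the names in either order.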

Examples:

Extract a Property with mapping

Groups by length, collecting only the first letter of each name (names is the five-element list from the multi-level example):

Map<Integer, List<Character>> firstLettersByLength = names.stream()
    .collect(Collectors.groupingBy(
        String::length,
        Collectors.mapping(name -> name.charAt(0), Collectors.toList())
    ));
// Result: {3=[B, A], 5=[A, D], 7=[C]}

Sum with reducing

Groups by length, summing the lengths (redundant here, but illustrative):

Map<Integer, Integer> totalLengthByLength = names.stream()
    .collect(Collectors.groupingBy(
        String::length,
        Collectors.reducing(0, String::length, Integer::sum)
    ));
// Result: {3=6, 5=10, 7=7}

Statistics with summarizingInt

Groups by first letter, computing length statistics:

Map<Character, IntSummaryStatistics> statsByFirstLetter = names.stream()
    .collect(Collectors.groupingBy(
        name -> name.charAt(0),
        Collectors.summarizingInt(String::length)
    ));
// Result: {A=IntSummaryStatistics{count=2, sum=8, min=3, average=4.000000, max=5}, ...}

Partitioning (Special Case of Grouping)

Collectors.partitioningBy splits elements into two groups, keyed true and false, based on a predicate. It behaves like groupingBy with a boolean classifier, except that the resulting Map always contains both keys, even when one group is empty.

Example:

List<String> names = List.of("Alice", "Bob", "Charlie");
Map<Boolean, List<String>> longNames = names.stream()
    .collect(Collectors.partitioningBy(name -> name.length() > 3));
// Result: {false=[Bob], true=[Alice, Charlie]}

With a downstream collector:

Map<Boolean, Long> countLongNames = names.stream()
    .collect(Collectors.partitioningBy(
        name -> name.length() > 3,
        Collectors.counting()
    ));
// Result: {false=1, true=2}
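
One practical difference from groupingBy with a boolean classifier shows up when no element satisfies the predicate. The following minimal sketch runs both collectors on an input with no long names; the class name PartitionVsGroup and both method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionVsGroup {

    // partitioningBy always produces entries for both false and true.
    static Map<Boolean, List<String>> partitionLong(List<String> names) {
        return names.stream()
            .collect(Collectors.partitioningBy(name -> name.length() > 3));
    }

    // groupingBy only creates a key once some element maps to it.
    static Map<Boolean, List<String>> groupLong(List<String> names) {
        return names.stream()
            .collect(Collectors.groupingBy(name -> name.length() > 3));
    }

    public static void main(String[] args) {
        List<String> names = List.of("Bob", "Amy"); // no name is longer than 3
        System.out.println(partitionLong(names)); // {false=[Bob, Amy], true=[]}
        System.out.println(groupLong(names));     // {false=[Bob, Amy]}
    }
}
```

With partitioningBy, calling get(true) is therefore safe without a null check, whereas the groupingBy result would return null for the missing key.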

When to Use Grouping

  • When you need to categorize data based on one or more criteria.
  • For aggregating data (e.g., counting, summing, or computing statistics).
  • When building hierarchical or summarized views of data.
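
As an illustration of the aggregation case on a domain object, here is a minimal sketch that totals quantities per category. The Order record, its fields, and the class and method names are hypothetical and exist only for this example; records require Java 16 or later. Collectors.summingInt serves as the downstream collector, and a TreeMap keeps the categories sorted.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class OrderSummary {

    // Hypothetical domain type, used only for illustration.
    record Order(String category, int quantity) {}

    // Groups orders by category and sums the quantities in each group.
    static Map<String, Integer> totalQuantityByCategory(List<Order> orders) {
        return orders.stream()
            .collect(Collectors.groupingBy(
                Order::category,
                TreeMap::new,
                Collectors.summingInt(Order::quantity)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("books", 2),
            new Order("games", 1),
            new Order("books", 3));
        System.out.println(totalQuantityByCategory(orders)); // {books=5, games=1}
    }
}
```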

Grouping operations in Java Streams offer a flexible way to categorize data. With Collectors.groupingBy(), you group elements by any classifier and apply downstream collectors for further aggregation, while Collectors.partitioningBy() covers the two-way split on a predicate. Together they let data analysis and transformation tasks be written as concise, readable code.
