Normalizer

The Normalizer class (java.text.Normalizer, available since Java 6, together with its nested enum java.text.Normalizer.Form) transforms Unicode text into a normalized form. This matters whenever you compare, store, or search strings that look identical on screen but are encoded differently. For example, an accented character can be stored either as one precomposed code point or as a base letter followed by a combining mark.
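The following sketch shows concretely what "different internal representations" means. It prints the code points of the two spellings of é, plus the NFD form of the precomposed one. The class name CodePointDump and its helper codePoints are illustrative only and are not part of any library.

```java
import java.text.Normalizer;

public class CodePointDump {
    // Returns the code points of s as "U+XXXX" strings separated by spaces
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // precomposed e-acute
        String decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT

        System.out.println(codePoints(composed));   // U+00E9
        System.out.println(codePoints(decomposed)); // U+0065 U+0301
        // NFD splits the precomposed character into base letter + combining mark
        System.out.println(codePoints(Normalizer.normalize(composed, Normalizer.Form.NFD))); // U+0065 U+0301
        System.out.println(composed.length() + " vs " + decomposed.length()); // 1 vs 2
    }
}
```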

Key Features:

Ensures consistent Unicode representation
Supports multiple normalization forms (NFC, NFD, NFKC, NFKD)
Makes string comparison and storage reliable across languages and inputs

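Commonly Used Methods

Normalizer.normalize(CharSequence src, Normalizer.Form form): returns the text converted to the given normalization form
Normalizer.isNormalized(CharSequence src, Normalizer.Form form): returns true if the text is already in the given form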
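Here is a minimal sketch of both static methods applied to a decomposed string. The class NormalizerMethodsDemo and the helper toNfc are hypothetical names for illustration. The helper returns its input unchanged when isNormalized reports that it is already NFC, and calls normalize otherwise.

```java
import java.text.Normalizer;

public class NormalizerMethodsDemo {
    // Hypothetical helper: returns s unchanged if it is already NFC,
    // otherwise returns its NFC form
    static String toNfc(String s) {
        if (Normalizer.isNormalized(s, Normalizer.Form.NFC)) {
            return s;
        }
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "Rene\u0301e"; // "Renee" with a combining acute accent on the second e

        System.out.println(Normalizer.isNormalized(decomposed, Normalizer.Form.NFC)); // false
        System.out.println(Normalizer.isNormalized(decomposed, Normalizer.Form.NFD)); // true

        String composed = toNfc(decomposed);
        System.out.println(decomposed.length() + " -> " + composed.length()); // 6 -> 5
    }
}
```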

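Normalization Forms (Normalizer.Form)

NFD: canonical decomposition
NFC: canonical decomposition, followed by canonical composition
NFKD: compatibility decomposition
NFKC: compatibility decomposition, followed by canonical composition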
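Canonical forms (NFC/NFD) and compatibility forms (NFKC/NFKD) behave differently on characters such as the ligature ﬁ (U+FB01) and the full-width letter Ａ (U+FF21). The sketch below applies every Normalizer.Form constant to three sample characters; the class and helper names are illustrative. It prints lengths and booleans rather than the characters themselves, which avoids console-encoding problems.

```java
import java.text.Normalizer;

public class NormalizationFormsDemo {
    // Length (in UTF-16 chars) of s after normalization to the given form
    static int lengthIn(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form).length();
    }

    public static void main(String[] args) {
        String accented = "\u00E9";   // e-acute, precomposed
        String ligature = "\uFB01";   // LATIN SMALL LIGATURE FI
        String fullWidthA = "\uFF21"; // FULLWIDTH LATIN CAPITAL LETTER A

        for (Normalizer.Form form : Normalizer.Form.values()) {
            System.out.println(form
                + "  length(e-acute)=" + lengthIn(accented, form)
                + "  ligature->\"fi\"? " + Normalizer.normalize(ligature, form).equals("fi")
                + "  fullwidth A->\"A\"? " + Normalizer.normalize(fullWidthA, form).equals("A"));
        }
    }
}
```

NFC and NFKC should report length 1 for é, while NFD and NFKD should report 2. Only NFKC and NFKD should turn the ligature into "fi" and Ａ into "A".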

Simple Program

import java.text.Normalizer;

public class SimpleNormalizerExample {
    public static void main(String[] args) {
        String accented = "\u00E9";    // U+00E9: precomposed é
        String decomposed = "e\u0301"; // U+0065 + U+0301: e + combining acute accent

        // equals() compares the UTF-16 chars, so the two spellings differ
        System.out.println("Are the strings equal (equals)? " + accented.equals(decomposed));

        // Convert the decomposed string to NFC (composed form) and compare again
        String normalized = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println("Are they equal after normalization? " + accented.equals(normalized));
    }
}
/*
Are the strings equal (equals)? false
Are they equal after normalization? true
*/

Problem Statement

Paani and Mahesh are building a search engine for international names. Users may type names with input methods that produce decomposed characters (e followed by a combining acute accent, U+0301, instead of the precomposed é). To make every name match correctly, the system uses Normalizer to convert both the stored names and each search query to NFC before comparing them.

import java.text.Normalizer;
import java.util.*;

public class NameSearchEngine {
    private static final List<String> database = Arrays.asList(
        Normalizer.normalize("José", Normalizer.Form.NFC),
        Normalizer.normalize("Renée", Normalizer.Form.NFC),
        Normalizer.normalize("Björk", Normalizer.Form.NFC)
    );

    public static void main(String[] args) {
        List<String> userInputs = Arrays.asList(
            "Jose\u0301",     // decomposed form of José
            "Rene\u0301e",    // decomposed form of Renée
            "Bj\u00f6rk"      // composed Björk
        );

        for (String input : userInputs) {
            String normalizedInput = Normalizer.normalize(input, Normalizer.Form.NFC);
            System.out.println("\nSearching for: " + input);
            boolean found = false;
            for (String name : database) {
                if (name.equals(normalizedInput)) {
                    System.out.println("Match found: " + name);
                    found = true;
                    break;
                }
            }
            if (!found) {
                System.out.println("No match found.");
            }
        }
    }
}
/*
Searching for: José
Match found: José

Searching for: Renée
Match found: Renée

Searching for: Björk
Match found: Björk
*/
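The program above scans the whole list for every query. One possible refinement keeps the normalized names in a HashSet, so each lookup is a single hash probe and the normalization form is chosen in exactly one place. The sketch assumes that exact matching after NFC is all that is needed. NormalizedNameIndex, key, add, and contains are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NormalizedNameIndex {
    // Hypothetical helper: the single place where the normalization form is chosen
    static String key(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    private final Set<String> names = new HashSet<>();

    void add(String name) {
        names.add(key(name));
    }

    boolean contains(String query) {
        return names.contains(key(query));
    }

    public static void main(String[] args) {
        NormalizedNameIndex index = new NormalizedNameIndex();
        for (String n : Arrays.asList("Jos\u00E9", "Ren\u00E9e", "Bj\u00F6rk")) {
            index.add(n);
        }
        System.out.println(index.contains("Jose\u0301"));  // true
        System.out.println(index.contains("Rene\u0301e")); // true
        System.out.println(index.contains("Bjork"));       // false: NFC does not remove diacritics
    }
}
```

The last lookup shows that normalization only unifies equivalent encodings of the same text. It does not make matching accent-insensitive.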

Normalizer is most useful in globalized applications. Typical uses include:

Comparing strings from different sources (e.g., user input vs. DB)
Storing names, text, or identifiers that involve accents or diacritics
Building search engines, authentication systems, or data deduplication tools
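For data deduplication, the choice of normalization form matters. The sketch below keeps the first spelling of each entry and treats entries that are equal after normalization as duplicates; the class NormalizedDedup and its dedup method are hypothetical. NFC merges only canonically equivalent spellings. NFKC also folds compatibility characters such as the fi ligature, so it merges more entries.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NormalizedDedup {
    // Keeps the first occurrence of each string, treating strings that are
    // equal after normalization to `form` as duplicates; input order is preserved
    static List<String> dedup(List<String> items, Normalizer.Form form) {
        Map<String, String> firstSeen = new LinkedHashMap<>();
        for (String item : items) {
            firstSeen.putIfAbsent(Normalizer.normalize(item, form), item);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
            "Jos\u00E9",  // composed
            "Jose\u0301", // decomposed duplicate of the entry above
            "\uFB01le",   // "file" spelled with the fi ligature
            "file");
        System.out.println(dedup(raw, Normalizer.Form.NFC).size());  // 3
        System.out.println(dedup(raw, Normalizer.Form.NFKC).size()); // 2
    }
}
```

NFKC discards distinctions such as ligatures and full-width forms. For that reason it is usually applied to comparison keys rather than to the text that is stored and displayed.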
