Normalizer

The Normalizer class (java.text.Normalizer, available since Java 6, together with its nested enum java.text.Normalizer.Form) transforms Unicode text into a normalized form. This is crucial when comparing, storing, or searching strings that look identical but have different internal Unicode representations, such as an accented character stored either as a single precomposed code point or as a base letter followed by a combining mark.

Key Features:

  • Ensures consistent Unicode representation
  • Supports multiple normalization forms (NFC, NFD, NFKC, NFKD)
  • Makes string comparison and storage reliable across languages and inputs

Commonly Used Methods
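
The class exposes two static methods: normalize(CharSequence, Normalizer.Form), which returns the text in the requested form, and isNormalized(CharSequence, Normalizer.Form), which reports whether the text is already in that form. The sketch below wraps both for NFC; the class name NormalizerMethodsDemo and the helpers toNfc and isNfc are illustrative names, not part of the JDK.

```java
import java.text.Normalizer;

public class NormalizerMethodsDemo {
    // Hypothetical helper: returns the text converted to NFC.
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    // Hypothetical helper: true if the text is already in NFC.
    static boolean isNfc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // e followed by U+0301 (combining acute accent)

        System.out.println(isNfc(decomposed));                  // false
        System.out.println(toNfc(decomposed).equals("\u00e9")); // true
        System.out.println(isNfc(toNfc(decomposed)));           // true
    }
}
```

Checking isNormalized first can avoid creating a new string when most input is already normalized, though calling normalize unconditionally is simpler and usually sufficient.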

Normalization Forms (Normalizer.Form)
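
The Normalizer.Form enum defines four constants. NFD decomposes characters canonically, NFC decomposes and then recomposes them, and NFKD and NFKC do the same while also replacing compatibility characters (such as ligatures) with their plain equivalents. As a rough illustration, the sketch below normalizes a two-character sample, é (U+00E9) followed by the ﬁ ligature (U+FB01), and prints the resulting length in UTF-16 code units. The class name NormalizationFormsDemo and the helper lengthIn are made up for this example.

```java
import java.text.Normalizer;

public class NormalizationFormsDemo {
    // Hypothetical helper: length in UTF-16 code units after normalizing to the given form.
    static int lengthIn(String text, Normalizer.Form form) {
        return Normalizer.normalize(text, form).length();
    }

    public static void main(String[] args) {
        String sample = "\u00e9\ufb01"; // é (precomposed) + ﬁ ligature

        // NFC keeps é composed and leaves the ligature alone.
        System.out.println("NFC:  " + lengthIn(sample, Normalizer.Form.NFC));  // 2
        // NFD splits é into e + U+0301; the ligature is untouched.
        System.out.println("NFD:  " + lengthIn(sample, Normalizer.Form.NFD));  // 3
        // NFKC keeps é composed but expands the ligature to "fi".
        System.out.println("NFKC: " + lengthIn(sample, Normalizer.Form.NFKC)); // 3
        // NFKD splits é and expands the ligature.
        System.out.println("NFKD: " + lengthIn(sample, Normalizer.Form.NFKD)); // 4
    }
}
```

NFC is a common choice for storage and comparison because it preserves the text's appearance; the compatibility forms discard distinctions such as ligatures, so they suit matching more than display.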

Simple Program

import java.text.Normalizer;

public class SimpleNormalizerExample {
    public static void main(String[] args) {
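        String accented = "\u00e9";    // é as a single precomposed code point (U+00E9)
        String decomposed = "e\u0301"; // e (U+0065) followed by combining acute accent (U+0301)

        // equals() compares the underlying code units, so the two representations differ.
        System.out.println("Are the strings equal (equals)? " + accented.equals(decomposed));
        // NFC composes the base letter and the combining mark into U+00E9.
        String normalized = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println("Are they equal after normalization? " + accented.equals(normalized));
    }
}
/*
Are the strings equal (equals)? false
Are they equal after normalization? true
*/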

Problem Statement

Paani and Mahesh are building a search engine for international names. Some users' input methods produce decomposed characters (e followed by the combining acute accent U+0301, instead of the single code point é). To ensure all names match correctly, the system uses Normalizer to convert both the stored names and the search input to the same form (NFC) before storing and searching.

import java.text.Normalizer;
import java.util.*;

public class NameSearchEngine {
    // Stored names are normalized to NFC once, so each lookup only has to normalize the input.
    private static final List<String> database = Arrays.asList(
        Normalizer.normalize("José", Normalizer.Form.NFC),
        Normalizer.normalize("Renée", Normalizer.Form.NFC),
        Normalizer.normalize("Björk", Normalizer.Form.NFC)
    );

    public static void main(String[] args) {
        List<String> userInputs = Arrays.asList(
            "Jose\u0301",     // decomposed form of José
            "Rene\u0301e",    // decomposed form of Renée
            "Bj\u00f6rk"      // composed Björk
        );

        for (String input : userInputs) {
            String normalizedInput = Normalizer.normalize(input, Normalizer.Form.NFC);
            System.out.println("\nSearching for: " + input);
            boolean found = false;
            for (String name : database) {
                if (name.equals(normalizedInput)) {
                    System.out.println("Match found: " + name);
                    found = true;
                    break;
                }
            }
            if (!found) {
                System.out.println("No match found.");
            }
        }
    }
}
/*
Searching for: José
Match found: José

Searching for: Renée
Match found: Renée

Searching for: Björk
Match found: Björk
*/

The Normalizer class is essential in globalized applications, particularly for:

  • Comparing strings from different sources (e.g., user input vs. DB)
  • Storing names, text, or identifiers that involve accents or diacritics
  • Building search engines, authentication systems, or data deduplication tools
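
As one illustration of the deduplication use case, the sketch below normalizes every name to NFC before adding it to a set, so entries that differ only in Unicode representation collapse into one. The class name NameDeduplicator and the method deduplicate are hypothetical, and NFC is an assumed choice; any single form works as long as it is applied consistently.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NameDeduplicator {
    // Hypothetical helper: returns the distinct names after NFC normalization,
    // in first-seen order.
    static Set<String> deduplicate(List<String> names) {
        Set<String> unique = new LinkedHashSet<>();
        for (String name : names) {
            unique.add(Normalizer.normalize(name, Normalizer.Form.NFC));
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
            "Jos\u00e9",   // composed José
            "Jose\u0301",  // decomposed José
            "Bj\u00f6rk",  // composed Björk
            "Bjo\u0308rk"  // decomposed Björk (o + combining diaeresis)
        );
        System.out.println(deduplicate(raw).size()); // 2
    }
}
```

Without the normalization step the set would hold four entries, because the composed and decomposed spellings are different strings as far as equals() and hashCode() are concerned.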