BreakIterator

The BreakIterator class (in java.text) locates boundaries in text: characters, words, sentences, and possible line-break positions. Because it is locale-sensitive, it is a good fit for text-processing applications such as editors, search tools, and word counters.

Key Features:

  • Locale-aware breaking of text into words, sentences, lines, and characters
  • Supports internationalized text segmentation
  • Abstract class with factory methods for different break types
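
To make the factory methods concrete, here is a minimal sketch that runs the same string through all four break types. The class name BreakTypesDemo and the helper segments() are made up for this illustration; only the BreakIterator calls come from java.text.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakTypesDemo {
    // Hypothetical helper: returns the pieces of text between consecutive boundaries.
    static List<String> segments(BreakIterator iterator, String text) {
        iterator.setText(text);
        List<String> pieces = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        String text = "Hi there. Bye!";
        Locale locale = Locale.ENGLISH;

        // One factory method per break type; each returns a BreakIterator subclass instance.
        System.out.println("Characters: " + segments(BreakIterator.getCharacterInstance(locale), text).size());
        System.out.println("Words:      " + segments(BreakIterator.getWordInstance(locale), text));
        System.out.println("Sentences:  " + segments(BreakIterator.getSentenceInstance(locale), text));
        System.out.println("Lines:      " + segments(BreakIterator.getLineInstance(locale), text));
    }
}
```

Note that the word iterator also reports spaces and punctuation as segments of their own, which is why word-counting code usually filters the pieces it gets back.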

Commonly Used Methods
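
The calls used most often on a BreakIterator are setText(), first(), last(), next(), previous(), following(int), preceding(int), isBoundary(int), and current(). The sketch below exercises them on a short string. The class BoundaryNavigation and the helper wordAt() are hypothetical names chosen for this example, and wordAt() assumes the offset satisfies 0 <= offset < text.length().

```java
import java.text.BreakIterator;
import java.util.Locale;

public class BoundaryNavigation {
    // Hypothetical helper: returns the word-level segment containing the given offset.
    // Assumes 0 <= offset < text.length().
    static String wordAt(String text, int offset) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        words.setText(text);
        int end = words.following(offset); // first boundary strictly after offset
        int start = words.previous();      // the boundary just before that one
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Hello world";
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        words.setText(text);

        System.out.println(words.first());       // 0, the first boundary
        System.out.println(words.last());        // 11, the last boundary (text length)
        System.out.println(words.previous());    // 6, one boundary back from the end
        System.out.println(words.isBoundary(5)); // true, "Hello" ends at offset 5
        System.out.println(words.isBoundary(3)); // false, offset 3 is inside "Hello"
        System.out.println(words.preceding(3));  // 0, last boundary before offset 3
        System.out.println(words.current());     // the iterator's current position
        System.out.println(wordAt(text, 7));     // world
    }
}
```

Every navigation call moves the iterator's current position, so code that mixes following(), previous(), and isBoundary() should not assume the position is unchanged between calls.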

Simple Program

import java.text.BreakIterator;
import java.util.Locale;

public class SimpleBreakIteratorExample {
    public static void main(String[] args) {
        String text = "LotusJavaPrince is teaching Java. They love it!";

        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        iterator.setText(text);

        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            // trim() drops the trailing space that belongs to each sentence segment
            System.out.println("Sentence: " + text.substring(start, end).trim());
        }
    }
}
/*
Sentence: LotusJavaPrince is teaching Java.
Sentence: They love it!
*/

Problem Statement

Paani and Mahesh are creating an AI-based writing assistant. To provide suggestions and feedback, they need to analyze each word and sentence in a given article. The system should split the text into sentences and then into words, using BreakIterator for accurate boundary detection that works across different locales.

import java.text.BreakIterator;
import java.util.Locale;

public class TextAnalyzer {
    public static void main(String[] args) {
        String article = "Mahesh loves Java. LotusJavaPrince teaches him well. They build amazing programs.";

        // Create both iterators once; setText() lets the word iterator be reused for every sentence
        BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.ENGLISH);

        // Sentence segmentation
        sentenceIterator.setText(article);

        int sentenceStart = sentenceIterator.first();
        for (int sentenceEnd = sentenceIterator.next(); sentenceEnd != BreakIterator.DONE; sentenceStart = sentenceEnd, sentenceEnd = sentenceIterator.next()) {
            String sentence = article.substring(sentenceStart, sentenceEnd);
            System.out.println("\nSentence: " + sentence.trim());

            // Word segmentation within the sentence
            wordIterator.setText(sentence);
            int wordStart = wordIterator.first();
            for (int wordEnd = wordIterator.next(); wordEnd != BreakIterator.DONE; wordStart = wordEnd, wordEnd = wordIterator.next()) {
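                String word = sentence.substring(wordStart, wordEnd).trim();
                // The word iterator also returns spaces and punctuation; keep only real words.
                // codePointAt() handles characters outside the Basic Multilingual Plane.
                if (!word.isEmpty() && Character.isLetterOrDigit(word.codePointAt(0))) {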
                    System.out.println("  Word: " + word);
                }
            }
        }
    }
}
/*
Sentence: Mahesh loves Java.
  Word: Mahesh
  Word: loves
  Word: Java

Sentence: LotusJavaPrince teaches him well.
  Word: LotusJavaPrince
  Word: teaches
  Word: him
  Word: well

Sentence: They build amazing programs.
  Word: They
  Word: build
  Word: amazing
  Word: programs
*/

The BreakIterator class is the JDK's built-in tool for text boundary analysis, especially in internationalized and language-aware applications. Use it when you are:

  • Building editors, translators, summarizers, or NLP tools
  • Segmenting sentences or words in a locale-sensitive way
  • Handling user input, search indexing, or grammar checks
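
For the editor use case, the line instance is the one break type not shown so far. The following is a rough sketch of greedy line wrapping built on getLineInstance(). The class LineWrapper, the method wrap(), and the choice to measure width in char units are assumptions made for this illustration, not part of the BreakIterator API.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineWrapper {
    // Hypothetical helper: greedy wrap that breaks only where the line iterator allows it.
    // Width is counted in char units, a simplification for this sketch.
    static List<String> wrap(String text, int width, Locale locale) {
        BreakIterator lines = BreakIterator.getLineInstance(locale);
        lines.setText(text);

        List<String> out = new ArrayList<>();
        int lineStart = 0;               // where the line being built begins
        int lastBreak = lines.first();   // most recent legal break position seen
        for (int brk = lines.next(); brk != BreakIterator.DONE; brk = lines.next()) {
            // Trailing whitespace does not count against the width
            int visibleLength = text.substring(lineStart, brk).trim().length();
            if (visibleLength > width && lastBreak > lineStart) {
                out.add(text.substring(lineStart, lastBreak).trim());
                lineStart = lastBreak;
            }
            lastBreak = brk;
        }
        if (lineStart < text.length()) {
            out.add(text.substring(lineStart).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Mahesh loves Java and builds programs";
        for (String line : wrap(text, 12, Locale.ENGLISH)) {
            System.out.println(line);
        }
    }
}
```

A single word longer than the width is left on its own line rather than split, since the line iterator reports no legal break inside it.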