RuleBasedCollator

The RuleBasedCollator class, a concrete subclass of the abstract Collator class in java.text, lets you define custom string comparison rules. Collator.getInstance() returns a collator with predefined rules for a locale; RuleBasedCollator gives you explicit control because you supply the rule string yourself.

This is useful when:

  • Locale rules are insufficient or undesired
  • You need to sort based on domain-specific conventions (e.g., chemical names, custom dictionaries)

Key Features:

  • User-defined sorting rules (e.g., “b” < “c” < “ch” < “d”)
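  • Distinguishes letter, accent, and case differences through rule operators (“<” primary, “;” secondary, “,” tertiary)
  • Replaces the locale's default ordering: a collator built from a rule string uses only the rules you supply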
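
The strength levels can be sketched as follows. This is a minimal illustration, not code from the article: the class name RuleStrengthSketch, the helper withStrength, and the rule string are all assumptions chosen for the example. It builds the same rules at three strengths and shows which differences each strength ignores.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class RuleStrengthSketch {
    // Hypothetical rule string, chosen only for illustration:
    //   '<' separates different letters (primary difference)
    //   ';' separates accent variants   (secondary difference)
    //   ',' separates case variants     (tertiary difference)
    // The escapes \u00E1 and \u00C1 are a-acute and A-acute.
    static final String RULES = "< a , A ; \u00E1 , \u00C1 < b , B";

    // Builds a collator that ignores differences weaker than the given strength.
    static RuleBasedCollator withStrength(int strength) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(RULES);
            collator.setStrength(strength);
            return collator;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Malformed collation rules", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator primary = withStrength(Collator.PRIMARY);
        RuleBasedCollator secondary = withStrength(Collator.SECONDARY);
        RuleBasedCollator tertiary = withStrength(Collator.TERTIARY);

        System.out.println(primary.compare("a", "\u00E1") == 0);  // accent ignored
        System.out.println(secondary.compare("a", "\u00E1") < 0); // accent counts
        System.out.println(secondary.compare("a", "A") == 0);     // case ignored
        System.out.println(tertiary.compare("a", "A") < 0);       // case counts
        System.out.println(primary.compare("a", "b") < 0);        // different letters
    }
}
```

Lowering the strength is a common way to get accent-insensitive or case-insensitive matching without changing the rule string.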

Commonly Used Methods
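
The methods used most often are compare(), getRules(), getCollationKey(), and getCollationElementIterator(). The sketch below exercises each of them once. The class name CommonMethodsSketch and its helpers are assumptions made for illustration; the rule string is the one from this article's simple program.

```java
import java.text.CollationElementIterator;
import java.text.CollationKey;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class CommonMethodsSketch {
    // Builds a collator in which "ch" is a single letter between "c" and "d".
    static RuleBasedCollator newCollator() {
        try {
            return new RuleBasedCollator("< a < b < c < ch < d < e");
        } catch (ParseException e) {
            throw new IllegalArgumentException("Malformed collation rules", e);
        }
    }

    // Counts the collation elements the collator sees in the text.
    // A contraction such as "ch" is reported as one element.
    static int countElements(RuleBasedCollator collator, String text) {
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = newCollator();

        // getRules(): the rule string this collator was built from.
        System.out.println(collator.getRules());

        // compare(): negative, zero, or positive, like any Comparator.
        System.out.println(collator.compare("c", "ch") < 0);

        // getCollationKey(): a precomputed key that compares the same way.
        CollationKey first = collator.getCollationKey("cab");
        CollationKey second = collator.getCollationKey("chab");
        System.out.println(first.compareTo(second) < 0);

        // getCollationElementIterator(): "chad" is read as ch, a, d.
        System.out.println(countElements(collator, "chad"));
    }
}
```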

Simple Program

import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

public class SimpleRuleBasedCollatorExample {
    public static void main(String[] args) throws ParseException {
        // "ch" is declared as a single letter that sorts after "c" and before "d".
        String rule = "< a < b < c < ch < d < e";
        RuleBasedCollator collator = new RuleBasedCollator(rule);

        String[] words = { "charm", "cut", "camel" };
        // Collator implements Comparator, so it can be passed directly to Arrays.sort.
        Arrays.sort(words, collator);

        for (String word : words) {
            System.out.println(word);
        }
    }
}
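/*
camel
cut
charm
*/

“charm” comes last because “ch” is a separate letter that sorts after “c”, so every word starting with a plain “c” precedes it.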
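
To see what the rule changes, the same words can be sorted with a stock English collator and with the custom one. This is a minimal sketch; the class name DefaultVersusCustomSketch and the helper sortedWith are assumptions for illustration.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

final class DefaultVersusCustomSketch {
    // Returns a sorted copy so the caller's array is left untouched.
    static String[] sortedWith(Collator collator, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, collator);
        return copy;
    }

    // The rule string from the simple program: "ch" is one letter after "c".
    static Collator customCollator() {
        try {
            return new RuleBasedCollator("< a < b < c < ch < d < e");
        } catch (ParseException e) {
            throw new IllegalArgumentException("Malformed collation rules", e);
        }
    }

    public static void main(String[] args) {
        String[] words = { "charm", "cut", "camel" };

        // English rules compare "ch" as c followed by h, so "charm" precedes "cut".
        Collator english = Collator.getInstance(Locale.ENGLISH);
        System.out.println(Arrays.toString(sortedWith(english, words)));

        // The custom rules treat "ch" as a letter after "c", so "charm" moves last.
        System.out.println(Arrays.toString(sortedWith(customCollator(), words)));
    }
}
```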

Problem Statement

Paani and Mahesh are building a Sanskrit Dictionary App. Sanskrit has unique sorting rules, such as:

  • “k” < “kh” < “g” < “gh” < “ṅ”
  • “c” < “ch” < “j” < “jh” < “ñ”

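Default locale-based sorting does not follow this order, because it compares “kh” as the two letters “k” and “h” rather than as a single letter. They decide to use RuleBasedCollator to enforce custom Sanskrit collation.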

import java.text.RuleBasedCollator;
import java.util.Arrays;

public class SanskritDictionarySorter {
    public static void main(String[] args) throws Exception {
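        // Custom alphabet order: vowels first, then the consonant groups.
        // Digraphs such as "kh" are listed as single letters (contractions).
        // Characters missing from the rules (such as "e" and "o") are not given a position here.
        String rules =
            "< a < ā < i < ī < u < ū < ṛ < ṝ < ḷ < ḹ"   // vowels
          + "< k < kh < g < gh < ṅ"                     // velars
          + "< c < ch < j < jh < ñ"                     // palatals
          + "< ṭ < ṭh < ḍ < ḍh < ṇ"                     // retroflexes
          + "< t < th < d < dh < n"                     // dentals
          + "< p < ph < b < bh < m"                     // labials
          + "< y < r < l < v < ś < ṣ < s < h";          // semivowels, sibilants, h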

        RuleBasedCollator sanskritCollator = new RuleBasedCollator(rules);

        String[] words = {
            "ghosha", "khaga", "gita", "katha", "ñana", "chitra", "bhakti", "karma"
        };

        Arrays.sort(words, sanskritCollator);

        System.out.println("Sanskrit Dictionary Sorted Words:");
        for (String word : words) {
            System.out.println(word);
        }
    }
}
/*
Sanskrit Dictionary Sorted Words:
katha
karma
khaga
gita
ghosha
chitra
ñana
bhakti
*/

“katha” precedes “karma” because the rules place “th” before “r”.
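
For a dictionary app that sorts the same words repeatedly, comparing precomputed CollationKey objects avoids re-evaluating the rules on every comparison. The sketch below is an illustration under stated assumptions: the class name CollationKeySortSketch, the helper sortByKeys, and the shortened ASCII-only rule string are not from the article. In this ordering “th” comes before “r”, so “katha” sorts before “karma”.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CollationKeySortSketch {
    // ASCII-only subset of the Sanskrit rules, shortened for the sketch.
    static final String RULES =
        "< a < i < u"
      + "< k < kh < g < gh"
      + "< c < ch < j < jh"
      + "< t < th < d < dh < n"
      + "< p < ph < b < bh < m"
      + "< y < r < l < v < s < h";

    static Collator newCollator() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Malformed collation rules", e);
        }
    }

    // Computes one key per word, sorts the keys, and maps them back to words.
    static List<String> sortByKeys(Collator collator, List<String> words) {
        List<CollationKey> keys = new ArrayList<>(words.size());
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }
        Collections.sort(keys); // CollationKey implements Comparable<CollationKey>

        List<String> sorted = new ArrayList<>(keys.size());
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("karma", "katha", "khaga", "gita", "bhakti");
        System.out.println(sortByKeys(newCollator(), words));
    }
}
```

Keys are only comparable when they come from the same collator, so the keys should be rebuilt whenever the rules change.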

RuleBasedCollator is the right tool when default locale-based sorting doesn’t meet the needs of your domain:

  • Sorting follows a custom alphabet order
  • Your data uses a language or transliteration scheme (such as romanized Sanskrit) whose letter order differs from the locale default
  • Multi-character sequences such as “ch” or “kh” must sort as single letters
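
When a locale's rules are nearly right, they can be extended instead of replaced: read the locale collator's rules with getRules() and append a reset rule (“&”). The sketch below follows the idiom shown in the JDK documentation for RuleBasedCollator. The class name ExtendLocaleRulesSketch and its helpers are assumptions for illustration, and the cast assumes the JDK returns a RuleBasedCollator for the locale, which the code checks.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

final class ExtendLocaleRulesSketch {
    // Appends extra rules to the rules of the locale's built-in collator.
    static RuleBasedCollator extend(Locale locale, String extraRules) {
        Collator base = Collator.getInstance(locale);
        if (!(base instanceof RuleBasedCollator)) {
            throw new IllegalStateException("No rule-based collator for " + locale);
        }
        String baseRules = ((RuleBasedCollator) base).getRules();
        try {
            return new RuleBasedCollator(baseRules + extraRules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Malformed collation rules", e);
        }
    }

    // Returns a sorted copy so the caller's array is left untouched.
    static String[] sortedWith(Collator collator, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // "& C < ch, cH, Ch, CH": place "ch" (in any case) right after the letter C.
        RuleBasedCollator collator = extend(Locale.ENGLISH, "& C < ch, cH, Ch, CH");

        // All other letters keep their English positions.
        String[] words = { "charm", "cut", "camel", "dog" };
        System.out.println(Arrays.toString(sortedWith(collator, words)));
    }
}
```

With this approach, characters the extra rules never mention still sort by the locale's rules.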