How to Sort Thai Strings Correctly in PHP, JavaScript, and Python

Sorting Thai text in the correct alphabetical order can be a real adventure—especially when vowels, diacritics, and tone marks play hide and seek with the base consonants! I faced this challenge while building a sorted province dropdown for a project, and I want to share how PHP, JavaScript, and even Python can help you create a flawless Thai sorting experience. This article is perfect for high school and college students who love coding puzzles and want to learn something new!

The Thai Alphabet Sorting Challenge

Unlike Latin-based languages, Thai writes some vowels before the consonant they belong to. Thai dictionaries order words by the consonant first (“ก, ข, ค, ง…”), then by the vowel, and use tone marks only to break ties. For example, even though the word “เชียงใหม่” is written with the leading vowel sign “เ”, it should be sorted under its consonant “ช”. Many default sorting functions (like PHP’s sort() or JavaScript’s Array.prototype.sort()) simply compare raw bytes or code units, so every word written with a leading vowel ends up after all the words that start with a consonant.
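To see the problem concretely, here is a small sketch in plain Python, whose built-in sorted() compares strings by code point. The province list is sample data chosen for this illustration.

```python
# Sample data for illustration: Thai province names in arbitrary order.
provinces = [
    "เชียงใหม่", "เชียงราย", "แพร่", "เพชรบุรี", "เพชรบูรณ์",
    "กระบี่", "ขอนแก่น", "นครราชสีมา", "บุรีรัมย์", "สุราษฎร์ธานี",
]

# sorted() compares code points and knows nothing about Thai dictionary rules.
by_code_point = sorted(provinces)

# Leading vowel signs such as U+0E40 and U+0E41 have higher code points than
# every Thai consonant (U+0E01 to U+0E2E), so words written with a leading
# vowel are pushed to the end of the list.
for name in by_code_point:
    print(f"U+{ord(name[0]):04X}  {name}")
```

In the output, สุราษฎร์ธานี lands ahead of เชียงราย, which is not where a Thai dictionary would put it.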

The expected sorted order for a sample list of Thai provinces should be:

  • กระบี่
  • ขอนแก่น
  • เชียงราย
  • เชียงใหม่
  • นครราชสีมา
  • บุรีรัมย์
  • เพชรบุรี
  • เพชรบูรณ์
  • แพร่
  • สุราษฎร์ธานี

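This order reflects the proper Thai dictionary rules. “เชียงราย” and “เชียงใหม่” are written with a leading “เ”, yet they sort under “ช”, after “กระบี่” and “ขอนแก่น”. “เชียงราย” precedes “เชียงใหม่” because, once the leading “ใ” is treated as following its consonant, the deciding comparison is “ร” against “ห”. “บุรีรัมย์” precedes the “พ” provinces because “บ” comes before “พ”, and “แพร่” follows both names beginning with “เพชร” because “เ” comes before “แ”. Essentially, when the Intl Collator (or an equivalent library in Python) performs a locale-aware comparison, it treats each leading vowel as if it followed its consonant, and it looks at tone marks and similar diacritics only when the strings are otherwise identical.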
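The reordering rule can be sketched in plain Python. This is a simplified approximation for illustration, not the algorithm ICU implements: the names thai_sort_key, LEADING_VOWELS, and SECONDARY_MARKS are made up for this sketch, and it relies on the assumption that Thai consonants and vowels are encoded in dictionary order.

```python
# Vowel signs written before their consonant: U+0E40 to U+0E44.
LEADING_VOWELS = "\u0e40\u0e41\u0e42\u0e43\u0e44"
# Maitaikhu, the four tone marks, and thanthakhat: used only as tie-breakers.
SECONDARY_MARKS = "\u0e47\u0e48\u0e49\u0e4a\u0e4b\u0e4c"

def thai_sort_key(word):
    """Return an approximate dictionary sort key (hypothetical helper)."""
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS:
            # Move the leading vowel behind the consonant it belongs to.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    # Primary key: consonants and vowels only. Tie-breaker: marks included.
    primary = "".join(c for c in chars if c not in SECONDARY_MARKS)
    return (primary, "".join(chars))

provinces = [
    "เชียงใหม่", "เชียงราย", "แพร่", "เพชรบุรี", "เพชรบูรณ์",
    "กระบี่", "ขอนแก่น", "นครราชสีมา", "บุรีรัมย์", "สุราษฎร์ธานี",
]
dictionary_order = sorted(provinces, key=thai_sort_key)
print(dictionary_order)
```

For this sample the sketch yields กระบี่, ขอนแก่น, เชียงราย, เชียงใหม่, นครราชสีมา, บุรีรัมย์, เพชรบุรี, เพชรบูรณ์, แพร่, สุราษฎร์ธานี. For real projects a proper collator is still the safer choice, since it covers cases this sketch ignores.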

What is Intl Collator?

The Intl Collator is a powerful tool available in PHP and JavaScript that lets you perform locale‑aware string comparisons. Rather than comparing raw Unicode code points, the collator follows language‑specific rules when comparing strings. For Thai, using a locale such as th_TH (or th-TH in JavaScript) ensures that a leading vowel is compared after the consonant it belongs to, and that tone marks count only as secondary differences. This results in a sorted list that reflects the natural dictionary order.

Sorting Thai Strings in PHP

PHP’s Intl extension provides the Collator class, which is ideal for sorting Thai strings. Here’s an example where province names are fetched from a database and then sorted using the Thai locale (th_TH):

<?php
require_once 'db.php'; // Database connection

header('Content-Type: application/json');

try {
    // Retrieve unsorted province names from the database
    $stmt = $conn->query("SELECT DISTINCT province FROM org_info WHERE province IS NOT NULL AND province != ''");
    $provinces = $stmt->fetchAll(PDO::FETCH_COLUMN);

    if ($provinces) {
        // Create a Collator for the Thai locale
        $collator = new Collator('th_TH');
        // Sort the array according to Thai dictionary rules
        $collator->sort($provinces);
        
        echo json_encode(["success" => true, "provinces" => $provinces], JSON_UNESCAPED_UNICODE);
    } else {
        echo json_encode(["success" => false, "error" => "No provinces found"], JSON_UNESCAPED_UNICODE);
    }
} catch (PDOException $e) {
    echo json_encode(["success" => false, "error" => $e->getMessage()]);
}
?>

PHP Example Explained

  • Data Fetching: Province names are retrieved without an ORDER BY clause, so the database’s own collation plays no part in the final order.
  • Intl Collator: A Collator instance is created with the locale th_TH, and its sort() method arranges the strings in place according to Thai dictionary rules, comparing each leading vowel after its consonant.
  • Result: You get a properly sorted list, ideal for a province dropdown, such as: กระบี่, ขอนแก่น, เชียงราย, เชียงใหม่, นครราชสีมา, บุรีรัมย์, เพชรบุรี, เพชรบูรณ์, แพร่, สุราษฎร์ธานี.

Sorting Thai Strings in JavaScript

JavaScript provides a similar solution using the Intl.Collator object. This is especially useful for client-side sorting of an array of Thai province names.

// Sample array of Thai province names
const provinces = [
  "เชียงใหม่",
  "เชียงราย",
  "แพร่",
  "เพชรบุรี",
  "เพชรบูรณ์",
  "กระบี่",
  "ขอนแก่น",
  "นครราชสีมา",
  "บุรีรัมย์",
  "สุราษฎร์ธานี"
];

// Create an Intl.Collator for Thai. The default sensitivity keeps tone marks
// as tie-breakers; { sensitivity: 'base' } would treat them as equal.
const collator = new Intl.Collator('th-TH');

// Sort the array using the collator's compare function
provinces.sort(collator.compare);

console.log(provinces);
// Expected sorted order:
// ["กระบี่", "ขอนแก่น", "เชียงราย", "เชียงใหม่", "นครราชสีมา", "บุรีรัมย์", "เพชรบุรี", "เพชรบูรณ์", "แพร่", "สุราษฎร์ธานี"]

JavaScript Example Explained

  • Intl.Collator: An Intl.Collator is created with the locale th-TH and the default options. Passing { sensitivity: 'base' } is not needed for sorting; it would make words that differ only in tone marks compare as equal, leaving their relative order to the input.
  • Sorting: The array is sorted using the collator’s compare function, which handles the locale-aware comparison.
  • Result: The output correctly follows Thai dictionary order, yielding: กระบี่, ขอนแก่น, เชียงราย, เชียงใหม่, นครราชสีมา, บุรีรัมย์, เพชรบุรี, เพชรบูรณ์, แพร่, สุราษฎร์ธานี.

Sorting Thai Strings in Python

Python can do locale-aware sorting too. The third-party PyICU package (imported as icu), which wraps the ICU library, sorts Thai strings correctly. Here’s an example:

import icu

# Sample list of Thai province names
provinces = [
    "เชียงใหม่",
    "เชียงราย",
    "แพร่",
    "เพชรบุรี",
    "เพชรบูรณ์",
    "กระบี่",
    "ขอนแก่น",
    "นครราชสีมา",
    "บุรีรัมย์",
    "สุราษฎร์ธานี"
]

# Create a Collator for the Thai locale
collator = icu.Collator.createInstance(icu.Locale('th_TH'))

# Sort using the collator's getSortKey function
provinces.sort(key=collator.getSortKey)

print(provinces)
# Expected sorted order:
# ['กระบี่', 'ขอนแก่น', 'เชียงราย', 'เชียงใหม่', 'นครราชสีมา', 'บุรีรัมย์', 'เพชรบุรี', 'เพชรบูรณ์', 'แพร่', 'สุราษฎร์ธานี']

Python Example Explained

  • PyICU: The icu package provides ICU’s locale-aware sorting functionality in Python.
  • Collator Instance: A Thai locale collator (th_TH) is created to perform the comparison.
  • Sorting: The list is sorted using the collator’s getSortKey function as the sort key, ensuring the proper Thai dictionary order.
  • Result: The sorted list appears as: [‘กระบี่’, ‘ขอนแก่น’, ‘เชียงราย’, ‘เชียงใหม่’, ‘นครราชสีมา’, ‘บุรีรัมย์’, ‘เพชรบุรี’, ‘เพชรบูรณ์’, ‘แพร่’, ‘สุราษฎร์ธานี’].

Conclusion

Sorting Thai strings correctly is not only about arranging text; it’s an adventure in understanding language-specific rules! When I was creating a sorted province dropdown for a project, I discovered that using PHP’s Collator, JavaScript’s Intl.Collator, or Python’s PyICU made all the difference. These tools compare each leading vowel after its consonant and treat tone marks as tie-breakers, resulting in the correct Thai dictionary order.

Happy coding, and สู้ๆ นะครับ!

#ThaiSorting #PHP #JavaScript #Python #CodingAdventure #Internationalization #ThaiAlphabet
