How to Sort Strings in Locale-Aware Order in Go with collate

When byte order breaks your list

You have a slice of customer names. You call sort.Strings. The output looks wrong. "Ångström" jumps to the end of the list. "café" lands after "cafeteria", and capitalized names such as "Zebra" sort ahead of lowercase ones such as "apple". The user complains the list is broken.

Computers sort by bytes. Humans sort by language. sort.Strings compares the UTF-8 bytes of each string. For valid UTF-8, this gives the same result as comparing Unicode code points. Code points are numeric values assigned by the Unicode standard, and they do not follow the alphabetical rules of any language. The character "Å" (U+00C5) has a code point far higher than "Z" (U+005A). In English, "Å" belongs near "A". In Swedish, "Å" is a separate letter that comes after "Z". Byte sorting cannot know this. You need a collator.

The golang.org/x/text/collate package implements the Unicode Collation Algorithm. It assigns weights to characters based on locale rules. The collator knows that "Å" is a variant of "A" in English, and that "é" sorts with "e" in French. It follows the target language's conventions for case and diacritics. Options let you adjust details such as case sensitivity and numeric ordering.

How locale-aware sorting works

Locale-aware sorting relies on a collator. A collator is a configuration object that holds the rules for a specific language. You create a collator once, then use it to compare strings. The collator does not modify the strings. It reads them and returns an order based on the weights defined by the Unicode standard and the locale.

Think of the collator as a dictionary editor. When you ask the editor to compare two words, the editor looks up each character, applies the rules for the language, and decides which word comes first. The editor knows that traditional Spanish collation treated "ch" as a distinct letter, and that in Turkish, dotted "i" and dotless "ı" are different letters. The collator encodes these rules so you don't have to write them manually.

The underlying mechanism uses a multi-level comparison. The Unicode Collation Algorithm defines primary, secondary, and tertiary weights. Primary weights distinguish base letters. "A" and "B" have different primary weights. Secondary weights distinguish accents. "A" and "Á" share a primary weight but differ in secondary weight. Tertiary weights distinguish case. "A" and "a" share primary and secondary weights but differ in tertiary weight. The collator compares levels in order. It only looks at the next level if the current level is equal. This structure allows you to control the sort behavior. You can ignore case by skipping the tertiary level. You can ignore accents by skipping the secondary level.

Minimal example

Here's the minimal setup: create a collator for your locale, then use it inside sort.Slice.

package main

import (
	"fmt"
	"sort"

	"golang.org/x/text/collate"
	"golang.org/x/text/language"
)

func main() {
	// words contains characters like Å that sit far from A in Unicode code points
	words := []string{"apple", "banana", "Ångström", "Zebra"}

	// language.English tells the collator to treat Å as a variant of A
	c := collate.New(language.English)

	// CompareString returns a negative number if the first string sorts before
	// the second, zero if they are equal, and a positive number otherwise
	sort.Slice(words, func(i, j int) bool {
		return c.CompareString(words[i], words[j]) < 0
	})

	fmt.Println(words)
}

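A *collate.Collator is not safe for concurrent use. It keeps scratch iterators inside the struct and reuses them on every comparison. Two goroutines calling CompareString on the same collator at the same time therefore race on that state. Give each goroutine its own collator, or guard a shared one with a mutex. Construction is the expensive part: collate.New matches the requested locale against the compiled-in collation tables and sets up a new collator, which costs more than a single comparison. Build a collator once and reuse it for many comparisons.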

Under the hood: weights and levels

The collator maps each string to a sequence of collation elements. Each element carries a primary, a secondary, and a tertiary weight. When you call CompareString, the collator first compares the primary weights of both strings, then the secondary weights, then the tertiary weights. The comparison stops at the first level that differs.

You can precompute these weights as a sort key with Collator.KeyFromString, or with Collator.Key for byte slices. Each call writes a byte-slice key into a collate.Buffer. Comparing two keys with bytes.Compare gives the same order as comparing the strings with the collator. Precomputing keys avoids re-scanning each string on every comparison. This matters when you sort the same data repeatedly, when the slice is large, or when the sort key is expensive to extract.

The language package provides constants for common locales, such as language.English, language.French, language.Swedish, and language.Japanese. You can also construct custom locales using language.Make. The collator supports fallback. If you request a locale that lacks specific data, the collator falls back to a parent locale or to the default Unicode rules. As a result, collate.New always returns a usable collator, even for obscure locales.

Convention note: The golang.org/x/text module is the de facto standard for text processing in Go. It is maintained by the Go project, but it lives outside the standard library and is not covered by the Go 1 compatibility promise. It is still versioned as a v0 module. Import it from golang.org/x/text, not from a third-party mirror. Most projects that handle user-facing text depend on it.

Realistic example: structs and case

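Real lists often live inside structs. You may also want case-insensitive sorting for user-facing lists. The collate.IgnoreCase option tells the collator to skip the tertiary level, so uppercase and lowercase compare as equal. It is a package-level value of type collate.Option that you pass to collate.New, not a function.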

package main

import (
	"fmt"
	"sort"

	"golang.org/x/text/collate"
	"golang.org/x/text/language"
)

type Product struct {
	Name string
	ID   int
}

func main() {
	products := []Product{
		{Name: "Café Latte", ID: 1},
		{Name: "apple pie", ID: 2},
		{Name: "Åland Cheese", ID: 3},
		{Name: "Zebra Cake", ID: 4},
	}

	// IgnoreCase is an option value, not a function: it makes uppercase and
	// lowercase compare as equal while English rules still apply to Å and é
	c := collate.New(language.English, collate.IgnoreCase)

	sort.Slice(products, func(i, j int) bool {
		// CompareString applies the locale rules; the closure only checks the sign
		return c.CompareString(products[i].Name, products[j].Name) < 0
	})

	for _, p := range products {
		fmt.Printf("%d: %s\n", p.ID, p.Name)
	}
}

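The output lists "Åland Cheese", "apple pie", "Café Latte", and "Zebra Cake", in that order. "Åland Cheese" comes first because English rules treat "Å" as a variant of "A", and "l" sorts before "p". In byte order, "Café Latte" and "Zebra Cake" would jump ahead of the lowercase name. Here they don't, because the collator compares base letters before it looks at case. The accent on "é" only decides the order when two names differ by that accent alone. IgnoreCase goes one step further. Names that differ only in case, such as "apple pie" and "Apple Pie", compare as equal instead of being ordered by case.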

You can combine options by passing several to collate.New, for example collate.New(language.English, collate.IgnoreCase, collate.Numeric). The numeric option makes the collator treat sequences of digits as numbers, so "file2" sorts before "file10" because 2 is less than 10. Without the numeric option, "file10" sorts before "file2". The collator then compares digits one character at a time, and "1" sorts before "2".

Pitfalls and performance

Creating a collator inside a tight loop is a performance trap. Each collate.New call has to resolve the locale and set up a new collator. If you create one for every sort call, you pay this cost repeatedly. Create the collator once, store it in a package-level variable or a struct field, and reuse it. Because a collator is not safe for concurrent use, only one goroutine at a time may use a shared instance.

// package-level collator is built once when the package initializes;
// a collator is not safe for concurrent use, so call sortNames from one goroutine at a time
var englishCollator = collate.New(language.English, collate.IgnoreCase)

func sortNames(names []string) {
	// Reuse the pre-built collator for every sort
	sort.Slice(names, func(i, j int) bool {
		return englishCollator.CompareString(names[i], names[j]) < 0
	})
}

Forgetting to import the x/text packages leads to a compile error: the compiler rejects the program with undefined: collate if the import line is missing. These packages are not part of the standard library, so your module must also require golang.org/x/text. Both collate and language live in that one module, so a single go get golang.org/x/text adds it. Running go mod tidy after adding the imports works too. Since Go 1.16, go build and go test no longer add missing modules automatically. Instead, they stop with an error that suggests the go get command to run.

Using sort.Strings on locale-sensitive data produces silent bugs. The code compiles and runs. The output is just wrong. There is no compiler warning. The only way to catch this is testing with data that contains accents, diacritics, or non-Latin characters. Include such data in your test cases. Verify the sort order matches user expectations for the target locale.

The collator is heavier than byte comparison. If you are sorting millions of ASCII identifiers, sort.Strings is faster. Locale-aware sorting adds overhead for weight calculation. Use locale-aware sorting only when the data is displayed to humans. Internal IDs, file paths, and machine-generated tokens should use byte order.

Convention note: Receiver names in Go are usually one or two letters. If you wrap the collator in a struct, name the receiver s or c, not self or this. The community expects short receiver names. This keeps the code readable and consistent with the standard library.

Decision: when to use collate

Use sort.Strings when you are sorting internal identifiers, file paths, or ASCII-only data where byte order is sufficient and speed is the priority. Use collate.New with a specific locale when you are sorting text that users will read, especially if it contains diacritics or characters outside the basic Latin alphabet. Add collate.IgnoreCase when names that differ only in case must compare as equal. The default collator already orders by base letters first and uses case only as a tiebreaker. Add collate.Numeric when the strings contain embedded numbers that should be compared by value, such as version strings or file names with sequence numbers. Reuse a pre-built collator across sort calls, one per goroutine or behind a lock, since constructing a collator costs more than a comparison. Use Collator.KeyFromString when you need to sort a large dataset repeatedly and want to avoid recomputing weights on every comparison.

Collators are not safe for concurrent use, so give each goroutine its own or guard a shared one with a mutex. If you need to inject a collator into handlers, wrap it in a small service struct that owns that synchronization. Trust the Unicode rules. They cover edge cases you haven't thought of.

Where to go next

Byte-order sorting compares raw character codes and ignores language rules for accents and special letters. A collator applies the rules of a specific language, so words land in the order a dictionary for that language would use. From here, the golang.org/x/text/collate package documentation lists every option and method. collate.Supported reports which locales have tailored rules, and Unicode Technical Standard #10 describes the collation algorithm in full.