How to Get the Length of a String in Go (Bytes vs Runes)

Use `len(s)` to get the byte count of a string, but use `utf8.RuneCountInString(s)` to get the actual number of characters (runes), which is critical for handling non-ASCII text correctly.

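In Go, strings are immutable sequences of bytes, not characters, and because Go source files are UTF-8, string literals hold UTF-8-encoded text. The built-in len() function returns the number of bytes. That matches the character count for ASCII text, where every character is one byte, but not for characters that need several bytes in UTF-8, such as Chinese characters (3 bytes each) or most emoji (4 bytes). To count characters, you must decode the UTF-8 sequence into runes (Unicode code points). A rune count matches what a reader sees for most text, but not for sequences built from several code points, such as a letter followed by a combining accent or a flag emoji.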

Here is a practical example demonstrating the difference:

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "Hello 世界" // "世界" is 2 characters but 6 bytes

	// Byte length (total bytes in memory)
	byteLen := len(s)

	// Character length (actual runes)
	runeLen := utf8.RuneCountInString(s)

	fmt.Printf("String: %s\n", s)
	fmt.Printf("Byte length (len): %d\n", byteLen)
	fmt.Printf("Rune length (utf8): %d\n", runeLen)
}

Output:

String: Hello 世界
Byte length (len): 12
Rune length (utf8): 8

In this example, "Hello" and the space after it take 1 byte per character (6 bytes), and the two Chinese characters "世界" take 3 bytes each, for a total of 12 bytes. However, the string contains only 8 characters. If you are validating input length for a user interface or a database field that limits character count, len() overcounts non-ASCII input and can reject text that is actually within the limit.
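A minimal sketch of such a length check, assuming a hypothetical per-field limit of 10 characters (the constant maxChars and both function names are illustrative, not from any particular framework). Six Chinese characters fit the limit by rune count but fail it by byte count:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxChars is a hypothetical per-field character limit.
const maxChars = 10

// fitsByteLimit is the naive check: it compares the byte length.
func fitsByteLimit(s string) bool {
	return len(s) <= maxChars
}

// fitsRuneLimit compares the number of runes (code points) instead.
func fitsRuneLimit(s string) bool {
	return utf8.RuneCountInString(s) <= maxChars
}

func main() {
	name := "你好世界你好" // 6 runes, 18 bytes in UTF-8
	fmt.Println("byte check passes:", fitsByteLimit(name)) // false
	fmt.Println("rune check passes:", fitsRuneLimit(name)) // true
}
```

For pure ASCII input both checks agree, which is why the byte-based version often goes unnoticed until non-ASCII users arrive.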

If you need to iterate over the string character by character, use a range loop, which decodes one rune per iteration (an index loop such as for i := 0; i < len(s); i++ visits individual bytes instead):

package main

import "fmt"

func main() {
	s := "Go 语言"
	count := 0
	for range s { // each iteration consumes one whole rune
		count++
	}
	fmt.Println(count) // 5 (G, o, space, 语, 言)
}
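A range loop over a string also yields the byte offset at which each rune starts, which makes the variable-width encoding visible: the offset jumps by 3 after each Chinese character. The helper runeOffsets below is a hypothetical name used only to collect those offsets.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune in s begins.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s { // i advances by the encoded width of each rune
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "Go 语言"
	// i is the byte offset of the rune; r is the decoded rune itself.
	for i, r := range s {
		fmt.Printf("byte offset %d: %q\n", i, r)
	}
	fmt.Println(runeOffsets(s)) // [0 1 2 3 6]
}
```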

Key Takeaways:

  • Use len(s) when byte size is what matters (e.g., allocating buffers, network transmission, file I/O). It runs in constant time because a string's length is stored alongside its data.
  • Use utf8.RuneCountInString(s) or a range loop when logic depends on the number of characters (e.g., truncating text, password length limits).
  • Remember that utf8.RuneCountInString must scan and decode the whole string, so its cost grows with the string's length, unlike len(). When you need the count inside a loop, compute it once beforehand instead of recounting on every iteration.