Use len(s) to get the byte count of a string, and utf8.RuneCountInString(s) to get the number of runes (Unicode code points), which is what you usually want when counting characters in non-ASCII text.
In Go, strings are immutable sequences of bytes, not characters. The built-in len() function returns the number of bytes, which matches the character count for ASCII but overcounts for any character whose UTF-8 encoding takes multiple bytes (such as emoji or Chinese characters). To count characters, you must decode the UTF-8 sequence into runes.
Here is a practical example demonstrating the difference:
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "Hello 世界" // "世界" is 2 characters but 6 bytes

	// Byte length (total bytes in memory)
	byteLen := len(s)
	// Character length (number of runes)
	runeLen := utf8.RuneCountInString(s)

	fmt.Printf("String: %s\n", s)
	fmt.Printf("Byte length (len): %d\n", byteLen)
	fmt.Printf("Rune length (utf8): %d\n", runeLen)
}
Output:
String: Hello 世界
Byte length (len): 12
Rune length (utf8): 8
In this example, "Hello" takes 5 bytes, the space takes 1 byte, and the two Chinese characters "世界" take 3 bytes each, totaling 12 bytes. However, there are only 8 characters. If you are validating input length for a user interface or a database field that limits character count, len() will overcount non-ASCII input and may reject text that is actually within the limit.
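As a rough illustration of the validation case, the sketch below compares a byte-based check with a rune-based one. The helper name withinLimit and the limit of 10 are assumptions made up for this example, not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// withinLimit reports whether s contains at most maxRunes runes.
// Hypothetical helper for illustration; not a standard library function.
func withinLimit(s string, maxRunes int) bool {
	return utf8.RuneCountInString(s) <= maxRunes
}

func main() {
	name := "Hello 世界" // 8 runes, 12 bytes
	const limit = 10    // assumed character limit for the field

	fmt.Println(len(name) <= limit)       // false: 12 bytes exceeds 10
	fmt.Println(withinLimit(name, limit)) // true: 8 runes fit within 10
}
```

Note that if the limit is really a storage limit in bytes (some database column types are defined that way), the byte-based check is the right one.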
If you need to iterate over the string character by character, use a range loop, which automatically handles rune decoding:
s := "Go 语言"
count := 0
for range s {
	count++
}
// count is now 5 (G, o, space, 语, 言)
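A range loop also yields the byte offset at which each rune starts, which can help when you need to slice the string afterwards. The following is a small sketch; runeOffsets is a hypothetical helper written only for this example.

```go
package main

import "fmt"

// runeOffsets returns the starting byte offset of every rune in s.
// Hypothetical helper for illustration only.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "Go 语言"

	// i is the byte offset of the rune, r is the decoded rune itself.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}

	// The offsets jump by 3 after each Chinese character.
	fmt.Println(runeOffsets(s)) // [0 1 2 3 6]
}
```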
Key Takeaways:
- Use len(s) when byte size is what matters (e.g., network transmission, file I/O). It runs in constant time.
- Use utf8.RuneCountInString(s) or a range loop when logic depends on the number of visible characters (e.g., truncating text, password length limits).
- Remember that utf8.RuneCountInString is slower than len() because it must scan the whole string to decode the UTF-8 encoding, so avoid calling it repeatedly in tight loops unless necessary.
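For the truncation case mentioned above, one possible approach is to walk the string with a range loop and cut at a rune boundary. This is a minimal sketch under the assumption that "character" means rune; truncateRunes is a hypothetical name, and the code does not try to keep multi-rune sequences (such as combining marks) together.

```go
package main

import "fmt"

// truncateRunes returns at most n runes of s, always cutting on a rune boundary.
// Hypothetical helper for illustration only.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s {
		if count == n {
			// i is the byte offset where the next rune would start.
			return s[:i]
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	s := "Hello 世界"

	fmt.Println(truncateRunes(s, 7)) // Hello 世

	// Slicing by bytes instead cuts through the middle of 世
	// and leaves an invalid UTF-8 fragment at the end.
	fmt.Printf("%q\n", s[:7])
}
```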