The byte trap
You are building a user profile feature. You need to extract the first character of a name to generate an avatar. You write name[0]. You test it with "Alice". It returns 'A'. You test it with "José". It returns 'J'. Everything looks fine.
Then a user signs up with the name "你好". You run name[0]. You expect the character '你'. Instead, you get the number 228. That is not a character. That is the first byte of a three-byte UTF-8 sequence. If you print it as a character, you see 'ä', which is garbage for this name. A loop that runs i from 0 to len(name)-1 and reads name[i] does not panic, but it visits six bytes for a two-character name. Any index logic that assumes one character equals one index is wrong for non-ASCII text: it produces mangled output, and once you mix character counts with byte offsets it can slice through the middle of a character or run past the end of the string and panic.
All of this happens because Go strings are sequences of bytes, not characters. Indexing a string accesses bytes. If you treat a string like an array of characters, the program breaks the moment Unicode enters the picture.
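As a preview of where this article is heading, here is a minimal sketch of the avatar case done with the standard unicode/utf8 package. The function name FirstRune is hypothetical and chosen only for this illustration; it assumes you want the first code point and want to reject empty or invalid input.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// FirstRune is a hypothetical helper: it returns the first code point
// of name and reports whether one could be decoded.
func FirstRune(name string) (rune, bool) {
	r, size := utf8.DecodeRuneInString(name)
	// DecodeRuneInString returns (RuneError, 0) for an empty string
	// and (RuneError, 1) for an invalid leading byte.
	if r == utf8.RuneError && size <= 1 {
		return 0, false
	}
	return r, true
}

func main() {
	for _, name := range []string{"Alice", "José", "你好"} {
		r, ok := FirstRune(name)
		// name[0] is the first byte; r is the first decoded rune.
		fmt.Printf("%q: first byte %d, first rune %q, ok=%v\n", name, name[0], r, ok)
	}
}
```

For "你好" the first byte is 228 while the first rune is '你', which is the gap the rest of the article explains.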
Strings are bytes, not characters
A Go string is an immutable sequence of bytes. Go source code is UTF-8, so string literals written as plain text hold UTF-8 encoded bytes, although a string value can hold arbitrary bytes. UTF-8 is a variable-width encoding. ASCII characters like 'a' or '1' take one byte. Characters like 'é' take two bytes. Chinese characters like '你' take three bytes. Emojis like '👋' take four bytes.
When you call len(s), Go returns the number of bytes. When you write s[i], Go returns the byte at position i. Go does not decode UTF-8 during indexing. It does not know about characters. It only knows about bytes.
Think of a string as a roll of film. Each frame is a byte. Some images fit on one frame. Some images span four frames. Indexing counts frames. If you want the second image, you cannot just look at frame 2. You have to count frames until you find the start of the second image.
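The frame-counting idea can be sketched in a few lines. The helper name NthRuneOffset is an assumption made for this example; it walks the string from the start and reports the byte index at which the n-th character (counting from zero) begins.

```go
package main

import "fmt"

// NthRuneOffset is a hypothetical helper: it returns the byte index at
// which rune number n (0-based) starts, or -1 if s has fewer runes.
func NthRuneOffset(s string, n int) int {
	if n < 0 {
		return -1
	}
	count := 0
	// Ranging over a string yields the starting byte index of each rune.
	for i := range s {
		if count == n {
			return i
		}
		count++
	}
	return -1
}

func main() {
	s := "a你b"
	fmt.Println(NthRuneOffset(s, 1)) // 1: '你' starts right after 'a'
	fmt.Println(NthRuneOffset(s, 2)) // 4: 'b' starts after the 3 bytes of '你'
}
```

The cost is linear in the length of the string, which is the price of a compact variable-width encoding.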
This design keeps strings cheap. A string header is just a pointer to memory and a length. Slicing a string creates a new header pointing to the same memory. No copying happens. If strings were arrays of characters, slicing would require decoding and reallocating, which would be slow.
Minimal example
Byte indexing works for ASCII. It fails for anything else.
package main

import "fmt"

func main() {
	// ASCII text: one byte per character.
	ascii := "hello"
	fmt.Println(len(ascii)) // 5
	fmt.Println(ascii[0])   // 104, which is the byte value of 'h'

	// Multi-byte text: emoji is four bytes.
	emoji := "👋"
	fmt.Println(len(emoji)) // 4
	fmt.Println(emoji[0])   // 240, first byte of the emoji
	fmt.Println(emoji[1])   // 159, second byte

	// Accessing index 4 panics because the length is 4.
	// fmt.Println(emoji[4]) // panic: runtime error: index out of range [4] with length 4
}
The compiler allows s[i] because i might be valid. The check happens at runtime. If i is greater than or equal to len(s), the runtime panics. In current Go releases the message has the form runtime error: index out of range [4] with length 4.
The expression s[i] has type byte. You cannot assign it to a rune variable without a conversion. The compiler rejects the code with cannot use s[i] (value of type byte) as rune value in assignment. A rune is an alias for int32 and represents a Unicode code point. A byte is an alias for uint8. They are different types. An explicit conversion such as rune(s[i]) compiles, but it only widens the byte value; it does not decode UTF-8.
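A short sketch makes the byte-versus-rune distinction concrete. The helper ByteAsRune is hypothetical and exists only to show what a plain conversion does; it is not a decoding technique.

```go
package main

import "fmt"

// ByteAsRune is a hypothetical helper that converts the byte at index i
// to a rune. It widens the byte value; it does NOT decode UTF-8.
func ByteAsRune(s string, i int) rune {
	return rune(s[i])
}

func main() {
	s := "héllo"

	// s[1] is the first of the two bytes that encode 'é' (0xC3 0xA9).
	fmt.Printf("%T %d\n", s[1], s[1]) // uint8 195

	// Converting that single byte gives code point U+00C3, not 'é'.
	r := ByteAsRune(s, 1)
	fmt.Printf("%T %q\n", r, r) // int32 'Ã'
}
```

The conversion satisfies the type checker, but the result is the wrong character for any non-ASCII text.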
Iterating correctly with range
When you need to process characters, use a range loop. The range loop over a string decodes UTF-8 automatically. It yields the byte index and the rune value on each iteration.
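// CountRunes returns the number of Unicode code points (runes) in the string.
func CountRunes(s string) int {
	count := 0
	// Ranging over a string decodes UTF-8 and runs once per rune.
	// The index and rune are not needed here, so both are omitted;
	// declaring an unused variable r would be a compile error.
	for range s {
		count++
	}
	return count
}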
The range loop handles multi-byte characters correctly. It skips the continuation bytes and yields the full rune. This is the idiomatic way to iterate over text in Go. It avoids manual decoding and prevents slicing errors.
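To see the index jump over continuation bytes, the sketch below records the byte index that range reports for each rune. RuneOffsets is a name invented for this example.

```go
package main

import "fmt"

// RuneOffsets is a hypothetical helper: it returns the byte index at
// which each rune of s begins, as reported by a range loop.
func RuneOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "aé你👋"
	for i, r := range s {
		// i advances by the encoded width of the previous rune.
		fmt.Printf("byte index %d: %q\n", i, r)
	}
	fmt.Println(RuneOffsets(s)) // [0 1 3 6]
}
```

If the string contains bytes that are not valid UTF-8, range yields the replacement rune U+FFFD for each bad byte and advances by one byte, so the loop never gets stuck.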
Range loops are slightly slower than byte indexing because they must decode UTF-8. The cost is small for most applications. If you are processing massive files byte by byte, stick to byte indexing. If you are handling user input, use range loops.
Slicing and the UTF-8 boundary
Slicing a string with s[i:j] copies the header and points to the same underlying bytes. The slice contains bytes from index i to j-1. If i or j falls in the middle of a multi-byte character, the resulting string contains invalid UTF-8.
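package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "你好"
	// len(s) is 6. Each character is 3 bytes.
	// Slicing at index 4 cuts the second character after its first byte.
	bad := s[:4]
	fmt.Println(utf8.ValidString(bad)) // false

	// Slicing at index 3 lands on a character boundary and is safe.
	good := s[3:6]
	fmt.Println(good, utf8.ValidString(good)) // 好 true
}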
The unicode/utf8 package provides functions to work with UTF-8 safely. utf8.DecodeRuneInString returns the rune and its width in bytes. You can use this to find the correct byte index for a character position.
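As a small illustration of manual decoding, the sketch below uses utf8.DecodeRuneInString to collect the encoded width of each rune. The function name RuneWidths is an assumption for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// RuneWidths is a hypothetical helper: it returns the number of bytes
// used to encode each rune of s, in order.
func RuneWidths(s string) []int {
	var widths []int
	for len(s) > 0 {
		// size is the width of the rune at the front of s.
		// Invalid bytes decode as utf8.RuneError with size 1.
		_, size := utf8.DecodeRuneInString(s)
		widths = append(widths, size)
		s = s[size:] // advance to the start of the next rune
	}
	return widths
}

func main() {
	fmt.Println(RuneWidths("aé你👋")) // [1 2 3 4]
}
```

Because each step advances by the reported width, every slice in the loop starts on a rune boundary.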
// SafeSubstring returns the first n characters, respecting UTF-8 boundaries.
func SafeSubstring(s string, n int) string {
	if n <= 0 {
		return ""
	}
	// Range yields the starting byte index of each rune.
	// After n runes have been counted, i is the byte index where
	// rune n+1 begins, which is exactly where the prefix must end.
	count := 0
	for i := range s {
		if count == n {
			return s[:i]
		}
		count++
	}
	// If n is at least the rune count, return the whole string.
	return s
}
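Converting with string([]rune(s)[:n]) is also safe and shorter, but it allocates a rune slice and a new string, and it panics if n exceeds the rune count unless you check first. Walking the string with a range loop or utf8.DecodeRuneInString avoids the allocation at the cost of slightly more code. The []rune conversion is a reasonable default for short strings; switch to the iterating form if profiling shows a bottleneck.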
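A related case is truncating to a byte budget, for example a fixed-size database column. The sketch below backs up from the limit to the nearest rune boundary using utf8.RuneStart. TruncateBytes is a hypothetical name, and the function assumes the input is already valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// TruncateBytes is a hypothetical helper: it returns the longest prefix
// of s that is at most limit bytes long and does not split a rune.
// It assumes s is valid UTF-8.
func TruncateBytes(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if limit >= len(s) {
		return s
	}
	// utf8.RuneStart reports whether a byte could begin a rune.
	// Continuation bytes return false, so step back over them.
	for limit > 0 && !utf8.RuneStart(s[limit]) {
		limit--
	}
	return s[:limit]
}

func main() {
	fmt.Println(TruncateBytes("你好", 4))    // 你
	fmt.Println(TruncateBytes("hello", 3)) // hel
}
```

Backing up touches at most three bytes, so this stays cheap even for very long strings.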
Pitfalls and compiler errors
Strings are immutable. You cannot modify a byte in place. The compiler stops you with an error that begins cannot assign to s[0]; the wording after that varies by Go version. You must build a new string.
package main

import "fmt"

func main() {
	s := "hello"
	// Uncommenting the next line fails to compile.
	// s[0] = 'H' // compile error: cannot assign to s[0]
	fmt.Println(s) // use s so the example compiles as written
}
To change a string, convert it to a []byte or []rune, modify the slice, and convert back.
// CapitalizeFirst returns the string with its first byte upper-cased
// when that byte is an ASCII lowercase letter.
func CapitalizeFirst(s string) string {
	// Convert to a byte slice for mutation.
	// This works for ASCII. For Unicode, use []rune.
	b := []byte(s)
	if len(b) > 0 && b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A' // ASCII lowercase and uppercase differ by 32
	}
	return string(b)
}
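For text that may start with a non-ASCII letter, one possible variant decodes the first rune and upper-cases it with unicode.ToUpper. The name CapitalizeFirstRune is an assumption for this sketch, and single-rune case mapping does not cover every language-specific rule.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// CapitalizeFirstRune is a hypothetical Unicode-aware variant: it
// upper-cases the first rune and leaves the rest of the string as is.
func CapitalizeFirstRune(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError && size <= 1 {
		return s // empty string or invalid leading byte: leave unchanged
	}
	// s[size:] starts on a rune boundary because size is the full
	// encoded width of the first rune.
	return string(unicode.ToUpper(r)) + s[size:]
}

func main() {
	fmt.Println(CapitalizeFirstRune("élan"))  // Élan
	fmt.Println(CapitalizeFirstRune("hello")) // Hello
}
```

This avoids converting the whole string to []rune; only the first rune is decoded.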
Using len for character count is a common mistake. len returns bytes. If you pass len(s) to a function expecting a character count, the logic breaks. Use utf8.RuneCountInString(s) for character count.
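A length check is where this mistake usually shows up. The sketch below compares len with utf8.RuneCountInString; FitsLimit is a hypothetical validation helper invented for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// FitsLimit is a hypothetical validator: it reports whether s has at
// most maxRunes code points, regardless of how many bytes they take.
func FitsLimit(s string, maxRunes int) bool {
	return utf8.RuneCountInString(s) <= maxRunes
}

func main() {
	name := "你好"
	fmt.Println(len(name))                    // 6 bytes
	fmt.Println(utf8.RuneCountInString(name)) // 2 runes
	fmt.Println(FitsLimit(name, 5))           // true
	fmt.Println(len(name) <= 5)               // false: a byte-based check rejects it
}
```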
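Another pitfall is assuming s[i] is a character. The comparison s[i] == 'é' compiles, because the constant 'é' (U+00E9, value 233) fits in a byte, but it tests for the single byte 0xE9. In UTF-8, 'é' is stored as the two bytes 0xC3 0xA9, so the comparison never matches a real 'é' and can match the first byte of an unrelated character. For a constant that does not fit in a byte, such as '你', the compiler rejects the comparison, and comparing a byte variable with a rune variable fails with mismatched types byte and rune. The Go compiler does not issue warnings, so the 'é' case stays hidden until runtime. Always compare runes or use string functions.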
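The difference is easy to demonstrate. In this sketch both helper names are hypothetical; one counts matches byte by byte and the other counts decoded runes. The string "\u9078" is used because its UTF-8 encoding begins with the byte 0xE9.

```go
package main

import "fmt"

// CountByteMatches is a hypothetical helper that compares raw bytes.
func CountByteMatches(s string, target byte) int {
	n := 0
	for i := 0; i < len(s); i++ {
		if s[i] == target {
			n++
		}
	}
	return n
}

// CountRune is a hypothetical helper that compares decoded runes.
func CountRune(s string, target rune) int {
	n := 0
	for _, r := range s {
		if r == target {
			n++
		}
	}
	return n
}

func main() {
	// 'é' as a byte constant is 233 (0xE9); "café" stores 0xC3 0xA9.
	fmt.Println(CountByteMatches("café", 'é')) // 0: misses the real 'é'
	fmt.Println(CountRune("café", 'é'))        // 1

	// U+9078 is encoded as 0xE9 0x81 0xB8, so the byte test fires.
	fmt.Println(CountByteMatches("\u9078", 'é')) // 1: false positive
	fmt.Println(CountRune("\u9078", 'é'))        // 0
}
```

For simple membership tests, strings.ContainsRune from the standard library does the rune-aware comparison for you.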
Decision matrix
Pick the tool that matches your access pattern.
Use byte indexing when you are processing ASCII-only text or binary data stored in a string and performance is critical.
Use a range loop over the string when you need to iterate over characters and handle Unicode correctly without allocating extra memory.
Use []rune(s) when you need random access to characters by index, such as s[i] where i is a character position.
Use the utf8 package functions when you need to decode runes manually, validate UTF-8, or count characters without converting the whole string.
Use strings.Builder when constructing a new string from parts to avoid quadratic allocation costs.
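To tie the list together, here is a minimal sketch that combines rune decoding with strings.Builder. The Initials function is hypothetical; it takes the first rune of each name, upper-cases it, and appends it to a builder instead of concatenating strings in a loop.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Initials is a hypothetical helper: it joins the upper-cased first
// rune of every non-empty, validly encoded name.
func Initials(names []string) string {
	var b strings.Builder
	for _, name := range names {
		r, size := utf8.DecodeRuneInString(name)
		if r == utf8.RuneError && size <= 1 {
			continue // skip empty names and invalid leading bytes
		}
		b.WriteRune(unicode.ToUpper(r)) // appends the rune as UTF-8
	}
	return b.String()
}

func main() {
	fmt.Println(Initials([]string{"alice", "你好", "émile"})) // A你É
}
```

The builder grows one buffer as needed, so the loop avoids creating a new string on every iteration.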
Where to go next
- How to Reverse a String in Go (Unicode-Safe)
- How to Trim Whitespace from a String in Go
- How to Validate Input with Regex in Go
Strings are bytes. Characters are runes. Range loops decode UTF-8 for you, at a small cost. Use them whenever you handle text, and reach for the other tools when your access pattern calls for them.