When bytes break your text logic
You write a function to reverse a string. It works perfectly for "hello". You test it with "café" and the output is garbled. You test it with a flag emoji and the result contains replacement characters. The bug isn't in your reversal logic. The bug is that you treated a Go string as a sequence of characters.
Go strings are sequences of bytes. When you index into a string, you get a byte. If a character uses multiple bytes in UTF-8, slicing by index tears the character apart. To work with actual characters, you need to decode the UTF-8 stream and convert the string to a slice of runes.
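A minimal sketch of the byte-versus-character gap described above. The helper name byteAndRuneCounts is made up for illustration; len, indexing, utf8.RuneCountInString, and utf8.ValidString are standard.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts is a hypothetical helper that reports how many bytes
// and how many code points a string holds.
func byteAndRuneCounts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "caf\u00e9" // "café"; é is encoded as the two bytes 0xC3 0xA9

	b, r := byteAndRuneCounts(s)
	fmt.Println(b, r) // prints: 5 4

	// Indexing yields a single byte, here the first half of é.
	fmt.Printf("%#x\n", s[3]) // prints: 0xc3

	// Slicing through the middle of é leaves invalid UTF-8 behind.
	fmt.Println(utf8.ValidString(s[:4])) // prints: false
}
```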
Runes decode the UTF-8 stream
A string in Go is immutable data stored as a sequence of bytes. The language does not enforce UTF-8, but the standard library assumes it. If you pass binary data as a string, you are fighting the ecosystem.
A rune is an alias for int32 that represents a Unicode code point. Converting a string to a []rune decodes the UTF-8 bytes and produces a slice where each element is one logical code point. This handles multi-byte characters correctly. The conversion allocates a new slice, typically on the heap. It copies the data. It is not a view; it is a transformation.
Here is the conversion. It takes a string and returns a slice of runes.
package main

import "fmt"

func main() {
	// String literal contains ASCII characters.
	s := "hello"
	// Convert to rune slice. Each rune is one Unicode code point.
	r := []rune(s)
	// Print length to show rune count matches character count for ASCII.
	fmt.Println(len(r)) // prints: 5
}
The compiler knows []rune is a type conversion. It generates code to decode UTF-8 at runtime. The runtime decoder iterates over the bytes of the string, decodes each UTF-8 sequence, and stores the resulting int32 value in a new slice. If the string contains invalid UTF-8, the decoder emits the Unicode replacement character \uFFFD for each invalid byte. The result is a new slice. Modifying the slice does not affect the original string because strings are immutable and the slice is a copy.
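The two behaviors above, replacement of invalid bytes and copy semantics, can be checked with a short sketch. The wrapper decodeRunes is a hypothetical name added only so the conversion can be called from more than one place.

```go
package main

import "fmt"

// decodeRunes is a hypothetical wrapper around the built-in conversion.
func decodeRunes(s string) []rune {
	return []rune(s)
}

func main() {
	// "a", one stray byte 0xff that is not valid UTF-8, then "b".
	bad := "a\xffb"
	r := decodeRunes(bad)
	fmt.Println(len(r), r[1] == '\uFFFD') // prints: 3 true

	// Mutating the slice leaves the source string untouched.
	s := "hello"
	rs := decodeRunes(s)
	rs[0] = 'j'
	fmt.Println(s, string(rs)) // prints: hello jello
}
```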
The conversion allocates and copies
The []rune(s) conversion has a cost. It allocates a new slice and copies data into it. The size of the allocation depends on the content of the string.
For ASCII text, each character is one byte. The rune slice uses four bytes per character. Converting a 100KB string of ASCII produces a 400KB rune slice. For multi-byte text, the ratio changes. A string of emoji uses four bytes per character. The rune slice still uses four bytes per rune. The memory usage stays roughly the same as the string.
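The arithmetic above can be reproduced with a rough sketch. The helper runeSliceBytes is hypothetical and counts only four bytes per element; it ignores the slice header and any extra capacity the runtime may reserve.

```go
package main

import (
	"fmt"
	"strings"
)

// runeSliceBytes is a hypothetical helper: the element storage a []rune
// conversion of s needs, at 4 bytes per rune. Slice header and any
// capacity rounding are ignored.
func runeSliceBytes(s string) int {
	return len([]rune(s)) * 4
}

func main() {
	ascii := strings.Repeat("a", 100*1024)         // 1 byte per character
	emoji := strings.Repeat("\U0001F600", 25*1024) // 4 bytes per character

	fmt.Println(len(ascii), runeSliceBytes(ascii)) // prints: 102400 409600
	fmt.Println(len(emoji), runeSliceBytes(emoji)) // prints: 102400 102400
}
```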
The allocation usually lands on the heap, although escape analysis can keep a small, short-lived slice on the stack. A heap-allocated slice must be reclaimed by the garbage collector once it becomes unreachable. In a tight loop, repeated conversions create pressure on the allocator. Profile your code if you convert strings to runes in hot paths.
Realistic example: reversing text safely
Here is a function that reverses a string. It converts to runes, reverses the slice, and converts back. This preserves emojis and accented characters.
package main

import "fmt"

// ReverseString returns a new string with runes in reverse order.
func ReverseString(s string) string {
	// Convert to rune slice to handle multi-byte characters correctly.
	runes := []rune(s)
	// Reverse the slice in place.
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	// Convert back to string.
	return string(runes)
}

func main() {
	// Test with emoji to verify multi-byte handling.
	input := "Go 🚀"
	result := ReverseString(input)
	fmt.Println(result) // prints: 🚀 oG
}
The function allocates a rune slice, modifies it, and allocates a new string. This is safe and correct for most use cases. The string(runes) conversion encodes the runes back to UTF-8 bytes. It also allocates. If you need to reverse text in a performance-critical path, consider writing a custom decoder that writes bytes directly to a buffer.
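One possible shape for such a decoder, offered as a sketch and not as a tuned implementation: walk the string from the end with utf8.DecodeLastRuneInString and copy each rune's bytes into a single pre-sized buffer. The name reverseBytewise is hypothetical. Whether this beats the []rune version depends on the input, so benchmark before adopting it.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// reverseBytewise is a hypothetical alternative to ReverseString. It decodes
// runes from the end of s and copies each rune's bytes into one buffer, so
// no intermediate []rune is built. Invalid bytes are copied through one at
// a time instead of being replaced.
func reverseBytewise(s string) string {
	buf := make([]byte, 0, len(s))
	for len(s) > 0 {
		// size is the byte width of the last rune (1 for an invalid byte).
		_, size := utf8.DecodeLastRuneInString(s)
		buf = append(buf, s[len(s)-size:]...)
		s = s[:len(s)-size]
	}
	return string(buf)
}

func main() {
	fmt.Println(reverseBytewise("caf\u00e9")) // prints: éfac
}
```

This still allocates the buffer and the final string, but the buffer is the same size as the input, not four bytes per code point.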
Range loops avoid allocation
If you only need to iterate over the characters, you do not need a rune slice. The range loop over a string yields runes without allocating a slice.
package main

import "fmt"

func main() {
	s := "café"
	// Range decodes UTF-8 on the fly. No slice allocation.
	for i, r := range s {
		// i is the byte index. r is the rune value.
		fmt.Printf("byte %d: %c\n", i, r)
	}
}
The range loop calls the UTF-8 decoder internally. It yields the byte index and the rune value. It does not create a slice. It is lazy. Use range for iteration. Use []rune when you need random access or mutation.
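Because range reports byte indices, it can also locate a safe cut point without building a rune slice. The sketch below uses a hypothetical helper, truncateRunes, that keeps the first n code points of a string by slicing at the byte offset where rune number n begins.

```go
package main

import "fmt"

// truncateRunes is a hypothetical helper that returns the first n code
// points of s. It ranges over the string to find the byte offset of rune
// number n and slices there, so it never allocates a []rune.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s {
		if count == n {
			// i is the byte index where rune number n starts.
			return s[:i]
		}
		count++
	}
	// s has n or fewer runes; nothing to cut.
	return s
}

func main() {
	fmt.Println(truncateRunes("caf\u00e9!", 4)) // prints: café
}
```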
Runes are not always characters
Runes are Unicode code points. They are not always what a user considers a character. Some visual characters consist of multiple code points.
A flag emoji like 🇬🇧 is two runes: a regional indicator symbol for G and one for B. A family emoji like 👨‍👩‍👧‍👦 is a sequence of base emoji joined by zero-width joiners. []rune splits these sequences. If you count runes, you get the wrong length. If you reverse runes, you break the sequence.
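A short sketch makes the mismatch concrete. The literals are written as escapes so the code points are visible: the flag is two regional indicators, and the family is four emoji joined by three zero-width joiners (U+200D). The helper reverseRunes is a hypothetical stand-in for a rune-based reversal.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// reverseRunes is a hypothetical rune-based reversal. It operates on code
// points only and knows nothing about grapheme clusters.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	flag := "\U0001F1EC\U0001F1E7" // regional indicators G + B
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

	fmt.Println(utf8.RuneCountInString(flag))   // prints: 2
	fmt.Println(utf8.RuneCountInString(family)) // prints: 7

	// Reversing swaps the indicators to B + G, which renders as a
	// different flag.
	fmt.Println(reverseRunes(flag) == "\U0001F1E7\U0001F1EC") // prints: true
}
```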
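The golang.org/x/text/unicode/norm package handles normalization: it composes or decomposes combining marks so that equivalent sequences compare equal. It does not segment text into grapheme clusters, and at the time of writing neither the standard library nor golang.org/x/text ships a grapheme segmenter. For segmentation, use a library that implements the Unicode text segmentation rules (UAX #29), such as the third-party github.com/rivo/uniseg. Such a library understands combining marks and joiners and groups code points into user-perceived characters.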
Runes are code points. Graphemes are characters. Pick the right abstraction for your problem.
Pitfalls and compiler errors
The compiler rejects conversions from non-string types. If you pass an integer, you get cannot convert 123 (untyped int constant) to type []rune. You must pass a string.
The []rune conversion does not panic on invalid UTF-8. It replaces each invalid byte with the replacement character. Checking the result for \uFFFD is a common pattern, but it also flags input that legitimately contains U+FFFD. To validate input, call utf8.ValidString. To inspect the next rune and its byte length without allocating, use utf8.DecodeRuneInString, which returns utf8.RuneError with a width of 1 when the encoding is invalid.
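A sketch of validation without a rune slice. utf8.ValidString gives a yes-or-no answer; walking the string with utf8.DecodeRuneInString also tells you where the problem is. The helper firstInvalidByte is a hypothetical name. It relies on the documented behavior that an invalid encoding decodes as utf8.RuneError with a width of 1, while a correctly encoded U+FFFD has a width of 3.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidByte is a hypothetical helper that returns the byte offset of
// the first invalid UTF-8 sequence in s, or -1 if s is valid.
func firstInvalidByte(s string) int {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		// RuneError with width 1 signals a bad encoding, not a real U+FFFD.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	fmt.Println(utf8.ValidString("caf\u00e9")) // prints: true
	fmt.Println(utf8.ValidString("a\xffb"))    // prints: false
	fmt.Println(firstInvalidByte("a\xffb"))    // prints: 1

	// A correctly encoded U+FFFD is valid input and is not reported.
	fmt.Println(firstInvalidByte("ok \uFFFD")) // prints: -1
}
```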
The rune type is identical to int32. You can use them interchangeably. The compiler treats them as the same type. Use rune for readability when dealing with text. Use int32 for numeric calculations.
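A tiny sketch of that interchangeability. The function nextCodePoint is hypothetical: it declares an int32 parameter and a rune result, and the compiler accepts a rune argument with no conversion because the two names denote one type.

```go
package main

import "fmt"

// nextCodePoint is a hypothetical helper. Its parameter is declared as int32
// and its result as rune to show that no conversion is needed between them.
func nextCodePoint(n int32) rune {
	return n + 1
}

func main() {
	var r rune = 'a'
	// r is passed where int32 is expected, with no explicit conversion.
	fmt.Printf("%c\n", nextCodePoint(r)) // prints: b
	// The %T verb reports the underlying name.
	fmt.Printf("%T\n", r) // prints: int32
}
```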
When to use runes versus bytes
Use []rune(s) when you need random access to characters or need to modify the sequence of characters.
Use range s when you only need to iterate over the string once and don't need indexing or mutation.
Use utf8.DecodeRuneInString when you need to validate UTF-8 or inspect the byte length of the next character without allocating a slice.
Use byte slicing s[i:j] when you are working with ASCII-only data or fixed-width byte formats and performance is critical.
Use the golang.org/x/text/unicode/norm package when you need to normalize accented characters, and a grapheme segmentation library when you need to handle grapheme clusters like combined emoji correctly.
Strings are bytes. Runes are code points. Know the difference.