The rune type in Go is an alias for int32 that represents a Unicode code point, allowing you to handle individual characters from any language correctly. A string, by contrast, is a read-only sequence of bytes (conventionally UTF-8 encoded), so working with rune values ensures that multi-byte characters (such as emoji or Chinese characters) are treated as single logical units.
You typically encounter rune when iterating over a string or when you need to perform arithmetic on character codes. If you iterate over a string with a for ... range loop, Go decodes the UTF-8 bytes into runes, giving you each code point together with its starting byte index rather than raw bytes. This matters because a single character in UTF-8 spans 1 to 4 bytes; indexing the string byte by byte would split multi-byte characters.
Here is a practical example showing the difference between iterating by byte and by rune:
package main

import "fmt"

func main() {
	s := "Hello 世界"

	// Iterating by byte: indexing yields raw UTF-8 bytes,
	// so multi-byte characters are split across several indices.
	fmt.Println("Bytes:")
	for i := 0; i < len(s); i++ {
		fmt.Printf("Index %d: %d (hex: %x)\n", i, s[i], s[i])
	}

	// Iterating by rune: range decodes UTF-8 and yields
	// each code point with its starting byte index.
	fmt.Println("\nRunes:")
	for i, r := range s {
		fmt.Printf("Index %d: %d (char: %c)\n", i, r, r)
	}
}
When this runs, the byte loop spreads the Chinese characters "世" and "界" over three indices each, because each is encoded as three bytes in UTF-8. The rune loop reports each of them as a single value at its starting byte index (6 and 9).
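To make the 1-to-4-byte range concrete, the following sketch compares len, which counts bytes, with utf8.RuneCountInString from the standard unicode/utf8 package, which counts code points. The helper name sizes is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes reports the byte length and the rune count of s.
// (Hypothetical helper, named only for this example.)
func sizes(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "\u00e9" is the precomposed letter é (2 bytes in UTF-8).
	samples := []string{"A", "\u00e9", "世", "😀", "Hello 世界"}
	for _, s := range samples {
		b, r := sizes(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

The first four samples are one rune each but take 1, 2, 3, and 4 bytes respectively; "Hello 世界" is 12 bytes and 8 runes.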
You can also explicitly convert between strings, bytes, and runes when needed. To get a string from a rune, use string(r). To get runes from a string, convert the whole string with []rune(s), iterate over it with range, or decode the first rune with utf8.DecodeRuneInString. Be careful with conversions: Go does not allow converting a string directly to a rune (rune(s) is a compile error), and indexing a string with s[0] yields a byte, not a rune, so rune(s[0]) is only correct when the first character is ASCII.
package main

import "fmt"

func main() {
	// Converting rune to string
	r := 'A'
	s := string(r)
	fmt.Printf("Rune to string: %s\n", s) // Output: Rune to string: A

	// Converting string to rune via its first byte (only safe for ASCII)
	single := "A"
	r2 := rune(single[0])
	fmt.Printf("String to rune: %c\n", r2) // Output: String to rune: A

	// Handling a multi-byte character
	multi := "世"
	r3 := rune(multi[0]) // WRONG: this is only the first byte, 0xE4
	fmt.Printf("Wrong rune: %c\n", r3) // Output: Wrong rune: ä

	// Correct way: let range decode the UTF-8
	for _, r := range multi {
		fmt.Printf("Correct rune: %c\n", r) // Output: Correct rune: 世
	}
}
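If you need the first character without writing a loop, or want to index by character position, two standard options are utf8.DecodeRuneInString and a []rune conversion. The sketch below is illustrative only; firstRune is a hypothetical wrapper name, not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune returns the first code point of s and its width in bytes.
// Hypothetical helper; for an empty string the underlying call
// returns (utf8.RuneError, 0).
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	s := "世界"

	r, size := firstRune(s)
	fmt.Printf("first rune %c is %d bytes wide\n", r, size)

	// []rune(s) decodes the whole string, so indexing is by code point.
	rs := []rune(s)
	fmt.Printf("%d runes, second is %c\n", len(rs), rs[1])

	// string(rs) encodes the runes back to UTF-8.
	fmt.Println(string(rs) == s)
}
```

Note that []rune(s) allocates a new slice and decodes the entire string, so for a single leading character the decode call is usually the lighter choice.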
Use rune whenever you need to manipulate individual characters in a Unicode-aware way, but remember that it is just an integer under the hood, so it can participate in standard arithmetic operations if you need to shift character codes.
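As one illustration of rune arithmetic, here is a minimal sketch that rotates lowercase ASCII letters by a fixed offset and passes every other rune through unchanged. The function shiftLower is a hypothetical name invented for this example.

```go
package main

import "fmt"

// shiftLower rotates the letters a-z by n positions (wrapping around)
// and leaves every other rune, including multi-byte ones, untouched.
// Hypothetical helper for illustration only.
func shiftLower(s string, n int) string {
	out := make([]rune, 0, len(s))
	for _, r := range s {
		if r >= 'a' && r <= 'z' {
			// r-'a' is the letter's position 0..25; adding 26 keeps
			// the sum non-negative when n is negative.
			r = 'a' + (r-'a'+rune(n%26)+26)%26
		}
		out = append(out, r)
	}
	return string(out)
}

func main() {
	fmt.Println(shiftLower("hello, 世界", 1)) // prints "ifmmp, 世界"
}
```

Because the loop ranges over runes and rebuilds the result from a []rune, the multi-byte characters survive intact while the ASCII letters are shifted.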