To reverse a string in Go while preserving Unicode characters, you must iterate over rune values instead of byte values, as Go strings are UTF-8 encoded and a single character can span multiple bytes. Simply reversing the byte slice will corrupt multi-byte characters like emojis or non-ASCII letters.
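To make the byte/rune distinction concrete, here is a minimal sketch that compares the two lengths of a string. The helper name byteAndRuneCounts is made up for this illustration; len and utf8.RuneCountInString are standard.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts (hypothetical helper) returns the length of s in bytes
// and in runes (Unicode code points).
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "\u00e9" is a precomposed e with acute accent; "\u4e16\u754c" is two CJK characters.
	for _, s := range []string{"Go", "\u00e9", "\u4e16\u754c"} {
		b, r := byteAndRuneCounts(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

Only the ASCII string has matching counts; the accented letter takes 2 bytes for 1 rune, and the two CJK characters take 6 bytes for 2 runes.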
Here is the standard approach using a rune slice:
package main

import "fmt"

func reverseString(s string) string {
	// Convert the string to a slice of runes to handle multi-byte characters
	runes := []rune(s)
	// Reverse the rune slice in place
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	return string(runes)
}

func main() {
	// Test with ASCII and Unicode (Chinese characters, emoji)
	input := "Hello 世界 🌍"
	fmt.Println("Original:", input)
	fmt.Println("Reversed:", reverseString(input))
}
Output:
Original: Hello 世界 🌍
Reversed: 🌍 界世 olleH
If you would rather not convert the whole string to a slice first, you can decode one rune at a time with utf8.DecodeRuneInString (or utf8.DecodeLastRuneInString to walk from the end). For this task, though, the rune slice conversion is usually the more idiomatic and readable pattern.
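The decoding-based alternative mentioned above can be sketched as follows. This version uses utf8.DecodeLastRuneInString, the end-of-string counterpart of utf8.DecodeRuneInString, so the string can be consumed back to front; the function name reverseByDecoding is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// reverseByDecoding (hypothetical name) builds the reversed string by
// decoding one rune at a time from the end of s, without an intermediate []rune.
func reverseByDecoding(s string) string {
	var b strings.Builder
	b.Grow(len(s)) // for valid UTF-8 input the result has the same byte length
	for len(s) > 0 {
		// Invalid bytes decode as utf8.RuneError with size 1.
		r, size := utf8.DecodeLastRuneInString(s)
		b.WriteRune(r)
		s = s[:len(s)-size]
	}
	return b.String()
}

func main() {
	fmt.Println(reverseByDecoding("Hello \u4e16\u754c"))
}
```

It still allocates the output string, but avoids the temporary rune slice, which takes 4 bytes per rune.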
Key Considerations:
- Memory Allocation: Converting to []rune allocates a new slice at 4 bytes per rune, and converting back to a string allocates again. For extremely large strings where memory is critical, you might need a more complex byte-level strategy that works on a []byte, but for most use cases the rune slice approach is the right trade-off for safety and simplicity.
- Normalization: This method reverses code points (runes), not grapheme clusters. If your string contains combining characters (e.g., e + U+0301 combining acute accent, rendered as é), Go treats them as separate runes, so reversing puts the accent ahead of its base character and it attaches to the wrong letter. Normalizing to NFC with golang.org/x/text/unicode/norm composes many such pairs into a single rune, but not every sequence has a precomposed form, and handling arbitrary grapheme clusters requires grapheme segmentation, which the standard library does not provide. For text without combining sequences, the rune slice is sufficient.
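As a partial illustration of the combining-mark problem, the sketch below keeps each run of combining marks attached to the base rune before it. It relies only on unicode.IsMark from the standard library; the function name reverseKeepingMarks is made up here, and the approach is an assumption-laden simplification that does not cover other grapheme clusters such as emoji joined with zero-width joiners.

```go
package main

import (
	"fmt"
	"unicode"
)

// reverseKeepingMarks (hypothetical name) reverses s by code point but keeps
// each run of combining marks attached to the base rune that precedes it.
// It is not a full grapheme cluster implementation.
func reverseKeepingMarks(s string) string {
	runes := []rune(s)
	out := make([]rune, 0, len(runes))
	end := len(runes) // exclusive end of the cluster currently being collected
	for i := len(runes) - 1; i >= 0; i-- {
		// A non-mark rune starts a cluster; index 0 always closes the last one.
		if !unicode.IsMark(runes[i]) || i == 0 {
			out = append(out, runes[i:end]...)
			end = i
		}
	}
	return string(out)
}

func main() {
	s := "cafe\u0301" // "café" written as e followed by U+0301 COMBINING ACUTE ACCENT
	fmt.Printf("%+q\n", reverseKeepingMarks(s))
}
```

A plain rune reversal of the same input yields "\u0301efac", with the accent detached from the e, whereas this sketch yields "e\u0301fac".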
Avoid using bytes.Reverse or manual byte indexing, as these will split multi-byte UTF-8 sequences and result in invalid UTF-8 strings (often displaying as replacement characters οΏ½). Always work with rune types when character semantics matter.
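The failure mode is easy to check with utf8.ValidString. The following counterexample is a sketch only; reverseBytes and survivesByteReversal are names invented for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// reverseBytes (hypothetical, shown only as a counterexample) reverses s
// byte by byte, ignoring UTF-8 sequence boundaries.
func reverseBytes(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// survivesByteReversal reports whether s is still valid UTF-8 after a
// byte-wise reversal.
func survivesByteReversal(s string) bool {
	return utf8.ValidString(reverseBytes(s))
}

func main() {
	fmt.Println(survivesByteReversal("abc"))          // ASCII only: one byte per character
	fmt.Println(survivesByteReversal("\u4e16\u754c")) // two 3-byte characters
}
```

The ASCII input stays valid, while the multi-byte input ends up starting with a continuation byte and is no longer valid UTF-8.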