How to Iterate Over a String in Go with range

The string loop that breaks on emojis

You write a text processor. It handles English perfectly. You test "hello world" and the loop counts 11 characters. You test "café" and the count jumps to 5, even though the word has only 4 characters. You test "🚀" and the rocket explodes into four bytes that look like garbage. Your slicing logic turns accented characters into broken fragments. You assumed a string is a list of characters. In Go, a string is a list of bytes. The range keyword is the bridge that translates those bytes back into meaning.

Strings are bytes. Range is the decoder.

Go strings are immutable UTF-8 byte sequences. UTF-8 is a variable-width encoding designed to be efficient and compatible with ASCII. ASCII characters take one byte. Accented characters like é take two bytes. Emojis and rare symbols take three or four bytes. The len() function returns the byte count, not the character count.

When you use range over a string, the compiler inserts UTF-8 decoding logic. You do not get raw bytes. You get the byte index and the rune. A rune is a Unicode code point, represented as an int32. The index is the byte offset where the character starts. The rune is the decoded value.

This design keeps strings cheap to pass around and store, while range provides the abstraction needed to work with text. You get the performance of byte arrays and the correctness of Unicode iteration.
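As a quick check of the widths described above, the following sketch prints the byte length and rune count of a few sample strings. The helper name widths and the sample strings are illustrative assumptions; utf8.RuneCountInString comes from the standard unicode/utf8 package.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// widths is a hypothetical helper: it returns the byte length of s
// and the number of runes it contains.
func widths(s string) (bytes, runes int) {
    return len(s), utf8.RuneCountInString(s)
}

func main() {
    for _, s := range []string{"a", "é", "世", "🚀", "café"} {
        b, r := widths(s)
        fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
    }
}
```

Only the pure ASCII sample should report the same number for both values.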

Minimal iteration

Here's the basic loop. Range handles the decoding automatically. The index jumps by the number of bytes each character consumes.

package main

import "fmt"

func main() {
    // "café" has 5 bytes: c, a, f, é (2 bytes)
    s := "café"
    // range decodes UTF-8 automatically
    for i, r := range s {
        // i is byte offset, r is the rune value (int32)
        fmt.Printf("Index: %d, Rune: %c\n", i, r)
    }
}

The output lists indices 0, 1, 2, and 3. The character é starts at byte 3 and occupies two bytes, so the next decode position would be byte 5. That equals len(s), so the loop terminates. Had another character followed é, its index would have been 5, not 4. The variable r holds the rune value. The %c verb prints the character representation.
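To see the index actually skip a value, the multi-byte character has to sit in the middle of the string. A small sketch, assuming the sample string "héllo" and a hypothetical helper offsets that collects the byte offsets range reports:

```go
package main

import "fmt"

// offsets is a hypothetical helper that returns the byte offset at which
// each rune of s starts, as reported by range.
func offsets(s string) []int {
    var out []int
    for i := range s {
        out = append(out, i)
    }
    return out
}

func main() {
    // é occupies bytes 1 and 2, so the offset after 1 is 3.
    fmt.Println(offsets("héllo")) // [0 1 3 4 5]
}
```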

If you only need the runes and not the indices, discard the index with an underscore. This is the standard convention.

for _, r := range s {
    // r is the rune, index is ignored
    fmt.Printf("%c", r)
}

What happens under the hood

The range keyword over a string compiles to a loop that calls UTF-8 decoding functions. The runtime scans the byte sequence, identifies the start of each character, decodes the code point, and yields the index and rune. The index is always the byte position. It is never the character count.

This matters when you try to slice the string. Slicing uses byte offsets. If you try to extract a character using s[i:i+1], you get one byte. If i points to the second byte of a multi-byte character, you get a broken fragment. The compiler cannot detect this error because the types match. It is a logic bug.
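The difference between a one-byte slice and a whole-character slice can be sketched as follows. The helper name runeAt is hypothetical; it relies on utf8.DecodeRuneInString, which returns the rune at the start of its argument together with that rune's width in bytes.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// runeAt is a hypothetical helper: given a byte offset i that marks the
// start of a rune in s, it returns the substring holding that whole rune.
func runeAt(s string, i int) string {
    _, size := utf8.DecodeRuneInString(s[i:])
    return s[i : i+size]
}

func main() {
    s := "café"
    fmt.Printf("%q\n", s[3:4])       // only the first byte of é: a broken fragment
    fmt.Printf("%q\n", runeAt(s, 3)) // the complete two-byte character
}
```

The helper still assumes i is a rune boundary, which is exactly what the index from range guarantees.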

The rune variable r has type rune, which is an alias for int32. It is not a string. It is not a byte. If you try to assign r to a byte variable, the compiler rejects the program with an error along the lines of cannot use r (variable of type rune) as byte value in assignment. If you need a string containing that single character, convert it with string(r). This conversion can allocate a new string. Avoid doing this inside a tight loop if you are building a larger result. Use strings.Builder or a []rune slice instead.

The index trap and slicing

The most common mistake is treating the index as a character index. Developers write loops that collect characters into a slice using the index, or they try to reverse a string by swapping indices. Both approaches fail on multi-byte characters.

Consider a function that tries to extract every character into a slice of strings.

// BAD: assumes 1 byte per character
func extractChars(s string) []string {
    var chars []string
    for i := range s {
        // s[i:i+1] gets one byte, not one character
        // This breaks on multi-byte runes
        chars = append(chars, s[i:i+1])
    }
    return chars
}

This function returns garbage for any non-ASCII text. The correct approach uses the rune directly. If you need a slice of runes, iterate and append the rune. If you need a slice of strings, convert the rune.

// GOOD: uses runes for correctness
func extractRunes(s string) []rune {
    // Pre-allocate capacity to avoid reallocations
    runes := make([]rune, 0, len(s))
    for _, r := range s {
        runes = append(runes, r)
    }
    return runes
}
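If the caller really wants one string per character, the same loop works with a string(r) conversion. A minimal sketch; extractStrings is a hypothetical counterpart to extractChars, not a standard function:

```go
package main

import "fmt"

// extractStrings is a hypothetical counterpart to extractChars that
// returns each rune of s as its own string.
func extractStrings(s string) []string {
    // len(s) is an upper bound on the number of runes.
    chars := make([]string, 0, len(s))
    for _, r := range s {
        chars = append(chars, string(r))
    }
    return chars
}

func main() {
    fmt.Printf("%q\n", extractStrings("café")) // ["c" "a" "f" "é"]
}
```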

Index is bytes. Rune is meaning. Never slice a string using a rune index.
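The string reversal mentioned at the start of this section can be done safely by converting to a []rune first and swapping runes instead of bytes. A minimal sketch, with Reverse as an illustrative name. It reverses code points, so a letter followed by a separate combining accent would still come out reordered.

```go
package main

import "fmt"

// Reverse returns s with its runes in reverse order.
func Reverse(s string) string {
    runes := []rune(s)
    for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
        runes[i], runes[j] = runes[j], runes[i]
    }
    return string(runes)
}

func main() {
    fmt.Println(Reverse("café 🚀"))
}
```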

Handling garbage data

Strings in Go can contain invalid UTF-8. This happens when you read binary data, process untrusted input, or concatenate bytes from different sources. The range keyword handles this gracefully. When it encounters invalid UTF-8, it yields the replacement character \uFFFD (U+FFFD) and advances the index by one byte.

This behavior allows you to iterate over corrupted data without panicking. The replacement character signals that the byte sequence was invalid. You can detect it and decide how to handle the error.

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    // Invalid UTF-8: 0xFF is not a valid start byte
    s := "hello\xFFworld"
    for i, r := range s {
        if r == utf8.RuneError {
            fmt.Printf("Invalid UTF-8 at byte %d\n", i)
        } else {
            fmt.Printf("Valid rune at byte %d: %c\n", i, r)
        }
    }
}

The replacement character is a signal, not a bug. If you need to distinguish between a valid replacement character in the input and an actual decoding error, use the utf8 package directly. The rune that range yields is the same in both cases, and range does not report how many bytes it consumed.
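One way to tell the two cases apart, sketched under the assumption that you walk the string yourself with utf8.DecodeRuneInString: a genuine U+FFFD in the input decodes with a width of 3 bytes, while a decoding error reports utf8.RuneError with a width of 1. The helper name invalidOffsets is hypothetical.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// invalidOffsets is a hypothetical helper that returns the byte offsets of
// bytes that do not form valid UTF-8. A literal U+FFFD is not reported.
func invalidOffsets(s string) []int {
    var bad []int
    for i := 0; i < len(s); {
        r, size := utf8.DecodeRuneInString(s[i:])
        if r == utf8.RuneError && size == 1 {
            bad = append(bad, i)
        }
        i += size
    }
    return bad
}

func main() {
    // One real replacement character (bytes 2-4) and one invalid byte (byte 5).
    s := "ok\uFFFD\xFF!"
    fmt.Println(invalidOffsets(s)) // [5]
}
```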

Realistic example: counting characters vs bytes

Real code often needs the character count, not the byte count. User interfaces display character limits. Text analysis counts words and letters. The len() function is wrong for these cases.

Here's a function that counts runes and bytes, highlighting the difference.

// Stats returns the byte count and rune count of s.
// Byte count comes from len(). Rune count requires iteration.
func Stats(s string) (bytes, runes int) {
    bytes = len(s)
    // range iterates runes, ignoring byte complexity
    for range s {
        runes++
    }
    return
}

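This function is fast and correct. The range loop decodes UTF-8 without allocating, and because it binds neither the index nor the rune, nothing is stored per iteration. The runes counter increments once per decoded rune. Invalid UTF-8 also increments the counter, because range yields a replacement rune for each invalid byte. If you need to exclude invalid sequences, bind the rune with for _, r := range s and check for utf8.RuneError, keeping in mind that a genuine U+FFFD in the input yields the same value. The standard library's utf8.RuneCountInString returns the same count as this loop.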
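A short sketch comparing the byte count, a range-based rune count, and utf8.RuneCountInString from the standard library. The helper countRunes and the sample strings are illustrative assumptions; the last sample contains one invalid byte to show that it is still counted.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// countRunes counts runes with a range loop, the same way Stats does.
func countRunes(s string) int {
    n := 0
    for range s {
        n++
    }
    return n
}

func main() {
    for _, s := range []string{"hello world", "café", "a\xFFz"} {
        // Columns: bytes, runes via range, runes via the standard library.
        fmt.Println(len(s), countRunes(s), utf8.RuneCountInString(s))
    }
}
```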

Performance and allocation

Range over a string is efficient. It decodes UTF-8 in place without allocating memory. The index and rune are local variables. There is no heap allocation per iteration.

Converting a string to a byte slice generally allocates memory. []byte(s) creates a new slice and copies the data, although the compiler can elide the copy in some cases, such as when the conversion appears directly in a range clause. Ranging over the slice iterates bytes, not runes. This is faster for ASCII-only data because it skips decoding, but when the copy does happen its cost usually outweighs the benefit. Use range s unless you have a profiler showing that decoding is the bottleneck and the data is guaranteed to be ASCII.
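If raw bytes are what you need, indexing the string directly reads them without any conversion. A minimal sketch, assuming a hypothetical helper isASCII that could gate a byte-oriented fast path; utf8.RuneSelf is the standard library constant 0x80, below which a byte is a complete single-byte character.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// isASCII is a hypothetical helper that reports whether every byte of s
// is below utf8.RuneSelf, meaning s contains only ASCII.
func isASCII(s string) bool {
    for i := 0; i < len(s); i++ {
        if s[i] >= utf8.RuneSelf {
            return false
        }
    }
    return true
}

func main() {
    fmt.Println(isASCII("hello world")) // true
    fmt.Println(isASCII("café"))        // false
}
```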

If you need to modify the string, you cannot do it in place. Strings are immutable. You must build a new string. Use strings.Builder for performance.

import (
    "strings"
    "unicode"
)

// UppercaseRunes returns a new string with all runes uppercased.
// Uses strings.Builder to avoid repeated allocations.
func UppercaseRunes(s string) string {
    var b strings.Builder
    // Reserve len(s) bytes up front; uppercasing can change the byte
    // length, so this is an estimate rather than an exact size
    b.Grow(len(s))
    for _, r := range s {
        // unicode.ToUpper already returns a rune; WriteRune encodes it as UTF-8
        b.WriteRune(unicode.ToUpper(r))
    }
    return b.String()
}

The WriteRune method encodes the rune back to UTF-8 bytes. This is the safe way to reconstruct a string from runes. String concatenation in a loop creates a new string on every iteration, leading to quadratic performance.

When to use range vs alternatives

Use range s when you need to iterate over Unicode characters. This is the default choice for text processing. Use range []byte(s) when you need raw bytes or are processing binary data. Use strings.Fields or regexp when you need tokens or words, not individual characters. Use utf8.DecodeRuneInString when you need manual control over decoding and must distinguish between valid replacement characters and decoding errors. Use a simple index loop for i := 0; i < len(s); i++ only when you are certain the data is ASCII and you need maximum performance, but prefer range for correctness.

Range decodes UTF-8. Trust the rune, verify the index. Strings are bytes. Range is the decoder.

Where to go next

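The range keyword walks a string one code point at a time, yielding the byte offset where each rune starts and the decoded rune itself. Think of it like reading a book word by word while keeping track of the page number: the page number tells you where you are, not how many words you have read. From here, the unicode/utf8 package and strings.Builder, both used above, are the natural next things to explore.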