Range Over a String Returns Runes, Not Bytes

You are building a text logger that truncates messages to a maximum length. You write a helper function that slices the string at index 100. Most of the time, it works fine. Then a user submits a message full of emojis or Chinese characters. The output is garbled. The truncation cut a multi-byte character in half, leaving a broken glyph and an error downstream when the database rejects invalid UTF-8. Or you are calculating a checksum and your loop never sees some bytes, because range stepped over the continuation bytes and handed you one decoded rune per character instead.

The issue is the same in both cases. You assumed the string is a sequence of characters where each character occupies one slot. In Go, strings are sequences of bytes. Those bytes encode text in UTF-8, which is a variable-width encoding. Some characters take one byte. Others take two, three, or four. The range loop is smart enough to decode UTF-8. It yields full Unicode code points, called runes, not raw bytes. If you need bytes, ranging over the string itself is the wrong tool.

Runes versus bytes

Go strings are immutable sequences of bytes. The language does not enforce a specific encoding, but the convention is UTF-8. UTF-8 is designed so that ASCII characters (values 0 to 127) use a single byte. Characters outside ASCII use multi-byte sequences. The emoji 🚀 is four bytes. The letter é is two bytes. The character 中 is three bytes.

When you use range on a string, the compiler generates code that decodes the UTF-8 stream. The loop variable is a rune, which is an alias for int32. You get the full Unicode code point value. The loop advances by the width of the character. If the character is one byte, the index moves by one. If it is four bytes, the index jumps by four. You never see the internal bytes of a multi-byte sequence.

If you need the raw bytes, you must iterate over a []byte slice. Converting a string to a byte slice copies the data. The loop then yields uint8 values, one per byte. You see every byte, including the continuation bytes that make up multi-byte characters.

Here's the difference in action. A string with an emoji reveals the gap between runes and bytes.

package main

import "fmt"

func main() {
	// String contains ASCII and a 4-byte emoji.
	s := "Hi 🚀"

	// Range over string yields runes (int32).
	// The loop runs four times: 'H', 'i', ' ', '🚀'.
	for _, r := range s {
		fmt.Printf("Rune: %c, Value: %d\n", r, r)
	}

	// Range over byte slice yields bytes (uint8).
	// The loop runs seven times: 'H', 'i', space, then four bytes of the emoji.
	for _, b := range []byte(s) {
		fmt.Printf("Byte: %02x\n", b)
	}
}

Range decodes. Indexing peeks. Pick the tool that matches your data.

How the decoder works

When the compiler sees for _, r := range s, it emits a call to a runtime decoder. The decoder reads the first byte. If the high bit is zero, the byte is a single-byte ASCII character. The decoder yields that value and moves the index forward by one.

If the high bits indicate a multi-byte sequence, the decoder reads the continuation bytes. UTF-8 uses a specific pattern: the high bits of the leading byte tell you how many bytes follow, and every continuation byte starts with the bits 10. The decoder assembles the code point from the remaining bits, yields the rune, and jumps the index forward by the sequence length.
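The same walk can be sketched with the public unicode/utf8 API. This is not the runtime's actual implementation, only an equivalent loop; runeOffsets is a hypothetical helper named for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets walks s the way a range loop does and returns the byte
// offset at which each rune starts. (Hypothetical helper for illustration.)
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		// DecodeRuneInString returns the rune and its width in bytes.
		// For an invalid encoding it returns (utf8.RuneError, 1).
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	// 'A' is one byte, the rocket is four, so 'B' starts at offset 5.
	fmt.Println(runeOffsets("A🚀B")) // [0 1 5]
}
```

The offsets match the index values a range loop over the same string would produce.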

This decoding happens on every iteration. It adds overhead compared to a raw byte loop. For ASCII-only strings, the decoder is fast because the check is simple. For strings with many multi-byte characters, the cost adds up. The trade-off is safety. You never get a partial character. You never have to manually check byte patterns. The language handles the complexity.

If the string contains invalid UTF-8, the decoder yields the replacement character 0xFFFD and advances by one byte. It never panics on bad input. This behavior makes range robust against malformed data. You can process a string even if it has errors, and the loop will continue.
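A small sketch of that behavior, assuming a string that contains a truncated four-byte sequence. The helper countReplacements is hypothetical and exists only for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countReplacements ranges over s and counts the iterations that yield
// utf8.RuneError (U+FFFD). Note that a correctly encoded U+FFFD in the
// input is counted too. (Hypothetical helper for illustration.)
func countReplacements(s string) int {
	n := 0
	for _, r := range s {
		if r == utf8.RuneError {
			n++
		}
	}
	return n
}

func main() {
	// "\xf0\x9f" is only the first half of a 4-byte sequence, so it is invalid.
	s := "ok\xf0\x9f!"
	for i, r := range s {
		// Each invalid byte shows up as U+FFFD, and i advances by one.
		fmt.Printf("offset %d: %U\n", i, r)
	}
	fmt.Println("replacements:", countReplacements(s))
}
```

Each of the two stray bytes produces its own U+FFFD, because the loop advances one byte at a time over invalid input.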

The index variable tracks bytes, not runes. This trips up developers who expect i to be a sequential counter.

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	// String with mixed width characters.
	s := "A🚀B"

	// i is the byte offset, not the rune index.
	// i jumps by the width of the previous rune.
	for i, r := range s {
		// RuneLen returns byte width without allocating a string.
		width := utf8.RuneLen(r)
		fmt.Printf("Byte index: %d, Rune: %c, Width: %d\n", i, r, width)
	}
}

The index is a byte offset, not a counter. Use it for slicing, not for counting characters.

Truncating text safely

Truncation is a common task. You want to limit a string to a maximum number of characters. Using s[:n] cuts at byte index n. If n lands in the middle of a multi-byte sequence, you break the character. The result is invalid UTF-8.

To truncate safely, count runes and slice at a byte offset that falls on a rune boundary. A range loop gives you both: it visits one rune per iteration, and its index is the byte offset where that rune starts. The unicode/utf8 package offers helpers such as utf8.RuneLen if you need a rune's encoded width.

Here's how to truncate text without breaking multi-byte characters.

package main

import "fmt"

// TruncateRune returns the first n runes of s.
// It never slices through a multi-byte character.
func TruncateRune(s string, n int) string {
	// A non-positive limit keeps nothing.
	if n <= 0 {
		return ""
	}

	// The range index is the byte offset where each rune starts.
	// Once n runes have been counted, i marks the end of the n-th rune.
	// Using the index (rather than summing utf8.RuneLen) also stays correct
	// for invalid bytes, which range reports as U+FFFD but advances by one.
	count := 0
	for i := range s {
		if count == n {
			return s[:i]
		}
		count++
	}

	// The string has n runes or fewer, so return it whole.
	return s
}

func main() {
	// "Hello 🌍" has 7 runes but 10 bytes.
	msg := "Hello 🌍"
	fmt.Println(TruncateRune(msg, 5)) // Prints "Hello"
	fmt.Println(TruncateRune(msg, 7)) // Prints "Hello 🌍"
}

Cut by runes, not bytes. Garbled text is a bug, not a feature.

Pitfalls and errors

Strings are immutable. You cannot modify a string in place. If you try to assign to an index, the compiler rejects the code with cannot assign to s[i]. You must convert the string to a []byte or []rune slice, modify the slice, and convert back to a string.
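A minimal sketch of the convert, modify, convert back pattern. The helper replaceRuneAt is a hypothetical name chosen for this example, and it counts positions in runes rather than bytes.

```go
package main

import "fmt"

// replaceRuneAt returns a copy of s with the rune at position idx
// (counted in runes, not bytes) replaced by r. Out-of-range positions
// return s unchanged. (Hypothetical helper for illustration.)
func replaceRuneAt(s string, idx int, r rune) string {
	runes := []rune(s) // decodes s into a new, mutable slice
	if idx < 0 || idx >= len(runes) {
		return s
	}
	runes[idx] = r
	return string(runes) // re-encodes the runes as UTF-8
}

func main() {
	// "\u00e9" is the two-byte letter é; replacing it shortens the string by one byte.
	fmt.Println(replaceRuneAt("h\u00e9llo", 1, 'e')) // hello
}
```

The original string is never touched; the function returns a new one.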

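Another trap is len(s). This returns the byte length, not the rune count. For "🚀", len is 4. If you use len to size a buffer of runes, the buffer is always large enough but often bigger than needed, and if you use len as a character limit, you will reject or truncate text that looks short to the user. Use utf8.RuneCountInString when you need the number of runes. Checking whether a string is empty with len works fine, because an empty string has zero bytes and zero runes. For any other length check, remember that len measures bytes.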
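The gap is easy to see with a length limit. The sketch below assumes a limit expressed in runes; fitsLimit is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// fitsLimit reports whether s has at most limit runes.
// (Hypothetical helper; the name and the limit are made up for this example.)
func fitsLimit(s string, limit int) bool {
	return utf8.RuneCountInString(s) <= limit
}

func main() {
	s := "🚀🚀🚀"
	fmt.Println(len(s))                    // 12 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 3 (runes)
	fmt.Println(len(s) <= 5)               // false: the byte check rejects it
	fmt.Println(fitsLimit(s, 5))           // true: the rune check accepts it
}
```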

Indexing s[i] returns a byte. If i points to a continuation byte of a multi-byte sequence, you get a fragment that does not represent a valid character. Use range to get valid runes, or use the unicode/utf8 package to decode safely.
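One way to check whether an offset is safe to slice at is utf8.RuneStart, which reports whether a byte could begin a rune. The sketch assumes the string is valid UTF-8; on malformed input the check is only a heuristic. The helper isRuneBoundary is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isRuneBoundary reports whether byte offset i in s is the start of a rune
// or the end of the string. It assumes s is valid UTF-8.
// (Hypothetical helper for illustration.)
func isRuneBoundary(s string, i int) bool {
	if i < 0 || i > len(s) {
		return false
	}
	if i == len(s) {
		return true
	}
	// RuneStart is false for continuation bytes (bit pattern 10xxxxxx).
	return utf8.RuneStart(s[i])
}

func main() {
	s := "A🚀B"
	fmt.Printf("%x %x\n", s[1], s[2]) // f0 9f: leading byte, then a continuation byte
	fmt.Println(isRuneBoundary(s, 1)) // true
	fmt.Println(isRuneBoundary(s, 2)) // false
}
```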

Conversions generally allocate memory. []byte(s) creates a new slice and copies the bytes. []rune(s) creates a new slice of int32 values and decodes the string. The compiler can elide the copy in a few narrow cases, but do not count on it. If you call these conversions in a tight loop, the allocator gets busy. Prefer range over the string when possible. If you need byte access, consider indexing s[i] to avoid allocation, or pass a []byte slice if the caller already has one.
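For byte-level work such as the checksum from the introduction, an index loop reads every byte without converting the string. The sketch below uses a toy additive sum purely for illustration, not a real checksum algorithm; sumBytes is a hypothetical helper.

```go
package main

import "fmt"

// sumBytes adds up every byte of s by indexing, so no []byte copy is made.
// This is a toy sum for illustration, not a production checksum.
func sumBytes(s string) uint32 {
	var sum uint32
	for i := 0; i < len(s); i++ {
		sum += uint32(s[i]) // s[i] is a byte, continuation bytes included
	}
	return sum
}

func main() {
	// "\u00e9" (é) is encoded as 0xC3 0xA9, so both bytes contribute.
	fmt.Println(sumBytes("AB"))     // 131
	fmt.Println(sumBytes("\u00e9")) // 364
}
```

A range loop over the same string would have yielded one rune for é and skipped the second byte entirely.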

The underscore discards a value intentionally. In for _, r := range s, the underscore says "I considered the index and chose to drop it." This is the standard idiom when you only need the value. Go treats a declared but unused local variable as a compile error, not a warning, and the underscore is how you avoid it.

Strings don't change. Convert to mutate.

When to use runes, bytes, or indices

Go gives you multiple ways to access string data. Each has a specific use case. Pick the right one based on what you need.

Use range over a string when you need to process valid Unicode characters and handle multi-byte sequences automatically. This is the default choice for text processing, parsing, and display logic.

Use range over a []byte slice when you need raw byte access for binary protocols, checksums, or performance-critical loops where decoding overhead matters. Be aware that the conversion allocates memory.

Use indexing s[i] when you need a single byte at a known offset and you know what that byte should be, such as checking a magic number or an ASCII delimiter at the start of a buffer. Indexing avoids allocation but gives you bytes, not runes.

Use []rune(s) conversion when you need random access to characters by index, such as chars[5], and you accept the allocation cost of creating a slice of int32 values. This is useful for small strings where convenience outweighs performance.

Use the unicode/utf8 package functions when you need to measure rune length, decode runes at specific offsets, or validate UTF-8 without iterating the whole string. Functions like utf8.DecodeRuneInString let you inspect a rune at a given byte index without a loop.
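A short sketch of decoding at a specific offset with utf8.DecodeRuneInString, plus a whole-string check with utf8.ValidString. The helper runeAt and its ok result are assumptions made for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt decodes the rune that starts at byte offset i of s.
// ok is false when i is out of range or the bytes at i are not a valid
// UTF-8 encoding. (Hypothetical helper for illustration.)
func runeAt(s string, i int) (r rune, size int, ok bool) {
	if i < 0 || i >= len(s) {
		return utf8.RuneError, 0, false
	}
	r, size = utf8.DecodeRuneInString(s[i:])
	// RuneError with size 1 signals an invalid encoding at this offset.
	ok = !(r == utf8.RuneError && size == 1)
	return r, size, ok
}

func main() {
	s := "A🚀B"
	r, size, ok := runeAt(s, 1)
	fmt.Printf("%c %d %v\n", r, size, ok) // 🚀 4 true

	_, _, ok = runeAt(s, 2) // offset 2 is inside the emoji
	fmt.Println(ok)         // false

	fmt.Println(utf8.ValidString(s))       // true
	fmt.Println(utf8.ValidString(s[:2]))   // false: the slice cuts the emoji
}
```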

Runes for text. Bytes for data. Know the difference.

Where to go next

Go strings are sequences of bytes that conventionally hold UTF-8 text. Ranging over a string decodes those bytes and gives you one rune per iteration, however many bytes each rune occupies. If you need the raw data byte by byte, index the string with s[i] or range over a []byte. Think of it like reading a book: ranging over a string reads the letters, while ranging over bytes reads the ink marks on the page.