How to Iterate Over Characters in a Go String

Iterate over Go string characters using a range loop to get correct Unicode rune handling.

The string is bytes, the character is a rune

You write a function to reverse a string. It works for "hello". You test it with "café" and the é comes back mangled. You test it with "🚀" and the rocket ship turns into garbage bytes. Or you try to count the characters in a user's bio and the limit check fails because one emoji counts as four bytes instead of one character. This happens because Go strings are not arrays of characters. They are sequences of bytes.

When you index into a string, you get a byte. When you check the length, you get the byte count. If your text contains anything outside ASCII, treating a string as a list of characters breaks immediately. Go solves this with a clear separation: the string type holds the raw UTF-8 bytes, and the rune type represents a Unicode character. The range loop is the bridge between them.

Bytes, runes, and UTF-8

Go strings are immutable sequences of bytes. A string may hold arbitrary bytes, but Go source code, the range loop, and the standard library treat those bytes as UTF-8 text. UTF-8 is a variable-length encoding. An ASCII character like 'A' takes one byte. An accented Latin character like 'é' takes two bytes. Most Chinese and Japanese characters take three bytes. Most emoji and many rare symbols take four bytes.

A rune in Go is a 32-bit integer that holds a Unicode code point. It corresponds to what other languages call a "character" or "char". When you iterate over a string and want characters, you want runes. The range loop decodes the UTF-8 bytes and yields runes automatically. You don't need to write bit manipulation logic to parse the encoding. The loop handles the variable lengths, assembles the code points, and hands you the runes in order.

Strings are bytes. Runes are characters. range bridges the gap.

The minimal iteration

Here's the standard way to iterate over characters in a Go string. The range keyword walks the string, decodes UTF-8, and yields the byte index and the rune.

s := "Hello, 世界"
for i, r := range s {
	// i is the byte offset where the rune starts
	// r is the decoded Unicode code point
	fmt.Printf("Index: %d, Rune: %c\n", i, r)
}
# output:
Index: 0, Rune: H
Index: 1, Rune: e
Index: 2, Rune: l
Index: 3, Rune: l
Index: 4, Rune: o
Index: 5, Rune: ,
Index: 6, Rune:  
Index: 7, Rune: 世
Index: 10, Rune: 界

The index i jumps by the byte length of the previous rune. '世' starts at byte 7 and takes three bytes, so the next rune starts at byte 10. The loop tracks this for you. If you don't need the index, discard it with the blank identifier _. Using _ signals to readers that you intentionally skipped the value.

for _, r := range s {
	// process rune r
}

How the loop works under the hood

The range loop reads the first byte of the string. UTF-8 uses the top bits of the first byte to indicate the sequence length. If the top bit is 0, the byte is a single ASCII character. The loop yields that rune and advances the index by one.

If the top bits are 110, the byte starts a two-byte sequence. The loop reads the next byte, which must start with 10, and combines them into a single rune. It yields the rune and advances the index by two. This pattern continues for three-byte and four-byte sequences. The loop never yields partial characters. It always returns complete runes.
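The same walk can be written by hand with utf8.DecodeRuneInString, which returns a rune and the number of bytes it consumed. This is only a sketch of the observable behavior, not how the compiler implements range, and RuneStarts is a hypothetical name chosen for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// RuneStarts returns the byte offset at which each rune in s begins.
// It decodes one rune at a time and advances by that rune's width,
// producing the same offsets a range loop reports.
func RuneStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		// s[i:] is never empty here, so size is always at least 1.
		_, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		i += size
	}
	return starts
}

func main() {
	fmt.Println(RuneStarts("Hello, 世界")) // [0 1 2 3 4 5 6 7 10]
}
```

In practice, prefer range. The manual form is mainly useful when you need to stop and resume decoding at arbitrary byte offsets.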

This decoding has a cost. Iterating over runes is slower than iterating over bytes because the loop must check prefixes and assemble values. If your data is guaranteed to be ASCII, iterating over a []byte slice is faster. For general text, the correctness of range outweighs the performance difference.
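As an illustration of the byte-level path, the sketch below counts occurrences of a single ASCII byte by indexing the string directly. CountByte is a hypothetical helper. It stays correct even on UTF-8 text as long as the byte you search for is ASCII, because ASCII byte values never appear inside a multi-byte sequence.

```go
package main

import "fmt"

// CountByte counts how often the byte c occurs in s.
// It indexes bytes directly and does no UTF-8 decoding.
// Assumption: c is an ASCII byte (below 0x80).
func CountByte(s string, c byte) int {
	n := 0
	for i := 0; i < len(s); i++ {
		if s[i] == c {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(CountByte("a,b,c", ','))      // 2
	fmt.Println(CountByte("café, 世界", ',')) // 1
}
```

Whether the speed difference matters depends on the workload, so measure before trading the range loop for byte indexing.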

Realistic usage: building and transforming

Real code often needs to transform text or count characters. Here's a function that censors a string by replacing every rune with an asterisk. It uses strings.Builder to construct the result efficiently.

import "strings"

// Censor replaces every rune in s with an asterisk.
func Censor(s string) string {
	// Builder avoids repeated allocations during string construction
	var b strings.Builder
	// Grow reserves capacity to reduce reallocations
	b.Grow(len(s))

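	// The rune values are unused, so range without loop variables;
	// declaring r here would fail with "declared and not used"
	for range s {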
		// WriteRune encodes the rune as UTF-8 and appends it
		b.WriteRune('*')
	}
	// String returns the final result
	return b.String()
}

The Grow call reserves capacity. The argument is an estimate of the final size. Using len(s) is safe because the number of runes is at most the number of bytes. The builder allocates memory once and fills it. This is much faster than concatenating strings in a loop, which creates a new allocation on every iteration.

When you only need the character count, use the utf8 package. A manual loop works, but the standard library function is optimized and expresses intent clearly.

import "unicode/utf8"

// CountRunes returns the number of Unicode code points in s.
func CountRunes(s string) int {
	// RuneCountInString decodes the string and counts runes
	return utf8.RuneCountInString(s)
}

len(s) returns the number of bytes, not the number of characters. The language specification defines it that way, which keeps len a constant-time operation. If you need the rune count, use utf8.RuneCountInString. Don't fight the definition; use the right tool.

Pitfalls and compiler errors

Indexing a string gives you a byte, not a character. The compiler won't stop you, but the result is wrong for non-ASCII text. Accessing s[7] on "Hello, 世界" returns the byte 228 (0xE4), the first of the three bytes that encode '世'. Printed as a character, that value is 'ä', not '世'. If you need the character at a specific position, convert the string to a []rune slice first.
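A short sketch of the difference between byte indexing and rune indexing follows. RuneAt is a hypothetical helper that adds the bounds check a bare []rune(s)[n] would lack.

```go
package main

import "fmt"

// RuneAt returns the n-th rune of s (counting from 0) and true,
// or 0 and false when n is out of range.
func RuneAt(s string, n int) (rune, bool) {
	runes := []rune(s) // decodes the whole string into code points
	if n < 0 || n >= len(runes) {
		return 0, false
	}
	return runes[n], true
}

func main() {
	s := "Hello, 世界"

	fmt.Println(s[7]) // 228: a single byte, not a character

	r, ok := RuneAt(s, 7)
	fmt.Printf("%c %v\n", r, ok) // 世 true
}
```

Note that byte index 7 and rune index 7 happen to point at the same character here only because everything before it is ASCII.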

Strings are immutable. You cannot assign to an index. The compiler rejects s[0] = 'H' with cannot assign to s[0]. You must build a new string with the changes. This immutability allows strings to be shared safely across goroutines without locks.
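One way to "change" the first character is to decode it, then join the replacement with the rest of the original string. The sketch below assumes you want to replace exactly one rune; ReplaceFirst is a hypothetical helper, not a standard library function.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ReplaceFirst returns a new string in which the first rune of s
// is replaced by r. The original string is left untouched.
func ReplaceFirst(s string, r rune) string {
	if s == "" {
		return s
	}
	// size is the byte width of the first rune, so s[size:] is the
	// remainder of the string starting at the second rune.
	_, size := utf8.DecodeRuneInString(s)
	return string(r) + s[size:]
}

func main() {
	fmt.Println(ReplaceFirst("hello", 'H')) // Hello
	fmt.Println(ReplaceFirst("été", 'E'))   // Eté
}
```

Slicing by the decoded width rather than s[1:] is what keeps a multi-byte first character from leaving stray bytes behind.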

Invalid UTF-8 input doesn't cause a panic. If the string contains malformed bytes, range yields the Unicode replacement character U+FFFD and advances the index by one byte. This keeps the loop safe and prevents infinite loops. It might hide data corruption, so validate input if you expect clean UTF-8.

Converting a string to []rune allocates memory. The conversion decodes every character and stores each rune as a 4-byte integer. For a 1MB ASCII string, []rune(s) allocates 4MB. This is fine for small strings or random access, but avoid it for large texts if you can iterate with range instead.
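The string reversal from the introduction is a case where the []rune allocation is usually worth it, because swapping needs random access. This is a minimal sketch that reverses code points. It does not handle combining marks or other multi-code-point sequences, which would need grapheme-aware logic beyond the standard library.

```go
package main

import "fmt"

// Reverse returns s with its runes in reverse order.
// It operates on code points, not on user-perceived characters.
func Reverse(s string) string {
	runes := []rune(s)
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	return string(runes)
}

func main() {
	fmt.Println(Reverse("hello")) // olleh
	fmt.Println(Reverse("café"))  // éfac
}
```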

Indexing a string gives you bytes, not characters. Trust range to decode UTF-8.

When to use what

Use a range loop over the string when you need to process Unicode characters correctly and don't require random access by index.

Use a range loop over a []byte slice when you are parsing binary data or ASCII-only protocols where performance matters and you don't care about multi-byte characters.

Use utf8.RuneCountInString when you only need the character count and want to avoid writing the loop yourself.

Use a []rune conversion when you need random access to characters by index, such as r[i], and the string is small enough to fit in memory.

Use strings.Builder when you are constructing a new string from runes to avoid multiple allocations.

Pick the tool that matches your data. Bytes for speed, runes for text.

Where to go next