What Is the rune Type in Go

The `rune` type in Go is simply an alias for `int32` that represents a Unicode code point, allowing you to handle individual characters from any language correctly.

When bytes betray your string

You're building a text processor. A user pastes a string containing "Hello 世界". You ask for the length. You expect 8. Go says 12. You try to grab the last character with s[len(s)-1]. You get a byte that prints as garbage. The string isn't a list of characters. It's a list of bytes. UTF-8 encodes characters into variable-width sequences. ASCII takes one byte. Chinese characters take three. Most emoji take four. Slicing by byte index breaks multi-byte characters. You need rune to treat text as logical units.
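Here's a minimal sketch of that mismatch. It relies on utf8.RuneCountInString from the standard library; the helper name counts is made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper for this illustration: it returns
// the byte length of s and the number of code points in s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := counts("Hello 世界")
	// "Hello " is 6 bytes, and each of the two CJK characters is 3 bytes
	fmt.Printf("bytes=%d runes=%d\n", b, r) // bytes=12 runes=8
}
```

len reports storage size. utf8.RuneCountInString reports how many code points the bytes decode to, without allocating a []rune.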

The rune type is an integer alias

The rune type is an alias for int32. It represents a Unicode code point. A code point is the unique integer assigned to a character in the Unicode standard. Strings in Go are UTF-8 encoded byte sequences. UTF-8 is a variable-width encoding. It packs code points into 1 to 4 bytes. A rune holds the decoded integer value, independent of the byte representation.
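To see the alias for yourself, a small sketch like the one below prints the type, the integer value, and the encoded width of a rune. The helper name describe is an assumption for this example; utf8.RuneLen is the standard library call that reports how many bytes a code point needs in UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports the Go type name,
// the decimal value, the code point, and the UTF-8 width of a rune.
func describe(r rune) string {
	return fmt.Sprintf("%T %d U+%04X %d", r, r, r, utf8.RuneLen(r))
}

func main() {
	fmt.Println(describe('A'))  // int32 65 U+0041 1
	fmt.Println(describe('世')) // int32 19990 U+4E16 3
}
```

The type prints as int32 because rune is an alias, not a distinct type. The value stays the same no matter how many bytes the encoding needs.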

Think of bytes as individual tiles in a mosaic. A rune is the picture formed by a group of tiles. The mosaic stores tiles. The rune describes the picture. When you iterate a string, Go decodes the tiles into pictures automatically.

Convention aside: gofmt is mandatory. When you write loops or type conversions, gofmt standardizes the spacing. Don't argue about indentation; let the tool decide. Most editors run it on save.

Minimal example: iteration decodes UTF-8

Here's the simplest way to see runes in action: iterate a string and watch Go decode UTF-8 for you.

package main

import "fmt"

func main() {
	s := "Hi 世"
	// range over string decodes UTF-8 automatically
	// i is the byte index, r is the rune value
	for i, r := range s {
		// print byte offset, character, and code point
		fmt.Printf("Byte %d: %c (U+%04X)\n", i, r, r)
	}
}

# output:
Byte 0: H (U+0048)
Byte 1: i (U+0069)
Byte 2:   (U+0020)
Byte 3: 世 (U+4E16)

The loop advances the byte index by the encoded length of each character. H, i, and the space are one byte each, so the index steps through 0, 1, 2, and 3. 世 is three bytes, so a character after it would start at byte 6. The rune value U+4E16 is the code point. The byte index tells you where the character starts in the string.

Runes are integers. Treat them like math when you need to, like characters when you don't.
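A short sketch of the "treat them like math" side. Both helper names, digitValue and shift, are hypothetical and exist only for this example.

```go
package main

import "fmt"

// digitValue is a hypothetical helper: it converts an ASCII digit rune
// to its numeric value by subtracting the code point of '0'.
// It returns -1 for anything that is not an ASCII digit.
func digitValue(r rune) int {
	if r < '0' || r > '9' {
		return -1
	}
	return int(r - '0')
}

// shift is a hypothetical helper: it moves a rune n code points forward.
// rune and int32 are the same type, so no conversion is needed.
func shift(r rune, n int32) rune {
	return r + n
}

func main() {
	fmt.Println(digitValue('7'))      // 7
	fmt.Printf("%c\n", shift('a', 1)) // b
}
```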

Walkthrough: what happens at runtime

The for range loop on a string is special. The compiler generates code that calls the UTF-8 decoder. It reads the first byte. If the high bit is zero, the character is ASCII. The decoder returns the byte as a rune and advances the index by one. If the high bit is set, the decoder reads more bytes to find the full code point. It checks the continuation bits. It assembles the integer value. It advances the index by the total byte length.
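You can write the same walk by hand with utf8.DecodeRuneInString, which returns the decoded rune and its width in bytes. This is a sketch of the idea, not the code the compiler actually emits; the helper name decodeAll is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll is a hypothetical helper that mimics what for range does:
// decode one rune at the current offset, record it, advance by its width.
func decodeAll(s string) (offsets []int, runes []rune) {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		runes = append(runes, r)
		i += size // 1 for ASCII, up to 4 for other code points
	}
	return offsets, runes
}

func main() {
	offsets, runes := decodeAll("Hi 世!")
	for k, r := range runes {
		// the last line printed is "Byte 6: !" because 世 occupies bytes 3 to 5
		fmt.Printf("Byte %d: %c\n", offsets[k], r)
	}
}
```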

This decoding happens at runtime. There is no compile-time magic. The string remains a read-only sequence of bytes in memory. The loop produces runes on the fly. This keeps memory usage low. You don't allocate a new slice of runes unless you explicitly convert.

The compiler enforces type safety. byte and rune are distinct types, so passing s[0] to a function that expects a rune fails with an error along the lines of cannot use s[0] (value of type byte) as rune value in argument. You must convert explicitly. rune(s[0]) compiles for any string, but it gives you the right character only when that byte is ASCII. For Unicode, iteration is the safe path.
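Here's a small sketch of why the byte conversion misleads on non-ASCII text. The helper names firstByteAsRune and firstRune are assumptions for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstByteAsRune converts the first byte of s to a rune.
// It compiles for any non-empty string, but the result is only
// the intended character when that byte is ASCII.
func firstByteAsRune(s string) rune {
	return rune(s[0])
}

// firstRune decodes the first full code point of s.
func firstRune(s string) rune {
	r, _ := utf8.DecodeRuneInString(s)
	return r
}

func main() {
	s := "世界"
	fmt.Printf("%U\n", firstByteAsRune(s)) // U+00E4, the lead byte 0xE4 misread as a code point
	fmt.Printf("%U\n", firstRune(s))       // U+4E16
}
```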

Trust the range loop. It handles the decoding. You handle the logic.

Realistic example: validating input with unicode

Here's a realistic helper that checks if a string contains only letters and digits, using the unicode package for correct Unicode handling.

package main

import (
	"fmt"
	"unicode"
)

// isAlphanumeric checks if a string contains only letters and digits
// using the unicode package for correct Unicode handling
func isAlphanumeric(s string) bool {
	// range decodes each rune from the UTF-8 string
	for _, r := range s {
		// unicode.IsLetter and IsDigit handle all scripts, not just ASCII
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isAlphanumeric("Go121"))     // true
	fmt.Println(isAlphanumeric("Go 121"))    // false, space fails
	fmt.Println(isAlphanumeric("世"))        // true, CJK is a letter
	fmt.Println(isAlphanumeric("Hello!"))    // false, punctuation fails
}

The unicode package provides functions like IsLetter, IsDigit, and IsSpace that work on runes. These functions consult the Unicode database. They recognize letters from every script, not just Latin. This is crucial for internationalization. If you check r >= 'a' && r <= 'z', you miss everything outside ASCII.
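To make the gap concrete, this sketch runs the naive range check and unicode.IsLetter side by side. The helper name classify is hypothetical; the sample runes are written as escapes so the source stays unambiguous.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify is a hypothetical helper: it reports whether r passes the
// naive ASCII lowercase range check and whether unicode.IsLetter accepts it.
func classify(r rune) (asciiLower, letter bool) {
	return r >= 'a' && r <= 'z', unicode.IsLetter(r)
}

func main() {
	// a, é (U+00E9), ж (U+0436), 世 (U+4E16)
	for _, r := range []rune{'a', '\u00e9', '\u0436', '\u4e16'} {
		asciiLower, letter := classify(r)
		// only 'a' passes the ASCII check; all four are letters
		fmt.Printf("%c ascii=%t unicode=%t\n", r, asciiLower, letter)
	}
}
```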

Convention aside: receiver names are usually one or two letters matching the type. If you wrap this logic in a method, name the receiver s for a string type, not self or this. Go style favors brevity and consistency.

Slicing a string by byte index is a bug waiting to happen on non-ASCII text.

Pitfalls: slicing, conversion, and grapheme clusters

Slicing a string by byte index is the most common mistake. s[0:1] extracts one byte. If the string starts with a multi-byte character, you get a fragment. Printing a fragment produces garbage or a replacement character. To slice by character, convert to []rune first.

package main

import "fmt"

func main() {
	s := "Hello 世界"
	// byte slice breaks the first Chinese character
	// this extracts only the first byte of the three-byte sequence
	bad := s[6:7]
	fmt.Printf("Byte slice: %q\n", bad)

	// convert to []rune for character-level slicing
	runes := []rune(s)
	// extract the last character safely
	good := string(runes[len(runes)-1:])
	fmt.Printf("Rune slice: %q\n", good)
}

# output:
Byte slice: "\xe4"
Rune slice: "界"

The byte slice "\xe4" is the first byte of 世. It's invalid UTF-8 on its own. The rune slice works correctly. Converting to []rune allocates memory and decodes the entire string. It's O(N) in time and space. Use it when you need random access to characters. Avoid it in tight loops over large strings.
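If all you need is the last character, you can skip the []rune conversion. The sketch below uses utf8.DecodeLastRuneInString from the standard library, which decodes backward from the end of the string; the helper name lastChar is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lastChar is a hypothetical helper: it returns the final code point of s
// as a string, without converting the whole string to []rune.
func lastChar(s string) string {
	// size is 0 for an empty string, so the slice below is then empty too
	_, size := utf8.DecodeLastRuneInString(s)
	return s[len(s)-size:]
}

func main() {
	fmt.Printf("%q\n", lastChar("Hello 世界")) // "界"
}
```

The result is a substring of the original, so nothing is copied. It still returns a single code point, not necessarily a whole visual character.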

Another pitfall is the grapheme cluster. A rune is a code point, not a visual character. Some visual characters are sequences of multiple code points. A combining accent attaches to the letter before it. A flag is a pair of regional indicator symbols. A family emoji joins several emoji with zero-width joiners. len([]rune(s)) counts code points, not visual characters.

package main

import "fmt"

func main() {
	// This flag is a sequence of regional indicator symbols
	// visually it looks like one character, but it is multiple runes
	flag := "🇬🇧"

	// converting to []rune splits on code points, not visual characters
	runes := []rune(flag)
	fmt.Printf("Visual length: 1, Rune count: %d\n", len(runes))

	// slicing runes by index breaks the flag into pieces
	// this produces partial output that may not render correctly
	fmt.Printf("First rune only: %s\n", string(runes[0]))
}

# output:
Visual length: 1, Rune count: 2
First rune only: 🇬

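The flag is two regional indicator runes. Slicing by rune index breaks the flag. The golang.org/x/text/unicode/norm package normalizes text, for example composing a letter and its combining accent into one code point where a precomposed form exists, but it does not segment grapheme clusters. If you need visual character length or slicing, use a third-party grapheme cluster library; github.com/rivo/uniseg is one option. Runes are the building blocks. Grapheme clusters are the visual units. Know the difference.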
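Flags aren't the only case. This sketch counts code points in two other sequences that typically render as one visual character, using only the standard library. The helper name codePoints is hypothetical, and the strings are written as escapes so each code point is visible.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints is a hypothetical helper: it counts code points in s,
// not grapheme clusters.
func codePoints(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	// the letter e followed by U+0301, the combining acute accent
	accented := "e\u0301"
	// man, zero-width joiner, woman, zero-width joiner, girl
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467"

	fmt.Printf("%s: %d code points\n", accented, codePoints(accented)) // 2
	fmt.Printf("%s: %d code points\n", family, codePoints(family))     // 5
}
```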

A rune is a code point, not a visual character. Grapheme clusters break the illusion.

Decision: when to use rune vs alternatives

Use string when you need to store text, pass data over the network, or interact with the file system. Strings are immutable sequences of bytes. They are compact and efficient for UTF-8 data. Most Go APIs accept strings.

Use rune when you need to manipulate individual characters, check character properties, or perform arithmetic on code points. Runes are integers. They work with the unicode package. They represent logical characters in Unicode.

Use []byte when you are processing raw binary data, parsing protocols, or optimizing memory for ASCII-only text. Byte slices give you direct access to the underlying data. They avoid decoding overhead. They are faster for byte-level operations.

Use []rune when you need random access to characters by logical index, like slicing the third character from the end. Rune slices allow indexing and slicing by character. They allocate memory proportional to the number of code points. They are useful for text manipulation algorithms.
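As a sketch of the []rune use case, the two hypothetical helpers below index from the end and reverse a string by code point. Both names are invented for this example, and both still split grapheme clusters that span several code points.

```go
package main

import "fmt"

// fromEnd is a hypothetical helper: it returns the nth code point
// counting from the end of s (n = 1 is the last one).
func fromEnd(s string, n int) (rune, bool) {
	runes := []rune(s)
	if n < 1 || n > len(runes) {
		return 0, false
	}
	return runes[len(runes)-n], true
}

// reverse is a hypothetical helper: it reverses s code point by code point.
func reverse(s string) string {
	runes := []rune(s)
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	return string(runes)
}

func main() {
	r, ok := fromEnd("Go 世界!", 3)
	fmt.Printf("%c %t\n", r, ok)       // 世 true
	fmt.Println(reverse("Hi 世界")) // 界世 iH
}
```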

Use utf8.DecodeRuneInString when you need explicit control over decoding or want to handle invalid UTF-8 sequences gracefully. The range loop does not stop on invalid input: for each invalid byte it yields the replacement rune U+FFFD and advances one byte. The decoder function returns the rune and its width in bytes, with no error value; invalid input comes back as utf8.RuneError with a width of 1. It lets you validate input or recover from corruption.
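Here's one way that validation might look. utf8.DecodeRuneInString returns a rune and a width, and an invalid byte shows up as utf8.RuneError with a width of 1. The helper name firstInvalid is an assumption for this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalid is a hypothetical helper: it returns the byte offset of the
// first invalid UTF-8 sequence in s, or -1 if s is valid.
func firstInvalid(s string) int {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		// RuneError with width 1 signals an invalid byte. A literal U+FFFD
		// in the input decodes with width 3 and is not an error.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	fmt.Println(firstInvalid("Hi 世"))     // -1
	fmt.Println(firstInvalid("Hi \xff世")) // 3
	fmt.Println(utf8.ValidString("\xe4")) // false
}
```

When you only need a yes or no answer, utf8.ValidString is enough. The manual loop earns its keep when you want the offset of the damage.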

Bytes are storage. Runes are meaning. Pick the type that matches the unit you're working in.

Where to go next

A rune is Go's way of storing a single Unicode code point, which covers characters from any language in the world. It lets your program handle text correctly without decoding UTF-8 by hand. Think of it as a universal ID number for every letter, symbol, or emoji component.