How to Extract Substrings in Go

The slice that isn't a slice

You have a log line, a CSV row, or a URL path. You need the piece between two markers, or just the last few characters. In Python you slice with text[2:5]. In JavaScript you call substring() or slice(). Go gives you square brackets too, but the underlying mechanics are different. Go strings are not arrays of characters. They are immutable byte arrays. That distinction changes how you extract substrings, how you handle emojis, and how your program uses memory.

Bytes, runes, and the string header

Think of a Go string like a roll of film. Each frame is a byte. When you slice the roll, you do not cut out a new piece of plastic. You just mark where the new segment starts and where it ends. The camera still holds the original roll. This design makes slicing incredibly fast. It also means you cannot change the bytes inside a string. If you need to mutate text, you convert it to a slice of bytes or runes first.
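The copy-then-mutate step can be sketched as follows. The helper name replaceFirstByte is an assumption made up for this illustration, not a standard library function.

```go
package main

import "fmt"

// replaceFirstByte is a hypothetical helper for illustration. It copies s
// into a byte slice, changes the copy, and converts the result back.
func replaceFirstByte(s string, b byte) string {
    if len(s) == 0 {
        return s
    }
    buf := []byte(s) // allocates a private copy of the bytes
    buf[0] = b       // mutating the copy leaves s untouched
    return string(buf)
}

func main() {
    original := "Hello"
    changed := replaceFirstByte(original, 'J')
    fmt.Println(original, changed)
}
```

The original string keeps its value because the conversion to []byte works on a copy.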

Under the hood, a Go string is a small struct with two fields: a pointer to a byte array and a length. When you pass a string to a function, Go copies that struct. The copy is cheap. The backing array stays in place. Multiple string variables can point to the exact same memory region. Immutability is the safety net that makes this possible. If Go allowed you to modify a substring, you would accidentally modify every other variable sharing that array. The language prevents that by design.
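One way to see that only the header travels with a string value is to measure it. This is a small sketch, assuming a hypothetical helper named headerSize; it relies on unsafe.Sizeof, which reports the size of the string value itself rather than the bytes it points to.

```go
package main

import (
    "fmt"
    "unsafe"
)

// headerSize is a hypothetical helper. It reports the size of the string
// value (pointer plus length), not the size of the text it refers to.
func headerSize(s string) uintptr {
    return unsafe.Sizeof(s)
}

func main() {
    short := "hi"
    long := "a much longer string with many more bytes than the short one"

    // The headers are the same size even though the texts are not.
    fmt.Println(headerSize(short) == headerSize(long))
    fmt.Println(len(short) < len(long))
}
```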

The tricky part is that Go stores text in UTF-8. ASCII characters take one byte. Letters with accents typically take two. Emojis and many Asian characters take three or four. If you slice by byte index, you might cut a multi-byte character in half. The slice itself succeeds, but the result is invalid UTF-8: it prints as a replacement character and can trip up downstream code that expects well-formed text. To slice by actual characters, you need to work with runes. A rune in Go is an alias for int32 and represents a single Unicode code point.
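A short sketch of a cut that lands inside a character. The helper validPrefix is hypothetical; utf8.ValidString is the standard library check it wraps.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// validPrefix is a hypothetical helper that reports whether the first n
// bytes of s form valid UTF-8. It assumes 0 <= n <= len(s).
func validPrefix(s string, n int) bool {
    return utf8.ValidString(s[:n])
}

func main() {
    s := "café" // é occupies two bytes, so len(s) is 5

    fmt.Println(validPrefix(s, 4)) // false: the cut lands inside é
    fmt.Println(validPrefix(s, 5)) // true: both bytes of é are included
    fmt.Printf("%q\n", s[:4])      // shows the stranded byte as an escape
}
```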

Byte slicing is fast. Rune slicing is safe. Pick the one that matches your data.

The minimal slice

Here is the simplest byte slice. It works perfectly for ASCII or when you know the exact byte boundaries.

package main

import "fmt"

func main() {
    // ASCII text: each character occupies exactly one byte
    s := "Hello, World!"

    // Extract "World" starting at byte 7, ending before byte 12
    sub := s[7:12]
    fmt.Println(sub)

    // Omit the end index to slice all the way to the last byte
    tail := s[7:]
    fmt.Println(tail)
}

The compiler translates s[7:12] into a low-level operation that creates a new string header. The new header points seven bytes past the start of the original and has a length of five. No bytes are copied. The new string and the original share the same backing array. Sharing is safe only because strings are immutable: if Go allowed you to modify sub, you would modify s too.
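The sharing can be observed directly by comparing data pointers. This is only a sketch: it assumes Go 1.20 or newer for unsafe.StringData, the helper sharesBacking is hypothetical, and production code should not depend on where string data lives.

```go
package main

import (
    "fmt"
    "unsafe"
)

// sharesBacking is a hypothetical helper for illustration. It reports
// whether s[start:end] begins exactly start bytes into the memory that s
// points to. It assumes 0 <= start < end <= len(s).
func sharesBacking(s string, start, end int) bool {
    sub := s[start:end]
    want := unsafe.Add(unsafe.Pointer(unsafe.StringData(s)), start)
    return unsafe.Pointer(unsafe.StringData(sub)) == want
}

func main() {
    s := "Hello, World!"
    fmt.Println(sharesBacking(s, 7, 12))
}
```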

When you omit the end index with s[7:], the compiler calculates the length by subtracting the start index from the total string length. The syntax is consistent with Go slices, but remember that strings are value types. Passing a sliced string to a function copies the header, not the bytes. The header copy is cheap. The shared backing array stays in place.

Slice boundaries are inclusive on the left and exclusive on the right. That rule applies to every slice in the language, including strings. Trust the syntax. Do not fight the bounds.

When the text speaks Unicode

Byte slicing breaks the moment your text contains multi-byte characters. Index 6 in "Hello 🚀 World" points to the first of the rocket emoji's four bytes, not to the character as a whole. s[6:7] returns only that first byte and leaves the other three behind. The runtime will happily give you a malformed string, and downstream code will choke on invalid UTF-8.

The fix is to convert the string to a slice of runes. This allocates a new array where each element is a full Unicode code point. You slice the rune slice, then convert it back to a string.

package main

import "fmt"

func main() {
    // Contains a multi-byte emoji: 🚀 takes four bytes in UTF-8
    s := "Hello 🚀 World"

    // Convert to runes so each index maps to one visible character
    r := []rune(s)

    // Slice the rune slice, then convert the result back to a string
    emoji := string(r[6:7])
    fmt.Println(emoji)
}

Go developers rarely reach for []rune(s) in production code. The standard library provides utf8.RuneCountInString for counting code points, utf8.ValidString for validation, and strings.ToValidUTF8 for replacing invalid bytes, but for simple extraction by character position, the rune conversion is the idiomatic fallback. The community convention is to keep strings as strings for as long as possible. Only convert to runes when you actually need character-level indexing. The conversion allocates a new slice that uses four bytes per code point, so mostly-ASCII text takes roughly four times the memory of the original string.
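When the allocation matters, one alternative is to walk the string once, find the byte offsets of the code points you want, and then byte-slice. The helper below, substrRunes, is a hypothetical sketch of that idea and not a standard library function; it clamps an end index that runs past the text.

```go
package main

import "fmt"

// substrRunes is a hypothetical helper. It returns the code points in
// [start, end) of s without building a []rune. An end past the last code
// point is clamped; an invalid range yields "".
func substrRunes(s string, start, end int) string {
    if start < 0 || end <= start {
        return ""
    }
    from, to := len(s), len(s)
    n := 0 // index of the current code point
    for i := range s { // i is the byte offset where the code point starts
        if n == start {
            from = i
        }
        if n == end {
            to = i
            break
        }
        n++
    }
    return s[from:to]
}

func main() {
    fmt.Println(substrRunes("Hello 🚀 World", 6, 7))
    fmt.Println(substrRunes("café", 3, 4))
}
```

The result is an ordinary byte slice of the original, so it still shares the backing array.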

Keep strings as bytes until you absolutely need character semantics.

Finding boundaries in the wild

Real code rarely knows exact indices. You usually search for delimiters, then slice around them. The strings package handles the searching. You still do the slicing.

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Extract the domain from an email address
    email := "user@example.com"

    // Find the byte index of the delimiter
    atIdx := strings.Index(email, "@")
    dotIdx := strings.Index(email, ".")

    // Guard against missing delimiters before slicing
    if atIdx != -1 && dotIdx != -1 && dotIdx > atIdx {
        domain := email[atIdx+1 : dotIdx]
        fmt.Println(domain)
    }
}

strings.Index returns the byte position of the first occurrence. If the substring is missing, it returns -1. The guard clause prevents out-of-bounds panics. Notice the slice uses atIdx+1. The @ symbol is at index 4. Adding one skips the delimiter and starts the extraction at e. The end index dotIdx stops right before the dot. The result is example. The guard also rejects an address like first.last@example.com, where the first dot comes before the @, so nothing is printed for it. This pattern appears constantly in parsing URLs, CSV fields, and log formats.
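The same index pattern generalizes to "the piece between two markers". The sketch below assumes a hypothetical helper named between. It searches for the closing marker only in the text after the opening marker, so an earlier occurrence of the closing marker does not get in the way.

```go
package main

import (
    "fmt"
    "strings"
)

// between is a hypothetical helper. It returns the text after the first
// occurrence of openMark and before the next occurrence of closeMark.
func between(s, openMark, closeMark string) (string, bool) {
    i := strings.Index(s, openMark)
    if i == -1 {
        return "", false
    }
    rest := s[i+len(openMark):] // skip the whole marker, not just one byte
    j := strings.Index(rest, closeMark)
    if j == -1 {
        return "", false
    }
    return rest[:j], true
}

func main() {
    fmt.Println(between("first.last@example.com", "@", "."))
    fmt.Println(between("level=[WARN] disk almost full", "[", "]"))
    fmt.Println(between("no markers here", "[", "]"))
}
```

Returning a boolean alongside the string lets the caller tell an empty match apart from a missing marker.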

Go 1.18 introduced strings.Cut, which simplifies this exact pattern. It splits a string at the first occurrence of a separator and returns the part before, the part after, and a boolean indicating whether the separator was found. It avoids manual index math and reduces the chance of off-by-one errors.

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Extract the domain using the modern Cut function
    email := "user@example.com"

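    // Cut returns the text before the separator, the text after it,
    // and whether the separator was found
    _, afterAt, found := strings.Cut(email, "@")
    if !found {
        fmt.Println("no @ in address")
        return
    }
    domain, _, _ := strings.Cut(afterAt, ".")

    // Print the extracted domain
    fmt.Println(domain)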
}

The strings package also offers strings.Split and strings.SplitN, which return slices of strings instead of requiring manual index math. Use Split when you want all pieces. Use manual indexing or Cut when you only need one specific segment and want to avoid allocating a slice of strings.
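A brief sketch of the two functions side by side. The helper field and the sample inputs are made up for illustration; strings.Split and strings.SplitN are the real standard library calls.

```go
package main

import (
    "fmt"
    "strings"
)

// field is a hypothetical helper that returns the n-th comma-separated
// field of row, counting from zero.
func field(row string, n int) (string, bool) {
    fields := strings.Split(row, ",") // allocates a slice holding every piece
    if n < 0 || n >= len(fields) {
        return "", false
    }
    return fields[n], true
}

func main() {
    fmt.Println(field("alice,42,admin", 1))

    // SplitN stops after the requested number of pieces, so the
    // remainder keeps any further separators.
    kv := strings.SplitN("path=/usr/bin=extra", "=", 2)
    fmt.Println(kv[0], kv[1])
}
```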

Let the standard library handle the searching. You handle the extraction.

Pitfalls and the panic you actually get

Slicing is fast until it isn't. The most common crash comes from an out-of-bounds index. If you calculate an end index that exceeds the string length, a start index greater than the end, or a negative index, the runtime stops the program immediately. (A negative constant index is rejected at compile time.) For s[7:20] on a 13-byte string, the panic message reads runtime error: slice bounds out of range [:20] with length 13. The runtime prints the offending index and the length. Always validate indices against len(s) before slicing.
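The validation can live in one small function. This is a minimal sketch, assuming a hypothetical helper named safeSlice that reports failure instead of panicking.

```go
package main

import "fmt"

// safeSlice is a hypothetical helper. It returns s[start:end] and true when
// the indices are valid, or "" and false when slicing would panic.
func safeSlice(s string, start, end int) (string, bool) {
    if start < 0 || end > len(s) || start > end {
        return "", false
    }
    return s[start:end], true
}

func main() {
    s := "Hello, World!"
    fmt.Println(safeSlice(s, 7, 12))
    fmt.Println(safeSlice(s, 7, 20))
}
```

The three checks mirror the three ways a string slice expression can fail at run time.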

Memory retention is the silent pitfall. Remember that a slice shares the backing array with the original string. If you read a 10-megabyte file into memory, extract a 50-byte token, and discard the original variable, the 10-megabyte array stays in memory. The garbage collector cannot free it because the small substring still points to it. This is called memory pinning. To break the reference, copy the substring explicitly. The strings.Clone function allocates a fresh backing array for the substring. Use it when you keep a small piece of text but want to release a large source string.
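A sketch of the clone-before-keeping pattern, assuming Go 1.18 or newer for strings.Clone and strings.Cut. The helper firstToken and the sample payload are invented for this example.

```go
package main

import (
    "fmt"
    "strings"
)

// firstToken is a hypothetical helper. It returns the text before the first
// space in big, copied so that the result does not reference big's memory.
func firstToken(big string) string {
    token, _, _ := strings.Cut(big, " ")
    return strings.Clone(token) // fresh backing array sized to the token
}

func main() {
    // Roughly 10 MiB of payload after a short token.
    big := "TOKEN123 " + strings.Repeat("x", 10<<20)

    token := firstToken(big)
    fmt.Println(token, len(token))
}
```

Once nothing else refers to big, the garbage collector is free to reclaim the large array while token stays usable.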

Another trap is assuming len(s) returns the number of characters. It returns the number of bytes. For "café", len returns 5, not 4. The accented é takes two bytes. If you write a loop that iterates from 0 to len(s) and slices one byte at a time, you will split multi-byte sequences. Use utf8.RuneCountInString(s) when you need the character count.
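To step through text one character at a time without splitting anything, decode each code point and advance by its width. The helper splitChars below is a hypothetical sketch built on utf8.DecodeRuneInString.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// splitChars is a hypothetical helper that returns one substring per
// code point in s.
func splitChars(s string) []string {
    var out []string
    for len(s) > 0 {
        _, size := utf8.DecodeRuneInString(s) // width of the next code point in bytes
        out = append(out, s[:size])
        s = s[size:]
    }
    return out
}

func main() {
    s := "café"
    fmt.Println(len(s), utf8.RuneCountInString(s))
    fmt.Printf("%q\n", splitChars(s))
}
```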

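Avoid passing a *string to a function just to save a copy. Strings are already cheap to pass by value, because only the two-field header is copied. The pointer adds indirection without saving memory. Reserve *string for cases where nil has to mean that no value was provided. Follow the convention: accept interfaces, return structs, and pass strings by value.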

Validate bounds before you slice. Clone when you need to break the memory reference.

Pick the right tool

Use byte slicing with s[start:end] when you work with ASCII, fixed-width formats, or known byte boundaries. Use []rune(s) conversion when you need character-level indexing on Unicode text and performance is not the bottleneck. Use strings.Index or strings.Split when the boundaries are defined by delimiters rather than fixed positions. Use strings.Cut when you need a single split and want to avoid manual index math. Use strings.Clone when you extract a small substring from a large source and want the garbage collector to reclaim the original.

Where to go next

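Extracting a substring means marking where a piece of a larger text starts and where it stops. In Go those marks are byte offsets, so the same brackets that work on ASCII need rune-aware handling for Unicode text. From here, the strings and unicode/utf8 packages in the standard library are the natural next stop: they cover the searching, splitting, and decoding used throughout this guide.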