How to Use Capture Groups in Go Regex

The problem with flat matches

You are parsing a configuration file. Each line looks like key=value. You write a regex to find lines that match, but the match returns the entire string. You need the key and the value separately. Or you are scraping a page and want to pull out just the price and the product name from a messy HTML block. A flat match tells you something exists. It does not tell you what the pieces are.

Regular expressions solve this with capture groups. You wrap the parts you care about in parentheses. The engine remembers exactly where those parts start and end inside the larger match. You get back a structured list of substrings instead of one big blob.

What capture groups actually do

Think of a regex match like a security camera recording a hallway. The camera captures everything. A capture group is the editor who crops the footage to show only the person walking and the timestamp on the wall. The raw video still exists, but you only extract the frames you actually need.

In Go, the regexp package handles this through the Submatch family of methods. When you define a group with (...), the engine tracks its boundaries during the scan. After the scan finishes, it hands you a slice of strings. Index zero is always the full match. Index one is the first group. Index two is the second group. If a group did not participate in the match, you get an empty string, not a null value.

Go deliberately avoids backtracking engines with exponential worst cases. The standard library's regexp package accepts RE2 syntax and guarantees matching in time linear in the size of the input. Capture groups do not change that guarantee. They simply tell the matcher to record extra boundary markers as it walks the input. It records offsets, slices the string, and returns.

Parentheses carve the match. The rest is just plumbing.

The minimal pattern

Here is the simplest way to extract pieces from a string. The pattern uses three groups to split an email address into username, domain, and top-level domain.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Compile once. MustCompile panics on invalid syntax, which is fine for static patterns.
	re := regexp.MustCompile(`(\w+)@(\w+)\.(\w+)`)

	// FindStringSubmatch returns a slice of strings.
	match := re.FindStringSubmatch("user@example.com")

	// Check if the pattern matched anything at all.
	if match != nil {
		// Index 0 is the full match. Indices 1, 2, 3 are the capture groups.
		fmt.Println("Full:", match[0])
		fmt.Println("User:", match[1])
		fmt.Println("Domain:", match[2])
		fmt.Println("TLD:", match[3])
	}
}

The output prints the full email, then user, example, and com. On a successful match, the slice length is always the number of groups plus one. If the input string does not match the pattern, FindStringSubmatch returns nil. You must check for nil before indexing, or the runtime will panic with an index out of range error.
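The length rule and the nil check can be seen side by side in a short sketch. The helper name splitEmail is hypothetical and exists only for illustration; NumSubexp is the regexp method that reports how many capture groups a compiled pattern has.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe is the same three-group pattern used in the example above.
var emailRe = regexp.MustCompile(`(\w+)@(\w+)\.(\w+)`)

// splitEmail is a hypothetical helper. It returns the three captures and
// ok=false when the input does not match, so callers never index a nil slice.
func splitEmail(s string) (user, domain, tld string, ok bool) {
	m := emailRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", "", false
	}
	// On a successful match, len(m) == emailRe.NumSubexp()+1.
	return m[1], m[2], m[3], true
}

func main() {
	fmt.Println(emailRe.NumSubexp()) // number of capture groups in the pattern
	fmt.Println(splitEmail("user@example.com"))
	fmt.Println(splitEmail("not an email"))
}
```

Wrapping the nil check in one place means the rest of the program deals with an explicit ok flag instead of slice indices.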

The engine tracks boundaries, not copies. Trust the slice indices.

How the engine tracks boundaries

Go's regexp package compiles patterns into a small program that simulates a finite automaton. When you call FindStringSubmatch, the matcher scans the input in time linear in its length. It maintains an internal array of start and end positions for every group. When execution passes a (, it writes the current byte offset to the start slot. When it passes the matching ), it writes the current offset to the end slot. If the group is optional and never matches, both slots remain -1.

After the scan completes, the engine builds the result by slicing the original input at those recorded offsets. The substrings share memory with the input, so no text is copied, but the call still allocates a new slice of strings. That cost is negligible for occasional calls and becomes measurable in tight loops. The regexp package provides FindStringSubmatchIndex, which skips building the string slice and returns a slice of integers holding the byte offsets instead.

// FindStringSubmatchIndex returns byte offsets instead of substrings.
input := "user@example.com"
offsets := re.FindStringSubmatchIndex(input)

// On a match, the slice has 2 * (groups + 1) elements. Each pair is [start, end].
if offsets != nil {
	// Extract the second group (domain) by slicing the original string.
	domainStart := offsets[4]
	domainEnd := offsets[5]
	fmt.Println("Domain:", input[domainStart:domainEnd])
}

The offset slice pairs are laid out sequentially. Pair zero is the full match. Pair one is group one. Pair two is group two. If a group did not match, both offsets in the pair are set to -1. This lets you check for missing groups without relying on empty string comparisons. You can also use these offsets to slice the original string or byte slice directly, which keeps extra allocations out of your hot path.
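To make the pair layout concrete, here is a minimal sketch that walks every pair and reports which groups participated. The pattern hostRe and the helper groupText are hypothetical names chosen for this illustration; the third group is optional so that one pair comes back as -1.

```go
package main

import (
	"fmt"
	"regexp"
)

// hostRe is an illustrative pattern with an optional third group.
var hostRe = regexp.MustCompile(`(\w+)@(\w+)(?:\.(\w+))?`)

// groupText is a hypothetical helper. It reads pair n from the offsets slice
// and reports whether that group participated in the match.
func groupText(s string, offsets []int, n int) (string, bool) {
	start, end := offsets[2*n], offsets[2*n+1]
	if start < 0 {
		return "", false
	}
	return s[start:end], true
}

func main() {
	input := "root@localhost"
	offsets := hostRe.FindStringSubmatchIndex(input)
	if offsets == nil {
		fmt.Println("no match")
		return
	}
	// Pair 0 is the full match; pairs 1..n are the capture groups.
	for n := 0; n < len(offsets)/2; n++ {
		if text, ok := groupText(input, offsets, n); ok {
			fmt.Printf("group %d: %q\n", n, text)
		} else {
			fmt.Printf("group %d: did not participate\n", n)
		}
	}
}
```

With this input there is no dot after the host, so the last pair holds -1 and the loop reports group 3 as missing rather than as an empty string.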

Go's error handling convention applies here too. regexp.Compile returns an error instead of panicking. Use Compile when the pattern comes from user input or a configuration file. Use MustCompile only for patterns hardcoded in your source. The community accepts the if err != nil { return err } boilerplate because it makes failure paths visible. If you assign the error to a variable and then never use it, the compiler rejects the program with a declared and not used error.
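A minimal sketch of the Compile path, assuming the pattern arrives at runtime. The helper name compileUserPattern is hypothetical; the only regexp call it relies on is regexp.Compile.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileUserPattern is a hypothetical helper for patterns that arrive at
// runtime. It wraps the compile error with context instead of panicking.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re, nil
}

func main() {
	// The closing parenthesis of the second group is missing.
	if _, err := compileUserPattern(`(\w+)=(\w+`); err != nil {
		fmt.Println("rejected:", err)
	}

	re, err := compileUserPattern(`(\w+)=(\w+)`)
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	fmt.Println(re.FindStringSubmatch("mode=fast"))
}
```

The caller decides what a bad pattern means: reject the configuration, fall back to a default, or report the error to the user.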

Real-world extraction with named groups

Hardcoding indices works for simple patterns. It falls apart when the pattern grows. Adding a fourth group forces you to update every match[4] reference across your codebase. Go supports named capture groups using the (?P<name>...) syntax. The engine stores the names alongside the indices, and you can look them up by string.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Named groups make the pattern self-documenting and immune to reordering.
	re := regexp.MustCompile(`(?P<protocol>https?)://(?P<host>[^/:]+)(?::(?P<port>\d+))?`)

	match := re.FindStringSubmatch("https://api.example.com:8080")
	if match != nil {
		// SubexpIndex returns the slice index of a named group, or -1 if the name is unknown.
		portIdx := re.SubexpIndex("port")
		// An empty string means the optional port group did not participate.
		if portIdx != -1 && match[portIdx] != "" {
			fmt.Println("Port:", match[portIdx])
		}
	}
}

Named groups do not change how the engine runs. They just add a lookup table to the compiled regex. You still get back a slice of strings or offsets. The SubexpIndex method translates the name to the correct slice index. If the name does not exist, it returns -1. If the group exists but did not participate in the match, its entry in the string slice is empty, and its offsets in the index variants are -1.
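When a pattern has several named groups, one common approach is to collect them into a map in a single pass. The sketch below assumes the same URL pattern as above; the helper name namedCaptures is hypothetical. It uses SubexpNames, which returns the group names in index order, with an empty name for index zero and for unnamed groups.

```go
package main

import (
	"fmt"
	"regexp"
)

var urlRe = regexp.MustCompile(`(?P<protocol>https?)://(?P<host>[^/:]+)(?::(?P<port>\d+))?`)

// namedCaptures is a hypothetical helper. It maps each group name to the
// text that group captured, or returns nil when the input does not match.
func namedCaptures(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have empty names
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	fmt.Println(namedCaptures(urlRe, "https://api.example.com:8080"))
	fmt.Println(namedCaptures(urlRe, "http://localhost")) // port is ""
}
```

The map loses the distinction between an empty capture and a group that did not participate, so this shape suits patterns where that difference does not matter.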

When you only need one or two groups from a complex pattern, read only the indices you care about and ignore the rest, or mark the groups you do not need as non-capturing with (?:...) so they never appear in the result. The blank identifier does not help here: FindStringSubmatch returns a single slice, so result, _ := re.FindStringSubmatch(input) does not compile. Save the underscore for functions that really return multiple values, and use it sparingly with errors.
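If only some parts of a pattern matter, non-capturing groups written (?:...) keep the rest out of the result slice. The sketch below is one way to do that under an assumed log line format; the pattern logRe and the helper logParts are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// logRe is an illustrative pattern. The level alternation is wrapped in
// (?:...), so it groups without capturing and takes no slot in the result.
var logRe = regexp.MustCompile(`^(?:INFO|WARN|ERROR) \[(\w+)\] (.*)$`)

// logParts is a hypothetical helper returning only the two captured pieces.
func logParts(line string) (component, message string, ok bool) {
	m := logRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	fmt.Println(logRe.NumSubexp()) // counts capturing groups only
	fmt.Println(logParts("WARN [db] slow query"))
}
```

Because the level is not captured, indices 1 and 2 stay stable even if more levels are added to the alternation later.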

Where things break

Regex is powerful, but it has sharp edges. The most common mistake is assuming every group will always match. Optional groups and alternations frequently leave gaps. If you write match[2] and group two was optional and did not participate, you get an empty string. If you expected a number, parsing that empty string will fail later. Always check for the empty string, or use FindStringSubmatchIndex and test for -1 offsets to verify the group participated.
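One way to guard against the gap is to verify participation before parsing. This is a sketch under the assumption that the port is the third group of a URL pattern like the one used earlier; portRe and portOf are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// portRe has three groups: protocol, host, and an optional port.
var portRe = regexp.MustCompile(`(https?)://([^/:]+)(?::(\d+))?`)

// portOf is a hypothetical helper. ok is false when the input does not match,
// when the port group did not participate, or when the digits do not fit an int.
func portOf(s string) (port int, ok bool) {
	loc := portRe.FindStringSubmatchIndex(s)
	if loc == nil || loc[6] < 0 { // loc[6], loc[7] are the offsets of group 3
		return 0, false
	}
	n, err := strconv.Atoi(s[loc[6]:loc[7]])
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	fmt.Println(portOf("https://api.example.com:8080"))
	fmt.Println(portOf("http://localhost"))
}
```

The second call never reaches strconv.Atoi, so the missing port is reported as absent instead of surfacing later as a parse error.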

Another trap is greedy matching. By default, * and + consume as much text as possible. If your pattern is (.*)=(.*) and your input is a=1=b=2, the first group grabs a=1=b and the second group grabs 2. You need non-greedy quantifiers like .*? to stop at the first =. The engine does not guess your intent. It follows the quantifier rules exactly.
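The difference is easy to see by running both quantifiers against the same input. In this sketch, greedyRe, lazyRe, and splitAround are hypothetical names; the patterns and the input come from the paragraph above.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedyRe = regexp.MustCompile(`(.*)=(.*)`)  // first group takes as much as it can
	lazyRe   = regexp.MustCompile(`(.*?)=(.*)`) // first group stops at the first =
)

// splitAround is a hypothetical helper returning the two captures.
func splitAround(re *regexp.Regexp, s string) (left, right string, ok bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	input := "a=1=b=2"
	fmt.Println(splitAround(greedyRe, input)) // a=1=b 2 true
	fmt.Println(splitAround(lazyRe, input))   // a 1=b=2 true
}
```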

Invalid patterns cause immediate failures. If you pass a malformed string to regexp.MustCompile, the program stops with a panic that begins panic: regexp: Compile(`(?P<bad`): error parsing regexp: followed by the specific parse error. This is intentional. For a hardcoded pattern, a broken regex is a logic error, not a runtime condition. Catch it during development. If you use regexp.Compile, you get an ordinary error value at runtime instead. The compiler will also reject unused imports with imported and not used if you bring in regexp but never call it.

Panics on bad patterns are a feature, not a bug. Validate your regex early.

When to reach for capture groups

Use FindStringSubmatch when you need the actual substrings and the input size is small enough that the extra slice allocation does not matter. Use FindStringSubmatchIndex when you are processing large files or high-throughput streams and want to reduce garbage collection pressure. Use named groups when the pattern has more than three groups or when the pattern will be maintained by multiple developers. Use plain string splitting or strings.Index when the delimiter is fixed and predictable: regex adds compilation overhead and cognitive load that simple string methods do not. Use regexp.Compile over MustCompile when the pattern originates from external configuration or user input.
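For the fixed-delimiter case, here is a minimal sketch of the key=value parser from the introduction written without regexp. It assumes Go 1.18 or later for strings.Cut, which is a swapped-in convenience over the strings.Index approach named above; the helper name parseKV is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKV is a hypothetical helper. It splits a line on the first "=" and
// trims surrounding whitespace from both halves.
func parseKV(line string) (key, value string, ok bool) {
	key, value, ok = strings.Cut(line, "=")
	if !ok {
		return "", "", false
	}
	return strings.TrimSpace(key), strings.TrimSpace(value), true
}

func main() {
	fmt.Println(parseKV("mode = fast"))
	fmt.Println(parseKV("a=1=b=2")) // splits at the first = only
	fmt.Println(parseKV("no delimiter here"))
}
```

There is no pattern to compile and no greedy or lazy question to answer: the split always happens at the first delimiter.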

Match what you need. Leave the rest on the floor.

Recap

Capture groups let you isolate specific parts of a text match, like pulling just the username from an email address. Think of them as labeled buckets that catch specific pieces of information as the regex scans through the string. You use them whenever you need to extract data rather than just checking if a pattern exists.