How to Use Named Capture Groups in Go

When indices break your parser

You are parsing a log file. The format is 2023-10-25 14:30:00 ERROR disk full. You write a regex and grab the date from index 1 and the level from index 3. Six months later, the ops team changes the log format to ERROR 2023-10-25 14:30:00 disk full. Your code breaks because the error level is now index 1 and the date is index 2. You fix it by hardcoding new indices. Two weeks later, they add a hostname field. You are back to counting groups manually.

Named capture groups solve this by letting you refer to parts of the match by name, not by position. When the format changes, you update the pattern, and your code keeps working as long as the group names stay the same. The indices in the match slice shift, but your code never hardcodes them.
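Here is a rough sketch of that scenario. The two patterns and the levelOf helper are illustrative assumptions, not taken from a real codebase. The lookup uses SubexpIndex, which the sections below explain.

```go
package main

import (
	"fmt"
	"regexp"
)

// Two hypothetical layouts of the same log line. Only the patterns differ.
var (
	oldFormat = regexp.MustCompile(`^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$`)
	newFormat = regexp.MustCompile(`^(?P<level>[A-Z]+) (?P<date>\S+) (?P<time>\S+) (?P<msg>.*)$`)
)

// levelOf extracts the level by name, so it works with either layout.
// It returns "" when the line does not match or the group is missing.
func levelOf(re *regexp.Regexp, line string) string {
	m := re.FindStringSubmatch(line)
	idx := re.SubexpIndex("level")
	if m == nil || idx < 0 {
		return ""
	}
	return m[idx]
}

func main() {
	fmt.Println(levelOf(oldFormat, "2023-10-25 14:30:00 ERROR disk full")) // ERROR
	fmt.Println(levelOf(newFormat, "ERROR 2023-10-25 14:30:00 disk full")) // ERROR
}
```

The calling code is identical for both layouts. Only the pattern knows where the level sits.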

How named groups work

Go's regexp package supports named capture groups using the syntax (?P<name>pattern). The P comes from Python, where this syntax originated; Perl and PCRE adopted it later. Go's regexp package implements RE2 syntax, not PCRE, but it accepts this form for familiarity, and recent Go releases also accept the shorter (?<name>pattern). Inside the pattern, you wrap a sub-expression with (?P<name>...). Keep each name unique within the pattern so that a lookup by name is unambiguous.

When the regex compiles, the engine records each group's name alongside its index. At runtime, you retrieve the captured text by looking up the index for the name and indexing into the match slice. The name is metadata attached to a specific index. You can mix named and unnamed groups in the same pattern. Indices are assigned left to right by opening parenthesis, whether or not the group has a name.
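A small sketch makes the numbering concrete. The pattern and the groupNames helper are made up for illustration. The unnamed group in the middle still consumes index 2.

```go
package main

import (
	"fmt"
	"regexp"
)

// mixedPattern has two named groups with one unnamed group between them.
const mixedPattern = `(?P<date>\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+)`

// groupNames returns the name recorded for every slot of the match slice.
// Slot 0 (the full match) and unnamed groups are reported as "".
func groupNames(pattern string) []string {
	return regexp.MustCompile(pattern).SubexpNames()
}

func main() {
	for i, name := range groupNames(mixedPattern) {
		fmt.Printf("%d: %q\n", i, name)
	}
	// Prints:
	// 0: ""
	// 1: "date"
	// 2: ""
	// 3: "level"
}
```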

Names are just labels for indices. The match slice is always a plain []string.

Minimal example

Here is the simplest way to extract a date using names instead of indices.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// (?P<name>...) names the group. Keep each name unique within the pattern.
	re := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

	// FindStringSubmatch returns a slice where index 0 is the full match.
	// Subsequent indices correspond to capture groups in order.
	match := re.FindStringSubmatch("2023-10-25")

	if match != nil {
		// SubexpIndex returns the index for the named group.
		// This call happens at runtime. It returns -1 if the name is not found.
		yearIdx := re.SubexpIndex("year")
		monthIdx := re.SubexpIndex("month")
		dayIdx := re.SubexpIndex("day")

		fmt.Println(match[yearIdx])  // "2023"
		fmt.Println(match[monthIdx]) // "10"
		fmt.Println(match[dayIdx])   // "25"
	}
}

The map lives in the compiled regex. The slice holds the data.

Walkthrough

When you call regexp.MustCompile, the engine parses the pattern and assigns indices to every capture group. It also records the names. The resulting *regexp.Regexp holds the compiled matching program and the list of group names. Go's regexp package does not generate machine code.

FindStringSubmatch runs the match and returns a []string, or nil when there is no match. Index 0 is always the full match. Index 1 is the first capture group, index 2 is the second, and so on. Named groups appear in the slice in the order they are defined. If the pattern has three named groups and no unnamed ones, they occupy indices 1, 2, and 3.

SubexpIndex searches the recorded group names and returns the integer index. If the name does not exist, it returns -1. SubexpNames returns a slice of strings aligned with the match slice. The first element is always an empty string because the full match at index 0 has no name. Unnamed groups also show up as empty strings. You can iterate over SubexpNames to pair names with values dynamically.

// SubexpNames returns names aligned with match indices.
// Index 0 is empty because the full match is unnamed.
names := re.SubexpNames()
// names[0] == ""
// names[1] == "year"
// names[2] == "month"
// names[3] == "day"

This alignment lets you build maps or structs without hardcoding indices. You zip the names and matches together, skipping index 0.
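The zip can be sketched as a small helper. The name namedMatches is an assumption for this example, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

var dateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// namedMatches pairs each named group with its captured text.
// It returns nil when the input does not match.
func namedMatches(re *regexp.Regexp, s string) map[string]string {
	match := re.FindStringSubmatch(s)
	if match == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name == "" { // index 0 and unnamed groups carry no name
			continue
		}
		out[name] = match[i]
	}
	return out
}

func main() {
	fmt.Println(namedMatches(dateRe, "2023-10-25")) // map[day:25 month:10 year:2023]
}
```

Skipping every empty name, not just index 0, keeps the helper correct when the pattern mixes named and unnamed groups.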

Names are metadata. Indices are the reality.

Realistic example

Here is a URL parser that extracts the host and path using named groups.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Named groups allow referencing parts by name.
	// The pattern captures host and path separately.
	re := regexp.MustCompile(`https?://(?P<host>[^/]+)(?P<path>/[^?]*)?`)

	match := re.FindStringSubmatch("https://gofaq.org/en/regex/")
	if match == nil {
		return
	}

	// SubexpIndex returns the array index for the named group.
	// Use the index to access the match slice safely.
	hostIdx := re.SubexpIndex("host")
	pathIdx := re.SubexpIndex("path")

	fmt.Println(match[hostIdx]) // "gofaq.org"
	fmt.Println(match[pathIdx]) // "/en/regex/"; "" when the URL has no path
}

In production code, resolve the indices once, outside the loop. SubexpIndex searches the list of group names on every call, which is wasted work in a tight loop. Compute the indices once during initialization and reuse them.
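One way to do that is with package-level variables. This is a sketch under assumptions: mustIndex and hostAndPath are hypothetical helpers written for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

var urlRe = regexp.MustCompile(`https?://(?P<host>[^/]+)(?P<path>/[^?]*)?`)

// mustIndex is a hypothetical helper. It resolves a group name once and
// panics with a readable message if the pattern has no such group.
func mustIndex(re *regexp.Regexp, name string) int {
	idx := re.SubexpIndex(name)
	if idx < 0 {
		panic(fmt.Sprintf("regexp %q has no group named %q", re.String(), name))
	}
	return idx
}

// Resolved once at package initialization, so a typo fails at startup.
var (
	hostIdx = mustIndex(urlRe, "host")
	pathIdx = mustIndex(urlRe, "path")
)

// hostAndPath performs no name lookups on the hot path.
func hostAndPath(rawURL string) (host, path string, ok bool) {
	m := urlRe.FindStringSubmatch(rawURL)
	if m == nil {
		return "", "", false
	}
	return m[hostIdx], m[pathIdx], true
}

func main() {
	for _, u := range []string{"https://gofaq.org/en/regex/", "http://example.com"} {
		host, path, ok := hostAndPath(u)
		fmt.Printf("%q %q %v\n", host, path, ok)
	}
}
```

A misspelled name now stops the program at startup with a message that names the pattern and the missing group, instead of failing on the first matching input.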

Cache the index. Panic on the typo.

Replacing with named groups

Named groups shine when transforming text. You can reference names directly in the replacement string using ${name} syntax. This avoids manual reconstruction of the output.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The pattern names the first and last parts of a name.
	re := regexp.MustCompile(`(?P<first>\w+) (?P<last>\w+)`)

	// ReplaceAllString supports ${name} syntax in the replacement.
	// This swaps the order of the captured groups.
	result := re.ReplaceAllString("Alice Bob", "${last}, ${first}")

	fmt.Println(result) // "Bob, Alice"
}

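You can also use $name without braces, but the name then extends as far as possible over letters, digits, and underscores, so $1x means ${1x}, not ${1}x. Braces are safer and more readable. The replacement string is a template. The engine substitutes the captured text for each named reference, replaces a reference to a name that is not in the pattern with the empty string, and treats $$ as a literal dollar sign.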
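The difference is easiest to see when an underscore follows the reference. In this sketch, swapBraced and swapBare are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
)

var nameRe = regexp.MustCompile(`(?P<first>\w+) (?P<last>\w+)`)

// swapBraced ends each group name explicitly with braces.
func swapBraced(s string) string {
	return nameRe.ReplaceAllString(s, "${last}_${first}")
}

// swapBare omits the braces. The first reference is read as a group named
// "last_", which does not exist, so it expands to the empty string.
func swapBare(s string) string {
	return nameRe.ReplaceAllString(s, "$last_$first")
}

func main() {
	fmt.Println(swapBraced("Alice Bob")) // Bob_Alice
	fmt.Println(swapBare("Alice Bob"))   // Alice
}
```

The bare form fails silently: no error, no panic, just missing text in the output.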

Replacement strings are templates. Names make them readable.

Pitfalls

The compiler cannot verify group names. If you pass a wrong name to SubexpIndex, the function returns -1. Indexing the match slice with -1 panics with the message panic: runtime error: index out of range [-1]. This happens at runtime, not compile time. Always check the index or cache it during initialization. If you cache the index and it is -1, you can fail fast with a clear error message.

regexp.MustCompile panics if the pattern is invalid. The panic message starts with panic: regexp: Compile(...): error parsing regexp: .... Use MustCompile only for static patterns known to be correct. If the pattern comes from user input or a config file, use regexp.Compile and handle the error. The community convention is MustCompile for literals and Compile for dynamic strings.
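For patterns that arrive at runtime, both checks can live in one place. The helper below is a sketch. The name compileWithGroups and its signature are assumptions for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileWithGroups compiles a pattern from an untrusted source and checks
// that it defines every group name the caller relies on.
func compileWithGroups(pattern string, required ...string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern: %w", err)
	}
	for _, name := range required {
		if re.SubexpIndex(name) < 0 {
			return nil, fmt.Errorf("pattern %q is missing group %q", pattern, name)
		}
	}
	return re, nil
}

func main() {
	// Missing closing parenthesis: Compile reports an error instead of panicking.
	_, err := compileWithGroups(`(?P<level>[A-Z]+`, "level")
	fmt.Println(err != nil) // true

	// Valid pattern, but the group is called "lvl", not "level".
	_, err = compileWithGroups(`(?P<lvl>[A-Z]+)`, "level")
	fmt.Println(err != nil) // true
}
```

The caller gets an ordinary error to log or return, and a config typo never reaches the code that indexes the match slice.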

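Duplicate names are a trap. The regexp documentation for SubexpIndex notes that a pattern can declare several groups with the same name, and that the function then returns the index of the leftmost one. Do not count on a compile error to catch a repeated name. Keep names unique within the pattern.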

Performance matters in hot paths. SubexpIndex searches the list of group names on every call. It is fast, but calling it millions of times adds up. Cache the indices. Also, FindStringSubmatch allocates a new []string for every match. If you only need positions, FindStringSubmatchIndex returns a []int of start and end offsets instead: group i occupies elements 2*i and 2*i+1, and a group that did not participate reports -1. It still returns a new slice on every call, so benchmark before assuming it is cheaper. You can extract substrings from the original string using these positions.
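Here is a minimal sketch of the offset-based form with a cached index. The yearSpan helper is illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var dateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// yearIdx is resolved once, not on every call.
var yearIdx = dateRe.SubexpIndex("year")

// yearSpan returns the byte offsets of the year group within s.
func yearSpan(s string) (start, end int, ok bool) {
	loc := dateRe.FindStringSubmatchIndex(s)
	if loc == nil {
		return 0, 0, false
	}
	// Group i occupies loc[2*i] and loc[2*i+1].
	return loc[2*yearIdx], loc[2*yearIdx+1], true
}

func main() {
	s := "released 2023-10-25"
	if start, end, ok := yearSpan(s); ok {
		fmt.Println(start, end, s[start:end]) // 9 13 2023
	}
}
```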

A typo in a name is a runtime bomb.

Decision matrix

Use named capture groups when the regex has many groups and the order might change, when you need to map results to a struct or map dynamically, or when the replacement string references specific parts of the match. Use unnamed groups with indices when the pattern is simple and the order is stable. Use strings.Split or strings.Index when you are parsing fixed delimiters and regex is overkill. Use a dedicated parser when the format is complex, like JSON or CSV.
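For comparison, here is the fixed-delimiter case without any regex. The splitDate helper is a hypothetical example. It only checks the number of fields, not that they are digits.

```go
package main

import (
	"fmt"
	"strings"
)

// splitDate splits a YYYY-MM-DD string on its fixed delimiter.
// It reports ok=false unless there are exactly three fields.
func splitDate(s string) (year, month, day string, ok bool) {
	parts := strings.Split(s, "-")
	if len(parts) != 3 {
		return "", "", "", false
	}
	return parts[0], parts[1], parts[2], true
}

func main() {
	year, month, day, ok := splitDate("2023-10-25")
	fmt.Println(year, month, day, ok) // 2023 10 25 true
}
```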

Regex is a parser of last resort.

Recap

Named capture groups let you give a specific label to a part of a regular expression pattern so you can retrieve that matched text by name instead of remembering its position number. This makes your code easier to read and maintain because you refer to data like 'year' or 'month' directly rather than counting which slice index holds it. Think of it like labeling boxes in a moving truck so you know exactly where your kitchen items are without having to count every box.