How to use the regexp package

Compile patterns with regexp.MustCompile and use MatchString to check if text matches your criteria.

Parsing strings with patterns

You are processing a batch of log files. Each line contains a timestamp, a severity level, and a message. The format looks consistent, but splitting by spaces fails the moment a message contains a space. You need a way to describe the structure of the line so the code can pull out exactly what you want.

The regexp package provides regular expressions. A regular expression is a pattern that describes a set of strings. You write the pattern once, compile it into a machine-readable form, and then use that compiled object to match, search, or replace text.

The stencil analogy

Think of a regex as a stencil. You cut the shape out of cardboard once. That cutting process takes effort. Once you have the stencil, you can stamp it over a stack of paper instantly. The stencil is the compiled regex. The paper is your text.

If you cut a new stencil for every single sheet, you waste time. The key to performance is compiling the pattern once and reusing the stencil. Go makes this explicit: you call Compile or MustCompile to cut the stencil, then call methods like MatchString or FindStringSubmatch to stamp it.

Minimal example

Here is the simplest way to check if a string matches a pattern.

package main

import (
	"fmt"
	"regexp"
)

// MustCompile panics if the pattern is invalid, which is safe for static patterns
// defined in your source code. It ensures you catch syntax errors at startup.
var phoneRe = regexp.MustCompile(`\d{3}-\d{3}-\d{4}`)

func main() {
	// MatchString returns true if the pattern matches anywhere in the text
	if phoneRe.MatchString("Call 555-123-4567 now") {
		fmt.Println("Found a phone number")
	}

	// MatchString returns false if there is no match
	if !phoneRe.MatchString("No numbers here") {
		fmt.Println("No phone number found")
	}
}

The pattern \d{3}-\d{3}-\d{4} matches three digits, a dash, three digits, a dash, and four digits. MatchString scans the input and returns a boolean. It does not return the matched text.
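When the matched text itself is needed, FindString and FindAllString return it instead of a boolean. The sketch below reuses the phone pattern from the example above; the helper names firstPhone and allPhones are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the minimal example, compiled once at package level.
var phoneRe = regexp.MustCompile(`\d{3}-\d{3}-\d{4}`)

// firstPhone (hypothetical helper) returns the leftmost match,
// or "" if the pattern does not match.
func firstPhone(text string) string {
	return phoneRe.FindString(text)
}

// allPhones (hypothetical helper) returns every non-overlapping match.
// The second argument -1 means "no limit on the number of matches".
func allPhones(text string) []string {
	return phoneRe.FindAllString(text, -1)
}

func main() {
	fmt.Println(firstPhone("Call 555-123-4567 now"))
	fmt.Println(allPhones("Office 555-123-4567, mobile 555-987-6543"))
}
```

FindString cannot distinguish "no match" from "matched the empty string". That does not matter for this pattern, which can never match empty text, but FindStringIndex is the safer choice when it does.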

Compile the pattern once. Stamp many times.

How the engine works

The regexp package accepts the RE2 syntax and follows the same design: it compiles your pattern into a finite state machine and never resorts to unbounded backtracking. The package guarantees that matching runs in time linear in the length of the input. A longer or more complex pattern makes each step more expensive, but no input can make the cost grow exponentially.

This design has trade-offs. RE2 does not support backreferences. You cannot write a pattern that matches <tag>content</tag> by capturing <tag> and referring back to it later. If you need backreferences, you need a different tool. The trade-off is worth it: RE2 is immune to catastrophic backtracking. A malicious user cannot craft a string that hangs your program with a complex regex.
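To make the guarantee concrete, here is a small sketch using a nested quantifier, a shape that is commonly cited as triggering exponential backtracking in backtracking engines when the input almost matches. The pattern, the input size, and the helper name onlyAs are illustrative choices, not taken from any benchmark.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nestedRe has a quantifier inside a quantifier. Backtracking engines
// can take exponential time on near-matches of patterns like this.
var nestedRe = regexp.MustCompile(`^(a+)+$`)

// onlyAs (hypothetical helper) reports whether s is one or more
// letters "a" and nothing else.
func onlyAs(s string) bool {
	return nestedRe.MatchString(s)
}

func main() {
	// 100,000 a's followed by a b: an almost-match.
	input := strings.Repeat("a", 100000) + "b"

	fmt.Println(onlyAs(input))  // false
	fmt.Println(onlyAs("aaaa")) // true
}
```

The first call returns promptly because the work grows with the input length rather than with the number of ways the pattern could be split across the input.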

The compiled *regexp.Regexp value is safe for concurrent use. Multiple goroutines can call MatchString, FindString, or any other matching method on the same compiled regex simultaneously, and you do not need a mutex of your own around it. The exception is configuration methods such as Longest, which should be called before the regex is shared. You can define a regex at the package level and share it across your entire application.

// Package-level variables are the standard convention for static regex patterns.
// The pattern compiles once when the package initializes, not every time a function runs.
var emailRe = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

// ValidateEmail checks if a string looks like an email address
// The function reuses the compiled regex from the package level
func ValidateEmail(s string) bool {
	return emailRe.MatchString(s)
}
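As a sketch of the concurrency guarantee, the following program shares one compiled regex across several goroutines. The pattern and the function name countErrorLines are hypothetical. The mutex protects only the shared counter; the regex itself needs no protection.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// errorRe is compiled once and shared by every goroutine below.
var errorRe = regexp.MustCompile(`\bERROR\b`)

// countErrorLines (hypothetical helper) checks each line in its own
// goroutine and returns how many lines contain the word ERROR.
func countErrorLines(lines []string) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex // guards count, not the regex
		count int
	)
	for _, line := range lines {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			if errorRe.MatchString(l) {
				mu.Lock()
				count++
				mu.Unlock()
			}
		}(line)
	}
	wg.Wait()
	return count
}

func main() {
	lines := []string{
		"ERROR: disk full",
		"INFO: started",
		"ERROR: connection refused",
	}
	fmt.Println(countErrorLines(lines)) // 2
}
```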

Use MustCompile for patterns hardcoded in your source. Use Compile when the pattern comes from user input or a configuration file. Compile returns an error instead of panicking.

// Compile returns an error if the pattern is invalid
// Use this when the pattern is dynamic or user-supplied
re, err := regexp.Compile(userProvidedPattern)
if err != nil {
	// The compiler rejects invalid patterns with a descriptive error
	// error parsing regexp: missing closing ): ...
	return err
}
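The fragment above can be wrapped into a complete program. This is a minimal sketch: compileFilter and the example patterns are hypothetical, and wrapping the error with %w is one reasonable choice rather than a requirement.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileFilter (hypothetical helper) compiles a pattern that came from
// outside the program and reports a wrapped error instead of panicking.
func compileFilter(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid filter %q: %w", pattern, err)
	}
	return re, nil
}

func main() {
	// An unbalanced parenthesis is rejected with an error.
	if _, err := compileFilter("("); err != nil {
		fmt.Println("rejected:", err)
	}

	// A valid pattern compiles and can be used immediately.
	re, err := compileFilter(`^\d+$`)
	if err == nil {
		fmt.Println(re.MatchString("12345")) // true
	}
}
```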

RE2 is safe. No backreferences, no hangs.

Extracting and replacing

Matching tells you if a pattern exists. Often you need the data inside the match. FindStringSubmatch returns the full match plus any captured groups.

package main

import (
	"fmt"
	"regexp"
)

// Named groups use the (?P<name>...) syntax
// This makes the code self-documenting and allows named references in replacements
var logPattern = regexp.MustCompile(`^\[(?P<date>\d{4}-\d{2}-\d{2})\] (?P<level>\w+): (?P<message>.+)$`)

func main() {
	line := "[2023-10-27] ERROR: connection refused"

	// FindStringSubmatch returns nil if no match is found
	// Otherwise it returns a slice of strings: full match, then each group
	matches := logPattern.FindStringSubmatch(line)
	if matches == nil {
		fmt.Println("No match")
		return
	}

	// SubexpNames returns the names of the groups
	// Index 0 is always empty, index 1 is the first group, etc.
	names := logPattern.SubexpNames()

	// Iterate over matches and names to extract values by name
	for i, name := range names {
		if name != "" && i < len(matches) {
			fmt.Printf("%s: %s\n", name, matches[i])
		}
	}
}

The output shows date: 2023-10-27, level: ERROR, and message: connection refused. The slice matches contains the full match at index 0, then the groups in order. SubexpNames maps indices to names. This avoids magic numbers like matches[2].
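If named fields are needed in more than one place, the loop can be packaged into a helper. The sketch below shows two variants under hypothetical names: parseLogLine builds a map from SubexpNames, and levelOf looks up a single group with SubexpIndex, which assumes Go 1.15 or later.

```go
package main

import (
	"fmt"
	"regexp"
)

var logPattern = regexp.MustCompile(`^\[(?P<date>\d{4}-\d{2}-\d{2})\] (?P<level>\w+): (?P<message>.+)$`)

// parseLogLine (hypothetical helper) returns the named groups as a map,
// or nil if the line does not match.
func parseLogLine(line string) map[string]string {
	matches := logPattern.FindStringSubmatch(line)
	if matches == nil {
		return nil
	}
	fields := make(map[string]string)
	// SubexpNames and matches have the same length, so index i is valid in both.
	for i, name := range logPattern.SubexpNames() {
		if name != "" {
			fields[name] = matches[i]
		}
	}
	return fields
}

// levelOf (hypothetical helper) extracts one group by name.
// SubexpIndex returns -1 if no group has that name.
func levelOf(line string) string {
	matches := logPattern.FindStringSubmatch(line)
	idx := logPattern.SubexpIndex("level")
	if matches == nil || idx < 0 {
		return ""
	}
	return matches[idx]
}

func main() {
	line := "[2023-10-27] ERROR: connection refused"
	fmt.Println(parseLogLine(line)["message"]) // connection refused
	fmt.Println(levelOf(line))                 // ERROR
}
```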

You can also replace text. ReplaceAllString swaps matches with a replacement string. Use $1, $2, or $name to refer to captured groups.

// Reverse "First Last" to "Last, First"
// The pattern captures the first and last name in groups
re := regexp.MustCompile(`(\w+) (\w+)`)

// $1 and $2 refer to the captured groups in the replacement string
result := re.ReplaceAllString("John Doe", "$2, $1")
// result is "Doe, John"

Named groups work in replacements too. $date refers to the group named date. This makes replacement strings readable.
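Here is a short sketch of named replacements, together with one trap worth knowing: in the $name form the name is taken to be as long as possible, so $1_old is read as a group called 1_old, not as group 1 followed by _old. Braces remove the ambiguity. The patterns and helper names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var dateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
var wordRe = regexp.MustCompile(`^(\w+)$`)

// toDayFirst (hypothetical helper) rewrites YYYY-MM-DD as DD/MM/YYYY
// using named group references.
func toDayFirst(s string) string {
	return dateRe.ReplaceAllString(s, "${day}/${month}/${year}")
}

// addSuffix uses ${1} so that "_old" is not read as part of the group name.
func addSuffix(s string) string {
	return wordRe.ReplaceAllString(s, "${1}_old")
}

// addSuffixBroken shows the trap: $1_old refers to a group named "1_old",
// which does not exist, so the match is replaced with an empty string.
func addSuffixBroken(s string) string {
	return wordRe.ReplaceAllString(s, "$1_old")
}

func main() {
	fmt.Println(toDayFirst("Released 2023-10-27")) // Released 27/10/2023
	fmt.Println(addSuffix("report"))               // report_old
	fmt.Printf("%q\n", addSuffixBroken("report"))  // ""
}
```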

Pitfalls and errors

Dynamic patterns require error handling. If you pass a bad pattern to Compile, you get an error. The error message describes the problem.

// Compile returns an error for invalid syntax
// error parsing regexp: invalid escape sequence: `\q`
_, err := regexp.Compile(`\q`)
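When a program needs to react to the kind of problem rather than just print it, the error can be inspected. This sketch assumes the error returned by Compile is (or wraps) a *syntax.Error from regexp/syntax, which matches the current standard library behavior; patternProblem is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"regexp/syntax"
)

// patternProblem (hypothetical helper) returns "" for a valid pattern.
// For an invalid one it returns the syntax error code as text, falling
// back to the full error message if the error has another type.
func patternProblem(pattern string) string {
	_, err := regexp.Compile(pattern)
	if err == nil {
		return ""
	}
	var synErr *syntax.Error
	if errors.As(err, &synErr) {
		// Code names the kind of problem; Expr holds the offending fragment.
		return string(synErr.Code)
	}
	return err.Error()
}

func main() {
	fmt.Println(patternProblem(`\q`)) // invalid escape sequence
	fmt.Println(patternProblem("("))  // missing closing )
	fmt.Printf("%q\n", patternProblem(`\d+`))
}
```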

If you use MustCompile with a dynamic pattern and that pattern turns out to be invalid, the program panics. MustCompile is only for static patterns where you control the source.

// MustCompile panics on invalid patterns
// panic: regexp: Compile(...): error parsing regexp: missing closing ): ...
regexp.MustCompile(`(`)

Go's regex engine works on UTF-8 text, and . matches a full Unicode code point, but the Perl-style classes are ASCII-only. \w matches only [0-9A-Za-z_], and \d matches only [0-9]. If you need Unicode letters or digits, use the Unicode classes \p{L} and \p{Nd} instead.
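The quickest way to see which characters a class accepts is to test it. In Go's syntax, \w and \d cover ASCII only, while \p{L} and \p{Nd} cover Unicode letters and decimal digits. The sketch below compares them on a word with an accented letter and on Arabic-Indic digits; the variable and function names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	asciiWordRe    = regexp.MustCompile(`^\w+$`)
	unicodeWordRe  = regexp.MustCompile(`^[\p{L}\p{Nd}_]+$`)
	asciiDigitRe   = regexp.MustCompile(`^\d+$`)
	unicodeDigitRe = regexp.MustCompile(`^\p{Nd}+$`)
)

// isASCIIWord reports whether s consists only of [0-9A-Za-z_].
func isASCIIWord(s string) bool { return asciiWordRe.MatchString(s) }

// isUnicodeWord also accepts letters and digits outside ASCII.
func isUnicodeWord(s string) bool { return unicodeWordRe.MatchString(s) }

// isASCIIDigits reports whether s consists only of [0-9].
func isASCIIDigits(s string) bool { return asciiDigitRe.MatchString(s) }

// isUnicodeDigits accepts any Unicode decimal digit.
func isUnicodeDigits(s string) bool { return unicodeDigitRe.MatchString(s) }

func main() {
	word := "caf\u00e9"     // "café" with a precomposed e-acute
	digits := "\u0663\u0664" // Arabic-Indic digits three and four

	fmt.Println(isASCIIWord(word), isUnicodeWord(word))         // false true
	fmt.Println(isASCIIDigits(digits), isUnicodeDigits(digits)) // false true
}
```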

Backreferences are not supported. You cannot match balanced tags or repeated substrings with a single regex. Use a parser or a different algorithm for those cases.

// MustCompile panics here because backreferences are not supported
// panic: regexp: Compile(...): error parsing regexp: invalid escape sequence: `\1`
regexp.MustCompile(`(<\w+>)\1`)
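One common workaround is to capture both pieces separately and compare them in Go code. The sketch below handles a single, non-nested tag pair only; it is not an HTML parser, and matchedTag is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagRe captures the opening tag name, the content, and the closing tag
// name as three independent groups. No backreference is involved.
var tagRe = regexp.MustCompile(`^<(\w+)>(.*)</(\w+)>$`)

// matchedTag (hypothetical helper) returns the tag name and content when
// the opening and closing names agree. The comparison that a backreference
// would perform happens here, in ordinary code.
func matchedTag(s string) (name, content string, ok bool) {
	m := tagRe.FindStringSubmatch(s)
	if m == nil || m[1] != m[3] {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	fmt.Println(matchedTag("<b>bold</b>")) // b bold true
	_, _, ok := matchedTag("<b>bold</i>")
	fmt.Println(ok) // false
}
```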

The worst regex bug is the one that silently matches the wrong thing. Test your patterns with edge cases: empty strings, very long strings, and strings with special characters.
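A typical silent mismatch comes from a missing anchor. The sketch below runs a handful of edge cases against an unanchored and an anchored version of the phone pattern; the case list and helper names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// looseRe matches a phone number anywhere inside the text.
	looseRe = regexp.MustCompile(`\d{3}-\d{3}-\d{4}`)
	// strictRe requires the whole text to be a phone number.
	strictRe = regexp.MustCompile(`^\d{3}-\d{3}-\d{4}$`)
)

// looseMatch reports whether s contains a phone number somewhere.
func looseMatch(s string) bool { return looseRe.MatchString(s) }

// strictMatch reports whether s is exactly one phone number.
func strictMatch(s string) bool { return strictRe.MatchString(s) }

func main() {
	cases := []string{
		"555-123-4567",            // well-formed
		"",                        // empty string
		"99555-123-456789",        // extra digits on both sides
		strings.Repeat("5", 5000), // very long input
	}
	for _, c := range cases {
		label := c
		if len(label) > 20 {
			label = label[:20] + "..."
		}
		fmt.Printf("%-25q loose=%v strict=%v\n", label, looseMatch(c), strictMatch(c))
	}
}
```

The third case is the interesting one: the unanchored pattern accepts it because a valid number is buried inside, while the anchored pattern rejects it.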

Package-level variables for static patterns. Check errors for dynamic patterns.

When to use regexp

Regular expressions are powerful, but they are not always the right tool. Picking the simplest tool makes your code easier to read and faster to run.

Use strings.Contains when you need to check for a literal substring without any pattern logic.

Use strings.HasPrefix or strings.HasSuffix when checking the start or end of a string.

Use regexp when you need to match patterns, extract groups, or validate complex structures.

Use encoding/json when parsing structured data like JSON; regex is the wrong tool for structured formats.

Use strconv when converting strings to numbers or booleans.

Use bufio.Scanner when processing large streams of text line by line. Its default split function already splits on lines; supply a custom one only for other token shapes.
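These tools combine well. The sketch below reads text line by line with bufio.Scanner, discards obvious non-matches with strings.HasPrefix, and runs the regex only on the remaining lines. The log format follows the earlier example; levelsFromText is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// levelRe captures the severity level of a line shaped like
// "[2023-10-27] ERROR: message".
var levelRe = regexp.MustCompile(`^\[\d{4}-\d{2}-\d{2}\] (\w+): `)

// levelsFromText (hypothetical helper) scans text line by line and
// collects the severity level of every well-formed log line.
func levelsFromText(text string) ([]string, error) {
	var levels []string
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := scanner.Text()
		// Cheap literal check first: skip lines that cannot match.
		if !strings.HasPrefix(line, "[") {
			continue
		}
		if m := levelRe.FindStringSubmatch(line); m != nil {
			levels = append(levels, m[1])
		}
	}
	// Err reports problems such as a line longer than the scanner buffer.
	return levels, scanner.Err()
}

func main() {
	text := "[2023-10-27] ERROR: disk full\nnot a log line\n[2023-10-28] INFO: ok\n"
	levels, err := levelsFromText(text)
	fmt.Println(levels, err) // [ERROR INFO] <nil>
}
```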

Where to go next