Parsing text with patterns
You have a log file full of noise. Somewhere in the mess are transaction IDs that look like TXN-88234. You need to pull them out. You could write a loop checking character by character, but that gets tedious fast. Regular expressions exist to describe patterns so you don't have to write the parser yourself. Go's regexp package gives you that power, but it behaves differently than the regex engines you might know from Python or JavaScript.
The stencil analogy
A regular expression is a compact language for describing text patterns. Think of it like a stencil. You cut a shape out of cardboard, hold it up to a wall, and wherever the light shines through, you paint. The regex is the stencil. The string is the wall. The match is the paint.
Go compiles your stencil into a state machine. Compilation is the expensive step, and matching against the compiled form is fast. You write the pattern once, pay the compilation price, and then reuse the compiled regex thousands of times. This design makes Go's regex safe and predictable: the engine guarantees that matching time grows linearly with the length of the input. Your regex will never hang on a pathological input, even if the pattern is complex.
Compile once, match many
Here's the basic workflow: compile a pattern, then use it to test or extract.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Compile once. MustCompile panics if the pattern is invalid,
	// which is safe for hardcoded patterns known to be correct.
	re := regexp.MustCompile(`\d+`)

	// MatchString returns true if the pattern exists anywhere in the text.
	if re.MatchString("order #12345") {
		fmt.Println("Found digits")
	}

	// FindAllString extracts every non-overlapping match.
	// The -1 argument means "no limit on the number of results."
	ids := re.FindAllString("orders: 12, 34, 56", -1)
	fmt.Println(ids) // [12 34 56]
}
The regexp package separates compilation from matching. regexp.MustCompile parses your pattern string and builds an internal state machine. If the pattern has a syntax error, MustCompile panics immediately. This is the right choice for patterns hardcoded in your source code. When the call initializes a package-level variable, a broken pattern crashes the program at startup instead of failing silently later.
If you pass a bad pattern such as [invalid to MustCompile, the program crashes with a panic message like panic: regexp: Compile(`[invalid`): error parsing regexp: missing closing ]: `[invalid`.
When the pattern comes from user input, a configuration file, or a database, use regexp.Compile instead. It returns a *Regexp and an error. Check the error. If it's nil, you have a valid regex. This keeps your program safe when patterns aren't under your direct control. The idiomatic error check is verbose by design. Write if err != nil { return err }. The boilerplate makes the unhappy path visible.
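The error-checked path can be sketched as follows. This is a minimal illustration, not a prescribed API: the helper name compileUserPattern and the sample patterns are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileUserPattern is a hypothetical helper. It compiles a pattern that
// came from outside the program and wraps any syntax error with context.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re, nil
}

func main() {
	// Sample inputs standing in for user-supplied patterns.
	for _, p := range []string{`^TXN-\d+$`, `[invalid`} {
		re, err := compileUserPattern(p)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", re.String())
	}
}
```

The bad pattern produces an ordinary error value, so the caller decides whether to report it, fall back to a default, or abort.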
How the engine works
Go's regexp package follows the RE2 design: it accepts RE2 syntax and matches with automata instead of backtracking. RE2 prioritizes safety and speed over features. It guarantees that matching takes time proportional to the length of the input. This prevents catastrophic backtracking, a failure mode where certain patterns make a backtracking engine run for hours on specific inputs.
The trade-off is missing features. Go does not support backreferences or lookarounds. If you use \1 or (?<=...), Compile returns an error and MustCompile panics. Paste a Python regex with a lookbehind into Go and you get an error along the lines of error parsing regexp: invalid or unsupported Perl syntax: `(?<`, though the exact wording depends on the Go version. You have to rewrite the pattern using the syntax Go supports.
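As a rough sketch of what such a rewrite can look like, the example below checks that two unsupported constructs fail to compile, then replaces a lookbehind with a literal prefix plus a capture group. The helper names compiles and dollarAmount, and the dollar-amount scenario itself, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// priceRe stands in for the lookbehind pattern (?<=\$)\d+.
// It matches the literal "$" and captures the digits after it.
var priceRe = regexp.MustCompile(`\$(\d+)`)

// dollarAmount returns the digits after the first "$", or "" if there are none.
func dollarAmount(s string) string {
	m := priceRe.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(compiles(`(\w+) \1`))       // false: backreference
	fmt.Println(compiles(`(?<=\$)\d+`))     // false: lookbehind
	fmt.Println(dollarAmount("total: $42")) // 42
}
```

The capture group does the job the lookbehind did: the "$" is part of the match, but only the digits are returned.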
A *Regexp is safe for concurrent use by multiple goroutines, with the exception of configuration methods such as Longest. You can compile a regex once and share it across hundreds of goroutines. This makes package-level variables the perfect home for regex patterns. Store the compiled regex at the top of your file. Every function in the package can use it without locking.
package main

import (
	"fmt"
	"regexp"
)

// emailRe is compiled once when the package loads.
// It is safe for concurrent use by multiple goroutines.
var emailRe = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

func ValidateEmail(email string) bool {
	// MatchString is safe to call from many goroutines simultaneously.
	return emailRe.MatchString(email)
}

func main() {
	fmt.Println(ValidateEmail("user@example.com")) // true
	fmt.Println(ValidateEmail("bad-email"))        // false
}
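To make the concurrency claim concrete, here is a small sketch that shares one compiled regex across several goroutines. The function countMatches and the sample log lines are hypothetical. The mutex protects the counter, not the regex.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// txnRe is shared by every goroutine below without any locking.
var txnRe = regexp.MustCompile(`\bTXN-\d+\b`)

// countMatches checks each line in its own goroutine and returns
// how many lines contain a transaction ID.
func countMatches(lines []string) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex // guards count only
		count int
	)
	for _, line := range lines {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			if txnRe.MatchString(l) {
				mu.Lock()
				count++
				mu.Unlock()
			}
		}(line)
	}
	wg.Wait()
	return count
}

func main() {
	lines := []string{
		"Payment TXN-1 completed",
		"heartbeat",
		"Refund TXN-22 issued",
	}
	fmt.Println(countMatches(lines)) // 2
}
```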
Extracting data with capture groups
Real code often needs to extract specific parts of a match. Capture groups let you isolate data inside a larger pattern. Parentheses in the regex define groups. FindStringSubmatch returns the full match plus all captured groups.
package main

import (
	"fmt"
	"regexp"
)

// txnRe is compiled once at package load, not on every call.
// Parentheses define capture group 1.
// \b prevents matching inside longer strings like "CTXN-123".
var txnRe = regexp.MustCompile(`\bTXN-(\d+)\b`)

// ExtractTransactionID finds a transaction ID in a log message.
// It returns the numeric part of a TXN-XXXX pattern.
func ExtractTransactionID(logLine string) string {
	// FindStringSubmatch returns nil if no match is found.
	// matches[1] holds the content of the first capture group.
	matches := txnRe.FindStringSubmatch(logLine)
	if matches == nil {
		return ""
	}
	return matches[1]
}

func main() {
	line := "Payment TXN-998877 completed"
	fmt.Println(ExtractTransactionID(line)) // 998877
}
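When a log contains many IDs, the same capture group works with FindAllStringSubmatch, which returns one slice per match. The sketch below assumes a hypothetical helper allTransactionIDs and a made-up log excerpt.

```go
package main

import (
	"fmt"
	"regexp"
)

var txnRe = regexp.MustCompile(`\bTXN-(\d+)\b`)

// allTransactionIDs returns the numeric part of every TXN-<digits>
// token found in the log text, in order of appearance.
func allTransactionIDs(log string) []string {
	var ids []string
	// Each element m holds the full match in m[0] and group 1 in m[1].
	for _, m := range txnRe.FindAllStringSubmatch(log, -1) {
		ids = append(ids, m[1])
	}
	return ids
}

func main() {
	log := "start TXN-88234 ok\nnoise noise\nretry TXN-88235 failed"
	fmt.Println(allTransactionIDs(log)) // [88234 88235]
}
```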
Magic numbers in matches[1] are fragile. If you add a group, indices shift. Named groups fix this. Use (?P<name>...) to name a group. Access it via SubexpIndex.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Named groups improve readability.
	// (?P<ip>...) names the first group "ip".
	re := regexp.MustCompile(`^(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<date>[^\]]+)\]`)
	line := "192.168.1.55 - - [10/Oct/2023:13:55:36]"
	matches := re.FindStringSubmatch(line)
	if matches == nil {
		return
	}

	// SubexpIndex returns the index for the named group,
	// or -1 if no group has that name.
	ipIdx := re.SubexpIndex("ip")
	dateIdx := re.SubexpIndex("date")
	if ipIdx < 0 || dateIdx < 0 {
		return // a group name is misspelled
	}
	fmt.Println("IP:", matches[ipIdx])
	fmt.Println("Date:", matches[dateIdx])
}
Named groups make the code self-documenting. SubexpIndex("ip") is clearer than matches[1]. Check the index value. If SubexpIndex returns -1, no group has that name, and using -1 as a slice index panics with an index out of range error. A typo in the name therefore crashes the program unless you check first.
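One way to avoid index bookkeeping altogether is to build a map from group names to captured text using SubexpNames. The helper namedGroups below is a hypothetical sketch, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

var logRe = regexp.MustCompile(`^(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<date>[^\]]+)\]`)

// namedGroups returns a map from group name to captured text,
// or nil if the pattern does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have an empty name
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	g := namedGroups(logRe, "192.168.1.55 - - [10/Oct/2023:13:55:36]")
	fmt.Println(g["ip"])                // 192.168.1.55
	fmt.Println(g["date"])              // 10/Oct/2023:13:55:36
	fmt.Println(logRe.SubexpIndex("x")) // -1: no group named "x"
}
```

A misspelled key in the map yields an empty string instead of a panic, so callers that need to distinguish "missing" from "empty" should use the two-value map lookup.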
Transforming text dynamically
Sometimes you need to transform matches based on their content. ReplaceAllStringFunc calls a function for every match. The function receives the matched string and returns the replacement.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	// Pattern matches words starting with a capital letter.
	re := regexp.MustCompile(`\b[A-Z][a-z]+\b`)

	// ReplaceAllStringFunc calls the function for every match.
	// The function receives the matched string and returns the replacement.
	text := "Go is fun. Python is fun. Rust is fun."
	result := re.ReplaceAllStringFunc(text, strings.ToLower)
	fmt.Println(result) // go is fun. python is fun. rust is fun.
}
This is useful for redacting sensitive data, normalizing formats, or building simple templating systems. The function runs in the context of the match. You can inspect the match and decide what to return.
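A redaction pass might look like the sketch below. The 16-digit pattern and the helper redactCards are illustrative assumptions. Real card-number detection needs more than a digit count.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// cardRe matches standalone runs of exactly 16 digits.
var cardRe = regexp.MustCompile(`\b\d{16}\b`)

// redactCards masks every 16-digit run, keeping only the last four digits.
func redactCards(s string) string {
	return cardRe.ReplaceAllStringFunc(s, func(m string) string {
		// Inspect the match and build its replacement.
		return strings.Repeat("*", len(m)-4) + m[len(m)-4:]
	})
}

func main() {
	fmt.Println(redactCards("card 4111111111111111 charged"))
	// card ************1111 charged
}
```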
Pitfalls and strict rules
Go's regex syntax is a subset of Perl-compatible regex. You lose backreferences, lookarounds, and other backtracking-dependent features such as possessive quantifiers and atomic groups. You gain linear time guarantees and safe concurrent use. Adapt your patterns to the engine.
Use Match when you have a []byte. Use MatchString when you have a string. Using the wrong one forces a conversion. If you're processing raw network data, stick to bytes. If you're handling user input as strings, use the string methods.
Both MustCompile and Compile reject unsupported syntax when the pattern is compiled. That happens at run time, not when the Go compiler builds your program. MustCompile panics. Compile returns an error. Handle it. Don't ignore the error from Compile. An invalid pattern means your logic is broken.
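The byte and string variants can be sketched side by side. The helper firstNumber and the sample payload are hypothetical stand-ins for data read from a connection.

```go
package main

import (
	"fmt"
	"regexp"
)

var digitsRe = regexp.MustCompile(`\d+`)

// firstNumber returns the first run of digits in raw without
// converting the byte slice to a string. It returns nil if none exists.
func firstNumber(raw []byte) []byte {
	return digitsRe.Find(raw)
}

func main() {
	raw := []byte("status=200 bytes=512") // e.g. bytes read from a socket

	fmt.Println(digitsRe.Match(raw))                    // true
	fmt.Printf("%s\n", firstNumber(raw))                // 200
	fmt.Println(digitsRe.MatchString("no digits here")) // false
}
```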
Regex is slower than direct string operations for exact matches. If you're checking for a fixed substring, use strings.Contains. If you're checking a prefix, use strings.HasPrefix. Regex adds overhead. Use it only when you need pattern matching power.
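For fixed text, the strings package covers the common cases directly. The helper names and the log line below are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isErrorLine reports whether a log line starts with the fixed prefix "ERROR".
func isErrorLine(line string) bool {
	return strings.HasPrefix(line, "ERROR")
}

// mentionsTimeout reports whether the line contains the fixed substring "timeout".
func mentionsTimeout(line string) bool {
	return strings.Contains(line, "timeout")
}

func main() {
	line := "ERROR: upstream timeout after 30s"
	fmt.Println(isErrorLine(line))      // true
	fmt.Println(mentionsTimeout(line))  // true
	fmt.Println(isErrorLine("INFO ok")) // false
}
```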
RE2 is fast but strict. No backreferences. No lookarounds. Write the pattern the Go way.
Choosing the right tool
Use regexp.MustCompile when the pattern is hardcoded and known to be valid. Use regexp.Compile when the pattern comes from user input, configuration, or a database. Use MatchString when you only need to know if a pattern exists in the text. Use FindStringSubmatch when you need to extract specific parts of the match using capture groups. Use ReplaceAllStringFunc when you want to modify the text based on dynamic logic for each match. Use a simple string method like strings.Contains or strings.HasPrefix when you don't need regex power.
Regex is a power tool. Don't reach for it when a screwdriver will do.