The cost of recompiling
You are writing a log parser. The loop processes thousands of lines per second. Inside the loop, you call regexp.Compile on every iteration to check if a line matches a pattern. The CPU usage spikes. Latency jumps. The profiler shows the regex engine consuming cycles. The problem isn't the regex pattern. The problem is that you are compiling the same pattern over and over.
Regex compilation is the step where the engine translates a human-readable pattern string into a machine-readable state machine. This translation takes work. If you compile inside a hot loop, you pay that cost on every iteration. The fix is to compile once, store the result, and reuse it. The compiled regex is cheap to pass around and safe to share across goroutines.
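As a minimal sketch of that fix, the function below compiles before the loop and only matches inside it. The name countMatches, the ^ERROR\b pattern, and the sample lines are illustrative assumptions, not part of any real log parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches compiles the pattern once, before the loop, and reuses
// the compiled value for every line.
func countMatches(lines []string) (int, error) {
	re, err := regexp.Compile(`^ERROR\b`)
	if err != nil {
		return 0, err
	}
	count := 0
	for _, line := range lines {
		// No compilation happens here, only matching.
		if re.MatchString(line) {
			count++
		}
	}
	return count, nil
}

func main() {
	lines := []string{
		"ERROR disk full",
		"INFO started",
		"ERROR timeout",
	}
	n, err := countMatches(lines)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(n) // prints 2
}
```

Moving the regexp.Compile call inside the for loop would give the same count, but the parser would run once per line instead of once per call.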
What compilation actually does
A regex pattern is just a string. The string ^\d{3}-\d{2}-\d{4}$ contains characters that describe a rule, but the computer cannot execute the string directly. The regex engine needs a data structure that represents the logic of the pattern so it can scan text efficiently.
Compilation parses the pattern string and builds that data structure. The result is a *regexp.Regexp value. This pointer holds the compiled state machine. Once you have the *regexp.Regexp, matching text is fast because the engine skips the parsing step and runs the state machine directly against the input.
Think of compilation like translating a recipe into muscle memory. The first time you cook a dish, you read the instructions, understand the order, and plan the actions. That takes mental effort. Once you have cooked it enough times, you can execute the steps without reading the words. regexp.Compile is the learning phase. The *regexp.Regexp is the trained skill. You learn once and cook many times.
Go's regex engine is based on RE2. This engine guarantees linear-time execution: for a given pattern, match time grows in proportion to the length of the input. This rules out catastrophic backtracking, a class of bugs where a regex hangs on malicious input. You can trust matching to stay fast, provided you compile the pattern once and reuse the result.
Minimal example: compile and check
Here is the standard way to compile a pattern and handle syntax errors. The function returns a pointer to the compiled regex and an error. If the pattern has invalid syntax, the error is non-nil.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Compile returns a pointer to a Regexp and an error.
	// The error is non-nil if the pattern has invalid syntax.
	re, err := regexp.Compile(`^\d{3}-\d{2}-\d{4}$`)
	if err != nil {
		// Handle the error. In a real app, you might return or log.
		// The error check is verbose by design.
		// It makes the failure path visible and forces you to handle bad patterns.
		fmt.Println("Bad pattern:", err)
		return
	}

	// Use the compiled regex to match a string.
	// MatchString returns true if the input matches the pattern.
	fmt.Println(re.MatchString("123-45-6789"))
	fmt.Println(re.MatchString("bad-input"))
}
If you forget to import the package, the compiler rejects the program with undefined: regexp. If you import regexp but never use it, the compiler rejects the program with "regexp" imported and not used. Go's error messages are plain text. A malformed pattern is a different kind of failure. The compiler cannot check the pattern string because it is just data, so regexp.Compile reports the problem at runtime with an error like error parsing regexp: missing closing ): ... You must handle that error in your code.
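The sketch below shows the runtime half of that story. The program builds without complaint, and the bad pattern only surfaces when Compile runs. The helper tryCompile is a hypothetical name introduced for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// tryCompile reports whether a pattern compiles. When it does not,
// msg carries the parser's error text.
func tryCompile(pattern string) (ok bool, msg string) {
	if _, err := regexp.Compile(pattern); err != nil {
		return false, err.Error()
	}
	return true, ""
}

func main() {
	// The opening parenthesis is never closed, so Compile fails at runtime.
	ok, msg := tryCompile(`(\d{3}`)
	fmt.Println(ok, msg)

	// A well-formed pattern compiles and leaves msg empty.
	ok, msg = tryCompile(`\d{3}`)
	fmt.Println(ok, msg)
}
```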
Walk through the lifecycle
When you call regexp.Compile, the engine reads the pattern string character by character. It validates the syntax, resolves escapes, and builds an internal representation. This representation includes the state machine and metadata about the pattern. The function allocates memory for this structure and returns a pointer.
The *regexp.Regexp value is safe for concurrent use. Multiple goroutines can call MatchString on the same *regexp.Regexp simultaneously without locks. The engine is designed for this. You can store the compiled regex in a package-level variable or a struct field and share it across your entire application.
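Here is a minimal sketch of that sharing, assuming a package-level variable named ssn and a hypothetical helper checkAll. Every goroutine calls MatchString on the same compiled value with no mutex around it.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// ssn is shared by every goroutine below. No mutex guards it.
var ssn = regexp.MustCompile(`^\d{3}-\d{2}-\d{4}$`)

// checkAll validates each input in its own goroutine and returns the
// results in input order. Each goroutine writes to a distinct slice
// element, so the results slice needs no lock either.
func checkAll(inputs []string) []bool {
	results := make([]bool, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			results[i] = ssn.MatchString(in)
		}(i, in)
	}
	wg.Wait()
	return results
}

func main() {
	// Prints [true false true].
	fmt.Println(checkAll([]string{"123-45-6789", "bad-input", "000-00-0000"}))
}
```

One goroutine per input is only for demonstration. A real program would more likely share the regex across a fixed pool of workers or across HTTP handlers.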
If the pattern is invalid, regexp.Compile returns a non-nil error. The error message describes the syntax problem. Common errors include missing closing parentheses, invalid escape sequences, and unsupported features. Go's regex engine does not support backreferences or lookarounds. If you try to use them, compilation fails. A lookahead produces an error like error parsing regexp: invalid or unsupported Perl syntax: `(?=`, and a backreference such as \1 is rejected as an invalid escape sequence. This is a feature, not a bug. The engine sacrifices some features to guarantee linear time performance.
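You can confirm which constructs the engine refuses by feeding it a few of them. The sketch below is illustrative only. The sample patterns and the helper name rejected are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// unsupported lists sample patterns that rely on features the engine omits.
var unsupported = []string{
	`foo(?=bar)`, // lookahead
	`foo(?!bar)`, // negative lookahead
	`(\w+) \1`,   // backreference
}

// rejected prints the error for each pattern that fails to compile
// and returns how many failed.
func rejected(patterns []string) int {
	n := 0
	for _, p := range patterns {
		if _, err := regexp.Compile(p); err != nil {
			fmt.Printf("%-12s %v\n", p, err)
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(rejected(unsupported), "of", len(unsupported), "rejected")
}
```

If you need one of these features, the usual workaround is to capture the relevant groups with a supported pattern and do the extra comparison in Go code.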
Realistic example: cached validator
In production code, you rarely compile inside a function that gets called often. You compile once and store the result. This pattern appears in HTTP handlers, database drivers, and log parsers. The compiled regex lives as long as the application runs.
Here is a validator struct that compiles the pattern in its constructor. The receiver name is one letter, matching the type. This is standard Go style. The receiver is (v *Validator), not (self *Validator) or (this *Validator).
package main

import (
	"fmt"
	"regexp"
)

// Validator holds a pre-compiled regex for checking SSNs.
type Validator struct {
	// ssnRegex is compiled once and reused for every check.
	// Regexp is safe for concurrent use, so no mutex is needed.
	ssnRegex *regexp.Regexp
}

// NewValidator creates a Validator with a compiled pattern.
// It returns an error if the pattern is invalid.
func NewValidator() (*Validator, error) {
	re, err := regexp.Compile(`^\d{3}-\d{2}-\d{4}$`)
	if err != nil {
		// Wrap the error to add context.
		// The %w verb allows callers to unwrap the error later.
		return nil, fmt.Errorf("failed to compile ssn regex: %w", err)
	}
	return &Validator{ssnRegex: re}, nil
}
The method uses the cached regex. It does not compile anything. It just runs the state machine. This is fast. The memory allocation happened once in NewValidator.
// Check validates a string against the SSN pattern.
func (v *Validator) Check(input string) bool {
	// MatchString runs the pre-compiled state machine.
	// This is fast because the pattern was already parsed.
	return v.ssnRegex.MatchString(input)
}

func main() {
	v, err := NewValidator()
	if err != nil {
		panic(err)
	}
	fmt.Println(v.Check("123-45-6789"))
	fmt.Println(v.Check("bad-input"))
}
Error wrapping with %w is the convention. It preserves the original error while adding context. Callers can use errors.Is or errors.As to inspect the wrapped error. The receiver name v is short and matches the type. The field ssnRegex is unexported because it starts with a lowercase letter. Exported names start with a capital letter and are visible to other packages. Unexported names start lowercase and are visible only inside their own package.
Pitfalls and runtime errors
Compiling a regex inside a loop is the most common mistake. If you call regexp.Compile in a loop, you allocate memory and run the parser on every iteration. This causes GC pressure and slows down the program. The fix is to move the compilation outside the loop. Store the result in a variable and reuse it.
Using regexp.MustCompile on a dynamic pattern is dangerous. MustCompile calls Compile and panics if the pattern is invalid. The panic crashes the program. If the pattern comes from user input or a config file, you cannot predict whether it is valid. Use Compile and handle the error. Reserve MustCompile for hardcoded strings that you control.
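A sketch of the safer approach for dynamic patterns follows. It assumes the patterns arrive as a slice of strings, for example from a config file, and the function name compileAll is made up for this example. Each pattern is still compiled exactly once. The loop walks over different patterns, not over input lines.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll compiles patterns from an untrusted source. Invalid
// patterns are collected as errors instead of crashing the program.
func compileAll(patterns []string) ([]*regexp.Regexp, []error) {
	var compiled []*regexp.Regexp
	var errs []error
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			errs = append(errs, fmt.Errorf("pattern %q: %w", p, err))
			continue
		}
		compiled = append(compiled, re)
	}
	return compiled, errs
}

func main() {
	// Pretend these came from a config file. The second is malformed.
	patterns := []string{`^\d+$`, `[a-z`, `^ERROR`}
	compiled, errs := compileAll(patterns)
	fmt.Println(len(compiled), "compiled,", len(errs), "rejected")
	for _, err := range errs {
		fmt.Println(err)
	}
}
```

Whether to skip bad patterns or refuse to start is a policy choice. The point is that the program makes that choice, not a panic.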
Forgetting to check the error from Compile hides bugs. If you ignore the error and use the nil regex, you get a panic like runtime error: invalid memory address or nil pointer dereference when you call MatchString. The error check is boilerplate, but it is necessary. Go does not hide the failure. You must handle it.
Using regex for literal string matching is overkill. If you just need to check if a string contains a substring, use strings.Contains. It is much faster and uses less memory. Regex adds overhead for parsing and state machine execution. Use the right tool for the job.
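For comparison, the sketch below answers the same question both ways. The function names and the "timeout" substring are illustrative assumptions. The two functions return the same result for every input, but only one of them needs a compiled pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// timeoutRe matches a literal word, so a regex buys nothing here.
var timeoutRe = regexp.MustCompile(`timeout`)

// hasTimeoutRegex checks for the substring with the regex engine.
func hasTimeoutRegex(line string) bool {
	return timeoutRe.MatchString(line)
}

// hasTimeout checks for the same substring with a plain string search.
func hasTimeout(line string) bool {
	return strings.Contains(line, "timeout")
}

func main() {
	for _, line := range []string{"ERROR timeout after 30s", "INFO started"} {
		fmt.Println(hasTimeoutRegex(line), hasTimeout(line))
	}
}
```

If the literal contains characters that are special in a regex, such as a dot or a parenthesis, the plain string search also spares you from escaping them.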
MustCompile for constants
When the pattern is a constant string in your source code, you can use regexp.MustCompile. This function compiles the pattern and panics if it fails. Since the pattern is hardcoded, a syntax error is a developer bug. The panic ensures you notice the bug immediately during testing.
package main

import (
	"fmt"
	"regexp"
)

// ssnPattern is compiled once, when the package initializes.
// MustCompile panics if the pattern is invalid, which is fine for
// hardcoded strings because the error will be caught during development.
var ssnPattern = regexp.MustCompile(`^\d{3}-\d{2}-\d{4}$`)

func main() {
	fmt.Println(ssnPattern.MatchString("123-45-6789"))
}
The variable is declared at package level. It compiles when the package initializes. If the pattern is wrong, the program crashes at startup. This is desirable for constants. The name MustCompile signals intent. It tells the reader this pattern is trusted and hardcoded. The convention is clear. If you see MustCompile, the pattern should not change.
Decision: when to use what
Use regexp.Compile when the pattern comes from user input or a config file and might be invalid. Use regexp.MustCompile when the pattern is a hardcoded string in your source code and you want the program to crash early if you made a typo. Use a cached *regexp.Regexp instance when the regex runs more than once. Use strings.Contains or strings.HasPrefix when you are matching literal text without wildcards or groups. Use regexp.Compile with error wrapping when you need to return a descriptive error to the caller.
Compile once. Match many. If you compile in a loop, you are paying for the same translation on every iteration.