Fix "invalid or unsupported Perl syntax" in Go Regex

Fix the Perl syntax error in Go regex by replacing unsupported Perl features like lookaheads with RE2-compatible patterns or multiple regex checks.

The regex you pasted works everywhere else. Go disagrees.

You copy a regular expression from a Python script, a JavaScript tutorial, or a StackOverflow answer. It validates passwords, extracts emails, or parses logs. You paste it into Go, run the program, and the terminal explodes.

panic: regexp: Compile(`(?=.*[A-Z])`): error parsing regexp: invalid or unsupported Perl syntax: `(?=`

The error message calls it "Perl syntax." That's the clue. Go does not use Perl-Compatible Regular Expressions. Go's regexp package follows RE2. RE2 is a different engine with a different philosophy. It rejects features that threaten performance guarantees. When you see this error, Go isn't being difficult. It's enforcing a design choice that protects your application from catastrophic slowdowns.

RE2 trades features for linear time

Regular expressions can be slow. In backtracking engines, such as PCRE (used by PHP) and the built-in engines of Perl, Python, and JavaScript, complex patterns can trigger catastrophic backtracking. If a pattern has nested quantifiers like (a+)+b and the input is a long string of a's without a b, the engine tries every possible way of splitting the a's between the inner and outer loops. The runtime grows exponentially with input length, so a 30-character input can stall a thread for seconds or minutes. Input crafted to trigger this is called a regex bomb, and the attack is known as ReDoS.

RE2 was designed by Russ Cox to eliminate this risk. RE2 compiles patterns into a finite state machine that runs in linear time. If the input size doubles, the matching time doubles. No backtracking. No exponential worst cases. No regex bombs.
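To see the linear-time claim in practice, the sketch below runs the nested-quantifier pattern from the text against inputs made only of a's. The helper name MatchAllAs and the input sizes are assumptions chosen for illustration. Timings depend on the machine, so none are quoted here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// nested is the pattern from the text, a worst case for backtracking engines.
var nested = regexp.MustCompile(`(a+)+b`)

// MatchAllAs (hypothetical helper) reports whether nested matches n copies
// of "a" with no trailing "b", and how long the attempt took.
func MatchAllAs(n int) (bool, time.Duration) {
	input := strings.Repeat("a", n)
	start := time.Now()
	ok := nested.MatchString(input)
	return ok, time.Since(start)
}

func main() {
	// Grow the input by 10x each step and watch the elapsed time.
	for _, n := range []int{100, 1000, 10000, 100000} {
		ok, elapsed := MatchAllAs(n)
		fmt.Printf("n=%d matched=%v elapsed=%v\n", n, ok, elapsed)
	}
}
```

The match always fails because no b is present. In a backtracking engine the same experiment would become impractical long before the largest input.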

To guarantee linear time, RE2 drops features that do not fit its matching model. Lookaheads, lookbehinds, backreferences, atomic groups, and possessive quantifiers are unsupported. Lookarounds need the engine to jump ahead or back, check a condition, and return without consuming characters. Backreferences need it to remember arbitrary text it matched earlier. Neither fits the linear-time model, so RE2 refuses to compile them.
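A quick way to check which constructs are rejected is to feed sample patterns to regexp.Compile and print the errors. The sample patterns and the Compiles helper below are assumptions for illustration. The exact error text varies by construct and Go version, so run it to see what your toolchain reports.

```go
package main

import (
	"fmt"
	"regexp"
)

// unsupported holds one sample pattern per feature named in the text.
// The feature labels are descriptive only.
var unsupported = []struct{ feature, pattern string }{
	{"lookahead", `foo(?=bar)`},
	{"negative lookahead", `foo(?!bar)`},
	{"lookbehind", `(?<=foo)bar`},
	{"backreference", `(\w+) \1`},
	{"atomic group", `(?>a+)b`},
}

// Compiles (hypothetical helper) reports whether Go's regexp package
// accepts the pattern.
func Compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	for _, u := range unsupported {
		_, err := regexp.Compile(u.pattern)
		fmt.Printf("%-18s %-14s -> %v\n", u.feature, u.pattern, err)
	}
}
```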

The trade-off is clear. You lose expressive power in the pattern. You gain predictable performance. In Go, performance guarantees usually win.

Minimal example: lookahead rejection

Lookaheads are the most common trigger for this error. A lookahead like (?=...) asserts that a pattern exists ahead without consuming characters. Password validation often uses lookaheads to require uppercase, lowercase, and digits in a single pass.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// This pattern uses a lookahead (?=...) to check for an uppercase letter.
	// RE2 does not support lookaheads.
	// MustCompile panics if the pattern is invalid.
	re := regexp.MustCompile(`(?=.*[A-Z])`)

	fmt.Println(re)
}

Running this code produces a panic:

panic: regexp: Compile(`(?=.*[A-Z])`): error parsing regexp: invalid or unsupported Perl syntax: `(?=`

The regexp package rejects the pattern at run time, not at build time, because MustCompile parses the regex when the call executes. The Go compiler sees only a string literal. The error message points to the unsupported syntax: (?= is the start of the lookahead assertion. RE2 sees it and stops.

RE2 trades features for safety. You lose lookaheads, you gain immunity to regex bombs.

Walkthrough: how RE2 compiles patterns

When you call regexp.MustCompile or regexp.Compile, Go parses the pattern string and compiles it into a small program for a state machine. Each character in the input moves the machine from one set of states to the next. If the machine reaches an accepting state, the pattern matches.

Where a pattern allows several paths, the engine does not guess one and rewind on failure. It keeps track of every state the machine could be in, and no state is explored twice at the same input position. The work per input character is bounded by the size of the pattern. This ensures linear time.
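To picture how a matcher can avoid backtracking, here is a toy simulation of a hand-built state machine for the pattern ^(a|ab)c$. It is a teaching sketch, not Go's actual implementation: the edge type, the state numbering, and the Match function are all invented for this example. Instead of trying one branch and rewinding, it carries the set of all states the machine could be in.

```go
package main

import "fmt"

// edge is one labelled transition of a hand-built NFA.
type edge struct {
	from int
	on   byte
	to   int
}

// Hypothetical NFA for ^(a|ab)c$.
// State 0 is the start state and state 4 is the only accepting state.
var edges = []edge{
	{0, 'a', 1}, // took the "a" branch
	{0, 'a', 2}, // took the "ab" branch, still needs "b"
	{2, 'b', 3},
	{1, 'c', 4},
	{3, 'c', 4},
}

const accept = 4

// Match reads the input once, left to right, tracking every state the
// machine could be in. It never rewinds the input.
func Match(input string) bool {
	current := map[int]bool{0: true}
	for i := 0; i < len(input); i++ {
		next := map[int]bool{}
		for _, e := range edges {
			if current[e.from] && e.on == input[i] {
				next[e.to] = true
			}
		}
		if len(next) == 0 {
			return false // no state survived this byte
		}
		current = next
	}
	return current[accept]
}

func main() {
	for _, s := range []string{"ac", "abc", "ab", "abcc"} {
		fmt.Printf("%q -> %v\n", s, Match(s))
	}
}
```

After reading "a", the machine is in states 1 and 2 at once, so both alternatives stay alive until the next byte decides between them. Each byte costs at most one scan of the edge list, which is why the total time grows linearly with the input.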

Lookaheads break this model. A lookahead requires the engine to save its position, run a sub-match, and restore the position. That implies branching and backtracking. RE2 cannot build a linear-time machine for that. The compiler detects the lookahead during parsing and rejects the pattern.

If you use regexp.Compile instead of MustCompile, you get an error value rather than a panic. This is useful when patterns come from user input or configuration files.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Compile returns an error instead of panicking.
	// This is safer for patterns that might be invalid.
	re, err := regexp.Compile(`(?=.*[A-Z])`)
	if err != nil {
		// Handle the error gracefully.
		// The error message explains the unsupported syntax.
		fmt.Println("Regex error:", err)
		return
	}

	fmt.Println(re)
}

This prints:

Regex error: error parsing regexp: invalid or unsupported Perl syntax: `(?=`

The error object contains the same message. You can log it, return it, or handle it based on your application logic.

If the pattern comes from a user, use Compile, not MustCompile. Panics are for bugs, not input.
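If you want to react to this specific failure rather than just print it, one option is to unwrap the error into the *syntax.Error type from regexp/syntax and inspect its Code field. The sketch below assumes that approach. CompileErrorCode is a hypothetical helper name, and the advice string is only an example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"regexp/syntax"
)

// CompileErrorCode (hypothetical helper) returns the syntax error code for
// pattern, or the empty string if Compile did not report a syntax error.
func CompileErrorCode(pattern string) syntax.ErrorCode {
	_, err := regexp.Compile(pattern)
	var synErr *syntax.Error
	if errors.As(err, &synErr) {
		return synErr.Code
	}
	return ""
}

func main() {
	for _, p := range []string{`(?=.*[A-Z])`, `[A-Z]+`} {
		switch code := CompileErrorCode(p); code {
		case "":
			fmt.Printf("%s: ok\n", p)
		case syntax.ErrInvalidPerlOp:
			// The code behind "invalid or unsupported Perl syntax".
			fmt.Printf("%s: uses a Perl-only construct, rewrite it for RE2\n", p)
		default:
			fmt.Printf("%s: %s\n", p, code)
		}
	}
}
```

Branching on the code lets a configuration loader or web form give a targeted hint instead of echoing the raw parser message.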

Realistic example: password validation without lookaheads

Password validation is the classic use case for lookaheads. A common Perl regex checks for uppercase, lowercase, digits, and minimum length in one pattern:

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$

This pattern fails in Go. The lookaheads are unsupported. You need a different approach.

The Go way is to use multiple simple patterns or explicit checks. Simple patterns are easier to read and debug. RE2 handles them efficiently.

package main

import (
	"fmt"
	"regexp"
)

// Compile patterns once at package level and reuse them on every call.
var (
	hasUpper = regexp.MustCompile(`[A-Z]`)
	hasLower = regexp.MustCompile(`[a-z]`)
	hasDigit = regexp.MustCompile(`\d`)
)

// ValidatePassword checks if s meets complexity requirements.
// It uses multiple regex matches instead of lookaheads.
func ValidatePassword(s string) bool {
	// Check each condition independently.
	// Each check is readable and runs in linear time.
	// len counts bytes, which equals characters for ASCII passwords.
	return len(s) >= 8 && hasUpper.MatchString(s) && hasLower.MatchString(s) && hasDigit.MatchString(s)
}

func main() {
	passwords := []string{
		"short",
		"nouppercase1",
		"NOLOWERCASE1",
		"NoDigits!!",
		"ValidPass1",
	}

	for _, p := range passwords {
		if ValidatePassword(p) {
			fmt.Printf("%q is valid\n", p)
		} else {
			fmt.Printf("%q is invalid\n", p)
		}
	}
}

This code checks each requirement separately. MatchString scans the input for a match. RE2 runs each scan in linear time. The total time is still linear, just with a small constant factor for multiple passes. For short strings like passwords, this is negligible.

The logic is explicit. A developer reading hasUpper.MatchString(s) knows exactly what is happening. There is no magic in the pattern.

Compile regexes once. Cache them globally. Don't compile inside a hot loop.
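The explicit-checks route mentioned earlier needs no regex at all. As a rough sketch, the function below makes a single pass over the string and uses the same ASCII classes as [A-Z], [a-z], and \d. The name ValidatePasswordLoop is an assumption for this example.

```go
package main

import "fmt"

// ValidatePasswordLoop (hypothetical name) is a regex-free variant:
// one pass over s, using the same ASCII classes as [A-Z], [a-z], and \d.
func ValidatePasswordLoop(s string) bool {
	var hasUpper, hasLower, hasDigit bool
	for i := 0; i < len(s); i++ {
		// Bytes of multi-byte UTF-8 characters are >= 0x80,
		// so they never fall into these ASCII ranges.
		switch c := s[i]; {
		case c >= 'A' && c <= 'Z':
			hasUpper = true
		case c >= 'a' && c <= 'z':
			hasLower = true
		case c >= '0' && c <= '9':
			hasDigit = true
		}
	}
	// len counts bytes, which equals characters for ASCII passwords.
	return len(s) >= 8 && hasUpper && hasLower && hasDigit
}

func main() {
	for _, p := range []string{"short", "nouppercase1", "NOLOWERCASE1", "NoDigits!!", "ValidPass1"} {
		fmt.Printf("%q valid: %v\n", p, ValidatePasswordLoop(p))
	}
}
```

Whether this reads better than three small regexes is a matter of taste. For short inputs either version is fast enough.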

Pitfalls and workarounds

Backreferences

Backreferences like \1 refer to a previously captured group. A pattern like (\w+)\s+\1 matches duplicate words. RE2 does not support backreferences.

re := regexp.MustCompile(`(\w+)\s+\1`)

This panics too, but with a different message. The parser reads \1 as an invalid escape rather than as Perl syntax:

panic: regexp: Compile(`(\w+)\s+\1`): error parsing regexp: invalid escape sequence: `\1`

To find duplicate words, use code logic. Split the string and compare adjacent tokens.

package main

import (
	"fmt"
	"strings"
)

// HasDuplicateWords returns true if s contains adjacent duplicate words.
func HasDuplicateWords(s string) bool {
	// Split by whitespace.
	// This handles multiple spaces and tabs.
	words := strings.Fields(s)

	for i := 1; i < len(words); i++ {
		// Compare current word with previous word.
		// Case-sensitive comparison matches the regex behavior.
		if words[i] == words[i-1] {
			return true
		}
	}

	return false
}

func main() {
	tests := []string{
		"hello world",
		"hello hello world",
		"foo bar baz",
	}

	for _, t := range tests {
		fmt.Printf("%q has duplicates: %v\n", t, HasDuplicateWords(t))
	}
}

This approach is often faster than regex for simple comparisons. It avoids the overhead of pattern compilation and matching.
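Not every backreference needs a loop. When the captured group can only take a few fixed values, the pattern can often be expanded into one alternative per value. The sketch below assumes a common case that is not from the examples above: (['"]).*?\1, which matches a quoted string closed by the same quote character. FindQuoted is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// quoted spells out both cases of (['"]).*?\1 as explicit alternatives:
// a single-quoted run or a double-quoted run.
var quoted = regexp.MustCompile(`'[^']*'|"[^"]*"`)

// FindQuoted (hypothetical helper) returns every quoted substring in s,
// including the surrounding quote characters.
func FindQuoted(s string) []string {
	return quoted.FindAllString(s, -1)
}

func main() {
	fmt.Printf("%q\n", FindQuoted(`say "hi" and 'bye'`))
	fmt.Printf("%q\n", FindQuoted(`she said "it's fine"`))
}
```

This only works when the set of possible captures is small and known in advance. For open-ended captures such as repeated words, comparing in code remains the practical route.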

Non-greedy quantifiers

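RE2 supports non-greedy quantifiers like *? and +?. For the syntax both engines accept, Go's regexp package documents leftmost-first matching, the same match a backtracking engine such as Perl's would find first, so a non-greedy quantifier prefers the fewest repetitions that still let the overall pattern match. This is usually what you want. Results change if you opt into leftmost-longest matching through the Longest method or regexp.CompilePOSIX, which picks the longest match instead.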

Test your patterns carefully. If a non-greedy quantifier behaves unexpectedly, rewrite the pattern to be explicit.
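A small comparison makes the difference concrete. The three patterns below are illustrative assumptions: a greedy version, a non-greedy version, and an explicit rewrite using a negated character class, which does not depend on greediness at all.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedy   = regexp.MustCompile(`<.+>`)    // takes as much as possible
	lazy     = regexp.MustCompile(`<.+?>`)   // takes as little as possible
	explicit = regexp.MustCompile(`<[^>]+>`) // cannot run past the first '>'
)

func main() {
	const s = "<a><b>"
	fmt.Println(greedy.FindString(s))   // <a><b>
	fmt.Println(lazy.FindString(s))     // <a>
	fmt.Println(explicit.FindString(s)) // <a>
}
```

The explicit form states the intent in the pattern itself, which makes it the easiest of the three to carry between regex engines.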

Third-party PCRE libraries

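If you absolutely must use lookaheads or backreferences, you need a third-party backtracking engine. Binding to google/re2, the C++ library, will not help, because it enforces the same restrictions as the standard library. Options include cgo wrappers around PCRE and pure-Go engines such as github.com/dlclark/regexp2, which ports the .NET regex engine rather than PCRE itself.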

These libraries restore PCRE syntax. They also restore the risk of exponential runtime. Use them only when necessary. Document the risk. Validate input lengths. Set timeouts.

Go prefers explicit code over magic patterns. Write a loop if the regex is too complex.

Decision matrix

Use RE2 via regexp when you need predictable performance and linear-time matching.

Use multiple simple regexes when you need to check independent conditions like password complexity.

Use string functions like strings.Contains when you are searching for a literal substring without pattern matching.

Use a third-party PCRE library when you absolutely must support backreferences or lookaheads and can accept the risk of exponential runtime.

Use manual parsing when the logic is complex and regex makes the code harder to read.

Where to go next