How to Validate Input with Regex in Go

Validate input in Go by compiling a regex pattern with regexp.MustCompile and checking matches using MatchString.

You're building a signup handler. The frontend sends a username, and you need to ensure it contains only letters, numbers, and underscores, with a length between 3 and 20 characters. A simple string check won't cut it when the rules get specific. Regular expressions let you describe the shape of valid text with a compact pattern, and Go's regexp package makes checking that pattern fast and safe.

The stencil analogy

Think of a regex pattern as a stencil. You hold the stencil up against the input string. If the ink shows through the holes in exactly the right places, the input matches. The regexp package compiles that stencil into a state machine and checks your strings against it. Go's engine follows the RE2 design, which guarantees matching time linear in the length of the input. Unlike the backtracking engines in Python or JavaScript, Go's engine cannot fall into catastrophic backtracking, where a malicious input freezes your server. The engine trades some advanced features, such as backreferences and lookaround, for safety and speed.
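
As a rough illustration of the linear-time behavior, the sketch below runs a nested-quantifier pattern, the classic trigger for catastrophic backtracking in other engines, against a long input that cannot match. The pattern and the helper name matchesAllAs are chosen here for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nested has a quantified group that is itself quantified, a shape that
// backtracking engines handle badly on non-matching input.
var nested = regexp.MustCompile(`^(a+)+$`)

// matchesAllAs reports whether input consists of one or more 'a' characters.
func matchesAllAs(input string) bool {
	return nested.MatchString(input)
}

func main() {
	// 50,000 'a' characters followed by one character that forces a mismatch.
	input := strings.Repeat("a", 50000) + "!"
	fmt.Println(matchesAllAs(input))  // false
	fmt.Println(matchesAllAs("aaaa")) // true
}
```

The mismatch is reported after a single pass over the input rather than after trying every way of splitting the run of a's between the inner and outer quantifier.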

Minimal example

Here's the core pattern: compile a regex, then test a string against it.

package main

import (
	"fmt"
	"regexp"
)

// usernamePattern is compiled once at package initialization;
// MustCompile panics on bad syntax.
var usernamePattern = regexp.MustCompile(`^[a-zA-Z0-9_]{3,20}$`)

// ValidateUsername checks if the input matches the allowed pattern.
func ValidateUsername(input string) bool {
	return usernamePattern.MatchString(input)
}

func main() {
	fmt.Println(ValidateUsername("valid_user")) // true
	fmt.Println(ValidateUsername("bad!"))       // false
}

Compile once, match many times.

Raw strings and escaping

Regex patterns often contain characters that conflict with Go's string escaping rules. Backslashes are the main culprit. A pattern like \d+ means one or more digits, but inside a double-quoted string, \d is not a valid escape sequence, and the compiler rejects it with an unknown escape error. You could double every backslash, as in "\\d+", but that gets hard to read quickly. Wrap regex patterns in backticks instead. Backtick strings are raw literals that preserve every character exactly as written. This keeps patterns readable and prevents subtle bugs where escapes get consumed by the string parser.
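
To make the difference concrete, here is a small sketch that writes the same pattern both ways. The names rawPattern, quotedPattern, and isDigits are illustrative, not part of any API.

```go
package main

import (
	"fmt"
	"regexp"
)

// rawPattern is a raw string literal: the backslash reaches the regex
// engine unchanged.
const rawPattern = `^\d+$`

// quotedPattern is the same pattern in an interpreted string literal,
// where every backslash has to be doubled.
const quotedPattern = "^\\d+$"

var digits = regexp.MustCompile(rawPattern)

// isDigits reports whether input consists only of ASCII digits.
func isDigits(input string) bool {
	return digits.MatchString(input)
}

func main() {
	fmt.Println(rawPattern == quotedPattern) // true
	fmt.Println(isDigits("12345"))           // true
	fmt.Println(isDigits("12a45"))           // false
}
```

Both constants hold the same five characters; the raw form is simply easier to read and to compare against regex documentation.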

Backticks for regex. Double quotes for prose.

Compile once, match many times

The regexp.MustCompile function takes a pattern string and builds the internal state machine. If the pattern has a syntax error, MustCompile panics. That happens at run time, not at compile time, but when the compiled pattern is assigned to a package-level variable the panic fires during program initialization, so a bad hardcoded pattern surfaces the first time you run the program or its tests. If you load patterns from a config file or user input, use regexp.Compile instead. It returns an error instead of panicking, letting you handle the failure gracefully.

The regexp package does not cache compiled regular expressions. Every call to MustCompile or Compile parses the pattern and builds a new state machine, so calling it inside a function that runs on every request repeats that work each time. Store the compiled *regexp.Regexp in a package-level variable instead; a compiled Regexp is safe for concurrent use by multiple goroutines. Compile static patterns once and reuse them.

If you pass an invalid pattern such as ^[a-z+ to Compile, you get an error like error parsing regexp: missing closing ]: `[a-z+`. You can return this error to the caller or log it. The community accepts verbose error checks because they make the unhappy path visible. Formatting the offending value with %q in fmt.Errorf quotes it, making logs easier to parse.
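
A minimal sketch of that error path, assuming a hypothetical helper named compileUserPattern for patterns that arrive from configuration or user input:

```go
package main

import (
	"fmt"
	"regexp"
)

// compileUserPattern is a hypothetical helper for patterns that are not
// hardcoded in the source.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		// %q quotes the pattern; %w keeps the original error available
		// to errors.Is and errors.As.
		return nil, fmt.Errorf("compile pattern %q: %w", pattern, err)
	}
	return re, nil
}

func main() {
	// The character class is never closed, so Compile reports an error.
	if _, err := compileUserPattern(`^[a-z+`); err != nil {
		fmt.Println("rejected:", err)
	}

	re, err := compileUserPattern(`^[a-z]+$`)
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	fmt.Println(re.MatchString("hello")) // true
}
```

The caller decides what to do with the failure: reject the config file, fall back to a default pattern, or report the problem to the user.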

Realistic validation with errors

Validation functions often need to return errors with context. Check simple conditions first, then run the regex. Include the bad value in the error message so the caller knows what failed.

package main

import (
	"errors"
	"fmt"
	"regexp"
)

// emailPattern is compiled once at package initialization.
var emailPattern = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)

// ValidateEmail checks format and returns a descriptive error.
func ValidateEmail(email string) error {
	// Check for empty input before regex to provide a specific error.
	if email == "" {
		return errors.New("email is required")
	}

	if !emailPattern.MatchString(email) {
		// Include the quoted input so the caller can see what failed.
		return fmt.Errorf("invalid email format: %q", email)
	}
	return nil
}

Methods and receiver naming

Methods on structs follow the same validation rules. The receiver name is conventionally one or two letters matching the type, like (c *Config). This keeps signatures readable and aligns with the standard library.

package main

import (
	"errors"
	"regexp"
)

// Config holds application settings.
type Config struct {
	Host string
	Port int
}

// hostPattern is compiled once and shared by every Validate call.
var hostPattern = regexp.MustCompile(`^https?://\w+`)

// Validate checks the host format.
func (c *Config) Validate() error {
	// Receiver name is c, matching the type Config.
	if !hostPattern.MatchString(c.Host) {
		return errors.New("invalid host")
	}
	return nil
}

Trust gofmt. It handles indentation and spacing. Regex patterns in backticks are preserved as-is, so your patterns stay readable.

Extracting data with groups

When you need to extract data, use capturing groups. Named groups make the code self-documenting. FindStringSubmatch returns a slice holding the full match followed by the text of each group, and SubexpIndex (available since Go 1.15) translates a group name into its position in that slice.

package main

import (
	"fmt"
	"regexp"
)

// keyValuePattern captures key and value into named groups.
var keyValuePattern = regexp.MustCompile(`(?P<key>\w+)=(?P<value>.+)`)

// ParseKeyValue extracts a key and value from a configuration line.
func ParseKeyValue(line string) (string, string, error) {
	matches := keyValuePattern.FindStringSubmatch(line)

	// FindStringSubmatch returns nil if there is no match.
	if matches == nil {
		return "", "", fmt.Errorf("line does not match key=value format: %q", line)
	}

	// Look up each group's index by name instead of hardcoding positions.
	key := matches[keyValuePattern.SubexpIndex("key")]
	value := matches[keyValuePattern.SubexpIndex("value")]
	return key, value, nil
}

The (?P<name>...) syntax for named groups originated in Python and is part of the RE2 syntax that Go implements. Looking groups up by name is safer than relying on numeric indices that shift if you reorder the pattern.
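
If you prefer a map keyed by group name, one can be assembled from FindStringSubmatch and SubexpNames, both of which are part of the standard regexp package. The helper name namedGroups below is hypothetical, and the sketch anchors the pattern with ^ and $ as an added assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

var keyValue = regexp.MustCompile(`^(?P<key>\w+)=(?P<value>.+)$`)

// namedGroups is a hypothetical helper that returns the named captures of
// the first match as a map, or nil when s does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	match := re.FindStringSubmatch(s)
	if match == nil {
		return nil
	}
	groups := make(map[string]string)
	for i, name := range re.SubexpNames() {
		// Index 0 is the whole match, and unnamed groups have an empty name.
		if i != 0 && name != "" {
			groups[name] = match[i]
		}
	}
	return groups
}

func main() {
	groups := namedGroups(keyValue, "port=8080")
	fmt.Println(groups["key"], groups["value"]) // port 8080

	fmt.Println(namedGroups(keyValue, "no equals sign") == nil) // true
}
```

Building the map allocates on every call, so for hot paths the direct index lookup is the cheaper choice.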

Unicode and the \w trap

Go's regex engine supports Unicode properties out of the box. The pattern \p{L} matches any letter in any script, while \p{N} matches any number. This is crucial for validating user input in a global application. A pattern like ^\p{L}[\p{L}\p{N}_-]{2,19}$ allows usernames with accented characters or non-Latin scripts. If you restrict validation to [a-zA-Z], you silently reject valid names from users around the world. Use Unicode properties unless you have a strict ASCII requirement.

Go's \w shorthand matches only ASCII word characters: [0-9A-Za-z_]. It does not include accented letters or non-Latin scripts. If you want Unicode-aware matching, use \p{L} for letters, \p{N} for numeric characters, or the class [\p{L}\p{N}] for both. The POSIX class [[:alnum:]] is ASCII-only in Go as well. This differs from some other languages where \w is Unicode-aware by default. Always check your shorthands against the documentation to avoid rejecting international input.
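
The sketch below puts an ASCII-only pattern next to a Unicode-aware one. The helper names and the sample usernames are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiName uses \w, which in Go covers only [0-9A-Za-z_].
var asciiName = regexp.MustCompile(`^\w{3,20}$`)

// unicodeName accepts letters and numeric characters from any script,
// plus underscore.
var unicodeName = regexp.MustCompile(`^[\p{L}\p{N}_]{3,20}$`)

func isASCIIName(s string) bool   { return asciiName.MatchString(s) }
func isUnicodeName(s string) bool { return unicodeName.MatchString(s) }

func main() {
	// "jos\u00e9" spells the name with a precomposed e-acute (U+00E9).
	names := []string{"jose", "jos\u00e9", "Дмитрий", "ab"}
	for _, name := range names {
		fmt.Printf("%q ascii=%v unicode=%v\n", name, isASCIIName(name), isUnicodeName(name))
	}
}
```

The repetition count {3,20} counts code points, not bytes, so a seven-letter Cyrillic name passes the Unicode pattern even though it occupies fourteen bytes. Combining marks belong to \p{M} rather than \p{L}, so input in decomposed form may need normalizing before validation.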

Unicode properties make your app global.

Pitfalls and compiler errors

Performance matters in validation loops. MatchString operates directly on the string data without copying. If you have a byte slice, use Match. Converting a string to bytes with []byte(input) allocates a new slice. In a high-throughput handler, avoiding that allocation per request reduces GC pressure. Pick the method that matches your input type to keep the hot path allocation-free.

The Match function accepts a byte slice, while MatchString accepts a string. They are distinct methods. If you have a string and call Match, the compiler stops you with cannot use input (variable of type string) as []byte value in argument. If you have bytes and call MatchString, you get a similar type mismatch.
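
As a sketch of picking the method by input type, the example below validates the same hypothetical token format from a string and from a byte slice. The pattern and function names are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

var token = regexp.MustCompile(`^[A-Za-z0-9_-]{8,64}$`)

// validTokenString checks input that already arrives as a string.
func validTokenString(s string) bool {
	return token.MatchString(s)
}

// validTokenBytes checks input that already arrives as bytes, such as a
// request body, without converting it to a string first.
func validTokenBytes(b []byte) bool {
	return token.Match(b)
}

func main() {
	fmt.Println(validTokenString("abc_DEF-123"))        // true
	fmt.Println(validTokenBytes([]byte("abc_DEF-123"))) // true
	fmt.Println(validTokenBytes([]byte("short")))       // false
}
```

The []byte conversions in main exist only to build sample input; in a real handler the bytes would come straight from the reader.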

Go regex does not support backreferences. A backreference allows a pattern to match the same text a previous group captured, like (\w+)\s+\1 to find repeated words. If you try this, the Go compiler accepts the code, but regexp.Compile returns an error at run time (and MustCompile panics) with error parsing regexp: invalid escape sequence: `\1`. This limitation exists because matching with backreferences cannot be done in guaranteed linear time; backtracking engines that support them can take exponential time in the worst case. Go prioritizes linear performance over this feature. If you need backreferences, you must write custom parsing logic.
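
For the repeated-word case, the custom logic can be quite small. The sketch below is one possible stand-in, not an exact equivalent: it splits on whitespace with strings.Fields, so punctuation stays attached to words. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// backrefCompiles reports whether the backreference pattern is accepted.
func backrefCompiles() bool {
	_, err := regexp.Compile(`(\w+)\s+\1`)
	return err == nil
}

// hasRepeatedWord reports whether s contains the same whitespace-separated
// word twice in a row. It approximates what (\w+)\s+\1 would find.
func hasRepeatedWord(s string) bool {
	words := strings.Fields(s)
	for i := 1; i < len(words); i++ {
		if words[i] == words[i-1] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(backrefCompiles())                    // false
	fmt.Println(hasRepeatedWord("this is is a test")) // true
	fmt.Println(hasRepeatedWord("this is a test"))    // false
}
```

The loop visits each word once, so the check stays linear in the length of the input.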

No backreferences. Linear time always.

Decision matrix

Use regexp.MustCompile when the pattern is hardcoded in your source code and a syntax error should crash the program immediately. Use regexp.Compile when the pattern comes from external configuration or user input, so you can return an error instead of panicking.

Use MatchString when you only need a boolean result to check if a string matches the pattern. Use Match when your input is a byte slice and you want to avoid allocation.

Use FindStringSubmatch when you need to extract the first match and its capture groups as a slice. Combine it with SubexpIndex or SubexpNames when you use named groups and want to access results by name. Use ReplaceAllString when you want to sanitize input by swapping matched patterns with a replacement value.

Use strings.Contains or strings.HasPrefix when you are checking for a literal substring without any pattern logic; these functions are faster and simpler than regex. Use strconv.ParseInt when you need to validate and convert a numeric string. Use a custom validation function when the rules involve cross-field dependencies, like checking that a password matches a confirmation field.
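
To illustrate the non-regex choices alongside ReplaceAllString, here is a short sketch. The function names, the port range check, and the sample inputs are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var nonDigits = regexp.MustCompile(`\D+`)

// isSecureURL is a literal prefix check, so no regex is needed.
func isSecureURL(s string) bool {
	return strings.HasPrefix(s, "https://")
}

// parsePort validates and converts a numeric string in one step.
func parsePort(s string) (int, error) {
	n, err := strconv.ParseInt(s, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d out of range", n)
	}
	return int(n), nil
}

// digitsOnly sanitizes input by removing every run of non-digit characters.
func digitsOnly(s string) string {
	return nonDigits.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(isSecureURL("https://example.com")) // true

	port, err := parsePort("8080")
	fmt.Println(port, err) // 8080 <nil>

	fmt.Println(digitsOnly("+1 (555) 010-9999")) // 15550109999
}
```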

Where to go next