Finding every match in a string
You have a configuration file with hundreds of lines. Each line contains a key, a colon, and a value. You only care about the values that look like email addresses. You write a quick regular expression, call FindString, and get exactly one result. The rest of the file is ignored. You realize the standard library gives you a single match by default, but finding every occurrence requires a slightly different approach. The regexp package handles this with a family of FindAll functions that keep scanning until the text runs out.
How the scanner actually works
Regular expression engines read text from left to right. When the engine finds a match, it records it and resumes scanning at the position where that match ended. Go's implementation follows the RE2 design: it compiles each pattern into an internal program and guarantees that matching time is linear in the length of the input. The FindAll family wraps this scanning loop and collects every hit into a slice. The second parameter controls how many hits to collect. Passing -1 (any negative number works) tells the engine to run until the end of the string. Passing a positive number stops the scan after that many matches. Matches never overlap: each search starts where the previous match ended.
Think of it like a metal detector walking down a beach. The operator sweeps the ground, beeps when they find a coin, marks the spot, and keeps walking from where the last coin was found. They do not walk backward to check the same patch twice. They do not dig up overlapping coins. They just keep moving forward until they decide to stop.
The engine scans left to right. Each search resumes where the previous match ended, so results never overlap.
Minimal example
Here is the simplest way to grab every sequence of digits from a string.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Compile once. MustCompile panics on invalid syntax, which is fine for hardcoded patterns.
	re := regexp.MustCompile(`\d+`)
	text := "Order 101 shipped. Order 205 pending. Order 309 cancelled."

	// -1 means "collect every match until the end of the string"
	matches := re.FindAllString(text, -1)

	// matches is a []string containing ["101", "205", "309"]
	fmt.Println(matches)
}
What happens under the hood
When the program runs, regexp.MustCompile parses the pattern \d+ and builds a finite automaton. This compilation step takes microseconds but is not free. The FindAllString call then walks through text byte by byte. The first match starts at index 6. The engine records 101, advances the cursor to index 9, and continues. It finds 205, advances, finds 309, advances, and hits the end of the string. The function returns a slice with three elements. If you pass 2 instead of -1, the engine stops after finding 101 and 205, leaving the rest of the string unscanned. This early exit saves CPU cycles when you only need a sample or a fixed batch.
Memory allocation follows Go's ordinary append behavior. The engine cannot know in advance how many matches exist, so the result slice starts with a small backing array and grows as matches arrive. Each growth step copies the existing elements into a larger block. The FindAll functions do not accept a caller-supplied destination slice, so you cannot pre-allocate the result yourself. If you are parsing a multi-megabyte log file and expect thousands of matches, you can reduce allocation pressure by passing a positive limit or by feeding the input through in smaller pieces, such as one line at a time. For most applications, the default growth strategy is fast enough.
Go also handles UTF-8 transparently. The engine decodes the input rune by rune, while the offsets it reports are always byte offsets. Empty matches do not cause infinite loops. After a zero-width match, the scanner moves the cursor forward by one rune (a single byte for ASCII) before searching again, which guarantees progress. It also ignores an empty match that directly abuts the preceding match, so the same position is never reported twice.
Compilation happens once. Scanning happens every time. Cache the compiled object.
Realistic example
Real code rarely needs just the raw matched text. You usually want structured data. Capture groups let you extract specific parts of a match. FindAllStringSubmatch returns a slice of slices. The outer slice holds one entry per match. The inner slice holds the full match at index 0, followed by each capture group.
package main

import (
	"fmt"
	"regexp"
)

// logEntryRe captures date, log level, and message body.
// It is compiled once at package initialization and reused on every call.
// Go supports named groups with the (?P<name>...) syntax, but this example
// keeps the pattern short and relies on positional indices.
var logEntryRe = regexp.MustCompile(`(\d{4}-\d{2}-\d{2}) \[(\w+)\] (.+)`)

func parseLogEntries(text string) []map[string]string {
	// FindAllStringSubmatch returns [][]string.
	// Each inner slice is [fullMatch, date, level, message]
	rawMatches := logEntryRe.FindAllStringSubmatch(text, -1)

	// Pre-allocate the result slice to avoid repeated reallocations.
	entries := make([]map[string]string, 0, len(rawMatches))
	for _, m := range rawMatches {
		// m[0] is the full line, m[1] is date, m[2] is level, m[3] is message
		entries = append(entries, map[string]string{
			"date":    m[1],
			"level":   m[2],
			"message": m[3],
		})
	}
	return entries
}

func main() {
	log := `2024-01-15 [INFO] Server started
2024-01-15 [ERROR] Connection timeout
2024-01-16 [WARN] High memory usage`

	for _, entry := range parseLogEntries(log) {
		fmt.Printf("[%s] %s: %s\n", entry["date"], entry["level"], entry["message"])
	}
}
The output prints each log line with its components separated. Notice how FindAllStringSubmatch handles the heavy lifting. You iterate over the outer slice, pull out the indices you need, and build your own structures. This pattern appears constantly in log parsers, data extractors, and template preprocessors. The inner slice always includes the full match at index 0, even if you only care about the groups. Because of that, capture group n always sits at index n, which matches the group numbering in the pattern and keeps the API predictable.
Capture groups give you structure. Flat matches give you speed. Choose based on what you actually need to do with the data.
Pitfalls and runtime boundaries
Regex compilation is expensive relative to a single match. If you call regexp.MustCompile inside a hot loop or an HTTP handler, you burn CPU cycles on pattern parsing instead of scanning text. Compile the pattern once in a package-level variable or store it in a struct field. A compiled *regexp.Regexp is safe for concurrent use by multiple goroutines, so sharing one is fine. The community convention is to use regexp.MustCompile for patterns known at compile time, and regexp.Compile for dynamic patterns that might come from user input. regexp.Compile returns an error instead of panicking.
If you pass a malformed pattern to MustCompile, the program crashes with a message like panic: regexp: Compile(`[unclosed`): error parsing regexp: missing closing ]: `[unclosed`. This is a hard failure. Validate user-supplied patterns with regexp.Compile and handle the error gracefully. Never trust external input for regex compilation.
Go protects against catastrophic backtracking by design. The regexp package follows RE2 and guarantees that matching runs in time linear in the size of the input, so a pathological pattern such as (a+)+$ cannot trigger exponential behavior. There is no step limit to tune and no runtime panic to catch. The price is a smaller feature set: backreferences and lookaround assertions are not supported and fail at compile time. This is a strong defense against regular expression denial of service attacks, but linear time is not free. Very large inputs and very large user-supplied patterns still cost CPU and memory, so cap both when they come from untrusted sources.
Overlapping matches are not supported. If your pattern is aa and your text is aaa, you get one match, not two. The engine advances past the entire matched substring. Lookahead tricks such as (?=aa) are not an option, because Go's regexp syntax rejects lookaround. If you need overlapping matches, write a custom scanner that restarts the search one character after the start of each match.
Panics on bad patterns are a developer trap. Validate or compile safely.
Choosing the right variant
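The regexp package offers eight FindAll methods: four that take a string, and four that take a []byte and carry the same names without String. The string variants below cover most needs. Picking the right one saves memory and simplifies your iteration logic.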
Use FindAllString when you only need the full matched text and want the simplest possible return type.
Use FindAllStringSubmatch when you need capture groups alongside the full match and prefer working with strings over byte indices.
Use FindAllStringIndex (or FindAllIndex for []byte input) when you need the start and end byte offsets of each match, for example to highlight or slice the original text in place.
Use FindAllStringSubmatchIndex when you need both capture groups and their exact byte positions, useful for building syntax highlighters or AST builders.
Use a positive integer limit instead of -1 when you only need the first N matches to save memory and CPU cycles on large inputs.
Pick the variant that returns exactly what you need. Extra slices cost memory.