When LIKE isn't enough
You built a blog. Users type "Go concurrency" into the search bar. The database returns posts about "Goats" and "Going to the store". Worse, the query takes two seconds on ten thousand rows. You need a search that understands words, handles plurals, ignores "the", and runs fast. That's full-text search.
PostgreSQL has built-in full-text search that rivals external engines for most applications. It handles tokenization, stemming, stop-word removal, and relevance ranking. The database does the heavy lifting. Your Go code just sends the query and reads the results.
The search index lives in the row
PostgreSQL full-text search relies on two types. A tsvector stores the processed words from a document. A tsquery stores the processed words from the user's search. The @@ operator checks if the query matches the vector.
Think of a library card catalog. You don't read every book to find "Go". You look up the card that lists every book mentioning "Go", sorted by relevance. PostgreSQL builds this card automatically. The card is the tsvector. The lookup is the tsquery. The @@ operator is the librarian checking the card.
The to_tsvector function turns raw text into a vector. It lowercases words, removes stop words like "the" and "and", and reduces words to stems. "Running" becomes "run". "Database" becomes "databas". The plainto_tsquery function does the same processing on the user's input. This ensures "Running" matches "run" because both reduce to the same stem.
A GIN index on the tsvector column makes the lookup instant. Without the index, PostgreSQL scans every row. With the index, it jumps straight to matching documents.
Schema setup with generated columns
Here's the schema setup and a basic query using generated columns. The GENERATED ALWAYS clause keeps the index in sync without triggers.
package main
import (
"database/sql"
"fmt"
"log"
_ "github.com/jackc/pgx/v5/stdlib" // stdlib adapter for database/sql
)
// setupDB creates the table with a generated search vector.
// GENERATED ALWAYS computes the vector on write.
// This avoids the bug where the index drifts from the content.
func setupDB(db *sql.DB) error {
_, err := db.Exec(`
CREATE TABLE IF NOT EXISTS documents (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT,
// to_tsvector reduces words to stems and strips stop words.
// The 'english' config handles stemming and stop lists.
// GENERATED ALWAYS ensures the vector updates automatically.
search_vector TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED
);
// GIN indexes are optimized for composite types like tsvector.
// They turn full scans into logarithmic lookups.
CREATE INDEX IF NOT EXISTS idx_search_vector ON documents USING GIN(search_vector);
`)
return err
}
func main() {
db, err := sql.Open("pgx", "postgres://user:pass@localhost/dbname?sslmode=disable")
if err != nil {
log.Fatal(err)
}
defer db.Close()
if err := setupDB(db); err != nil {
log.Fatal(err)
}
// Insert a sample document.
// The search_vector column is computed automatically.
_, err = db.Exec(`INSERT INTO documents (title, content) VALUES ($1, $2)`,
"Go Concurrency", "Goroutines run concurrently. Channels coordinate them.")
if err != nil {
log.Fatal(err)
}
// Query using the @@ operator.
// plainto_tsquery processes the input to match the vector format.
var id int
err = db.QueryRow(`
SELECT id FROM documents
WHERE search_vector @@ plainto_tsquery('english', $1)
`, "concurrency").Scan(&id)
if err != nil {
log.Fatal(err)
}
fmt.Printf("Found document ID: %d\n", id)
}
The GENERATED ALWAYS clause is the modern way to manage search vectors. Older tutorials recommend database triggers. Triggers are error-prone. If the trigger fails, the index silently drifts from the content. Generated columns enforce the invariant at the storage engine level. The column is read-only for application code. PostgreSQL calculates the value whenever title or content changes.
Convention aside: gofmt formats this code. Run gofmt -w main.go or configure your editor to format on save. The Go community trusts the tool. Don't argue about indentation.
Querying with relevance ranking
Basic matching finds rows. Real search ranks them. The ts_rank function calculates a score based on term frequency and proximity. Higher scores mean better matches.
Here's a query that returns ranked results. The LIMIT clause protects the database from returning millions of rows.
// searchRanked finds documents and orders them by relevance.
// ts_rank computes a score based on term frequency and position.
func searchRanked(db *sql.DB, query string) ([]Document, error) {
rows, err := db.Query(`
SELECT id, title, ts_rank(search_vector, plainto_tsquery('english', $1)) AS rank
FROM documents
WHERE search_vector @@ plainto_tsquery('english', $1)
ORDER BY rank DESC
LIMIT 20
`, query)
if err != nil {
return nil, fmt.Errorf("search query: %w", err)
}
defer rows.Close()
var results []Document
for rows.Next() {
var doc Document
var rank float64 // rank is used for sorting, discarded in the struct
if err := rows.Scan(&doc.ID, &doc.Title, &rank); err != nil {
return nil, fmt.Errorf("scan row: %w", err)
}
results = append(results, doc)
}
// Check for errors that occurred during iteration.
// rows.Err() catches network drops or constraint violations.
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("row iteration: %w", err)
}
return results, nil
}
type Document struct {
ID int
Title string
}
The ts_rank function takes the vector and the query. It returns a float. ORDER BY rank DESC puts the best matches first. The LIMIT clause is essential. Without it, a popular query could return the entire table.
Convention aside: if err != nil is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible. Always check the error. Wrap it with fmt.Errorf and %w to preserve the chain.
Realistic service with context
Production code needs timeouts and cancellation. HTTP requests can hang. Long-running queries block connections. context.Context solves this.
Here's a search service that wraps the database logic and respects context cancellation. The driver aborts the query if the context expires.
import (
"context"
"database/sql"
"fmt"
)
// SearchService handles document retrieval.
type SearchService struct {
db *sql.DB
}
// Search finds documents matching the query.
// Context controls timeout and cancellation.
func (s *SearchService) Search(ctx context.Context, q string) ([]Document, error) {
// Use QueryContext to pass cancellation to the driver.
// The driver sends a cancel signal to PostgreSQL.
rows, err := s.db.QueryContext(ctx, `
SELECT id, title, ts_rank(search_vector, plainto_tsquery('english', $1)) AS rank
FROM documents
WHERE search_vector @@ plainto_tsquery('english', $1)
ORDER BY rank DESC
LIMIT 20
`, q)
if err != nil {
return nil, fmt.Errorf("search query: %w", err)
}
defer rows.Close()
var results []Document
for rows.Next() {
var doc Document
var rank float64
if err := rows.Scan(&doc.ID, &doc.Title, &rank); err != nil {
return nil, fmt.Errorf("scan row: %w", err)
}
results = append(results, doc)
}
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("row iteration: %w", err)
}
return results, nil
}
The QueryContext method accepts a context. If the context has a deadline, the driver sets a timeout. If the context is cancelled, the driver aborts the query. This prevents goroutine leaks and connection exhaustion.
Convention aside: context.Context always goes as the first parameter. Name it ctx. Functions that take a context should respect cancellation and deadlines. The receiver name is usually one or two letters matching the type: (s *SearchService), not (this *SearchService).
Pitfalls and compiler errors
Full-text search has traps. Missing one returns empty results or slow queries.
Language mismatch kills relevance. If you build the vector with to_tsvector('english') but query with plainto_tsquery('simple'), the stems won't match. "Running" becomes "run" in English but stays "running" in simple. The query returns nothing. Always use the same language config for both sides.
The compiler rejects invalid operators. If you write search_vector @@ 'query', the compiler rejects the program with operator does not exist: tsvector @@ text. You must wrap the text in plainto_tsquery. The database needs a tsquery, not a string.
Forgetting rows.Err() causes silent data loss. Network errors can happen during iteration. If you only check the error from Query, you miss errors that occur while reading rows. Always call rows.Err() after the loop.
Convention aside: Use _ to discard values intentionally. If you only need IDs, write rows.Scan(&id, &_, &_). This tells the reader you considered the other columns and chose to drop them. Use it sparingly with errors. Never discard an error with _.
Advanced query parsing
plainto_tsquery treats the input as a bag of words. It joins terms with AND. Searching for "Go Python" finds documents containing both "Go" and "Python". Users often want more control. They want exact phrases or exclusions.
websearch_to_tsquery parses input like a search engine. It understands quotes and minus signs. "exact phrase" -exclude matches the phrase exactly and excludes documents with "exclude". This function is surprisingly powerful. It handles the syntax users expect from Google.
Ah-ha reveal: websearch_to_tsquery handles boolean logic automatically. You don't need to write custom SQL for OR or NOT. The function parses the user's input and builds the correct query tree. Use it when you want to expose advanced search syntax to users.
Decision matrix
Pick the right tool for the job. Search requirements vary by use case.
Use LIKE '%term%' when you need exact substring matching on small tables under a thousand rows.
Use full-text search with tsvector and GIN when you need relevance ranking, stemming, and fast lookups on large datasets.
Use an external search engine like Elasticsearch when you need faceted search, fuzzy matching beyond typos, or distributed scaling across multiple nodes.
Use pg_trgm extension when you need prefix matching or similarity search for autocomplete fields.
PostgreSQL handles the index. You handle the query. Rank matters. Sort by relevance, not ID. Context is plumbing. Pass it to every query. The worst search bug is the one that returns results in the wrong order.