How to Implement Embeddings and Similarity Search in Go

Implement embeddings and similarity search in Go by initializing an embedding model, connecting to a vector database, and using the store's SimilaritySearch method to retrieve relevant documents.

When keywords fail

You have a folder of internal documentation. A new hire asks your chatbot how to rotate database credentials. A keyword search for "rotate" returns zero results. The actual guide uses the phrase "cycle access tokens." The computer misses the connection because it only matches exact strings. Humans understand meaning. Computers need coordinates.

Embeddings solve this gap. They convert text into fixed-length lists of numbers. Similar concepts land near each other in that numerical space. You measure the distance between coordinates to find matches. The math handles synonyms, paraphrases, and related ideas without you writing a single rule.

How embeddings actually work

An embedding model reads a sentence and outputs a vector. A vector is just a slice of floating-point numbers. The model was trained on massive text corpora to place semantically similar phrases close together. "Database credential rotation" and "cycle access tokens" might both land near coordinates like [0.12, -0.45, 0.88, ...]. "Pizza delivery" lands far away.

Similarity search measures the distance between two vectors. Cosine similarity is the standard metric. It computes the cosine of the angle between two vectors, ignoring their magnitude. A smaller angle gives a cosine closer to 1, which means higher similarity. The vector database indexes millions of these slices and uses approximate nearest neighbor algorithms to find the closest matches in milliseconds.
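To make the metric concrete, here is a minimal sketch of cosine similarity using only the standard library. The function name cosineSimilarity and the three-element vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions, and a vector database computes this for you.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ or either vector has zero magnitude.
func cosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Made-up toy vectors, not output from a real model.
	rotate := []float32{0.12, -0.45, 0.88}
	cycle := []float32{0.10, -0.40, 0.90}
	pizza := []float32{-0.70, 0.60, -0.10}

	fmt.Printf("rotate vs cycle: %.3f\n", cosineSimilarity(rotate, cycle))
	fmt.Printf("rotate vs pizza: %.3f\n", cosineSimilarity(rotate, pizza))
}
```

Vectors pointing the same way score near 1, unrelated ones near 0, and opposite ones near -1.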

You do not need a mathematics degree to use this. You only need to know that text becomes numbers, numbers live in a database, and distance equals relevance.

The minimal setup

Here is the simplest way to wire an embedder to a vector store. The code uses langchaingo to handle the model communication and Weaviate to store the vectors.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/googleai"
	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores/weaviate"
)

func main() {
	ctx := context.Background()
	// Pass context first so cancellation propagates to the HTTP client.
	client, err := googleai.New(ctx, googleai.WithAPIKey("YOUR_API_KEY"))
	if err != nil {
		log.Fatal(err)
	}

	// Wrap the LLM client in an embedder that handles text-to-vector conversion.
	emb, err := embeddings.NewEmbedder(client)
	if err != nil {
		log.Fatal(err)
	}

	// Connect to Weaviate and attach the embedder so the store can embed
	// documents and queries. weaviate.New takes only options, not a context.
	store, err := weaviate.New(
		weaviate.WithEmbedder(emb),
		weaviate.WithScheme("http"),
		weaviate.WithHost("localhost:9035"),
		weaviate.WithIndexName("Docs"),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Add a single document to verify the pipeline works.
	_, err = store.AddDocuments(ctx, []schema.Document{{PageContent: "Rotate credentials weekly"}})
	if err != nil {
		log.Fatal(err)
	}

	// Query the store and print the top match.
	results, err := store.SimilaritySearch(ctx, "cycle access tokens", 1)
	if err != nil {
		log.Fatal(err)
	}
	// Guard against an empty result set before indexing.
	if len(results) == 0 {
		log.Fatal("no matches returned")
	}
	fmt.Println(results[0].PageContent)
}

Run this against a reachable Weaviate instance with a valid API key and you get your document back. The embedder sent the text to the model, received a vector, stored it, converted the query to a vector, and returned the closest match. The pipeline works.

Embedding models are remote services. Treat them like any other network dependency.

What happens under the hood

When you call AddDocuments, the vector store does not split text for you. If a document is long, run it through a text splitter first and pass the resulting chunks. Each chunk goes to the embedding model. The model returns a slice of floats. The database stores those floats alongside the original text and any metadata you attach, such as the source file or a creation timestamp. The index builds a mathematical map of the vector space.
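Chunking itself is simple enough to sketch with the standard library. The helper below, chunkWords, is a hypothetical illustration rather than langchaingo's own splitter: it cuts text into windows of a fixed number of words, with a small overlap so a sentence that straddles a boundary still appears whole in at least one chunk. The sizes are assumptions you would tune for your model.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into chunks of at most size words. Consecutive
// chunks share overlap words. Hypothetical helper, not a library function.
func chunkWords(text string, size, overlap int) []string {
	words := strings.Fields(text)
	if size <= 0 || len(words) == 0 {
		return nil
	}
	// Fall back to no overlap if the requested overlap would stall the loop.
	if overlap < 0 || overlap >= size {
		overlap = 0
	}
	step := size - overlap

	var chunks []string
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break
		}
	}
	return chunks
}

func main() {
	doc := "Cycle access tokens every week and store the new secret in the vault"
	for i, c := range chunkWords(doc, 6, 2) {
		fmt.Printf("chunk %d: %s\n", i, c)
	}
}
```

Each chunk would then become one document passed to AddDocuments.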

When you call SimilaritySearch, the same embedder converts your query into a vector. The database runs a nearest-neighbor search across the index. It calculates distances, ranks the results, and returns the top N documents. The LLM never sees the raw database. It only receives the ranked text snippets you pass to it.

This separation is intentional. The vector store handles retrieval. The LLM handles generation. You chain them together for retrieval-augmented generation. The pattern scales because the heavy lifting happens in the database index, not in your application memory.
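The hand-off between retrieval and generation is usually just string assembly. As a rough sketch, assuming a hypothetical buildPrompt helper and a prompt wording of our own choosing, the retrieved snippets are placed ahead of the user's question before the text goes to the LLM:

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrompt places retrieved snippets ahead of the user's question.
// The wording of the instruction is an assumption, not a required format.
func buildPrompt(question string, snippets []string) string {
	var b strings.Builder
	b.WriteString("Answer using only the context below.\n\nContext:\n")
	for i, s := range snippets {
		fmt.Fprintf(&b, "%d. %s\n", i+1, s)
	}
	b.WriteString("\nQuestion: ")
	b.WriteString(question)
	return b.String()
}

func main() {
	// In a real pipeline these snippets come from SimilaritySearch.
	snippets := []string{"Cycle access tokens weekly"}
	fmt.Println(buildPrompt("How do I rotate database credentials?", snippets))
}
```

The model only ever sees this assembled string, which is why the quality of retrieval bounds the quality of the answer.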

Keep the embedder consistent. Switching models mid-project breaks the coordinate system.

A production-ready pipeline

Real applications need context propagation, structured error handling, and explicit timeouts. Here is how that looks in a service layer.

package service

import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/googleai"
	"github.com/tmc/langchaingo/vectorstores/weaviate"
)

// searchTimeout caps how long a single similarity search may run.
const searchTimeout = 10 * time.Second

// Store holds the vector database connection and embedder.
type Store struct {
	db  weaviate.Store
	emb embeddings.Embedder
}

// NewStore wires the embedder to the vector store.
func NewStore(ctx context.Context, apiKey string) (*Store, error) {
	client, err := googleai.New(ctx, googleai.WithAPIKey(apiKey))
	if err != nil {
		return nil, fmt.Errorf("create llm client: %w", err)
	}

	emb, err := embeddings.NewEmbedder(client)
	if err != nil {
		return nil, fmt.Errorf("create embedder: %w", err)
	}

	// weaviate.New takes only options, not a context.
	db, err := weaviate.New(
		weaviate.WithEmbedder(emb),
		weaviate.WithScheme("http"),
		weaviate.WithHost("localhost:9035"),
		weaviate.WithIndexName("KnowledgeBase"),
	)
	if err != nil {
		return nil, fmt.Errorf("connect to vector store: %w", err)
	}

	return &Store{db: db, emb: emb}, nil
}

// Search retrieves the top k documents matching the query.
func (s *Store) Search(ctx context.Context, query string, k int) ([]string, error) {
	// Derive a child context so the embedding and database calls cannot hang
	// indefinitely. Cancellation of the incoming context still propagates.
	ctx, cancel := context.WithTimeout(ctx, searchTimeout)
	defer cancel()

	results, err := s.db.SimilaritySearch(ctx, query, k)
	if err != nil {
		return nil, fmt.Errorf("similarity search: %w", err)
	}

	// Extract just the text content for downstream processing.
	texts := make([]string, len(results))
	for i, r := range results {
		texts[i] = r.PageContent
	}
	return texts, nil
}

The receiver name is s, matching the type Store. Go convention favors short, predictable receiver names. The context.Context parameter sits first in every public method. Functions that accept a context must respect cancellation and deadlines. The if err != nil blocks look verbose. The community accepts the boilerplate because it makes the failure path impossible to ignore.

Run gofmt before committing. The tool enforces consistent indentation and spacing. Most editors run it on save. Argue logic, not formatting.

Context is plumbing. Run it through every long-lived call site.
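What "respecting cancellation" means in practice can be shown without any network at all. The sketch below uses a hypothetical slowSearch function that stands in for a remote call. It selects on ctx.Done() so that a deadline cuts the wait short. The names and durations are assumptions for illustration only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowSearch is a hypothetical stand-in for a remote search call. It returns
// early with ctx.Err() if the context ends before the simulated work finishes.
func slowSearch(ctx context.Context, query string, work time.Duration) ([]string, error) {
	select {
	case <-time.After(work):
		return []string{"result for " + query}, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// searchTimesOut reports whether a search needing workMs milliseconds is cut
// off by a deadline of timeoutMs milliseconds.
func searchTimesOut(workMs, timeoutMs int) bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	_, err := slowSearch(ctx, "rotate credentials", time.Duration(workMs)*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow call cut off:", searchTimesOut(500, 20))
	fmt.Println("fast call cut off:", searchTimesOut(1, 500))
}
```

Callers can test for context.DeadlineExceeded with errors.Is and decide whether to retry, degrade, or surface the failure.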

Where things break

Vector pipelines fail in predictable ways. The compiler catches type mismatches early. Runtime failures usually involve network timeouts, dimension mismatches, or unhandled context cancellation.

If you pass a string where a slice of floats is expected, the compiler rejects the program with an error such as: cannot use "query" (untyped string constant) as []float32 value in argument. If you forget to import the vector store package, you get: undefined: weaviate. If you import it but never use it, the build fails with: imported and not used. These are standard Go errors. Fix them by checking your types and removing dead imports.

Runtime panics happen when you access a nil store or an empty results slice. Always check the length before indexing. results[0] panics if the search returns zero matches. Guard against it with a length check or a default fallback.
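A length guard is a few lines. The helper name firstOr below is hypothetical, and the fallback text is whatever suits your application:

```go
package main

import "fmt"

// firstOr returns the first result, or fallback when the slice is empty.
// Indexing results[0] directly would panic on an empty slice.
func firstOr(results []string, fallback string) string {
	if len(results) == 0 {
		return fallback
	}
	return results[0]
}

func main() {
	fmt.Println(firstOr([]string{"Cycle access tokens weekly"}, "no match found"))
	fmt.Println(firstOr(nil, "no match found"))
}
```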

Dimension mismatch is the silent killer. If you index documents with a 768-dimensional model and query with a 1536-dimensional model, the database cannot calculate distance because the vectors have different lengths, and most stores return an error. The subtler case is two models that share a dimension: the query succeeds, but the vectors live in different mathematical spaces and the rankings are garbage. Lock your embedding model version in code or configuration. Do not swap models without reindexing.
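One cheap safeguard is to check the vector length yourself before it reaches the database. This sketch assumes a hypothetical checkDim helper and an expected dimension of 768. It catches a length mismatch only, not a same-sized model swap, which still needs a pinned model name in configuration.

```go
package main

import (
	"errors"
	"fmt"
)

// errDimMismatch lets callers detect this failure with errors.Is.
var errDimMismatch = errors.New("embedding dimension mismatch")

// checkDim verifies that vec has the dimension the index was built with.
func checkDim(vec []float32, want int) error {
	if len(vec) != want {
		return fmt.Errorf("%w: got %d, want %d", errDimMismatch, len(vec), want)
	}
	return nil
}

func main() {
	const indexDim = 768 // assumed dimension of the model used at indexing time

	query := make([]float32, 1536) // pretend this came from a different model
	if err := checkDim(query, indexDim); err != nil {
		fmt.Println("refusing to query:", err)
	}
}
```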

Goroutine leaks occur when background indexing jobs wait on channels that never close. Always attach a context to long-running tasks and cancel it when the parent scope exits. The worst goroutine bug is the one that never logs.
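The pattern can be sketched as a worker that selects on both its job channel and ctx.Done(). The names indexWorker and runDemo are hypothetical, and the "indexing" is a placeholder. The point is that the worker exits on cancellation even though nobody ever closes the channel.

```go
package main

import (
	"context"
	"fmt"
)

// indexWorker drains jobs until the channel closes or ctx is cancelled,
// then reports how many documents it handled.
func indexWorker(ctx context.Context, jobs <-chan string, done chan<- int) {
	processed := 0
	defer func() { done <- processed }()
	for {
		select {
		case <-ctx.Done():
			return // parent scope exited, so stop waiting on jobs
		case doc, ok := <-jobs:
			if !ok {
				return // channel closed normally
			}
			_ = doc // a real worker would embed and store doc here
			processed++
		}
	}
}

// runDemo sends two jobs, then cancels without ever closing the channel.
func runDemo() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	jobs := make(chan string)
	done := make(chan int, 1) // buffered so the worker never blocks on exit
	go indexWorker(ctx, jobs, done)

	jobs <- "doc-1"
	jobs <- "doc-2"
	cancel()
	return <-done
}

func main() {
	fmt.Println("documents processed before cancel:", runDemo())
}
```

Without the ctx.Done() case, the worker would block on the receive forever once the sender stopped.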

Choosing the right approach

Use a vector database when you need to search thousands of documents with semantic understanding and low latency. Use an in-memory slice when your dataset fits in RAM and you are prototyping or running a single-user tool. Use traditional full-text search when you need exact keyword matching, boolean operators, or faceted filtering on structured metadata. Rely on the LLM context window when your documents are short, the user expects immediate answers, and you can afford the token cost. Stick to sequential code when you do not need concurrency: the simplest thing that works is usually the right thing.
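For the in-memory option, a brute-force scan is often enough while prototyping. The sketch below assumes you already have vectors for each document. The entry type, topK function, and two-dimensional vectors are hypothetical stand-ins. It scores every entry against the query and sorts, which is linear in the number of documents and needs no index.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// entry pairs a document's text with its embedding vector.
type entry struct {
	Text   string
	Vector []float32
}

// cosine returns the cosine similarity of a and b, or 0 if it is undefined.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores every entry against query and returns the k best texts,
// highest similarity first.
func topK(entries []entry, query []float32, k int) []string {
	type scored struct {
		text  string
		score float64
	}
	scores := make([]scored, 0, len(entries))
	for _, e := range entries {
		scores = append(scores, scored{e.Text, cosine(query, e.Vector)})
	}
	sort.SliceStable(scores, func(i, j int) bool { return scores[i].score > scores[j].score })

	if k > len(scores) {
		k = len(scores)
	}
	if k < 0 {
		k = 0
	}
	out := make([]string, 0, k)
	for _, s := range scores[:k] {
		out = append(out, s.text)
	}
	return out
}

func main() {
	// Toy vectors for illustration, not real model output.
	docs := []entry{
		{"cycle access tokens", []float32{1, 0}},
		{"pizza delivery", []float32{0, 1}},
	}
	fmt.Println(topK(docs, []float32{0.9, 0.1}, 1))
}
```

Once the scan becomes the bottleneck, that is the signal to move to a vector database with an approximate index.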

Where to go next