How to Implement Idempotent Job Processing in Go

The retry that broke the bank

You send a payment request to your backend. The network hiccups. The client doesn't get a response. The client retries. Your server receives the same request twice. Without protection, you charge the user twice. The user complains. The support ticket arrives. You realize the job processor isn't idempotent.

Idempotency means doing the same thing multiple times has the same effect as doing it once. In job processing, this usually means tracking job IDs and refusing duplicates. If a job ID comes in, you process it. If that same ID comes in again, you return the previous result or skip it. The state of the world doesn't change on the second attempt.

This protects against network retries, duplicate messages from a queue, and a worker that finished the job but crashed before acknowledging it, so the scheduler thinks it failed. Idempotency is the shield between your logic and the chaos of distributed systems.

What idempotency actually means

Think of idempotency like a light switch. You can flip the switch on ten times in a row. The result is always the same: the light is on. You don't get brighter light or a second bulb. Job processing needs that property.

In Go, idempotency usually relies on an idempotency key. This is a unique identifier attached to the request. The system stores this key. When a request arrives, the system checks the key. If the key exists, the system knows the work is done. It returns the stored result or skips execution. If the key is new, the system stores it and runs the work.

The key must be unique per logical operation. A payment ID works. A user ID plus a timestamp might not, if the user can make multiple payments. The key represents the intent, not the attempt.
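A small sketch of the difference between an intent key and an attempt key. The function names and the sample IDs (pay_123, user_9) are hypothetical; the point is only that a retry of the same payment must produce the same key.

```go
package main

import (
	"fmt"
	"time"
)

// intentKey depends only on what the caller wants to happen.
// Every retry of the same payment yields the same key.
func intentKey(paymentID string) string {
	return "charge:" + paymentID
}

// attemptKey is the anti-pattern: it mixes in the time of the attempt,
// so a retry two seconds later looks like a brand-new operation.
func attemptKey(userID string, at time.Time) string {
	return fmt.Sprintf("charge:%s:%d", userID, at.UnixNano())
}

// demo compares the keys produced by an original request and its retry.
func demo() (intentStable, attemptStable bool) {
	first := time.Unix(1700000000, 0)
	retry := first.Add(2 * time.Second)
	intentStable = intentKey("pay_123") == intentKey("pay_123")
	attemptStable = attemptKey("user_9", first) == attemptKey("user_9", retry)
	return intentStable, attemptStable
}

func main() {
	fmt.Println(demo()) // true false
}
```

If the client cannot supply a payment ID, it can generate a random key once and reuse it on every retry. What matters is that the key is chosen before the first attempt.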

Idempotency is a contract. Enforce it at the boundary, not in the middle.

In-memory guard with sync.Map

Here's the simplest in-memory guard: use a sync.Map to track IDs and reject duplicates instantly.

package main

import (
	"fmt"
	"sync"
)

// jobTracker holds job IDs that have been claimed, to prevent re-processing within this process.
var jobTracker sync.Map

// ProcessJob runs the function only if the ID hasn't been seen.
func ProcessJob(jobID string, fn func()) {
	// LoadOrStore is atomic. It checks and sets in one step.
	// It returns the existing value if present, or stores the new one.
	// The loaded bool tells us if the value was already there.
	// struct{}{} is an empty struct with zero size. Only the key matters, so the value carries nothing.
	if _, loaded := jobTracker.LoadOrStore(jobID, struct{}{}); loaded {
		fmt.Println("Skipping duplicate job:", jobID)
		return
	}

	// First time seeing this ID. Run the work.
	fn()
}

sync.Map is a concurrent-safe map optimized for cases where keys are written once and read many times, or where different goroutines work on distinct keys. LoadOrStore is atomic. It checks and sets in one step. If the key exists, it returns the existing value and true. If not, it stores the given value, returns that same value, and reports false.

The struct{}{} value is an empty struct. It has zero size and signals that only the key matters. The map still pays for every key and its internal entry, so memory grows with the number of job IDs either way. The loaded variable captures the boolean return. If loaded is true, the job was already in the map. You skip execution. If false, you stored the ID and proceed.

This pattern is fast and safe within a single process, but its state vanishes when the process restarts. If the server crashes and restarts, the map is empty. Old job IDs are forgotten. New retries will run again. Note also that the ID is recorded before fn runs, so a job whose fn panics or fails is still treated as seen. This is acceptable for transient data or when the cost of re-running is low.
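To see the guard hold under concurrency, the sketch below submits one job ID from many goroutines at once. It assumes the ProcessJob function shown above, minus its log line; countRuns is a hypothetical helper added for the demonstration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// jobTracker and ProcessJob mirror the in-memory guard from the article.
var jobTracker sync.Map

func ProcessJob(jobID string, fn func()) {
	if _, loaded := jobTracker.LoadOrStore(jobID, struct{}{}); loaded {
		return // duplicate: skip
	}
	fn()
}

// countRuns submits the same job ID from n goroutines and reports
// how many times the work function actually ran.
func countRuns(jobID string, n int) int64 {
	var runs atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ProcessJob(jobID, func() { runs.Add(1) })
		}()
	}
	wg.Wait()
	return runs.Load()
}

func main() {
	fmt.Println("runs:", countRuns("payment-42", 100)) // runs: 1
}
```

Exactly one goroutine wins the LoadOrStore call, so the work runs once no matter how many duplicates arrive.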

Goroutines are cheap. Channels are not magic. sync.Map is a tool, not a silver bullet.

How sync.Map beats a mutex map

Developers often reach for a plain map wrapped in a sync.Mutex. That works, but every reader and writer queues on the same lock. sync.Map is built so that operations on existing or unrelated keys rarely contend with each other. Its internals have changed between Go releases, so rely on the documented behavior rather than on a particular implementation.

The sync package documentation names two cases where sync.Map is the better choice: keys that are written once and read many times, and goroutines that read, write, and overwrite disjoint sets of keys. Outside those cases, the documentation recommends a plain map with a sync.Mutex or sync.RWMutex, which also keeps type safety. If many goroutines write to the same keys, the mutex map might be faster. Profile your workload. Job tracking often fits the sync.Map pattern. Each job ID is unique. Workers process different IDs. Reads happen to check existence. Writes happen once per ID.
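For comparison, here is a minimal sketch of the same guard built on a plain map and a mutex. The mutexGuard type and its firstTime method are hypothetical names, not part of the code above.

```go
package main

import (
	"fmt"
	"sync"
)

// mutexGuard is a plain map protected by a single mutex.
type mutexGuard struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newMutexGuard() *mutexGuard {
	return &mutexGuard{seen: make(map[string]struct{})}
}

// firstTime records jobID and reports whether this call was the first to see it.
// The check and the insert happen under one lock, so together they are atomic.
func (g *mutexGuard) firstTime(jobID string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if _, ok := g.seen[jobID]; ok {
		return false
	}
	g.seen[jobID] = struct{}{}
	return true
}

func main() {
	g := newMutexGuard()
	fmt.Println(g.firstTime("job-1")) // true
	fmt.Println(g.firstTime("job-1")) // false
}
```

The typed map avoids the any conversions that sync.Map requires, and the single lock makes the behavior easy to reason about. Whether it is faster depends on the workload, so benchmark both before choosing.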

Run gofmt on this code. It aligns the struct fields and fixes indentation. Don't argue about braces. The tool decides. Most editors run it on save. Trust gofmt. Argue logic, not formatting.

Durable tracking with a database

In production, deduplication usually has to survive restarts. You need a durable store. Here's a pattern using a database to track completion across process boundaries.

package main

import (
	"context"
	"database/sql"
	"fmt"
)

// JobProcessor manages durable idempotency using a database.
type JobProcessor struct {
	db *sql.DB
}

// NewJobProcessor creates a processor bound to a database connection.
func NewJobProcessor(db *sql.DB) *JobProcessor {
	return &JobProcessor{db: db}
}

// Process attempts to insert the job ID. If the unique constraint fails, the job is skipped.
func (p *JobProcessor) Process(ctx context.Context, jobID string, fn func(context.Context) error) error {
	// Context is always the first parameter. It carries cancellation and deadlines.
	// Try to insert the job ID. The unique index on job_id prevents duplicates.
	// ON CONFLICT DO NOTHING (PostgreSQL syntax) turns a duplicate insert into a no-op instead of an error.
	result, err := p.db.ExecContext(ctx, `
		INSERT INTO processed_jobs (job_id)
		VALUES ($1)
		ON CONFLICT (job_id) DO NOTHING
	`, jobID)

	if err != nil {
		return fmt.Errorf("insert job tracking: %w", err)
	}

	// RowsAffected returns 1 if inserted, 0 if conflict.
	rows, err := result.RowsAffected()
	if err != nil {
		return fmt.Errorf("check rows affected: %w", err)
	}

	if rows == 0 {
		// ID already exists. Job was processed before.
		return nil
	}

	// New ID. Execute the work.
	if err := fn(ctx); err != nil {
		// If the job fails, you might want to delete the tracking row
		// to allow retries, or mark it as failed depending on your retry policy.
		return fmt.Errorf("execute job %s: %w", jobID, err)
	}

	return nil
}

The database unique constraint is the safety net. Without it, the check-then-act pattern breaks under concurrency: two goroutines can check for the ID at the same time, both see it is missing, and both insert. With the constraint, only one insert can win. Because of ON CONFLICT DO NOTHING, the loser does not get an error. It sees zero rows affected, and the application treats that as "already processed".
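The insert relies on a processed_jobs table with a unique job_id, which the article does not show. A minimal sketch of such a schema, assuming PostgreSQL; the created_at column and the EnsureSchema helper are illustrative additions, not part of the original code.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// createProcessedJobs is an assumed PostgreSQL schema for the tracking table.
// The PRIMARY KEY on job_id provides the unique index that ON CONFLICT relies on.
// created_at is an illustrative extra column, useful for expiring old rows.
const createProcessedJobs = `
CREATE TABLE IF NOT EXISTS processed_jobs (
	job_id     TEXT PRIMARY KEY,
	created_at TIMESTAMPTZ NOT NULL DEFAULT now()
)`

// EnsureSchema is a hypothetical helper that creates the table if it is missing.
func EnsureSchema(ctx context.Context, db *sql.DB) error {
	if _, err := db.ExecContext(ctx, createProcessedJobs); err != nil {
		return fmt.Errorf("create processed_jobs: %w", err)
	}
	return nil
}

func main() {
	// No database driver is wired up in this sketch, so it only prints the DDL.
	fmt.Println(createProcessedJobs)
}
```

Other databases spell the conflict clause differently, so both the DDL and the INSERT need adjusting if you are not on PostgreSQL.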

The receiver name is usually one or two letters matching the type: (p *JobProcessor), not (this *JobProcessor). This is Go convention. Keep it short.

if err != nil { return err } is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible. Check the error immediately. Return it. Don't hide it.

Process takes ctx as its first parameter. This is the Go convention. Pass it down to every call that accepts one, so deadlines and cancellation signals reach the database and the job itself.

The database unique constraint is the only truth. Application logic is a suggestion.

The zombie job problem

The simple insert pattern has a flaw. If the process inserts the ID and then crashes before running the work, the ID is stuck in the database. The work never happened. The queue might retry. The retry sees the ID and skips. The job is dead. This is the zombie job problem.

The fix depends on your tolerance for duplicates versus lost jobs. One approach is to insert with a status='pending'. If the work succeeds, update to status='done'. If the work fails, delete the row or mark it failed. On retry, check the status. If pending, you can re-run the work. If done, skip. If failed, decide whether to retry or alert.

This adds complexity. You need transactions. You need to handle the gap between insert and update. You might need a background cleanup job to expire old pending rows. Idempotency is easy when the work is instant. It gets hard when the work takes time and crashes happen.
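The decision logic can be sketched without a database. The example below uses an in-memory map as a stand-in for a jobs table with a status column. All names (statusStore, claim, finish, lease) are hypothetical, and retrying failed jobs is one policy choice among several.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type status int

const (
	pending status = iota
	done
	failed
)

type record struct {
	status    status
	startedAt time.Time
}

// statusStore stands in for a table with job_id, status, and started_at columns.
type statusStore struct {
	mu    sync.Mutex
	jobs  map[string]record
	lease time.Duration // how long a pending row is trusted before it counts as abandoned
}

func newStatusStore(lease time.Duration) *statusStore {
	return &statusStore{jobs: make(map[string]record), lease: lease}
}

// claim reports whether the caller may run the job now, and marks it pending if so.
func (s *statusStore) claim(id string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec, ok := s.jobs[id]
	switch {
	case !ok: // never seen: run it
	case rec.status == failed: // policy choice: failed jobs may be retried
	case rec.status == pending && now.Sub(rec.startedAt) > s.lease: // abandoned by a crashed worker
	default:
		return false // done, or pending and still within its lease
	}
	s.jobs[id] = record{status: pending, startedAt: now}
	return true
}

// finish records the outcome of a claimed job.
func (s *statusStore) finish(id string, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec := s.jobs[id]
	if err != nil {
		rec.status = failed
	} else {
		rec.status = done
	}
	s.jobs[id] = rec
}

// demo replays a crash, an early retry, a retry after lease expiry,
// a retry after success, and a retry after failure.
func demo() []bool {
	s := newStatusStore(time.Minute)
	t0 := time.Unix(0, 0)
	out := []bool{
		s.claim("job-1", t0),                     // new job; the worker then "crashes"
		s.claim("job-1", t0.Add(30*time.Second)), // still within the lease
		s.claim("job-1", t0.Add(2*time.Minute)),  // lease expired: take over
	}
	s.finish("job-1", nil)
	out = append(out, s.claim("job-1", t0.Add(3*time.Minute))) // already done
	s.claim("job-2", t0)
	s.finish("job-2", errors.New("boom"))
	out = append(out, s.claim("job-2", t0)) // failed: retry allowed
	return out
}

func main() {
	fmt.Println(demo()) // [true false true false true]
}
```

In a real database the claim step would be a single conditional statement or a transaction, so that two workers cannot both take over the same expired row. The lease also has to be longer than the slowest legitimate run, or a slow job will be started twice.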

Another approach is to accept the risk. If the job is cheap to re-run, let it run twice. The idempotency key prevents infinite loops. Two runs are better than zero runs. If the job is expensive or destructive, you need the status tracking.

Track the ID before the work. Clean up the ID after the work. Manage the gap carefully.

Memory leaks and eviction

sync.Map grows forever. If you process millions of jobs, the map consumes memory. You need eviction. sync.Map doesn't have built-in TTL. You need a background goroutine to clean up old IDs.

One pattern is to store a timestamp instead of struct{}{}. Use LoadOrStore to set the timestamp. A background goroutine iterates the map and deletes old entries. sync.Map supports Range and Delete. You can sweep the map periodically.
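A minimal sketch of that timestamp pattern. The ttlGuard type and its methods are hypothetical names; time is passed in explicitly so the behavior is easy to check.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ttlGuard remembers when each job ID was first seen.
type ttlGuard struct {
	seen sync.Map // job ID (string) -> first-seen time (time.Time)
}

// firstTime records jobID with the given timestamp and reports
// whether this call was the first to see it.
func (g *ttlGuard) firstTime(jobID string, now time.Time) bool {
	_, loaded := g.seen.LoadOrStore(jobID, now)
	return !loaded
}

// sweep deletes entries older than ttl and returns how many it removed.
// sync.Map allows Delete to be called from inside Range.
func (g *ttlGuard) sweep(now time.Time, ttl time.Duration) int {
	removed := 0
	g.seen.Range(func(key, value any) bool {
		if ts, ok := value.(time.Time); ok && now.Sub(ts) > ttl {
			g.seen.Delete(key)
			removed++
		}
		return true // keep iterating
	})
	return removed
}

// demo records a job, retries it, sweeps after the TTL, and retries again.
func demo() (first, dup bool, removed int, again bool) {
	g := &ttlGuard{}
	t0 := time.Unix(0, 0)
	first = g.firstTime("job-1", t0)
	dup = g.firstTime("job-1", t0.Add(time.Minute))
	removed = g.sweep(t0.Add(2*time.Hour), time.Hour)
	again = g.firstTime("job-1", t0.Add(2*time.Hour))
	return first, dup, removed, again
}

func main() {
	fmt.Println(demo()) // true false 1 true
}
```

Once an ID is swept, a late duplicate will run again, so the TTL has to be longer than the longest window in which a retry can still arrive.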

Alternatively, use a cache library like ristretto or bigcache. These provide TTL and size limits out of the box. They are optimized for high throughput. If you need eviction, don't roll your own. Use a library.

The worst goroutine bug is the one that never logs. If your cleanup goroutine panics, your memory leaks silently. Add recovery and logging to background workers.
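One way to add that recovery, sketched with hypothetical helpers runSafely and startCleanup. The task passed in would be whatever sweep function your guard uses.

```go
package main

import (
	"log"
	"time"
)

// runSafely runs task and converts a panic into a log line.
// It reports whether the task panicked.
func runSafely(name string, task func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("%s: recovered from panic: %v", name, r)
			panicked = true
		}
	}()
	task()
	return false
}

// startCleanup runs task on every tick until stop is closed.
// A panic in one run is logged and does not kill the loop.
func startCleanup(interval time.Duration, stop <-chan struct{}, task func()) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-stop:
				return
			case <-ticker.C:
				runSafely("cleanup", task)
			}
		}
	}()
}

func main() {
	stop := make(chan struct{})
	ran := make(chan struct{}, 1)
	startCleanup(10*time.Millisecond, stop, func() {
		select {
		case ran <- struct{}{}:
		default:
		}
		panic("simulated bug in sweep")
	})
	<-ran                             // wait for the first run
	time.Sleep(20 * time.Millisecond) // give the log line time to appear
	close(stop)
}
```

Recovering keeps the sweeper alive, but the log line is the important part: a cleanup loop that fails on every tick is still a memory leak, just a visible one.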

Pitfalls and compiler errors

Race conditions are the enemy. If you check for existence and then insert, two goroutines can pass the check simultaneously. The database unique constraint prevents this. Without it, you get duplicates. The compiler won't catch this. You'll get data corruption in production.

If you use sync.Map in a distributed system, you get false negatives. Each node has its own map. Node A processes ID 123. Node B doesn't know. Node B processes ID 123. The job runs twice. Use a shared store for distributed systems.

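If you spawn goroutines in a loop and close over the loop variable, the behavior depends on your Go version. Before Go 1.22, every iteration shared one variable, so goroutines often saw only the last value. This was never a compiler error: go vet reports it as "loop variable i captured by func literal". From Go 1.22 on, in modules whose go.mod declares go 1.22 or later, each iteration gets its own variable and the bug disappears. On older versions, always copy the variable or pass it as an argument.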
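Passing the value as an argument sidesteps the question on any Go version. A minimal sketch, with runAll as a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runAll starts one goroutine per job ID and returns the IDs the
// goroutines actually saw, sorted for a stable result.
func runAll(ids []string) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen []string
	)
	for _, id := range ids {
		wg.Add(1)
		// id is passed as an argument, so each goroutine gets its own copy
		// regardless of how the language scopes the loop variable.
		go func(id string) {
			defer wg.Done()
			mu.Lock()
			seen = append(seen, id)
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	sort.Strings(seen)
	return seen
}

func main() {
	fmt.Println(runAll([]string{"job-3", "job-1", "job-2"})) // [job-1 job-2 job-3]
}
```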

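sync.Map makes the map operations safe, not the values you store in it. If you store a pointer to a struct and several goroutines mutate that struct, you still have a data race. The race detector catches this. Run go run -race or go test -race during development. It flags unsynchronized concurrent access to shared memory.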

Public names start with a capital letter. Private start lowercase. No keywords like public or private. JobProcessor is public. db is private. This controls visibility.

"Accept interfaces, return structs" is a common Go style mantra. NewJobProcessor follows it by returning a concrete *JobProcessor. Callers that need a seam for testing can declare a small interface on their side.

Idempotency keys are cheap. Duplicates are expensive.

When to use what

Use sync.Map when you need fast in-memory deduplication within a single process and losing state on restart is acceptable.

Use a database with a unique constraint when jobs must survive restarts and multiple workers share the same job queue.

Use a distributed cache like Redis with SET NX when you need cross-process deduplication with lower latency than a relational database.

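Use a message queue with exactly-once semantics if your broker supports it and you want to offload idempotency to the infrastructure. Check what the guarantee covers: side effects outside the broker, such as a call to a payment API, usually still need their own idempotency key.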

Use a local mutex-protected map when many goroutines write to the same keys, as sync.Map is optimized for write-once, read-many keys and for goroutines that work on disjoint keys.

Use a cache with TTL when you need automatic eviction of old job IDs to prevent memory leaks.

Use a status-tracking pattern when jobs are expensive and you cannot afford zombie jobs that block retries.

Use plain sequential code when you don't need concurrency: the simplest thing that works is usually the right thing.

Where to go next

Idempotent processing ensures that running a job multiple times has the same effect as running it once. You achieve this by recording the ID of every job you accept and skipping that ID if you see it again. Think of it like a to-do list where you cross off each item as you pick it up, so you never do the same task twice.