How to Write Tests for CLI Applications in Go

Test Go CLI apps by injecting arguments into os.Args and calling main() within a testing function.

The problem with testing main()

You built a command line tool. It reads a file, transforms the data, and prints the result. Running it in a terminal works fine. Now you want to automate the verification. You open a _test.go file, write a test function, and call main() directly. The test passes. You celebrate. Then you mark the tests with t.Parallel() to speed up the suite, and half of them start failing with random output. Or worse, the test runner terminates abruptly without reporting the failure.

The issue is not your test logic. The issue is that main() is designed to run once per process, not dozens of times in parallel. It reads global variables. It writes to standard streams. It calls os.Exit() to terminate the program. Testing it directly means fighting the very design of a CLI entry point.

Why os.Args is the wrong place to start

os.Args is a slice of strings that holds the command line arguments passed to the current process. It lives in the os package, which means it is process-wide global state. When you modify it in one test, every other test running in the same process sees the mutation. Global state is the enemy of parallel execution. Go runs the tests in a package sequentially by default, but every test that calls t.Parallel() runs concurrently with the other parallel tests, up to the limit set by the -parallel flag. If two such tests overwrite os.Args at the same time, they race. With the -race flag, the race detector reports the race and the test fails. Without it, the race goes undetected and your tests pass intermittently.

Think of os.Args like a shared configuration file on a network drive. If one developer edits it while another is reading it, both get corrupted data. The solution is not to lock the file. The solution is to stop reading from the shared location and pass the configuration directly to the function that needs it.

Go's testing package expects functions that accept parameters, not functions that reach into global variables. The standard pattern is to extract the core logic into a regular function, test that function in isolation, and keep main() as a thin wrapper that only handles argument parsing and process termination.
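A minimal sketch of that split follows. The names run and capture, and the placeholder behavior (echoing the arguments and rejecting anything that looks like a flag), are assumptions made for illustration, not part of any particular tool.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// run holds the program logic. It receives its arguments and output stream
// as parameters instead of reading os.Args or os.Stdout.
// Echoing the arguments is placeholder behavior for this sketch.
func run(args []string, stdout io.Writer) error {
	for _, a := range args {
		if strings.HasPrefix(a, "-") {
			return fmt.Errorf("unknown flag %q", a)
		}
	}
	_, err := fmt.Fprintln(stdout, strings.Join(args, " "))
	return err
}

// capture is a hypothetical test helper: it runs the logic against an
// in-memory writer and returns whatever was written.
func capture(args ...string) (string, error) {
	var sb strings.Builder
	err := run(args, &sb)
	return sb.String(), err
}

// main is the only place that touches process-wide state.
func main() {
	if err := run(os.Args[1:], os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

A test can now call capture("hello", "world") and compare the returned string, without touching os.Args, os.Stdout, or os.Exit.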

The direct approach: mocking os.Args

If you cannot refactor the code yet, you can still test by overriding os.Args. The trick is to save the original slice, inject your test arguments, run the code, and restore the original slice immediately after. You must also prevent os.Exit() from killing the test runner. The simplest way is to ensure your main() function does not call os.Exit(), or to intercept it with a custom exit handler.
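One way to intercept the exit call is a package-level function variable that defaults to os.Exit. The sketch below assumes the program calls this hypothetical exit hook everywhere it would otherwise call os.Exit directly; the names exit, fail, and recordExit are invented for the example.

```go
package main

import (
	"fmt"
	"os"
)

// exit is a hypothetical package-level hook. Production code calls exit
// instead of os.Exit so that a test can substitute a recorder.
var exit = os.Exit

// fail reports an error and terminates through the hook.
func fail(msg string) {
	fmt.Fprintln(os.Stderr, msg)
	exit(1)
}

// recordExit swaps the hook for a recorder, runs fn, restores the hook,
// and returns the last exit code requested (-1 if exit was never called).
func recordExit(fn func()) (code int) {
	code = -1
	old := exit
	exit = func(c int) { code = c }
	defer func() { exit = old }()
	fn()
	return code
}

func main() {
	code := recordExit(func() { fail("simulated failure") })
	fmt.Println("recorded exit code:", code)
}
```

Two caveats apply. Unlike os.Exit, the recorder returns, so any code after the exit call keeps running during the test. The hook is also still a global, so it has the same parallel-safety problem as os.Args.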

Here is the minimal pattern for overriding arguments safely:

package main

import (
	"os"
	"testing"
)

func TestMainWithArgs(t *testing.T) {
	// Preserve the original arguments to restore them later
	originalArgs := os.Args
	defer func() { os.Args = originalArgs }()

	// Inject test arguments before calling the entry point
	os.Args = []string{"mycli", "--verbose", "data.json"}
	main()
}

This works for quick sanity checks. It does not scale. The defer statement restores os.Args even if main() panics. The assignment copies only the slice header (pointer, length, and capacity), not the underlying array, so the swap is cheap. It is safe only as long as no other goroutine touches os.Args at the same time.

Goroutines are cheap. Channels are not magic. Global variables are neither.

What happens under the hood

When the testing package runs your suite, it runs each test function in its own goroutine. All goroutines share the same process memory, and os.Args lives in that shared space, so modifying it affects every goroutine that reads it. The deferred closure captures the variable originalArgs by reference, not by value. Nothing reassigns that variable, so it still holds the original slice when the test function returns and the closure writes it back to os.Args.

If you run tests sequentially, the restore happens before the next test starts. The mutation window is narrow. If you run tests in parallel, the windows overlap. Two tests might read os.Args at the same time, see each other's injected arguments, and produce wrong results. If you run the suite with the -race flag, the race detector flags this with WARNING: DATA RACE in the test output. The fix is not to disable parallel execution. The fix is to stop mutating process-wide state.
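To make the contrast concrete, here is a sketch in which the logic takes its arguments and output stream as parameters, so any number of invocations can run at once. The run function is a hypothetical stand-in for your extracted program logic, and runAll is an invented helper that plays the role of parallel tests.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// run is a hypothetical stand-in for the extracted program logic.
func run(args []string, stdout io.Writer) error {
	_, err := fmt.Fprintln(stdout, strings.ToUpper(strings.Join(args, " ")))
	return err
}

// runAll invokes run once per argument list, each in its own goroutine
// with its own buffer, and returns the outputs in input order.
func runAll(inputs [][]string) []string {
	outputs := make([]string, len(inputs))
	var wg sync.WaitGroup
	for i, args := range inputs {
		wg.Add(1)
		go func(i int, args []string) {
			defer wg.Done()
			var buf bytes.Buffer // private to this goroutine
			if err := run(args, &buf); err != nil {
				outputs[i] = "error: " + err.Error()
				return
			}
			outputs[i] = buf.String() // each goroutine writes a distinct element
		}(i, args)
	}
	wg.Wait()
	return outputs
}

func main() {
	for _, out := range runAll([][]string{{"alpha"}, {"beta", "gamma"}}) {
		fmt.Print(out)
	}
}
```

In a real test file the goroutines would be subtests started with t.Run that call t.Parallel(), but the property is the same: no invocation can observe another's arguments or output.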

A common alternative to defer in tests is t.Cleanup(), available since Go 1.14. It registers a function that runs after the test and all of its subtests complete. Registered functions run in last-added, first-called order. Unlike defer, it can be called from a helper function, so setup and teardown can live in the same place. You can swap defer for t.Cleanup(func() { os.Args = originalArgs }) to align with modern Go testing practices.

The compiler will reject unused variables with declared and not used. If you save originalArgs but forget to restore it, the variable stays unused and the build fails. That is a feature, not a bug. It forces you to handle the cleanup path explicitly.

A realistic CLI test setup

Real CLI tools read from os.Stdin, write to os.Stdout, and log to os.Stderr. Testing them requires redirecting those streams. os.Stdout is a global variable of type *os.File, not an io.Writer, so you cannot assign a *bytes.Buffer to it. What you can assign is another *os.File. os.Pipe() returns a connected pair of files: point os.Stdout at the write end, run the program, close the write end, and read everything back from the read end.

Here is a complete test that swaps arguments and standard output:

package main

import (
	"io"
	"os"
	"testing"
)

func TestCLICapturesOutput(t *testing.T) {
	// Save original globals to restore them after the test
	oldArgs := os.Args
	oldStdout := os.Stdout
	t.Cleanup(func() {
		os.Args = oldArgs
		os.Stdout = oldStdout
	})

	// os.Stdout is a *os.File, so redirect it through a pipe
	r, w, err := os.Pipe()
	if err != nil {
		t.Fatalf("creating pipe: %v", err)
	}
	os.Stdout = w

	// Inject arguments and run the program logic
	os.Args = []string{"mycli", "hello"}
	main()

	// Close the write end so the read below sees EOF
	w.Close()
	out, err := io.ReadAll(r)
	if err != nil {
		t.Fatalf("reading captured output: %v", err)
	}

	// Verify the captured output matches expectations
	if got := string(out); got != "hello\n" {
		t.Errorf("unexpected output: %q", got)
	}
}

fmt.Println reads the os.Stdout variable on every call, so it and os.Stdout.Write both write into the pipe. Closing the write end is what lets io.ReadAll return; without it the read blocks forever. A pipe has a limited kernel buffer (often around 64 KiB on Linux), so a program that prints more than that before the test starts reading will block. For large output, read from the pipe in a separate goroutine. The t.Cleanup call restores both globals, even if you add more cleanup steps later. The final assertion uses t.Errorf to report the mismatch without stopping the test early. That matches Go's convention of collecting all failures in a single test run.
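Because os.Stdout is a concrete *os.File, one reusable way to capture it is a helper built on os.Pipe. The sketch below is one possible shape, not a standard library facility: captureStdout and printGreeting are hypothetical names, and the helper drains the pipe in a goroutine so that large output cannot block the writer.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// captureStdout redirects os.Stdout to a pipe while fn runs and returns
// everything fn wrote. It mutates a global, so it must not be used from
// tests that run in parallel.
func captureStdout(fn func()) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	old := os.Stdout
	os.Stdout = w

	// Drain the pipe concurrently so fn cannot block on a full pipe buffer.
	type result struct {
		out string
		err error
	}
	done := make(chan result, 1)
	go func() {
		var buf bytes.Buffer
		_, err := io.Copy(&buf, r)
		done <- result{buf.String(), err}
	}()

	fn()

	// Restore stdout, then close the write end so the reader sees EOF.
	os.Stdout = old
	w.Close()
	res := <-done
	r.Close()
	return res.out, res.err
}

// printGreeting stands in for a main() that prints to standard output.
func printGreeting() {
	fmt.Println("hello")
}

func main() {
	out, err := captureStdout(printGreeting)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("captured %q\n", out)
}
```

The helper restores os.Stdout before it returns, so a test that uses it does not need its own cleanup for the stream.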

Don't pass a *string. Strings are already cheap to pass by value. The same principle applies to streams: swap the pointer, not the underlying file descriptor.

Pitfalls and compiler traps

The most common trap is os.Exit(). If main() calls os.Exit(1) on failure, the whole test binary terminates immediately. Deferred functions and t.Cleanup callbacks do not run, and the testing package never gets a chance to report which test was running. You will see FAIL for the package in the terminal, but no per-test output. Since Go 1.16, a call to os.Exit(0) during a test function is also treated as a failure. The fix is to refactor main() into a run() function that returns an error, and call os.Exit(1) only in the actual main() wrapper.

Another trap is forgetting to restore streams. If a test returns without restoring os.Stdout, the variable keeps pointing at the test's replacement. Subsequent tests write into that replacement instead of the terminal, and their output silently disappears. The compiler will not catch this. The runtime will not warn you. You will stare at a blank terminal wondering why your tests stopped printing. Always register the restore with t.Cleanup or defer immediately after saving the original value.

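If you try to assign a bytes.Buffer to os.Stdout, the compiler rejects the program with cannot use buf (variable of type bytes.Buffer) as *os.File value in assignment. Taking the address does not help: &buf is a *bytes.Buffer, which is still not a *os.File. The type system only accepts another *os.File, such as the write end of os.Pipe() or a temporary file created with os.CreateTemp().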

Receiver naming matters when you extract logic into methods. Use one or two letters matching the type: (c *CLI) Run(args []string) error, not (this *CLI) or (self *CLI). The community standard keeps method signatures readable and consistent across packages.

The worst goroutine bug is the one that never logs. If your CLI spawns background workers that wait on channels, and the test exits before closing those channels, the workers leak. The test passes, but the process holds memory. Always close channels when the producer finishes, or use context.Context with a deadline to force cancellation.
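A minimal sketch of the cancellation approach follows. The worker and stopWorkers functions are hypothetical; the jobs channel is deliberately never closed, so only the context can stop the workers.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// worker consumes jobs until the channel is closed or the context is cancelled.
func worker(ctx context.Context, jobs <-chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case <-ctx.Done():
			return
		case _, ok := <-jobs:
			if !ok {
				return
			}
		}
	}
}

// stopWorkers starts n workers, cancels them, and reports whether all of
// them exited before the timeout elapsed.
func stopWorkers(n int, timeout time.Duration) bool {
	ctx, cancel := context.WithCancel(context.Background())
	jobs := make(chan int) // never closed: only cancellation stops the workers
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(ctx, jobs, &wg)
	}
	cancel()

	// Wait for the workers, but give up after the timeout.
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	fmt.Println("all workers stopped:", stopWorkers(4, time.Second))
}
```

A test can call cancel from t.Cleanup and then wait on the WaitGroup, which turns a silent leak into a test that hangs or fails visibly.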

When to mock, when to refactor, when to spawn

Use direct os.Args mutation when you are prototyping a simple script and need a quick sanity check before investing in architecture. Use os.Stdout and os.Stdin swapping when you must verify the exact binary behavior without touching the source code. Use a run(args []string) refactoring when you want clean, parallel-safe tests that accept parameters instead of reading globals. Use exec.Command to spawn the actual compiled binary when you need to test environment variables, file permissions, signal handling, or integration with external tools.
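For the last option, a sketch of spawning the compiled binary with os/exec is shown below. The runBinary helper and the ./mycli path are placeholders; the example assumes you have already built the tool, for instance with go build.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// runBinary executes the binary at path with args and returns its standard
// output and exit code. err is non-nil only if the process could not be
// started or waited on; a non-zero exit is reported through code.
func runBinary(path string, args ...string) (stdout string, code int, err error) {
	var out bytes.Buffer
	cmd := exec.Command(path, args...)
	cmd.Stdout = &out

	err = cmd.Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The process ran and exited with a non-zero status.
		return out.String(), exitErr.ExitCode(), nil
	}
	if err != nil {
		return "", -1, err
	}
	return out.String(), 0, nil
}

func main() {
	// "./mycli" is a placeholder path for the compiled tool.
	out, code, err := runBinary("./mycli", "hello")
	if err != nil {
		fmt.Println("could not run binary:", err)
		return
	}
	fmt.Printf("exit=%d stdout=%q\n", code, out)
}
```

Each spawned process has its own arguments, streams, and exit status, so tests written this way are isolated from one another at the cost of a slower run.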

Trust the test runner. Isolate the state. Refactor the entry point.

Where to go next