How to Read and Write Gzipped Files in Go

The problem with uncompressed logs

Your application generates a JSON log every second. After a week, the directory is three gigabytes. You need compression, but you do not want to rewrite your logging pipeline or pull in a third-party library. Go's standard library already ships with everything you need. The compress/gzip package handles the heavy lifting while keeping your code clean and idiomatic.

How the io interfaces shape compression

Gzip is a compression format built on the DEFLATE algorithm. Go's implementation treats it as a stream filter. Think of it like a pipe adapter. You have a raw data source on one end and a destination on the other. The gzip wrapper sits in the middle, transforming bytes as they flow through. It never needs the whole payload in memory, only a bounded working buffer. It does not care if the source is a file on disk, a network connection, or an in-memory buffer. It only cares about the io.Reader or io.Writer interface.

This design matches Go's philosophy. The language prefers composing small, focused types instead of building monolithic utilities. Any type that implements Read(p []byte) (n int, err error) can feed a gzip reader. Any type that implements Write(p []byte) (n int, err error) can receive gzip output. The compression logic stays isolated from I/O logic. You can swap a file for an HTTP response body without changing a single line of compression code.

Treat interfaces as contracts, not as implementation details. Wrap the stream, pass it along, and let the compiler enforce the boundaries.
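To make the composition concrete, here is a minimal sketch that runs the same compression code against an in-memory bytes.Buffer instead of a file. The helper names compressTo, decompressFrom, and roundTrip are invented for this illustration; only the gzip, bytes, and io calls come from the standard library.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressTo gzips data into any io.Writer (hypothetical helper).
func compressTo(w io.Writer, data []byte) error {
	gw := gzip.NewWriter(w)
	if _, err := gw.Write(data); err != nil {
		return err
	}
	// Close finishes the gzip stream; it does not close w.
	return gw.Close()
}

// decompressFrom reads a whole gzip stream from any io.Reader (hypothetical helper).
func decompressFrom(r io.Reader) ([]byte, error) {
	gr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gr.Close()
	return io.ReadAll(gr)
}

// roundTrip compresses s into an in-memory buffer and expands it again.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	if err := compressTo(&buf, []byte(s)); err != nil {
		return "", err
	}
	out, err := decompressFrom(&buf)
	return string(out), err
}

func main() {
	out, err := roundTrip("same code, different stream")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Swapping the buffer for an *os.File or a network connection requires no change to compressTo or decompressFrom, because both accept the interface rather than a concrete type.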

Reading a gzipped stream

Here is the simplest way to decompress a file. You open the file, wrap it in a gzip reader, and pull the data out.

package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

func main() {
	// Open the compressed file on disk
	f, err := os.Open("data.gz")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	// Wrap the file handle in a decompression stream
	gz, err := gzip.NewReader(f)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer gz.Close()

	// Stream all decompressed bytes into memory
	data, err := io.ReadAll(gz)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(string(data))
}

The os.Open call returns a file handle that implements io.Reader. You pass that handle to gzip.NewReader. The gzip reader keeps a small internal buffer for decompression state. When you call io.ReadAll, it repeatedly asks the gzip reader for bytes. The gzip reader pulls compressed chunks from the file, expands them, and returns the raw text. The checksum in the gzip trailer is only verified once the stream has been read to the end. The defer gz.Close() call releases the decompressor, but it does not close the underlying file. That is why the example also defers f.Close(). If you skip f.Close(), the file descriptor stays open until the process exits or the garbage collector finalizes the handle, which might be too late for a long-running application.

Never assume a file will open successfully. Network mounts fail, permissions change, and disks fill up. Check the error immediately and return. The compiler helps only a little here: it reports declared and not used when a variable such as err is never read at all, but it stays silent when you reassign an err that was already checked earlier, or when you ignore a return value entirely. Handle the error, or discard it explicitly with an underscore so the decision is visible. Discarding errors in production code is a debugging nightmare.

Readers decompress on demand. Close them when the stream ends.
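When the payload is a line-oriented log, you can process it record by record instead of calling io.ReadAll. The sketch below assumes newline-delimited JSON records; countLines and sampleLog are hypothetical helpers, and the in-memory sample stands in for a real file.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// countLines decompresses r on the fly and counts newline-separated records.
func countLines(r io.Reader) (int, error) {
	gr, err := gzip.NewReader(r)
	if err != nil {
		return 0, err
	}
	defer gr.Close()

	n := 0
	sc := bufio.NewScanner(gr) // default limit: lines up to 64 KB
	for sc.Scan() {
		n++
	}
	return n, sc.Err()
}

// sampleLog builds a small gzipped log in memory to stand in for a file.
func sampleLog() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	lines := "{\"level\":\"info\"}\n{\"level\":\"warn\"}\n{\"level\":\"error\"}\n"
	if _, err := io.WriteString(gw, lines); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := sampleLog()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	n, err := countLines(buf)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("records:", n)
}
```

Only one line is held in memory at a time, so the same function works on a log far larger than available RAM.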

Writing a gzipped stream

Compression works the same way in reverse. You create a destination file, wrap it in a gzip writer, and push data through it.

package main

import (
	"compress/gzip"
	"fmt"
	"os"
)

func main() {
	// Create a new file for the compressed output
	out, err := os.Create("output.gz")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer out.Close()

	// Wrap the file in a compression stream
	gw := gzip.NewWriter(out)
	defer gw.Close()

	// Write raw bytes through the compressor
	if _, err := gw.Write([]byte("Hello, compressed world!")); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}

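gzip.NewWriter returns a *gzip.Writer, which implements io.Writer. When you call gw.Write, the data goes into an internal compression buffer. It does not hit the disk immediately. The gw.Close() call is where the actual work finishes. Close() flushes the remaining compressed bytes and writes the gzip trailer. It does not close the underlying file, which is why out.Close() is handled separately; deferred calls run in reverse order, so a deferred gw.Close() runs before a deferred out.Close(). The trailer stores a CRC-32 checksum and the uncompressed size modulo 2^32. If you forget to close the writer, the stream is truncated and standard gzip tools will reject it. A deferred Close also discards its error, so code that must not lose data should call Close explicitly and check the result.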

Go convention dictates that you handle errors explicitly. The if err != nil { return err } pattern is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible. It becomes much harder to accidentally swallow a disk-full error or a permission denial.

Writers buffer by default. Close them to finalize the archive.

The buffering trap

The gzip writer does not write to disk on every Write call. It accumulates data internally so the DEFLATE compressor can find repeated patterns across a sliding window of up to 32 kilobytes. This behavior improves compression and reduces system call overhead. It also creates a timing trap.

If your program crashes or exits before Close() runs, the buffered data and the trailer never reach the file. Deferred calls do not run after os.Exit or log.Fatal, so a defer is no protection there. The file on disk contains a truncated stream. You will typically see unexpected EOF when you try to decompress it later. The writing program will not panic. The corruption is silent until you read the file.
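A short experiment makes the failure visible. This sketch compresses the same payload twice, once with Close and once without, and then tries to read each result back. The function names are made up for the demonstration, and skipping Close is deliberate.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// compress gzips data into memory. When closeWriter is false it skips
// Close on purpose to imitate a program that exits too early.
func compress(data []byte, closeWriter bool) ([]byte, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if _, err := gw.Write(data); err != nil {
		return nil, err
	}
	if closeWriter {
		if err := gw.Close(); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// readBack tries to decompress a gzip stream and returns any error.
func readBack(compressed []byte) error {
	gr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return err
	}
	defer gr.Close()
	_, err = io.ReadAll(gr)
	return err
}

// isTruncated reports whether reading the stream fails with an unexpected EOF.
func isTruncated(compressed []byte) bool {
	return errors.Is(readBack(compressed), io.ErrUnexpectedEOF)
}

func main() {
	for _, closeWriter := range []bool{true, false} {
		out, err := compress([]byte("payload"), closeWriter)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("closed=%v truncated=%v\n", closeWriter, isTruncated(out))
	}
}
```

The write side reports no error in either case, which is exactly why the problem tends to surface only when someone reads the file.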

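You can force a flush with gw.Flush(). This compresses whatever is pending and pushes it to the underlying writer without terminating the stream. Use it when you need the data handed off before a long-running operation or a potential crash. For a file, Flush only passes the bytes to the operating system; call the file's Sync method as well if they must survive a power loss. Do not call Flush() on every small write. Each flush ends the current compressed block, which hurts the compression ratio and slows down I/O.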

Buffering trades memory for speed. Flush deliberately, close absolutely.
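To see the buffering in action, the sketch below measures how many bytes have reached the underlying buffer after Write, after Flush, and after Close. The helper flushSizes is invented for this example, and no exact sizes are claimed because they depend on the payload and compression level.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// flushSizes reports how many bytes have reached the underlying buffer
// after Write, after Flush, and after Close.
func flushSizes(data []byte) (afterWrite, afterFlush, afterClose int, err error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)

	if _, err = gw.Write(data); err != nil {
		return
	}
	afterWrite = buf.Len() // small payloads are still held by the compressor

	if err = gw.Flush(); err != nil {
		return
	}
	afterFlush = buf.Len() // pending data has been compressed and emitted

	if err = gw.Close(); err != nil {
		return
	}
	afterClose = buf.Len() // final block and trailer appended
	return
}

func main() {
	w, f, c, err := flushSizes([]byte("hello, flush"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("after write: %d, after flush: %d, after close: %d\n", w, f, c)
}
```

The numbers grow at each step: Flush makes the data readable so far, and only Close produces a complete gzip stream.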

Streaming in production

Loading an entire file into memory with io.ReadAll works for small payloads. Production systems usually stream data to keep memory usage flat. The idiomatic pattern uses io.Copy to pipe data between readers and writers without manual buffer management.

package main

import (
	"compress/gzip"
	"io"
	"os"
)

func compressFile(srcPath, dstPath string) error {
	// Open the source file for reading
	src, err := os.Open(srcPath)
	if err != nil {
		return err
	}
	defer src.Close()

	// Create the destination file for writing
	dst, err := os.Create(dstPath)
	if err != nil {
		return err
	}
	defer dst.Close()

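	// Wrap the destination in a gzip compressor
	gw := gzip.NewWriter(dst)

	// Stream data from source to compressed destination
	if _, err := io.Copy(gw, src); err != nil {
		gw.Close() // best effort; the copy error is the one to report
		return err
	}

	// Close writes the final block and trailer, so its error must be returned
	return gw.Close()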
}

io.Copy reads from src in chunks, compresses each chunk through gw, and writes it to dst. It uses a fixed-size buffer internally (32 KB in the current implementation), and the compressor keeps a bounded amount of state, so memory usage stays flat whether the source file is ten megabytes or ten gigabytes. The function returns the first error it encounters, which matches Go's convention of surfacing failures immediately. You can call this function from a background worker or an HTTP handler, and the streaming behavior protects your application from out-of-memory crashes.

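HTTP handlers benefit from this pattern. You can wrap http.ResponseWriter in a gzip.Writer and stream JSON directly to the client. Browsers decompress it automatically, but other clients may not, so compress only when the request's Accept-Encoding header lists gzip. You save bandwidth without changing your serialization logic. Just remember to set the Content-Encoding: gzip header before writing the body, and to close the gzip writer before the handler returns.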
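A handler along these lines might look like the following sketch. The statusHandler name, the /status path, and the payload are hypothetical, and the Accept-Encoding check is deliberately simplified: it ignores q-values. The call helper exercises the handler in-process with net/http/httptest rather than starting a server.

```go
package main

import (
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// statusHandler streams a small JSON payload, gzipped only when the
// client advertises support for it.
func statusHandler(w http.ResponseWriter, r *http.Request) {
	payload := map[string]string{"status": "ok"}
	w.Header().Set("Content-Type", "application/json")
	w.Header().Add("Vary", "Accept-Encoding")

	// Simplified check: a full implementation would also honor q-values.
	if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
		if err := json.NewEncoder(w).Encode(payload); err != nil {
			log.Printf("encode: %v", err)
		}
		return
	}

	// Headers must be set before the first byte of the body is written.
	w.Header().Set("Content-Encoding", "gzip")
	gw := gzip.NewWriter(w)
	if err := json.NewEncoder(gw).Encode(payload); err != nil {
		log.Printf("encode: %v", err)
	}
	// Close writes the gzip trailer; it does not close the ResponseWriter.
	if err := gw.Close(); err != nil {
		log.Printf("gzip close: %v", err)
	}
}

// call runs the handler in-process and returns the Content-Encoding
// header together with the decoded body.
func call(acceptGzip bool) (encoding, body string, err error) {
	req := httptest.NewRequest(http.MethodGet, "/status", nil)
	if acceptGzip {
		req.Header.Set("Accept-Encoding", "gzip")
	}
	rec := httptest.NewRecorder()
	statusHandler(rec, req)

	res := rec.Result()
	defer res.Body.Close()
	encoding = res.Header.Get("Content-Encoding")

	var r io.Reader = res.Body
	if encoding == "gzip" {
		gr, gerr := gzip.NewReader(res.Body)
		if gerr != nil {
			return encoding, "", gerr
		}
		defer gr.Close()
		r = gr
	}
	data, err := io.ReadAll(r)
	return encoding, string(data), err
}

func main() {
	for _, accept := range []bool{true, false} {
		enc, body, err := call(accept)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("accept gzip=%v encoding=%q body=%s", accept, enc, body)
	}
}
```

The Vary header tells caches that the response depends on Accept-Encoding, so a compressed copy is not served to a client that cannot decode it.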

Stream everything. Memory is finite, disks are cheap.

Common pitfalls and compiler complaints

Compression introduces a few subtle traps. The most common one is forgetting the writer's Close call. Developers often rely on defer out.Close() to clean up the file handle, but that only closes the raw file. The gzip trailer never gets written. The runtime will not panic. The file just ends up truncated. You will typically see unexpected EOF when you try to decompress it later.

Another trap is compressing tiny files. Gzip adds a fixed header and trailer, at least 18 bytes in total, on top of the DEFLATE block framing. If your payload is very short, on the order of tens of bytes, the compressed output is often larger than the original. The algorithm needs repetitive patterns to find savings. Random data or very short strings will expand slightly. Check your payload size before wrapping it.

Error handling follows the standard Go pattern. Every os.Open, gzip.NewReader, and io.Copy call can fail, and so can Write and Close on a gzip writer. The compiler only reports declared and not used for a variable that is never read at all, so it will not catch an err you reassign and forget to check. Discarding errors in production code is a debugging nightmare. Check the error, return it, or log it.

Receiver naming follows a well-established convention. If you build a custom wrapper type, name the receiver with one or two letters matching the type. Use (g *GzipLogger) Write(...) instead of (this *GzipLogger) or (self *GzipLogger). The community expects short receiver names. Names like this and self tend to signal that the code was written by someone coming from another language.
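As an illustration of the naming convention, here is one way a GzipLogger wrapper might be laid out. The struct field, the NewGzipLogger constructor, and the logAndRead helper are assumptions made for this sketch, not a prescribed design.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// GzipLogger is a hypothetical wrapper that compresses everything written to it.
type GzipLogger struct {
	gw *gzip.Writer
}

// NewGzipLogger wraps w in a gzip stream.
func NewGzipLogger(w io.Writer) *GzipLogger {
	return &GzipLogger{gw: gzip.NewWriter(w)}
}

// Write uses the short receiver name g, matching the first letter of the type.
func (g *GzipLogger) Write(p []byte) (int, error) {
	return g.gw.Write(p)
}

// Close finishes the gzip stream. It does not close the wrapped writer.
func (g *GzipLogger) Close() error {
	return g.gw.Close()
}

// logAndRead writes msg through a GzipLogger and decompresses the result.
func logAndRead(msg string) (string, error) {
	var buf bytes.Buffer
	g := NewGzipLogger(&buf)
	if _, err := io.WriteString(g, msg); err != nil {
		return "", err
	}
	if err := g.Close(); err != nil {
		return "", err
	}

	gr, err := gzip.NewReader(&buf)
	if err != nil {
		return "", err
	}
	defer gr.Close()
	out, err := io.ReadAll(gr)
	return string(out), err
}

func main() {
	out, err := logAndRead("request handled\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because GzipLogger has Write and Close methods, it satisfies io.WriteCloser and can be handed to anything that accepts an io.Writer.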

Trust the standard library. It handles edge cases you have not thought of.

When to reach for gzip

Use compress/gzip when you need universal compatibility and moderate compression ratios for text, JSON, or log files. Use plain uncompressed files when the data is already small or when CPU cycles are more expensive than disk space. Use archive/zip when you need to bundle multiple files into a single container with directory structures. Use a Zstandard implementation when you are building high-throughput pipelines and need faster compression with better ratios than gzip; at the time of writing that means a third-party package such as github.com/klauspost/compress/zstd, because the standard library has no public zstd package. Use compress/bzip2 only when you must read data from legacy systems that explicitly require it, and note that the standard package only decompresses. Use io.Pipe when you need to connect a reader and writer across goroutines without intermediate files.
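The io.Pipe case deserves a sketch, since it is the least obvious of these options. The hypothetical gzipStream function below turns any io.Reader into a reader of its gzip-compressed form by running the compressor in a goroutine; roundTrip exists only to demonstrate it.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// gzipStream returns a reader that yields the gzip-compressed form of src.
// The caller should Close the result so the goroutine can exit if reading
// stops early.
func gzipStream(src io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		gw := gzip.NewWriter(pw)
		_, err := io.Copy(gw, src)
		if cerr := gw.Close(); err == nil {
			err = cerr
		}
		// A nil error makes the read side see a normal io.EOF.
		pw.CloseWithError(err)
	}()
	return pr
}

// roundTrip compresses s through the pipe and decompresses it again.
func roundTrip(s string) (string, error) {
	compressed := gzipStream(strings.NewReader(s))
	defer compressed.Close()

	gr, err := gzip.NewReader(compressed)
	if err != nil {
		return "", err
	}
	defer gr.Close()
	out, err := io.ReadAll(gr)
	return string(out), err
}

func main() {
	out, err := roundTrip("piped through a goroutine")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

This shape is handy when an API wants an io.Reader for a request body, for example, but your data only becomes compressed by writing it through a gzip.Writer.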

Pick the tool that matches your bottleneck. Compression is a trade-off, not a free lunch.

Where to go next

Reading and writing gzipped files in Go comes down to wrapping a stream: a gzip.Writer shrinks data on the way out, and a gzip.Reader expands it on the way back in. Think of it like a digital vacuum sealer: you pack data tightly to save room, then unpack it when you need to use it again. The standard library's built-in tools handle the compression math; your job is to close what you open and check every error.