How to Hash Data with MD5 in Go (And Why You Shouldn't)

The checksum trap

You need to verify a downloaded file hasn't been corrupted. You remember MD5 from the days of torrenting or old checksum tools. It's fast, it's everywhere, and Go has a package for it. You write the code, run it, and get a hex string. It works.

The moment you use that hash for anything involving trust, you've introduced a silent vulnerability into your system. MD5 is broken. Not "slow" broken, not "needs more salt" broken. Cryptographically shattered. Using MD5 for security is like locking your house with a piece of tape. The tape holds against the wind, but anyone can peel it off without breaking the door.

What MD5 actually does

A hash function takes input of any size and produces a fixed-size output. It's a one-way street. You can't reverse a hash to get the original data. MD5 produces a 128-bit hash, usually displayed as 32 hexadecimal characters.

MD5 was designed in 1991, when computers were slow. Today, a single GPU can compute billions of MD5 hashes per second. Worse, the math behind MD5 has flaws that let attackers craft two different inputs that produce the same hash. This is called a collision. Because the output is only 128 bits, collisions must exist for any such function; the promise of a cryptographic hash is that nobody can find one in practice. For MD5, collisions can be generated in seconds on ordinary hardware. That breaks the property verification depends on: collision resistance.

MD5 is still useful for one thing: detecting accidental corruption in trusted data. If you download a file from a source you already trust and want to check if the network dropped a bit, MD5 is fine. The moment an attacker can influence the data or the hash, MD5 fails.

Minimal example

Here's the code to compute an MD5 hash. It's straightforward because the API is simple.

package main

import (
	"crypto/md5"
	"fmt"
)

func main() {
	// Input data to hash. Deliberately not a secret: MD5 must not protect passwords.
	data := []byte("hello world")
	// md5.Sum returns a 16-byte array.
	hash := md5.Sum(data)
	// Format as hex for readability.
	fmt.Printf("%x\n", hash)
}

md5.Sum takes a byte slice and returns a [16]byte. This is a fixed-size array, not a slice, and the compiler enforces the distinction. If you try to assign the result to a []byte variable directly, the program fails to compile with a type mismatch along the lines of cannot use md5.Sum(data) (value of type [16]byte) as []byte value in assignment. When you need a slice, store the result in a variable first and slice that: hash[:]. Slicing the call expression directly, as in md5.Sum(data)[:], does not compile because the returned array is not addressable.

The %x verb in fmt.Printf formats the byte array as lowercase hexadecimal. This is the standard way to display hashes in logs or UIs.

MD5 is a checksum, not a seal of trust.

The hash interface

Go's cryptographic hash packages share a common interface defined in the standard library's hash package. The hash.Hash interface looks like this:

type Hash interface {
	io.Writer
	Sum([]byte) []byte
	Size() int
	BlockSize() int
	Reset()
}

The interface embeds io.Writer. This is a deliberate design choice. It means any hash function can accept data written to it in chunks. You don't need to load the entire input into memory. You can stream data into the hash.

md5.New() returns a hash.Hash. Because hash.Hash embeds io.Writer, you can pass the hasher to any function that expects a writer. This connects hashing to the rest of Go's I/O ecosystem. io.Copy can stream a reader into the hash. io.TeeReader can feed the hash as a side effect while the same bytes flow on to a file or another consumer. The interface is the gift.

Realistic file hashing

Hashing a large file requires streaming. You can't load a 10GB file into memory just to hash it. Here's how to hash a file efficiently.

package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

// HashFile computes the MD5 hash of the file at path.
func HashFile(path string) ([]byte, error) {
	// Open the file for reading.
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	// Ensure the file is closed when the function returns.
	defer f.Close()

	// Create a new MD5 hasher.
	h := md5.New()
	// Copy the file content into the hasher.
	// io.Copy reads in chunks, avoiding loading the whole file into memory.
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	// Sum(nil) returns the digest as a new []byte, so HashFile returns a slice.
	return h.Sum(nil), nil
}

func main() {
	hash, err := HashFile("data.bin")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("MD5: %x\n", hash)
}

md5.New() creates a hasher. io.Copy reads from the file and writes to the hasher in chunks (32KB with its default buffer), which keeps memory usage constant regardless of file size. h.Sum(nil) computes the final hash. Unlike md5.Sum, which returns a [16]byte array, h.Sum returns a []byte. Passing nil tells Sum to return a new slice containing the hash. If you pass a non-nil slice, Sum appends the hash to it.

The defer f.Close() call ensures the file handle is released even if an error occurs. This is standard Go convention. The if err != nil check is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible.

Why MD5 is dangerous

MD5 has two fatal flaws for security use cases.

First, collisions are easy. An attacker can generate two different files with the same MD5 hash, get the harmless one reviewed or signed, and later substitute the malicious one. Your check still passes because the hash matches. This breaks digital signatures, certificate verification, and any system that relies on collision resistance.

Second, MD5 is vulnerable to length extension attacks. If you compute a hash of secret + message and send the hash to a client, the client can append extra data to the message and compute a valid hash for the extended message without knowing the secret. This works because an MD5 digest is the function's entire internal state after processing the input, so anyone who has the digest and knows the total input length can resume hashing from that point. The attacker doesn't need the secret. They need the hash, the message, and the length of the secret, which can be found by trial. This rules out the naive hash(secret + message) construction for API token signing and message authentication.

SHA-256 is also vulnerable to length extension, which is why you authenticate messages with HMAC rather than a bare hash. MD5 is doubly vulnerable because collisions are easy too. Never use MD5 for passwords, signatures, tokens, or any data where an attacker might try to forge or tamper with the input.

Performance and alternatives

MD5 is faster than SHA-256 in a pure software implementation. On modern CPUs with dedicated SHA instructions, such as Intel's SHA extensions and the ARMv8 cryptography extensions, the gap narrows significantly and may even reverse. For typical small inputs, hashing takes microseconds while network latency takes milliseconds. Choosing MD5 for speed is premature optimization that buys you zero security.

SHA-256 is the new baseline. It produces a 256-bit hash. Collisions are computationally infeasible. The birthday bound is 2^128, which is far beyond current capabilities. SHA-256 is the recommended replacement for MD5 in almost every context.

For passwords, neither MD5 nor SHA-256 is appropriate. Password hashing requires a deliberately slow function that resists brute-force attacks, ideally a memory-hard one. Use bcrypt or argon2. Both are designed to be slow, though only argon2 is memory-hard. Both use a per-password salt: Go's bcrypt package generates one for you, while the argon2 package expects you to supply one. They make it expensive to test millions of passwords per second.

Speed is irrelevant if the hash can be forged.

Decision matrix

Use MD5 when you need a fast, non-cryptographic checksum for detecting accidental corruption in trusted data.

Use SHA-256 when you need to verify integrity against malicious tampering or store data for long-term archival.

Use bcrypt or argon2 when hashing passwords, because they are designed to be slow and resistant to brute-force attacks.

Use HMAC when you need to verify both integrity and authenticity using a secret key.

Use CRC32 when you need maximum speed for error detection in network protocols or file formats where cryptographic security is irrelevant.

Where to go next

MD5 is an old method for creating a compact fingerprint of data, but attackers can forge matching fingerprints with little effort. Use it only to check whether trusted data was accidentally corrupted, for example during a download, and never to protect passwords or secrets. Think of it like a cheap padlock: it keeps honest people honest, but a thief can pick it instantly.