The messy data dump
You receive a CSV file from a client. It contains user records, but the formatting is inconsistent. Some fields have extra spaces. Some quotes are unescaped. The file is 200 megabytes, which is too large to load entirely into memory. You need to read the file, validate the structure, filter out invalid rows, and write the clean data to a new file. You also want to write a unit test that verifies your logic without touching the disk.
Go's standard library provides encoding/csv for this exact workflow. The package treats CSV data as a stream of records. It does not load the entire file at once unless you ask it to. It works with any source or destination that implements the io.Reader or io.Writer interface. This means you can read from a file, a network response, or a string in memory using the same code.
CSV as a stream
A CSV file is just text with delimiters. The encoding/csv package parses this text row by row. The core types are csv.Reader and csv.Writer. You create a reader by passing an io.Reader to csv.NewReader. You create a writer by passing an io.Writer to csv.NewWriter.
The io.Reader interface is a contract. Any type that implements the Read(p []byte) (n int, err error) method satisfies the interface. An *os.File implements it. A *strings.Reader implements it. The Body field of an *http.Response implements it. The CSV reader does not care where the bytes come from. It only cares that it can request bytes.
This design decouples your CSV logic from I/O details. You can write a function that processes CSV data and test it by passing a string. You can deploy the same function to process a file or a network stream. The code stays the same.
Minimal example: reading all records
The simplest way to read a CSV file is to load every row into memory at once. This works for small files where you need random access to the data.
Here's the basic pattern: open the file, create a reader, call ReadAll, and handle the result.
package main
import (
	"encoding/csv"
	"fmt"
	"os"
)
func main() {
	// Open the file for reading.
	f, err := os.Open("data.csv")
	if err != nil {
		// Handle open error immediately.
		fmt.Println("open error:", err)
		return
	}
	// Ensure file is closed when main exits.
	defer f.Close()
	// Create a reader bound to the file.
	r := csv.NewReader(f)
	// Read all records into a slice of slices.
	records, err := r.ReadAll()
	if err != nil {
		// Handle parse error immediately.
		fmt.Println("read error:", err)
		return
	}
	// Access the first record.
	if len(records) > 0 {
		fmt.Println("First row:", records[0])
	}
}
ReadAll returns a [][]string. The outer slice holds the rows. The inner slice holds the fields for each row. If the file has three rows with four columns each, you get a slice of three slices, each containing four strings.
This approach is convenient but dangerous for large files. ReadAll allocates memory for every row. A 500MB CSV file can easily consume gigabytes of RAM because Go strings carry overhead and the slice structure adds pointers. If you process large files, you must read row by row.
Configuring the reader
The default CSV reader is strict. It expects well-formed data. Real-world CSV files are rarely well-formed. The csv.Reader type exposes fields that control parsing behavior. You adjust these fields before calling Read or ReadAll.
Here's how you configure a reader for messy data:
r := csv.NewReader(f)
// Expect exactly 3 fields per row. Returns error if row has 2 or 4.
r.FieldsPerRecord = 3
// Allow quotes without escaping. Useful for messy exports.
r.LazyQuotes = true
// Trim spaces after the delimiter. "a, b" becomes "a", "b".
r.TrimLeadingSpace = true
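// Change delimiter to tab for TSV files. Be careful when combining this
// with TrimLeadingSpace: tabs count as leading space, so empty fields vanish.
r.Comma = '\t'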
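FieldsPerRecord is the most important setting for validation. Its default is 0, which makes the reader take the field count of the first record and require every later record to match it. If you set it to a positive integer, the reader returns an error for any row, including the first, that does not match the count. If you set it to a negative value, the check is disabled and rows may have any number of fields. A mismatch is reported as csv.ErrFieldCount, which catches structural errors early.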
LazyQuotes changes how quotes are handled. By default, a quote may appear only inside a quoted field, where it must be escaped by doubling it (""). With LazyQuotes set to true, the reader tolerates a quote in an unquoted field and a non-doubled quote inside a quoted field. This is useful for data exported from legacy systems that do not follow RFC 4180 strictly.
TrimLeadingSpace removes whitespace immediately after the delimiter. This prevents leading spaces from appearing in field values.
Realistic example: streaming and filtering
Processing large files requires a streaming approach. You read one row, process it, and write it to the output. This keeps memory usage constant regardless of file size.
Here's a function that reads a CSV file, filters rows based on a condition, and writes the result to a new file.
// filterCSV needs the encoding/csv, io, and os packages.
func filterCSV(inputPath, outputPath string) error {
	// Open input file for reading.
	in, err := os.Open(inputPath)
	if err != nil {
		return err
	}
	defer in.Close()
	// Open output file for writing.
	out, err := os.Create(outputPath)
	if err != nil {
		return err
	}
	defer out.Close()
	// Create reader and writer.
	reader := csv.NewReader(in)
	writer := csv.NewWriter(out)
	// Read header row.
	header, err := reader.Read()
	if err != nil {
		return err
	}
	// Write header to output.
	if err := writer.Write(header); err != nil {
		return err
	}
	// Loop over remaining rows.
	for {
		record, err := reader.Read()
		// io.EOF signals end of file.
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		// Filter logic: keep rows where the first field is not empty.
		if len(record) > 0 && record[0] != "" {
			if err := writer.Write(record); err != nil {
				return err
			}
		}
	}
	// Flush buffered rows before the deferred Close runs.
	// Flush returns nothing, so ask the writer whether any
	// Write or Flush failed.
	writer.Flush()
	return writer.Error()
}
The loop calls reader.Read() repeatedly. Each call returns the next row and an error. When the file ends, the error is io.EOF. You check for this specific error to break the loop. Any other error indicates a parse failure or I/O issue.
The writer buffers output to reduce system calls. writer.Write adds the record to the buffer. The buffer is written to disk when it fills up or when you call Flush. If you exit the program without flushing, you lose the buffered data. Always flush the writer before closing the output file. Flush itself returns nothing, so call writer.Error() afterward to learn whether any write or flush failed.
Writing CSV data
Writing CSV is straightforward. You create a writer, call Write for each row, and flush when done. The Write method takes a []string representing the fields.
The writer automatically handles quoting. If a field contains the delimiter, a quote, or a newline, the writer wraps the field in quotes and escapes internal quotes by doubling them. This behavior is built in and cannot be switched off. The only settings csv.Writer exposes are Comma and UseCRLF.
w := csv.NewWriter(out)
// Optional: end each row with \r\n instead of \n.
w.UseCRLF = true
// Write a row with a comma inside a field.
// Output: name,age,"Smith, John"
w.Write([]string{"name", "age", "Smith, John"})
// Push the buffered row to out.
w.Flush()
Because quoting is automatic, fields from an untrusted source that contain delimiters, quotes, or newlines still produce valid CSV. Set Comma to change the delimiter, and set UseCRLF when the consumer expects Windows-style line endings.
Pitfalls and errors
CSV processing has a few common traps.
Memory exhaustion with ReadAll
Calling ReadAll on a large file allocates memory for the entire dataset. If the file is larger than available RAM, the program crashes with an out-of-memory error. Use Read in a loop for large files.
Forgetting to flush
The writer buffers output. If you close the file without calling Flush, the buffered data is lost. The output file will be empty or truncated. Always flush the writer.
Ignoring io.EOF
The Read method returns io.EOF when there are no more rows. If you treat io.EOF as a generic error, your loop will exit prematurely or return an error incorrectly. Check for io.EOF explicitly.
Bad CSV format
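If the CSV file has malformed quotes or inconsistent field counts, the reader returns an error of type *csv.ParseError. It carries the line number (Line), the column as a 1-based byte index within that line (Column), and the underlying cause (Err), such as csv.ErrFieldCount or csv.ErrQuote. You can use this information to report precise errors to the user. Errors from the underlying io.Reader are not parse errors and are returned unwrapped.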
Compiler errors
If you forget to import the package, the compiler rejects the program with undefined: csv. If you pass a string to csv.NewReader, the compiler complains with cannot use "..." (untyped string constant) as io.Reader value in argument. You must wrap the string in strings.NewReader to satisfy the interface.
Decision matrix
Choose the right approach based on file size and access patterns.
Use ReadAll when the file is small and you need random access to rows. This loads everything into memory, which is convenient for sorting or indexing.
Use a Read loop when the file is large or you want constant memory usage. This streams rows one by one, which scales to gigabytes of data.
Use Write when you generate rows incrementally. This buffers output and writes to disk efficiently.
Use WriteAll when you have a slice of records ready in memory. This calls Write for each record and then flushes, so you do not need to call Flush yourself, and it returns the first error it encounters.
Use strings.NewReader when testing CSV logic without files. This allows you to pass CSV data as a string literal in unit tests.
Use FieldsPerRecord when the schema is fixed and you want to validate structure. This catches rows with missing or extra fields immediately.
Use LazyQuotes when processing data from legacy systems. This tolerates unescaped quotes that would otherwise cause parse errors.
Where to go next
CSV is just one format for structured data. Go provides tools for other formats as well.
- How to Marshal/Unmarshal Enums in JSON in Go
- How to Handle Dynamic or Unknown JSON in Go
- How to Use MessagePack in Go
CSV is text. Treat it like a stream. Flush or lose your data.