The wall you hit when the OS says no
Your Go server handles thousands of requests without breaking a sweat. Then the load spikes. New connections start failing. The logs fill with too many open files. The program doesn't crash immediately, but it stops working. You didn't write a logic bug. You hit a resource wall built by the operating system.
This error means your process has exhausted the file descriptors available to it. A file descriptor is an integer handle the kernel uses to track an open resource. Files, network sockets, pipes, and the runtime's own polling handles all consume file descriptors. The OS caps how many a single process can hold. The default soft limit on many Linux distributions is 1024. That number sounds high until you realize a busy HTTP server can burn through it in seconds under load.
Go does not enforce this limit. The kernel does. The Go runtime asks the kernel for a file descriptor when you open a file or dial a connection. The kernel checks the process's open descriptors against the limit. If there is room, the kernel hands you a descriptor. If there is not, the kernel rejects the request. Go receives the rejection and returns an error. Since Go 1.19, programs that import os raise their own soft limit to the hard limit at startup, but the hard limit still applies and a leak will reach it eventually. The fix requires understanding what consumes descriptors, how to spot a leak, and how to raise the limit safely.
What a file descriptor actually is
Think of file descriptors like checkout cards at a library. The library has a finite number of cards. When a patron checks out a book, the librarian gives them a card and marks the book as unavailable. When the patron returns the book, the librarian takes back the card and marks the book as available. If every card is out, the librarian cannot check out any more books. The library doesn't run out of books. It runs out of tracking tokens.
The operating system works the same way. Every open resource gets a descriptor. Standard input, output, and error take descriptors 0, 1, and 2. When you call os.Open, the kernel allocates the next available integer, usually 3, then 4, and so on. When you call Close, the kernel releases that integer back to the pool. If you open resources without closing them, the pool empties. The next open attempt fails with too many open files.
Network connections are the most common culprit in Go applications. A TCP socket is a file descriptor. An HTTP request travels over a TCP connection, so every in-flight request holds one. A database connection is a file descriptor. Go's netpoller, which handles asynchronous I/O, also holds a few descriptors of its own to monitor sockets. Your code needs descriptors for every active connection. The sum of both cannot exceed the limit.
A minimal leak
The simplest way to trigger the error is to open files in a loop without closing them. The garbage collector eventually reclaims the memory for the *os.File objects, but it does not guarantee timely closure of the underlying kernel resources. Relying on the GC to close files is a recipe for exhaustion.
package main
import (
	"fmt"
	"log"
	"os"
)
// leakFiles opens files without closing them.
// This demonstrates how quickly the descriptor limit is reached.
func leakFiles() {
	for i := 0; i < 2000; i++ {
		// WHY: os.Create allocates a new file descriptor from the kernel.
		// Each iteration consumes one descriptor.
		f, err := os.Create(fmt.Sprintf("temp_%d.txt", i))
		if err != nil {
			// WHY: The error message comes from the OS, wrapped by Go.
			// It indicates the process hit the open file limit.
			log.Fatalf("failed to create file %d: %v", i, err)
		}
		// WHY: Go rejects unused variables, so discard f explicitly.
		_ = f
		// WHY: Missing f.Close() means the descriptor is never returned.
		// The file stays open until the process exits or the GC finalizes the object.
		// The GC is not a reliable mechanism for closing resources.
	}
}
func main() {
	leakFiles()
}
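Run this program with the open-file limit at 1024. On Go 1.19 and later the hard limit must be 1024 too, because the runtime raises the soft limit to the hard limit at startup. The loop succeeds for roughly the first thousand iterations. Standard input, output, and error, plus a few descriptors the runtime holds, take up the rest. Then os.Create fails. The program logs a line such as failed to create file 1021: open temp_1021.txt: too many open files and exits. The error is not a Go panic. It is a standard error value returned by the os package. The kernel refused the syscall.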
What happens at runtime
When you call os.Open or net.Dial, the Go runtime translates the request into a system call. On Linux, this is usually open, openat, or socket. The kernel checks the process's resource limits, stored in the rlimit structure. The soft limit is the active cap. The hard limit is the maximum the soft limit can be raised to without root privileges.
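To see both values for the running process, you can ask the kernel directly. This is a minimal sketch assuming a Unix-like system; nofileLimits is a hypothetical helper name, and the conversions to uint64 are there because the field types differ between platforms.

```go
package main

import (
	"fmt"
	"log"
	"syscall"
)

// nofileLimits (hypothetical helper) returns the soft and hard
// RLIMIT_NOFILE values that apply to this process.
func nofileLimits() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := nofileLimits()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("soft limit: %d\nhard limit: %d\n", soft, hard)
}
```

Logging these two numbers at startup is a cheap way to confirm that the limit you configured is the limit the process actually received.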
If the process still has room under the soft limit, the kernel returns a new descriptor. If it does not, the kernel returns an error code: EMFILE when the process limit is reached, or ENFILE when the system-wide limit is reached. Go wraps this error code in a *os.PathError or *net.OpError. The message ends in too many open files for EMFILE and too many open files in system for ENFILE.
The error propagates up the call stack. If you ignore the error, the program continues, but the operation failed. A file wasn't opened. A connection wasn't established. If you are in a server loop, the next request might fail. If you are in a critical path, the request returns a 500 error. The server stays alive, but it is degraded. The worst case is a cascade failure where the server cannot accept new connections at all because the listen socket backlog fills up and the kernel drops packets.
Realistic scenario: connection leaks
File leaks are obvious. Connection leaks are subtle. In a web service, you might fetch data from an upstream API. If you forget to close the response body, the underlying TCP connection can remain open. The connection might stay in ESTABLISHED state, or sit in CLOSE_WAIT after the upstream hangs up. Each lingering connection holds a file descriptor. Under high concurrency, these add up fast.
package main
import (
	"io"
	"log"
	"net/http"
)
// fetchUpstream makes a request without closing the body.
// This can leak the connection along with its file descriptor.
func fetchUpstream(url string) ([]byte, error) {
	// WHY: http.Get creates a new request and executes it.
	// It uses a TCP socket, which is a file descriptor.
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	// WHY: The caller owns resp.Body and must close it.
	// Nothing in this function does, on any return path.
	// WHY: Reading the body is fine, but it is not a substitute for Close.
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		// WHY: A failed read leaves the body undrained and unclosed,
		// so the socket and its descriptor stay held.
		return nil, err
	}
	// WHY: Missing resp.Body.Close().
	// Per the net/http docs, a body that is not both read to EOF and closed
	// may prevent the transport from reusing the connection.
	return data, nil
}
// handleRequest is an HTTP handler that calls the leaky function.
func handleRequest(w http.ResponseWriter, r *http.Request) {
	data, err := fetchUpstream("http://upstream.example.com/data")
	if err != nil {
		http.Error(w, "upstream error", http.StatusBadGateway)
		return
	}
	w.Write(data)
}
func main() {
	http.HandleFunc("/fetch", handleRequest)
	// WHY: The server listens on a port, consuming a socket descriptor.
	// Each accepted client connection also consumes a descriptor.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
In this example, no request to /fetch ever closes its response body, so connections can linger instead of being reused. If the server handles 1000 requests per second, and the connections linger for even a few seconds, the descriptor count spikes. The server hits the limit. New requests fail with too many open files. The fix is to close the body. The convention in Go is to use defer immediately after checking the error.
resp, err := http.Get(url)
if err != nil {
	return nil, err
}
defer resp.Body.Close()
// WHY: defer ensures the body is closed when the function returns,
// regardless of whether the read succeeds or fails.
// This releases the connection back to the pool or closes the socket.
Finding the leak
When the error appears, you need to know what is holding the descriptors. The Go program is the process, but the OS tracks the resources. Use lsof to list open files for the process.
lsof -p <pid> | wc -l
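This command gives a rough count of what the process with the given PID holds open. The output includes a header line and entries that are not descriptors, such as the working directory and memory-mapped libraries, so treat the number as an upper bound. On Linux, ls /proc/<pid>/fd | wc -l counts only descriptors. If the number is near the limit, the process is close to exhaustion. Pipe the lsof output to grep to see what type of resources are open.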
lsof -p <pid> | grep -E "REG|IPv4|IPv6"
REG entries are regular files. IPv4 and IPv6 entries are network sockets. If you see thousands of IPv4 entries in CLOSE_WAIT or ESTABLISHED state, you have a connection leak. Sockets in TIME_WAIT no longer belong to the process and do not hold a descriptor, so they will not appear here. If you see REG entries pointing to temp files or logs, you have a file leak.
For network issues, ss is faster than netstat. Use ss -s to see a summary of socket usage.
ss -s
The output shows the total number of sockets and a per-protocol breakdown, with TCP connections split into states such as established and timewait. To look at one state in detail, filter on it: ss -tan state close-wait lists every TCP socket in CLOSE_WAIT. A high number of CLOSE_WAIT connections indicates the remote side closed the connection, but your Go program did not close its end. This is a classic leak pattern.
Raising the limit
Fixing the code is the permanent solution. Raising the limit is a bandage that buys time. Sometimes the bandage is necessary. High-throughput servers legitimately need more than 1024 descriptors. A reverse proxy handling 50,000 concurrent connections needs at least 50,000 descriptors for the client side alone, plus more for its upstream connections.
The limit is controlled by the OS. An unprivileged Go program cannot raise the hard limit. It can raise the soft limit up to the hard limit using syscall.Setrlimit, and since Go 1.19 the runtime does this automatically at startup for programs that import os. When the hard limit itself is too low, the standard approach is to configure it at the shell or system level.
For a shell session, use ulimit.
ulimit -n 65536
go run main.go
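This sets the limit to 65,536 for the current shell and the processes it starts. In bash, ulimit -n without -S or -H sets both the soft and the hard limit, and the command fails if the value exceeds the current hard limit. It does not persist across reboots or new sessions. It does not affect other users. It is useful for testing.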
For a permanent fix on Linux, edit /etc/security/limits.conf. Add lines for the user running the service.
username soft nofile 65536
username hard nofile 65536
The soft limit is the active cap. The hard limit is the maximum the user can raise the soft limit to. Set both to the same value for simplicity. The changes take effect on the next login.
If you use systemd to manage your service, limits.conf is often ignored. Systemd enforces its own limits. Edit the service unit file and add LimitNOFILE.
[Service]
LimitNOFILE=65536
Reload the daemon and restart the service.
sudo systemctl daemon-reload
sudo systemctl restart myservice
Systemd applies the new limit when the service restarts. Services started by systemd normally do not go through a PAM login session, so limits.conf does not reach them. Always check the service manager configuration first.
Pitfalls and compiler errors
The compiler does not catch file descriptor leaks. It cannot know if you intend to close a file later. The runtime attaches a finalizer to *os.File, but nothing guarantees when, or whether, it runs. You must close resources explicitly.
Common runtime errors include:
open /path/to/file: too many open files from os.Open.
accept: too many open files from http.Server or a net.Listener's Accept.
getsockopt: too many open files from network operations.
dial tcp: too many open files from net.Dial.
These errors are not panics. They are returned as error values. If you ignore them, the program continues in a broken state. Always check errors from os and net calls.
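Another pitfall is assuming defer is always the right tool. A deferred call runs when the enclosing function returns, not at the end of a loop iteration, so defer f.Close() inside a loop keeps every file open until the whole function exits. defer also has a small runtime cost. In a tight loop processing millions of files, that cost can add up. In both cases, close the file explicitly after use.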
f, err := os.Open("data.txt")
if err != nil {
	return err
}
// WHY: Explicit close in a tight loop avoids defer overhead.
// Ensure all return paths call f.Close().
data, err := io.ReadAll(f)
f.Close()
if err != nil {
	return err
}
This pattern is verbose but efficient. Use defer for readability in most cases. Use explicit close only when profiling shows defer overhead is significant.
The convention in Go is to pair every open with a close. If you see an Open, Create, Dial, or Get, look for the corresponding Close. If it is missing, you have a leak. The community accepts the boilerplate of defer f.Close() because it makes the resource lifecycle visible. Don't fight the verbosity. It prevents leaks.
Decision matrix
Use ulimit when you are debugging locally and need to reproduce a high-load scenario quickly. Use limits.conf when you manage a bare-metal server without systemd and need persistent limits for a user. Use systemd LimitNOFILE when your service is managed by systemd, which is the standard on modern Linux distributions. Use defer file.Close() in your code to ensure every opened resource is released when the function returns. Use explicit Close() calls in tight loops where defer overhead matters. Use connection pooling with http.Client to reuse TCP connections and reduce descriptor churn. Use lsof and ss to diagnose leaks when the error occurs in production.
The kernel tracks every handle. Close what you open. The OS keeps the receipts. Pay the bill or get shut down.