How to Call the OpenAI API from Go

The Chat Completion Request

You are building a CLI tool that summarizes long error logs, or a Discord bot that answers questions about your codebase. You have an OpenAI API key in your .env file. You need to send a prompt and get a response without blocking your entire application or leaking secrets. The net/http package can do this, but handling JSON serialization, streaming protocols, and error parsing by hand is tedious. A common approach is to use a typed client library that handles the protocol details so you can focus on the logic.

The github.com/sashabaranov/go-openai library provides Go types that map directly to the OpenAI API. You fill in structs, call methods, and get back typed responses. The library handles the HTTP requests, headers, and JSON conversion. Think of it like a pre-printed form and envelope for a service desk. You write your instructions on the form, and the library stamps, mails, and reads the reply. You don't need to know the postal code or the handwriting of the clerk.

Minimal Example

Here's the simplest way to get a completion: create a client, build a request struct, and handle the response.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/sashabaranov/go-openai"
)

// main demonstrates a basic chat completion request.
func main() {
	// NewClient initializes the HTTP client with your API key.
	// The key is sent in the Authorization header automatically.
	// The literal below is a placeholder; load the real key from the environment.
	client := openai.NewClient("sk-your-key-here")

	// Context carries cancellation signals and deadlines.
	// Always pass a context to long-running operations.
	ctx := context.Background()

	// ChatCompletionRequest defines the model and conversation history.
	// GPT3Dot5Turbo is a constant for the model identifier string.
	req := openai.ChatCompletionRequest{
		Model: openai.GPT3Dot5Turbo,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "Say 'Hello' in Go."},
		},
	}

	// CreateChatCompletion sends the request and waits for the full response.
	resp, err := client.CreateChatCompletion(ctx, req)
	if err != nil {
		// log.Fatal prints the error and exits with code 1.
		log.Fatal(err)
	}

	// Choices contains the generated responses.
	// Index 0 is the first (and usually only) completion.
	fmt.Println(resp.Choices[0].Message.Content)
}

Walk Through

When you run this, the library constructs a JSON payload and sends a POST request to the OpenAI endpoint. The go-openai package serializes your Go structs into JSON and deserializes the response back into Go types. You don't write json.Marshal or json.Unmarshal yourself. The context parameter is standard Go plumbing. It allows you to cancel the request if the user hits Ctrl+C or if a timeout expires. By convention, the context is always the first argument and named ctx. Functions that take a context should respect cancellation and deadlines.
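To make the serialization step concrete, here is a minimal standard-library sketch of the kind of JSON body that ends up in the POST request. The chatMessage and chatRequest types and the buildPayload helper are hypothetical stand-ins written for this illustration, not the library's own types; only the JSON field names (model, messages, role, content) follow the public API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatMessage and chatRequest are hypothetical stand-ins for the library's
// request types. Only the JSON field names mirror the public API.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// buildPayload serializes a single-message request into the JSON body
// that a client would send in the POST request.
func buildPayload(model, prompt string) (string, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return "", fmt.Errorf("marshal request: %w", err)
	}
	return string(body), nil
}

func main() {
	payload, err := buildPayload("gpt-3.5-turbo", "Say hello in Go.")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(payload)
}
```

The typed client does the equivalent of this for you and also attaches the Authorization header, which is why the minimal example never touches encoding/json.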

The compiler enforces strict typing. If you pass a string where a struct is expected, the build fails with an error such as: cannot use "text" (untyped string constant) as openai.ChatCompletionRequest value in argument to client.CreateChatCompletion. This catches mistakes before the code runs. If you forget to import the package, the build fails with: undefined: openai. If you import it but don't use it, the compiler reports the import path followed by: imported and not used. Go forces you to use every import, which keeps dependencies clean and explicit.

Trust gofmt. Argue logic, not formatting. Run gofmt on your code to apply the community standard for indentation and spacing. Most editors run it on save.

Realistic Example

In production code, you'll wrap the call in a function, load the key from an environment variable, and handle errors without crashing the whole program.

// Summarize sends a prompt to OpenAI and returns the text response.
// It returns an error if the API call fails or the response is empty.
func Summarize(ctx context.Context, client *openai.Client, prompt string) (string, error) {
	// GPT4Turbo favors quality; swap in GPT3Dot5Turbo if cost matters more.
	// This function does not fall back automatically.
	model := openai.GPT4Turbo

	resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: model,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleSystem, Content: "You are a helpful assistant."},
			{Role: openai.ChatMessageRoleUser, Content: prompt},
		},
		// MaxTokens limits the length of the response to control costs.
		MaxTokens: 500,
	})
	if err != nil {
		// Return the error so the caller can decide how to handle it.
		return "", fmt.Errorf("chat completion failed: %w", err)
	}

	// Check if the API returned any choices.
	if len(resp.Choices) == 0 {
		return "", fmt.Errorf("empty response from API")
	}

	return resp.Choices[0].Message.Content, nil
}

Notice the error handling. The function returns an error instead of calling log.Fatal. This lets the caller decide whether to retry, log, or return a user-friendly message. The %w verb wraps the error, preserving the original error chain for debugging. The check for len(resp.Choices) == 0 is a runtime safety net. The API might succeed but return no content due to filtering or model issues. The if err != nil { return err } pattern is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible.
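Because %w preserves the chain, a caller can look through the wrapper and react to the underlying cause. The sketch below assumes a hypothetical failingSummarize stub that always times out and wraps the cause the same way Summarize does; the decide helper and its return labels are also invented for this illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// failingSummarize is a hypothetical stand-in for Summarize that always
// times out, wrapping the cause with %w.
func failingSummarize() (string, error) {
	return "", fmt.Errorf("chat completion failed: %w", context.DeadlineExceeded)
}

// decide shows a caller choosing a reaction based on the wrapped cause.
// errors.Is walks the chain, so the wrapper text does not get in the way.
func decide(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, context.DeadlineExceeded):
		return "retry"
	case errors.Is(err, context.Canceled):
		return "stop"
	default:
		return "report"
	}
}

func main() {
	_, err := failingSummarize()
	fmt.Println(err)         // chat completion failed: context deadline exceeded
	fmt.Println(decide(err)) // retry
}
```

Had the function used %v instead of %w, the message would look the same but errors.Is would no longer find the cause.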

Request Structure and Parameters

The Messages array is the heart of the request. Each message has a role: System, User, or Assistant. The system message sets the behavior of the model. It's like instructions to a junior developer before they start the task. The user message is the input. The assistant message is the output. You can include previous assistant messages to maintain context in a conversation. The library provides constants like openai.ChatMessageRoleUser to prevent typos.
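One way to picture a multi-turn conversation is as a slice that grows by one entry per turn. The following sketch uses a hypothetical message struct and conversation helper rather than the library's types; with go-openai you would append openai.ChatCompletionMessage values in the same way. The role strings system, user, and assistant are the values sent on the wire.

```go
package main

import "fmt"

// message is a hypothetical stand-in for the library's message type.
type message struct {
	Role    string
	Content string
}

// Role strings as they appear in the request body.
const (
	roleSystem    = "system"
	roleUser      = "user"
	roleAssistant = "assistant"
)

// conversation accumulates turns so each request carries the prior context.
type conversation struct {
	messages []message
}

// newConversation starts the history with the system message.
func newConversation(systemPrompt string) *conversation {
	return &conversation{messages: []message{{Role: roleSystem, Content: systemPrompt}}}
}

// ask records the user's turn and returns the full history to send.
func (c *conversation) ask(prompt string) []message {
	c.messages = append(c.messages, message{Role: roleUser, Content: prompt})
	return c.messages
}

// record stores the model's reply so the next request includes it.
func (c *conversation) record(reply string) {
	c.messages = append(c.messages, message{Role: roleAssistant, Content: reply})
}

func main() {
	conv := newConversation("You are a helpful assistant.")
	conv.ask("What is a goroutine?")
	conv.record("A lightweight thread managed by the Go runtime.")
	for _, m := range conv.ask("How do I start one?") {
		fmt.Printf("%-9s %s\n", m.Role, m.Content)
	}
}
```

The history grows with every turn, and every message in it counts toward the tokens you pay for, so long conversations usually need trimming or summarizing.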

Parameters like Temperature control randomness. A value of 0 makes the output close to deterministic, though identical prompts can still produce slightly different text. Higher values (the API accepts up to 2) make the output more varied. MaxTokens caps the response length. This is crucial for cost control. If you don't set it, the model may write far more than the one sentence you wanted. Every call costs money based on tokens. Tokens are roughly words or parts of words. The Usage field in the response tells you how many tokens were consumed. You can log this to track costs. If you're building a high-traffic app, cache responses for repeated prompts. The API is expensive if you call it for the same question every second.
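As a rough illustration of caching repeated prompts, here is a minimal in-memory sketch. The promptCache type and its get method are hypothetical names chosen for this example, and the fetch function stands in for the real API call. It has no expiry or size limit, and it matches prompts by exact text only.

```go
package main

import (
	"fmt"
	"sync"
)

// promptCache memoizes responses by exact prompt text.
type promptCache struct {
	mu      sync.Mutex
	entries map[string]string
}

func newPromptCache() *promptCache {
	return &promptCache{entries: make(map[string]string)}
}

// get returns the cached response and calls fetch only on a miss.
// Holding the lock across fetch keeps the sketch short but serializes
// callers; a production cache would avoid that.
func (c *promptCache) get(prompt string, fetch func(string) (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cached, ok := c.entries[prompt]; ok {
		return cached, nil
	}
	resp, err := fetch(prompt)
	if err != nil {
		return "", err // failures are not cached
	}
	c.entries[prompt] = resp
	return resp, nil
}

func main() {
	cache := newPromptCache()
	calls := 0
	// fetch stands in for the real API call and counts invocations.
	fetch := func(prompt string) (string, error) {
		calls++
		return "summary of " + prompt, nil
	}
	for i := 0; i < 3; i++ {
		if _, err := cache.get("same question", fetch); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("upstream calls:", calls) // upstream calls: 1
}
```

Caching is a better fit when Temperature is 0, since users then expect the same answer to the same question anyway.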

Contexts are not just for cancellation. They carry deadlines. If you call context.WithTimeout(ctx, 10*time.Second), the request will abort after 10 seconds. This prevents your server from holding connections open indefinitely if the API is slow. The go-openai client respects the context. When the deadline passes, the underlying HTTP request is cancelled, and CreateChatCompletion returns an error immediately. This is essential for web servers handling multiple requests. A single slow AI call shouldn't block the entire worker pool.
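The deadline behavior can be tried without any network access. In this sketch, slowCall is a hypothetical stand-in for the API call: it finishes after a given delay unless the context is done first. The callWithTimeout helper derives its context from context.Background() to keep the example small; in a server you would derive it from the incoming request's context instead.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// slowCall is a hypothetical stand-in for an API call. It takes `work` to
// finish and gives up as soon as ctx is cancelled or its deadline passes.
func slowCall(ctx context.Context, work time.Duration) (string, error) {
	select {
	case <-time.After(work):
		return "done", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// callWithTimeout bounds slowCall with a deadline.
func callWithTimeout(timeout, work time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // release the timer even when the call finishes early
	return slowCall(ctx, work)
}

func main() {
	// The work outlasts the deadline, so the call is abandoned.
	_, err := callWithTimeout(50*time.Millisecond, 2*time.Second)
	fmt.Println(err) // context deadline exceeded

	// The work finishes well inside the deadline.
	out, err := callWithTimeout(time.Second, 10*time.Millisecond)
	fmt.Println(out, err) // done <nil>
}
```

Always call the cancel function returned by context.WithTimeout, typically with defer, so the timer is released as soon as the call returns.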

Context is plumbing. Run it through every long-lived call site.

Streaming Responses

For long responses, CreateChatCompletion blocks until the entire text is generated. This can feel slow to users. Use CreateChatCompletionStream to receive chunks as they arrive.

Start by creating the request with Stream: true and opening the stream.

req := openai.ChatCompletionRequest{
	Model: openai.GPT3Dot5Turbo,
	Messages: []openai.ChatCompletionMessage{
		{Role: openai.ChatMessageRoleUser, Content: prompt},
	},
	// Stream asks the API to send the response incrementally as server-sent events.
	Stream: true,
}

// CreateChatCompletionStream returns a stream object; read chunks with Recv.
stream, err := client.CreateChatCompletionStream(ctx, req)
if err != nil {
	return "", fmt.Errorf("stream creation failed: %w", err)
}
// Close the stream when done to release resources.
defer stream.Close()

Iterate over the stream, checking for io.EOF to detect the end, and accumulate the delta content.

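// Assumes the errors, fmt, io, and strings packages are imported.
var fullText strings.Builder
// Loop until Recv reports io.EOF, which signals normal completion.
for {
	response, err := stream.Recv()
	if errors.Is(err, io.EOF) {
		break
	}
	if err != nil {
		return fullText.String(), fmt.Errorf("stream error: %w", err)
	}
	// Skip chunks that carry no choices instead of indexing out of range.
	if len(response.Choices) == 0 {
		continue
	}
	// Append the delta content to the accumulator.
	fullText.WriteString(response.Choices[0].Delta.Content)
}
return fullText.String(), nil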

CreateChatCompletionStream returns a stream object, not a channel. You call Recv() to get the next chunk. Each chunk contains a Delta with the new tokens. You accumulate them to build the full response. The loop breaks when Recv() returns io.EOF, which signals the stream ended successfully. If Recv() returns a different error, the stream failed. Always close the stream with defer stream.Close() to release the underlying HTTP connection. Pass a context with a deadline or a cancel function so that a stalled stream cannot hang the caller. Always have a cancellation path.
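The receive loop can be exercised without a network connection by putting a small interface in front of it. In this sketch, chunkReceiver and fakeStream are hypothetical: the real Recv() returns a response struct rather than a plain string, but the control flow (read until io.EOF, stop on any other error) is the same.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// chunkReceiver is a hypothetical interface capturing the one method the
// loop needs. The real stream's Recv returns a response struct instead.
type chunkReceiver interface {
	Recv() (string, error)
}

// fakeStream replays fixed chunks, then reports io.EOF.
type fakeStream struct {
	chunks []string
}

func (f *fakeStream) Recv() (string, error) {
	if len(f.chunks) == 0 {
		return "", io.EOF
	}
	next := f.chunks[0]
	f.chunks = f.chunks[1:]
	return next, nil
}

// collect drains the stream into a single string. It returns whatever was
// received so far alongside any error other than io.EOF.
func collect(s chunkReceiver) (string, error) {
	var sb strings.Builder
	for {
		chunk, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return sb.String(), nil
		}
		if err != nil {
			return sb.String(), fmt.Errorf("stream error: %w", err)
		}
		sb.WriteString(chunk)
	}
}

func main() {
	text, err := collect(&fakeStream{chunks: []string{"Hel", "lo, ", "Go"}})
	fmt.Println(text, err) // Hello, Go <nil>
}
```

A fake like this also makes it easy to unit test how your code behaves when the stream fails halfway through.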

Pitfalls and Errors

Hardcoding the API key in source code is a security risk. Always use environment variables. If your key is wrong, the API returns a 401 status. The library wraps this in an error. You'll see a message like status code: 401. Always validate your environment variables before starting the client.
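A small validation step at startup might look like the sketch below. The variable name OPENAI_API_KEY is a common convention assumed here, not something the library requires, and apiKeyFrom is a hypothetical helper. It takes the lookup function as a parameter so it can be tested without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// apiKeyFrom reads and validates the key using the given lookup function.
// The variable name OPENAI_API_KEY is an assumed convention.
func apiKeyFrom(lookup func(string) (string, bool)) (string, error) {
	key, ok := lookup("OPENAI_API_KEY")
	key = strings.TrimSpace(key)
	if !ok || key == "" {
		return "", errors.New("OPENAI_API_KEY is not set")
	}
	return key, nil
}

func main() {
	key, err := apiKeyFrom(os.LookupEnv)
	if err != nil {
		// Fail early with a clear message instead of a 401 later.
		fmt.Fprintln(os.Stderr, "configuration error:", err)
		return
	}
	// Never print the key itself; report only that it loaded.
	fmt.Printf("loaded API key (%d characters)\n", len(key))
}
```

The returned key is what you would pass to openai.NewClient in place of the placeholder string.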

If the context times out, the error message will mention context deadline exceeded. This happens when the network is slow or the model takes too long to generate a response. Set a reasonable deadline on your context to avoid hanging goroutines.

OpenAI errors can be rate limits, invalid keys, or content filters. The go-openai library parses these into Go errors, so you can use errors.As to recover the library's API error type and inspect the HTTP status code. A rate limit error (status 429) calls for retrying with exponential backoff. A content filter error means the prompt or response violated safety policies, so retrying the same input will not help.
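Here is one possible shape for retrying with exponential backoff. The apiError type is a hypothetical stand-in for a client error that carries an HTTP status; with go-openai you would run errors.As against the library's own error type instead. The choice to retry only on 429 and 5xx responses is an assumption of this sketch, as are the helper names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// apiError is a hypothetical stand-in for a client error with an HTTP status.
type apiError struct {
	StatusCode int
	Message    string
}

func (e *apiError) Error() string {
	return fmt.Sprintf("status code: %d, message: %s", e.StatusCode, e.Message)
}

// retryable reports whether err looks transient: a rate limit or server error.
func retryable(err error) bool {
	var apiErr *apiError
	if errors.As(err, &apiErr) {
		return apiErr.StatusCode == 429 || apiErr.StatusCode >= 500
	}
	return false
}

// retry calls fn up to attempts times, doubling the wait after each
// retryable failure. Non-retryable errors are returned immediately.
func retry(attempts int, base time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	wait := base
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil || !retryable(err) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulate two rate-limit responses followed by success.
	err := retry(4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return &apiError{StatusCode: 429, Message: "rate limit reached"}
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

Production code would usually add random jitter to the wait and stop early when the caller's context is cancelled.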

A few Go conventions show up throughout this code. Don't pass a *string; strings are already cheap to pass by value, and the library takes prompts and request structs by value. Exported names start with a capital letter and unexported names start lowercase; there are no keywords like public or private. The receiver name is usually one or two letters matching the type: (b *Buffer) Write(...), not (this *Buffer) or (self *Buffer). "Accept interfaces, return structs" is one of the most common Go style mantras.

The worst goroutine bug is the one that never logs.

Decision Matrix

Use go-openai when you want a typed, idiomatic client that handles JSON serialization and error parsing automatically.

Use net/http directly when you need fine-grained control over headers, retries, or custom middleware that the library doesn't expose.

Use CreateChatCompletionStream when the response is long and you want to display tokens to the user as they generate, rather than waiting for the full completion.

Use a background goroutine with a channel when you need to fetch completions asynchronously without blocking the main request loop.

Use plain sequential code when you don't need concurrency: the simplest thing that works is usually the right thing.
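For the background-goroutine option, a minimal sketch could look like this. The result type and fetchAsync helper are hypothetical, and the fetch function stands in for the real completion call. The channel has a buffer of one so the goroutine can deliver its result and exit even if the caller never reads it.

```go
package main

import "fmt"

// result pairs the response text with any error from the call.
type result struct {
	text string
	err  error
}

// fetchAsync runs fetch in a goroutine and delivers exactly one result.
// The buffered channel lets the goroutine finish even if nobody receives.
func fetchAsync(prompt string, fetch func(string) (string, error)) <-chan result {
	out := make(chan result, 1)
	go func() {
		text, err := fetch(prompt)
		out <- result{text: text, err: err}
	}()
	return out
}

func main() {
	// The function literal stands in for the real completion call.
	pending := fetchAsync("summarize the log", func(p string) (string, error) {
		return "summary of: " + p, nil
	})

	fmt.Println("doing other work while the request runs")

	r := <-pending
	fmt.Println(r.text, r.err) // summary of: summarize the log <nil>
}
```

If the fetch function accepts a context, the caller can still cancel the underlying request; the channel only decouples when the result is consumed.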

Recap

Calling the OpenAI API from Go connects your program to OpenAI's servers to generate text responses. You provide a secret key to prove your identity, send a message, and get a reply back. It works like sending an email to a smart assistant and waiting for a return message.