The legacy API problem
You are integrating with a payment gateway built in 2005. The documentation is a PDF. The response is a wall of angle brackets, nested elements, and attributes. You need to extract the transaction ID and the status code without writing a regex that breaks on the first whitespace variation.
Go's standard library includes encoding/xml. It works much like encoding/json, with one difference worth remembering: element and attribute names are matched case-sensitively. You define a struct that mirrors the XML hierarchy. You attach struct tags to tell the parser which element maps to which field. The parser walks the XML tree and fills your struct.
XML is verbose. Go structs are concise. The bridge between them is the struct tag system. It is flexible, but it is also unforgiving about typos and namespaces.
Struct tags map the tree
XML is a tree of elements. A Go struct is a tree of fields. encoding/xml uses reflection to walk the XML and populate the struct. The struct tags provide the mapping rules.
The tag syntax is xml:"...". Inside the quotes, you specify the element name. You can also add options like ,attr for attributes, ,chardata for text content, or ,innerxml for raw XML.
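The options are easier to compare side by side. The following is a minimal sketch, assuming a hypothetical <note> element with a lang attribute, some text, and one child element. The Note type and the sample payload are illustrative, not taken from any real schema.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Note is a hypothetical type used only to illustrate the tag options.
type Note struct {
	Lang string `xml:"lang,attr"` // Value of the lang attribute on <note>.
	Text string `xml:",chardata"` // Text that sits directly inside <note>.
	Raw  string `xml:",innerxml"` // Everything between <note> and </note>, unparsed.
}

// parseNote decodes one <note> element into a Note.
func parseNote(data []byte) (Note, error) {
	var n Note
	err := xml.Unmarshal(data, &n)
	return n, err
}

func main() {
	n, err := parseNote([]byte(`<note lang="en">hello <b>world</b></note>`))
	if err != nil {
		log.Fatal(err)
	}
	// Text holds only the direct text; Raw also keeps the <b> child as markup.
	fmt.Printf("Lang=%q Text=%q Raw=%q\n", n.Lang, n.Text, n.Raw)
}
```

Note that ,chardata skips text inside child elements, while ,innerxml keeps the children as raw markup.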
The compiler does not check struct tags. They are just strings attached to the struct definition. If you mistype a tag, the field remains zero-valued. No error. No warning. This is the most common source of bugs in XML parsing. A missing value usually means a tag mismatch, not empty data.
Convention aside: gofmt formats your code, but it does not validate struct tags. go vet checks that a tag is a well-formed key:"value" string, but it cannot know which element names your XML actually contains. You must verify the tags against the XML schema manually. The community accepts this trade-off because tags allow flexible mapping without boilerplate code.
Minimal example
Here is the simplest mapping. A struct with fields, tags that match the element names, and xml.Unmarshal to decode the bytes.
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Book mirrors the <book> element structure.
type Book struct {
	XMLName xml.Name `xml:"book"`  // XMLName captures the root element name and namespace.
	Title   string   `xml:"title"` // xml tag maps the <title> element to this field.
	Author  string   `xml:"author"`
}

func main() {
	// XML payload with simple nested elements.
	data := []byte(`
<book>
	<title>Go in Action</title>
	<author>William Kennedy</author>
</book>`)

	var b Book
	// Unmarshal reads the bytes and fills the struct.
	if err := xml.Unmarshal(data, &b); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Title: %s, Author: %s\n", b.Title, b.Author)
}
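The XMLName field is special. If you include a field named XMLName of type xml.Name, the parser fills it with the element name and namespace of the element being decoded. It is useful for generic parsers that need to know the tag name at runtime. The tag on XMLName also acts as a check: with xml:"book", Unmarshal returns an error if the root element is not <book>. If you omit XMLName, Unmarshal does not check the root element's name at all and simply fills whichever fields match the child elements.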
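The effect of XMLName on the root element can be seen with a small experiment. This is a sketch under assumptions: CheckedRoot, UncheckedRoot, and the <magazine> payload are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// CheckedRoot insists on a <book> root because XMLName carries a tag.
type CheckedRoot struct {
	XMLName xml.Name `xml:"book"`
	Title   string   `xml:"title"`
}

// UncheckedRoot has no XMLName, so any root element name is accepted.
type UncheckedRoot struct {
	Title string `xml:"title"`
}

// decodeChecked returns the error, if any, from decoding into CheckedRoot.
func decodeChecked(data []byte) error {
	var v CheckedRoot
	return xml.Unmarshal(data, &v)
}

// decodeUnchecked returns the title found under whatever root element is present.
func decodeUnchecked(data []byte) (string, error) {
	var v UncheckedRoot
	err := xml.Unmarshal(data, &v)
	return v.Title, err
}

func main() {
	data := []byte(`<magazine><title>Go Weekly</title></magazine>`)

	// The root is <magazine>, not <book>, so this call reports an error.
	fmt.Println("checked:", decodeChecked(data))

	// Without XMLName the same payload decodes without complaint.
	title, err := decodeUnchecked(data)
	fmt.Println("unchecked:", title, err)
}
```

Keeping a tagged XMLName on top-level types is a cheap way to catch a response that is not the document you expected, such as an error envelope.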
Tags are strings. The compiler doesn't check them. A typo is a silent zero-value.
How the parser walks the data
xml.Unmarshal takes a byte slice and a pointer to a struct. It uses reflection to inspect the struct fields. For each field, it checks the xml tag. If the tag matches the current XML element, it sets the field value.
The parser handles types automatically. Strings map to strings. Integers map to integers. Booleans map to true/false. Slices map to repeated elements. If the XML contains a value that cannot convert to the field type, the parser returns an error.
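A short sketch of these conversions, assuming a hypothetical <order> payload. The Order type and its element names are made up for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Order is a hypothetical type that exercises several field kinds.
type Order struct {
	Quantity int      `xml:"quantity"` // Parsed from decimal text.
	Paid     bool     `xml:"paid"`     // Parsed from "true" or "false".
	Price    float64  `xml:"price"`    // Parsed from floating-point text.
	Items    []string `xml:"item"`     // One entry per repeated <item>.
}

// parseOrder decodes an <order> document into an Order.
func parseOrder(data []byte) (Order, error) {
	var o Order
	err := xml.Unmarshal(data, &o)
	return o, err
}

func main() {
	good := []byte(`<order><quantity>3</quantity><paid>true</paid>` +
		`<price>9.5</price><item>a</item><item>b</item></order>`)
	o, err := parseOrder(good)
	fmt.Printf("%+v, err=%v\n", o, err)

	// "three" cannot be converted to an int, so Unmarshal reports an error.
	bad := []byte(`<order><quantity>three</quantity></order>`)
	_, err = parseOrder(bad)
	fmt.Println("bad payload error:", err)
}
```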
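The error messages are plain text, and they arrive at runtime, not compile time. If you pass a target that Unmarshal cannot fill, such as a pointer to a map[string]string, the code compiles and Unmarshal returns an error naming the type. (xml.Marshal reports the equivalent problem as xml: unsupported type: map[string]string.) If the XML is malformed, you get a *xml.SyntaxError with a message of the form XML syntax error on line 1: element <name> closed by </b>.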
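Because these failures only appear at runtime, it can help to distinguish malformed XML from other errors. The sketch below uses errors.As with *xml.SyntaxError, which exposes the line number. The describe helper and the sample payloads are hypothetical.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// describe classifies the result of decoding data into a small struct.
func describe(data []byte) string {
	var v struct {
		Name string `xml:"name"`
	}
	err := xml.Unmarshal(data, &v)

	var syn *xml.SyntaxError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &syn):
		// SyntaxError carries the line on which the problem was detected.
		return fmt.Sprintf("syntax error on line %d", syn.Line)
	default:
		return "other error"
	}
}

func main() {
	fmt.Println(describe([]byte(`<a><name>x</name></a>`)))
	// The <name> element is closed by </b>, which is malformed.
	fmt.Println(describe([]byte(`<a><name>x</b></a>`)))
}
```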
Reflection is slower than direct assignment. xml.Unmarshal is fine for small payloads. It is not suitable for high-throughput streaming. For large files, use xml.Decoder.
Realistic example: attributes and slices
Real XML often uses attributes for metadata and repeated elements for lists. The tag syntax supports both.
Use ,attr to map an attribute. Use a slice to map repeated elements. The parser collects all matching elements into the slice.
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Product maps an item with attributes and a list of tags.
type Product struct {
	XMLName xml.Name `xml:"product"`
	ID      string   `xml:"id,attr"` // ,attr tells the parser to read the id attribute.
	Name    string   `xml:"name"`
	Tags    []string `xml:"tag"` // Slices collect repeated <tag> elements.
}

func main() {
	// XML with attributes and repeated elements.
	data := []byte(`
<product id="123">
	<name>Widget</name>
	<tag>red</tag>
	<tag>plastic</tag>
</product>`)

	var p Product
	// Unmarshal handles attributes and slices automatically.
	if err := xml.Unmarshal(data, &p); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("ID: %s, Name: %s, Tags: %v\n", p.ID, p.Name, p.Tags)
}
The ,attr option changes the parser behavior. Instead of looking for a child element, it looks for an attribute on the current element. The slice field collects all child elements with the matching name. The order is preserved.
If the XML contains an attribute that is not mapped, the parser ignores it. If the struct has a field with no matching XML, the field stays zero-valued. This is by design. XML parsers in Go are lenient about missing data.
Slices collect repeated elements. The parser appends to the slice for each match.
Pitfalls: typos, whitespace, and namespaces
XML parsing in Go has three common traps. Typos in tags, whitespace in text nodes, and namespaces.
Typos are silent. If you write xml:"titel" instead of xml:"title", the Title field remains empty. The parser does not error. It simply finds no match. You must verify the tags against the XML schema.
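The silent failure is easy to reproduce. A minimal sketch, assuming two hypothetical structs that differ only in the spelling of one tag:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Correct uses the right element name.
type Correct struct {
	Title string `xml:"title"`
}

// Misspelled contains the typo from the text: "titel" instead of "title".
type Misspelled struct {
	Title string `xml:"titel"`
}

// decodeTitles decodes the same payload into both structs.
func decodeTitles(data []byte) (good, typo string, err error) {
	var c Correct
	if err = xml.Unmarshal(data, &c); err != nil {
		return "", "", err
	}
	var m Misspelled
	if err = xml.Unmarshal(data, &m); err != nil {
		return "", "", err
	}
	return c.Title, m.Title, nil
}

func main() {
	good, typo, err := decodeTitles([]byte(`<book><title>Go in Action</title></book>`))
	if err != nil {
		log.Fatal(err)
	}
	// Both calls succeed; only the correctly tagged field is populated.
	fmt.Printf("correct=%q misspelled=%q\n", good, typo)
}
```

A unit test that decodes a sample document and asserts every field is non-empty is one inexpensive guard against this class of bug.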
Whitespace is included in text content. If the XML is <name> Alice </name>, the result is " Alice ". The parser preserves whitespace. You must trim the string manually if you want clean data. Use strings.TrimSpace after unmarshaling.
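A small sketch of the trimming step, assuming a hypothetical Person type:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"strings"
)

// Person is a hypothetical type with a single text field.
type Person struct {
	Name string `xml:"name"`
}

// parseName returns the name exactly as decoded and after trimming.
func parseName(data []byte) (raw, trimmed string, err error) {
	var p Person
	if err = xml.Unmarshal(data, &p); err != nil {
		return "", "", err
	}
	return p.Name, strings.TrimSpace(p.Name), nil
}

func main() {
	raw, trimmed, err := parseName([]byte(`<person><name> Alice </name></person>`))
	if err != nil {
		log.Fatal(err)
	}
	// The raw value keeps the surrounding spaces; the trimmed value does not.
	fmt.Printf("raw=%q trimmed=%q\n", raw, trimmed)
}
```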
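Namespaces are the hardest part. XML namespaces use prefixes like ns:book, but a prefix is only a local alias for a namespace URI declared with xmlns:ns="...". The decoder resolves the prefix before matching, so the prefix never belongs in a struct tag: xml:"ns:book" matches nothing. A tag with only the local name, xml:"book", matches <book> in any namespace. To require a specific namespace, put the URI and a space before the name, as in xml:"http://example.com/ns book" (the URI here is a placeholder).
The xml.Name struct has a Space field for the namespace URI and a Local field for the local name. You can use xml.Name to handle namespaces dynamically.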
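The sketch below shows how encoding/xml matches namespaced elements in practice: the decoder resolves each prefix to its namespace URI, and struct tags match on the local name, optionally preceded by the URI and a space. The namespace URI, the c prefix, and all type names are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// AnyNamespace matches <title> in any namespace and records the root name.
type AnyNamespace struct {
	XMLName xml.Name // Untagged: filled with the root element's Space and Local.
	Title   string   `xml:"title"`
}

// OneNamespace matches <title> only in the given (hypothetical) namespace.
type OneNamespace struct {
	Title string `xml:"http://example.com/catalog title"`
}

// PrefixInTag puts the prefix in the tag, which never matches.
type PrefixInTag struct {
	Title string `xml:"c:title"`
}

// decodeAll decodes the same payload into all three shapes.
func decodeAll(data []byte) (a AnyNamespace, o OneNamespace, p PrefixInTag, err error) {
	if err = xml.Unmarshal(data, &a); err != nil {
		return
	}
	if err = xml.Unmarshal(data, &o); err != nil {
		return
	}
	err = xml.Unmarshal(data, &p)
	return
}

func main() {
	data := []byte(`<c:book xmlns:c="http://example.com/catalog">` +
		`<c:title>Go in Action</c:title></c:book>`)
	a, o, p, err := decodeAll(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("root: space=%q local=%q\n", a.XMLName.Space, a.XMLName.Local)
	fmt.Printf("any=%q one=%q prefix=%q\n", a.Title, o.Title, p.Title)
}
```

Matching on the URI rather than the prefix means the struct keeps working if the sender renames the prefix, which is allowed as long as the URI stays the same.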
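Convention aside: xml.Unmarshal does not take a context.Context. The function is synchronous and blocks until the data is parsed. If you need cancellation, run the call in a goroutine and select on ctx.Done() alongside a result channel. Cancelling only stops the wait; the goroutine keeps running until parsing finishes.
Namespaces are a trap. Match the local name and, if it matters, the namespace URI. The prefix is not part of the tag.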
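One way to sketch the goroutine-and-select pattern is shown below. It is an illustration under assumptions, not a standard library facility: Receipt, unmarshalReceipt, and titleWithinOneSecond are hypothetical names. The goroutine decodes into its own local value so an abandoned parse never writes to memory the caller still uses.

```go
package main

import (
	"context"
	"encoding/xml"
	"fmt"
	"log"
	"time"
)

// Receipt is a hypothetical payload type for this sketch.
type Receipt struct {
	Title string `xml:"title"`
}

// unmarshalReceipt parses data in a goroutine and stops waiting when ctx is done.
// The parse itself is not interrupted; it runs to completion in the background.
func unmarshalReceipt(ctx context.Context, data []byte) (Receipt, error) {
	type result struct {
		r   Receipt
		err error
	}
	// Buffered so the goroutine can send and exit even if nobody is listening.
	done := make(chan result, 1)
	go func() {
		var r Receipt
		err := xml.Unmarshal(data, &r)
		done <- result{r, err}
	}()

	select {
	case <-ctx.Done():
		return Receipt{}, ctx.Err()
	case res := <-done:
		return res.r, res.err
	}
}

// titleWithinOneSecond applies a one-second deadline to the parse.
func titleWithinOneSecond(data []byte) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	r, err := unmarshalReceipt(ctx, data)
	return r.Title, err
}

func main() {
	title, err := titleWithinOneSecond([]byte(`<receipt><title>Order 42</title></receipt>`))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(title)
}
```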
Streaming with xml.Decoder
xml.Unmarshal needs the entire XML document in memory as a byte slice. This is fine for small payloads. It scales poorly to large files. If you parse a 100MB XML file, the memory usage spikes, because the byte slice and the fully populated struct are held at the same time.
Use xml.Decoder for streaming. It reads tokens one by one. You process each token and discard it. Memory usage stays constant.
The decoder returns xml.Token values. The main types are xml.StartElement, xml.EndElement, and xml.CharData. You loop over tokens until you find the data you need.
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"strings"
)

// StreamParse reads XML tokens one at a time instead of decoding the whole document at once.
func StreamParse(r io.Reader) error {
	// Decoder reads tokens sequentially from the reader.
	decoder := xml.NewDecoder(r)
	for {
		// Token returns the next XML token.
		token, err := decoder.Token()
		if err == io.EOF {
			// io.EOF signals the normal end of the stream.
			return nil
		}
		if err != nil {
			// Any other error is a real failure, such as malformed XML.
			return err
		}
		// Switch on the token type to handle elements.
		switch t := token.(type) {
		case xml.StartElement:
			// StartElement contains the name and attributes.
			fmt.Printf("Start: %s\n", t.Name.Local)
		case xml.EndElement:
			// EndElement marks the close of an element.
			fmt.Printf("End: %s\n", t.Name.Local)
		case xml.CharData:
			// CharData is the text content between elements.
			fmt.Printf("Text: %s\n", string(t))
		}
	}
}

func main() {
	// Reader provides a stream of XML data.
	data := strings.NewReader(`<root><item>1</item></root>`)
	if err := StreamParse(data); err != nil {
		log.Fatal(err)
	}
}
The decoder is verbose. You must handle each token type. You must track the state of the XML tree. It is more work than Unmarshal. It is necessary for large files or when you only need a small part of the document.
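A common middle ground is to scan tokens only until the start of an element you care about, then hand that one element to Decoder.DecodeElement, which fills a struct using the same tags as Unmarshal. The sketch below assumes a hypothetical document of repeated <item> elements; Item, forEachItem, and itemNames are illustrative names.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"strings"
)

// Item is a hypothetical record; the element and attribute names are assumptions.
type Item struct {
	ID   string `xml:"id,attr"`
	Name string `xml:"name"`
}

// forEachItem streams through r and calls handle once per <item> element.
// Only one Item is decoded at a time, so memory does not grow with the input.
func forEachItem(r io.Reader, handle func(Item) error) error {
	dec := xml.NewDecoder(r)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil // Normal end of input.
		}
		if err != nil {
			return err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "item" {
			continue // Skip everything except the start of an <item>.
		}
		var it Item
		// DecodeElement consumes tokens up to the matching </item>.
		if err := dec.DecodeElement(&it, &start); err != nil {
			return err
		}
		if err := handle(it); err != nil {
			return err
		}
	}
}

// itemNames collects "id:name" pairs, purely to make the demo observable.
func itemNames(doc string) ([]string, error) {
	var names []string
	err := forEachItem(strings.NewReader(doc), func(it Item) error {
		names = append(names, it.ID+":"+it.Name)
		return nil
	})
	return names, err
}

func main() {
	names, err := itemNames(`<root><item id="1"><name>Widget</name></item>` +
		`<item id="2"><name>Gadget</name></item></root>`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(names)
}
```

Returning a non-nil error from the callback stops the scan, which is how you stop early once you have found what you need.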
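The decoder respects namespaces. The Name field in StartElement carries the resolved namespace URI in Name.Space and the unprefixed element name in Name.Local. You can filter by namespace if needed.
Streaming is memory-efficient. The loop holds only the current token, plus whatever you choose to keep.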
Decision: when to use XML parsing
XML is legacy. JSON is modern. Use the right tool for the job.
Use xml.Unmarshal when you have a small XML payload and need to map it to a struct. It is simple and fast enough for most API responses.
Use xml.Decoder when you are processing large XML files or streaming data. It keeps memory usage low and allows you to stop early.
Use encoding/json when you control the API and can switch to JSON. JSON is easier to parse, has better tooling, and is less verbose.
Use regexp only when you are scraping unstructured text that happens to look like XML. Regex is fragile and breaks on nested structures.
Use a third-party library like github.com/clbanning/mxj when you need to convert XML to JSON or map dynamically. The standard library is strict about struct definitions.
XML namespaces are a trap. Tags match the local name and, optionally, the namespace URI, never the prefix.