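You use SIMD instructions in Go assembly by writing them directly in .s files, using the Go assembler's architecture-specific mnemonics (such as VADDPS on amd64) and its vector register names (X0 through X15 for the 128-bit registers on amd64, not XMM0; V0 through V31 on loong64). The assembler (cmd/asm) encodes these instructions directly, so they never pass through the compiler's SSA backend. Assembly functions use the stack-based ABI0 calling convention: you define the function with the TEXT directive, read arguments and write results through the FP pseudo-register, and declare a body-less Go function with a matching signature in the same package.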
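#include "textflag.h"

// func simdAdd(a, b [4]float32) [4]float32
TEXT ·simdAdd(SB), NOSPLIT, $0-48
	VMOVUPS a+0(FP), X0      // load the four float32 lanes of a
	VMOVUPS b+16(FP), X1     // load the four float32 lanes of b
	VADDPS X1, X0, X0        // X0 = X0 + X1, lane by lane
	VMOVUPS X0, ret+32(FP)   // store the result in the return slot
	RET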
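This example adds two 128-bit vectors of four float32 values on x86-64. The V-prefixed instructions are VEX-encoded, so the CPU must support AVX. For loong64, you would use the V0 through V31 registers and the LSX instructions in the Go assembler's spelling, which drops the dot (for example VADDW rather than vadd.w in recent Go releases). The opcodes are defined in cmd/internal/obj/loong64, while cmd/asm/internal/arch/loong64.go holds the assembler's architecture-specific parsing helpers, so check those sources for the mnemonics your Go version supports.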
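To call the routine from Go, the package needs a body-less declaration whose signature matches the assembly. The sketch below shows that declaration as a comment (it cannot compile without the .s file) alongside a pure-Go reference version. The file name add_amd64.go and the helper addGeneric are hypothetical, introduced only for illustration; a fallback like this is one common way to cover other architectures and to check the assembly in tests.

```go
package main

import "fmt"

// In a real package, the assembly routine would be declared without a body,
// for example in add_amd64.go (hypothetical file name) next to the .s file:
//
//	func simdAdd(a, b [4]float32) [4]float32
//
// Under ABI0 the two 16-byte inputs and the 16-byte result together
// occupy 48 bytes of argument space on the stack.

// addGeneric is a hypothetical portable fallback that computes the same
// lane-wise sum in plain Go. It can serve as a test oracle for the assembly.
func addGeneric(a, b [4]float32) [4]float32 {
	var out [4]float32
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

func main() {
	a := [4]float32{1, 2, 3, 4}
	b := [4]float32{10, 20, 30, 40}
	fmt.Println(addGeneric(a, b)) // prints [11 22 33 44]
}
```

In a test, you would compare simdAdd against addGeneric on the same inputs, restricting the assembly file to amd64 through its _amd64.s suffix or a build constraint.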