DEV Community: Rost

Go Error Handling Architecture: Boundaries and Patterns

Rost — Wed, 01 Jul 2026 10:03:31 +0000

Go error handling is easy to complain about.
Every Go developer has written this code hundreds of times:

if err != nil {
    return err
}

That is not the interesting part. The interesting part is what the error means, where it should be handled, wrapped, translated, and logged, and what should be exposed to the caller. That is the architecture question.

Go treats errors as values. That makes failures explicit. It also means your codebase needs a clear error-handling design. Without one, errors become random strings, HTTP handlers leak database details, logs duplicate the same failure five times, retries happen for the wrong reasons, and callers inspect text instead of behavior.

This article is not a beginner introduction to if err != nil.

It is a practical guide to Go error handling architecture: wrapping, sentinels, custom error types, errors.Is, errors.As, error boundaries, API mapping, logging, retries, security, and production patterns.

The slightly opinionated version: do not try to make Go errors disappear. Make them meaningful at the right boundary.

What Go errors are

In Go, an error is just a value implementing this interface:

type error interface {
    Error() string
}

That small interface is the reason Go error handling feels so direct.

Functions return errors explicitly:

func LoadUser(id string) (*User, error) {
    // ...
}

Callers decide what to do:

user, err := LoadUser(id)
if err != nil {
    return nil, err
}

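For ordinary failures there are no exceptions and no hidden stack unwinding. Go does have panic, but it is reserved for the cases covered in the panic section of this article. Failure is part of the function signature.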

That is good, but it also means errors need design. If every package returns arbitrary messages, callers cannot make reliable decisions. If every layer wraps every error without discipline, operators get noisy messages and developers get confused chains. If no layer wraps errors, failures lose context.

The goal is not less error handling, but better error meaning.

The three jobs of an error

A useful error usually has one or more jobs.

Job 1: Explain what failed

For humans, the error should explain what operation failed.

Example:

return fmt.Errorf("load user %s: %w", id, err)

This gives context. It says the failure happened while loading a user.

Job 2: Preserve the cause

For code, the error should preserve the underlying cause when that cause matters.

Example:

return fmt.Errorf("load user %s: %w", id, err)

The %w wraps the original error so callers can inspect it with errors.Is or errors.As.
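The difference between %w and %v is easy to miss because both produce the same message. A minimal sketch, assuming a sentinel named ErrNotFound and two helper functions (wrapped and flattened) that exist only for this illustration:

```go
package main

import (
    "errors"
    "fmt"
)

// ErrNotFound is a hypothetical sentinel used only for this illustration.
var ErrNotFound = errors.New("not found")

// wrapped adds context with %w, so the cause stays in the error chain.
func wrapped(id string) error {
    return fmt.Errorf("load user %s: %w", id, ErrNotFound)
}

// flattened adds the same context with %v, which keeps only the text.
func flattened(id string) error {
    return fmt.Errorf("load user %s: %v", id, ErrNotFound)
}

// matches reports whether ErrNotFound is reachable through the chain.
func matches(err error) bool {
    return errors.Is(err, ErrNotFound)
}

func main() {
    fmt.Println(wrapped("42"), matches(wrapped("42")))     // load user 42: not found true
    fmt.Println(flattened("42"), matches(flattened("42"))) // load user 42: not found false
}
```

The two messages are identical for a human reader. Only the %w version lets code recognize the cause.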

Job 3: Let a boundary make a decision

At some boundary, the program must decide what to do.

Examples:

Return HTTP 404
Return HTTP 409
Retry the operation
Log at warning level
Show a user-safe message
Abort the transaction
Send the error to monitoring
Ignore cancellation

That decision should usually be based on error identity or type, not string matching.

The main error tools in modern Go

Modern Go gives you a small but powerful set of tools.

errors.New

Use errors.New to create a simple error value:

var ErrNotFound = errors.New("not found")

This is useful for sentinel errors.

fmt.Errorf with %w

Use fmt.Errorf with %w to wrap an error:

return fmt.Errorf("query user: %w", err)

Wrapping adds context while preserving the original error for inspection.

errors.Is

Use errors.Is to check whether an error matches a specific target somewhere in its chain:

if errors.Is(err, ErrNotFound) {
    // handle not found
}

Use this for sentinel errors and known conditions.

errors.As

Use errors.As to extract a specific error type from a chain:

var validationErr *ValidationError
if errors.As(err, &validationErr) {
    // use validationErr.Field or validationErr.Reason
}

Use this when the error carries structured data.

errors.Join

Use errors.Join when multiple errors happened and all should be preserved:

return errors.Join(closeErr, flushErr)

Joined errors can still be inspected with errors.Is and errors.As.

Use this carefully. A joined error means several failures are part of one result.

Sentinel errors

A sentinel error is a package-level error value that represents a known condition.

Example:

var ErrUserNotFound = errors.New("user not found")
var ErrDuplicateEmail = errors.New("duplicate email")

Sentinel errors are useful when the caller only needs to know what category of failure happened.

Example:

func (r *UserRepository) GetUser(ctx context.Context, id string) (*User, error) {
    user, err := r.queryUser(ctx, id)
    if err != nil {
        if errors.Is(err, sql.ErrNoRows) {
            return nil, ErrUserNotFound
        }
        return nil, fmt.Errorf("query user: %w", err)
    }

    return user, nil
}

Then a service or handler can check:

if errors.Is(err, ErrUserNotFound) {
    // return 404
}

When to use sentinel errors

Use sentinel errors when:

The condition is stable.
The caller needs to branch on it.
No extra structured data is needed.
The error belongs to your package or domain.

Good examples:

var ErrNotFound = errors.New("not found")
var ErrAlreadyExists = errors.New("already exists")
var ErrPermissionDenied = errors.New("permission denied")
var ErrConflict = errors.New("conflict")

When not to use sentinel errors

Do not create sentinels for every possible failure.

Bad:

var ErrCouldNotOpenFile = errors.New("could not open file")
var ErrCouldNotReadFile = errors.New("could not read file")
var ErrCouldNotParseLine = errors.New("could not parse line")

If callers never branch on these conditions, they do not need to be sentinels. A plain error message with context is enough.

Also be careful about exporting too many sentinels. Exported sentinel errors become part of your package API.

Custom error types

A custom error type is useful when the error carries structured information.

Example:

type ValidationError struct {
    Field  string
    Reason string
}

func (e *ValidationError) Error() string {
    return fmt.Sprintf("validation failed for %s: %s", e.Field, e.Reason)
}

Caller:

var validationErr *ValidationError
if errors.As(err, &validationErr) {
    fmt.Println(validationErr.Field)
}

This is better than parsing an error string.
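errors.As also works when the custom error has been wrapped on its way up. A small sketch of that, where validateAge, createUser, and failedField are hypothetical helpers invented for this example:

```go
package main

import (
    "errors"
    "fmt"
)

type ValidationError struct {
    Field  string
    Reason string
}

func (e *ValidationError) Error() string {
    return fmt.Sprintf("validation failed for %s: %s", e.Field, e.Reason)
}

// validateAge is a hypothetical validator that returns a structured error.
func validateAge(age int) error {
    if age < 0 {
        return &ValidationError{Field: "age", Reason: "must not be negative"}
    }
    return nil
}

// createUser wraps the validation error with operation context.
func createUser(age int) error {
    if err := validateAge(age); err != nil {
        return fmt.Errorf("create user: %w", err)
    }
    return nil
}

// failedField returns the invalid field name, or "" when the chain
// contains no *ValidationError.
func failedField(err error) string {
    var validationErr *ValidationError
    if errors.As(err, &validationErr) {
        return validationErr.Field
    }
    return ""
}

func main() {
    err := createUser(-1)
    fmt.Println(err)              // create user: validation failed for age: must not be negative
    fmt.Println(failedField(err)) // age
}
```

The caller reads the field from the typed value. The wrapping layer in between does not need to know the type exists.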

When to use custom error types

Use custom error types when:

Callers need structured data.
The error has meaningful fields.
The type is part of your package contract.
Callers need to handle different values of the same error type differently.

Examples:

Validation error with field name
Rate limit error with retry time
HTTP error with status code
Parse error with line and column
Domain error with resource ID

When not to use custom error types

Do not create custom types just to avoid errors.New.

This is unnecessary:

type NotFoundError struct{}

func (e NotFoundError) Error() string {
    return "not found"
}

If there is no useful data, a sentinel is often enough.

Error wrapping

Wrapping adds context to an error while preserving the original error.

Example:

func LoadConfig(path string) error {
    data, err := os.ReadFile(path)
    if err != nil {
        return fmt.Errorf("read config %s: %w", path, err)
    }

    if err := parseConfig(data); err != nil {
        return fmt.Errorf("parse config %s: %w", path, err)
    }

    return nil
}

If os.ReadFile fails, the caller gets both:

the high-level operation: read config
the low-level cause: permission denied, file not found, etc.

Both are available through the error chain, which is what makes wrapping with %w worth doing consistently.

Wrap with useful context

Good wrapping says what operation failed:

return fmt.Errorf("create invoice %s: %w", invoiceID, err)

Bad wrapping adds noise:

return fmt.Errorf("error: %w", err)

This tells the caller nothing.

Also avoid repeating the same noun at every layer:

return fmt.Errorf("user service: get user: user repository: query user: %w", err)

That kind of chain is technically correct and practically annoying.

Wrap where context changes meaning. If you cannot explain in one phrase what operation failed, you are probably either wrapping too aggressively or not enough.

When to wrap and when not to wrap

This is one of the most important architecture decisions.

Wrap when crossing a meaningful boundary

Wrap when the error moves from one operation to a higher-level operation.

Example:

func (s *UserService) GetUser(ctx context.Context, id string) (*User, error) {
    user, err := s.repo.GetUser(ctx, id)
    if err != nil {
        return nil, fmt.Errorf("get user %s: %w", id, err)
    }

    return user, nil
}

The repository error is now part of a service operation, and that added context is useful when operators trace a failure back through the logs.

Do not wrap just to say "failed"

Bad:

if err != nil {
    return fmt.Errorf("failed: %w", err)
}

The word "failed" is usually implied by the fact that an error exists.

Do not wrap if you are translating

Sometimes you should translate one error into another domain error.

Example:

if errors.Is(err, sql.ErrNoRows) {
    return nil, ErrUserNotFound
}

This intentionally hides the database detail and exposes a domain condition.

You may still preserve the cause if useful, but do it deliberately.
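One deliberate way to do that is to wrap the domain sentinel with %w and format the cause with %v. A sketch under that assumption; translate, sample, isNotFound, and exposesSQL are hypothetical names for this example:

```go
package main

import (
    "database/sql"
    "errors"
    "fmt"
)

var ErrUserNotFound = errors.New("user not found")

// translate converts a storage error into the domain sentinel.
// The sentinel is wrapped with %w so callers can match it.
// The cause is formatted with %v, so its text is kept for operators
// but sql.ErrNoRows itself is not part of the chain.
func translate(err error) error {
    if errors.Is(err, sql.ErrNoRows) {
        return fmt.Errorf("%w: %v", ErrUserNotFound, err)
    }
    return err
}

// sample returns a translated error for demonstration.
func sample() error { return translate(sql.ErrNoRows) }

func isNotFound(err error) bool { return errors.Is(err, ErrUserNotFound) }
func exposesSQL(err error) bool { return errors.Is(err, sql.ErrNoRows) }

func main() {
    err := sample()
    fmt.Println(err)             // user not found: sql: no rows in result set
    fmt.Println(isNotFound(err)) // true
    fmt.Println(exposesSQL(err)) // false
}
```

Whether the cause text belongs in the message at all is a judgment call. For a not-found condition it is often just noise, and returning the bare sentinel is fine.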

Do not expose implementation details accidentally

If you wrap a low-level error with %w, callers can inspect it.

That is usually good inside your application.

But in a public package API, wrapping may expose implementation details as part of your contract.

For example, if your package wraps sql.ErrNoRows, callers may start depending on it:

if errors.Is(err, sql.ErrNoRows) {
    // caller now knows you use database/sql
}

If you may change storage later, prefer a domain sentinel:

var ErrUserNotFound = errors.New("user not found")

Then return that from the package boundary.

Error boundaries

The most useful way to think about Go error handling is through boundaries.

A boundary is a place where an error changes meaning or audience.

Common boundaries include:

database to repository
repository to service
service to HTTP handler
service to CLI command
internal error to user-facing message
transient failure to retry decision
operation failure to log event
domain error to API response

Error architecture is mostly boundary design. Each boundary is a decision point where errors either gain context, lose implementation details, or get translated into a form the next layer can act on.

Repository boundary

The repository talks to storage.

It should usually translate database-specific errors into domain errors.

Example:

var ErrUserNotFound = errors.New("user not found")
var ErrDuplicateEmail = errors.New("duplicate email")

type UserRepository struct {
    db *sql.DB
}

func (r *UserRepository) GetUser(ctx context.Context, id string) (*User, error) {
    const query = `
        select id, email, name
        from users
        where id = $1
    `

    var user User

    err := r.db.QueryRowContext(ctx, query, id).Scan(
        &user.ID,
        &user.Email,
        &user.Name,
    )
    if err != nil {
        if errors.Is(err, sql.ErrNoRows) {
            return nil, ErrUserNotFound
        }

        return nil, fmt.Errorf("query user by id: %w", err)
    }

    return &user, nil
}

The repository hides sql.ErrNoRows and exposes ErrUserNotFound — a clean boundary that means the service does not need to know anything about how storage represents "not found".

Service boundary

The service owns business meaning.

It should usually add operation context and preserve domain errors.

Example:

type UserService struct {
    repo *UserRepository
}

func (s *UserService) GetUser(ctx context.Context, id string) (*User, error) {
    user, err := s.repo.GetUser(ctx, id)
    if err != nil {
        if errors.Is(err, ErrUserNotFound) {
            return nil, err
        }

        return nil, fmt.Errorf("get user %s: %w", id, err)
    }

    return user, nil
}

This preserves the domain condition while adding context for unexpected errors.

For more complex business rules, the service may create domain errors directly:

var ErrAccountDisabled = errors.New("account disabled")

func (s *UserService) Login(ctx context.Context, email string) (*Session, error) {
    user, err := s.repo.GetUserByEmail(ctx, email)
    if err != nil {
        return nil, fmt.Errorf("get user by email: %w", err)
    }

    if user.Disabled {
        return nil, ErrAccountDisabled
    }

    // ...
    return session, nil
}

The service is the right place for business-level errors — created directly from domain logic rather than translated from infrastructure conditions.

HTTP handler boundary

The HTTP handler translates application errors into HTTP responses.

This is a boundary where internal details should become user-safe responses.

Example:

func GetUserHandler(svc *UserService) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        user, err := svc.GetUser(r.Context(), r.PathValue("id"))
        if err != nil {
            writeHTTPError(w, err)
            return
        }

        writeJSON(w, http.StatusOK, user)
    }
}

Error mapping:

func writeHTTPError(w http.ResponseWriter, err error) {
    switch {
    case errors.Is(err, ErrUserNotFound):
        http.Error(w, "user not found", http.StatusNotFound)

    case errors.Is(err, ErrAccountDisabled):
        http.Error(w, "account disabled", http.StatusForbidden)

    case errors.Is(err, context.Canceled):
        return

    case errors.Is(err, context.DeadlineExceeded):
        http.Error(w, "request timed out", http.StatusGatewayTimeout)

    default:
        http.Error(w, "internal server error", http.StatusInternalServerError)
    }
}

The handler maps domain errors to HTTP semantics rather than exposing raw database or internal error details. This is where many Go applications go wrong: they either expose too much internal detail or collapse all errors into HTTP 500. For a complete picture of handler patterns and middleware in Go APIs, "Building REST APIs in Go" covers authentication, routing, and error handling across the standard library, Gin, Echo, and Fiber.

CLI boundary

A CLI has a different boundary than an HTTP API.

In a CLI, the error should be useful to the person running the command.

Example:

func RunImport(ctx context.Context, args []string) error {
    if len(args) == 0 {
        return ErrMissingInputFile
    }

    if err := importFile(ctx, args[0]); err != nil {
        return fmt.Errorf("import %s: %w", args[0], err)
    }

    return nil
}

At the command boundary:

func main() {
    if err := run(); err != nil {
        fmt.Fprintln(os.Stderr, formatCLIError(err))
        os.Exit(exitCode(err))
    }
}

Map known errors to exit codes:

func exitCode(err error) int {
    switch {
    case errors.Is(err, ErrMissingInputFile):
        return 2
    case errors.Is(err, ErrValidation):
        return 3
    default:
        return 1
    }
}

A CLI can often show more detail than a public API, but it should still avoid leaking secrets.
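The formatCLIError helper used above is not shown in this article. One possible shape, offered as a sketch: the usage text, the hint wording, and the sampleNotExist helper are all assumptions made for this example.

```go
package main

import (
    "errors"
    "fmt"
    "os"
)

var ErrMissingInputFile = errors.New("missing input file")

// formatCLIError is a hypothetical implementation. It turns known
// conditions into actionable text and falls back to the error chain.
func formatCLIError(err error) string {
    switch {
    case errors.Is(err, ErrMissingInputFile):
        return "error: no input file given\nusage: import <file>"
    case errors.Is(err, os.ErrNotExist):
        return fmt.Sprintf("error: %v\nhint: check that the path exists", err)
    default:
        return fmt.Sprintf("error: %v", err)
    }
}

// sampleNotExist builds a wrapped error the way RunImport would.
// Errors from os.Open for a missing file also match os.ErrNotExist.
func sampleNotExist() error {
    return fmt.Errorf("import %s: %w", "data.csv", os.ErrNotExist)
}

func main() {
    fmt.Println(formatCLIError(ErrMissingInputFile))
    fmt.Println(formatCLIError(sampleNotExist()))
}
```

The person at the terminal gets a next step, not just a chain of operation names.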

API error type pattern

For HTTP APIs, a small app-level error type can be useful.

Example:

type APIError struct {
    Status  int
    Code    string
    Message string
    Err     error
}

func (e *APIError) Error() string {
    if e.Err == nil {
        return e.Message
    }

    return e.Message + ": " + e.Err.Error()
}

func (e *APIError) Unwrap() error {
    return e.Err
}

Constructor:

func NewAPIError(status int, code string, message string, err error) *APIError {
    return &APIError{
        Status:  status,
        Code:    code,
        Message: message,
        Err:     err,
    }
}

Usage:

return NewAPIError(
    http.StatusConflict,
    "duplicate_email",
    "email is already registered",
    ErrDuplicateEmail,
)

Handler:

func writeAPIError(w http.ResponseWriter, err error) {
    var apiErr *APIError
    if errors.As(err, &apiErr) {
        writeJSON(w, apiErr.Status, map[string]string{
            "code":    apiErr.Code,
            "message": apiErr.Message,
        })
        return
    }

    writeJSON(w, http.StatusInternalServerError, map[string]string{
        "code":    "internal_error",
        "message": "internal server error",
    })
}

This pattern is useful when you want structured API errors with stable codes.

Use it at the API boundary. Do not force every internal package to return API-specific errors.
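The handlers in this article call writeJSON without defining it. A minimal sketch of what such a helper could look like, built on encoding/json; the record function is a hypothetical test aid, not part of the pattern:

```go
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/http/httptest"
)

// writeJSON is a hypothetical helper. It marshals first, so an encoding
// failure can still become a clean 500 before any header is sent.
func writeJSON(w http.ResponseWriter, status int, v any) {
    body, err := json.Marshal(v)
    if err != nil {
        http.Error(w, "internal server error", http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(status)
    _, _ = w.Write(body)
}

// record runs writeJSON against an in-memory recorder and returns
// the status, content type, and body a client would see.
func record(status int, v any) (int, string, string) {
    rec := httptest.NewRecorder()
    writeJSON(rec, status, v)
    return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
    code, ctype, body := record(http.StatusConflict, map[string]string{
        "code":    "duplicate_email",
        "message": "email is already registered",
    })
    fmt.Println(code, ctype)
    fmt.Println(body)
}
```

Marshaling before writing costs one buffer, and it keeps a failed encode from producing a half-written 200 response.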

Domain errors vs transport errors

Keep domain errors separate from transport errors.

Domain error:

var ErrInsufficientBalance = errors.New("insufficient balance")

Transport mapping:

if errors.Is(err, ErrInsufficientBalance) {
    http.Error(w, "insufficient balance", http.StatusConflict)
    return
}

Do not make your domain layer return HTTP status codes:

return &APIError{Status: http.StatusConflict}

That couples business logic to HTTP and prevents your service layer from working cleanly across HTTP, CLI, workers, tests, and future gRPC adapters. Transport mapping belongs at the transport boundary, not in domain code. For guidance on where to define domain errors, sentinels, and transport adapters within your project layout, "Go Project Structure: Practices & Patterns" covers the internal/, pkg/, and adapter conventions that keep these layers cleanly separated.

Retryable errors

Some errors should trigger retry. Some should not.

Do not decide this by matching strings.

Use a wrapper type, a marker interface, or an explicit helper function. The example below uses a wrapper type with a helper.

Example:

type RetryableError struct {
    Err error
}

func (e *RetryableError) Error() string {
    return e.Err.Error()
}

func (e *RetryableError) Unwrap() error {
    return e.Err
}

Helper:

func Retryable(err error) error {
    if err == nil {
        return nil
    }

    return &RetryableError{Err: err}
}

func IsRetryable(err error) bool {
    var retryable *RetryableError
    return errors.As(err, &retryable)
}

Usage:

if err := callRemoteAPI(ctx); err != nil {
    if isTemporaryNetworkError(err) {
        return Retryable(fmt.Errorf("call remote api: %w", err))
    }

    return fmt.Errorf("call remote api: %w", err)
}

Retry decision at the caller (the backoff loop itself is elided):

err := doWork(ctx)
if err != nil {
    if IsRetryable(err) {
        // retry with backoff
    }
    return err
}

This is much better than checking whether the error string contains "timeout" — string matching breaks silently when messages change and creates invisible coupling between producer and consumer.
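For completeness, here is one way the elided backoff loop could look. This is a sketch, not a recommendation of specific limits: the retry and countCalls functions, the doubling delay, and the attempt counts are assumptions chosen for the example.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

type RetryableError struct{ Err error }

func (e *RetryableError) Error() string { return e.Err.Error() }
func (e *RetryableError) Unwrap() error { return e.Err }

func IsRetryable(err error) bool {
    var retryable *RetryableError
    return errors.As(err, &retryable)
}

// retry is a hypothetical helper. It calls op up to attempts times and
// doubles the wait between tries. It stops early on success, on a
// non-retryable error, or when ctx is done.
func retry(ctx context.Context, attempts int, wait time.Duration, op func(context.Context) error) error {
    if attempts < 1 {
        attempts = 1
    }
    var err error
    for i := 0; i < attempts; i++ {
        if err = op(ctx); err == nil || !IsRetryable(err) {
            return err
        }
        if i == attempts-1 {
            break // no point waiting after the final attempt
        }
        select {
        case <-ctx.Done():
            return fmt.Errorf("%w (last error: %v)", ctx.Err(), err)
        case <-time.After(wait):
            wait *= 2
        }
    }
    return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// countCalls runs retry against an operation that fails `failures` times
// with a retryable error, then succeeds. It reports the number of calls.
func countCalls(failures, attempts int) (int, error) {
    calls := 0
    err := retry(context.Background(), attempts, time.Millisecond, func(context.Context) error {
        calls++
        if calls <= failures {
            return &RetryableError{Err: errors.New("connection reset")}
        }
        return nil
    })
    return calls, err
}

func main() {
    calls, err := countCalls(2, 5)
    fmt.Println(calls, err) // 3 <nil>

    calls, err = countCalls(5, 3)
    fmt.Println(calls, err) // 3 gave up after 3 attempts: connection reset
}
```

The loop never looks at message text. The producer decides what is retryable, and the loop only asks IsRetryable.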

Validation errors

Validation errors often need structured data.

Example:

type FieldError struct {
    Field   string
    Message string
}

type ValidationError struct {
    Fields []FieldError
}

func (e *ValidationError) Error() string {
    return "validation failed"
}

Usage:

func ValidateCreateUser(req CreateUserRequest) error {
    var fields []FieldError

    if req.Email == "" {
        fields = append(fields, FieldError{
            Field:   "email",
            Message: "email is required",
        })
    }

    if len(fields) > 0 {
        return &ValidationError{Fields: fields}
    }

    return nil
}

Handler:

var validationErr *ValidationError
if errors.As(err, &validationErr) {
    writeJSON(w, http.StatusBadRequest, validationErr)
    return
}

This is a good use of errors.As because the caller needs structured information — field names and validation messages — not just an opaque error string.

Multiple errors

Sometimes several things fail.

Examples:

closing multiple resources
validating many fields
shutting down several workers
running independent checks
flushing and closing output

Use errors.Join when all errors should be preserved.

Example:

func CloseAll(closers ...io.Closer) error {
    var errs []error

    for _, closer := range closers {
        if err := closer.Close(); err != nil {
            errs = append(errs, err)
        }
    }

    return errors.Join(errs...)
}

Caller:

if err := CloseAll(a, b, c); err != nil {
    return fmt.Errorf("close resources: %w", err)
}

Both errors.Is and errors.As can inspect joined errors, which means joined error values remain fully compatible with standard error-checking patterns.
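A short sketch of that inspection, using fs.PathError from the standard library as the typed error. ErrFlush and the helper functions are hypothetical names for this example:

```go
package main

import (
    "errors"
    "fmt"
    "io/fs"
)

// ErrFlush is a hypothetical sentinel for this illustration.
var ErrFlush = errors.New("flush failed")

// closeAll simulates two independent failures joined into one result
// and then wrapped with operation context.
func closeAll() error {
    closeErr := &fs.PathError{Op: "close", Path: "out.log", Err: fs.ErrClosed}
    return fmt.Errorf("close resources: %w", errors.Join(ErrFlush, closeErr))
}

// hasFlush reports whether ErrFlush is anywhere in the joined tree.
func hasFlush(err error) bool { return errors.Is(err, ErrFlush) }

// closedPath extracts the path from a *fs.PathError inside the tree.
func closedPath(err error) string {
    var pathErr *fs.PathError
    if errors.As(err, &pathErr) {
        return pathErr.Path
    }
    return ""
}

// joinNone shows that joining no errors yields nil, which is why
// CloseAll can return errors.Join(errs...) unconditionally.
func joinNone() error {
    var errs []error
    return errors.Join(errs...)
}

func main() {
    err := closeAll()
    fmt.Println(hasFlush(err))      // true
    fmt.Println(closedPath(err))    // out.log
    fmt.Println(joinNone() == nil)  // true
}
```

Note that the message of a joined error puts each failure on its own line, which is worth remembering before sending it to a single-line log field.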

When not to use errors.Join

Do not use errors.Join when there is one primary error and some logging context.

Do not use it to avoid deciding which error matters.

Do not return huge joined errors to users.

Joined errors are useful, but they can become noisy quickly.

Panic is not error handling

In normal application code, do not use panic for expected errors.

Bad:

if err != nil {
    panic(err)
}

Use panic for programmer errors or truly unrecoverable situations.

Examples:

impossible internal invariant violation
invalid package initialization
test helper failure (prefer t.Fatal, with panic only in limited cases)
unrecoverable startup configuration error, depending on style

Do not panic because a database query failed or a user submitted invalid input.

Those are normal errors.

Logging errors

A common Go mistake is logging the same error at every layer.

Bad:

func (r *Repo) GetUser(ctx context.Context, id string) (*User, error) {
    user, err := r.query(ctx, id)
    if err != nil {
        log.Printf("query failed: %v", err)
        return nil, err
    }
    return user, nil
}

func (s *Service) GetUser(ctx context.Context, id string) (*User, error) {
    user, err := s.repo.GetUser(ctx, id)
    if err != nil {
        log.Printf("service failed: %v", err)
        return nil, err
    }
    return user, nil
}

This creates duplicate logs for one failure.

Better:

wrap errors as they move up
log once at the boundary where the error is handled
include structured context in the log

Example:

func (s *Server) handleError(r *http.Request, err error) {
    s.logger.ErrorContext(
        r.Context(),
        "request failed",
        "method", r.Method,
        "path", r.URL.Path,
        "err", err,
    )
}

This gives one log event with the full error chain. For a production-ready structured logging setup, "Structured Logging in Go with slog" covers log/slog records, JSON handlers, context correlation, and redaction, all of which pair naturally with boundary-level error logging.
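To see the effect end to end, here is a small sketch with log/slog writing to a buffer. The layer functions (repoQuery, serviceGetUser, handleRequest) and the summary helper are hypothetical stand-ins, and the timestamp is dropped only to keep the output stable:

```go
package main

import (
    "bytes"
    "context"
    "errors"
    "fmt"
    "log/slog"
    "strings"
)

// repoQuery and serviceGetUser wrap and return. Neither of them logs.
func repoQuery() error {
    return fmt.Errorf("query user by id: %w", errors.New("connection refused"))
}

func serviceGetUser(id string) error {
    if err := repoQuery(); err != nil {
        return fmt.Errorf("get user %s: %w", id, err)
    }
    return nil
}

// newLogger builds a text logger without the time attribute.
func newLogger(buf *bytes.Buffer) *slog.Logger {
    opts := &slog.HandlerOptions{
        ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
            if a.Key == slog.TimeKey && len(groups) == 0 {
                return slog.Attr{} // a zero Attr is discarded
            }
            return a
        },
    }
    return slog.New(slog.NewTextHandler(buf, opts))
}

// handleRequest plays the boundary: it logs a failure exactly once.
func handleRequest(logger *slog.Logger, id string) {
    if err := serviceGetUser(id); err != nil {
        logger.ErrorContext(context.Background(), "request failed",
            "method", "GET", "path", "/users/"+id, "err", err)
    }
}

// summary reports how many log lines were written and whether the
// full chain made it into the output.
func summary(id string) (lines int, hasChain bool) {
    var buf bytes.Buffer
    handleRequest(newLogger(&buf), id)
    out := buf.String()
    chain := "get user " + id + ": query user by id: connection refused"
    return strings.Count(out, "\n"), strings.Contains(out, chain)
}

func main() {
    lines, hasChain := summary("123")
    fmt.Println(lines, hasChain) // 1 true
}
```

Three layers were involved in the failure, and the log contains one line that names all of them.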

When to log inside lower layers

Log inside lower layers only when the layer is actually handling the error or adding important operational context that will not be visible elsewhere.

For example, a retry loop may log each retry attempt at debug or warning level.

But a repository should not log every query error if the handler will log the final request failure.

User-facing errors vs operator errors

Do not show internal errors directly to users.

Internal error:

query user by id: dial tcp 10.0.4.12:5432: connection refused

User-facing message:

internal server error

Operator log:

request failed err="get user 123: query user by id: dial tcp 10.0.4.12:5432: connection refused"

These are different audiences, and a good error architecture keeps them separate:

internal diagnostic error
user-safe response
stable API error code
operator log context

Forcing one error string to serve all these audiences produces either an exposure risk or a debugging nightmare. Design your error architecture around distinct values for distinct consumers.

Secure error handling

Errors can leak sensitive information.

Avoid exposing:

database connection strings
SQL queries with secrets
internal hostnames
file paths
access tokens
API keys
stack traces
private customer data
authorization policy details

This matters especially in HTTP APIs.

Bad:

http.Error(w, err.Error(), http.StatusInternalServerError)

Good:

http.Error(w, "internal server error", http.StatusInternalServerError)

Log the internal error securely for operators. Return a safe message to the user.

Error codes

For public APIs, stable error codes are often better than relying only on messages.

Example response:

{
  "code": "user_not_found",
  "message": "user not found"
}

The message can change. The code should be stable.

Use error codes for:

client behavior
documentation
SDKs
localization
support diagnostics

Do not make clients parse English error messages.
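One way to keep codes stable is to hold the mapping in a single table at the transport boundary. A sketch under that assumption; the apiCode type, the codes table, and codeFor are hypothetical names, and the specific codes are examples:

```go
package main

import (
    "errors"
    "fmt"
    "net/http"
)

var (
    ErrUserNotFound   = errors.New("user not found")
    ErrDuplicateEmail = errors.New("duplicate email")

    // errOther stands in for any unexpected internal failure.
    errOther = errors.New("disk full")
)

// apiCode pairs a stable machine-readable code with an HTTP status.
type apiCode struct {
    Status int
    Code   string
}

// mapping ties one domain error to its public code.
type mapping struct {
    target error
    api    apiCode
}

// codes is a hypothetical registry. Entries are checked in order.
var codes = []mapping{
    {target: ErrUserNotFound, api: apiCode{http.StatusNotFound, "user_not_found"}},
    {target: ErrDuplicateEmail, api: apiCode{http.StatusConflict, "duplicate_email"}},
}

// codeFor finds the public code for err, matching through wrapped chains.
// Anything unknown collapses to a generic internal error.
func codeFor(err error) apiCode {
    for _, m := range codes {
        if errors.Is(err, m.target) {
            return m.api
        }
    }
    return apiCode{http.StatusInternalServerError, "internal_error"}
}

// wrap adds service-level context the way a real call path would.
func wrap(err error) error {
    return fmt.Errorf("get profile: %w", err)
}

func main() {
    fmt.Println(codeFor(wrap(ErrUserNotFound)))   // {404 user_not_found}
    fmt.Println(codeFor(wrap(ErrDuplicateEmail))) // {409 duplicate_email}
    fmt.Println(codeFor(errOther))                // {500 internal_error}
}
```

With the table in one place, renaming a message never changes what clients branch on.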

A practical layered error design

Here is a clean pattern for many Go backend services.

Repository layer

Talks to database or external storage.
Converts storage-specific not-found errors to domain errors.
Wraps unexpected storage errors with operation context.
Does not return HTTP errors.
Usually does not log.

Example:

if errors.Is(err, sql.ErrNoRows) {
    return nil, ErrUserNotFound
}

return nil, fmt.Errorf("query user by id: %w", err)

Service layer

Owns business rules.
Creates domain errors.
Preserves known domain errors.
Wraps unexpected lower-level errors.
Does not return HTTP status codes.
Usually does not log.

Example:

if user.Disabled {
    return nil, ErrAccountDisabled
}

Transport layer

Maps domain errors to HTTP, gRPC, or CLI responses.
Logs unhandled or unexpected errors.
Hides internal details from users.
Sets status codes and API error codes.

Example:

switch {
case errors.Is(err, ErrUserNotFound):
    writeError(w, http.StatusNotFound, "user_not_found", "user not found")
default:
    writeError(w, http.StatusInternalServerError, "internal_error", "internal server error")
}

This separation keeps error handling understandable and lets each layer evolve independently: you can change storage technology without touching service logic or transport mapping. The layered design works best when dependencies are injected rather than hard-coded. "Dependency Injection in Go: Patterns & Best Practices" covers the constructor and interface patterns that make each boundary easy to test in isolation.

Complete example

Here is a small end-to-end example.

Domain errors:

package users

import "errors"

var (
    ErrUserNotFound    = errors.New("user not found")
    ErrDuplicateEmail  = errors.New("duplicate email")
    ErrAccountDisabled = errors.New("account disabled")
)

Repository:

package users

import (
    "context"
    "database/sql"
    "errors"
    "fmt"
)

type Repository struct {
    db *sql.DB
}

func (r *Repository) GetByID(ctx context.Context, id string) (*User, error) {
    const query = `
        select id, email, name, disabled
        from users
        where id = $1
    `

    var user User

    err := r.db.QueryRowContext(ctx, query, id).Scan(
        &user.ID,
        &user.Email,
        &user.Name,
        &user.Disabled,
    )
    if err != nil {
        if errors.Is(err, sql.ErrNoRows) {
            return nil, ErrUserNotFound
        }

        return nil, fmt.Errorf("query user by id: %w", err)
    }

    return &user, nil
}

Service:

package users

import (
    "context"
    "errors"
    "fmt"
)

type Service struct {
    repo *Repository
}

func (s *Service) GetProfile(ctx context.Context, id string) (*Profile, error) {
    user, err := s.repo.GetByID(ctx, id)
    if err != nil {
        if errors.Is(err, ErrUserNotFound) {
            return nil, err
        }

        return nil, fmt.Errorf("get profile for user %s: %w", id, err)
    }

    if user.Disabled {
        return nil, ErrAccountDisabled
    }

    return &Profile{
        ID:    user.ID,
        Email: user.Email,
        Name:  user.Name,
    }, nil
}

HTTP handler:

package httpapi

import (
    "context"
    "errors"
    "net/http"

    "example.com/app/users"
)

type Handler struct {
    users *users.Service
}

func (h *Handler) GetProfile(w http.ResponseWriter, r *http.Request) {
    profile, err := h.users.GetProfile(r.Context(), r.PathValue("id"))
    if err != nil {
        h.writeError(w, err)
        return
    }

    writeJSON(w, http.StatusOK, profile)
}

func (h *Handler) writeError(w http.ResponseWriter, err error) {
    switch {
    case errors.Is(err, users.ErrUserNotFound):
        writeJSON(w, http.StatusNotFound, map[string]string{
            "code":    "user_not_found",
            "message": "user not found",
        })

    case errors.Is(err, users.ErrAccountDisabled):
        writeJSON(w, http.StatusForbidden, map[string]string{
            "code":    "account_disabled",
            "message": "account is disabled",
        })

    case errors.Is(err, context.Canceled):
        return

    case errors.Is(err, context.DeadlineExceeded):
        writeJSON(w, http.StatusGatewayTimeout, map[string]string{
            "code":    "request_timeout",
            "message": "request timed out",
        })

    default:
        writeJSON(w, http.StatusInternalServerError, map[string]string{
            "code":    "internal_error",
            "message": "internal server error",
        })
    }
}

This structure gives you:

domain errors
storage translation
service context
safe HTTP mapping
inspectable error chains
no string matching
no transport leakage into domain code

That is the kind of error architecture that scales — straightforward enough for a new contributor to understand, yet structured enough that domain logic never leaks into transport responses.

Testing error behavior

Error behavior should be tested just as thoroughly as the happy path, because boundary decisions such as sentinel mapping, type extraction, and HTTP codes are often where bugs hide longest. For a full guide to Go test structure, mocking, and coverage patterns, see "Go Unit Testing: Structure & Best Practices".

Test sentinel mapping

func TestGetByIDNotFound(t *testing.T) {
    repo := newTestRepository(t)

    _, err := repo.GetByID(t.Context(), "missing")
    if !errors.Is(err, users.ErrUserNotFound) {
        t.Fatalf("got %v, want ErrUserNotFound", err)
    }
}

Test custom error extraction

func TestValidationError(t *testing.T) {
    err := ValidateCreateUser(CreateUserRequest{})

    var validationErr *ValidationError
    if !errors.As(err, &validationErr) {
        t.Fatalf("got %T, want *ValidationError", err)
    }

    if len(validationErr.Fields) == 0 {
        t.Fatal("expected validation fields")
    }
}

Test HTTP mapping

func TestWriteErrorNotFound(t *testing.T) {
    rec := httptest.NewRecorder()

    writeHTTPError(rec, users.ErrUserNotFound)

    if rec.Code != http.StatusNotFound {
        t.Fatalf("status = %d, want %d", rec.Code, http.StatusNotFound)
    }
}

Tests should prove that known errors produce the right behavior at each boundary, so that refactoring storage or transport layers cannot silently change the failure contract.

Common anti-patterns

Anti-pattern 1: String matching

Bad:

if strings.Contains(err.Error(), "not found") {
    // ...
}

Use errors.Is or errors.As instead — both handle wrapped error chains automatically and do not break when messages are reformatted or localized.

Anti-pattern 2: Losing the cause

Bad:

return errors.New("query failed")

Better:

return fmt.Errorf("query user: %w", err)

Anti-pattern 3: Wrapping without meaning

Bad:

return fmt.Errorf("error happened: %w", err)

Wrap with operation context that explains what was being attempted, such as "create invoice %s: %w" rather than a vague prefix that adds no diagnostic value.

Anti-pattern 4: Logging at every layer

Bad:

log.Println(err)
return err

at every level. Log once where the error is finally handled, not at every intermediate layer that simply passes it upward.

Anti-pattern 5: Returning HTTP errors from domain code

Bad:

return &APIError{Status: http.StatusNotFound}

from a domain service. Map domain errors to HTTP status codes and response bodies at the handler boundary, keeping your service layer independent of transport concerns.

Anti-pattern 6: Exposing internal errors to users

Bad:

http.Error(w, err.Error(), http.StatusInternalServerError)

Return safe generic messages to users and log the full internal error with structured context for operators. Never expose database connection strings, file paths, or raw stack traces in API responses.

Anti-pattern 7: Too many exported sentinels

Exported errors are part of your package API, and adding them commits you to maintaining them. Do not export every internal condition unless external callers genuinely need to branch on it — prefer keeping sentinels unexported until there is a clear need.

Anti-pattern 8: Using panic for expected failures

Bad:

panic(err)

for normal runtime failures. Reserve panic for truly unrecoverable conditions or programmer errors, not for missing records or invalid user input — always return errors in those cases.
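
A small sketch of where that line can be drawn, assuming a port parser as the example. ParsePort and MustParsePort are hypothetical names; the Must prefix follows the convention used by functions such as regexp.MustCompile.

```go
package main

import (
	"fmt"
	"strconv"
)

// ParsePort handles untrusted input, so bad values come back as errors.
func ParsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parse port %q: %w", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("parse port %q: out of range 1-65535", s)
	}
	return n, nil
}

// MustParsePort is meant for values fixed in source code, where a bad
// value is a programmer error and failing loudly at startup is acceptable.
func MustParsePort(s string) int {
	n, err := ParsePort(s)
	if err != nil {
		panic(err)
	}
	return n
}

// defaultPort is a compile-time constant in spirit, so Must is appropriate.
var defaultPort = MustParsePort("8080")

func main() {
	if _, err := ParsePort("http"); err != nil {
		fmt.Println("rejected:", err)
	}
	fmt.Println("default:", defaultPort)
}
```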

Anti-pattern 9: Ignoring context errors

Bad:

return fmt.Errorf("request failed")

when the real cause was context.Canceled. Preserve context errors so that callers can distinguish between a genuine operation failure and a canceled or timed-out request, and respond appropriately to each.
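
A minimal sketch of preserving the context error, assuming a hypothetical fetch operation that checks its context before doing work. The describe helper shows how a caller can then tell the cases apart.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fetch is a hypothetical operation that honors cancellation.
func fetch(ctx context.Context) error {
	select {
	case <-ctx.Done():
		// %w keeps context.Canceled or context.DeadlineExceeded in the chain.
		return fmt.Errorf("fetch profile: %w", ctx.Err())
	default:
		return nil
	}
}

// describe shows the branching a caller can do once the cause is preserved.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, context.Canceled):
		return "canceled by caller"
	case errors.Is(err, context.DeadlineExceeded):
		return "timed out"
	default:
		return "failed"
	}
}

// canceledFetch calls fetch with a context that is already canceled.
func canceledFetch() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return fetch(ctx)
}

func main() {
	fmt.Println(describe(canceledFetch()))             // canceled by caller
	fmt.Println(describe(fetch(context.Background()))) // ok
}
```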

Error review checklist

Use this checklist in code review.

Error creation

Is this a known condition?
Should it be a sentinel?
Does it need structured data?
Should it be a custom type?
Is the error message clear?

Error wrapping

Does the wrap add useful operation context?
Does %w preserve the cause where needed?
Is the code accidentally exposing implementation details?
Is the chain too noisy?

Error translation

Is a low-level error translated at the right boundary?
Is database-specific behavior hidden from service code?
Are domain errors independent of HTTP or CLI concerns?

Error handling

Does the caller branch with errors.Is or errors.As?
Are context cancellation and deadlines handled correctly?
Are retryable errors identified explicitly?
Are validation errors structured?

Logging

Is the error logged once, at the handling boundary?
Are logs structured?
Are sensitive details excluded from user responses?
Is there enough context for operators?

Testing

Are known error cases tested?
Are HTTP or CLI mappings tested?
Are validation details tested?
Are retry decisions tested?

My opinionated rules

Rule 1: Errors should cross boundaries with meaning

Do not just pass errors around. Decide what they mean at each layer.

Rule 2: Wrap for context, not decoration

If wrapping does not add useful information about what operation failed, do not wrap. An extra layer of context without meaning makes the error chain harder to read and adds no diagnostic value.

Rule 3: Translate implementation errors into domain errors

Do not let sql.ErrNoRows become part of your business logic. Translate implementation errors to domain errors at the storage boundary, so the rest of the application never needs to know which database or ORM is underneath.
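
A sketch of that translation, with the driver call replaced by a stub so the example runs without a database. ErrUserNotFound, queryRow, GetUser, isNotFound, and leaksDriver are illustrative names only.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// ErrUserNotFound is a hypothetical domain error.
var ErrUserNotFound = errors.New("user not found")

// queryRow is a stub standing in for a real driver call that finds nothing.
func queryRow(id string) error { return sql.ErrNoRows }

// GetUser sits at the storage boundary and translates driver errors.
func GetUser(id string) error {
	if err := queryRow(id); err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			// Deliberately wrap the domain error, not sql.ErrNoRows,
			// so callers cannot start depending on the database package.
			return fmt.Errorf("get user %s: %w", id, ErrUserNotFound)
		}
		return fmt.Errorf("get user %s: %w", id, err)
	}
	return nil
}

func isNotFound(err error) bool  { return errors.Is(err, ErrUserNotFound) }
func leaksDriver(err error) bool { return errors.Is(err, sql.ErrNoRows) }

func main() {
	err := GetUser("42")
	fmt.Println(err)              // get user 42: user not found
	fmt.Println(isNotFound(err))  // true
	fmt.Println(leaksDriver(err)) // false
}
```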

Rule 4: Do not parse error strings

If code needs to branch on failure type, use sentinels, custom types, errors.Is, or errors.As. String inspection creates invisible coupling that breaks silently when error messages change.

Rule 5: Log once

Wrap as errors move up. Log where the error is finally handled.

Rule 6: Keep user messages safe

Internal diagnostic errors are for logs. User-facing messages are for users.

Rule 7: Keep transport errors at the transport boundary

HTTP status codes belong in handlers or API adapters, not in domain services. Domain code should be reusable across transports — today HTTP, tomorrow CLI, gRPC, or an event-driven worker.

Final thoughts

Go error handling is not about writing if err != nil forever — it is about making failure explicit and understandable at every boundary.

The mechanics are simple:

return errors
wrap with %w
check with errors.Is
extract with errors.As
join when several errors matter
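
The last item is the least familiar of the five, so here is a brief sketch of errors.Join applied to validation. ErrEmptyName, ErrBadEmail, validate, and has are hypothetical names, and the email check is intentionally naive.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical validation sentinels for this sketch.
var (
	ErrEmptyName = errors.New("name is empty")
	ErrBadEmail  = errors.New("email is invalid")
)

// validate collects every problem instead of stopping at the first one.
func validate(name, email string) error {
	var errs []error
	if name == "" {
		errs = append(errs, ErrEmptyName)
	}
	if !strings.Contains(email, "@") {
		errs = append(errs, ErrBadEmail)
	}
	// Join returns nil when errs is empty, so the happy path needs no special case.
	return errors.Join(errs...)
}

// has reports whether target is anywhere in the joined error.
func has(err, target error) bool { return errors.Is(err, target) }

func main() {
	err := validate("", "nobody")
	fmt.Println(has(err, ErrEmptyName), has(err, ErrBadEmail)) // true true
	fmt.Println(validate("Ann", "ann@example.com") == nil)     // true
}
```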

The architecture is the harder part:

translate at boundaries
preserve causes
hide internals from users
log once
test known failures

That is Go error handling done well — not clever, not magical, but clear enough that the next developer, operator, API client, and future you can understand what failed and what should happen next. For a broader view of production Go patterns across integration, testing, and data access, see App Architecture in Production.

Testing Concurrent Go Code with synctest

Rost — Tue, 30 Jun 2026 08:14:54 +0000

Testing concurrent Go code has always required a bit of discipline.
Goroutines are cheap, channels are simple, and context cancellation is idiomatic — background workers and timers are everywhere in real Go services.

But testing all of that reliably is harder than writing it.

The usual bad pattern is familiar:

go doSomething()

time.Sleep(100 * time.Millisecond)

if !done {
    t.Fatal("background work did not finish")
}

That test may pass on your laptop and fail in CI. Or it may pass for six months and then fail on a loaded runner. Or it may be slow because someone increased the sleep from 100 milliseconds to 2 seconds "just to be safe".

This is not good testing — it is gambling with a timer, and that gamble gets more expensive as the test suite grows.

The testing/synctest package gives Go developers a better way to test many forms of asynchronous and time-dependent code. It lets a test run inside an isolated bubble, gives that bubble a fake clock, and provides a way to wait until goroutines inside the bubble are blocked.

The result is simple but powerful:

No arbitrary sleeps
Faster timeout tests
More deterministic concurrent tests
Better testing of context cancellation
Better testing of background goroutines
Less flaky CI

The slightly opinionated version: if your concurrent Go test depends on a real time.Sleep, you should probably treat that test as suspicious.

What testing/synctest is

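testing/synctest is a Go standard library package for testing concurrent code. It is generally available from Go 1.25; Go 1.24 shipped an earlier experimental version behind GOEXPERIMENT=synctest.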

It provides two main functions:

package synctest

func Test(t *testing.T, f func(*testing.T))
func Wait()

synctest.Test runs a function inside an isolated test bubble. Any goroutines started inside that bubble are also part of the bubble, time inside the bubble is fake, and the time package works against that fake clock rather than the real wall clock.

synctest.Wait waits until all other goroutines in the bubble are durably blocked. That sounds abstract, but the practical effect is easy to understand:

synctest.Test(t, func(t *testing.T) {
    time.Sleep(10 * time.Second)
})

This does not make your test wait 10 real seconds. Inside the synctest bubble, time can advance instantly when the bubble is blocked and waiting for time to move forward — that is the core trick behind the package.

Why concurrent Go tests are flaky

If you are new to Go testing in general, Go Unit Testing: Structure & Best Practices covers the testing package, table-driven tests, and mocking patterns that form the foundation this article builds on.

Concurrent tests are usually flaky for one of three reasons.

First, they depend on the scheduler. A goroutine may run immediately on your machine and later on CI.

Second, they depend on real time. A test that sleeps for 50 milliseconds assumes that 50 milliseconds is enough time for the background work to finish.

Third, they observe state too early. The test checks the result before the background operation has actually completed.

Here is a simple example:

func TestBackgroundWorkBad(t *testing.T) {
    done := false

    go func() {
        done = true
    }()

    time.Sleep(10 * time.Millisecond)

    if !done {
        t.Fatal("background work did not finish")
    }
}

This test has two problems.

The obvious one is the sleep. There is no guarantee that 10 milliseconds is the right amount of time.

The less obvious one is the data race. The test writes done in one goroutine and reads it in another without synchronization.

You can fix this specific example with a channel or a sync.WaitGroup, and often you should. But when the code under test uses timers, context deadlines, time.AfterFunc, background workers, or delayed cleanup, the test can still become awkward — and that is exactly where testing/synctest helps.

The core idea: run the test inside a bubble

A synctest bubble isolates the goroutines created inside it.

Use it like this:

func TestSomethingConcurrent(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        // Test concurrent code here.
    })
}

Inside the bubble:

Goroutines started by the test belong to the bubble.
Timers and sleeps use a fake clock.
synctest.Wait can wait for background activity to settle.
The test should avoid depending on external goroutines, real network I/O, or external processes.

The bubble is not magic. It does not make bad concurrency design good. But it gives your test a controlled environment where time and blocking behavior are more deterministic.

The problem with time.Sleep in tests

A real time.Sleep in a test usually means one of two things:

I do not know how to wait for the event I actually care about.

or:

I know what I care about, but the code under test does not expose a clean way to observe it.

Both are design signals worth taking seriously — they point to places where the production code may benefit from cleaner observability or more explicit coordination mechanisms.

Consider a function that completes work in the background:

type Worker struct {
    out chan string
}

func NewWorker() *Worker {
    return &Worker{
        out: make(chan string, 1),
    }
}

func (w *Worker) Start() {
    go func() {
        time.Sleep(5 * time.Second)
        w.out <- "done"
    }()
}

func (w *Worker) Result() <-chan string {
    return w.out
}

A bad test might look like this:

func TestWorkerBad(t *testing.T) {
    w := NewWorker()
    w.Start()

    time.Sleep(6 * time.Second)

    select {
    case got := <-w.Result():
        if got != "done" {
            t.Fatalf("got %q, want done", got)
        }
    default:
        t.Fatal("worker did not finish")
    }
}

This test waits six real seconds.

That is slow. If you have many tests like this, the suite becomes painful.

A better test with synctest can advance fake time instantly:

func TestWorkerWithSynctest(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        w := NewWorker()
        w.Start()

        time.Sleep(5 * time.Second)
        synctest.Wait()

        select {
        case got := <-w.Result():
            if got != "done" {
                t.Fatalf("got %q, want done", got)
            }
        default:
            t.Fatal("worker did not finish")
        }
    })
}

The test still expresses the business fact — the worker should finish after 5 seconds — but it does not spend 5 real seconds doing so. That is the difference between testing time-dependent behavior and wasting developer time.

Testing context timeouts

One of the best uses for testing/synctest is testing context.Context deadlines and timeouts. Correctly propagating context.Canceled and context.DeadlineExceeded through service and handler layers is covered in depth in Go Error Handling Architecture: Boundaries and Patterns — synctest lets you verify that behavior without real time passing.

Here is a simple function that waits until a context is canceled:

func WaitForCancel(ctx context.Context, done chan<- error) {
    go func() {
        <-ctx.Done()
        done <- ctx.Err()
    }()
}

Without synctest, testing this with a 30-second timeout would either make the test slow or force you to change the timeout just for the test.

With synctest, you can test the real timeout duration quickly:

func TestWaitForCancelWithTimeout(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ctx, cancel := context.WithTimeout(t.Context(), 30*time.Second)
        defer cancel()

        done := make(chan error, 1)
        WaitForCancel(ctx, done)

        synctest.Wait()

        select {
        case err := <-done:
            t.Fatalf("context canceled too early: %v", err)
        default:
        }

        time.Sleep(30 * time.Second)
        synctest.Wait()

        select {
        case err := <-done:
            if !errors.Is(err, context.DeadlineExceeded) {
                t.Fatalf("got %v, want %v", err, context.DeadlineExceeded)
            }
        default:
            t.Fatal("context was not canceled")
        }
    })
}

This is the kind of test that synctest makes pleasant.

You can keep realistic timeout values in code and still run tests quickly.

Testing context cancellation

You can also test explicit cancellation without racing the background goroutine.

func TestWaitForCancelWithExplicitCancel(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ctx, cancel := context.WithCancel(t.Context())

        done := make(chan error, 1)
        WaitForCancel(ctx, done)

        synctest.Wait()

        select {
        case err := <-done:
            t.Fatalf("context canceled too early: %v", err)
        default:
        }

        cancel()
        synctest.Wait()

        select {
        case err := <-done:
            if !errors.Is(err, context.Canceled) {
                t.Fatalf("got %v, want %v", err, context.Canceled)
            }
        default:
            t.Fatal("context was not canceled")
        }
    })
}

The important detail is synctest.Wait.

It blocks until the background goroutine has either exited or durably blocked again, so by the time the test checks the channel, the goroutine has already observed the cancellation and sent its result.

What synctest.Wait does

synctest.Wait waits until all other goroutines in the bubble are durably blocked.

In normal language, it means:

Wait until the goroutines inside this test have reached a stable blocked point.

This is useful when the test starts a goroutine and needs to know that the goroutine has either finished or is waiting.

For example:

func TestWaitExample(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        done := false

        go func() {
            done = true
        }()

        synctest.Wait()

        if !done {
            t.Fatal("goroutine did not run")
        }
    })
}

This is intentionally small, but it demonstrates the idea.

synctest.Wait is not just a nicer sleep — it is a synchronization point inside the bubble, and that distinction matters more than it first appears.

A sleep says:

I hope enough time has passed.

Wait says:

I want the bubble to reach a stable blocked state.

The second is far better for tests because it describes an observable condition rather than a guess about elapsed time.

Fake time in a synctest bubble

Inside a synctest bubble, the time package uses a fake clock.

The fake clock starts at a fixed time: midnight UTC on January 1, 2000. It advances only when every goroutine in the bubble is durably blocked and time needs to move forward to unblock something.

That means this test is fast:

func TestFakeTime(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        start := time.Now()

        time.Sleep(1 * time.Hour)

        elapsed := time.Since(start)
        if elapsed != time.Hour {
            t.Fatalf("got %v, want %v", elapsed, time.Hour)
        }
    })
}

It reads like it waits an hour.

It does not.

This is useful for testing:

timeouts
deadlines
retries
backoff
delayed cleanup
rate limits
timers
tickers
context cancellation

But there is one important rule: fake time only helps code that uses the time package inside the bubble.

If your code depends on an external system, real network I/O, or time measured outside the bubble, synctest cannot make that deterministic.

Testing a retry loop

Retry loops are a common source of slow and flaky tests.

Here is a small retry helper:

func Retry(ctx context.Context, attempts int, delay time.Duration, fn func() error) error {
    var last error

    for i := 0; i < attempts; i++ {
        if err := fn(); err != nil {
            last = err
        } else {
            return nil
        }

        if i == attempts-1 {
            break
        }

        timer := time.NewTimer(delay)
        select {
        case <-ctx.Done():
            timer.Stop()
            return ctx.Err()
        case <-timer.C:
        }
    }

    return last
}

A normal test might reduce the delay to 1 millisecond just to keep the suite fast.

That is not terrible, but it means the test is no longer exercising the real value used by production code.

With synctest, you can keep the real delay:

func TestRetryEventuallySucceeds(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ctx := t.Context()

        calls := 0
        err := Retry(ctx, 3, 10*time.Second, func() error {
            calls++
            if calls < 3 {
                return errors.New("temporary failure")
            }
            return nil
        })

        if err != nil {
            t.Fatalf("Retry returned error: %v", err)
        }

        if calls != 3 {
            t.Fatalf("calls = %d, want 3", calls)
        }
    })
}

The test represents two 10-second waits.

It still runs quickly.

This is where synctest changes the economics of testing. You no longer need fake tiny durations scattered through tests just to avoid slow CI.

Testing retry cancellation

You can also test cancellation during retry delay:

func TestRetryStopsWhenContextCanceled(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ctx, cancel := context.WithCancel(t.Context())

        errCh := make(chan error, 1)

        go func() {
            errCh <- Retry(ctx, 10, 10*time.Second, func() error {
                return errors.New("temporary failure")
            })
        }()

        synctest.Wait()

        cancel()
        synctest.Wait()

        select {
        case err := <-errCh:
            if !errors.Is(err, context.Canceled) {
                t.Fatalf("got %v, want %v", err, context.Canceled)
            }
        default:
            t.Fatal("Retry did not return after cancellation")
        }
    })
}

This test checks that the retry loop responds to cancellation instead of sleeping through the delay.

That is exactly the kind of behavior that matters in production.

Testing time.AfterFunc

time.AfterFunc is another good fit.

Suppose you have a function that schedules cleanup:

type Cache struct {
    cleaned chan struct{}
}

func NewCache() *Cache {
    return &Cache{
        cleaned: make(chan struct{}, 1),
    }
}

func (c *Cache) CleanupAfter(d time.Duration) {
    time.AfterFunc(d, func() {
        c.cleaned <- struct{}{}
    })
}

func (c *Cache) Cleaned() <-chan struct{} {
    return c.cleaned
}

The test can advance fake time:

func TestCleanupAfter(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        cache := NewCache()
        cache.CleanupAfter(1 * time.Minute)

        synctest.Wait()

        select {
        case <-cache.Cleaned():
            t.Fatal("cleanup happened too early")
        default:
        }

        time.Sleep(1 * time.Minute)
        synctest.Wait()

        select {
        case <-cache.Cleaned():
        default:
            t.Fatal("cleanup did not happen")
        }
    })
}

This test verifies both sides:

The cleanup does not happen before the delay.
The cleanup does happen after the delay.

And it does not wait a real minute.

Testing tickers

Tickers can also be tested with fake time, but be careful. Tickers are often used in long-running loops, and long-running loops need a clean shutdown path.

Here is a small ticker-based counter:

type Counter struct {
    ticks int
    done  chan struct{}
}

func NewCounter() *Counter {
    return &Counter{
        done: make(chan struct{}),
    }
}

func (c *Counter) Start(ctx context.Context, interval time.Duration) {
    ticker := time.NewTicker(interval)

    go func() {
        defer ticker.Stop()
        defer close(c.done)

        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                c.ticks++
            }
        }
    }()
}

func (c *Counter) Wait() {
    <-c.done
}

func (c *Counter) Ticks() int {
    return c.ticks
}

A test might look like this:

func TestCounterTicks(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ctx, cancel := context.WithCancel(t.Context())

        counter := NewCounter()
        counter.Start(ctx, 10*time.Second)

        time.Sleep(35 * time.Second)
        synctest.Wait()

        cancel()
        counter.Wait()

        if counter.Ticks() != 3 {
            t.Fatalf("ticks = %d, want 3", counter.Ticks())
        }
    })
}

This example has a deliberate design detail: the worker has a shutdown path.

That is not only good for tests. It is good for production.

Tests often reveal whether your goroutines can actually stop.

synctest and goroutine leaks

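testing/synctest helps with goroutine leaks because synctest.Test waits for every goroutine in the bubble to exit before returning, which makes leaked goroutines harder to ignore. If a leftover goroutine is durably blocked and nothing in the bubble can wake it, the bubble is deadlocked and the test fails instead of silently leaving work behind. A leftover goroutine that keeps waking on a ticker shows a different symptom: fake time keeps advancing and the test typically does not return until the go test timeout fires. Either way the leak becomes visible, and that is a good thing.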

Concurrent code should have clear ownership. If a function starts a goroutine, there should be an explicit way to stop it, or a documented reason why it is allowed to live forever. In tests, "forever" is almost never acceptable.

A good pattern is:

ctx, cancel := context.WithCancel(t.Context())
defer cancel()

Then make the goroutine stop when the context is canceled.

What "durably blocked" means in practice

The official docs use the term "durably blocked".

You do not need to memorize every runtime detail, but you should understand the practical meaning.

A goroutine is durably blocked when it is blocked in a way that can only be unblocked by something inside the same synctest bubble.

Examples include:

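a blocking receive on a channel created inside the bubble, including the channel of a timer created there
a blocking send on a channel created inside the bubble
a blocking select in which every case uses a channel created inside the bubble
sync.Cond.Wait
sync.WaitGroup.Wait, when the WaitGroup's Add was called inside the bubble
sleeping with time.Sleep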

Some things are not durably blocked because something outside the bubble may unblock them.

Examples include:

network I/O
system calls
external process operations
locking a sync.Mutex or sync.RWMutex
interactions with goroutines outside the bubble

This is why synctest tests should be self-contained and kept free from external synchronization that the bubble cannot see. Do not use synctest as a wrapper around integration tests that talk to the real network.

What synctest is good for

testing/synctest is especially good for unit tests around asynchronous behavior.

Good candidates include:

context cancellation
context timeouts
retry loops
backoff logic
delayed cleanup
timer-driven workers
ticker-driven loops
background goroutines
timeout behavior
channel coordination
time.AfterFunc
deterministic waiting for goroutines

The best use case is code where the hard part is time or scheduling, not external I/O.

What synctest is not good for

testing/synctest is not a replacement for all concurrency testing.

It is not a full deterministic scheduler for every possible race.

It is not a substitute for the race detector.

It is not a replacement for integration tests.

It does not make real network I/O deterministic.

It does not fix bad goroutine lifecycle design.

It does not mean you can ignore channels, contexts, ownership, and shutdown.

Use synctest for the right layer: deterministic unit tests for concurrent and time-dependent behavior.

Use other tools for other layers:

use go test -race to detect data races
use integration tests for real dependencies
use load tests for throughput and contention
use benchmarks for performance
use tracing and profiling for production behavior

synctest vs the race detector

testing/synctest and the race detector solve different problems.

The race detector finds unsafe concurrent memory access.

synctest helps you control asynchronous timing and waiting in tests.

You should often use both.

For example, this is still a race even inside a synctest bubble if there is no proper synchronization:

value := 0

go func() {
    value = 1
}()

_ = value

synctest.Wait can provide a synchronization point for some test patterns, but it does not mean every concurrent access in your code is automatically safe.

Run concurrent tests with:

go test -race ./...

The race detector is still one of the best tools Go gives you. Pairing it with Go Linters: Essential Tools for Code Quality gives you a solid static analysis and runtime-check baseline for any concurrent codebase.

synctest vs manual fake clocks

Before testing/synctest, many teams used manual fake clocks.

That can still be a good design.

A manual clock interface might look like this:

type Clock interface {
    Now() time.Time
    After(time.Duration) <-chan time.Time
    Sleep(time.Duration)
}

Then production code uses a real clock and tests use a fake clock.
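
To make the comparison concrete, here is a rough sketch of what the test side of that interface can look like. FakeClock, NewFakeClock, Advance, and timeoutDemo are hypothetical names written for this illustration; established fake-clock libraries differ in detail.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Clock is the interface from the text.
type Clock interface {
	Now() time.Time
	After(time.Duration) <-chan time.Time
	Sleep(time.Duration)
}

type waiter struct {
	at time.Time
	ch chan time.Time
}

// FakeClock is a hypothetical test double: time moves only when Advance is called.
type FakeClock struct {
	mu      sync.Mutex
	now     time.Time
	waiters []waiter
}

var _ Clock = (*FakeClock)(nil)

func NewFakeClock(start time.Time) *FakeClock { return &FakeClock{now: start} }

func (c *FakeClock) Now() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.now
}

func (c *FakeClock) After(d time.Duration) <-chan time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	ch := make(chan time.Time, 1) // buffered so Advance never blocks on a send
	if d <= 0 {
		ch <- c.now
		return ch
	}
	c.waiters = append(c.waiters, waiter{at: c.now.Add(d), ch: ch})
	return ch
}

// Sleep blocks until another goroutine advances the clock far enough.
func (c *FakeClock) Sleep(d time.Duration) { <-c.After(d) }

// Advance moves the clock forward and fires every waiter whose deadline has passed.
func (c *FakeClock) Advance(d time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.now = c.now.Add(d)
	pending := c.waiters[:0]
	for _, w := range c.waiters {
		if w.at.After(c.now) {
			pending = append(pending, w)
			continue
		}
		w.ch <- c.now
	}
	c.waiters = pending
}

// timeoutDemo reports whether a 30-second timeout fired after 29s and after 30s of fake time.
func timeoutDemo() (firedEarly, firedOnTime bool) {
	clock := NewFakeClock(time.Date(2000, 1, 1, 0, 0, 0, 0, time.UTC))
	timeout := clock.After(30 * time.Second)

	clock.Advance(29 * time.Second)
	select {
	case <-timeout:
		firedEarly = true
	default:
	}

	clock.Advance(1 * time.Second)
	select {
	case <-timeout:
		firedOnTime = true
	default:
	}
	return firedEarly, firedOnTime
}

func main() {
	fmt.Println(timeoutDemo()) // false true
}
```

Even this small version needs a mutex, a waiter list, and an Advance method that tests must remember to call, and any production code that calls time.Now directly bypasses it.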

This gives explicit control, but it has a cost:

more interfaces
more plumbing
more test-only abstractions
more ways for code to bypass the fake clock accidentally

synctest is attractive because ordinary code that uses the time package can run against fake time inside the test bubble.

That reduces the need for clock injection in many cases.

My opinion: use synctest when it keeps production code simpler. Use an injected clock only when clock control is part of your domain design or when you need control outside what synctest provides. For a broader look at dependency injection patterns in Go — including when and how to inject testable abstractions — see Dependency Injection in Go: Patterns & Best Practices.

synctest vs channels and WaitGroups

Do not replace good synchronization with synctest.

If your code can expose a completion channel, a callback, or a Wait method, that is often good design.

For example:

type Server struct {
    done chan struct{}
}

func (s *Server) Done() <-chan struct{} {
    return s.done
}

A test can wait on that directly.

synctest is most useful when the behavior under test involves time, context deadlines, background scheduling, or async callbacks.

The best tests often combine both:

production code has explicit shutdown or completion signals
synctest removes real-time waiting
Wait makes background activity deterministic

Common mistakes

Mistake 1: Wrapping every test in synctest

Do not use synctest everywhere. If the code is synchronous, a plain test function is clearer, and adding the bubble wrapper only introduces unnecessary machinery that makes tests harder to read and reason about.

Mistake 2: Testing real network I/O inside the bubble

Keep synctest tests self-contained. If your test uses a real network socket, external service, database, or subprocess, it belongs in an integration test rather than inside a synctest bubble. Use fakes for unit tests and reserve real dependencies for separate integration tests where bubble isolation does not apply.

Mistake 3: Leaking goroutines

If your test starts a goroutine, make sure it has a clear exit path. Use context cancellation, closed channels, or explicit stop methods — a goroutine that never stops is both a production smell and a test smell that synctest will surface rather than hide.

Mistake 4: Depending on package-level state

Package-level channels, timers, and WaitGroups can break bubble isolation in subtle ways. Prefer creating all test state inside the synctest.Test function so that every resource belongs to the bubble and its lifetime is clearly scoped to the test.

Mistake 5: Treating fake time as real time

Fake time is for deterministic tests, not performance measurement. A test that advances one hour instantly tells you nothing useful about CPU cost, lock contention, memory usage, or real scheduling behavior in production — use benchmarks and load tests for those questions.

Mistake 6: Ignoring the race detector

synctest is not a replacement for go test -race, and the two tools solve different problems. Run the race detector alongside your synctest tests to catch unsafe concurrent memory access that the bubble alone cannot detect.

A practical checklist

Use this checklist when writing tests with testing/synctest.

Use synctest when

the code starts goroutines
the code uses time.Sleep
the code uses timers or tickers
the code uses context deadlines
the code has retry or backoff behavior
the test currently uses arbitrary sleeps
the test is flaky in CI
the test is slow because it waits for real time

Avoid synctest when

the code is synchronous
the test depends on real network I/O
the test depends on external processes
the test is really an integration test
you are trying to measure performance
the code has no clean shutdown path

Prefer this pattern

func TestSomething(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        // Arrange.
        ctx, cancel := context.WithCancel(t.Context())
        defer cancel()

        // Act.
        _ = ctx

        // Let background work settle.
        synctest.Wait()

        // Advance fake time if needed.
        time.Sleep(1 * time.Second)
        synctest.Wait()

        // Assert.
    })
}

This pattern is simple:

set up inside the bubble
start work inside the bubble
wait for background activity to settle
advance fake time only when needed
assert after synchronization

Where to use testing/synctest in real projects

The best places to look are usually not in simple business logic.

Look for tests with these smells:

grep -R "time.Sleep" .
grep -R "time.After" .
grep -R "WithTimeout" .
grep -R "WithDeadline" .
grep -R "NewTicker" .
grep -R "AfterFunc" .

Then ask:

Is this test slow because it waits for real time?
Is this test flaky because it assumes a goroutine already ran?
Can this test be isolated from network and external processes?
Can the background goroutine be stopped cleanly?
Would fake time make the assertion clearer?

Good candidates often live in:

worker packages
retry packages
cache packages
scheduler packages
queue consumers
HTTP client wrappers
timeout middleware
background cleanup code
rate-limiting code

Start with one flaky test. Do not migrate the whole codebase at once. If your test suite uses parallel table-driven tests alongside async code, Parallel Table-Driven Tests in Go covers the t.Parallel() patterns and race condition traps that pair naturally with the synctest approach.

Example: before and after

Here is a realistic bad test:

func TestRetryBad(t *testing.T) {
    calls := 0

    err := Retry(context.Background(), 3, 500*time.Millisecond, func() error {
        calls++
        if calls < 3 {
            return errors.New("temporary failure")
        }
        return nil
    })

    if err != nil {
        t.Fatalf("Retry returned error: %v", err)
    }

    if calls != 3 {
        t.Fatalf("calls = %d, want 3", calls)
    }
}

This waits about one second because two retry delays occur.

That may not sound bad, but multiply it by many tests and several packages. Slow tests make developers run tests less often.

Now the synctest version:

func TestRetryWithSynctest(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        calls := 0

        err := Retry(t.Context(), 3, 500*time.Millisecond, func() error {
            calls++
            if calls < 3 {
                return errors.New("temporary failure")
            }
            return nil
        })

        if err != nil {
            t.Fatalf("Retry returned error: %v", err)
        }

        if calls != 3 {
            t.Fatalf("calls = %d, want 3", calls)
        }
    })
}

The test keeps the real delay value, the suite stays fast, and the intent is clearer. That is the main value of testing/synctest.

How to adopt synctest safely

I would adopt it gradually.

Step 1: Find flaky or slow concurrent tests

Search for real sleeps and timeout-heavy tests. The grep commands in the previous section are a good starting point for identifying candidates across the codebase.

Step 2: Pick one package

Choose a package that has clear asynchronous behavior but does not require real external services. Worker packages, retry helpers, and timer-driven components are ideal first targets.

Step 3: Convert one test

Wrap the test in synctest.Test and replace arbitrary sleeps with synctest.Wait, fake-time sleeps, or explicit synchronization. The conversion is usually small — the hardest part is making sure goroutines have clean shutdown paths.

Step 4: Run with the race detector

Always run with go test -race ./... after converting. A passing synctest test does not mean the code is race-free; it only means the async timing is now deterministic.

Step 5: Review goroutine lifecycle

Make sure every goroutine started by the test has a way to exit before the bubble closes. If it does not, synctest.Test will surface the leak rather than silently ignoring it.

Step 6: Repeat only where it improves clarity

Do not convert tests just for fashion. A good synctest test should be measurably faster, clearer to read, or less flaky than the version it replaced — if it is not, the conversion was not worth it.

My opinionated rules

Use these as practical rules of thumb.

Rule 1: No arbitrary sleeps in concurrent unit tests

A sleep that waits for a goroutine to maybe finish is a smell. Replace it with channels, WaitGroups, callbacks, synctest.Wait, or fake time — anything that waits for a condition rather than hoping enough time has passed.

Rule 2: Keep synctest tests self-contained

Create goroutines, channels, contexts, timers, and workers inside the bubble. Avoid package-level shared state, which can leak between tests and break the isolation that makes synctest useful.

Rule 3: Do not use synctest as an integration test wrapper

If the test talks to a real database, real network, or external process, keep it out of synctest unless you have a very specific reason for doing so.

Rule 4: Test behavior, not scheduler luck

The goal is not to force a goroutine to run. The goal is to verify observable behavior after the system has reached a meaningful state, which synctest.Wait makes possible without depending on timing assumptions.

Rule 5: Keep cancellation paths explicit

Every background goroutine should have a shutdown path, and tests should prove that path works by canceling the context or closing the channel and then verifying the goroutine exits cleanly.
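A minimal sketch of what "prove the shutdown path works" can look like. The names startWorker and exitsAfterCancel are hypothetical, and the worker body is a placeholder. The timeout here is only a failure guard so a broken worker fails the check instead of hanging it; it is not used to wait for the goroutine.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// startWorker is a hypothetical background worker. The returned channel
// is closed when the goroutine has exited.
func startWorker(ctx context.Context) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return // explicit shutdown path
			case <-ticker.C:
				// periodic work would go here
			}
		}
	}()
	return done
}

// exitsAfterCancel cancels the worker's context and reports whether the
// goroutine exited before the guard elapsed.
func exitsAfterCancel(guard time.Duration) bool {
	ctx, cancel := context.WithCancel(context.Background())
	done := startWorker(ctx)
	cancel()
	select {
	case <-done:
		return true
	case <-time.After(guard):
		return false
	}
}

func main() {
	fmt.Println("worker exited after cancel:", exitsAfterCancel(2*time.Second))
}
```

Inside a synctest bubble the guard timeout can go away: cancel the context, call synctest.Wait, then do a non-blocking check that the done channel is closed.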

Final thoughts

testing/synctest is one of those Go features that looks small but changes how you write a class of tests. It does not replace good concurrency design, the race detector, or the need for integration tests — but it does make many asynchronous unit tests faster, cleaner, and far less dependent on timing luck.

That matters because concurrent code is already hard enough. Tests should reduce uncertainty, not add to it. For a broader view of production Go patterns across integration, code structure, and data access, see App Architecture in Production.

The practical takeaway is simple:

Use synctest for deterministic unit tests around goroutines, timers, timeouts, retries, and cancellation.
Keep real sleeps out of concurrent tests unless you have a very good reason.

That one habit will make many Go test suites faster and less flaky.

For reference: testing/synctest became generally available in Go 1.25. It exposes synctest.Test and synctest.Wait, runs each test inside an isolated bubble, and gives that bubble a fake clock that advances only when every goroutine in the bubble is durably blocked.


Google A2A Protocol in 2026: Adoption, Hype, and Reality

Rost — Mon, 29 Jun 2026 22:44:12 +0000

Google's Agent2Agent protocol, usually shortened to A2A, had a strange first year.

When Google announced A2A in April 2025, the pitch was clear: AI agents built by different vendors, frameworks, and teams needed a standard way to communicate. The protocol promised agent discovery, task delegation, message exchange, streaming updates, and artifact sharing. The reaction, however, was considerably less clean than the announcement.

Some developers saw A2A as the missing agent-to-agent layer for the emerging agentic stack. Others saw it as yet another Google protocol, another acronym, and another attempt to define a market before the market had real production needs. The skeptical take came down to a single question: "We already have MCP. Why do we need A2A?" That was a fair question in 2025, and it remains a fair question in 2026 — though the answer has shifted considerably.

A2A is not dead, but it is also not universally useful. The practical reality is that A2A is becoming genuinely valuable in a specific context: where agents are independent systems with their own ownership, tools, and trust boundaries, rather than just internal functions or tool wrappers. That distinction between tool integration and agent delegation is what the protocol is actually designed to address, and understanding it is the key to evaluating A2A without the hype in either direction.

What Is Google's A2A Protocol?

A2A stands for Agent2Agent Protocol, and that name captures its purpose precisely. It is an open standard for communication and interoperability between independent AI agent systems — specifically, agents that may be built using different frameworks, languages, or vendor stacks.

A2A is not mainly about connecting an agent to a database, file system, calendar, API, or search index. That is closer to the job of MCP, the Model Context Protocol. A2A is about something different: one agent communicating with another agent, treating the peer system as an actor with its own capabilities rather than a passive data source.

A typical A2A flow might involve:

Discovering an agent through an Agent Card
Reading the agent's skills and capabilities
Sending a task
Exchanging messages
Receiving status updates
Handling input-required states
Receiving final artifacts
Tracking completion, failure, or cancellation

The important word in that list is "task." A2A is not just a function call with a different wrapper — it is a task lifecycle protocol for agent collaboration, designed to handle the full arc from discovery and delegation through execution, status updates, and artifact return. For a deep technical walkthrough of each concept — Agent Cards, task lifecycle, messages, parts, and artifacts — see What Is the A2A Protocol? Agent Cards and Tasks Explained.
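To make the "task lifecycle" idea concrete, here is a small sketch of a task state machine. The state names and the transition table are my own simplification of the flow listed above, not types copied from the A2A specification, so treat every identifier as hypothetical.

```go
package main

import "fmt"

// TaskState is a simplified, hypothetical set of task states.
type TaskState string

const (
	Submitted     TaskState = "submitted"
	Working       TaskState = "working"
	InputRequired TaskState = "input-required"
	Completed     TaskState = "completed"
	Failed        TaskState = "failed"
	Canceled      TaskState = "canceled"
)

// allowed lists the transitions this sketch permits. Terminal states
// have no outgoing transitions.
var allowed = map[TaskState][]TaskState{
	Submitted:     {Working, Failed, Canceled},
	Working:       {InputRequired, Completed, Failed, Canceled},
	InputRequired: {Working, Failed, Canceled},
}

// CanTransition reports whether a task may move from one state to another.
func CanTransition(from, to TaskState) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

// IsTerminal reports whether the task has finished, one way or another.
func IsTerminal(s TaskState) bool {
	switch s {
	case Completed, Failed, Canceled:
		return true
	}
	return false
}

func main() {
	path := []TaskState{Submitted, Working, InputRequired, Working, Completed}
	for i := 1; i < len(path); i++ {
		fmt.Printf("%s -> %s allowed: %v\n", path[i-1], path[i], CanTransition(path[i-1], path[i]))
	}
	fmt.Println("completed -> working allowed:", CanTransition(Completed, Working))
}
```

The point of the sketch is the shape: a plain function call has no "input required" detour and no cancellation state, while a delegated task does.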

Why A2A Was Easy To Mock

A2A arrived in a market already drowning in agent acronyms.

By 2025, developers were already dealing with:

LLM APIs
Function calling
Tool calling
Agent frameworks
MCP servers
RAG pipelines
Workflow engines
Multi-agent orchestration libraries
Custom JSON protocols
Internal plugin systems

So when Google announced A2A, a common reaction was predictable:

"Do we really need another standard?"

The skepticism was not irrational, and it came from several directions at once. A2A looked like it overlapped with MCP. It came from Google, which made some developers worry about long-term commitment. It arrived before most teams had even solved basic tool access, prompt injection, observability, cost control, and security for single-agent systems.

In that environment, "agent-to-agent interoperability" sounded ambitious, but also a little premature.

And to be blunt, many AI agent demos in 2025 did not need A2A at all.

They needed better prompts, better tools, better permissions, better retry logic, and better logs.

The 2026 Update: A2A Is Not Dead

The big change in 2026 is that A2A is no longer only a Google announcement.

By April 2026, the Linux Foundation reported that the A2A project had passed 150 supporting organizations, gained major cloud platform integrations, and reached production deployments across multiple industries.

That does not mean every claim should be swallowed without skepticism. "Supported by" is not the same thing as "deeply used in production by most developers". Protocol ecosystems often look larger in press releases than they feel in day-to-day engineering work.

The signal matters, however, because it is harder to dismiss. A2A has crossed an important line: it is no longer just a Google blog post. It has a formal specification, governance momentum, public examples, SDK work, cloud platform attention, and a growing ecosystem around agent interoperability. That makes the "dead" label difficult to defend on technical or adoption grounds.

A more defensible criticism is that A2A is alive but its useful scope is narrower than the hype suggests.

A2A vs MCP: The Confusion That Would Not Die

Most A2A confusion comes from its relationship with MCP.

MCP, created by Anthropic, standardizes how AI applications connect to external tools and data sources. MCP servers expose tools, resources, and prompts. AI hosts and clients consume them.

In simple terms:

MCP connects agents to tools.
A2A connects agents to other agents.

That sounds clean, but the real world is considerably messier. An MCP server can expose something that looks very agentic — for example, an MCP tool named research_company that internally runs search, retrieval, summarization, ranking, and report writing. From the MCP host's point of view, it is a tool. From an architecture point of view, it is hiding an agent-like workflow behind a function call boundary. This ambiguity is precisely why some developers argued A2A was unnecessary: if an agent can be represented as an MCP tool, why create a separate protocol?

The answer is that A2A gives first-class structure to things MCP treats more awkwardly:

Agent discovery
Agent capabilities
Task lifecycle
Long-running work
Multi-turn task state
Agent-to-agent messaging
Artifacts
Collaboration between opaque agents
Delegation across organizational boundaries

MCP can wrap a great deal, but wrapping everything as a tool eventually becomes a bad abstraction. At some point, a specialist system has enough of its own state, policy, lifecycle, and decision-making authority that modeling it as a tool obscures the architecture rather than simplifying it. That is the inflection point where treating a peer agent as a peer agent — rather than as a tool call — starts to pay off. For a detailed comparison of where the boundary falls in practice, see A2A vs MCP: Do AI Agents Really Need Both Protocols?

The Best Mental Model: MCP Below, A2A Above

The cleanest architecture is not "A2A vs MCP".

The cleanest architecture is layered:

flowchart TD
    U["User or application"]
    O["Primary assistant / orchestrator"]
    S1["Specialist agent A"]
    S2["Specialist agent B"]
    T1["Tools, APIs, files, databases"]
    T2["More tools and data sources"]

    U --> O
    O -->|A2A| S1
    O -->|A2A| S2
    S1 -->|MCP| T1
    S2 -->|MCP| T2

In this model:

A2A is the agent collaboration layer.
MCP is the tool integration layer.

That is the pattern that makes the most sense in 2026, and it is the framing that most serious agent architects are converging on. A2A should not replace MCP, and MCP should not be forced to represent every agent boundary — they solve different problems at different layers of the stack. The "protocol war" framing is mostly lazy analysis that makes for good headlines while doing nothing to help engineers design better systems.

Where A2A Is Actually Useful

A2A becomes useful when an agent is no longer just a library call inside your application.

It is useful when agents are:

Independently deployed
Owned by different teams
Built with different frameworks
Exposed by vendors
Running with their own tools and permissions
Responsible for long-running tasks
Returning artifacts rather than simple values
Part of a broader multi-agent workflow

For example, imagine an enterprise assistant that needs to prepare a supplier risk report.

It might delegate work to:

A procurement agent
A legal review agent
A finance agent
A compliance agent
A market research agent
A report writing agent

Each agent has its own domain, tools, rules, permissions, and audit requirements.

For that kind of system, A2A is not absurd. It is a reasonable boundary.

The primary assistant should not need direct access to every procurement database, legal policy store, finance spreadsheet, and compliance workflow. It should ask the responsible agent to perform the task.

That is the essential distinction: tool access is a vertical connection between an agent and its resources, while domain delegation is a horizontal handoff between autonomous agents, each with its own boundary of authority and accountability. The layered model for how these components combine — LLM, memory, tooling, routing, and observability — is covered in AI Assistant Architecture: LLM, Memory, Tools, Routing, Observability.

Where A2A Is Still Overhyped

A2A is overhyped when people present it as mandatory infrastructure for every AI project.

Most projects do not need it.

If you are building a local coding assistant, a chatbot for your docs, a small internal automation agent, or a single workflow that calls a handful of tools, A2A is probably unnecessary.

You may need:

MCP
Good tool schemas
Guardrails
Evaluation
Logging
Cost control
Retry logic
Better prompts
Better retrieval

You probably do not need a full agent-to-agent protocol.

A2A can be a mistake when:

There is only one agent
All components live in one codebase
Workflows are short and synchronous
Agents do not need discovery
Agents do not need independent task state
There are no external agent providers
An API or queue would be simpler
The team cannot operate the extra complexity

A protocol is not free. It adds concepts, failure modes, debugging overhead, security concerns, and operational work.

In many small systems, adopting A2A is architecture cosplay — borrowing the vocabulary of distributed agent systems without any of the actual boundary problems that make the protocol valuable.

A2A And The Google Problem

Part of the A2A skepticism comes from Google itself.

Developers have long memories. When Google launches a platform, protocol, product, or ecosystem, many engineers immediately ask:

"Will this still exist in three years?"

That reaction is not entirely fair to the A2A technical design, but it is a real adoption factor.

The Linux Foundation hosting story helps here. A2A becoming part of a broader open governance environment makes it less dependent on Google's internal priorities.

That does not guarantee success. Open governance does not magically create developer adoption. But it does reduce one of the biggest concerns: that A2A is only a Google-controlled strategic move.

In 2026, A2A should be judged less as "Google's protocol" and more as an emerging agent interoperability standard that Google helped start.

That is a healthier lens, and it is the one that makes A2A's technical merits easier to evaluate on their own terms rather than through the filter of Google's historical relationship with developer ecosystems.

Adoption: Strong Signal, But Not The Whole Story

The reported figure of 150+ supporting organizations is meaningful, but it should not be confused with universal developer adoption. "Supported by" is a spectrum, not a binary, and it helps to read adoption claims with that in mind.

At the weakest end is logo adoption: a company says it supports the standard, which may reflect genuine implementation, strategic positioning, a prototype, or simply planned support that has not materialized. Slightly stronger is SDK adoption, where developers can actually build with available libraries, examples, and documentation — this means the protocol has moved from slideware into working implementation, and real engineers have found it worth their time. Stronger still is platform adoption, where clouds, agent frameworks, and enterprise systems expose real native support, making A2A a plausible default architectural choice rather than something teams have to wire together themselves.

The only adoption tier that really matters for long-term ecosystem health is production retention: teams relying on the protocol for live workflows beyond the initial 90-day honeymoon. The Linux Foundation's 2026 update claims production use across multiple industries, which is meaningful evidence. But the more useful question is not "who supports A2A?" It is "who keeps A2A in production after the first real operational incident?" Long-term retention under pressure is the signal that separates genuine infrastructure from protocol theater. For a sense of what real adoption curves look like in the AI agent space (measured in GitHub stars, OpenRouter tokens, and download trends), the OpenClaw vs Hermes Agent popularity data shows how quickly momentum builds and plateaus once early adopter energy subsides.

The Real Test: Retention In Production

Developer hype is cheap, and production retention is expensive. The two are rarely proportional, which is why the 90-day retention question matters more than launch-week enthusiasm.

A2A will prove itself if teams keep using it after they encounter:

Authentication problems
Authorization problems
Agent identity problems
Debugging issues
Task lifecycle edge cases
Streaming failures
Version compatibility
Vendor differences
Cost surprises
Security reviews
Audit requirements
Human approval workflows

This is where many agent frameworks and protocols fail. They look elegant in diagrams, then become painful in production.

A2A has a good reason to exist, but good reasons do not automatically translate into production resilience. The protocol has to survive the operational reality it encounters on the way from demo to deployment.

The best sign for A2A in 2026 is not that people are writing blog posts about it. The best sign is that enterprises are starting to use it for real multi-agent boundaries.

The worst sign would be if developers only use it in demos while production systems fall back to custom APIs and queues.

Security Is The Biggest Unresolved Question

A2A's hardest problems are not syntax or specification problems. They are trust problems that emerge when you actually deploy autonomous agents across organizational or system boundaries.

When one agent talks to another agent, several questions become urgent:

Who is this agent?
Who owns it?
What is it allowed to know?
What is it allowed to do?
Can it delegate work further?
Can it call tools on behalf of a user?
Can it preserve user intent?
Can it prove what happened?
Can it be audited after the task completes?

These questions are not optional in enterprise environments.

A2A makes agent collaboration easier. It also creates new places where trust can break.

For example:

A malicious agent could misrepresent its capabilities.
A compromised agent could request sensitive context.
A delegated task could exceed the user's authority.
An agent could return poisoned artifacts.
A chain of agents could make accountability unclear.
Sensitive data could flow across boundaries without proper logging.

This is why serious A2A systems need more than protocol compliance.

They need:

Strong agent identity
Scoped authorization
Task-level audit logs
Delegation tracking
Human approval for risky actions
Artifact provenance
Rate limits
Policy enforcement
Observability across agent boundaries

A2A is not a security architecture by itself — it is a communication protocol that must be deployed inside one, with explicit decisions made about identity, authorization, audit, and policy enforcement at every boundary it crosses.
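One of the risks above, a delegated task exceeding the user's authority, is easy to sketch as a check. This is an illustration under assumptions: the Delegation type, the scope strings, and authorizeChain are all hypothetical, and a real system would back this with verified agent identity rather than plain strings.

```go
package main

import "fmt"

// Delegation is a hypothetical record of one agent handing work to another.
type Delegation struct {
	From   string
	To     string
	Scopes []string // what the receiving agent is asked to do
}

// exceedsAuthority returns the requested scopes the user never granted.
func exceedsAuthority(userScopes, requested []string) []string {
	granted := make(map[string]bool, len(userScopes))
	for _, s := range userScopes {
		granted[s] = true
	}
	var extra []string
	for _, s := range requested {
		if !granted[s] {
			extra = append(extra, s)
		}
	}
	return extra
}

// authorizeChain checks every hop against the user's original grant and
// returns an error naming the first hop that asks for more.
func authorizeChain(userScopes []string, chain []Delegation) error {
	for i, d := range chain {
		if extra := exceedsAuthority(userScopes, d.Scopes); len(extra) > 0 {
			return fmt.Errorf("hop %d (%s -> %s) requests ungranted scopes %v", i, d.From, d.To, extra)
		}
	}
	return nil
}

func main() {
	user := []string{"invoices:read"}
	chain := []Delegation{
		{From: "assistant", To: "finance-agent", Scopes: []string{"invoices:read"}},
		{From: "finance-agent", To: "vendor-agent", Scopes: []string{"payments:write"}},
	}
	fmt.Println(authorizeChain(user, chain))
}
```

Keeping the whole chain, not just the last hop, is what makes delegation tracking and later audit possible.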

A2A And The Agent Marketplace Idea

One of the more interesting long-term A2A use cases is agent marketplaces.

If agents can advertise capabilities through Agent Cards, then other agents or platforms can discover them, evaluate them, and send tasks.

That creates a possible future where agent capabilities become more modular:

A tax agent
A legal agent
A code review agent
A travel planning agent
A security analysis agent
A procurement agent
A data quality agent

Each could expose a standard interface for task-based collaboration.

This sounds exciting, but it is also where hype gets dangerous.

An open agent marketplace requires more than Agent Cards. It needs identity, reputation, billing, compliance, sandboxing, liability, versioning, and dispute resolution.

Without those, an agent marketplace becomes a security incident waiting to happen.

A2A is a useful building block for this kind of future, but it is one piece of a much larger puzzle that also requires identity systems, reputation mechanisms, billing infrastructure, compliance controls, and dispute resolution before it becomes a safe market to operate in.

A2A For Internal Enterprise Agents

The more realistic near-term use case is not public agent marketplaces.

It is internal enterprise agent networks.

Large organizations already have many boundaries:

Teams
Departments
Systems
Vendors
Data domains
Compliance zones
Security policies
Approval processes

A2A maps naturally onto these boundaries, because the protocol is designed around the same fundamental need: structured communication between systems that have their own ownership and do not share a codebase. The broader AI Systems cluster covers how specialist agents like Hermes and OpenClaw fit into this kind of layered architecture in practice.

Instead of building one giant assistant with direct access to everything, an enterprise can build specialist agents with limited responsibility:

HR agent
Finance agent
Support agent
DevOps agent
Security agent
Knowledge management agent
Data platform agent

Each agent can own its tools and policies internally. Other agents can interact with it through A2A.

This is a much better model than giving a single general-purpose agent direct access to every system in the organization, both from a security perspective and from an operational one. Each specialist agent can be owned, operated, audited, and secured independently, which also makes the overall system easier to reason about when something goes wrong.

A2A For Small Teams And Indie Hackers

For small teams building products with one or two agents, A2A is genuinely less urgent — and often a distraction from more immediate problems. You probably do not need an agent-to-agent protocol yet.

Use normal code. Use HTTP APIs. Use queues. Use MCP where tool integration matters.

Add A2A when you actually have:

Multiple independent agents
Third-party agent boundaries
Long-running delegated tasks
Agent discovery requirements
Artifact exchange requirements
Cross-framework interoperability needs

The sequence matters more than the ambition. Start with the simplest architecture that exposes the real pressure points, and let those pressure points tell you whether you actually need A2A before committing to the complexity it brings. For most small builders, MCP first and A2A later is the right path.

A Practical Decision Framework

Use this framework when deciding whether A2A belongs in your system.

No A2A when the workflow is local. Avoid A2A when everything runs inside one application and the components are not independently deployable. A Python function, class, service, queue, or workflow engine is probably enough.

MCP when the agent needs tools. Use MCP when your agent needs standardized access to files, databases, APIs, SaaS systems, search indexes, repositories, internal documentation, or observability systems. MCP gives immediate practical value and is the right starting point for most teams building agents today.

A2A when the agent needs peers. Use A2A when your agent needs to communicate with other independent agents — especially when those agents have their own capabilities, policies, state, tools, owners, deployment lifecycle, and security boundary.

Both when the architecture has layers. Use both when specialist agents collaborate with each other and each specialist also needs tools. The production pattern is A2A between agents and MCP between agents and tools. That is the most sensible version of the 2026 agent protocol stack, and the architecture that maps most cleanly onto how production multi-agent systems are actually being built.
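The four cases above reduce to two questions, which a short sketch makes explicit. The System type and its field names are hypothetical; they only exist to encode the framework as written.

```go
package main

import "fmt"

// System captures the two questions the framework asks. Both fields are
// hypothetical names for illustration.
type System struct {
	NeedsTools          bool // files, databases, APIs, SaaS systems, search indexes
	HasIndependentPeers bool // other agents with their own owners, state, and deployment
}

// Recommend maps the answers to the protocol choice described above.
func Recommend(s System) string {
	switch {
	case s.NeedsTools && s.HasIndependentPeers:
		return "A2A between agents, MCP between agents and tools"
	case s.HasIndependentPeers:
		return "A2A"
	case s.NeedsTools:
		return "MCP"
	default:
		return "plain code, an API, or a queue"
	}
}

func main() {
	fmt.Println(Recommend(System{NeedsTools: true}))
	fmt.Println(Recommend(System{NeedsTools: true, HasIndependentPeers: true}))
}
```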

Common Mistakes With A2A

Using A2A because it sounds strategic. This is the classic enterprise architecture trap. A2A should solve a real boundary problem that exists in the architecture, not one invented to justify the protocol choice. If there is no genuine boundary — no independent deployment, no separate ownership, no distinct security perimeter — there is probably no need for A2A.

Treating MCP and A2A as competitors. MCP is not obsolete because A2A exists, and A2A is not unnecessary because MCP exists. They address different structural problems and work best as complementary layers, not competing alternatives.

Exposing every capability as an agent. A calculator does not need to be an agent. A weather API does not need to be an agent. A database query does not need to be an agent. Many things are straightforward tools, and the agent abstraction adds overhead without adding clarity when applied to components that have no meaningful autonomy, state, or lifecycle of their own.

Hiding a full agent behind one tool. The opposite mistake is also common. If a "tool" has its own task lifecycle, memory, policies, artifacts, and delegation behavior, it might deserve to be modeled as an agent rather than squeezed behind a function call boundary.

Ignoring observability. Multi-agent systems without traces are painful to debug and impossible to audit. You need to know which agent received the task, which messages were exchanged, which tools were called, which artifacts were produced, which policies were applied, and which agent made the final decision. Without that visibility, debugging becomes archaeology — reconstructing what happened by inference rather than observation. The full observability stack for AI and LLM-backed systems, including metrics, distributed traces, and SLOs that span agent boundaries, is covered in Observability for LLM Systems: Metrics, Traces, Logs, and Testing in Production.

So Is A2A Overhyped?

Yes, partly. A2A is overhyped when it is presented as the inevitable default for all AI agent systems, when people imply that every developer needs to adopt it immediately, when agent demos use A2A to coordinate what could have been three function calls, or when the protocol discussion ignores identity, authorization, observability, and production operations. These are real examples of hype that makes A2A sound more universal than it is.

But overhyped does not mean useless. Many important technologies are overhyped before they become boring infrastructure, and the hype often arrives well before the ecosystem is mature enough to support it. The real question is not whether the marketing is excessive — it clearly is at times. The real question is whether the underlying abstraction is useful, and for A2A, the answer is yes when agents become genuinely independent actors in a system with real boundaries, real ownership, and real stakes.

So Is A2A Dead?

No.

The "A2A is dead" argument made more sense during the early skepticism phase, when the protocol looked like a Google-led response to MCP momentum.

In 2026, that argument is weaker.

A2A has a formal specification, ecosystem support, Linux Foundation momentum, major cloud attention, and reported production deployments.

None of that makes A2A dominant, mandatory, or universally loved by the developer community — but it clearly is not dead. A better statement is that A2A is alive and still proving its production value beyond enterprise and platform ecosystems, which is where most of the confirmed deployments currently live.

So Is A2A Finally Useful In 2026?

Yes, but only in the right architecture. A2A is useful when your system has real agent boundaries — not just because your code has multiple prompts, or because your system uses the word "agent" in variable names. It becomes useful when agent collaboration genuinely needs standard structure:

Discovery
Capabilities
Task lifecycle
Messages
Artifacts
Long-running work
Opaque implementation boundaries
Cross-vendor interoperability

That is where A2A earns its place, by providing a common contract for collaboration that would otherwise require custom protocol work at every boundary.

My Opinionated Take

A2A is not the protocol most developers should start with — MCP is. MCP solves a more immediate and broadly applicable problem: connecting agents to useful tools and context. A2A solves a later-stage problem: connecting independent agents to each other across real deployment and ownership boundaries. That makes MCP more useful today for the vast majority of individual developers and small teams.

A2A may become more important as agent systems mature from demos into enterprise workflows. Once organizations have multiple specialist agents owned by different teams, the need for a standard agent-to-agent boundary becomes obvious and the overhead of the protocol starts to pay for itself.

My practical recommendation is to start with MCP, design clean agent boundaries from the beginning, and add A2A only when those boundaries become real deployment, ownership, or interoperability constraints. Do not adopt A2A for vibes. Adopt it when the architecture demands it.

Final Verdict

Google's A2A protocol is not dead.

It is also not the universal future of every AI agent project.

It is a useful, still-maturing protocol for a specific problem: communication between independent AI agents.

If you are building a simple assistant, A2A is probably unnecessary.

If you are building a multi-agent enterprise system, an agent marketplace, a vendor-neutral agent network, or a set of independently deployed specialist agents, A2A is worth serious attention.

The best 2026 framing is not:

A2A vs MCP

It is:

MCP for tools.
A2A for agents.
Both for serious multi-agent systems.

That is less dramatic than a protocol war narrative, but it is also more accurate and more useful to engineers who need to make real architectural decisions.


Polling Agents in AI Assistants: 11 Implementation Patterns

Rost — Sat, 27 Jun 2026 13:27:16 +0000

Polling agents are one of the least glamorous parts of AI assistant architecture, but they are also one of the most useful.

A normal chat assistant waits for the user to ask something. A polling agent keeps watching. It checks a source, notices changes, decides whether anything matters, and then acts. That action may be a notification, a summary, a draft, a tool call, or a full workflow.

This is how an assistant moves from "answer my question" to "keep an eye on this for me." Instead of being reactive, it becomes a background process that notices things on the user's behalf and acts when conditions are met.

The important design point is simple: do not make the language model responsible for time, state, retries, or locking. Use normal backend infrastructure for that. Use the model where it is valuable: interpreting messy context, making semantic judgments, and producing useful language.

What Is a Polling Agent?

A polling agent is a background process that repeatedly checks a source and triggers an assistant action when a condition is met. In the broader AI Systems stack — where the assistant combines an LLM, memory, tooling, routing, and observability — the polling layer is what makes the assistant proactive rather than purely reactive. For the full five-layer picture, see AI Assistant Architecture: LLM, Memory, Tools, Routing, Observability.

Examples:

Check an inbox every morning and summarize important messages.
Watch a Notion task list and execute the next todo item.
Monitor a GitHub issue until it changes status.
Poll a long-running AI job until the result is ready.
Check a booking slot until one becomes available.
Watch a supplier portal until a document appears.
Scan new research papers once per week and summarize relevant ones.

A practical polling agent has five responsibilities:

Wake up at the right time.
Read from the source.
Remember what it has already seen.
Decide whether the new state matters.
Act once, safely, without repeating itself.

A typical production flow looks like this:

scheduler
  -> polling worker
  -> source system
  -> state store
  -> deterministic filters
  -> optional LLM evaluation
  -> assistant action

This structure is boring in the best possible way. Boring systems are easier to debug at 2 AM.
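The flow above can be sketched as one poll cycle. Everything here is a stand-in: the Poller type, its hook fields, and the in-memory Seen map are hypothetical, and the optional Evaluate hook is where a model call would go.

```go
package main

import (
	"fmt"
	"strings"
)

// Item is a hypothetical unit read from the source system.
type Item struct {
	ID   string
	Text string
}

// Poller wires the stages together. Every field is a hypothetical hook.
type Poller struct {
	Fetch    func() []Item   // source system
	Seen     map[string]bool // state store (in-memory stand-in)
	Filter   func(Item) bool // deterministic filter
	Evaluate func(Item) bool // optional LLM evaluation; nil skips it
	Act      func(Item)      // assistant action
}

// RunOnce performs one poll cycle and returns how many items were acted on.
// Items are marked seen before acting, so a repeated run never acts twice
// (at-most-once; a failed action is not retried in this sketch).
func (p *Poller) RunOnce() int {
	acted := 0
	for _, it := range p.Fetch() {
		if p.Seen[it.ID] {
			continue
		}
		p.Seen[it.ID] = true
		if !p.Filter(it) {
			continue
		}
		if p.Evaluate != nil && !p.Evaluate(it) {
			continue
		}
		p.Act(it)
		acted++
	}
	return acted
}

// newDemoPoller builds a poller over two fixed items and records actions.
func newDemoPoller() (*Poller, *[]string) {
	var acted []string
	p := &Poller{
		Fetch: func() []Item {
			return []Item{{ID: "1", Text: "URGENT: invoice overdue"}, {ID: "2", Text: "weekly newsletter"}}
		},
		Seen:   map[string]bool{},
		Filter: func(it Item) bool { return strings.Contains(it.Text, "URGENT") },
		Act:    func(it Item) { acted = append(acted, it.ID) },
	}
	return p, &acted
}

func main() {
	p, acted := newDemoPoller()
	fmt.Println("first run acted on:", p.RunOnce())
	fmt.Println("second run acted on:", p.RunOnce())
	fmt.Println("actions:", *acted)
}
```

Note the order: the cheap deterministic filter runs before the optional evaluation, so the model only sees items that already passed the boring checks.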

The State Every Polling Agent Needs

Polling agents need durable state. Conversation history is not enough. The assistant may remember the conversation, but the system needs a reliable operational record.

A good polling state record usually contains:

{
  "poll_id": "poll_123",
  "user_id": "user_456",
  "source_type": "notion",
  "source_ref": "database_tasks",
  "condition_text": "take one task in Todo state and execute it",
  "interval_seconds": 600,
  "last_run_at": "2026-06-19T01:00:00Z",
  "next_run_at": "2026-06-19T01:10:00Z",
  "last_seen_cursor": "cursor_or_timestamp",
  "last_result_hash": "b64e8a...",
  "failure_count": 0,
  "status": "active"
}

The exact schema depends on the source, but most systems need these concepts.

Poll Definition

This describes what the agent is watching and why.

poll_id
user_id
workspace_id
source_type
source_ref
condition_text
priority
status

For example:

source_type: notion
source_ref: Tasks database
condition_text: Find one Todo task, claim it, execute it, mark it Complete.

Schedule

This describes when the agent should run.

interval_seconds
cron_expression
timezone
last_run_at
next_run_at
jitter

For a Hermes agent that checks Notion every 10 minutes:

interval_seconds: 600
timezone: Australia/Melbourne
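
As a rough sketch of how interval_seconds and jitter might combine into next_run_at: the function name compute_next_run_at and the jitter_seconds parameter are illustrative assumptions, not part of Hermes or any specific scheduler.

```python
import random
from datetime import datetime, timedelta, timezone


def compute_next_run_at(last_run_at, interval_seconds, jitter_seconds=0, rng=random):
    """Return the next run time: last run + interval + random jitter.

    Jitter spreads polls out so that many polls with the same interval
    do not all hit the source API at the same instant.
    """
    jitter = rng.uniform(0, jitter_seconds) if jitter_seconds > 0 else 0.0
    return last_run_at + timedelta(seconds=interval_seconds + jitter)


last_run = datetime(2026, 6, 19, 1, 0, 0, tzinfo=timezone.utc)
next_run = compute_next_run_at(last_run, interval_seconds=600)
print(next_run.isoformat())  # 2026-06-19T01:10:00+00:00
```

Storing timestamps in UTC and converting to the user's timezone only for display or cron-style rules tends to keep this arithmetic simple.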

Cursor or Snapshot

This helps the agent avoid reprocessing the same data.

Depending on the source, this may be:

last_seen_id
last_seen_timestamp
api_cursor
etag
version
content_hash

For a Notion task queue, the cursor may be less important than task status and claim fields. For Gmail, GitHub, or a sync API, the cursor is usually critical.
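
When the source offers no cursor, a content hash is one way to detect change. The sketch below assumes the source payload is JSON-serializable; content_hash and has_changed are hypothetical helper names.

```python
import hashlib
import json


def content_hash(payload):
    """Hash a JSON-serializable payload in a stable, key-order-independent way."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(payload, last_hash):
    """Return (changed, new_hash) by comparing against the stored hash."""
    new_hash = content_hash(payload)
    return new_hash != last_hash, new_hash


first = {"status": "open", "title": "Fix login"}
changed, stored = has_changed(first, last_hash=None)
print(changed)  # True: nothing stored yet

same_again = {"title": "Fix login", "status": "open"}  # same data, different key order
print(has_changed(same_again, stored)[0])  # False
```

Sorting keys before hashing matters: without it, two payloads with identical data could hash differently and trigger a false change.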

Claim or Lease

This prevents two workers from taking the same job.

claimed_by
claimed_at
claim_expires_at
run_id

For example, a Notion task can be changed from:

Status: Todo

to:

Status: InProgress
ClaimedBy: hermes
ClaimedAt: 2026-06-19T01:00:00Z
ClaimExpiresAt: 2026-06-19T01:30:00Z
RunId: run_789

This is the difference between "I hope only one worker picks it" and "the system has a claim protocol."
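
A claim protocol can be sketched in a few lines. This example uses a plain dict as a stand-in for the task record, and try_claim with its 30-minute default lease is an assumption for illustration. In a real store the check and the update would need to be atomic, for example through a conditional update.

```python
from datetime import datetime, timedelta, timezone


def try_claim(task, worker_id, run_id, now, lease_seconds=1800):
    """Claim a task if it is Todo, or InProgress with an expired lease.

    Returns True and mutates the task dict on success, False otherwise.
    A plain dict is used for illustration only; it offers no atomicity.
    """
    status = task.get("Status")
    expires = task.get("ClaimExpiresAt")
    lease_expired = status == "InProgress" and expires is not None and expires < now
    if status != "Todo" and not lease_expired:
        return False
    task.update(
        Status="InProgress",
        ClaimedBy=worker_id,
        ClaimedAt=now,
        ClaimExpiresAt=now + timedelta(seconds=lease_seconds),
        RunId=run_id,
    )
    return True


now = datetime(2026, 6, 19, 1, 0, 0, tzinfo=timezone.utc)
task = {"Status": "Todo"}
print(try_claim(task, "hermes", "run_789", now))    # True
print(try_claim(task, "hermes-2", "run_790", now))  # False: lease still valid
```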

Execution Record

This records what happened during a run.

run_id
poll_id
source_object_id
started_at
finished_at
status
items_checked
items_changed
decision_summary
error

The execution record should live in the assistant backend, not only in Notion or another external tool. Notion is good for human visibility. It is not ideal as your only execution log.

Dedupe Record

This prevents duplicate notifications or repeated actions.

dedupe_key
poll_id
source_object_id
condition_version
action_type
delivered_at

For example:

user_456:poll_123:notion_page_999:execute:v1

If the same action is attempted again, the system can suppress it.
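
A minimal sketch of that suppression follows. DedupeStore is a hypothetical in-memory stand-in; a production system would back it with a durable table that has a unique key column.

```python
def build_dedupe_key(user_id, poll_id, source_object_id, action_type, condition_version):
    """Join the identifying parts into one stable key."""
    return ":".join([user_id, poll_id, source_object_id, action_type, condition_version])


class DedupeStore:
    """In-memory stand-in for a durable table with a unique key column."""

    def __init__(self):
        self._seen = set()

    def first_time(self, key):
        """Record the key; return True only the first time it is seen."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


store = DedupeStore()
key = build_dedupe_key("user_456", "poll_123", "notion_page_999", "execute", "v1")
print(key)                    # user_456:poll_123:notion_page_999:execute:v1
print(store.first_time(key))  # True: perform the action
print(store.first_time(key))  # False: suppress the duplicate
```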

Method 1: Scheduled Polling Worker

This is the simplest reliable pattern.

A scheduler wakes up every fixed interval and calls a worker. The worker reads the source, updates state, and triggers an assistant action if required.

scheduler
  -> worker
  -> source API
  -> database
  -> assistant action

How It Runs

The scheduler is responsible for time. It might be cron, a cloud scheduler, a Kubernetes CronJob, or a small internal scheduler.

Every interval, it starts a worker run. The worker loads its configuration, queries the target source, compares the result with stored state, and acts if needed.

For a simple assistant, this is often enough. A single scheduler and a lightweight worker process can handle dozens of daily checks without requiring queues, leases, or distributed coordination.

State Model

The scheduler stores very little. Usually it only knows when to trigger a job.

The application database stores the important state:

poll definition
schedule
cursor or snapshot
last run time
failure count
status

The worker should be stateless. It can hold temporary data while running, but the durable truth belongs in the database.

Example Flow

Every 10 minutes:
  trigger Hermes polling worker

Worker:
  load active poll configuration
  query source
  compare with previous state
  run deterministic checks
  call LLM only if needed
  update state
  emit assistant event

Best Fit

Use scheduled polling workers for:

Daily summaries.
Hourly checks.
Small internal automations.
Simple "watch this" tasks.
Low to medium volume assistant jobs.

Weaknesses

Scheduled polling is easy to understand, but it can become fragile at scale. If many polls run at the same time, you may overload your workers or hit provider rate limits. Retries can also become messy if the scheduler directly starts the work.

Method 2: Queue-Based Polling Workers

Queue-based polling is usually the best default for production AI assistants.

The scheduler does not execute the poll directly. It puts a job on a queue. Worker processes consume jobs from the queue.

scheduler
  -> queue
  -> worker pool
  -> source API
  -> state store
  -> assistant action

How It Runs

A scheduler scans for due polls and enqueues jobs. Workers pull jobs when they have capacity.

This gives you backpressure. If the system is busy, jobs wait in the queue instead of overwhelming the source API or the LLM provider.

State Model

The database stores the poll state:

poll_id
user_id
source_ref
condition_text
next_run_at
cursor
status
failure_count

The queue message should stay small:

{
  "poll_id": "poll_123",
  "scheduled_for": "2026-06-19T01:10:00Z",
  "attempt": 1
}

The worker loads the full state from the database when it starts.

Example Flow

Every minute:
  scheduler finds polls where next_run_at <= now
  scheduler enqueues jobs

Workers:
  pull jobs from queue
  lock or lease the poll
  query the source
  update state
  emit assistant action if needed
  set next_run_at
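
The scheduler half of this flow might look like the sketch below. It uses Python's in-process queue.Queue as a stand-in for a real message queue, and enqueue_due_polls is a hypothetical name.

```python
import queue
from datetime import datetime, timedelta, timezone


def enqueue_due_polls(polls, now, job_queue):
    """Enqueue one small job message per active poll that is due."""
    enqueued = 0
    for poll in polls:
        if poll["status"] == "active" and poll["next_run_at"] <= now:
            job_queue.put({
                "poll_id": poll["poll_id"],
                "scheduled_for": poll["next_run_at"].isoformat(),
                "attempt": 1,
            })
            enqueued += 1
    return enqueued


now = datetime(2026, 6, 19, 1, 10, 0, tzinfo=timezone.utc)
polls = [
    {"poll_id": "poll_123", "status": "active", "next_run_at": now},
    {"poll_id": "poll_124", "status": "active", "next_run_at": now + timedelta(minutes=5)},
    {"poll_id": "poll_125", "status": "paused", "next_run_at": now - timedelta(minutes=5)},
]
jobs = queue.Queue()
print(enqueue_due_polls(polls, now, jobs))  # 1
print(jobs.get_nowait()["poll_id"])         # poll_123
```

A real scheduler would also need to avoid enqueuing the same poll twice while a job is still pending, for example by advancing next_run_at or marking the poll as queued in the same transaction.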

Best Fit

Use queue-based polling for:

Multi-user AI assistants.
Many simultaneous polls.
Integrations with rate limits.
Retriable background work.
Jobs that may take different amounts of time.
SaaS products where reliability matters.

Weaknesses

Queues add infrastructure. You need dead letter handling, idempotency, visibility timeouts, and retry policies. This is worth it for production systems, but probably excessive for a small prototype.
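
As one possible shape for a retry policy, here is a sketch of exponential backoff with a dead letter cutoff. The base delay, cap, and max_attempts values are assumptions chosen for illustration, not recommendations from any particular queue product.

```python
def retry_delay_seconds(attempt, base=30, cap=3600):
    """Exponential backoff: base * 2^(attempt-1), capped at `cap` seconds."""
    return min(cap, base * 2 ** (attempt - 1))


def next_step(attempt, max_attempts=5):
    """Decide what to do after a failed attempt (attempts are 1-based)."""
    if attempt >= max_attempts:
        return ("dead_letter", None)
    return ("retry", retry_delay_seconds(attempt))


print(next_step(1))  # ('retry', 30)
print(next_step(3))  # ('retry', 120)
print(next_step(5))  # ('dead_letter', None)
```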

Method 3: External Tool as a Task Queue

This is the pattern in the Notion plus Hermes example.

The external tool is not just a data source. It becomes the human-facing task queue. The agent periodically checks the tool, claims one task, executes it, and updates the task status.

scheduler
  -> Hermes worker
  -> Notion database
  -> claim one task
  -> execute task
  -> update Notion status

How It Runs

Every 10 minutes, Hermes queries the Notion database for tasks in Todo state and chooses the next one, usually by priority and creation time. Then it claims the task by setting it to InProgress.

After that, Hermes executes the task. If execution succeeds, it marks the task as Complete. If execution fails, it marks the task as Failed or returns it to Todo with a retry count.

State Model

Notion stores the human-facing task state:

Title
Description
Status: Todo | InProgress | Complete | Failed
Priority
CreatedAt
ClaimedBy
ClaimedAt
ClaimExpiresAt
RunId
RetryCount
LastError
CompletedAt

The Hermes backend stores the operational execution state:

run_id
notion_page_id
started_at
finished_at
execution_status
tool_calls
llm_trace
error_details
idempotency_key

This split matters. Notion is excellent for visibility and manual editing. The Hermes backend is better for logs, retries, dedupe, and audit history.

Example Flow

Every 10 minutes:
  Hermes wakes up

Hermes:
  query Notion for tasks where Status = Todo
  sort by Priority, CreatedAt and select the first one
  update selected task to InProgress
  set ClaimedBy, ClaimedAt, ClaimExpiresAt, RunId
  execute the task
  write execution log
  set task to Complete or Failed
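
The selection step can be sketched on already-fetched task records. This assumes a numeric Priority where a lower number means more urgent and an ISO 8601 CreatedAt string; both are illustrative choices, and in practice the sort would usually be pushed into the source query itself.

```python
def select_next_task(tasks):
    """Pick the Todo task with the highest priority, oldest first.

    Assumes lower Priority numbers are more urgent and CreatedAt values
    are ISO 8601 strings in the same timezone, so they sort as text.
    """
    todo = [t for t in tasks if t["Status"] == "Todo"]
    if not todo:
        return None
    return min(todo, key=lambda t: (t["Priority"], t["CreatedAt"]))


tasks = [
    {"Title": "Write report", "Status": "Todo", "Priority": 2, "CreatedAt": "2026-06-18T09:00:00Z"},
    {"Title": "Fix invoice", "Status": "Todo", "Priority": 1, "CreatedAt": "2026-06-18T10:00:00Z"},
    {"Title": "Old task", "Status": "Complete", "Priority": 1, "CreatedAt": "2026-06-01T08:00:00Z"},
]
print(select_next_task(tasks)["Title"])  # Fix invoice
```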

Best Fit

Use this pattern when:

Humans already manage work in Notion, Jira, Linear, Trello, or another tool.
You want the assistant to process visible tasks.
The task board is the user interface.
You need a simple human-in-the-loop automation model.

Weaknesses

External tools are rarely perfect queues. Atomic claims may be limited. Query consistency may lag. Rate limits may apply. If the agent can run in multiple instances, you need a careful claim or lease strategy.

The practical recommendation is to use Notion as the human-facing task inbox while keeping all execution logs, retry records, traces, and idempotency keys in Hermes. Notion gives users visibility; Hermes keeps the system reliable. For the dispatcher and concurrency mechanics that sit behind this pattern in Hermes, see Kanban in Hermes Agent for Self Hosted LLM Workflows.

Method 4: Long-Running Worker Loop

A long-running loop is the simplest implementation.

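import time

while True:
    due_polls = db.find_due_polls()  # db: your application's state store
    for poll in due_polls:
        run_poll(poll)  # run_poll: executes one poll and persists its state
    time.sleep(30)  # wait before scanning for due polls again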

This pattern combines scheduling and execution in one service, which makes it the simplest possible starting point for background agent work.

How It Runs

The worker process runs continuously. Every few seconds or minutes, it checks the database for due polls and executes them. It is easy to build, easy to reason about, and fast to iterate on during development.

State Model

The database still stores durable state:

poll configuration
next_run_at
cursor
last result
failure count
status

The process memory should only contain temporary state:

current batch
short-lived cache
in-flight run

Never store important progress only in memory. If the process crashes, any state that was not written to durable storage is gone, and the next run will have no way to know where things left off.

Best Fit

Use long-running loops for:

Prototypes.
Local development.
Internal tools.
Single-tenant systems.
Low-volume agents.

Weaknesses

This pattern becomes risky with multiple replicas. Without leases, two workers may run the same poll. It also lacks the operational features of a real queue or workflow engine.

A long-running loop is not wrong as a starting point, but it is not a distributed scheduler and should not be treated as one. As soon as you need multiple replicas or stronger reliability guarantees, you will need to move to one of the more structured patterns above.

Method 5: Webhook-First With Polling Fallback

If the source supports webhooks, use them. Polling should often be the backup, not the primary mechanism.

external system
  -> webhook endpoint
  -> event store
  -> assistant action

reconciliation poll
  -> source API
  -> compare with event store
  -> repair missed events

How It Runs

The external system sends events to your webhook endpoint when something changes. Your system stores the event and processes it asynchronously.

A slower reconciliation poll runs every few hours or once per day. It checks whether any events were missed.

State Model

The event store records incoming webhooks:

event_id
source_type
source_object_id
event_type
received_at
payload_hash
processed_at
signature_valid

The reconciliation poll stores:

last_reconciliation_at
last_seen_cursor
last_seen_version

The source object table stores the latest known state:

external_id
current_status
external_updated_at
last_processed_event_id

Best Fit

Use webhook-first architecture for:

GitHub events.
Stripe events.
Slack events.
CRM updates.
Deployment notifications.
Ticketing systems.

Weaknesses

Webhooks require a public endpoint, signature validation, replay protection, and event dedupe. Some providers also send incomplete events, so you may still need to fetch the full object.

Even so, if good webhooks exist, polling every minute is usually wasteful.
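
Signature validation and event dedupe can be sketched generically. The scheme below (hex HMAC-SHA256 over the raw body) is an assumption for illustration. Each provider documents its own header names and signing format, and many include a timestamp for replay protection.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare in constant time to avoid leaking information through timing."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), received_signature.encode())


seen_event_ids = set()  # stand-in for a durable event store


def accept_event(secret, body, signature, event_id):
    """Accept an event only if it is authentic and not seen before."""
    if not verify_signature(secret, body, signature):
        return "rejected"
    if event_id in seen_event_ids:
        return "duplicate"
    seen_event_ids.add(event_id)
    return "accepted"


secret = b"example-secret"
body = b'{"event_id": "evt_1", "type": "issue.closed"}'
sig = sign(secret, body)
print(accept_event(secret, body, sig, "evt_1"))    # accepted
print(accept_event(secret, body, sig, "evt_1"))    # duplicate
print(accept_event(secret, body, "bad", "evt_2"))  # rejected
```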

Method 6: Provider-Side Background Job Polling

Sometimes the thing being polled is the AI job itself.

The application starts a long-running provider job, stores the job ID, and checks later whether it has completed.

app
  -> start AI background job
  -> store provider job id
  -> poll status
  -> fetch result
  -> notify user

How It Runs

The assistant starts a job with the provider. The provider returns an ID. Your backend stores that ID and checks its status until the job succeeds, fails, expires, or times out.
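
A minimal sketch of that status check follows. get_status is a caller-supplied function standing in for a provider API call, and the status names are assumptions. The clock and sleep parameters are injectable so the loop can be tested without waiting.

```python
import time

TERMINAL = {"succeeded", "failed", "expired"}


def poll_job(get_status, job_id, timeout_seconds=600, interval_seconds=5,
             clock=time.monotonic, sleep=time.sleep):
    """Check a job until it reaches a terminal status or the timeout passes.

    get_status is a hypothetical callable that returns the provider's
    current status string for job_id.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            return "timed_out"
        sleep(interval_seconds)


# Simulated provider: running twice, then succeeded.
responses = iter(["running", "running", "succeeded"])
print(poll_job(lambda job_id: next(responses), "job_1", sleep=lambda s: None))  # succeeded
```

For jobs that run for hours, a blocking loop like this is usually replaced by scheduled checks that read the stored job ID, so a process restart does not lose track of the job.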

State Model

Your backend stores:

assistant_task_id
provider_job_id
user_id
status
created_at
last_checked_at
expires_at
result_ref

The provider stores the temporary job state and output.

If the output matters, copy it into your own durable storage as soon as the job completes. Provider-side result storage often has a short retention window and is not a substitute for a proper archive in your own system.

Best Fit

Use provider-side background job polling for:

Long AI research tasks.
Large document processing.
Codebase analysis.
Report generation.
Data extraction jobs.
Tasks that exceed normal HTTP request timeouts.

Weaknesses

This pattern solves one problem: waiting for a long provider job. It does not replace your workflow engine, scheduler, queue, or business state store.

Method 7: Durable Workflow Engine

A durable workflow engine manages long-running execution, timers, retries, and recovery. Temporal is a common choice for Go and Python-based assistant backends; for a full implementation guide, see Implementing Workflow Applications with Temporal in Go.

Instead of manually wiring every wait and retry, you model the process as a workflow.

workflow engine
  -> activity: check source
  -> timer: wait
  -> activity: evaluate result
  -> activity: notify user

How It Runs

The workflow starts once and then controls its own waiting. It can sleep for minutes, days, or weeks. If the worker process crashes, the workflow engine can resume from the recorded state.

State Model

The workflow engine stores:

workflow_id
execution history
timer state
activity attempts
retry policy
current workflow state

Your application database stores:

user-facing poll definition
authorization references
business records
notification records

The workflow engine owns process state — execution history, timers, retries, and activity attempts. Your database owns business state — user configurations, authorization records, notifications, and audit logs. Keeping these separate prevents each layer from becoming a confused hybrid of both.

Best Fit

Use durable workflows for:

Multi-step business processes.
Long-running automations.
Human approval flows.
Reliable retries.
Auditable background work.
Processes that must resume after failure.

Weaknesses

Workflow engines add concepts and infrastructure. They are excellent when the process is important, but heavy for simple hourly checks.

Method 8: Persistent Agent Runtime

Some agent frameworks can persist agent state, checkpoint execution, and resume later.

This is useful when the agent itself has a multi-step reasoning process.

scheduler or workflow
  -> agent runtime
  -> load checkpoint
  -> call tools
  -> save checkpoint
  -> resume later

How It Runs

An external scheduler or workflow starts the agent. The agent runtime loads previous state, runs the next step, calls tools if needed, and writes a checkpoint.

The agent runtime should not be your only scheduler. It is better treated as the reasoning layer inside a larger backend architecture.

State Model

Agent checkpoint storage contains:

current node
messages
tool outputs
intermediate reasoning state
pending action

Long-term memory contains:

stable user preferences
facts
project context
source references

Operational state still belongs elsewhere:

poll schedule
cursor
status
retry count
dedupe records

A useful rule: memory is not a cursor, and a checkpoint is not a queue. Agent memory stores what the model knows; operational state tracks where the process is and what it has done. Conflating the two leads to subtle bugs that only appear under concurrency or after a restart. The full design space for working memory, durable state, and retrieval layers is covered in Memory Systems in AI Assistants.

Best Fit

Use a persistent agent runtime for:

Multi-step research.
Agents that pause and resume.
Human-in-the-loop work.
Tool-heavy reasoning.
Tasks where context accumulates over time.

Weaknesses

Agent persistence is not the same as operational reliability. You still need scheduling, locking, retries, rate limits, and audit logs.

Method 9: Database Sync Plus Change Evaluation

In this pattern, polling is used to sync external data into your own database. The assistant then reacts to local database changes rather than querying external APIs directly on every evaluation cycle.

sync poller
  -> external API
  -> local database
  -> change evaluator
  -> assistant action

This separates data synchronization from assistant intelligence. The sync worker is responsible for keeping local records current; the evaluator is responsible for deciding what to do about changes. Each layer can be tested, monitored, and scaled independently.

How It Runs

The sync worker periodically fetches external changes and writes normalized records into your database. A second worker or change stream detects updated rows and decides whether the assistant should act.

State Model

The sync table stores:

external_id
source_type
raw_payload
normalized_fields
external_updated_at
synced_at
version
content_hash

The sync state stores:

source_cursor
last_sync_at
rate_limit_status
failure_count

The assistant evaluation table stores:

object_id
evaluation_status
last_evaluated_hash
decision
notification_id

Best Fit

Use this pattern for:

CRM sync.
Ticketing systems.
Accounting documents.
Product inventory.
Compliance review.
Search indexing.
Internal dashboards.

Weaknesses

Syncing everything can be expensive and unnecessary. It may also create privacy and retention obligations. Use this pattern when local data has value beyond a single assistant action.

Method 10: Adaptive Polling

Adaptive polling changes frequency based on state, urgency, or recent activity.

active object: poll every 1 minute
waiting object: poll every 1 hour
stale object: poll once per day
completed object: stop polling

How It Runs

After each run, the worker decides when the next run should happen.

If the object changed recently, poll sooner. If nothing has changed for a long time, slow down. If the task is complete, stop.
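
One simple way to express this rule is to halve the interval after a change and double it after a quiet run, clamped between a minimum and a maximum. The factors and bounds below are assumptions for illustration.

```python
def next_interval(current, changed, minimum=60, maximum=86400):
    """Halve the interval after a change, double it after a quiet run."""
    proposed = current // 2 if changed else current * 2
    return max(minimum, min(maximum, proposed))


print(next_interval(3600, changed=True))    # 1800
print(next_interval(3600, changed=False))   # 7200
print(next_interval(60, changed=True))      # 60 (clamped to minimum)
print(next_interval(86400, changed=False))  # 86400 (clamped to maximum)
```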

State Model

The poll state includes:

current_interval
minimum_interval
maximum_interval
backoff_policy
last_activity_at
priority
stop_condition

The source snapshot includes:

status
updated_at
activity_level
expected_next_change

Best Fit

Use adaptive polling for:

Deployment status.
Delivery tracking.
Calendar slot availability.
Price monitoring.
Build jobs.
Long-running provider tasks.
Any source with bursty updates.

Weaknesses

Adaptive polling can be harder to reason about. If a task must run at a strict time, keep it strict. Do not make compliance jobs clever.

Method 11: Semantic Polling With an LLM Evaluator

Semantic polling is used when the condition is fuzzy.

Code can answer:

Is status equal to Complete?
Is price below 100?
Is there a new message?

An LLM can help answer:

Does this email sound urgent?
Is this customer likely unhappy?
Is this research paper relevant?
Does this change require my attention?

How It Runs

The worker first applies cheap deterministic filters. Only candidate items go to the LLM.

new item?
matches source filters?
not already processed?
not obviously irrelevant?

Then the LLM evaluates the smaller candidate set and returns structured output.

{
  "should_notify": true,
  "urgency": "high",
  "reason": "The customer reports a production outage."
}
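
The two-stage flow can be sketched as follows. call_llm is a caller-supplied function standing in for a real model call, and the allowed urgency values are an assumption. The key point is that the model's reply is validated before anything acts on it.

```python
import json

ALLOWED_URGENCY = {"low", "medium", "high"}


def passes_filters(item, processed_ids):
    """Cheap deterministic checks that run before any model call."""
    return item["id"] not in processed_ids and bool(item["text"].strip())


def parse_decision(raw):
    """Validate the model's JSON reply; return None if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if not isinstance(data.get("should_notify"), bool):
        return None
    if data.get("urgency") not in ALLOWED_URGENCY:
        return None
    if not isinstance(data.get("reason"), str):
        return None
    return data


def evaluate(items, processed_ids, call_llm):
    """Return (item_id, decision) pairs for candidates with valid replies."""
    results = []
    for item in items:
        if not passes_filters(item, processed_ids):
            continue  # never reaches the model
        decision = parse_decision(call_llm(item["text"]))
        if decision is not None:
            results.append((item["id"], decision))
    return results


def fake_llm(text):
    """Stand-in for a real model call."""
    return json.dumps({"should_notify": "outage" in text, "urgency": "high", "reason": "demo"})


items = [{"id": "m1", "text": "Production outage reported"}, {"id": "m2", "text": "  "}]
print(evaluate(items, processed_ids={"m0"}, call_llm=fake_llm))
```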

State Model

The poll definition stores:

semantic_condition
examples
negative_examples
user_preference_summary
model_config

The evaluation log stores:

input_reference
model
prompt_version
structured_output
confidence
cost
latency

The poll state stores:

last_seen_ids
last_evaluated_hashes
last_decision
last_decision_reason

Best Fit

Use semantic polling for:

Important email detection.
Customer sentiment monitoring.
Research alerts.
Sales opportunity detection.
Security triage.
Executive briefings.

Weaknesses

LLM calls cost money and add latency. They can also be inconsistent if prompts and schemas are loose. Use deterministic filters first. Ask the model only when judgment is actually needed.

Decision Table: Choosing a Polling Agent Method

Method	Best Application	Pros	Cons
Scheduled polling worker	Simple recurring assistant tasks	Easy to build, easy to debug, minimal infrastructure	Limited scaling, basic retries, can overload workers if many polls fire together
Queue-based polling workers	Production SaaS assistants with many users	Scalable, resilient, supports retries and backpressure	Requires queue infrastructure, idempotency, dead letter handling
External tool as task queue	Notion, Jira, Linear, Trello based task execution	Human-friendly, easy to inspect, works with existing workflows	External tools are not perfect queues, atomic claim may be difficult
Long-running worker loop	Prototypes and internal tools	Very simple, fast to implement, few moving parts	Weak reliability, poor multi-replica behavior, limited operational control
Webhook-first with polling fallback	Event-driven integrations	Fast reaction, fewer API calls, reconciliation catches missed events	Needs public endpoint, event validation, dedupe, provider webhook support
Provider-side background job polling	Long-running AI provider jobs	Handles slow AI tasks, simple status model, good for async UX	Only manages provider job status, not full business workflow
Durable workflow engine	Long-running multi-step processes	Strong retries, timers, audit history, recovery after crashes	More infrastructure and concepts, heavy for simple polling
Persistent agent runtime	Multi-step reasoning agents	Preserves agent context, supports pause and resume, good for tool-heavy tasks	Not a scheduler or queue replacement, still needs operational backend
Database sync plus change evaluation	Systems where external data has local value	Clean separation, local reporting, fewer repeated external calls	More storage, more sync complexity, possible privacy and retention concerns
Adaptive polling	Bursty sources or variable urgency tasks	Reduces cost, respects rate limits, reacts faster when activity is high	Harder to reason about, not ideal for strict schedules
Semantic polling with LLM evaluator	Fuzzy conditions requiring judgment	Handles natural language intent, useful summaries, flexible decisions	Cost, latency, prompt quality risk, should not replace simple code checks

Recommended Default Architecture

For most production AI assistants, start with this:

polls table
  -> scheduler
  -> queue
  -> stateless workers
  -> deterministic filters
  -> optional LLM evaluator
  -> notification or assistant action

A minimal schema:

CREATE TABLE polls (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    source_type TEXT NOT NULL,
    source_ref TEXT NOT NULL,
    condition_text TEXT NOT NULL,
    schedule_type TEXT NOT NULL,
    interval_seconds INTEGER,
    timezone TEXT,
    next_run_at TIMESTAMP NOT NULL,
    last_run_at TIMESTAMP,
    cursor_value TEXT,
    last_hash TEXT,
    status TEXT NOT NULL,
    failure_count INTEGER NOT NULL DEFAULT 0,
    last_error TEXT,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL
);

CREATE TABLE poll_runs (
    id TEXT PRIMARY KEY,
    poll_id TEXT NOT NULL REFERENCES polls(id),
    started_at TIMESTAMP NOT NULL,
    finished_at TIMESTAMP,
    status TEXT NOT NULL,
    items_checked INTEGER,
    items_matched INTEGER,
    decision_summary TEXT,
    error TEXT
);

CREATE TABLE notifications (
    id TEXT PRIMARY KEY,
    poll_id TEXT NOT NULL REFERENCES polls(id),
    user_id TEXT NOT NULL,
    dedupe_key TEXT NOT NULL,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    delivered_at TIMESTAMP,
    UNIQUE (dedupe_key)
);

This gives you a clean separation:

scheduler owns time
queue owns buffering
worker owns execution
database owns state
LLM owns semantic judgment
assistant owns user interaction

That separation is the heart of a reliable polling agent.
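
To show how the UNIQUE (dedupe_key) constraint does the suppression work, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. record_notification is a hypothetical helper; a production system would use its own database driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE notifications (
        id TEXT PRIMARY KEY,
        poll_id TEXT NOT NULL,
        user_id TEXT NOT NULL,
        dedupe_key TEXT NOT NULL,
        title TEXT NOT NULL,
        body TEXT NOT NULL,
        delivered_at TIMESTAMP,
        UNIQUE (dedupe_key)
    )
    """
)


def record_notification(conn, notification_id, poll_id, user_id, dedupe_key, title, body):
    """Insert the row; return False if a uniqueness constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO notifications (id, poll_id, user_id, dedupe_key, title, body) "
                "VALUES (?, ?, ?, ?, ?, ?)",
                (notification_id, poll_id, user_id, dedupe_key, title, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False


key = "user_456:poll_123:notion_page_999:execute:v1"
print(record_notification(conn, "n1", "poll_123", "user_456", key, "Task done", "..."))  # True
print(record_notification(conn, "n2", "poll_123", "user_456", key, "Task done", "..."))  # False
```

Letting the database enforce uniqueness avoids the race in a separate "check, then insert" step when two workers act at the same time.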

Example: Hermes Agent Processing Notion Tasks

Now let us apply the architecture to a concrete case.

Assume a Notion database contains tasks. Hermes should run every 10 minutes, take one task in Todo state, set it to InProgress, execute it, and then mark it Complete.

This is best described as:

external tool as task queue
+
scheduled polling worker
+
claim or lease based execution

For a production version, it becomes:

queue-based polling with Notion as the human-facing task inbox

Notion Task Properties

The Notion database should contain fields like:

Name
Status: Todo | InProgress | Complete | Failed
Priority
CreatedAt
ClaimedBy
ClaimedAt
ClaimExpiresAt
RunId
RetryCount
LastError
CompletedAt

The important fields are ClaimedAt, ClaimExpiresAt, and RunId. They make the task claim visible and recoverable.

Hermes Execution State

Hermes should also keep its own execution record:

run_id
notion_page_id
started_at
finished_at
status
input_snapshot
tool_calls
result_summary
error
idempotency_key

This protects you if Notion is edited manually, if an API call fails, or if you need to audit what Hermes actually did.

Execution Flow

Every 10 minutes:
  Hermes scheduler creates a run

Hermes worker:
  finds Notion tasks where Status = Todo
  sorts by Priority and CreatedAt and takes the first one
  claims the task by setting Status = InProgress
  writes ClaimedBy, ClaimedAt, ClaimExpiresAt, and RunId
  executes the task
  writes execution logs to the Hermes backend
  sets Notion Status = Complete on success
  sets Notion Status = Failed on failure

If Hermes crashes after claiming a task, the lease can expire:

Status = InProgress
ClaimExpiresAt < now

A future run can then recover the task or mark it as failed.

Failure Handling

On success:

Status = Complete
CompletedAt = now
LastError = empty

On recoverable failure:

Status = Todo
RetryCount = RetryCount + 1
LastError = short error message

On non-recoverable failure:

Status = Failed
LastError = clear explanation

For safety, Hermes should also use an idempotency key:

notion_page_id + task_version + action_type

This prevents the same task from being executed twice if a retry happens at the wrong time.
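
The three outcomes above can be sketched as state transitions on a task record. The outcome labels and the max_retries policy are assumptions added for illustration: once retries are exhausted, a recoverable failure is treated as Failed.

```python
def apply_outcome(task, outcome, now, error_message="", max_retries=3):
    """Update a task dict after a run.

    outcome is "success", "recoverable", or "fatal". max_retries is an
    assumed policy, not something the task schema requires.
    """
    if outcome == "success":
        task.update(Status="Complete", CompletedAt=now, LastError="")
    elif outcome == "recoverable" and task.get("RetryCount", 0) < max_retries:
        task.update(Status="Todo", RetryCount=task.get("RetryCount", 0) + 1,
                    LastError=error_message)
    else:
        task.update(Status="Failed", LastError=error_message)
    return task


task = {"Status": "InProgress", "RetryCount": 0}
apply_outcome(task, "recoverable", now="2026-06-19T01:05:00Z", error_message="rate limited")
print(task["Status"], task["RetryCount"])  # Todo 1
```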

Why This Is Not Just Polling

The polling part is only the wake-up mechanism. The real architecture is task claiming and reliable execution.

A naive implementation says:

Every 10 minutes, find a Todo task and do it.

A reliable implementation says:

Every 10 minutes, claim exactly one eligible task, record the run, execute idempotently, and move the task to a terminal state.

That is the difference between a demo and an agent you can trust.

Common Polling Agent Mistakes

Mistake 1: No Claim Protocol

If two workers can see the same task, they can both execute it.

Use:

ClaimedBy
ClaimedAt
ClaimExpiresAt
RunId

Even if you currently run one worker, design as if a second worker might appear later.

Mistake 2: No Dedupe Key

Every external action should have a dedupe key.

user_id + poll_id + source_object_id + action_type + condition_version

This prevents repeated notifications, repeated emails, repeated task execution, and repeated tool calls. The broader principles behind scoping, storing, and testing these keys apply equally here — see Idempotency in Distributed Systems That Actually Works.

Mistake 3: Calling the LLM Too Early

Do not ask the model to do database filtering.

Bad:

Send all tasks to the LLM and ask which one is Todo.

Better:

Use the Notion API filter to fetch Todo tasks.
Then use the LLM only if task interpretation is needed.

Mistake 4: Treating Notion as the Only Backend

Notion is a good human interface. It is not a complete execution backend.

Keep execution logs, retries, traces, and idempotency records in Hermes.

Mistake 5: Infinite Polling

Every poll should have a stop condition.

Examples:

stop after success
stop after date
stop after max retries
stop when user disables it
stop after repeated authorization failure

A polling agent without a stop condition is a quiet cost leak.
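
A stop check can be a small function that runs at the end of every poll. The field names below (succeeded, disabled_by_user, stop_after, max_retries, auth_failures) and the thresholds are hypothetical and would map onto whatever the poll record actually stores.

```python
from datetime import datetime, timezone


def stop_reason(poll, now):
    """Return why a poll should stop, or None if it should keep running."""
    if poll.get("succeeded"):
        return "success"
    if poll.get("disabled_by_user"):
        return "disabled"
    if poll.get("stop_after") is not None and now >= poll["stop_after"]:
        return "expired"
    if poll.get("failure_count", 0) >= poll.get("max_retries", 5):
        return "max_retries"
    if poll.get("auth_failures", 0) >= 3:
        return "authorization_failure"
    return None


now = datetime(2026, 6, 19, 1, 0, 0, tzinfo=timezone.utc)
print(stop_reason({"failure_count": 1}, now))  # None
print(stop_reason({"failure_count": 5}, now))  # max_retries
print(stop_reason({"stop_after": now}, now))   # expired
```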

Mistake 6: No Observability

You should be able to answer:

What did the agent run?
Why did it run?
What did it read?
What did it change?
Why did it fail?
Did it notify the user?
Did it run twice?

If you cannot answer those questions, the system is not ready for important work.

Observability Checklist

Track metrics such as:

polls_due
polls_started
polls_succeeded
polls_failed
tasks_claimed
tasks_completed
tasks_failed
claim_expired_count
duplicate_suppressed_count
llm_calls
llm_cost
rate_limit_count
average_run_duration

Log fields such as:

poll_id
run_id
source_type
source_object_id
claim_id
cursor_before
cursor_after
decision
dedupe_key
error

Build an admin view for:

active polls
stuck InProgress tasks
recent failures
high retry tasks
dead letter jobs
expensive LLM evaluations
disabled integrations

Polling agents run in the background, where failures are quiet and problems can compound before anyone notices. Background systems need visibility built in from the start, not added as an afterthought when something goes wrong. For the full observability stack for AI and LLM-backed systems — metrics, traces, structured logs, and SLOs — see Observability for LLM Systems: Metrics, Traces, Logs, and Testing in Production.

Final Recommendation

For a serious AI assistant, start with queue-based polling workers and a durable state store. Add webhooks where providers support them. Use adaptive polling when rate limits matter. Use a durable workflow engine when the process is long-running and multi-step. Use a persistent agent runtime when the agent needs to reason over time.

For the Hermes and Notion example, the right architecture is:

Notion as the human-facing task inbox
Hermes scheduler every 10 minutes
Hermes worker with claim or lease logic
Hermes backend for execution logs and idempotency
Notion status updates for visibility

The polling interval is not the hard part. The hard part is making sure the agent claims one task, runs it once, records what happened, and leaves the system in a state humans can understand.

The discipline around claiming work, recording it, and leaving the system in a state that humans and future runs can both understand is what turns a polling script into a reliable AI assistant. The interval and the model matter far less.

What Is the A2A Protocol? Agent Cards and Tasks Explained

Rost — Fri, 26 Jun 2026 10:33:45 +0000

The A2A Protocol, short for Agent2Agent Protocol, is an open standard for communication between independent AI agent systems.

That sentence sounds simple, but it implies something most AI agent demos skip entirely. Most demos still assume one assistant, one runtime, one tool loop, and one owner — the agent can search, call tools, write code, query APIs, maybe use MCP servers, and return an answer.

A2A is designed for a different world, one where agents may be built by different teams, frameworks, vendors, languages, or organizations. It assumes one agent may need to discover another agent, understand what it can do, send it work, exchange messages, receive files or structured outputs, and track a task until completion — making it not just another tool calling format, but a genuine attempt to make AI agents interoperable as peers.

The core concepts are:

Agent Cards
Agents and clients
Tasks
Messages
Parts
Artifacts
Task states
Streaming and asynchronous updates

This article explains those concepts in plain engineering terms, with enough detail to understand where A2A fits in real multi-agent systems.

The Short Definition

A2A is a protocol for agent-to-agent communication.

It lets one agent or client communicate with another agent through a common model. The receiving agent can describe its capabilities, accept work, manage the lifecycle of that work, ask for more input, stream progress, and return concrete outputs.

The point is not to standardize how an agent thinks internally — it is to standardize how agents talk at their boundaries.

An A2A agent might internally use:

Python
Go
JavaScript
LangGraph
CrewAI
Semantic Kernel
custom code
MCP servers
private APIs
vector databases
workflow engines

The caller does not need to know any of that. What the caller does need to know is:

What can this agent do?
How do I talk to it?
What input does it accept?
What output can it produce?
How do I track the work?
How do I receive the result?

Those six questions define the protocol boundary A2A is trying to establish between independently operating agents.

Why A2A Exists

AI systems are moving from single assistants to networks of specialist agents.

A company might have:

A support agent
A billing agent
A legal review agent
A DevOps agent
A data analysis agent
A research agent
A documentation agent
A code review agent

Each agent may have its own tools, permissions, domain knowledge, prompts, memory, retrieval system, and audit rules.

Without a shared protocol, every integration becomes custom — the support agent needs bespoke wiring to the billing agent, the billing agent needs its own to the legal agent, and the research agent needs yet another to the documentation agent. That combinatorial overhead does not scale well as the agent network grows.

A2A gives these agents a common way to interact, reducing the N×M integration problem to a single shared contract. The promise is not magic autonomy; the promise is interoperability.

A2A Is Not MCP

A2A is often compared with MCP, but they solve different problems.

MCP, or Model Context Protocol, is mainly about connecting an AI app or agent to tools, resources, and prompts, while A2A is mainly about connecting agents to other agents.

A useful mental model is:

MCP: agent to tool
A2A: agent to agent

For example, an agent may use MCP to access:

GitHub
a filesystem
a database
Slack
a documentation search system
a cloud API

Practical guides for building those MCP servers are available for Go and Python.

The same agent may use A2A to delegate work to:

a security review agent
a research agent
a planning agent
a compliance agent
a coding agent

The two protocols can and often do work together. A clean architecture is often:

A2A outside the agent boundary.
MCP inside the agent boundary.

That means other agents communicate with your agent using A2A, while your agent internally uses MCP to access tools — a clean separation of concerns that keeps the external interface stable regardless of what changes inside. For a detailed comparison of how the two protocols divide architectural responsibility and when you actually need both, see A2A vs MCP: Do AI Agents Really Need Both Protocols?
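One way to picture that split is an agent type whose exported method is the only thing callers see, while its tools sit behind an unexported field. This is a minimal sketch: `Tool`, `Agent`, `HandleTask`, and the stub search tool are hypothetical names that stand in for real A2A and MCP client libraries, which the sketch does not use.

```go
package main

import "fmt"

// Tool stands in for something the agent reaches over MCP (hypothetical interface).
type Tool interface {
	Call(input string) (string, error)
}

// fakeSearch is a stub tool so the sketch runs without any network access.
type fakeSearch struct{}

func (fakeSearch) Call(input string) (string, error) {
	return fmt.Sprintf("3 documents matched %q", input), nil
}

// Agent is the unit other agents talk to. Its tools are unexported, so
// callers cannot come to depend on them.
type Agent struct {
	name  string
	tools map[string]Tool
}

// HandleTask plays the role of the external, A2A-facing entry point.
func (a *Agent) HandleTask(request string) (string, error) {
	tool, ok := a.tools["search"]
	if !ok {
		return "", fmt.Errorf("%s: no search tool configured", a.name)
	}
	result, err := tool.Call(request)
	if err != nil {
		return "", fmt.Errorf("%s: search failed: %w", a.name, err)
	}
	return a.name + " summary: " + result, nil
}

func main() {
	agent := &Agent{name: "research-agent", tools: map[string]Tool{"search": fakeSearch{}}}
	out, err := agent.HandleTask("vector databases")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Swapping `fakeSearch` for a different tool changes nothing for the caller, which is the point of keeping the tool layer inside the boundary.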

Core Roles In A2A

A2A uses a simple role model built around two parties: an agent that exposes capabilities, and a client that wants to use them.

The client might be:

another agent
an orchestrator
an assistant application
a workflow system
a gateway
a test harness
a human-facing app

The agent might be:

a specialist AI service
a domain assistant
a workflow-owning agent
a remote vendor agent
an internal enterprise agent

The important thing is that the agent is not just a function. It owns some capability and exposes it through an agent interface.

Agent Cards

The Agent Card is one of the most important concepts in A2A.

An Agent Card describes an agent — it is the discovery document that tells clients what the agent is, what it can do, how to communicate with it, and what constraints apply.

Think of an Agent Card as a mix of:

service metadata
capability declaration
API discovery document
agent profile
contract surface

A typical Agent Card can describe things such as:

agent name
description
service endpoint
supported protocol features
supported input and output modes
available skills
authentication requirements
provider information
version information
documentation links
optional metadata

The Agent Card is important because agents should not need hardcoded knowledge of every other agent.
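As a rough illustration, the fields listed above could be modelled as a Go struct and validated on load. The field names and JSON keys below are assumptions made for this sketch, not the official A2A schema, and the endpoint URL is a placeholder; the specification remains the authority on the real shape.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Skill is a hypothetical description of one advertised capability.
type Skill struct {
	ID          string   `json:"id"`
	Description string   `json:"description"`
	InputModes  []string `json:"inputModes"`
	OutputModes []string `json:"outputModes"`
}

// AgentCard is an illustrative shape, not the official schema.
type AgentCard struct {
	Name        string   `json:"name"`
	Description string   `json:"description"`
	Endpoint    string   `json:"endpoint"`
	Version     string   `json:"version"`
	Streaming   bool     `json:"streaming"`
	AuthSchemes []string `json:"authSchemes"`
	Skills      []Skill  `json:"skills"`
}

// ParseCard decodes a card and rejects one that lacks the fields a client
// needs before it can route any work.
func ParseCard(data []byte) (AgentCard, error) {
	var card AgentCard
	if err := json.Unmarshal(data, &card); err != nil {
		return AgentCard{}, fmt.Errorf("decode agent card: %w", err)
	}
	if card.Name == "" || card.Endpoint == "" {
		return AgentCard{}, errors.New("agent card needs a name and an endpoint")
	}
	return card, nil
}

func main() {
	raw := []byte(`{
		"name": "invoice-analyst",
		"description": "Summarizes SaaS invoice data",
		"endpoint": "https://agents.example.internal/invoice",
		"version": "1.2.0",
		"streaming": true,
		"authSchemes": ["bearer"],
		"skills": [{"id": "monthly-spend", "description": "Monthly spend summary",
			"inputModes": ["text/csv"], "outputModes": ["text/markdown"]}]
	}`)
	card, err := ParseCard(raw)
	if err != nil {
		fmt.Println("invalid card:", err)
		return
	}
	fmt.Printf("%s v%s exposes %d skill(s)\n", card.Name, card.Version, len(card.Skills))
}
```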

A client can inspect the card and decide:

Is this the right agent for the job?
Does it support the content type I need?
Does it support streaming?
Does it require authentication?
What skills does it advertise?
Can it return the kind of artifact I need?

In practical systems, Agent Cards become the foundation for agent registries, developer portals, and internal agent catalogs — the machine-readable equivalent of a service directory where clients can look up what is available before committing to an integration.

Agent Cards Are Capability Boundaries

An Agent Card should not be treated as marketing text — it is a capability boundary that other systems will rely on at runtime.

If your agent card says your agent can perform financial analysis, clients may start delegating financial analysis work to it. If it says the agent accepts files, clients may send files. If it says the agent supports streaming, clients may expect progress events.

Bad Agent Cards create bad systems because routing decisions and capability assumptions cascade through the whole agent network. A useful Agent Card should be:

specific
accurate
stable
versioned
security-aware
honest about limitations

A vague skill such as "does business tasks" is not helpful.

A better skill is:

Analyze SaaS invoice data and produce a monthly spend summary.

Even better, include expected input and output modes.

Input: CSV or JSON invoice records.
Output: Markdown summary and structured JSON totals.

The more precise the Agent Card, the easier it is for other agents to route tasks correctly.
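To make the routing benefit concrete, here is a small sketch of a client matching skills by the media types it has and needs. The `Skill` shape and `MatchSkill` helper are assumptions for illustration; a real router would likely weigh more than the first match.

```go
package main

import "fmt"

// Skill pairs a precise description with the media types the skill accepts
// and produces. The shape is an assumption for this sketch.
type Skill struct {
	Description string
	InputModes  []string
	OutputModes []string
}

func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want {
			return true
		}
	}
	return false
}

// MatchSkill returns the first skill that accepts the input type the client
// has and produces the output type it needs.
func MatchSkill(skills []Skill, have, need string) (Skill, bool) {
	for _, s := range skills {
		if contains(s.InputModes, have) && contains(s.OutputModes, need) {
			return s, true
		}
	}
	return Skill{}, false
}

func main() {
	skills := []Skill{
		{Description: "does business tasks"}, // vague: advertises no modes, so it never matches
		{
			Description: "Analyze SaaS invoice data and produce a monthly spend summary.",
			InputModes:  []string{"text/csv", "application/json"},
			OutputModes: []string{"text/markdown", "application/json"},
		},
	}
	if s, ok := MatchSkill(skills, "text/csv", "text/markdown"); ok {
		fmt.Println("route to:", s.Description)
	}
	if _, ok := MatchSkill(skills, "image/png", "text/markdown"); !ok {
		fmt.Println("no skill accepts image/png")
	}
}
```

The vague skill is not wrong so much as unroutable: with no declared modes, no mechanical check can ever select it.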

Agent Discovery

Agent discovery is the process of finding an Agent Card.

In simple deployments, discovery may be static. A client already knows the URL of a specific agent.

In larger deployments, discovery may involve:

a registry
a developer portal
an internal catalog
DNS-based discovery
configuration management
environment-specific routing
tenant-aware gateways

The important design choice is whether discovery is public, private, or permissioned.

Not every agent should be discoverable by everyone — an internal payroll agent should not expose the same Agent Card to every caller, and a partner agent may see only partner-safe skills. Agent discovery is not just a convenience feature; it is part of your security and governance model, and scoping visibility is a first-class design decision.
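Scoped visibility can be sketched as a filter applied before a card is served. The `Audience` levels and the `MinAudience` field are hypothetical; in a real system the caller's level would come from whatever authentication the deployment already uses.

```go
package main

import "fmt"

// Audience is a hypothetical trust level attached to an authenticated caller.
type Audience int

const (
	Public Audience = iota
	Partner
	Internal
)

// Skill carries the lowest trust level allowed to discover it.
type Skill struct {
	Name        string
	MinAudience Audience
}

// CardFor returns only the skills the caller is allowed to discover.
func CardFor(caller Audience, all []Skill) []Skill {
	visible := make([]Skill, 0, len(all))
	for _, s := range all {
		if caller >= s.MinAudience {
			visible = append(visible, s)
		}
	}
	return visible
}

func main() {
	skills := []Skill{
		{Name: "public-holiday-calendar", MinAudience: Public},
		{Name: "contractor-invoice-status", MinAudience: Partner},
		{Name: "payroll-run", MinAudience: Internal},
	}
	for _, caller := range []Audience{Public, Partner, Internal} {
		fmt.Printf("audience %d sees %d skill(s)\n", caller, len(CardFor(caller, skills)))
	}
}
```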

Tasks

A Task represents work being performed by an agent.

This is where A2A becomes more interesting than simple request and response APIs.

Some agent interactions are quick. A client sends a message, and the agent returns a direct response.

But many real agent workflows are not instant.

A task might involve:

searching multiple sources
asking for clarification
calling tools
delegating work
waiting for approval
generating a report
producing files
streaming progress
handling retries
returning multiple artifacts

A2A models this kind of work as a Task — giving the work an identity and a lifecycle, which matters because long-running agent work needs to be tracked, inspected, and potentially canceled or retried.

Task Lifecycle

A task can move through different states.

The exact state model depends on the protocol version and implementation, but the basic idea is straightforward:

submitted
working
input required
completed
failed
canceled
rejected

The important point is that a task is not just a response payload — it is an ongoing unit of work with its own state that a client can query at any time. A client can use the task state to understand what is happening:

Has the agent accepted the task?
Is it still working?
Does it need more input?
Did it finish successfully?
Did it fail?
Was it canceled?
Are there artifacts available?

This is especially useful for workflows that take seconds, minutes, or longer.

For example, a research agent may return a task immediately, then continue working in the background while streaming progress events or making the result available later.
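The lifecycle above maps naturally onto a small state type. In this sketch the state strings and, in particular, the transition table are simplifying assumptions for illustration; the exact set of states and legal transitions is defined by the protocol version you target.

```go
package main

import "fmt"

// TaskState names the lifecycle states listed above.
type TaskState string

const (
	Submitted     TaskState = "submitted"
	Working       TaskState = "working"
	InputRequired TaskState = "input-required"
	Completed     TaskState = "completed"
	Failed        TaskState = "failed"
	Canceled      TaskState = "canceled"
	Rejected      TaskState = "rejected"
)

// transitions is an assumed, simplified table, not taken from the specification.
var transitions = map[TaskState][]TaskState{
	Submitted:     {Working, Rejected, Canceled},
	Working:       {InputRequired, Completed, Failed, Canceled},
	InputRequired: {Working, Canceled},
}

// Terminal reports whether the task has finished and a client can stop polling.
func (s TaskState) Terminal() bool {
	switch s {
	case Completed, Failed, Canceled, Rejected:
		return true
	}
	return false
}

// CanTransition reports whether moving from one state to another is allowed.
func CanTransition(from, to TaskState) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	path := []TaskState{Submitted, Working, InputRequired, Working, Completed}
	for i := 1; i < len(path); i++ {
		fmt.Printf("%s -> %s allowed=%v\n", path[i-1], path[i], CanTransition(path[i-1], path[i]))
	}
	fmt.Println("completed -> working allowed:", CanTransition(Completed, Working))
}
```

A client that polls for status can use `Terminal` as its stop condition, and an agent can use `CanTransition` to refuse updates that would reopen a finished task.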

Stateless Message Or Stateful Task

A2A supports both simple and complex interactions.

For a simple interaction, an agent may return a direct Message; for a complex interaction, it may return a Task. This distinction matters because not everything needs task tracking, and over-engineering short interactions into full task workflows adds unnecessary overhead.

If a client asks:

Summarize this one paragraph.

A direct response may be enough.

If a client asks:

Research the top five open source vector databases, compare them, and produce a migration recommendation.

A task is more appropriate.

The practical rule is straightforward: use a direct Message for simple, immediate interactions, and use a Task for long-running, stateful, auditable, or artifact-producing work.
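The rule of thumb above can be sketched as a response type with exactly two variants. The `Request` hints, the fixed task ID, and the function names are all hypothetical; the sketch only shows how an agent might pick a response kind and how a client might branch on it.

```go
package main

import "fmt"

// Response is a closed set: an agent answers with either a Message or a Task.
type Response interface{ isResponse() }

// Message is a direct, stateless answer.
type Message struct{ Text string }

// Task is a handle to tracked work.
type Task struct {
	ID    string
	State string
}

func (Message) isResponse() {}
func (Task) isResponse()    {}

// Request carries hypothetical hints the agent uses to choose a response kind.
type Request struct {
	Text             string
	LongRunning      bool
	ProducesArtifact bool
}

// Respond returns a Task for long-running or artifact-producing work and a
// direct Message otherwise. The task ID is a placeholder.
func Respond(req Request) Response {
	if req.LongRunning || req.ProducesArtifact {
		return Task{ID: "task-1", State: "submitted"}
	}
	return Message{Text: "done: " + req.Text}
}

// Describe shows how a client can branch on the response kind.
func Describe(r Response) string {
	switch v := r.(type) {
	case Message:
		return "message: " + v.Text
	case Task:
		return "task " + v.ID + " (" + v.State + ")"
	default:
		return "unknown response"
	}
}

func main() {
	fmt.Println(Describe(Respond(Request{Text: "Summarize this one paragraph."})))
	fmt.Println(Describe(Respond(Request{
		Text:             "Compare five vector databases.",
		LongRunning:      true,
		ProducesArtifact: true,
	})))
}
```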

Messages

Messages are the communication units exchanged between client and agent.

A message can contain one or more parts.

A message may represent:

a user request
an agent response
a clarification question
additional input
task-related communication
progress context
structured instructions

Messages are not just strings — agent communication often needs to carry far more than plain text, and the message structure is designed to accommodate that.

A message might include:

text
files
structured JSON
images
references
metadata

The message is the envelope; the parts are the actual typed content inside it.

Parts

A Part is a piece of content inside a message or artifact.

This is how A2A supports multimodal and structured communication.

A part may contain different content types, such as:

text
file data
structured data
binary content by reference
JSON-like data

A part can also include metadata such as:

media type
filename
additional context

The media type matters because it tells the receiving agent how to interpret the content.

For example:

text/plain
application/json
text/markdown
image/png
application/pdf
text/csv

This is one of the underrated parts of A2A. Agent communication should not collapse everything into plain text — if a downstream agent needs a spreadsheet, image, JSON payload, log file, or PDF, the protocol should preserve that content as content rather than mangle it into a paragraph. Good agent systems avoid these unnecessary text bottlenecks by letting each part carry its natural media type all the way to the consumer.
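A short sketch shows why the media type is worth carrying. The `Part` and `Message` shapes here are assumptions for illustration, and the invoice rows are made-up sample data; the point is that the receiver can hand a `text/csv` part straight to a CSV parser.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// Part is an illustrative shape: typed content plus optional metadata.
type Part struct {
	MediaType string
	Filename  string
	Content   string
}

// Message is the envelope; Parts carry the typed content.
type Message struct {
	Role  string
	Parts []Part
}

// FirstPart returns the first part with the given media type.
func FirstPart(m Message, mediaType string) (Part, bool) {
	for _, p := range m.Parts {
		if p.MediaType == mediaType {
			return p, true
		}
	}
	return Part{}, false
}

// InvoiceRows parses the text/csv part. This only works because the CSV
// arrived as CSV rather than as a prose description of a spreadsheet.
func InvoiceRows(m Message) ([][]string, error) {
	part, ok := FirstPart(m, "text/csv")
	if !ok {
		return nil, errors.New("message has no text/csv part")
	}
	return csv.NewReader(strings.NewReader(part.Content)).ReadAll()
}

func main() {
	msg := Message{
		Role: "user",
		Parts: []Part{
			{MediaType: "text/plain", Content: "Summarize last month's spend."},
			{MediaType: "text/csv", Filename: "invoices.csv", Content: "vendor,amount\nAcme,120.50\nGlobex,80.00\n"},
		},
	}
	rows, err := InvoiceRows(msg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("parsed %d rows including the header\n", len(rows))
}
```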

Artifacts

Artifacts are concrete outputs produced by an agent during task processing.

This is different from a general message: a message is communication between agents, whereas an artifact is a concrete deliverable the task has produced.

Examples of artifacts include:

a markdown report
a JSON analysis result
a CSV export
a generated image
a PDF document
a code patch
a test result file
a deployment plan
a diagram
a data extract

This distinction is useful in practice. When a research agent says "I found the answer", that is a message. When it returns market-analysis.md, sources.json, and risk-summary.csv, those are artifacts: concrete outputs that make the task's work inspectable, reusable, and composable. One agent's artifact can then become another agent's input without first being flattened into prose.

Messages vs Artifacts

A simple way to think about it:

Messages are conversation.
Artifacts are output.

Messages help agents coordinate; artifacts are what the task actually produced.

For example, in a software development workflow:

The client sends a message asking for a bug fix.
The coding agent sends messages with clarification questions.
The coding agent works on the task.
The agent returns artifacts such as a patch file, test output, and explanation.

This separation is helpful because it avoids mixing task coordination with deliverables, making it much easier to log, audit, and pass outputs to downstream consumers.

A Practical Example

Imagine a primary assistant needs help from a documentation agent.

The user asks:

Create developer documentation for our new billing webhook API.

The primary assistant checks an agent registry and finds a documentation agent.

The documentation agent has an Agent Card that says it can:

write API documentation
accept OpenAPI specs
accept Markdown style guides
produce Markdown docs
produce examples in Python and JavaScript
support long-running tasks
return artifacts

The primary assistant sends a message with:

a short instruction
an OpenAPI file
a style guide
metadata about the target audience

The documentation agent creates a Task.

The task enters a working state.

The documentation agent may send messages such as:

I am extracting endpoint descriptions.

Then:

I need clarification on authentication examples.

The primary assistant provides the missing input.

The task continues.

Finally, the documentation agent returns artifacts:

billing-webhooks.md
billing-webhook-examples-python.md
billing-webhook-examples-javascript.md

That is the A2A model in action: not just "call this function" but "delegate this task to another agent, communicate as needed, and track the result through to completion."
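The walk-through above can be replayed against a toy task record. This is a sketch, not a protocol client: the `Task` fields, method names, and state strings are assumptions, and the client's answer text is invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a toy record of the exchange; the field names are assumptions.
type Task struct {
	ID        string
	State     string
	Messages  []string
	Artifacts []string
}

func (t *Task) say(from, text string) {
	t.Messages = append(t.Messages, from+": "+text)
}

// Progress records a status message while the agent is working.
func (t *Task) Progress(text string) {
	t.State = "working"
	t.say("agent", text)
}

// AskForInput pauses the task until the client answers.
func (t *Task) AskForInput(question string) {
	t.State = "input-required"
	t.say("agent", question)
}

// ProvideInput resumes a paused task and rejects input nobody asked for.
func (t *Task) ProvideInput(answer string) error {
	if t.State != "input-required" {
		return errors.New("task is not waiting for input")
	}
	t.say("client", answer)
	t.State = "working"
	return nil
}

// Complete attaches the deliverables and closes the task.
func (t *Task) Complete(artifacts ...string) {
	t.Artifacts = append(t.Artifacts, artifacts...)
	t.State = "completed"
}

func main() {
	task := &Task{ID: "docs-42", State: "submitted"}
	task.say("client", "Create developer documentation for our new billing webhook API.")
	task.Progress("I am extracting endpoint descriptions.")
	task.AskForInput("I need clarification on authentication examples.")
	// The answer below is invented for the sketch.
	if err := task.ProvideInput("Use bearer tokens in every example."); err != nil {
		fmt.Println("error:", err)
		return
	}
	task.Complete("billing-webhooks.md", "billing-webhook-examples-python.md", "billing-webhook-examples-javascript.md")
	fmt.Println(task.State, "with", len(task.Messages), "messages and", len(task.Artifacts), "artifacts")
}
```

Keeping `Messages` and `Artifacts` in separate fields mirrors the split between coordination and deliverables: a downstream consumer can take the three files without reading the conversation.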

Why Tasks Matter For Real Systems

Tasks are what make A2A suitable for serious workflows.

A normal HTTP API call is often too thin for agent work. Agent tasks may involve uncertainty, multiple steps, intermediate results, and follow-up questions.

A Task gives you a place to attach:

status
history
messages
artifacts
errors
metadata
progress
cancellation
audit information

This is useful for:

research workflows
code generation
data analysis
compliance review
document production
incident investigation
multi-step planning
human approval workflows

Without a task model, developers usually rebuild this logic themselves with custom job IDs, queues, status endpoints, and webhook callbacks — A2A tries to standardize the agent-specific version of that pattern so you do not have to reinvent it for every new agent integration.

Streaming And Async Work

A2A supports the idea that agent work may be streaming or asynchronous.

Streaming is useful when the client wants live updates.

For example:

progress events
partial results
intermediate status
generated text
step updates

Async workflows are useful when the task may take a long time or the client cannot hold an open connection.

For example:

background research
large document generation
multi-agent review
data processing
human approval
batch analysis

In practice, a robust A2A system should be designed around three modes: immediate response for simple work, streaming for interactive long-running work, and async for durable background work that may outlive any single connection.

Agent Cards And Streaming Support

An Agent Card can advertise whether an agent supports streaming.

This matters because clients cannot assume every agent supports streaming — some agents may only support simple request and response, some may support task polling, and others may support push notifications or server-sent events. A good client inspects the Agent Card before choosing an interaction pattern, which is why Agent Cards are not just documentation: they directly shape runtime behavior.
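Here is a minimal sketch of a client choosing an interaction pattern from what a card advertises. The `Capabilities` fields and the order of preference are assumptions; a real client would read the capability flags the specification defines and may prefer a different fallback order.

```go
package main

import "fmt"

// Capabilities is a hypothetical subset of what an Agent Card advertises.
type Capabilities struct {
	Streaming         bool
	PushNotifications bool
}

// Mode is the interaction pattern the client will use.
type Mode string

const (
	Immediate Mode = "immediate"
	Stream    Mode = "stream"
	Push      Mode = "push"
	Poll      Mode = "poll"
)

// ChooseMode picks a pattern from the advertised capabilities and falls back
// to polling when nothing better is offered.
func ChooseMode(caps Capabilities, longRunning, canHoldConnection bool) Mode {
	switch {
	case !longRunning:
		return Immediate
	case caps.Streaming && canHoldConnection:
		return Stream
	case caps.PushNotifications:
		return Push
	default:
		return Poll
	}
}

func main() {
	card := Capabilities{Streaming: true}
	fmt.Println("short request:", ChooseMode(card, false, true))
	fmt.Println("long request, open connection:", ChooseMode(card, true, true))
	fmt.Println("long request, no open connection:", ChooseMode(card, true, false))
}
```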

A2A And Multimodal Agents

A2A is designed to support more than plain text.

That matters because real agent systems increasingly process mixed inputs and outputs:

text
images
audio
video
PDFs
spreadsheets
structured JSON
logs
code
diagrams

If every agent boundary converts everything into text, important information can be lost.

For example, a visual troubleshooting agent should receive an image as an image, not as a weak text description. A finance agent should receive structured spreadsheet data, not a copied paragraph. A code review agent should receive source files or diffs, not a vague summary.

Parts and media types are how A2A preserves richer content across agent boundaries — and this is one of the places where the protocol is more important than it first appears, because information loss at the boundary compounds across every hop in a multi-agent chain.

A2A Is Not An Agent Framework

A2A does not tell you how to build an agent.

It does not define:

reasoning strategy
planning algorithm
memory system
vector database
prompt template
model provider
tool framework
orchestration runtime
evaluation method

That is a feature, not a bug. A2A is a boundary protocol that lets different agent implementations communicate without requiring them to share the same internal architecture. HTTP does not tell you how to build a web application; it only defines how systems communicate. A2A should be understood the same way.

A2A Is Not A Replacement For APIs

A2A also does not replace every API.

If you have a deterministic service with a stable request and response contract, a normal API may be better.

For example:

currency conversion
address validation
invoice lookup
image resizing
search endpoint
feature flag lookup
internal CRUD service

These do not automatically become agents just because they are called by an AI system. A2A makes sense when the remote system genuinely behaves like an agent:

it owns a task
it may ask for more input
it may use tools internally
it may take time
it may produce artifacts
it has capabilities worth discovering
it can operate as a peer in a larger workflow

Do not use A2A just because it is fashionable — use it when the abstraction genuinely fits the problem.

Where A2A Fits In AI System Architecture

A2A fits best at the boundary between independently deployable agents.

A useful architecture might look like this:

User
  |
  v
Primary assistant
  |
  |-- A2A --> Research agent
  |-- A2A --> Coding agent
  |-- A2A --> Compliance agent
  |-- A2A --> Documentation agent

Each specialist agent may internally use tools:

Research agent
  |
  |-- MCP --> web search
  |-- MCP --> document store
  |-- MCP --> vector database

This gives you separate layers:

User interface layer
Agent coordination layer
Tool integration layer
Data and execution layer

A2A lives in the agent coordination layer, MCP often lives in the tool integration layer, and normal APIs, queues, databases, and storage systems live below that — each layer with its own abstraction and its own failure modes. For a cross-cutting map of how LLM inference, memory, routing, tooling, and observability fit together inside production assistants, see AI Assistant Architecture: LLM, Memory, Tools, Routing, Observability.

Architecture Pattern: Orchestrator And Specialists

The most common A2A pattern is probably orchestrator plus specialists.

In this pattern, one primary agent receives the user request and delegates pieces of work to specialist agents.

Example:

Primary assistant
  |
  |-- A2A --> Legal agent
  |-- A2A --> Finance agent
  |-- A2A --> Research agent
  |-- A2A --> Writing agent

This pattern is easy to understand: the orchestrator owns the overall workflow, and specialist agents own domain-specific work. The downside is that the orchestrator can become a bottleneck, and it needs a solid routing strategy to delegate effectively — the underlying model selection and orchestration trade-offs are covered in Multi-Model System Design: When One Model Isn't Enough. Still, for most teams this is the best first multi-agent architecture to reach for before exploring more complex topologies.

Architecture Pattern: Peer Agents

In a peer-to-peer pattern, agents can communicate with each other more directly.

For example:

Research agent --> Data agent --> Charting agent --> Writing agent

This can be powerful, but it is harder to control.

You need strong rules for:

who can call whom
what context can be shared
how loops are prevented
who owns final output
how cost is controlled
how delegation is audited

Peer agent networks sound elegant, but they can become chaotic quickly — use them only when you have strong governance rules and clear ownership over every edge in the graph.
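Loop prevention is one of those rules that is easy to sketch. The version below assumes each handoff carries a record of the agents already visited plus a hop limit; `Delegation` is hypothetical metadata for this example, not something the protocol is known to define.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrLoop    = errors.New("delegation loop detected")
	ErrTooDeep = errors.New("delegation hop limit exceeded")
)

// Delegation is hypothetical metadata carried with each handoff.
type Delegation struct {
	Chain   []string // agents that already handled the work, in order
	MaxHops int
}

// Next returns the delegation to pass to agent, or an error if the handoff
// would revisit an agent or exceed the hop limit.
func (d Delegation) Next(agent string) (Delegation, error) {
	for _, seen := range d.Chain {
		if seen == agent {
			return d, fmt.Errorf("%w: %s is already in the chain", ErrLoop, agent)
		}
	}
	if len(d.Chain) >= d.MaxHops {
		return d, fmt.Errorf("%w: limit is %d", ErrTooDeep, d.MaxHops)
	}
	// Copy the chain so earlier delegations are never modified.
	chain := append(append([]string{}, d.Chain...), agent)
	return Delegation{Chain: chain, MaxHops: d.MaxHops}, nil
}

func main() {
	d := Delegation{MaxHops: 4}
	var err error
	for _, agent := range []string{"research", "data", "charting", "research"} {
		if d, err = d.Next(agent); err != nil {
			fmt.Println("blocked:", err)
			break
		}
		fmt.Println("chain:", d.Chain)
	}
}
```

The same chain doubles as an audit record of who delegated to whom, and the hop limit puts a ceiling on how far the cost of one request can spread.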

Architecture Pattern: A2A Gateway

A more production-friendly pattern is an A2A gateway.

Instead of every agent directly calling every other agent, traffic flows through a gateway.

The gateway can handle:

authentication
authorization
routing
tenant mapping
logging
rate limits
policy checks
protocol version handling
observability
audit trails

This is especially useful in enterprise environments, where the gateway becomes the control plane for agent communication — enforcing policy in one place rather than re-implementing it across every agent. In smaller systems this may be overkill, but in larger systems with multiple teams and vendors it often becomes necessary sooner than expected.
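A toy version of that control plane fits in a few lines: one allow-list and one audit log, both owned by the gateway. The `Gateway` type and the agent names are hypothetical, and forwarding is left out entirely; the sketch only shows policy being decided and recorded in one place.

```go
package main

import (
	"errors"
	"fmt"
)

// Gateway is a toy control plane with one allow-list and one audit log.
type Gateway struct {
	allowed map[string]map[string]bool // caller -> target -> permitted
	Audit   []string
}

// NewGateway returns a gateway that denies everything by default.
func NewGateway() *Gateway {
	return &Gateway{allowed: map[string]map[string]bool{}}
}

// Allow permits caller to send tasks to target.
func (g *Gateway) Allow(caller, target string) {
	if g.allowed[caller] == nil {
		g.allowed[caller] = map[string]bool{}
	}
	g.allowed[caller][target] = true
}

// Route checks policy centrally and records the decision either way.
func (g *Gateway) Route(caller, target string) error {
	if !g.allowed[caller][target] {
		g.Audit = append(g.Audit, fmt.Sprintf("DENY %s -> %s", caller, target))
		return errors.New("caller is not authorized for this agent")
	}
	g.Audit = append(g.Audit, fmt.Sprintf("ALLOW %s -> %s", caller, target))
	return nil // a real gateway would forward the request here
}

func main() {
	gw := NewGateway()
	gw.Allow("support-agent", "billing-agent")

	for _, target := range []string{"billing-agent", "payroll-agent"} {
		if err := gw.Route("support-agent", target); err != nil {
			fmt.Println(target+":", err)
		}
	}
	for _, entry := range gw.Audit {
		fmt.Println(entry)
	}
}
```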

Security Considerations

A2A security deserves serious attention.

Agent-to-agent communication can move sensitive context across boundaries. It can also delegate work to systems that may have their own tools and permissions.

The core security questions are:

Which agents are allowed to discover this agent?
Which agents are allowed to send it tasks?
What authentication is required?
What permissions are attached to the caller?
Can one agent delegate user authority to another?
What data can be included in messages?
What artifacts can be returned?
How is the task audited?
Can the receiving agent call tools or other agents?
How are secrets protected?

Agent Cards should not contain static secrets, and sensitive Agent Cards should be protected behind authentication rather than published openly. Different clients often need different views of the same agent — an internal caller may see more skills than an external partner, while a public client may see only a limited set of safe capabilities.

Security should not be added after the agent network is built; it should shape the network from the start, because retrofitting auth and permission boundaries across a live agent topology is significantly harder than designing them in.

Observability Considerations

A2A systems need strong observability.

When a task crosses agent boundaries, debugging becomes substantially harder because no single system holds the full picture. You need to know:

which agent created the task
which agent accepted it
what messages were exchanged
what state changes occurred
what artifacts were produced
what errors happened
how long each step took
what tools were used internally
whether another agent was called
who approved risky actions

A useful trace should follow the work across the full chain.

For example:

user request
  -> primary assistant task
  -> research agent task
  -> document search tool call
  -> summarization artifact
  -> final response

Without that end-to-end trace, multi-agent systems become very hard to trust in production — you cannot confidently answer why the system produced a given output, let alone identify where it went wrong. Observability for LLM Systems: Metrics, Traces, Logs, and Testing in Production covers the instrumentation and tooling side of this problem in depth.
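The chain above can be sketched as a span tree in which every step inherits one trace ID. This is an illustration with hypothetical types, not a tracing library; in production you would more likely propagate the ID through an existing tracing system.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is one step of work, tagged with the trace it belongs to.
type Span struct {
	TraceID  string
	Name     string
	Children []*Span
}

// Child starts a nested step that inherits the parent's trace ID, which is
// what lets records from different agents be joined later.
func (s *Span) Child(name string) *Span {
	c := &Span{TraceID: s.TraceID, Name: name}
	s.Children = append(s.Children, c)
	return c
}

func render(s *Span, depth int, b *strings.Builder) {
	fmt.Fprintf(b, "%s%s [trace=%s]\n", strings.Repeat("  ", depth), s.Name, s.TraceID)
	for _, c := range s.Children {
		render(c, depth+1, b)
	}
}

// Render returns the span tree as indented lines, one per step.
func Render(s *Span) string {
	var b strings.Builder
	render(s, 0, &b)
	return b.String()
}

func main() {
	root := &Span{TraceID: "t-1", Name: "user request"}
	primary := root.Child("primary assistant task")
	research := primary.Child("research agent task")
	research.Child("document search tool call")
	research.Child("summarization artifact")
	primary.Child("final response")

	fmt.Print(Render(root))
}
```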

Common Mistakes

Mistake 1: Calling Every Tool An Agent

Not every tool is an agent.

A calculator is a tool. A file reader is a tool. A database query endpoint is a tool.

If it does not own a task, ask for input, produce artifacts, or behave as an independent peer, it probably does not need A2A.

Mistake 2: Making Agent Cards Too Vague

An Agent Card should not say:

This agent helps with business tasks.

That is useless to any agent trying to route work intelligently. A good card should say what the agent actually does, what it accepts, what it returns, and what constraints apply.

Mistake 3: Ignoring Task State

If you use A2A but treat every interaction as request and response, you are missing much of the value.

The task model is one of the primary reasons to use A2A over a plain API — skipping it means rebuilding the same lifecycle tracking logic in every integration.

Mistake 4: Returning Everything As Text

A2A supports structured and multimodal content. Use it.

If the output is a report, return a report artifact.

If the output is JSON, return structured data.

If the output is a file, return a file.

Do not flatten everything into plain text unless plain text is the right output.

Mistake 5: No Permission Model

Agent networks without permission boundaries are risky.

Not every agent should be allowed to call every other agent with every kind of data. Use authentication, authorization, and audit trails to enforce the principle of least privilege across the agent network.

When Should You Use A2A?

Use A2A when you have real agent boundaries.

Good reasons include:

agents are owned by different teams
agents are deployed as separate services
agents are built with different frameworks
agents need to discover each other
agents need to delegate tasks
tasks may be long-running
results may include artifacts
clients should not know internal tools
agent capability metadata matters

Weak reasons include:

it sounds modern
you want to call one function
you have a single-agent app
a normal API would work
MCP already solves your tool integration problem

A2A is powerful when the system is actually multi-agent; it is unnecessary ceremony when the system is not, and the cost of that ceremony — added concepts, infrastructure, debugging surface, and security requirements — is real.

A Minimal Mental Model

If you remember nothing else, remember these five definitions:

Agent Card: what the agent can do.
Message: what agents say to each other.
Part: typed content inside a message or artifact.
Task: work the agent owns.
Artifact: output the task produced.

That is the core of A2A — the rest is mostly about making those five concepts reliable, observable, and secure enough to use in real production systems.

Final Thoughts

A2A is not just another AI acronym — it is part of a larger shift from isolated assistants to interoperable agent systems. That shift will not happen everywhere at once, and many applications will remain single-agent systems with good tool access where MCP and normal APIs are entirely sufficient.

But once agents become separately deployed peers, you need stronger boundaries: discovery, task ownership, messages that carry more than text, artifacts as first-class outputs, and security, state, and observability that span agent boundaries. That is the space A2A is trying to occupy, and it is a genuinely different problem from the tool-integration problem MCP solves.

My opinion: do not start with A2A for small projects. Start with a useful agent, good tools, and clear architecture — the AI Systems cluster covers self-hosted assistants, MCP servers, and agent memory as a connected set if you want the broader context. But when your "tool" starts looking like another autonomous specialist with its own task lifecycle, it is probably not just a tool anymore — and that is when A2A becomes interesting.

Sources

A2A Protocol Specification: https://a2a-protocol.org/latest/specification/
A2A Key Concepts: https://a2a-protocol.org/latest/topics/key-concepts/
A2A Life of a Task: https://a2a-protocol.org/latest/topics/life-of-a-task/
A2A Agent Discovery: https://a2a-protocol.org/latest/topics/agent-discovery/
A2A Streaming and Async Operations: https://a2a-protocol.org/latest/topics/streaming-and-async/
A2A and MCP: https://a2a-protocol.org/latest/topics/a2a-and-mcp/

A2A vs MCP: Do AI Agents Really Need Both Protocols?

Rost — Wed, 24 Jun 2026 11:55:05 +0000

AI agent architecture is starting to split into two layers.

One layer is about giving an AI assistant access to tools, data, APIs, files, databases, search systems, calendars, ticketing systems, and other external capabilities — and that is where MCP fits.

The other layer is about getting one AI agent to discover, communicate with, delegate to, and collaborate with another AI agent, possibly built by another team, framework, vendor, or organization — and that is where A2A fits.

The annoying part is that both protocols are often discussed as if they solve the same problem, and they do not. There is overlap at the edges, and that overlap is where most of the confusion comes from. But the clean mental model is simple:

MCP is mostly agent-to-tool and A2A is mostly agent-to-agent.

That does not mean every AI system needs both. In fact, most small agent projects should probably start with MCP and ignore A2A until they have a real multi-agent boundary. But if you are building larger agent systems, especially systems with separately deployed agents, specialist agents, vendor agents, or long-running delegated tasks, A2A starts to make sense.

This article explains the difference, the overlap, the architectural tradeoffs, and when you actually need both.

What Is MCP?

MCP stands for Model Context Protocol.

It is an open protocol for connecting AI applications and agents to external tools, resources, and prompts. In practical terms, MCP lets an AI host such as a desktop assistant, IDE, coding agent, or chat application connect to one or more MCP servers.

An MCP server can expose capabilities such as:

Tools: callable functions the model can use
Resources: readable context such as files, API data, documents, or database records
Prompts: reusable prompt templates or workflows

The official MCP architecture is based on a host, client, and server model.

The MCP host is the application the user interacts with. The MCP client is the protocol component that maintains a connection to a specific MCP server. The MCP server exposes capabilities to the client.

For example, a coding assistant could connect to:

A filesystem MCP server
A GitHub MCP server
A database MCP server
A Sentry MCP server
A Slack MCP server

From the user's point of view, the assistant becomes more useful. From the system architecture point of view, the assistant has gained controlled access to external context and actions.

That is the main value of MCP: it standardizes how an AI application reaches tools and context.

MCP Is Best Understood As Tool Integration

MCP is not only about tools, but tools are the easiest way to understand it.

Without MCP, every AI application needs custom integration code for every external system. One agent framework has its own plugin format. Another has its own tool schema. Another has a different API wrapper pattern. Every integration gets rebuilt again and again.

MCP tries to reduce that waste.

If a tool provider exposes an MCP server, many MCP-compatible clients can use it. If a developer builds an MCP server for an internal system, multiple AI applications can connect to it. Practical implementation guides for MCP servers in Go and MCP servers in Python show how straightforward the integration layer can be once the protocol does the heavy lifting.

That is why MCP has become important so quickly. It solves a boring but painful integration problem.

And boring integration problems are usually where durable standards come from — the ones that survive precisely because they reduce repetitive work that everyone has to do anyway.

What Is A2A?

A2A stands for Agent2Agent Protocol.

It is an open standard for communication and interoperability between independent AI agent systems. For a deeper look at the individual building blocks — Agent Cards, task lifecycle, messages, parts, and artifacts — What Is the A2A Protocol? Agent Cards and Tasks Explained covers each concept in full detail. The official A2A specification describes the protocol as a way for agents built with different frameworks, languages, or vendors to communicate through a common interaction model.

The key phrase is independent agent systems.

A2A is not primarily about giving one assistant access to a calculator, database, or file system. It is about one agent communicating with another agent that has its own capabilities, state, policy, task model, and possibly its own tools behind the scenes.

An A2A agent can advertise what it can do through an Agent Card. Another agent or client can discover that capability, send a task, exchange messages, receive artifacts, and track the task lifecycle.

A2A introduces concepts such as:

Agent Cards
Agents and clients
Tasks
Messages
Parts
Artifacts
Task states
Streaming and asynchronous work

Taken together, these concepts make A2A feel more like an agent collaboration protocol than a simple tool invocation protocol — it is designed around the idea that agents have identity, state, and ongoing relationships with other agents.

A2A Is Best Understood As Agent Collaboration

Imagine a user asks an enterprise assistant:

"Prepare a market entry brief for Japan, include legal considerations, pricing risks, and a launch project plan."

A simple assistant could try to do everything itself. But a larger agent system might delegate pieces of the work:

A research agent gathers market information
A legal agent checks regulatory considerations
A finance agent estimates pricing risk
A project planning agent produces a delivery plan
A writing agent assembles the final brief

If those agents are all internal functions inside one codebase, you may not need A2A. You can just call functions or services directly.

But if those agents are independent systems, possibly owned by different teams or vendors, then a standard agent-to-agent protocol becomes useful.

That is the A2A use case.

A2A vs MCP: The Simple Difference

The simplest comparison is this:

Question	MCP	A2A
Main relationship	Agent to tool	Agent to agent
Main purpose	Connect AI apps to tools, data, and prompts	Let independent agents communicate and collaborate
Typical unit of work	Tool call or resource read	Task, message, artifact, delegation
Best fit	Tool integration	Multi-agent interoperability
Example	Agent calls a database tool	Research agent delegates to legal agent
Scope	Context and capability access	Agent coordination and task exchange

That table is not perfect, but it is useful for building an initial mental model. In short, MCP answers the question "How does this AI application access external capabilities?" while A2A answers "How does this agent work with another agent?"

The distinction matters because tool integration and agent collaboration have different failure modes. A bad tool call might return the wrong data or modify the wrong file, but a bad agent delegation might create an unclear chain of responsibility, leak sensitive context, loop between agents, duplicate work, or produce an artifact nobody can audit. A2A sits one level higher in the architecture, and its failure modes carry correspondingly higher consequences.

Why Developers Confuse A2A and MCP

The confusion is understandable.

Many MCP servers are not just dumb tools. Some MCP servers can perform multi-step work. Some expose high-level capabilities that look agentic. An MCP server could wrap a planning service, a retrieval system, or even another LLM-powered workflow.

At that point, the line gets blurry.

If an MCP tool named research_topic performs a complex research workflow, is it a tool or an agent?

The honest answer is: architecturally, it depends.

If the host treats it as a callable capability with a tool schema, it is functioning as a tool.

If it has its own identity, capabilities, task lifecycle, messages, artifacts, and delegation behavior, it is starting to look like an agent.

This is why "A2A vs MCP" is the wrong framing when it becomes a religious debate. The better framing is:

Is this external capability best modeled as a tool?
Or is it best modeled as an independent agent?

That decision should drive the protocol choice.

The Case For MCP Only

Most AI projects should start with MCP only — that is a slightly opinionated position, but a practical one.

If you are building a coding assistant, internal chatbot, local AI workflow, personal automation agent, or simple enterprise assistant, the first problem is usually not agent-to-agent collaboration. The first problem is tool access.

You need the assistant to read files, query databases, search docs, call APIs, open tickets, summarize logs, inspect metrics, or update records.

MCP fits that very well.

Use MCP only when:

Your agent mainly needs access to tools and data
You control the host application
You control most integrations
The external systems are not really autonomous agents
The workflow is mostly synchronous or short-running
A normal tool call is enough
You do not need agent discovery
You do not need cross-agent task state
You do not need artifacts from independent agents

For many systems, MCP plus good application architecture is enough. A lot of teams will over-engineer A2A into systems that are really just tool-using assistants, and that is not a protocol problem — it is an architecture discipline problem that no protocol can fix for you.

The Case For A2A Only

A2A-only systems are less common, but they can exist.

You might use A2A without MCP when the system is mostly about communication between agents, and each agent already manages its own tools internally.

For example:

A marketplace of specialist agents
A vendor-to-vendor agent integration
A cross-organization workflow
A multi-agent system where each agent has its own private toolchain
A delegation network where clients should not know internal tool details

In this model, A2A is the public boundary between independently managed agents. Agent A does not need to know whether Agent B uses PostgreSQL, Elasticsearch, MCP, LangChain, custom APIs, or shell scripts behind the scenes. Agent A only needs to know what Agent B can do, how to send it a task, and how to receive results.

That is a clean abstraction.

Use A2A only when:

You are exposing agents as independent services
The caller should not know the agent's internal tools
Agent capability discovery matters
Delegation is more important than direct tool access
Tasks may be long-running
Results may include artifacts
Agents may be built by different vendors or teams

A2A is strongest at system boundaries, where independently owned agents need to exchange tasks and artifacts without exposing their internal toolchains. It is not a protocol you need to wire into every layer of every agent runtime.

The Case For Using Both A2A and MCP

The most interesting architecture is not A2A vs MCP. It is A2A plus MCP.

In this pattern, an agent exposes an A2A interface to other agents, but internally uses MCP to access tools.

That gives you two clean layers:

A2A outside: how agents communicate with each other
MCP inside: how each agent accesses tools, data, and services

This is probably the most durable mental model.

A customer support agent might expose an A2A interface. Other agents can delegate support-related tasks to it. Internally, the support agent uses MCP servers for Zendesk, Slack, documentation search, CRM lookup, and internal policy retrieval.

A DevOps agent might expose an A2A interface. Other agents can ask it to investigate an incident. Internally, it uses MCP servers for Prometheus, Grafana, GitHub, Kubernetes, logs, and cloud APIs.

A finance agent might expose an A2A interface. Other agents can request budget analysis. Internally, it uses MCP servers for spreadsheets, accounting systems, invoice databases, and forecasting models.

This pattern preserves clean boundaries between agents. Other agents do not need direct access to every tool — they communicate with the specialist agent, which decides internally which tools are needed to complete the task.

That is how real organizations tend to work too. You do not give everyone direct production database access. You ask the team or service responsible for that domain.

Reference Architecture: A2A Outside, MCP Inside

A practical multi-agent architecture might look like this:

User
  |
  v
Primary assistant or orchestrator
  |
  |-- A2A --> Research agent
  |              |
  |              |-- MCP --> Web search
  |              |-- MCP --> Document store
  |
  |-- A2A --> Coding agent
  |              |
  |              |-- MCP --> GitHub
  |              |-- MCP --> Filesystem
  |              |-- MCP --> CI system
  |
  |-- A2A --> DevOps agent
                 |
                 |-- MCP --> Metrics
                 |-- MCP --> Logs
                 |-- MCP --> Kubernetes

In this design, A2A handles delegation between agents while MCP handles integration between each agent and its tools. The orchestrator does not need to know every tool available to every specialist — it only needs to know which agent is responsible for which type of work, which reduces tool overload and keeps the overall architecture more modular. For a deeper treatment of how inference, memory, routing, and tooling fit together inside a production assistant, AI Assistant Architecture: LLM, Memory, Tools, Routing, Observability covers those layers in detail.
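The layering can be sketched in a few lines of Python. This is not an A2A or MCP client; the method calls below are plain in-process stand-ins for the two protocol boundaries, and every class, skill, and tool name is hypothetical.

```python
from typing import Callable

ToolFn = Callable[[str], str]


class SpecialistAgent:
    """An agent with a public skill set and a private toolchain."""

    def __init__(self, name: str, skills: set[str], tools: dict[str, ToolFn]):
        self.name = name
        self.skills = skills
        self._tools = tools  # private: callers never see or call these directly

    def handle_task(self, skill: str, request: str) -> str:
        """Stand-in for the A2A boundary: accept a task, return a result."""
        if skill not in self.skills:
            raise ValueError(f"{self.name} does not offer {skill!r}")
        # Stand-in for the MCP boundary: the agent decides which tools to use.
        outputs = [tool(request) for tool in self._tools.values()]
        return f"{self.name}: " + "; ".join(outputs)


class Orchestrator:
    """Knows which agent owns which skill, but nothing about their tools."""

    def __init__(self, agents: list[SpecialistAgent]):
        self._by_skill = {skill: agent for agent in agents for skill in agent.skills}

    def delegate(self, skill: str, request: str) -> str:
        agent = self._by_skill.get(skill)
        if agent is None:
            raise LookupError(f"no agent registered for {skill!r}")
        return agent.handle_task(skill, request)


devops = SpecialistAgent(
    name="devops-agent",
    skills={"investigate-incident"},
    tools={
        "metrics": lambda q: f"metrics checked for {q}",
        "logs": lambda q: f"logs searched for {q}",
    },
)
orchestrator = Orchestrator([devops])
result = orchestrator.delegate("investigate-incident", "checkout latency")
```

Note that the orchestrator's routing table is keyed by skill, not by tool. Adding a new tool to the DevOps agent changes nothing for its callers.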

When A2A Is Overkill

A2A is overkill when the "other agent" is really just a function.

If your application has one LLM workflow that calls a few tools, do not add A2A just because it sounds modern. A Python function, HTTP endpoint, queue, or MCP tool may be enough.

A2A may be too much when:

There is only one agent
All components are in one codebase
The workflow is short and synchronous
You do not need discovery
You do not need independent task state
You do not need a separate agent identity
You do not expect third-party agents
You do not need vendor or framework interoperability

Protocols are not free — they add concepts, infrastructure, debugging surface, security concerns, and operational cost. A boring API or a simple function call is sometimes the better engineering choice, and reaching for A2A out of habit rather than necessity is its own kind of over-engineering. Choosing the simpler option is not anti-A2A; it is pro-architecture.

When MCP Is Not Enough

MCP starts to feel insufficient when you use it to represent things that are clearly agents.

For example, suppose an MCP server exposes a tool called:

complete_enterprise_procurement_review

That tool does the following:

Reads vendor data
Checks policy rules
Asks clarifying questions
Delegates legal review
Produces a risk report
Returns multiple artifacts
Runs for 20 minutes
Maintains task state
Requires audit history

At some point, calling that a "tool" becomes awkward because the capability is no longer a simple callable function — it is a workflow-owning specialist with its own state, delegation, and audit requirements. That is exactly where A2A becomes a better fit than stretching the tool abstraction past its natural boundary.

MCP can expose powerful tools, but it does not magically solve agent identity, peer collaboration, task ownership, delegation semantics, or multi-agent audit trails.

If those are your actual problems, you are in A2A territory.

Security: The Part Everyone Underestimates

The security model is where A2A and MCP both become serious.

MCP gives agents access to tools and data. That means an AI system may be able to read files, query databases, call APIs, send messages, update tickets, or trigger infrastructure actions.

A2A allows agents to delegate work to other agents. That means one agent may pass context, request actions, and receive artifacts from another agent.

Both are powerful. Both can be dangerous.

The main security questions are different:

For MCP:

Which tools can this agent use?
What data can it read?
What actions can it perform?
Does the user approve the action?
Can tool metadata manipulate the model?
Are local and remote servers trusted?

For A2A:

Which agents are allowed to talk to each other?
What identity does each agent have?
Can Agent A delegate authority to Agent B?
How much context can be shared?
Who is accountable for the final result?
Can the task chain be audited?

This is why "just connect everything" is a bad strategy. The more protocols you add, the more you need policy, identity, logging, approval flows, and least-privilege permissions to keep the system safe and auditable.

A good production architecture should include:

Agent identity
Tool identity
User identity
Scoped permissions
Approval gates for risky actions
Task-level audit logs
Tool-call logs
Delegation logs
Artifact provenance
Rate limits
Timeout policies
Egress controls

If you are building with both A2A and MCP, security is not a bolt-on. It is part of the architecture.
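As a rough illustration of scoped permissions, approval gates, and tool-call logs working together, here is a small Python sketch. The agent names, tool names, and the `ToolPolicy` class are all hypothetical, and a real deployment would tie this to actual identity and policy systems rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical set of tools that always require a human decision.
RISKY_TOOLS = {"deploy_infrastructure", "send_external_email"}


@dataclass
class ToolPolicy:
    allowed: dict[str, set[str]]  # agent name -> tools it may call
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def authorize(self, agent: str, tool: str, approved_by_human: bool = False) -> bool:
        """Check scope first, then the approval gate, and log every decision."""
        if tool not in self.allowed.get(agent, set()):
            decision = "denied:not-in-scope"
        elif tool in RISKY_TOOLS and not approved_by_human:
            decision = "denied:needs-approval"
        else:
            decision = "allowed"
        self.audit_log.append((agent, tool, decision))
        return decision == "allowed"


policy = ToolPolicy(
    allowed={
        "writing-agent": {"search_docs"},
        "devops-agent": {"read_metrics", "deploy_infrastructure"},
    }
)

writer_can_deploy = policy.authorize("writing-agent", "deploy_infrastructure")
devops_unapproved = policy.authorize("devops-agent", "deploy_infrastructure")
devops_approved = policy.authorize(
    "devops-agent", "deploy_infrastructure", approved_by_human=True
)
```

Denied calls are logged as well as allowed ones. In practice the denials are often the more interesting audit entries.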

Observability: You Need Traces, Not Just Logs

Multi-agent systems are hard to debug.

A user asks one question. The orchestrator calls two agents. One agent calls three tools. Another agent streams partial progress. A third agent fails and retries. The final answer looks reasonable, but nobody knows which data source influenced it.

That is not acceptable in production.

For MCP-heavy systems, you need to observe:

Tool selection
Tool arguments
Tool results
Tool latency
Tool errors
User approvals
Context injected into the model

For A2A-heavy systems, you need to observe:

Agent discovery
Task creation
Task state changes
Agent-to-agent messages
Artifacts produced
Delegation chains
Failures and retries
Final answer provenance

The more agentic the system becomes, the more important traceability becomes — plain application logs are not enough when work spans multiple agents, tool calls, and artifact handoffs. You need a task trace that follows the full execution path so that any answer can be traced back to its origin. Observability for LLM Systems: Metrics, Traces, Logs, and Testing in Production goes into the tooling and instrumentation side of this in depth.
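A task trace can start out very simple. The sketch below assumes a single in-memory trace with parent links between spans; the `Span` and `TaskTrace` names and the span kinds are illustrative, and a production system would more likely emit spans through an existing tracing stack.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    kind: str  # e.g. "task", "delegation", or "tool_call"
    name: str


@dataclass
class TaskTrace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[Span] = field(default_factory=list)

    def record(self, kind: str, name: str, parent: Optional[Span] = None) -> Span:
        """Add one step to the trace, linked to the step that caused it."""
        span = Span(
            trace_id=self.trace_id,
            span_id=uuid.uuid4().hex,
            parent_id=parent.span_id if parent else None,
            kind=kind,
            name=name,
        )
        self.spans.append(span)
        return span

    def path_to(self, span: Span) -> list[str]:
        """Follow parent links back to the root to explain where a step came from."""
        by_id = {s.span_id: s for s in self.spans}
        path = []
        current: Optional[Span] = span
        while current is not None:
            path.append(current.name)
            current = by_id.get(current.parent_id) if current.parent_id else None
        return list(reversed(path))


trace = TaskTrace()
root = trace.record("task", "user question")
delegation = trace.record("delegation", "research-agent", parent=root)
tool_call = trace.record("tool_call", "web_search", parent=delegation)
provenance = trace.path_to(tool_call)
```

Here `provenance` walks from the tool call back through the delegation to the original user question, which is the kind of answer plain logs struggle to give.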

Decision Framework: Do You Need A2A, MCP, Both, Or Neither?

Use this decision framework.

Use neither when simple code is enough

Choose normal functions, APIs, or queues when:

You control all components
There is no need for LLM-native tool discovery
There is no need for agent interoperability
The system is deterministic
The integration is stable and simple

Not every integration needs an AI protocol.

Use MCP when the agent needs tools

Choose MCP when:

The AI app needs external data
The agent needs to call tools
You want reusable integrations
You want tool discovery
You want standard client-server integration
You are building for coding agents, assistants, IDEs, or internal tools

This is the default starting point for most builders.

Use A2A when agents need peers

Choose A2A when:

Agents are independently deployed
Agents need to discover each other
Agents are built by different teams or vendors
Tasks are long-running
Delegation matters
Artifacts matter
You need an agent boundary, not just a tool boundary

This is the right choice when the unit of architecture is the agent.

Use both when specialist agents need tools

Choose both when:

Agents collaborate with each other
Each agent also needs access to tools
You want clean boundaries between delegation and execution
You want specialist agents with private internal toolchains
You want scalable multi-agent architecture

This is the most realistic enterprise pattern.
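The framework boils down to two questions, which can be written as a tiny Python function. The `Requirements` fields are a deliberate simplification of the checklists above, not an exhaustive model.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # The AI app must call tools or read external data through a standard interface.
    needs_tools_or_data: bool
    # Separately deployed or separately owned agents must delegate work to each other.
    independent_agents: bool


def choose_protocols(req: Requirements) -> str:
    """Map the two questions from the decision framework to a recommendation."""
    if req.needs_tools_or_data and req.independent_agents:
        return "both"
    if req.independent_agents:
        return "A2A"
    if req.needs_tools_or_data:
        return "MCP"
    return "neither"


coding_assistant = choose_protocols(Requirements(True, False))
agent_marketplace = choose_protocols(Requirements(False, True))
enterprise_support = choose_protocols(Requirements(True, True))
cron_job = choose_protocols(Requirements(False, False))
```

If both answers are "no", that is a perfectly good outcome: it means normal functions, APIs, or queues are enough.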

Common Anti-Patterns

Anti-Pattern 1: Turning Every Tool Into An Agent

Not every function deserves an agent wrapper.

A currency conversion API is probably a tool. A database query is probably a tool. A file reader is probably a tool.

Wrapping every small capability as an A2A agent creates unnecessary complexity.

Anti-Pattern 2: Hiding A Whole Agent Behind One MCP Tool

The opposite mistake is also common.

If an MCP tool secretly runs a long, stateful, multi-agent workflow, the MCP abstraction may become too thin. You lose visibility into task state, delegation, artifacts, and responsibility.

At that point, it may deserve an A2A boundary.

Anti-Pattern 3: Letting Every Agent Call Every Tool

This creates permission chaos.

Specialist agents should have scoped tools. A writing agent probably does not need production database access. A research agent probably does not need permission to deploy infrastructure.

Use least privilege.

Anti-Pattern 4: No Human Approval For Risky Actions

Agentic systems should not silently perform high-impact actions.

Human approval should be required for actions such as:

Sending external emails
Modifying production data
Deploying infrastructure
Deleting files
Changing permissions
Purchasing services
Sharing sensitive data

Protocols make integration easier. They do not remove accountability.

Practical Examples

Example 1: Local Coding Assistant

A local coding assistant uses MCP to access:

Filesystem
Git repository
Test runner
Package manager
Documentation search

It probably does not need A2A.

MCP is enough.

Example 2: Enterprise Support Assistant

A support assistant uses MCP to access:

CRM
Ticketing system
Documentation
Slack
Customer database

At first, MCP is enough.

Later, the company adds specialist agents:

Billing agent
Legal policy agent
Product troubleshooting agent
Escalation agent

Now A2A starts to make sense because the support assistant needs to delegate work to other agents.

Use both.

Example 3: Agent Marketplace

A platform lets third-party agents advertise capabilities and receive tasks from other agents.

The platform does not know each agent's internal implementation.

A2A is a strong fit.

Individual agents may still use MCP internally, but the public boundary is A2A.

Example 4: Data Analysis Agent

A data analysis agent queries a warehouse, reads dashboards, produces charts, and writes a report.

If it is a single agent using tools, MCP is enough.

If it delegates statistical review to one agent, business explanation to another, and compliance review to another, A2A becomes useful.

My Opinionated Take

MCP is the practical default for most builders, while A2A is the architectural boundary that larger systems grow into once they have real agent-to-agent coordination needs.

If you are building your first useful AI agent, start with MCP. Give the agent safe, well-scoped access to tools and data. Learn where tool descriptions break down. Learn where permissions get messy. Learn where observability is weak. The AI Systems cluster covers self-hosted assistants, MCP servers, and agent memory as a connected set, which gives a broader picture of how those pieces fit together in practice.

Do not start with a multi-agent fantasy architecture.

But once your system has multiple independently owned agents, A2A becomes much more interesting. It gives you a cleaner way to represent agent capabilities, task delegation, and cross-agent collaboration.

The mistake is treating A2A and MCP as competitors.

They are better understood as different layers:

MCP connects agents to capabilities.
A2A connects agents to other agents.

You can build useful systems with MCP only.

You can build agent networks with A2A only.

But the most scalable pattern is likely both: A2A for agent collaboration, MCP for tool integration.

Final Verdict: Do AI Agents Really Need Both?

Sometimes — but not always, and the answer depends almost entirely on whether your system has a genuine agent-to-agent boundary or just a collection of tool-using functions.

If your AI agent just needs tools, use MCP.

If your AI system needs independently deployed agents to collaborate, use A2A.

If your specialist agents need tools and also need to collaborate with other agents, use both.

The cleanest architecture is not "A2A vs MCP" — it is A2A at the agent boundary and MCP at the tool boundary, with each protocol handling exactly the problem it was designed for. That separation of concerns is what keeps multi-agent systems understandable, secure, and easier to evolve over time.

Sources

A2A Protocol Specification: https://a2a-protocol.org/latest/specification/
A2A and MCP comparison: https://a2a-protocol.org/latest/topics/a2a-and-mcp/
MCP introduction: https://modelcontextprotocol.io/docs/getting-started/intro
MCP architecture overview: https://modelcontextprotocol.io/docs/learn/architecture
MCP server concepts: https://modelcontextprotocol.io/docs/learn/server-concepts
Linux Foundation A2A adoption update: https://www.linuxfoundation.org/press/a2a-protocol-surpasses-150-organizations-lands-in-major-cloud-platforms-and-sees-enterprise-production-use-in-first-year

Mermaid Diagrams Quickstart and Cheatsheet for Developers

Rost — Tue, 23 Jun 2026 23:28:42 +0000

Mermaid is a text-based diagramming tool for people who would rather write diagrams than drag boxes around a canvas. It uses a Markdown-like syntax to describe flowcharts, sequence diagrams, class diagrams, state machines, timelines, Gantt charts, entity relationship diagrams, and more.

For a technical blog, Mermaid is a very good default. The diagrams live next to the article, they can be reviewed in Git, and they are easy to update when the system changes. Static image diagrams look nice until the first architecture change. Mermaid diagrams are not perfect, but they age much better.

This guide is a practical Mermaid quickstart and cheatsheet for developers, technical writers, and Hugo site owners. It is part of the Documentation Tools in 2026: Markdown, LaTeX, PDF & Printing Workflows hub.

What Is Mermaid?

Mermaid is a diagram-as-code syntax. You write a small text block, and Mermaid renders it as a diagram.

A basic Mermaid diagram looks like this:

```mermaid
flowchart TD
    A[Write Markdown] --> B[Add Mermaid block]
    B --> C[Render page]
    C --> D[Publish diagram]
```

The important idea is simple: the source of the diagram is plain text. That makes it searchable, reviewable, portable, and easy to keep with the documentation it explains.

Why Use Mermaid in a Technical Blog?

Mermaid is useful when your article needs more than prose but less than a full design tool.

Use Mermaid when you want to explain:

Request and response flows
Deployment pipelines
Service dependencies
State transitions
Database relationships
User journeys
Build steps
Decision logic
Project timelines

I would not use Mermaid for every visual. Screenshots, hand-drawn architecture sketches, and polished marketing diagrams still have their place. But for engineering documentation, Mermaid is often the most maintainable option.

Mermaid Quickstart

Basic Markdown Usage

In Markdown, use a fenced code block with mermaid as the language:

```mermaid
flowchart LR
    A[Start] --> B[Process]
    B --> C[Done]
```

Many platforms understand this format directly. mermaid is one of the special language identifiers — alongside diff, geojson, and others — that certain renderers treat as a first-class block type rather than plain syntax highlighting. For a full breakdown of fenced block syntax and supported language identifiers, see the Markdown Code Blocks guide. For Hugo, rendering depends on your theme or site configuration. More on that later.

Test Diagrams Before Publishing

The easiest workflow is:

Write the diagram in your Markdown file.
Paste it into a Mermaid live editor or local preview.
Fix syntax errors.
Commit the Markdown source.
Check the final rendered page.

This avoids the classic problem where a diagram works in one renderer but breaks in another because of a small syntax detail.
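Part of that workflow can be scripted. The following Python sketch pulls Mermaid blocks out of a Markdown string and flags any block whose first keyword is not one of the diagram types covered in this guide. It is only a typo catcher under those assumptions, not a Mermaid parser, and the function names are made up for this example.

```python
import re

FENCE = "`" * 3  # built indirectly so this snippet can itself sit inside a fenced block

MERMAID_BLOCK = re.compile(
    rf"^{FENCE}mermaid[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

# Diagram keywords used in this guide; Mermaid supports more than these.
KNOWN_TYPES = {
    "flowchart", "sequenceDiagram", "classDiagram", "stateDiagram-v2",
    "erDiagram", "gantt", "timeline", "pie", "gitGraph", "mindmap", "journey",
}


def extract_mermaid_blocks(markdown: str) -> list[str]:
    """Return the body of every fenced block tagged as mermaid."""
    return [match.group(1).strip() for match in MERMAID_BLOCK.finditer(markdown)]


def first_keyword(block: str) -> str:
    """First word of the first line that is neither blank nor a %% comment."""
    for line in block.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("%%"):
            return stripped.split()[0]
    return ""


def suspicious_blocks(markdown: str) -> list[str]:
    """Blocks whose diagram keyword is not in KNOWN_TYPES (likely a typo)."""
    return [b for b in extract_mermaid_blocks(markdown) if first_keyword(b) not in KNOWN_TYPES]


sample = "\n".join([
    "Intro text",
    "",
    FENCE + "mermaid",
    "flowchart LR",
    "    A --> B",
    FENCE,
    "",
    FENCE + "mermaid",
    "flowchrt TD",  # deliberate typo
    "    A --> B",
    FENCE,
    "",
])
flagged = suspicious_blocks(sample)
```

A check like this does not replace the live editor or a final look at the rendered page, but it is cheap enough to run before every commit.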

Flowchart Syntax

Flowcharts are the most common Mermaid diagram type. Use them for workflows, algorithms, decision trees, and system steps.

Basic Flowchart

```mermaid
flowchart TD
    A[User opens website] --> B{Is user logged in?}
    B -->|Yes| C[Show dashboard]
    B -->|No| D[Show login page]
```

Flowchart Directions

Mermaid flowcharts support several directions:

TD - top down (same as TB)
TB - top to bottom
BT - bottom to top
LR - left to right
RL - right to left

Example:

```mermaid
flowchart LR
    Browser --> CDN
    CDN --> WebServer
    WebServer --> Database
```

For blog articles, LR is often easier to read for architecture diagrams. For step-by-step processes, TD is usually better.

Common Node Shapes

```mermaid
flowchart TD
    A[Rectangle]
    B(Rounded rectangle)
    C{Decision}
    D((Circle))
    E[(Database)]
    F[[Subroutine]]
```

Flowchart Arrows

```mermaid
flowchart LR
    A --> B
    B --- C
    C -.-> D
    D ==> E
    E -- Label --> F
```

Subgraphs

Use subgraphs to group related parts of a system.

```mermaid
flowchart LR
    subgraph Client
        Browser
    end

    subgraph Backend
        API
        Worker
    end

    subgraph Storage
        DB[(PostgreSQL)]
        Cache[(Redis)]
    end

    Browser --> API
    API --> DB
    API --> Cache
    API --> Worker
```

Subgraphs are powerful, but use them carefully. A diagram with six subgraphs and twenty arrows is usually a sign that the article needs two smaller diagrams.
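Because the diagram source is plain text, it can also be generated. Here is a minimal Python sketch that renders a grouped flowchart from a dictionary of groups and a list of edges. The `render_flowchart` function is hypothetical, and it assumes simple node names with no spaces or special characters.

```python
VALID_DIRECTIONS = {"TD", "TB", "BT", "LR", "RL"}


def render_flowchart(
    groups: dict[str, list[str]],
    edges: list[tuple[str, str]],
    direction: str = "LR",
) -> str:
    """Build Mermaid flowchart source with one subgraph per group."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unsupported direction: {direction!r}")
    lines = [f"flowchart {direction}"]
    for group, nodes in groups.items():
        lines.append(f"    subgraph {group}")
        lines.extend(f"        {node}" for node in nodes)
        lines.append("    end")
    lines.extend(f"    {src} --> {dst}" for src, dst in edges)
    return "\n".join(lines)


diagram = render_flowchart(
    groups={"Client": ["Browser"], "Backend": ["API", "Worker"]},
    edges=[("Browser", "API"), ("API", "Worker")],
)
```

Generating diagrams this way is mostly useful when the structure already lives somewhere else, such as a service inventory. For hand-written articles, typing the Mermaid source directly is usually simpler.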

Sequence Diagram Syntax

Sequence diagrams show communication between actors or services over time.

```mermaid
sequenceDiagram
    participant User
    participant App
    participant API
    participant DB

    User->>App: Click login
    App->>API: POST /login
    API->>DB: Validate credentials
    DB-->>API: User record
    API-->>App: Access token
    App-->>User: Show dashboard
```

Common Sequence Arrows

->      solid line without arrow
-->     dotted line without arrow
->>     solid line with arrow
-->>    dotted line with arrow
-x      solid line with cross
--x     dotted line with cross

Activation Bars

Activation bars make it clearer when a participant is doing work.

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Client->>Server: Request data
    activate Server
    Server-->>Client: Response
    deactivate Server
```

Alternatives and Conditions

```mermaid
sequenceDiagram
    participant User
    participant API
    participant Payment

    User->>API: Submit order
    API->>Payment: Charge card

    alt Payment succeeds
        Payment-->>API: Approved
        API-->>User: Order confirmed
    else Payment fails
        Payment-->>API: Declined
        API-->>User: Show error
    end
```

Sequence diagrams are excellent for API articles. They show not just what components exist, but how they talk to each other.

Class Diagram Syntax

Class diagrams are useful for domain models and object relationships.

```mermaid
classDiagram
    class User {
        +string id
        +string email
        +login()
        +logout()
    }

    class Order {
        +string id
        +float total
        +submit()
    }

    User "1" --> "*" Order
```

Class Relationships

<|-- inheritance
*-- composition
o-- aggregation
--> association
-- link
..> dependency
..|> realization

Example:

```mermaid
classDiagram
    Animal <|-- Dog
    Animal <|-- Cat
    User "1" --> "*" Order
    Order *-- OrderItem
```

Class diagrams can become noisy fast. In a blog post, prefer a small domain slice over a full application model.

State Diagram Syntax

State diagrams explain how something changes over time.

```mermaid
stateDiagram-v2
    [*] --> Draft
    Draft --> Review: submit
    Review --> Published: approve
    Review --> Draft: request changes
    Published --> Archived: archive
    Archived --> [*]
```

Use state diagrams for:

Order lifecycles
Deployment states
Authentication flows
Background job status
Content publishing workflows

State diagrams are underrated. They often explain business logic better than a long paragraph.

Entity Relationship Diagram Syntax

Entity relationship diagrams are useful for database models.

```mermaid
erDiagram
    USER ||--o{ ORDER : places
    ORDER ||--|{ ORDER_ITEM : contains
    PRODUCT ||--o{ ORDER_ITEM : appears_in

    USER {
        string id
        string email
    }

    ORDER {
        string id
        datetime created_at
    }

    PRODUCT {
        string id
        string name
    }
```

ER Relationship Markers

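||        exactly one
|o or o|  zero or one (left side or right side of the line)
}| or |{  one or more (left side or right side of the line)
}o or o{  zero or more (left side or right side of the line)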

ER diagrams are best when they explain relationships, not every column. Keep implementation details in migrations or schema docs.

Gantt Chart Syntax

Gantt charts are useful for project timelines.

```mermaid
gantt
    title Documentation Migration Plan
    dateFormat  YYYY-MM-DD

    section Planning
    Audit current docs      :a1, 2026-06-01, 5d
    Define structure        :a2, after a1, 3d

    section Writing
    Rewrite guides          :b1, after a2, 10d
    Review and publish      :b2, after b1, 4d
```

Gantt charts are helpful in internal planning posts, but they can age quickly. Use them when the timeline itself is the point.
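To see what the `after` references in the example resolve to, here is a small Python sketch of the date arithmetic. It assumes plain calendar days with no excluded weekends or holidays, and it treats a task's end date as the start date of whatever comes after it. That is my reading of how the example renders, not a statement about every Mermaid configuration.

```python
from datetime import date, timedelta
from typing import Union

Start = Union[date, str]  # either a fixed date or "after <task id>"


def schedule(tasks: list[tuple[str, Start, int]]) -> dict[str, tuple[date, date]]:
    """Resolve (task id, start, duration in days) rows into (start, end) dates.

    Tasks must be listed after the tasks they depend on.
    """
    plan: dict[str, tuple[date, date]] = {}
    for task_id, start, days in tasks:
        if isinstance(start, str):
            dependency = start.removeprefix("after ").strip()
            begin = plan[dependency][1]  # start when the dependency ends
        else:
            begin = start
        plan[task_id] = (begin, begin + timedelta(days=days))
    return plan


plan = schedule([
    ("a1", date(2026, 6, 1), 5),
    ("a2", "after a1", 3),
    ("b1", "after a2", 10),
    ("b2", "after b1", 4),
])
```

Under those assumptions the whole plan runs from 2026-06-01 to 2026-06-23, which is a quick sanity check before publishing a chart that other people will plan around.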

Timeline Syntax

Timelines are good for release histories, incident writeups, and project summaries.

```mermaid
timeline
    title API Evolution
    2024 : REST API launched
    2025 : Webhooks added
    2026 : Event streaming introduced
```

Use a timeline when order matters more than dependency. If what you care about is the sequence of events rather than how they causally connect, a timeline keeps the focus where it belongs and stays easy to read at a glance.

Pie Chart Syntax

Pie charts are supported, but be careful. They are easy to read when there are only a few categories and the values are clearly different.

```mermaid
pie title Build Time by Step
    "Install dependencies" : 35
    "Run tests" : 45
    "Build assets" : 20
```

Opinionated advice: if the values are close or there are more than five categories, use a table instead. A well-formatted table communicates precise numbers more honestly than a pie chart where the slices look nearly identical.

Git Graph Syntax

Git graphs can explain branching strategies and release flows.

this code:

```mermaid
gitGraph
    commit
    branch feature
    checkout feature
    commit
    commit
    checkout main
    merge feature
    commit
```

Is producing diagram:

gitGraph
    commit
    branch feature
    checkout feature
    commit
    commit
    checkout main
    merge feature
    commit

This is useful for articles about Git workflows, trunk-based development, release branches, and hotfixes. If you need a quick reference for the underlying branching commands, the GIT Cheatsheet covers the most common ones alongside merge and rebase workflows.

Mermaid Cheatsheet

Diagram Types

flowchart TD
sequenceDiagram
classDiagram
stateDiagram-v2
erDiagram
gantt
timeline
pie
gitGraph
mindmap
journey

Flowchart Basics

flowchart TD
A[Text] --> B[Text]
A -->|Label| B
A -.-> B
A ==> B
A --- B

Flowchart Shapes

A[Rectangle]
A(Rounded)
A{Decision}
A((Circle))
A[(Database)]
A[[Subroutine]]
A>Flag]

Sequence Diagram Basics

sequenceDiagram
participant A
participant B
A->>B: Message
B-->>A: Reply
activate B
deactivate B

Sequence Blocks

alt condition
else other condition
end

opt optional step
end

loop each item
end

par parallel task
and another task
end

Class Diagram Basics

classDiagram
class User
class Order
User --> Order
User "1" --> "*" Order

State Diagram Basics

stateDiagram-v2
[*] --> Idle
Idle --> Running
Running --> Done
Done --> [*]

ER Diagram Basics

erDiagram
USER ||--o{ ORDER : places
ORDER ||--|{ ORDER_ITEM : contains

Comments

Mermaid supports comments with %%.

this code:

```mermaid
flowchart TD
    %% This is a comment
    A --> B
```

Is producing diagram:

flowchart TD
    %% This is a comment
    A --> B

Using Mermaid in Hugo

Hugo content is usually written in Markdown, so Mermaid fits naturally into a Hugo-based technical blog. The exact setup depends on your theme and Markdown rendering configuration.

The common authoring pattern is still the same:

```mermaid
flowchart LR
    Markdown --> Hugo
    Hugo --> HTML
    HTML --> Browser
```

If your Hugo theme already supports Mermaid, this may render without extra work. If it does not, you usually need a render hook, shortcode, partial, or theme configuration that loads Mermaid on pages containing Mermaid diagrams.

A practical Hugo setup should aim for these rules:

Keep Mermaid source inside normal Markdown articles.
Load Mermaid only on pages that need it.
Avoid global JavaScript if most pages do not use diagrams.
Test diagrams during local preview.
Keep the diagram source readable in Git.

For a technical blog, fenced code blocks are usually better than custom shortcodes because they are more portable. If you later move content to GitHub, another static site generator, or a documentation platform, standard fenced Mermaid blocks are easier to reuse.

Mermaid Best Practices

Keep Diagrams Small

A diagram should clarify the article, not replace it. If readers need to zoom, the diagram is probably too large.

Good diagrams usually have:

One idea
Clear direction
Short labels
Few crossing lines
Consistent naming

Prefer Multiple Small Diagrams

Instead of one huge system diagram, use several focused diagrams:

Request flow
Deployment topology
Data model
State lifecycle
Failure path

This is better for readers and better for mobile screens.

Use Stable Names

Use names that match your code, API, or documentation. Do not call the same thing API, Backend, and Server in different diagrams unless those are truly different concepts.

Label Important Arrows

Unlabeled arrows are fine for simple flowcharts. In system diagrams, labels often matter.

this code:

```mermaid
flowchart LR
    Web -->|HTTPS request| API
    API -->|SQL query| DB
    API -->|publish event| Queue
```

Is producing diagram:

flowchart LR
    Web -->|HTTPS request| API
    API -->|SQL query| DB
    API -->|publish event| Queue

Avoid Clever Syntax

Mermaid can do many things. That does not mean every article needs them. Favor syntax that a future maintainer can understand quickly.

Quote Labels When Needed

If a label contains characters that confuse Mermaid, wrap it in quotes.

this code:

```mermaid
flowchart TD
    A["User clicks /checkout"] --> B["POST /api/orders"]
```

Is producing diagram:

flowchart TD
    A["User clicks /checkout"] --> B["POST /api/orders"]

This is a small habit that prevents annoying rendering failures.

Think About Dark Mode

Many Hugo sites support dark mode. Make sure your Mermaid theme or site CSS keeps diagrams readable in both light and dark appearances.

Common Mermaid Mistakes

Mistake 1: Too Much Detail

Bad Mermaid diagrams often try to show every edge case. That makes them technically complete and practically unreadable. The fix is almost always the same: split the diagram into two or three smaller ones, each covering one concern, so readers can follow the logic without having to trace a dozen crossing arrows.

Mistake 2: Long Labels

Long labels create wide boxes and ugly layouts.

Instead of this code:

```mermaid
flowchart TD
    A[The user submits the registration form with their email address and password]
```

Is producing diagram:

flowchart TD
    A[The user submits the registration form with their email address and password]

Prefer this code:

```mermaid
flowchart TD
    A[Submit registration form]
```

Is producing diagram:

flowchart TD
    A[Submit registration form]

Explain details in the paragraph below the diagram.

Mistake 3: Unclear Direction

Pick a direction and stick with it. Most process diagrams should use TD. Most architecture diagrams are easier with LR.

Mistake 4: Treating Mermaid as a Design Tool

Mermaid is not Figma. It is not meant for pixel-perfect diagrams, and trying to force it into that role will only lead to frustration. Its strength is maintainability, not visual perfection — and that trade-off is intentional.

Mermaid SEO Tips for Technical Blogs

Mermaid diagrams can make technical articles more useful, but search engines still need text. Do not rely on diagrams alone.

For SEO-friendly Mermaid articles:

Use descriptive H2 and H3 headings.
Explain each diagram in nearby text.
Include the important keywords in normal prose.
Keep code examples copyable.
Add an alt-style explanation below complex diagrams.
Use a concise front matter title and description.
Avoid hiding all meaning inside the rendered SVG.

A Mermaid diagram should support the article. It should not be the only place where important information exists.

Copy-Paste Mermaid Examples

API Request Flow

this code:

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant Auth
    participant DB

    Client->>API: GET /account
    API->>Auth: Validate token
    Auth-->>API: Token valid
    API->>DB: Load account
    DB-->>API: Account data
    API-->>Client: 200 OK
```

Is producing diagram:

sequenceDiagram
    participant Client
    participant API
    participant Auth
    participant DB

    Client->>API: GET /account
    API->>Auth: Validate token
    Auth-->>API: Token valid
    API->>DB: Load account
    DB-->>API: Account data
    API-->>Client: 200 OK

CI Pipeline

this code:

```mermaid
flowchart TD
    A[Push commit] --> B[Install dependencies]
    B --> C[Run lint]
    C --> D[Run tests]
    D --> E[Build site]
    E --> F[Deploy]
```

Is producing diagram:

flowchart TD
    A[Push commit] --> B[Install dependencies]
    B --> C[Run lint]
    C --> D[Run tests]
    D --> E[Build site]
    E --> F[Deploy]

This pattern maps naturally to a real CI configuration. For the step-by-step syntax of GitHub Actions workflows, the GitHub Actions Cheatsheet is a handy companion when you want to turn the diagram above into a working pipeline.

Publishing Workflow

this code:

```mermaid
stateDiagram-v2
    [*] --> Draft
    Draft --> Editing
    Editing --> Review
    Review --> Published
    Review --> Editing
    Published --> [*]
```

Is producing diagram:

stateDiagram-v2
    [*] --> Draft
    Draft --> Editing
    Editing --> Review
    Review --> Published
    Review --> Editing
    Published --> [*]

Simple Data Model

this code:

```mermaid
erDiagram
    AUTHOR ||--o{ POST : writes
    POST ||--o{ COMMENT : receives

    AUTHOR {
        string id
        string name
    }

    POST {
        string id
        string title
        datetime published_at
    }

    COMMENT {
        string id
        string body
    }
```

Is producing diagram:

erDiagram
    AUTHOR ||--o{ POST : writes
    POST ||--o{ COMMENT : receives

    AUTHOR {
        string id
        string name
    }

    POST {
        string id
        string title
        datetime published_at
    }

    COMMENT {
        string id
        string body
    }

When Not to Use Mermaid

Do not use Mermaid when:

The diagram needs precise visual layout.
The design must match a brand system exactly.
The visual is mostly decorative.
The diagram has too many nodes to read.
A screenshot would explain the point better.
The content changes rarely and needs polish more than maintainability.

Mermaid is excellent for living technical documentation. It is less good for presentation-grade artwork. For document-quality diagrams in print or PDF contexts, LaTeX offers packages like TikZ and pgfplots that give you far greater layout control — the LaTeX Cheat Sheet covers diagram inclusion alongside the rest of the LaTeX toolkit.

Final Thoughts

Mermaid is one of the best tools for technical blogging because it respects how developers already work: text files, Markdown, Git, code review, and repeatable builds. For everything around the diagrams — headings, lists, tables, code blocks — the Markdown Cheatsheet is the quick-reference companion to keep alongside this guide.

The best Mermaid diagrams are not the most complex ones. They are the diagrams that make a concept obvious and remain easy to edit six months later.

Use Mermaid for the diagrams that should live with your documentation. Keep them small, keep them readable, and treat them as part of the source code of your article.

Implementing CQRS in Go: A Practical Guide to Scalable Architecture

Rost — Tue, 23 Jun 2026 23:28:33 +0000

CQRS is one of those patterns that gets oversold, overcomplicated, and occasionally misdiagnosed as a cure for plain old CRUD boredom.

The useful version is much simpler: separate the code that changes state from the code that reads state, then let each side evolve for its own job. Martin Fowler describes CQRS as using a different model to update information than the one used to read it, while also warning that for most systems it adds risky complexity. Microsoft makes the same core point in more operational terms: separate read and write models so each can be optimised independently.

If you work in Go, that idea maps unusually well to the language. Go is good at explicit boundaries, small interfaces, boring data types, and use-case oriented packages. That makes basic CQRS in Go much less theatrical than it often looks in conference slides. You do not need event sourcing, Kafka, or three databases to start. In fact, both Microsoft's CQRS guidance and Three Dots Labs' Go examples show that a simple implementation can share the same underlying store, with separate command and query handlers added first and fancier infrastructure introduced only when the problem actually demands it.

What CQRS Actually Means

At the core, CQRS draws a hard line between commands and queries. A query reads data and should not modify the system's state. A command changes state and should not return domain data as its main result. Three Dots Labs phrase this in practical Go terms: queries return data and commands make changes, with errors being a normal command result. That is the basic move. Everything else is optional.

A common misunderstanding is that CQRS automatically means separate databases, asynchronous projections, or event sourcing. That is not true. Microsoft's pattern guide explicitly treats separate data stores as the more advanced form, not the default one, and Three Dots Labs show a Go implementation where queries read from the same database as writes because that is sufficient for the system at hand. If this article teaches only one thing clearly, let it be this: CQRS is primarily a modelling and application-structure choice, not a mandatory distributed systems package deal.

The other important detail is naming. Commands should model business intent, not storage mutations. Microsoft's example contrasts "Book hotel room" with "Set ReservationStatus to Reserved", and Three Dots Labs recommend names close to the way domain experts speak, such as "ScheduleTraining" or "CancelTraining" rather than generic "Create" and "Delete" verbs. In Go, that naming discipline pays off because command names often become type names, handler names, and package boundaries.

Why Teams Reach for It

CQRS becomes attractive when a single CRUD model starts doing too many jobs badly. Microsoft's guidance lists the usual pressure points: the read and write representations of the same data diverge, concurrent updates create lock contention, read performance suffers under query complexity, and shared entities turn security rules into a tangle. In other words, the problem is not that CRUD is morally wrong. The problem is that one model is being forced to satisfy incompatible concerns at once.

That is especially common in technical products. Writes tend to care about validation, invariants, transactions, and business rules. Reads tend to care about filters, joins, aggregation, caching, sorting, and serving exactly the shape a page or API needs. CQRS lets the write side stay strict and domain-oriented while the read side stays pragmatic and DTO-oriented. Microsoft explicitly recommends a write model focused on validation and consistency, and a read model focused on DTOs or projections optimised for presentation and responsiveness.

There is also a team-level benefit. Three Dots Labs argue that splitting commands and queries improves decoupling, makes execution flow clearer, and speeds up onboarding because developers can inspect a small list of available commands and queries rather than chase logic through random service layers. Microsoft similarly notes CQRS is especially useful in collaborative environments where multiple users update the same data and commands need enough granularity to prevent or resolve conflicts.

My slightly opinionated take is this: most teams adopt CQRS too late, after one "service" has already turned into a soft-centred monolith. But plenty of teams also adopt it too early, mostly because the architecture diagram looked expensive and therefore serious. The right moment is when reads and writes are clearly drifting apart in shape, speed, or rules, not when your todo app has aspirations.

The Benefits and the Bill

Basic CQRS has real benefits even before you add any messaging or separate stores. It gives you smaller command models, smaller query models, clearer use cases, and more obvious places to apply cross-cutting concerns like logging and instrumentation. Three Dots Labs explicitly call out better code organisation, decoupling, and simpler models as immediate wins, while Microservices.io highlights simpler command and query models and support for denormalised, scalable read views.

Once the problem justifies it, CQRS also opens the door to stronger read-side optimisation. Microsoft's guidance notes that separate read models can use DTOs, projections, read-only replicas, or even a different storage technology entirely. It also points to materialised views as a way to avoid heavy joins and ORM-heavy query paths. If you are evaluating which data access layer to use on the write side, Comparing Go ORMs for PostgreSQL covers the trade-offs between GORM, Ent, Bun, and sqlc in practical terms. That is where CQRS starts paying off operationally, not just structurally.

The cost is equally real. Fowler's warning is still the right starting point: for most systems CQRS adds risky complexity. Microsoft lists increased complexity and eventual consistency as core considerations, while Microservices.io adds potential code duplication and replication lag in read views. If you split stores, you also inherit the job of keeping them in sync, usually through events, without relying on a tidy distributed transaction between your database and broker.

Event sourcing does not remove that bill; it changes the shape of it. Microsoft's CQRS guidance says event sourcing can make the event store the single source of truth and let you rebuild materialised views by replaying history, while Event Horizon points to traceability and audit logging as major benefits. But Microsoft also warns that view generation, replay, and event handling add more design complexity, and suggests snapshots to reduce replay costs. That is why I prefer to explain event sourcing as "CQRS plus a second difficult decision", not as the entry ticket.
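To make "rebuild views by replaying history" and "snapshots reduce replay cost" concrete, here is a small in-memory sketch. The event kinds, `PostListView`, `Snapshot`, and `Rebuild` are assumptions invented for illustration; a real event store adds persistence, versioning, and concurrency control on top.

```go
package main

import "fmt"

// Event is a hypothetical domain event in a blog post stream.
type Event struct {
	Kind  string // "PostPublished", "PostRetitled", or "PostUnpublished"
	Slug  string
	Title string
}

// PostListView is a denormalised read model: slug -> title of published posts.
type PostListView map[string]string

// Apply folds a single event into the view.
func (v PostListView) Apply(e Event) {
	switch e.Kind {
	case "PostPublished":
		v[e.Slug] = e.Title
	case "PostRetitled":
		if _, ok := v[e.Slug]; ok {
			v[e.Slug] = e.Title
		}
	case "PostUnpublished":
		delete(v, e.Slug)
	}
}

// Snapshot stores the view as it was after the first Version events.
type Snapshot struct {
	Version int
	View    PostListView
}

// Rebuild replays history into a fresh view. With a snapshot it copies the
// saved state and replays only the events recorded after it.
func Rebuild(history []Event, snap *Snapshot) PostListView {
	view := PostListView{}
	from := 0
	if snap != nil {
		for slug, title := range snap.View {
			view[slug] = title
		}
		from = snap.Version
		if from > len(history) {
			from = len(history)
		}
	}
	for _, e := range history[from:] {
		view.Apply(e)
	}
	return view
}

func main() {
	history := []Event{
		{Kind: "PostPublished", Slug: "cqrs-in-go", Title: "CQRS in Go"},
		{Kind: "PostPublished", Slug: "draft", Title: "Draft"},
		{Kind: "PostRetitled", Slug: "cqrs-in-go", Title: "Implementing CQRS in Go"},
		{Kind: "PostUnpublished", Slug: "draft"},
	}

	full := Rebuild(history, nil)
	snap := &Snapshot{Version: 2, View: Rebuild(history[:2], nil)}
	fast := Rebuild(history, snap)

	fmt.Println(full["cqrs-in-go"] == fast["cqrs-in-go"], len(full), len(fast))
}
```

Both paths arrive at the same view. The snapshot only changes how many events have to be replayed to get there.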

A useful rule of thumb: basic CQRS is cheap, distributed CQRS is expensive. Conflating the two is one of the most common ways teams end up with far more complexity than the problem ever required.

A Simple CQRS Implementation in Go

A sensible first step in Go is to keep one database and split only the application layer. Commands own business rules and persistence. Queries return read models shaped for callers. This is exactly the sort of basic CQRS that Three Dots Labs recommend before reaching for asynchronous buses or separate read stores.

Start with commands

package blog

import (
    "context"
    "errors"
    "time"
)

type PublishPostCommand struct {
    Title  string
    Slug   string
    BodyMD string
    Author string
}

type PostRepository interface {
    NextID(ctx context.Context) (string, error)
    Save(ctx context.Context, post Post) error
}

type Post struct {
    ID          string
    Title       string
    Slug        string
    BodyMD      string
    Author      string
    PublishedAt time.Time
}

type PublishPostHandler struct {
    Repo PostRepository
    Now  func() time.Time
}

func (h PublishPostHandler) Handle(ctx context.Context, cmd PublishPostCommand) error {
    if cmd.Title == "" || cmd.Slug == "" || cmd.BodyMD == "" {
        return errors.New("title, slug, and body are required")
    }

    id, err := h.Repo.NextID(ctx)
    if err != nil {
        return err
    }

    post := Post{
        ID:          id,
        Title:       cmd.Title,
        Slug:        cmd.Slug,
        BodyMD:      cmd.BodyMD,
        Author:      cmd.Author,
        PublishedAt: h.Now(),
    }

    return h.Repo.Save(ctx, post)
}

This handler does not try to serve a page, shape a list response, or optimise SQL for a card grid. It just enforces intent and persists a valid aggregate. That is the command side doing one job well.

Add queries

package blog

import "context"

type PostView struct {
    ID          string
    Title       string
    Slug        string
    Author      string
    PublishedAt string
    Excerpt     string
}

type LatestPostsQuery struct {
    Limit int
}

type PostReadModel interface {
    Latest(ctx context.Context, limit int) ([]PostView, error)
    BySlug(ctx context.Context, slug string) (PostView, error)
}

type LatestPostsHandler struct {
    ReadModel PostReadModel
}

func (h LatestPostsHandler) Handle(ctx context.Context, q LatestPostsQuery) ([]PostView, error) {
    limit := q.Limit
    if limit <= 0 {
        limit = 10
    }
    return h.ReadModel.Latest(ctx, limit)
}

type GetPostBySlugQuery struct {
    Slug string
}

type GetPostBySlugHandler struct {
    ReadModel PostReadModel
}

func (h GetPostBySlugHandler) Handle(ctx context.Context, q GetPostBySlugQuery) (PostView, error) {
    return h.ReadModel.BySlug(ctx, q.Slug)
}

Notice the read side returns a PostView, not the write model. That mirrors Microsoft's recommendation that the read model be optimised for DTOs and presentation, while the write model is tuned for transactional integrity and domain rules.

Wire it like a Go application, not a shrine

package app

import "your/module/internal/blog"

type Application struct {
    Commands Commands
    Queries  Queries
}

type Commands struct {
    PublishPost blog.PublishPostHandler
}

type Queries struct {
    LatestPosts   blog.LatestPostsHandler
    GetPostBySlug blog.GetPostBySlugHandler
}

That shape is not accidental. Three Dots Labs use a very similar pattern in Wild Workouts: an Application type exposing Commands and Queries, with concrete handlers wired from separate app/command and app/query packages. Their service composition code imports those packages separately and constructs a single application object from them. It is a clean, Go-ish way to make the boundary obvious without Framework Drama. If your dependency graph grows complex as handlers multiply, Dependency Injection in Go covers Wire, Dig, and constructor injection patterns that compose naturally with this handler-based structure.

If you later need asynchronous commands, cross-service events, or a denormalised search index, you can add them from this baseline. Three Dots Labs explicitly present asynchronous command buses and separate query databases as later optimisations, not the starting point.

Go Libraries Worth Knowing

The Go CQRS ecosystem is narrower than the .NET one, which is honestly a blessing. You can survey the real options in an afternoon and avoid adopting three abstractions you do not need.

Watermill

Watermill is the clearest modern choice when you want CQRS plus messaging. Its CQRS component is a high-level API that lets you work with Go structs rather than raw messages, and its building blocks include an EventBus, EventProcessor, CommandBus, and CommandProcessor. The docs also cover event handler groups for ordered processing on shared topics, a read-model example, and custom marshaling metadata. Outside the CQRS layer, Watermill supports a wide range of pub/sub back ends including RabbitMQ, Kafka, NATS JetStream, Redis Streams, Google Cloud Pub/Sub, SQL, HTTP, and others. Pkg.go.dev marks Watermill as production-ready with a stable public API since v1.0.0, and the current published module version is v1.5.2, with GitHub listing that release on 13 May.

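// Each constructor takes its own config type, so the four configs are not interchangeable.
// Check err after every call; the checks are omitted here for brevity.
commandBus, err := cqrs.NewCommandBusWithConfig(pub, commandBusCfg)                      // cqrs.CommandBusConfig
eventBus, err := cqrs.NewEventBusWithConfig(pub, eventBusCfg)                            // cqrs.EventBusConfig
commandProcessor, err := cqrs.NewCommandProcessorWithConfig(router, commandProcessorCfg) // cqrs.CommandProcessorConfig
eventProcessor, err := cqrs.NewEventProcessorWithConfig(router, eventProcessorCfg)       // cqrs.EventProcessorConfig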

Use Watermill when commands and events need to cross process boundaries, when you want retries and redelivery semantics to be first-class, or when you know your "simple" service is already halfway to event-driven reality. The downside is that you are now having broker, topic, ordering, and idempotency conversations whether you wanted to or not. That is not a flaw in Watermill. That is the cost of the problem space.

Event Horizon

Event Horizon is a CQRS and event sourcing toolkit for Go. Its maintainers describe it as used in production systems, but also note that the API is not final. The toolkit provides aggregate, command, and event registration helpers, official event store implementations for memory and MongoDB variants, projection and repository support, and examples that include an outbox-pattern based application. The release stream is still active, with GitHub showing v0.17.0 on 16 June and earlier releases adding features such as snapshots, retryable projections, persistent command scheduling, and the outbox pattern.

eh.RegisterAggregate(func(id uuid.UUID) eh.Aggregate {
    return &InvoiceAggregate{ID: id}
})

eh.RegisterCommand(func() eh.Command {
    return &CreateInvoiceCommand{}
})

Event Horizon makes the most sense when event sourcing is the point, not an optional future extension. If you want audit-friendly streams, replayable history, projections, and an event-store centric model, it is a serious option. If you only want cleaner application services in a monolith, it is probably more machinery than you need. The "API is not final" note also means you should budget for a little more adaptation over time than you would with Watermill.

Go-MediatR

Go-MediatR is not a full CQRS framework, but it is useful for in-process CQRS. Its README describes it as a mediator pattern implementation used with CQRS, with request/response dispatch for commands and queries, notification dispatch for events, and pipeline behaviours for cross-cutting concerns. The project also has tagged releases, with GitHub listing v1.4.0 as the latest release and calling out thread-safe handler registration and concurrency-related improvements.

resp, err := mediatr.Send[*CreateProductCommand, *CreateProductResponse](ctx, cmd)
post, err := mediatr.Send[*GetPostBySlugQuery, *PostView](ctx, query)

This is a good fit if you want handler-based commands and queries, but not a broker, projection engine, or event store. It is especially friendly for teams coming from MediatR in .NET. The trade-off is equally clear: you still have to design your own persistence, read-model refresh strategy, and out-of-process integration story. In other words, it gives you the application boundary, not the whole architecture.

Older frameworks and reference material

There are older Go CQRS libraries that are still instructive, but I would treat them as reference material before I treated them as greenfield defaults.

jetbasrawi/go.cqrs describes itself as a Go CQRS reference implementation with sample applications based on Greg Young's principles. However, pkg.go.dev shows no valid go.mod, no tagged version, and no stable version, while GitHub shows no releases and the package metadata was published 7.4 years ago. That is useful history, not a strong signal for a fresh production adoption in 2026.

andrewwebber/cqrs is similar: it provides event sourcing, command issuing and processing, event publishing, and read-model generation from published events, but the package metadata was also published 7.4 years ago. I would absolutely read it if you want to understand how earlier Go CQRS libraries approached the problem. I would be cautious about making it the foundation of a new codebase unless you are happy becoming part-time maintainer of your own architecture stack.

A Practical Go Project Layout

A typical Go CQRS layout should make use cases obvious, not bury them under generic abstractions. Wild Workouts is a good reference here. The repository separates bounded contexts under internal, keeps commands and queries in distinct application packages, and wires them into an Application type exposing Commands and Queries. Service composition pulls together adapters, handlers, and dependencies explicitly. The patterns described here align with the broader guidance in Go Project Structure: Practices & Patterns, which covers the wider set of layout decisions teams face as Go codebases grow.

A pragmatic layout looks like this:

internal/
  blog/
    app/
      app.go
      command/
        publish_post.go
        unpublish_post.go
      query/
        get_post_by_slug.go
        latest_posts.go
    domain/
      post.go
      slug.go
    adapters/
      postgres/
        post_repository.go
        post_read_model.go
    ports/
      http/
        handler.go
    service/
      application.go

This layout has a few advantages.

First, command and query handlers live close to the use cases they implement. That makes it harder to hide business behaviour in repositories or handlers named after transport layers. Three Dots Labs do this directly in Wild Workouts, where app/command and app/query are separate packages and the top-level Application groups handlers by responsibility.

Second, the domain package can stay focused on invariants and behaviour, while the query side is free to return DTOs and projections. That aligns with Microsoft's write-model and read-model guidance and avoids the common CQRS anti-pattern where the query side is forced back through domain objects just for ideological purity.

Third, this structure scales from the smallest useful CQRS to heavier variants. You can keep one PostgreSQL database and two repository implementations today, then add a search index or event-driven read projection later without having to rewrite the entire application shape. Three Dots Labs explicitly describe that progression from basic CQRS to asynchronous command buses and separate query stores only when the system needs them.

When CQRS Fits and When It Does Not

CQRS makes sense when reads and writes are truly different problems. Microsoft recommends it for workloads where read and write models need independent optimisation, where multiple users collaborate on the same data, and where clear separation helps with performance, scalability, and security. Microservices.io adds another classic fit: denormalised, high-performance views built from domain events or materialised projections. Three Dots Labs also point to complex business logic, maintainability, and future extension toward asynchronous commands or specialised read stores as strong reasons to adopt it in Go.

In practice, that often means systems with rich domain rules, expensive read models, reporting views that do not map neatly to aggregates, or microservices that publish events and build projections elsewhere. In those contexts, the Saga pattern for distributed transactions often appears alongside CQRS as the coordination mechanism for multi-step business operations that span service boundaries. It also fits products where the write side must be strict and auditable while the read side must be fast and shaped for UI or API consumption. If you are already talking about projections, replicas, or rebuilding views from events, you are probably in CQRS territory whether you use the label or not.

CQRS does not make sense when your service is a straightforward data editor. Fowler says outright that for most systems CQRS adds risky complexity, and Three Dots Labs say simple CRUD services that receive and return essentially the same data are not a good fit. In their own Wild Workouts example, a simpler users service does not use Clean Architecture and CQRS because the patterns would not pay their rent there.

That is the part worth saying plainly in a technical blog: CQRS is not a maturity badge but a deliberate trade, and it only makes sense when you actually need what it gives you. If your admin panel writes rows and reads the same rows back, do not separate the model just because you can. If your command handlers are mostly "set field X on record Y", you do not have a CQRS problem. You have a normal application, and that is perfectly respectable software.

Closing Thoughts

The best way to implement CQRS in Go is to start with the boring version. Split command handlers from query handlers. Let commands model business intent. Let queries return read models. Keep the same database if that is all you need. Then, only when the system forces your hand, add asynchronous buses, projections, separate stores, or event sourcing. That progression is consistent with Fowler's warning about complexity, Microsoft's staged CQRS guidance, and the pragmatic Go examples from Three Dots Labs.

If you need a library, Watermill is the strongest general-purpose choice for message-driven CQRS in Go, Event Horizon is compelling when event sourcing is the centre of gravity, and Go-MediatR is a good light touch when you only need in-process command and query dispatch. Everything else should earn its place very carefully. For a broader map of code structure, integration, and data access patterns in production Go systems, the App Architecture guide is a useful companion.

That, in the end, is the most Go-like answer to CQRS: use the pattern, not the costume.

Digital Gardens: Grow Knowledge Instead of Just Publishing It

Rost — Mon, 22 Jun 2026 11:47:24 +0000

The dominant model for publishing knowledge online has not changed much since the early 2000s: write something, polish it, publish it, move on. Blog posts are finished when they are published.

That model creates a hidden cost. The knowledge that does not make it into a finished piece — the half-formed ideas, the developing hypotheses, the notes that are useful but not polished — stays private. Publicly, you appear to know only what you have been willing to finalize and ship.

Digital gardens are a different publishing philosophy. Instead of treating knowledge as a series of finished articles, a garden treats it as an evolving network of ideas at different stages of development. Some notes are rough seedlings. Some are well-developed and stable. All of them are public, linked, and growing.

The term gained momentum through writers like Maggie Appleton, who documented the history and practice of digital gardening, and Andy Matuschak, whose public evergreen notes embody the philosophy. For engineers who write technically, it offers an alternative to the pressure of the polished post.

The Garden Metaphor

The gardening metaphor is specific, not decorative.

A traditional blog is agriculture. You plant a crop, grow it to maturity, harvest it (publish), and the field is ready for the next planting. The previous crop is gone. Posts decay in chronological order, replaced by newer ones.

A digital garden is horticulture. You plant things, tend them, some grow faster than others, some get pruned, some survive for years. Nothing is harvested and discarded — it persists and develops.

The practical implication: garden content is organized by connection and stage of development, not by publication date. You navigate by following links, not scrolling backward through time.

Growth Stages

The most practical feature of a digital garden is the idea of visible growth stages. Instead of binary published/draft status, garden notes exist on a spectrum:

Seedling — a rough idea, a question, or a brief note that might grow into something. Published, but clearly labeled as incomplete. A seedling signals to the reader: "this exists, you might find it interesting, it is not finished."

Growing — a developing note with real content, links to other notes, and an emerging structure. Worth reading, but still actively being refined.

Mature — a stable, well-developed note that has been revisited multiple times and is unlikely to change substantially. Mature notes are the evergreen core of the garden.

Archived — notes that have been superseded, merged into a better note, or no longer represent current thinking. Kept for historical context rather than current use.

The stages can be whatever labels you choose. The important behavior is that they are visible to readers. Showing the stage communicates honesty about the state of the knowledge, and it removes the pressure to polish everything before sharing it.

Digital Garden vs Blog vs Wiki

These three publishing models are often confused or conflated. They have genuinely different purposes.

Property	Blog	Wiki	Digital Garden
Organization	Chronological	Hierarchical	Networked
Content state	Finished	Collaborative	Evolving
Navigation	Feed / archive	Category / search	Links / graph
Voice	Editorial	Institutional	Personal
Updates	New posts replace old	Pages updated in place	Notes refined continuously

A blog is best for finished, time-stamped writing — announcements, tutorials, experience reports that are complete at publication.

A wiki is best for shared, maintained reference material — team runbooks, product documentation, institutional knowledge that many people contribute to.

A digital garden is best for personal knowledge that evolves — developing ideas, technical thinking in progress, cross-linked concepts that grow more connected over time.

The three are not mutually exclusive. A site can have a blog for polished articles, a wiki for shared reference, and a garden for personal developing knowledge. Many technically oriented sites run exactly this combination.

Gardening for Engineers

The digital garden model has specific advantages for technical writing.

Technical Knowledge Evolves

A 2021 article about Kubernetes ingress controller configuration is likely outdated by 2024. A 2021 article about distributed tracing concepts is still largely accurate. Technical content ages at different rates depending on whether it describes concepts or configurations.

Garden notes can model this explicitly. A note about tracing concepts might be labeled Mature and linked from a note about OpenTelemetry implementation that is labeled Growing — the concept is stable, the tool-specific implementation is evolving. The reader can see the difference at a glance.

Thinking in Progress Is Valuable

Engineers often have half-formed but useful thinking: a hypothesis about why a system behaves a certain way, a developing opinion about an architecture tradeoff, an emerging pattern across several production incidents.

Under the blog model, that thinking stays private until it is polished enough to publish. Under the garden model, it can be shared as a seedling, visible to collaborators and readers who might contribute to its development.

Links Replace Duplication

Technical concepts recur. Idempotency applies to payment APIs, job queues, distributed transactions, and HTTP APIs. Under the blog model, each article that needs to explain idempotency either duplicates the explanation or cross-references an old post that is increasingly out of date.

In a garden, one note about idempotency can be linked from every context where it applies. The note is maintained once and improves with each link.

Implementing a Digital Garden

Adding Status Fields

The simplest garden implementation adds a status field to existing content. In Hugo, this is a frontmatter field:

---
title: "Write-through caching improves read consistency"
status: "growing"
---

You can then use this in templates to show a visible indicator — a badge, a color, a note in the header — that communicates the note's development stage to the reader.

Status values can be simple:

# status options (set exactly one per note)
status: seedling     # rough, early-stage
status: growing      # developing, has structure
status: mature       # stable, well-developed
status: archived     # no longer current
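
If the site has any Go tooling around it (a link checker, a pre-publish script), the status labels can be validated before they reach a template. The following is a minimal sketch, not part of Hugo: the Status type and ParseStatus function are hypothetical names, and the four values simply mirror the labels above.

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a garden growth stage. The four values mirror the
// frontmatter labels used in this article.
type Status string

const (
	Seedling Status = "seedling"
	Growing  Status = "growing"
	Mature   Status = "mature"
	Archived Status = "archived"
)

// ParseStatus normalizes a raw frontmatter value and rejects unknown
// labels, so a typo such as "matrue" fails early instead of rendering
// a badge that no stylesheet recognizes.
func ParseStatus(raw string) (Status, error) {
	s := Status(strings.ToLower(strings.TrimSpace(raw)))
	switch s {
	case Seedling, Growing, Mature, Archived:
		return s, nil
	}
	return "", fmt.Errorf("unknown garden status %q", raw)
}

func main() {
	for _, raw := range []string{"growing", " Mature ", "matrue"} {
		s, err := ParseStatus(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("status:", s)
	}
}
```

If you choose different labels, only the constants and the switch need to change.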

Linking as the Primary Navigation

A garden navigates by links, not by date or category. Every note should link to at least two or three related notes. The link is not decorative — it is the primary way a reader discovers related content.

In a Hugo site, this is standard internal linking. In Obsidian Publish or Quartz, the graph view makes the link network visible. Even without a graph view, consistent internal linking gives readers a navigable web.

The habit: every time you write or update a note, add at least one new link that did not exist before.

Graph View

A graph view renders the link network visually. Tools like Obsidian Publish and Quartz include one by default. It makes visible which notes are well-connected (a sign of mature, integrated thinking) and which are isolated (a sign of underdeveloped seedlings or missing links).

For engineers, graph views are familiar — the mental model is similar to a dependency graph or a call graph. Dense clusters represent strong conceptual areas. Isolated nodes are knowledge gaps.
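
You do not need a rendered graph to find the isolated nodes. A rough sketch in Go, assuming notes use Obsidian-style [[wikilinks]] and are already loaded into a map from title to body (the Isolated function and that map shape are illustrative assumptions, not part of any tool mentioned here):

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// wikiLink matches [[Target]] and [[Target|label]], capturing the target.
var wikiLink = regexp.MustCompile(`\[\[([^\]|]+)(?:\|[^\]]*)?\]\]`)

// Isolated returns the titles of notes that neither link to another
// existing note nor are linked from one. notes maps title to Markdown body.
func Isolated(notes map[string]string) []string {
	connected := make(map[string]bool)
	for title, body := range notes {
		for _, m := range wikiLink.FindAllStringSubmatch(body, -1) {
			target := strings.TrimSpace(m[1])
			if _, ok := notes[target]; !ok || target == title {
				continue // skip links to missing notes and self-links
			}
			connected[title] = true
			connected[target] = true
		}
	}
	var out []string
	for title := range notes {
		if !connected[title] {
			out = append(out, title)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	notes := map[string]string{
		"Idempotency":   "See [[Job queues]] and [[Payment APIs|payments]].",
		"Job queues":    "Retries need [[Idempotency]].",
		"Payment APIs":  "No outgoing links yet.",
		"Write-through": "A lonely seedling.",
	}
	fmt.Println(Isolated(notes))
}
```

The output is a to-do list: each isolated title is either a seedling that needs its first link or a note that belongs in the archive.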

Hugo Implementation

For sites already running Hugo, a garden layer is a small addition. The key pieces are:

A status field in frontmatter
A template partial that renders a visible status badge
Internal links that connect related pages
An optional JavaScript graph widget (D3 or Cytoscape) that renders the link network

A minimal frontmatter addition:

---
title: "Partial indexes reduce write overhead for subset queries"
status: "mature"
lastmod: "2026-06-18"
---

A partial that surfaces the badge:

{{ with .Params.status }}
<span class="garden-status garden-status--{{ . }}">{{ . }}</span>
{{ end }}

The result: every page shows its development stage, and readers understand they are navigating a living knowledge base rather than a finished archive.
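
Hugo templates are Go templates, so the badge partial can be exercised outside Hugo with the standard html/template package. This is a sketch for checking the markup, not how Hugo itself loads partials: the RenderBadge helper is hypothetical, and a plain map stands in for the page's .Params.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// badge is the partial shown above, joined onto one line.
const badge = `{{ with .Params.status }}<span class="garden-status garden-status--{{ . }}">{{ . }}</span>{{ end }}`

var badgeTmpl = template.Must(template.New("badge").Parse(badge))

// RenderBadge executes the partial against a stand-in page whose
// Params come from the given map, and returns the resulting HTML.
func RenderBadge(params map[string]any) (string, error) {
	var sb strings.Builder
	page := map[string]any{"Params": params}
	if err := badgeTmpl.Execute(&sb, page); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	pages := []map[string]any{
		{"status": "mature"},
		{}, // no status set: the with block renders nothing
	}
	for _, p := range pages {
		html, err := RenderBadge(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("[%s]\n", html)
	}
}
```

The second page shows why the partial uses with rather than printing the field directly: notes without a status get no badge instead of an empty one.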

The Tension Between Garden and Blog

Running a digital garden alongside a blog creates a useful tension that most published technical writers encounter.

The blog demands finished, polished, complete articles. The garden accepts rough, developing, incomplete notes. The tension is productive: garden notes are where you develop ideas. Blog articles are where you harvest them.

A garden note that you have refined over six months is often a better foundation for a blog article than starting from scratch. The structure is there, the links are clear, the argument is tested. The article becomes the harvest of the garden work.

This is a more honest model than pretending that blog articles appear fully formed. Most good technical writing is the result of accumulated thinking that was never publicly visible. The garden makes that thinking visible at the right moment.

Tools for Digital Gardens

Obsidian Publish turns an Obsidian vault into a public website with graph view and bidirectional links. It requires a subscription but takes minimal setup. Good for engineers already using Obsidian.

Quartz is an open-source static site generator built specifically for Obsidian-style note gardens. Early versions were built on Hugo; version 4 is a standalone TypeScript rewrite. It includes a graph view, bidirectional links, and search out of the box. Free, self-hosted, actively maintained.

Logseq Publish exports a Logseq graph as a public site with graph view and block-level linking. Well-suited for outliner-style note taking.

Foam is a VS Code extension that adds bidirectional links and graph view to a local Markdown workspace, with GitHub Pages publishing support. Good for engineers who prefer VS Code over dedicated note tools.

Plain Hugo with a status field and consistent internal links produces a functional garden with no additional dependencies. Less visual than the above options, but fully self-hosted and maintainable.

For engineers already running a Hugo site, the plain Hugo approach is the lowest-friction starting point. Obsidian Publish and Quartz are worth considering when you want a richer graph view and are willing to manage a second publishing pipeline.

The Relationship to Second Brain and PARA

Digital gardening complements the broader second brain philosophy but is not identical to it. A second brain is a personal system for capturing, organizing, and retrieving all knowledge. A digital garden is a specific choice about what to make public and how to present it.

The PARA method handles the private organizational layer — projects, areas, resources, archives. The garden handles the public layer — what you share and how it grows. The two complement each other cleanly: PARA organizes your working context; the garden represents your developing thinking.

A practical workflow:

Fleeting note (captured during work)
  → processed into evergreen note (personal Zettelkasten)
    → linked into garden section as seedling
      → refined over months into mature garden note
        → harvested into blog article when complete

Each step is optional. Some evergreen notes stay private. Some garden seedlings never become blog articles. That is fine — the value at each stage is real.

Common Failures

Over-Polishing Seedlings

The value of a seedling is that it is rough. If you find yourself spending an hour perfecting a note before publishing it as a seedling, you are back to the blog model. Publish the rough version. The polish comes later.

Gardens Without Links

A collection of standalone notes with no links is a pile, not a garden. The linking is not optional — it is the structure. A garden note without links is a seedling that never grows.

Never Pruning

Gardens need maintenance. Notes that become obsolete, wrong, or superseded by better notes should be updated or archived. A garden that grows without pruning becomes a tangle.

Expecting Readers to Navigate Without Signposts

A public garden without clear status indicators is confusing. Readers need to know whether they are reading a rough draft or a stable reference. A simple status badge is the minimum viable signpost.

Practical Starting Point

The easiest way to start a digital garden is to pick three existing pieces of knowledge you want to develop publicly and publish them as seedlings this week.

Use a simple frontmatter status field. Label them as seedlings. Add one or two links to related content. Do not wait until they are finished — that is the whole point.

Over the following weeks, revisit them. Update the content. Add links. Promote them to Growing when they have real structure. The garden starts from the first published seedling, not from the finished design.

For engineers who write technical content and want that writing to compound rather than age, digital gardening is a practical publishing model that makes the invisible visible — the developing ideas, the growing understanding, the accumulating connections that actually constitute expertise.

PARA Method for Engineers: Organize Knowledge by Action

Rost — Sun, 21 Jun 2026 12:18:24 +0000

Organizing notes by topic sounds logical until you have notes on PostgreSQL in five different folders and cannot find the one that matters for today's problem.

The issue is not discipline. The issue is that topic-based organization asks the wrong question. "What is this about?" is useful for libraries. For engineers, the better question is "What am I doing with this?" That is the premise of PARA.

PARA is a simple four-bucket system created by Tiago Forte as the organizational backbone of his Building a Second Brain framework. The idea is that all information can be sorted into four categories: Projects, Areas, Resources, and Archives. Each category represents a different level of actionability, and that distinction drives where every note lives.

This guide applies PARA to engineering work specifically — codebases, documentation, learning material, and the tension between active project work and long-term reference.

The Problem With Topic-Based Organization

Most engineers organize knowledge the way they organize code: by domain.

databases/
  postgresql/
  redis/
api/
  rest/
  graphql/
devops/
  kubernetes/
  terraform/

That structure makes sense when you are browsing. It breaks down when you need something for a specific task. You remember a useful note about database migration safety, but it could be in databases/postgresql/, devops/deployments/, api/versioning/, or nowhere because you saved it somewhere temporary.

Topic folders force you to decide where knowledge belongs before you understand its context. PARA delays that decision — instead of asking what something is about, it asks what you are currently doing with it.

The Four Buckets

Projects

A project is active, time-bound work with a defined outcome.

For engineers, projects are things like:

Migrate billing service to queue v2
Upgrade PostgreSQL from 14 to 16
Write architecture decision record for auth service redesign
Implement rate limiting on public API
Publish article about distributed tracing

Every project has a completion state. When you finish, the project moves to Archives. When you are not actively working on it, it is not a project.

The key constraint: a project note should only contain what you need for that project. Reference material belongs in Resources. Reusable concepts belong in your Zettelkasten or personal notes. Project notes are working documents, not knowledge stores.

Areas

An area is an ongoing responsibility without a deadline.

For engineers, areas include:

System architecture
Infrastructure reliability
Code review quality
Professional development
API design standards
Security posture
On-call responsibilities
Mentoring

Areas do not finish. You are always responsible for infrastructure reliability. You always care about your professional development. The difference between a project and an area is that areas do not have exit criteria — they are things you maintain, not things you complete.

A useful rule: if you can imagine shipping it or closing the ticket, it is a project. If it is just part of what your role means, it is an area.

Resources

Resources are reference material you collected because it might be useful later.

For engineers:

API documentation bookmarks
Cheat sheets
Benchmark results
Architecture diagrams for third-party systems
Conference talks you want to revisit
Library documentation
Research papers
Interesting blog articles

Resources have no active home in your current work. They are collected because you expect to need them eventually. The important discipline here is that resources should not masquerade as projects. A collection of Kubernetes documentation is a resource. A running task to "learn Kubernetes for the platform migration" is a project.

Archives

Archives hold inactive items from the other three buckets: finished projects, areas you are no longer responsible for, and resources that have stopped being relevant. Nothing is deleted. It is simply moved out of the way of current work.

PARA in Practice for Engineers

Here is a concrete example of what an engineer's PARA structure might look like in Obsidian:

Projects/
  billing-queue-migration/
  postgresql-16-upgrade/
  rate-limiting-rfc/
  blog-distributed-tracing/

Areas/
  architecture-standards/
  infrastructure/
  on-call-runbooks/
  career-development/

Resources/
  api-references/
  database-cheatsheets/
  benchmark-results/
  conference-notes/

Archives/
  2025-q4-projects/
  deprecated-services/
  old-runbooks/

The folder structure itself is not sacred. What matters is the discipline of placing notes in the right category based on their relationship to your current work.

Mapping a Typical Engineer's Knowledge

Many engineers start with an undifferentiated pile of notes. Migrating to PARA requires a single audit pass:

Projects — anything with a ticket, a deadline, or a deliverable you are currently working toward.

Areas — recurring responsibilities that define your role.

Resources — reference material you collected without a specific project in mind.

Archives — everything else.

A working rule: when in doubt, Archive it. You can always retrieve it later. An overcrowded Projects folder is more damaging than an underused Archive.

PARA and Zettelkasten: A Practical Hybrid

PARA and Zettelkasten are often compared as competing systems. They are not competing. They solve different problems.

Zettelkasten is for ideas. It captures atomic concepts, links them by meaning, and lets understanding emerge from the connections. Zettelkasten notes are not tied to projects — they belong to no active bucket. A note about idempotency applies to ten different projects, past and future.

PARA is for action. It organizes working context around what you are actively doing, responsible for, or collecting for later use.

A practical hybrid:

Projects/
  billing-queue-migration/
    migration-plan.md
    open-questions.md
    → links to Zettelkasten: [[Idempotency keys turn retries into safe operations]]
    → links to Zettelkasten: [[Outbox pattern separates persistence from delivery]]

Areas/
  architecture-standards/
    current-adr-index.md
    → links to Zettelkasten: [[Database constraints are concurrency control]]

Resources/
  benchmark-results/
    q1-2026-postgres-benchmarks.md

In this model, PARA folders hold working documents and context. Zettelkasten notes hold reusable knowledge. Project notes link to Zettelkasten concepts — the project uses the concept without owning it.

This is more durable than trying to make PARA do the job of Zettelkasten. Projects end. Concepts stay.

Common Failures

Over-Archiving

Some engineers use Archives as a dump for anything they feel guilty discarding. When Archives become large and unsorted, they lose their value. Archives should contain completed work in reasonable shape, not a graveyard of unsorted notes.

A periodic archive sweep keeps it manageable; quarterly works well. Delete duplicates. Consolidate. And before a finished project goes into Archives at all, ask whether its notes still contain anything worth keeping as a Resource or Zettelkasten note.

Areas Becoming Dumping Grounds

When Areas grow without pruning, they start to look like a topic-based folder system. An Area called databases/ that contains unsorted notes from three years is not a responsibility — it is a pile.

Keep each Area tight. An area should represent something you are actively accountable for, not a topic you are broadly interested in. Interest goes into Resources. Accountability goes into Areas.

Resources Growing Without Review

Resources are easy to collect and easy to forget. A bookmark dump in Resources/ with 400 unsorted links is harder to use than a bookmark manager. Resources should be curated lightly — remove outdated material, keep the signal.

Skipping the Weekly Review

PARA works best with a weekly ten-minute review of your Projects folder. For each active project:

Is this still active?
What is the next concrete action?
Is there anything to move to Archives?

Without that review, Projects accumulate stale entries and the system loses its value as a current view of your work.

Implementation in Obsidian

Obsidian is a natural fit for PARA because folders map directly to the four buckets, and queries from the Dataview community plugin can surface project status automatically.

A basic setup:

vault/
  ├── Projects/
  ├── Areas/
  ├── Resources/
  ├── Archives/
  └── Zettelkasten/     ← concept notes, linked freely

A simple Dataview query to surface active project notes:

LIST FROM "Projects"
WHERE !contains(file.path, "Archives")
SORT file.mtime DESC

Tags can mark status without moving files:

tags: [project, active]
tags: [project, paused]
tags: [project, done]

When a project completes, tag it done, then move the folder to Archives/YEAR-QN/. Simple, auditable, reversible.

Implementation in Plain Files

You do not need Obsidian. PARA works equally well in a Git repository with plain Markdown:

knowledge/
  projects/
    2026-billing-migration/
      README.md
      migration-plan.md
      decisions.md
  areas/
    architecture/
      adr-index.md
  resources/
    databases/
      postgres-16-release-notes.md
  archives/
    2025/
      feature-x-launch/

Git gives you history, diff, search, and portability. That is often more than enough for a personal system.

When PARA Makes Sense

PARA is well suited when:

You juggle multiple active projects at the same time
You need to quickly find what relates to today's work
You want a system that is folder-friendly and tool-agnostic
You combine it with a Zettelkasten or concept-note layer for reusable ideas

PARA is less useful when:

You work on a single long-running project with no clear buckets
You are primarily doing research-oriented work with no active deliverables
You prefer emergent structure over explicit categorization

For engineers doing a mix of active project work and long-term learning, PARA and Zettelkasten together cover most cases: PARA for context, Zettelkasten for thinking.

Decision Framework

When a new note arrives, ask these questions in order:

Is this tied to something I am actively working toward? → Projects
Is this part of an ongoing responsibility I own? → Areas
Is this reference material I might need later? → Resources
Is this finished or inactive? → Archives
Is this a reusable concept or idea not tied to any project? → Zettelkasten

That is the full decision tree. Five options. One rule per option. It takes about ten seconds per note.
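
Written as code, the decision tree is a switch that stops at the first yes. The sketch below is only an illustration of the ordering: the Note fields and the Classify function are hypothetical, and the fallback borrows the earlier working rule of archiving when in doubt.

```go
package main

import "fmt"

// Note carries the answers to the five questions. In practice you
// answer them in your head; the fields exist only for illustration.
type Note struct {
	ActiveGoal      bool // tied to something I am actively working toward
	OngoingDuty     bool // part of an ongoing responsibility I own
	Reference       bool // reference material I might need later
	Inactive        bool // finished or inactive
	ReusableConcept bool // reusable idea not tied to any project
}

// Classify asks the questions in the listed order and returns the
// first destination whose question is answered yes.
func Classify(n Note) string {
	switch {
	case n.ActiveGoal:
		return "Projects"
	case n.OngoingDuty:
		return "Areas"
	case n.Reference:
		return "Resources"
	case n.Inactive:
		return "Archives"
	case n.ReusableConcept:
		return "Zettelkasten"
	}
	// No clear answer: fall back to "when in doubt, Archive it".
	return "Archives"
}

func main() {
	// Reference material collected for an active migration is still
	// project material, because the first matching question wins.
	fmt.Println(Classify(Note{ActiveGoal: true, Reference: true}))
	fmt.Println(Classify(Note{ReusableConcept: true}))
}
```

The order matters more than the questions themselves. A note that answers yes to several of them goes to the most actionable destination.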

Final Thoughts

PARA works because it matches how engineers actually use knowledge — not for browsing, but for acting. You do not open your notes to see what is in databases/. You open them because you are working on a specific problem right now, and you need the relevant material to surface quickly.

The discipline of separating active projects from reference material, and both from finished work, reduces the cognitive overhead of maintaining a personal knowledge base. In combination with a personal knowledge management foundation and a Zettelkasten for concept-level notes, PARA gives you the organizational backbone that keeps everything findable when it matters.

Start with one folder per bucket. Run one audit to sort your existing notes. Review Projects weekly. The rest will follow naturally.

Evergreen Notes: Write Notes That Compound Over Time

Rost — Sun, 21 Jun 2026 12:18:21 +0000

Most engineering notes are written once and forgotten. You capture something during a debugging session, paste it into a doc, and rediscover it two years later with no context for why it mattered.

The problem is not effort. Engineers write constantly — code comments, Slack messages, Confluence pages, Jira descriptions, pull request explanations, architecture diagrams. The problem is that most of those notes are written for a specific moment and age poorly. They do not compound. They accumulate.

Evergreen notes are the alternative. The idea is simple: write each note so that it stays useful indefinitely, improves when you revisit it, and connects to other notes in a way that makes the whole system more valuable over time.

The term was popularized by researcher Andy Matuschak, whose own public notes demonstrate the idea at scale. For engineers, the principle has direct applications in technical writing, documentation, architecture decisions, and the long-term capture of hard-won lessons.

What Makes a Note Evergreen

Atomic

An evergreen note contains one idea. Not one topic — one idea.

A note called "PostgreSQL" is not evergreen. It is a container waiting to be filled. A note called "Partial indexes reduce write overhead when queries target a small subset" is evergreen. It states a specific, portable claim.

The atomic constraint is important because it controls reuse. A container note can only be linked as a vague topic. An atomic note can be linked wherever that specific idea applies — in a discussion of query optimization, in a comparison of indexing strategies, in a project note about a specific performance problem.

Standalone

An evergreen note should be understandable without its original source.

That means writing in your own words. A note that says "See the linked article — good stuff on caching" is not evergreen. A note that says "Write-through caching updates the cache synchronously with the database on every write, improving read consistency at the cost of higher write latency" is evergreen. You can read it a year later without chasing the original source.

This is harder than it sounds. Writing a standalone note requires actually understanding what you read, not just tagging it. That processing step is where most of the learning happens.

Evolving

Evergreen notes improve over time rather than going stale.

A fleeting note has a lifecycle: you write it, it serves a moment, it becomes irrelevant. An evergreen note should be worth revisiting and refining six months or two years later. You might add a counterexample, update it with a production experience, link it to a new pattern, or simply rewrite it more precisely.

The word "evergreen" is intentional: these notes do not die after harvest. They persist and improve.

Linked

Evergreen notes connect to other notes rather than sitting in isolation.

A standalone note about write-through caching connects naturally to notes about read-heavy workloads, cache invalidation, eventual consistency, and database write performance. Each link makes both notes more useful — the connection surfaces context that neither note contains alone.

The linking habit is what turns a collection of individual insights into a network of connected understanding.

Note Types and When to Use Each

Understanding evergreen notes requires understanding what they are not.

Fleeting notes are temporary captures. A line scribbled during a debugging session, a bookmark to revisit, a question to follow up on. Fleeting notes serve a moment. They should be processed quickly and either discarded or promoted into something more durable. Most fleeting notes never become evergreen notes, and that is fine.

Literature notes are summaries of external sources — a documentation page, a postmortem, a book chapter, a conference talk. Literature notes preserve what a source said. They are a step toward understanding, not understanding itself. A literature note says "this source claims X." An evergreen note says "I believe X for these reasons."

Evergreen notes synthesize what you have come to understand. They live at the output of the learning process, not the input.

Note type	Purpose	Lifespan	Example
Fleeting	Quick capture	Hours to days	"Look into why Postgres vacuum missed this row"
Literature	Source summary	Medium term	"Redis docs say AOF fsync default is 1s"
Evergreen	Portable idea	Years	"Fsync-on-write durability trades throughput for crash safety"

Writing Evergreen Technical Notes

The structure of a good evergreen technical note follows a simple logic: claim, evidence, implication.

# Write-through caching improves read consistency at the cost of write latency

Write-through caching updates the cache at the same time as the underlying store
on every write. Every read hits fresh data because the write path ensures
consistency before the write is acknowledged.

The tradeoff is write latency — every write now requires two operations (store
and cache) to complete before the caller receives a confirmation.

This pattern suits read-heavy workloads where cache staleness has real
business impact, such as product inventory counts or user settings.

Links:
- [[Read-through caching shifts cache population to read time]]
- [[Cache invalidation is a coordination problem]]
- [[Write-behind caching trades consistency for write throughput]]

That note is useful without the source. It states the claim, explains the tradeoff, gives a context where it applies, and links to related ideas.

What to Avoid

Time-sensitive references age badly. "As of Postgres 14, this behavior works this way" is a literature note, not an evergreen note. Write the principle instead: "The planner skips index scans when estimated row count exceeds a threshold relative to table size." That claim survives version changes even if the threshold changes.

Tool-specific commands without context are snippets, not notes. A note that is just a kubectl command copied from a StackOverflow answer is not evergreen. A note about why that command works — what Kubernetes resource it affects and what problem it solves — has a chance.

Assumptions about reader knowledge degrade fast. Write as if explaining to a competent colleague who is not inside your current context.

Good Candidates for Evergreen Notes in Engineering

Almost any hard-won lesson with broad applicability is a good candidate:

Architecture tradeoffs and the reasoning behind decisions
Debugging patterns that apply across systems
API design rules and their edge cases
Performance characteristics with real-world numbers attached
Security assumptions that turned out to be wrong
Test strategy lessons from projects where the approach failed
Deployment constraints that changed how the team worked

The common thread: specific enough to be actionable, general enough to apply more than once.

The Evergreen Workflow

Step 1: Capture Fleeting Notes

Capture quickly without overthinking. The goal is not to produce an evergreen note in the moment — it is to preserve the raw material for one.

During a debugging session:

Found that the cache was returning stale user permissions after role changes.
The TTL was 5 minutes but the role update was immediate.
Need to think through how to handle this — invalidation on write?
Or shorter TTL? Or event-driven update?

That is a fleeting note. It is not an evergreen note, but it contains the seeds of several.

Step 2: Process Into Evergreen Notes Within 48 Hours

Processing is where the value appears. Take the raw capture and extract the ideas that are worth preserving.

From that debugging note, you might write:

# Role-based cache entries require invalidation on write, not just TTL expiry

When cached data encodes permissions or roles, TTL-based expiry is not safe.
A user whose role is downgraded keeps elevated permissions until the TTL expires.
Write-time invalidation — or event-driven cache updates on role change — is required
for correctness in permission-sensitive caches.

Links:
- [[Cache invalidation is a coordination problem]]
- [[Authorization decisions should not be cached at rest without validation]]

The debugging context is gone. The portable idea remains.

Step 3: Connect to Existing Notes

After writing the note, spend two minutes asking:

What existing note does this relate to?
What concept does this depend on?
What does this extend or contradict?

Add links in both directions. The new note links to existing notes. Existing notes that are now richer for the connection link back.
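
The "links in both directions" habit can be checked mechanically. The following is a minimal sketch, assuming notes are available as a dict of title to body text and use the `[[Title]]` link syntax from the example note earlier; `outgoing_links` and `missing_backlinks` are hypothetical helper names, not part of any tool:

```python
import re

# Captures the title part of [[Title]], [[Title|alias]] or [[Title#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def outgoing_links(body: str) -> set[str]:
    """Return the note titles referenced via [[Title]] in a note body."""
    return {match.strip() for match in WIKILINK.findall(body)}

def missing_backlinks(notes: dict[str, str]) -> list[tuple[str, str]]:
    """List (source, target) pairs where source links to target but not back."""
    links = {title: outgoing_links(body) for title, body in notes.items()}
    missing = []
    for source, targets in links.items():
        for target in targets:
            # Only check targets that exist as notes; dangling links are skipped.
            if target in links and source not in links[target]:
                missing.append((source, target))
    return sorted(missing)

notes = {
    "Role-based cache entries require invalidation on write":
        "See [[Cache invalidation is a coordination problem]].",
    "Cache invalidation is a coordination problem": "No links yet.",
}
for source, target in missing_backlinks(notes):
    print(f"'{target}' does not link back to '{source}'")
```

Running something like this over the vault after each processing session turns the two-minute linking step into a short checklist.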

Step 4: Revisit and Improve

Evergreen notes do not have a single correct state. Every time you encounter the idea again — in a production incident, a design review, a code review comment — consider returning to the note and making it better.

You might:

Add a more concrete example
Update the claim based on new evidence
Remove a caveat that turned out not to matter
Add a link to a new related note
Rewrite the opening sentence for clarity

That cycle of refinement is what makes notes compound rather than decay.

Evergreen Notes and Documentation

There is a useful distinction between personal evergreen notes and team documentation.

Personal evergreen notes are your understanding, written for future you. They can be rough, opinionated, and incomplete. Their value is in being reusable for your thinking.

Team documentation is for shared understanding. It needs accuracy, accessibility, and maintenance ownership.

The two layers complement each other. Your evergreen notes about why a system was designed a certain way can become the raw material for the architecture decision record. Your debugging notes can feed the runbook. Your API design notes can inform the style guide.

The direction of flow is usually: evergreen notes → polished documentation, not the reverse.

Evergreen Notes and RAG Systems

As AI-augmented knowledge tools become more practical, well-written evergreen notes become increasingly valuable as retrieval source material. The retrieval versus representation problem in knowledge management is essentially about quality of source material — and evergreen notes, being atomic, standalone, and written for comprehension, chunk well for vector search.

A Zettelkasten of atomic evergreen notes is a natural foundation for a personal RAG system. The atomic structure aligns with retrieval chunk size. The standalone property means retrieved notes need no additional context to be useful. The linking structure provides graph traversal opportunities beyond keyword search.
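
To make the graph-traversal point concrete, here is a rough sketch of retrieval followed by one-hop link expansion. It assumes one note per chunk and uses plain word overlap as a stand-in for vector search; `retrieve` and `expand_by_links` are hypothetical names, and a real system would swap in an embedding index:

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude substitute for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, notes: dict[str, str], top_k: int = 1) -> list[str]:
    """Rank notes (title plus body) by word overlap with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        notes,
        key=lambda title: len(query_words & tokenize(title + " " + notes[title])),
        reverse=True,
    )
    return ranked[:top_k]

def expand_by_links(hits: list[str], notes: dict[str, str]) -> list[str]:
    """Append notes one [[link]] hop away from each hit, keeping order."""
    result = list(hits)
    for title in hits:
        for linked in re.findall(r"\[\[([^\]]+)\]\]", notes[title]):
            if linked in notes and linked not in result:
                result.append(linked)
    return result

notes = {
    "TTL expiry is unsafe for permission caches":
        "Stale permissions persist until expiry. "
        "See [[Cache invalidation is a coordination problem]].",
    "Cache invalidation is a coordination problem":
        "Writers and readers must agree on when data changes.",
    "Prefer small pull requests": "Review quality drops as diff size grows.",
}
hits = retrieve("stale permissions after role change", notes)
print(expand_by_links(hits, notes))
```

The second note shares no words with the query, yet it is pulled in because the first note links to it. That is the benefit the linking structure adds on top of similarity search.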

This is increasingly relevant for engineers who want to query their own knowledge base with an LLM rather than starting from scratch each time.

Common Pitfalls

Writing Too Broadly

A note that covers an entire topic is not an evergreen note — it is a draft article. If your note is longer than a single screen and covers more than one claim, break it into smaller notes and link them.

Writing Too Narrowly

A note that is too specific to one context has no reuse value. "Fixed the billing service cache bug on 2024-03-14" is a log entry, not an evergreen note. Raise the abstraction level until the idea applies in at least three different contexts.

Confusing "Evergreen" With "Never Changes"

Evergreen does not mean immutable. It means the note remains worth returning to. A note about Go generics written in 2022 is still evergreen if you update it to reflect how patterns evolved in 2024. A note that you never touch because you believe it is permanently correct is a note that will eventually become wrong in silence.

Skipping the Processing Step

The most common failure is treating evergreen notes as a collection target rather than a writing practice. You cannot grow a collection of high-quality atomic notes by saving bookmarks. The evergreen note is not the article you read — it is what you extracted from it in your own words.

Tools

Obsidian

Obsidian is the most popular tool for evergreen notes. Its local Markdown files, bidirectional links, and graph view align well with the practice. A simple structure:

vault/
  fleeting/
    daily/
  literature/
  evergreen/
  maps/       ← index notes for clusters of evergreen notes

The graph view in Obsidian makes link clusters visible — useful for discovering which concepts form natural groups that might become index notes or published articles.

Plain Markdown With Git

A Git repository of Markdown files works well and has no dependency on any specific tool. Standard Markdown links connect notes. Search is handled by your editor or grep. Version history comes from Git.

knowledge/
  evergreen/
    caching/
    api-design/
    performance/
  literature/
  fleeting/

The discipline is the same regardless of tool — one idea per note, written in your own words, linked to related notes.

Starting From Zero

The most useful way to start is not to migrate your existing notes. It is to write one evergreen note today.

Take something you learned in the last week. Write it as a claim. Explain it in your own words in one paragraph. Add links to zero or one related ideas.

That is a complete evergreen note. Repeat once per week for six months and you have a working system.

The compounding effect takes time to become visible. Engineers who maintain evergreen notes for a year often report that their notes start answering questions before they finish asking them — because they have already written the answer in a previous context.

Final Thoughts

The reason evergreen notes work is not that they are better at storage. They are better at thinking. The discipline of writing one portable idea per note, in your own words, with links to related ideas, forces understanding that passive collection does not.

For engineers, this has practical consequences. The notes from a production incident that you process into evergreen format are more useful than the incident log. The design tradeoff you distill into an atomic note is more useful than the architecture diagram. The debugging pattern you generalize from a specific bug is more reusable than the ticket.

Used alongside the PARA method for organizing active work, evergreen notes give you the conceptual layer that PARA does not provide — a growing network of reusable understanding that persists across projects, across roles, and across years.

Cost Optimization for LLM Systems: Where the Money Actually Goes

Rost — Fri, 19 Jun 2026 09:52:51 +0000

LLM costs scale linearly with usage. A system processing 10,000 requests a day at $0.01 per request costs $100 daily, or $36,500 a year. At enterprise scale, say a million requests a day, that is $10,000 daily.
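
The linear scaling is easy to verify with a few lines of arithmetic. This is only a sketch: the function names are hypothetical and the flat per-request price is the simplifying assumption used in the paragraph above:

```python
def daily_cost(requests_per_day: int, cost_per_request: float) -> float:
    """Daily spend in dollars at a flat per-request price."""
    return round(requests_per_day * cost_per_request, 2)

def annual_cost(requests_per_day: int, cost_per_request: float, days: int = 365) -> float:
    """Yearly spend in dollars, assuming the same volume every day."""
    return round(daily_cost(requests_per_day, cost_per_request) * days, 2)

print(daily_cost(10_000, 0.01))   # 100.0
print(annual_cost(10_000, 0.01))  # 36500.0
```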

Cost optimization isn't about cutting corners. It's about spending tokens where they matter.

Every token you waste is a token you could have spent on a better answer.

Token budgeting

The simplest way to control costs is to set limits. Per session, per task, or per day.

Strategy 1: Per-Session Budgets

Per-session budgets are straightforward:

class SessionBudget:
    def __init__(self, budget_tokens: int = 10000):
        self.budget = budget_tokens
        self.used = 0

    def allocate(self, tokens: int) -> bool:
        if self.used + tokens <= self.budget:
            self.used += tokens
            return True
        return False

    def remaining(self) -> int:
        return self.budget - self.used
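
The same idea extends to the per-day limit mentioned above. A minimal sketch, assuming a budget that resets when the calendar date changes; `DailyBudget` and its injectable `today` clock are illustrative names, not part of the original design:

```python
from datetime import date

class DailyBudget:
    """Token budget that resets when the calendar day changes."""

    def __init__(self, budget_tokens: int = 100_000, today=date.today):
        self.budget = budget_tokens
        self.used = 0
        self._today = today      # injectable clock, which keeps tests simple
        self._day = today()

    def _roll_over(self) -> None:
        # Reset usage the first time we are called on a new day.
        current = self._today()
        if current != self._day:
            self._day = current
            self.used = 0

    def allocate(self, tokens: int) -> bool:
        self._roll_over()
        if self.used + tokens <= self.budget:
            self.used += tokens
            return True
        return False

budget = DailyBudget(budget_tokens=1_000)
print(budget.allocate(800), budget.allocate(300))  # True False
```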

Strategy 2: Per-Task Budgets

Per-task budgets are more useful. Different tasks need different amounts of context:

task_budgets:
  classify:
    max_tokens: 100
    model: qwen2.5-1.5b
  summarize:
    max_tokens: 500
    model: qwen2.5-7b
  code_review:
    max_tokens: 2000
    model: qwen2.5-coder-7b
  reason:
    max_tokens: 4000
    model: qwen2.5-32b
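
One way to consume a config like this is a lookup that returns the model and token cap for a task and clamps oversized requests. The sketch below mirrors the YAML as a Python dict to stay dependency-free; `TaskBudget`, `budget_for`, `clamp_request`, and the default entry are assumptions added for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskBudget:
    max_tokens: int
    model: str

# Mirrors the task_budgets config above.
TASK_BUDGETS = {
    "classify": TaskBudget(100, "qwen2.5-1.5b"),
    "summarize": TaskBudget(500, "qwen2.5-7b"),
    "code_review": TaskBudget(2000, "qwen2.5-coder-7b"),
    "reason": TaskBudget(4000, "qwen2.5-32b"),
}

# Assumed fallback for task types that are not configured.
DEFAULT_BUDGET = TaskBudget(500, "qwen2.5-7b")

def budget_for(task_type: str) -> TaskBudget:
    """Return the configured budget, or the default for unknown tasks."""
    return TASK_BUDGETS.get(task_type, DEFAULT_BUDGET)

def clamp_request(task_type: str, requested_tokens: int) -> int:
    """Never request more tokens than the task's budget allows."""
    return min(requested_tokens, budget_for(task_type).max_tokens)

print(budget_for("classify").model)      # qwen2.5-1.5b
print(clamp_request("reason", 10_000))   # 4000
```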

Strategy 3: Adaptive Budgets

Adaptive budgets adjust based on what actually happens. If summarization tasks consistently use 200 tokens, stop allocating 500:

class AdaptiveBudget:
    def __init__(self):
        self.task_history = {}

    def allocate(self, task_type: str) -> int:
        if task_type in self.task_history:
            return int(self.task_history[task_type] * 1.5)
        return 1000

    def record(self, task_type: str, tokens_used: int):
        if task_type not in self.task_history:
            self.task_history[task_type] = tokens_used
        else:
            self.task_history[task_type] = (
                0.9 * self.task_history[task_type] + 0.1 * tokens_used
            )

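The exponential moving average keeps 0.9 of the previous estimate and blends in 0.1 of each new observation, so the estimate follows recent usage gradually and a single outlier cannot swing it. allocate then adds 50% headroom on top of that average. Lower the 0.9 weight if your workloads are volatile and the estimate needs to react faster.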
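
To see how quickly the average reacts, it helps to run the update rule on a fixed input. A small worked example, assuming the first recorded usage was 100 tokens and every later task uses 80; `ema_update` is a hypothetical helper that repeats the formula from `record`:

```python
def ema_update(estimate: float, observed: float, keep: float = 0.9) -> float:
    """One exponential-moving-average step: keep most of the old estimate."""
    return keep * estimate + (1 - keep) * observed

estimate = 100.0   # assumed first observation
history = []
for _ in range(20):
    estimate = ema_update(estimate, 80.0)
    history.append(estimate)

print(round(history[0], 2))    # 98.0  after 1 update
print(round(history[9], 2))    # 86.97 after 10 updates
print(round(history[19], 2))   # 82.43 after 20 updates
```

With a weight of 0.9, the estimate has closed about two thirds of the gap after ten observations. A weight of 0.5 would close more than 99% of it in seven, at the price of reacting to every outlier.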

API vs local inference

Local inference is cheaper at scale. The break-even depends on your hardware and API rates.

Model	API price, input / output ($/M tokens)	Local cost/hour	Break-even usage
GPT-4o	$2.50 / $10.00	—	N/A
Claude Sonnet 4	$3.00 / $15.00	—	N/A
Qwen2.5-72B	$0.50 / $2.00	~$0.50	~4 hours/day
Qwen2.5-32B	$0.30 / $1.20	~$0.20	~2 hours/day
Qwen2.5-7B	$0.10 / $0.40	~$0.05	~1 hour/day

The hardware math:

Hardware	Upfront	Monthly electricity	Break-even vs API
RTX 3090 (used)	$600	$15	~4 months
RTX 4090	$1,500	$20	~6 months
RTX 5080	$1,000	$18	~5 months
DGX Spark	$2,000	$30	~8 months

At moderate usage, an hour or more per day, local inference pays for itself. At high usage, the savings are dramatic. The catch is upfront capital. An RTX 5080 is $1,000. An API bill you can pause. Hardware you can't.
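
The break-even column can be reproduced with simple arithmetic: the upfront cost divided by the monthly saving over the API. In the sketch below the hardware figures come from the table, while the monthly API spend is a hypothetical input chosen only to illustrate the calculation:

```python
import math

def break_even_months(upfront: float, monthly_electricity: float,
                      monthly_api_spend: float) -> float:
    """Months until hardware cost is recovered; inf if local never wins."""
    monthly_saving = monthly_api_spend - monthly_electricity
    if monthly_saving <= 0:
        return math.inf
    return upfront / monthly_saving

# Assumed API bills of $165 and $218 per month (not from the article).
print(break_even_months(600, 15, 165))    # 4.0  (used RTX 3090)
print(break_even_months(1000, 18, 218))   # 5.0  (RTX 5080)
print(break_even_months(600, 15, 10))     # inf  (API is already cheaper)
```

The last case is the one to watch: if the API bill is below the electricity cost, the hardware never pays for itself.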

Fallback strategies

When your preferred model is too expensive or too slow, fall back to something cheaper. The key is knowing when quality is "good enough."

Strategy 1: Quality-Based Fallback

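Quality-based fallback tries models, cheapest first, until the output meets a threshold:

class QualityFallback:
    def __init__(self, quality_threshold: float = 0.8):
        self.threshold = quality_threshold
        # "cost" looks like dollars per 1K output tokens (matches the table above)
        self.models = [
            {"model": "claude-sonnet-4", "cost": 0.015},
            {"model": "qwen2.5-72b", "cost": 0.002},
            {"model": "qwen2.5-32b", "cost": 0.001},
            {"model": "qwen2.5-7b", "cost": 0.0004},
        ]

    def route(self, prompt: str) -> str:
        # Try the cheapest model first; escalate only when quality falls short
        best_result, best_score = "", -1.0
        for model_config in sorted(self.models, key=lambda m: m["cost"]):
            result = self.call_model(model_config["model"], prompt)
            score = self.evaluate_quality(result)
            if score >= self.threshold:
                return result
            if score > best_score:
                best_result, best_score = result, score
        # Nothing met the threshold: return the best attempt without another call
        return best_result

    def call_model(self, model: str, prompt: str) -> str:
        raise NotImplementedError  # plug in your inference client

    def evaluate_quality(self, result: str) -> float:
        raise NotImplementedError  # plug in a classifier or heuristic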

The problem is evaluation itself. How do you measure quality without calling another model? Some systems use a small classifier. Others use heuristic checks — length, structure, keyword presence. None of these are perfect.
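
As an illustration of the heuristic route, the sketch below scores a response on the three checks just mentioned: length, structure, and keyword presence. The equal weighting, the `min_words` default, and the name `heuristic_quality` are all assumptions; treat it as a starting point to calibrate, not a reliable judge:

```python
def heuristic_quality(response: str, required_keywords: list[str],
                      min_words: int = 20) -> float:
    """Score a response in [0, 1] from three cheap, equally weighted checks."""
    # Length: very short answers are often refusals or truncations.
    length_ok = len(response.split()) >= min_words
    # Structure: ends with terminal punctuation or a closing code fence.
    structure_ok = response.strip().endswith((".", "!", "?", "```"))
    # Keywords: fraction of expected terms that actually appear.
    lowered = response.lower()
    if required_keywords:
        found = sum(keyword.lower() in lowered for keyword in required_keywords)
        keyword_score = found / len(required_keywords)
    else:
        keyword_score = 1.0
    return (float(length_ok) + float(structure_ok) + keyword_score) / 3

answer = "Use write-time invalidation for the cache."
print(heuristic_quality(answer, ["cache", "invalidation"], min_words=5))  # 1.0
print(heuristic_quality("Too short", ["cache"]))                          # 0.0
```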

Strategy 2: Latency-Based Fallback

Latency-based fallback is simpler. Route to the fastest model that meets your time budget:

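class LatencyFallback:
    def __init__(self, max_latency: float = 5.0):
        self.max_latency = max_latency  # time budget in seconds
        # Expected latency per model, in seconds
        self.models = [
            {"model": "qwen2.5-1.5b", "latency": 0.5},
            {"model": "qwen2.5-7b", "latency": 2.0},
            {"model": "qwen2.5-32b", "latency": 10.0},
            {"model": "claude-sonnet-4", "latency": 5.0},
        ]

    def route(self, prompt: str) -> str:
        # Sorted fastest first, so this returns the lowest-latency model that fits
        by_latency = sorted(self.models, key=lambda x: x["latency"])
        for model_config in by_latency:
            if model_config["latency"] <= self.max_latency:
                return self.call_model(model_config["model"], prompt)
        # No model fits the budget: use the fastest one anyway
        return self.call_model(by_latency[0]["model"], prompt)

    def call_model(self, model: str, prompt: str) -> str:
        raise NotImplementedError  # plug in your inference client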
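
Sorting by latency and taking the first fit always selects the lowest-latency entry, so with a 5-second budget the 1.5B model wins every time. If the goal is the best answer that still arrives on time, a variant can walk the list from most to least capable instead. This is a sketch under the assumption that the list is ordered by capability; `pick_model` is a hypothetical name:

```python
# Assumed ordering: least capable first, most capable last.
MODELS = [
    {"model": "qwen2.5-1.5b", "latency": 0.5},
    {"model": "qwen2.5-7b", "latency": 2.0},
    {"model": "qwen2.5-32b", "latency": 10.0},
    {"model": "claude-sonnet-4", "latency": 5.0},
]

def pick_model(max_latency: float, models: list[dict] = MODELS) -> str:
    """Return the most capable model whose expected latency fits the budget."""
    for config in reversed(models):
        if config["latency"] <= max_latency:
            return config["model"]
    # Nothing fits: degrade to the fastest model rather than fail.
    return min(models, key=lambda c: c["latency"])["model"]

print(pick_model(5.0))   # claude-sonnet-4
print(pick_model(3.0))   # qwen2.5-7b
print(pick_model(0.1))   # qwen2.5-1.5b
```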

Caching

Caching is the most underrated cost optimization. Identical prompts happen more often than you think — classification requests, FAQ-style queries, repeated tool calls.

Strategy 1: Prompt Caching

Exact prompt caching is simple:

import hashlib

class PromptCache:
    def __init__(self, max_size: int = 1000):
        self.cache = {}  # dicts keep insertion order, so eviction is FIFO
        self.max_size = max_size

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt: str) -> str | None:
        return self.cache.get(self._key(prompt))

    def set(self, prompt: str, response: str):
        key = self._key(prompt)
        # Evict the oldest entry only when adding a new key to a full cache
        if key not in self.cache and len(self.cache) >= self.max_size:
            self.cache.pop(next(iter(self.cache)))
        self.cache[key] = response

Strategy 2: Semantic Caching

Semantic caching is more useful. It catches prompts that are different but mean the same thing:

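import numpy as np
from sentence_transformers import SentenceTransformer

class SemanticCache:
    def __init__(self, similarity_threshold: float = 0.95):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self.entries = []  # (unit-length embedding, response) pairs
        self.threshold = similarity_threshold

    def _embed(self, prompt: str) -> np.ndarray:
        # Scale to unit length so a dot product equals cosine similarity
        vector = np.asarray(self.model.encode([prompt])[0], dtype=float)
        return vector / np.linalg.norm(vector)

    def get(self, prompt: str) -> str | None:
        query = self._embed(prompt)
        best_score, best_response = -1.0, None
        # Linear scan; embeddings are computed once in set(), not on every lookup
        for embedding, response in self.entries:
            score = float(np.dot(query, embedding))
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def set(self, prompt: str, response: str):
        self.entries.append((self._embed(prompt), response))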

The threshold matters. 0.95 is strict: only near-identical prompts match, so you get fewer hits. 0.85 is more forgiving but risks returning an answer to a different question. Measure both your hit rate and how often a hit is wrong, then adjust.
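
One way to do that measurement is to label a sample of prompt pairs by hand and see what each threshold would have served. The sketch below assumes you already have a similarity score and a same-intent label per pair; the sample data and the name `evaluate_threshold` are hypothetical:

```python
def evaluate_threshold(pairs: list[tuple[float, bool]], threshold: float) -> dict[str, float]:
    """pairs holds (similarity, same_intent) for hand-labeled prompt pairs."""
    served = [same for similarity, same in pairs if similarity >= threshold]
    equivalent = sum(1 for _, same in pairs if same)
    correct_hits = sum(served)
    return {
        # Share of lookups answered from the cache.
        "hit_rate": len(served) / len(pairs),
        # Share of cache answers that belonged to a different question.
        "false_hit_rate": (len(served) - correct_hits) / len(served) if served else 0.0,
        # Share of truly equivalent prompts the cache failed to catch.
        "missed_equivalents": (equivalent - correct_hits) / equivalent if equivalent else 0.0,
    }

# Hypothetical sample: (cosine similarity, do the prompts mean the same thing?)
sample = [(0.97, True), (0.93, True), (0.91, False),
          (0.88, True), (0.86, False), (0.70, False)]
for threshold in (0.95, 0.85):
    print(threshold, evaluate_threshold(sample, threshold))
```

On this made-up sample, 0.95 serves no wrong answers but misses two of the three equivalent prompts, while 0.85 catches all of them at the cost of two wrong answers in five hits.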

Response caching for common queries is worth it too. If users ask "what's the weather" or "what time is it" repeatedly, cache the pattern, not just the exact prompt:

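class ResponseCache:
    def __init__(self):
        # Values are routing hints, not stored answers: weather and time
        # change, so a hit names the cheap lookup to run instead of an LLM call.
        self.common_queries = {
            "what is the weather": "Check weather API",
            "what is the time": "Check system time",
            "what time is it": "Check system time",
            "who is the president": "Check current president",
        }

    def get(self, query: str) -> str | None:
        # Expand contractions so "what's the weather" matches the pattern
        query_lower = query.lower().replace("what's", "what is").replace("who's", "who is")
        for common_query, response in self.common_queries.items():
            if common_query in query_lower:
                return response
        return None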

This isn't sophisticated, but it works. Common queries are common for a reason.

When optimization helps

Optimization matters when you're processing high volumes, running mixed workloads, or paying API costs that add up.

It doesn't matter when you're prototyping, using a single model, or processing low volumes. The complexity of budgeting, fallback, and caching isn't worth it for a system that makes 100 requests a day.

Get the basic flow working first. Add optimization when the bill comes in.

Tradeoffs

Strategy	Cost	Quality	Complexity
No optimization	Highest	Consistent	Lowest
Token budgeting	Moderate	Variable	Medium
Fallback models	Low-Medium	Variable	Medium
Caching	Lowest	High (for cache hits)	Medium
Hybrid	Optimized	Optimized	Highest

Production systems usually run hybrid. Budget per session, fall back on quality or latency, cache what you can. The complexity is real, but so are the savings.

Model Routing Strategies — capability-based, cost-aware, latency-aware routing
LLM Guardrails in Practice — input validation, output filtering, safety
Multi-Model System Design — architecture for multiple models
LLM Architecture — system design pillar: routing, cost, guardrails, and orchestration