
Best Practices

The following are rules I recommend for any software stack so that both human and AI agents work quickly and effectively. I use my work on SAF to identify, develop, and demonstrate these rules. This is a living document, and will continue to evolve as I learn more.

Embrace Type Safety at Every Layer

Many of the following rules depend on the stack having a type system. Unless every interface in the stack can be typed, development will be fundamentally hampered: static analysis provides an immediate and reliable feedback loop, which is critical for rapid and reliable development.

Example applications:

Return Errors

Some languages (such as Go) expect errors to be returned by functions. Others (such as Java) let you specify what a function throws. Many do neither, providing no way to declare what kinds of exceptions a function may emit.

For those in the last camp, instead of throwing or raising exceptions or errors, adopt a common interface which returns either an error or a result (never both, never neither). With TypeScript, for example, an operation which may fail in normal operation should be consumed like this:

```typescript
const { result, error } = await unsafeOperation();
if (error) {
  switch (true) {
    case error instanceof ErrorClass:
      return res.status(errorCode);
    default:
      throw error satisfies never;
  }
}

// At this point, TypeScript knows that result is defined.
use(result.someValue);
```

Note that logic may still throw exceptions, but this should be truly exceptional. An exception should only be thrown in cases where, if it actually occurs in a test or in production, it will be considered an issue to be fixed.
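One minimal way to model such a return type (a sketch; `ResultOrError`, `NotFoundError`, and `unsafeOperation` are hypothetical names, not from any particular library):

```typescript
// A discriminated pair: exactly one of result/error is ever set.
type Ok<T> = { result: T; error?: undefined };
type Err<E extends Error> = { result?: undefined; error: E };
type ResultOrError<T, E extends Error> = Ok<T> | Err<E>;

// An expected, package-defined error.
class NotFoundError extends Error {}

async function unsafeOperation(): Promise<
  ResultOrError<{ someValue: string }, NotFoundError>
> {
  // On an expected failure, return the error instead of throwing:
  // return { error: new NotFoundError("no such record") };
  return { result: { someValue: "ok" } };
}
```

Because the union is discriminated on `error`, checking `if (error)` is enough for TypeScript to narrow `result` to be defined in the other branch.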

Packages should only return instances of errors they define themselves. Propagating downstream errors upstream (such as a database error from a dependency) is a form of tight coupling. Even unhandled errors should be caught at the package or service boundary and obfuscated (e.g. as a generic "UnhandledError") so that consumers cannot tie themselves to deeper dependencies.
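A sketch of masking at the boundary (the `db` stand-in and `UnhandledError` are hypothetical):

```typescript
// An error type this package owns and exports.
class UnhandledError extends Error {}

// Stand-in for a database dependency whose errors we must not leak.
const db = {
  async findUser(_id: string): Promise<{ id: string }> {
    throw new Error("ECONNREFUSED"); // driver-specific failure
  },
};

async function getUser(id: string) {
  try {
    return { result: await db.findUser(id) };
  } catch {
    // The driver's error never crosses the package boundary;
    // consumers can only couple to errors this package defines.
    return { error: new UnhandledError("getUser failed") };
  }
}
```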

Benefits:

  • There's a clear delineation between expected and unexpected errors.
  • Expected errors are typed, and TypeScript nudges the developer to handle them.
  • If a function starts returning a new error type, consumers can be forced to handle it by using the throw error satisfies never pattern.
  • By masking dependency exceptions, libraries protect against an entire category of nasty coupling.

Example applications:

Keep Files Small

This is most applicable to parts of the codebase which contain many instances of the same kind of thing, e.g. components, routes, handlers, pages, and database queries. Each instance should live in its own file, rather than being grouped by domain (such as all CRUD operations for a single entity in one file). If any single instance grows too large, it should be broken up into smaller pieces, such as sub-components, library methods, or transactions which depend on smaller queries.
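Under this rule, a routes directory might look like the following (file names are hypothetical), rather than a single file holding all of an entity's handlers:

```
routes/
  create-user.ts
  get-user.ts
  list-users.ts
  update-user.ts
```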

Benefits:

  • Less necessary context for agents, and less scrolling for humans.
  • Edits are faster as there's less surface area for the agent to work with when making changes.
  • Automated workflows on files, such as creating or updating a route or page, can be done more easily by just providing a file path.

Example application:

Keep Code Modular

Software should be broken up into packages, each with a well-defined purpose, a public interface, and an up-to-date list of dependencies. A package should not grow too large in any of these regards.
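In a Node monorepo, for example, each of these can be stated in the package manifest (a hypothetical sketch; the names are invented):

```json
{
  "name": "@myapp/users",
  "description": "User account creation, lookup, and profile queries.",
  "exports": {
    ".": "./index.ts"
  },
  "dependencies": {
    "@myapp/db": "*",
    "@myapp/links": "*"
  }
}
```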

Benefits:

  • Similar to "Keep Files Small"; it limits the required context for humans or agents to work on one surface at a time.

Example application:

SAF itself is broken up into a good number of monorepo packages which are also listed in the sidebar of the documentation.

Specify and Enforce Shared APIs, Models, and Strings

Anything that is shared across system boundaries should have a clear, independent source of truth, and that source of truth should be enforced by every consumer, ideally by type checking or some other form of static analysis.

Assets to be owned by one package, shared with other packages, and enforced by static analysis across all usages include:

  • URLs to product pages
  • Product copy
  • Product events
  • HTTP/gRPC request and response objects
  • Business models
  • Environment variables

For serialized data, prefer flattened data structures with identifiers for related data rather than nested objects.
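For example (with hypothetical shapes), compare a nested order payload to a flattened one:

```typescript
// Nested: consumers become coupled to how customer data was joined in.
interface OrderNested {
  id: string;
  customer: { id: string; name: string };
}

// Flattened: related data is referenced by identifier and can be
// fetched or cached independently of the order itself.
interface Order {
  id: string;
  customerId: string;
}
interface Customer {
  id: string;
  name: string;
}

const order: Order = { id: "order-1", customerId: "customer-1" };
const customer: Customer = { id: "customer-1", name: "Ada" };
```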

Benefits:

  • Quick feedback when a shared contract changes in a way that would break dependent code (such as the copy on a page, the set of pages in a SPA, or the metadata a product event requires), showing how much work there is to do and where to do it.
  • Sharing strings for product copy reduces the likelihood of typos and inconsistencies, shifting fixes left. This is particularly useful for tests which check that a given string exists in a UI, or needs to navigate to a given page.
  • Flattened data structures allow passing only the data that is needed, and avoid coupling those structures to their source, e.g. one or several API calls, GraphQL, websockets, or a cache.

Example applications:

  • @saflib/links provides a simple structure and some methods for sharing and using links to specific pages between packages.
  • @saflib/utils provides an interface for objects with textual HTML element properties which can easily be bound with v-bind in a Vue template and located by Playwright.

Build and Maintain Fakes and Adapters for Service Boundaries

Any non-trivial app will reach a point where it depends on separate services, either first or third party, and those integration points should provide fake implementations for consumers to use for testing. This allows consuming services and packages to have unit or integration tests which don't depend on a live or fully-featured service.

Fakes should be available for both frontend and backend testing. Packages that must provide fake implementations are:

  • Frontend SDKs
  • gRPC clients
  • 3rd party integrations

A fake implementation should have:

  • An in-memory store so unsafe operations can be tested
  • A set of store states (scenarios) which can be tested against, e.g. different subscription states for a mocked payment service

Beyond that, more complexity should only be added judiciously, such as consistency checks.
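A sketch of such a fake for a payment service (all names here are hypothetical):

```typescript
interface Subscription {
  userId: string;
  status: "active" | "canceled";
}

class FakePaymentsClient {
  // In-memory store, so unsafe operations can be exercised in tests.
  private subs = new Map<string, Subscription>();

  // A scenario: a preset store state for tests to start from.
  static withActiveSubscription(userId: string): FakePaymentsClient {
    const fake = new FakePaymentsClient();
    fake.subs.set(userId, { userId, status: "active" });
    return fake;
  }

  async getSubscription(userId: string): Promise<Subscription | null> {
    return this.subs.get(userId) ?? null;
  }

  async cancelSubscription(userId: string): Promise<void> {
    const sub = this.subs.get(userId);
    if (sub) sub.status = "canceled";
  }
}
```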

Third-party integrations often provide more features than the product needs; the package or module responsible for integrating with a third-party service should adapt the interface down to only what the product requires.
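The adapter can be sketched like this (the SDK shape is invented for illustration, loosely modeled on a payments SDK):

```typescript
// A third-party SDK typically exposes far more than the product uses.
interface ThirdPartySdk {
  charges: {
    create(args: { amount: number; currency: string }): Promise<{ id: string }>;
  };
  // ...dozens of other resources the product never touches.
}

// The adapter narrows the surface to exactly what the product needs,
// expressed in the product's own terms.
class PaymentsAdapter {
  constructor(private sdk: ThirdPartySdk) {}

  async chargeCents(amountCents: number): Promise<string> {
    const charge = await this.sdk.charges.create({
      amount: amountCents,
      currency: "usd",
    });
    return charge.id;
  }
}
```

Only PaymentsAdapter ever needs a fake; the full SDK surface never has to be reproduced.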

Benefits:

  • Fakes and stubs are necessary to ensure any component can be tested (from unit through E2E) with confidence and speed.
  • By limiting how complex they may become, the cost to maintain and understand them is bounded.
  • An adapter helps control and manage how tightly coupled a product is to any third-party service, while also reducing the surface area that needs to be faked.

Example application:

  • @saflib/sdk provides a helper method for typing fake server handlers.

On Ownership of Fakes

Fakes are often included with or adjacent to the test code that uses them out of expediency, but they should instead be part of the integration module or client package. In practice, test writers will write and contribute additional scenarios, but these should be kept and managed in a central location to reduce redundant work and updates when shared models change. And there's no reason for every consumer to have and maintain its own fake implementation.

Have Thorough Test Coverage

Where possible, test logic in pure functions with no dependencies.

In reality, this is often not practical, at least when it comes to web applications.

A solid automated test suite will include:

  • Pure unit tests on library or utility code.
  • Unit tests on database queries with an in-memory instance of the database.
  • Integration tests for each operation such as web and gRPC endpoints, FSM states, and background jobs, with in-memory database instances and faked external and internal service dependencies.
  • Integration (or component) tests for frontend web pages and components, where network requests are faked. Focus mainly on testing renders, not interactions.
  • E2E tests with all external dependencies faked and all internal dependencies fully running.

Packages should be measured for how well their tests cover their code. At a minimum, a majority of code should be tested, with at least one unit or integration test per file (component, endpoint, query, etc.). Test files should be immediately adjacent to the file they test (no separate "tests" directories).
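For database queries, for instance, that adjacency looks like this (hypothetical file names):

```
queries/
  get-user-by-email.ts
  get-user-by-email.test.ts
  insert-user.ts
  insert-user.test.ts
```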

Tests should stick to testing the package, API, or user interface they target as much as possible. They should avoid reaching behind the interface and checking, for example, database entries directly, because tests should focus on the interface and not its dependencies. This is not a hard-and-fast rule, however; here are some exceptions:

  • Checking that product events and key metrics are recorded correctly after a user behavior or endpoint call, to ensure analytics and monitoring don't break or become inaccurate.
  • Inspecting sent emails, either to verify they are actually sent, or to retrieve contents (such as an email verification code) that the test needs in order to continue.

Benefits:

  • Peace of mind that the product continues to work as expected, even after a great deal has changed.
  • Generated code that is tested is far less likely to need to be reworked when used later in the process.
  • For automated, agentic workflows, sufficient automated test coverage is required for workflow evals to work.

Example application:

  • SAF packages often have testing-specific documentation, workflows, and/or exports to help enable greater and more consistent testing.

Document the Codebase, Make it Accessible

There needs to be a source of truth for how to do things the right way specific to your stack, and the closer those decisions are to the code the better.

  • Reference documentation should be generated from source, such as through Typedoc or Redoc.
  • Explanation docs should live in the package they address, such as in markdown files in a separate docs/ directory.
  • How-To Guides are naturally expressed with agentic workflows, and human-readable versions should be generated from those.

Benefits:

  • Written documentation provides a source of truth for how to do things the right way specific to your stack. Workflows can be developed toward and checked against them.
  • If something is not documented, it can be assumed to be undecided.

Example applications:

  • Each package in the docs has a mix of explanation docs and generated reference docs. Instead of maintaining them separately, how-to guides are effectively generated from workflows such as this one.
  • @saflib/dev-tools provides a CLI tool for generating documentation from code, workflows, and CLI commands. For example, this workflow was generated from this code.