Instrumentation
For basic observability of web applications, SAF provides the logic for:
- Logging
- Metrics
- Error reporting
All SAF-provided libraries support these systems, and this library provides all the tools for new systems to do the same.
Tracing is the one major area of instrumentation that SAF does not provide, mostly because the other three are simpler to set up and usually sufficient. Still, SAF is designed so that tracing can be added: it enforces a consistent context definition (through this library) and uses wrapping functions in libraries such as @saflib/express and @saflib/drizzle-sqlite, where spans can be systematically added at package boundaries.
Stores
Instrumentation makes heavy use of Node's AsyncLocalStorage to store context and reporters.
Context
Context is information about what is currently running, in what environment. Any subsystem is expected to provide this context, and it can be used to:
- Affect behavior of the operation, mainly through the auth field
- Add context to instrumentation, which is basically every other field
See SafContext for more details (storage: safContextStorage).
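 
As a minimal sketch of the pattern described above, the following uses Node's AsyncLocalStorage directly. DemoContext, demoContextStorage, and every field other than auth are assumptions made for illustration; the real definitions are SafContext and safContextStorage.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical, simplified context shape; the real definition is SafContext.
interface DemoContext {
  requestId: string;
  operationName: string;
  auth?: { userId: number };
}

// Stand-in for safContextStorage.
const demoContextStorage = new AsyncLocalStorage<DemoContext>();

// Code deep in the call stack reads the context without it being passed down.
function describeCurrentOperation(): string {
  const ctx = demoContextStorage.getStore();
  if (!ctx) return "no context";
  const who = ctx.auth ? `user ${ctx.auth.userId}` : "anonymous";
  return `${ctx.operationName} (${ctx.requestId}) as ${who}`;
}

// Inside run(), the store is visible to everything the callback calls.
const insideRun = demoContextStorage.run(
  { requestId: "req-1", operationName: "getTodos", auth: { userId: 42 } },
  () => describeCurrentOperation(),
);

// Outside run(), getStore() returns undefined.
const outsideRun = describeCurrentOperation();
```

The auth field can drive behavior (for example, rejecting anonymous callers), while the remaining fields are there to be attached to logs and other telemetry.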
Reporters
Reporters are functions for reporting telemetry to various services. They depend on the context and are not serializable, so they are kept in a separate AsyncLocalStorage instance.
The main functions to use are:
- log - a Winston logger which applications can add transports to.
- logError - a convenience function for logging Error objects. It logs to both log and any ErrorReporter callbacks (so errors appear both in logging systems like Loki and error reporting services like Sentry).
See SafReporters for more details (storage: safReportersStorage).
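 
The fan-out behavior of logError can be sketched roughly as follows. All names here are hypothetical stand-ins: a plain object takes the place of the Winston logger, and DemoReporters is a much-reduced version of what SafReporters would contain.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical stand-ins: the real reporters are described by SafReporters,
// and `log` is a Winston logger rather than this minimal interface.
interface DemoLogger {
  error(message: string): void;
}
type DemoErrorReporter = (error: Error) => void;
interface DemoReporters {
  log: DemoLogger;
  logError: (error: Error) => void;
}

// Reporters hold functions, so they live in their own store.
const demoReportersStorage = new AsyncLocalStorage<DemoReporters>();

// Build a logError that writes to the logger and to every error reporter.
function makeLogError(log: DemoLogger, reporters: DemoErrorReporter[]) {
  return (error: Error): void => {
    log.error(error.message);
    for (const report of reporters) report(error);
  };
}

// In-memory sinks standing in for a log transport and an error service.
const loggedMessages: string[] = [];
const reportedErrors: Error[] = [];
const log: DemoLogger = {
  error: (message) => {
    loggedMessages.push(message);
  },
};
const logError = makeLogError(log, [
  (error) => {
    reportedErrors.push(error);
  },
]);

demoReportersStorage.run({ log, logError }, () => {
  // Application code fetches the reporters from the store when it needs them.
  demoReportersStorage.getStore()?.logError(new Error("database unavailable"));
});
```

After this runs, the same error has reached both sinks, which is the property that lets it show up in a logging system and an error reporting service at once.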
Use in Applications
The main functions to use inside HTTP handlers, gRPC handlers, cron jobs, and the like are:
Use these for all logging and auth purposes. They will throw an error if the application has not provided the context and reporters, and their return types are exactly what you should expect. They exist mainly to avoid existence-check boilerplate; otherwise you could just call the getStore methods of safContextStorage and safReportersStorage directly.
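To illustrate the boilerplate these helpers remove, here is a sketch of a getter that either returns a fully typed context or throws. The names getDemoContext, demoContextStorage, DemoContext, and listTodosHandler are all hypothetical and only mimic the shape of the helpers described above.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical context shape and storage, standing in for SafContext
// and safContextStorage.
interface DemoContext {
  requestId: string;
  auth?: { userId: number };
}
const demoContextStorage = new AsyncLocalStorage<DemoContext>();

// Returns the context or throws, so callers never handle `undefined`.
function getDemoContext(): DemoContext {
  const ctx = demoContextStorage.getStore();
  if (!ctx) throw new Error("Context was not provided for this operation");
  return ctx;
}

// A handler body can use the context directly, with no existence checks.
function listTodosHandler(): string {
  const { auth, requestId } = getDemoContext();
  if (!auth) return `${requestId}: unauthorized`;
  return `${requestId}: todos for user ${auth.userId}`;
}

// The subsystem provides the context, then invokes the handler.
const handled = demoContextStorage.run(
  { requestId: "req-7", auth: { userId: 3 } },
  listTodosHandler,
);

// Calling the handler outside of run() fails loudly instead of silently.
let missingContextError = "";
try {
  listTodosHandler();
} catch (e) {
  missingContextError = (e as Error).message;
}
```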
Logging and Testing
By design, these helper functions will error if context and reporters have not been provided by the application, but the application may not be in the mix if you're testing smaller pieces in isolation. So these functions return stubs if and only if the NODE_ENV environment variable is test. The stubs also log errors to the console so they show up in test output.
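A rough sketch of that fallback logic is shown below. The names are hypothetical, and the env parameter exists only so both branches can be demonstrated in one place; a real helper would presumably read process.env.NODE_ENV itself.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical reporters shape and storage; the real ones are SafReporters
// and safReportersStorage.
interface DemoReporters {
  logError: (error: Error) => void;
}
const demoReportersStorage = new AsyncLocalStorage<DemoReporters>();

// Stub used only in tests: surfaces errors in test output via the console.
const testStubReporters: DemoReporters = {
  logError: (error) => console.error(error),
};

// `env` is a parameter purely for demonstration (an assumption of this sketch).
function getDemoReporters(
  env: string | undefined = process.env.NODE_ENV,
): DemoReporters {
  const store = demoReportersStorage.getStore();
  if (store) return store; // provided reporters always win
  if (env === "test") return testStubReporters; // stubs only under test
  throw new Error("Reporters were not provided for this operation");
}

// Under test, a missing store yields the stub.
const inTest = getDemoReporters("test");

// In any other environment, a missing store is an error.
let productionError = "";
try {
  getDemoReporters("production");
} catch (e) {
  productionError = (e as Error).message;
}
```

Checking the store first means that a test which does provide reporters gets exactly those, and the stub is used only when nothing was provided.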
If you want to check that certain logs are being made in tests, you can use getSafReporters to get the universal loggers and spy on them. See @saflib/cron's unit tests for an example.
Integrate Logging
When you set up a new service, you will need to integrate logging with your chosen collectors or external services and do some other setup.
- setServiceName - sets the service name, which is used to identify the service in logs and metrics. The service name should match the service package name (minus any organization prefix) and the docker image/service name, for consistency.
- addErrorCollector - adds a callback for when errors are reported by the application. Callbacks receive an ErrorCollectorParam object, which is based on Sentry's captureContext parameter.
- addTransport - adds a Winston transport to the log logger.
- collectSystemMetrics - opts into prom-client's default metrics, which are a superset of Prometheus's recommended metrics.
Call all of these before initializing any servers or other long-running processes, and start with setServiceName, since any log calls will fail without one.
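The ordering constraint can be sketched with local stand-ins. Nothing below is the library's actual API: the demo-prefixed functions only mimic the roles of setServiceName and addErrorCollector, and their signatures and internals are assumptions.

```typescript
// Local stand-ins that mimic the setup functions named above. Their
// signatures and internals are assumptions, not the library's actual API.
type DemoErrorCollector = (param: { error: Error }) => void;

let demoServiceName: string | undefined;
const demoErrorCollectors: DemoErrorCollector[] = [];

function demoSetServiceName(name: string): void {
  demoServiceName = name;
}

function demoAddErrorCollector(collector: DemoErrorCollector): void {
  demoErrorCollectors.push(collector);
}

// Logging requires a service name, so setup order matters.
function demoLogInfo(message: string): string {
  if (!demoServiceName) throw new Error("Set the service name before logging");
  return `[${demoServiceName}] ${message}`;
}

// Startup: name the service first, register collectors, then start servers.
function demoStartService(): string {
  demoSetServiceName("todo-service");
  demoAddErrorCollector(({ error }) => {
    console.error("collected:", error.message);
  });
  return demoLogInfo("starting HTTP server");
}

// Logging before setup fails, which is why the service name comes first.
let earlyLogError = "";
try {
  demoLogInfo("too early");
} catch (e) {
  earlyLogError = (e as Error).message;
}

const startupLine = demoStartService();
```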
If you have some service-specific context (which is likely, especially for shared clients to databases and other services), you should put it in a sibling {service-name}-common package and provide it to each of your subsystems. Some, such as @saflib/grpc, provide helpers for this.
Provide Context and Reporters
It's the job of subsystem libraries such as @saflib/express and @saflib/grpc to provide context and reporters for each operation, preferably with the run method of safContextStorage and safReportersStorage.
They should use the following functions and variables:
- createLogger to create a Winston logger. Provide subsystemName and operationName. To do this, a logger will have to be created for each "request" or "run".
- generateRequestId. Only needed if a request ID is not provided by the caller, which it should be if the operation does not originate from the subsystem itself, as it does for cron jobs.
- safContextStorage and safReportersStorage to provide a context. Use the run method ideally, or enterWith if necessary.
- defaultErrorReporter for a standard error reporter.
- makeSubsystemReporters when you want to log outside of an operation, such as when initializing a subsystem.
See examples throughout @saflib.
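As a sketch of how a subsystem might wrap each operation, the following nests the run methods of two AsyncLocalStorage instances. Every name here is hypothetical: createDemoLogger returns a plain formatting function rather than a Winston logger, randomUUID from node:crypto stands in for generateRequestId, and the "cron" subsystem name is only an example.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Hypothetical shapes and storages standing in for SafContext, SafReporters,
// safContextStorage, and safReportersStorage.
interface DemoContext {
  requestId: string;
  subsystemName: string;
  operationName: string;
}
interface DemoReporters {
  log: (message: string) => string;
}
const demoContextStorage = new AsyncLocalStorage<DemoContext>();
const demoReportersStorage = new AsyncLocalStorage<DemoReporters>();

// Stand-in for createLogger: one logger per operation, tagged with its origin.
function createDemoLogger(ctx: DemoContext): DemoReporters["log"] {
  return (message) =>
    `${ctx.subsystemName}/${ctx.operationName} ${ctx.requestId}: ${message}`;
}

// Wrap one operation ("request" or "run") so that everything it calls can
// read the context and reporters from the stores.
function runOperation<T>(
  operationName: string,
  operation: () => T,
  callerRequestId?: string,
): T {
  const context: DemoContext = {
    // Reuse the caller's ID when given; otherwise generate one.
    requestId: callerRequestId ?? randomUUID(),
    subsystemName: "cron",
    operationName,
  };
  const reporters: DemoReporters = { log: createDemoLogger(context) };
  return demoContextStorage.run(context, () =>
    demoReportersStorage.run(reporters, operation),
  );
}

// Caller supplied an ID, so it is reused in the log line.
const logLine = runOperation(
  "cleanupSessions",
  () => demoReportersStorage.getStore()!.log("done"),
  "req-9",
);

// No ID supplied, so the wrapper generates one.
const generatedId = runOperation(
  "cleanupSessions",
  () => demoContextStorage.getStore()!.requestId,
);
```

Using run scopes the stores to the callback and anything it awaits, which is why it is preferable to enterWith, whose effect persists for the remainder of the current execution context.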
Recording Metrics
Metrics should be recorded through subsystem libraries as well, using prom-client. @saflib/express uses express-prom-bundle to record metrics for HTTP requests, and the other SAF libraries use prom-client directly to provide a similar histogram metric. See examples.
Beyond these basic RED metrics (rate, errors, duration), SAF does not currently provide any other guidance or built-in metrics for back-end application code, but there's certainly room for more, potentially through traces and finite state machines.