Phil Booth

Existing by coincidence, programming deliberately

Node.js async the right way

Every technology stack has idiosyncrasies that can trip people up when they first encounter them. For whatever reason, I’ve particularly experienced this with engineers coming to Node.js from other platforms. Node is versatile, but there’s a specific paradigm of usage that can be surprising to the uninitiated. So, having been through this journey with people a few times now, here is my primer for anyone new to Node. The paradigm it talks about is asynchronous programming, or async for short.

What is async?

Fundamentally, async programming is a way to implement concurrency without multithreading. To some that may sound like a weakness right off the bat, but multithreading brings with it a whole raft of challenges that many people (myself included) find difficult to use safely. Async has its own challenges too of course, so like anything there are tradeoffs. But for workloads that are not CPU-bound and feature plenty of IO (e.g. web application backends), async can be a real sweet spot.

A core abstraction in async programming is the event. Events occur when things happen; in the backend of a web application that could be a new incoming request or it might be the continuation of an existing request after a database query has finished. At any given moment, only one event is scheduled in to execute code. When control leaves that code for some reason, running a database query perhaps or returning a response to a client, another event is scheduled in. In this way, rapidly scheduling events in and out, requests are handled concurrently but not in parallel. The defining trait of async programming is concurrency without parallelism.
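
To see that interleaving in action, here’s a minimal sketch with two tasks that each yield at an await (the timer stands in for real IO):

import { setTimeout as sleep } from 'node:timers/promises';

async function task(name: string) {
  console.log(`${name}: start`);
  await sleep(100); // control returns to the event loop here
  console.log(`${name}: resume`);
}

// Both tasks start before either resumes; their events interleave
await Promise.all([task('a'), task('b')]);
// => a: start, b: start, a: resume, b: resume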

Another key abstraction is the promise, which encapsulates the deferred result of an async function. Async functions are invoked immediately but as soon as they await internally, their promise is returned and the calling code continues until it, too, yields control by awaiting or returning. At that point, other events are scheduled to run, including but not limited to those within the async function itself.

Although async functions are often awaited immediately at the callsite, that doesn’t have to be the case. Promises can be passed around like any value, so it’s possible there will be some distance between an async call and its await. In some code paths, the await may not happen at all. But regardless of when or whether the promise is awaited, events for the async function are always scheduled in.
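
A small sketch of that distance between the call and the await (fetchUser and doOtherWork are hypothetical):

async function fetchUser(id: string): Promise<User> {
  // The body runs until this first await, then the promise is returned
  const response = await fetch(`https://api.example.com/users/${id}`);
  return response.json();
}

const pending = fetchUser('123'); // invoked immediately, not awaited yet
doOtherWork();                    // the calling code carries on in the meantime
const user = await pending;       // the await can come later, or not at all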

The cornerstone of Node’s async implementation is the event loop. On each iteration (or tick) of this loop, Node takes waiting callback functions from a first-in-first-out queue and invokes them in sequence. Whenever your code awaits, control is passed back to the event loop and a callback for its continuation is pushed onto the queue. That’s why so many core Node interfaces are callback-oriented: callback functions were the original abstraction for async. Promises came much later, then async and await were syntactic sugar added later still.
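
You can still see that callback heritage in the core APIs, and wrap it in promises when you need to. For example, with fs.readFile and util.promisify:

import { readFile } from 'node:fs';
import { promisify } from 'node:util';

// Callback style: the callback is queued on the event loop once the read completes
readFile('config.json', 'utf8', (err, data) => {
  if (err) throw err;
  console.log(data);
});

// util.promisify wraps callback-oriented APIs so they can be awaited instead
const readFileAsync = promisify(readFile);
const config = await readFileAsync('config.json', 'utf8');

(Modern code can import promise-based equivalents from node:fs/promises directly, of course.)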

Node has two separate mechanisms for event dispatch: the DOM EventTarget interface implemented by web browsers, which I won’t cover here, and its own EventEmitter class. I prefer EventEmitter because it’s simpler. In backend code, I’ve never needed the DOM’s tree hierarchy or its capture and bubble event phases.
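
EventEmitter is tiny: you register listeners by name with on and fire them with emit. A minimal example:

import { EventEmitter } from 'node:events';

const emitter = new EventEmitter();

// Listeners opt in by event name and receive whatever arguments were emitted
emitter.on('user-created', (user) => {
  console.log(`welcome, ${user.emailAddress}`);
});

// Listeners are invoked synchronously, in registration order
emitter.emit('user-created', { emailAddress: 'jo@example.com' });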

Events, coupling and cohesion

Pretty universally, engineers hate coupling. Coupling can be defined as the degree to which a given component depends on its counterparts’ implementation details. When components are tightly coupled, changes made to any one are more likely to require equivalent changes to the other(s). This makes maintenance harder and can harm reliability, because widespread changes are harder to reason about and more likely to introduce bugs. So instead we strive for loose coupling and high cohesion: each component keeps its related concerns together and depends externally only on stable interfaces that are less likely to change, allowing implementation details to vary more freely in isolation.

A nice side-effect of using events for async is that they’re a great foundation for building cohesive codebases. If you’re ever concerned that some of your modules are complected, it can be helpful to consider events as a separation mechanism.

One way to think about events is that they reverse the direction of function calls between your components. Instead of using functions like commands, where you’re telling some other component to do a thing, you emit an event indicating that a thing has happened. If other components are interested in the thing, they can opt in by listening for that event and likewise any other events they want to hear about. This reduces the coupling between your components to just the event names and the structure of their associated data.

To show what I mean more concretely, consider this toy example of some code that signs up new users for an online account:

export function initSignup(
  billing: BillingModule,
  search: SearchModule,
  sessions: SessionsModule,
  users: UsersModule,
): SignupHandler {
  return {
    async signup(emailAddress: string): Promise<Session> {
      const user = await users.create(emailAddress);

      billing.init(user);
      search.init(user);

      return sessions.create(user);
    },
  };
}

Here the signup logic has direct dependencies on modules for billing and search, but perhaps it doesn’t have to. Instead it could emit a user-created event, which those other modules listen for:

export function initSignup(
  emitter: EventEmitter,
  sessions: SessionsModule,
  users: UsersModule,
): SignupHandler {
  return {
    async signup(emailAddress: string): Promise<Session> {
      const user = await users.create(emailAddress);

      emitter.emit('user-created', user);

      return sessions.create(user);
    },
  };
}
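
The listening side might look something like this (a sketch, assuming an initBilling factory and a User type in the same style):

export function initBilling(
  billing: BillingModule,
  emitter: EventEmitter,
): void {
  // Opt in to the event rather than being called directly by the signup code
  emitter.on('user-created', (user: User) => {
    billing.init(user);
  });
}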

The difference may not seem a big deal in this example, but in production systems it’s not uncommon for the web of dependencies between modules to be far denser. Replacing direct dependencies with event dispatch in those conditions can improve readability and reduce your maintenance burden.

Events and testability

As a general rule, I much prefer integration tests over unit tests and I try to use mocks sparingly. But sometimes you can’t avoid it. The rise of LLMs and agentic workflows in particular has forced me to mock things more often, because I want my tests to be deterministic, fast and cheap. None of those things are true of LLMs. For the same reason they help with cohesion over coupling, events can be useful here too.

In a tightly coupled, no-events codebase, you can use dependency injection to pass mocks into whatever system you’re testing. But often this requires you to write large amounts of boilerplate code, because there are so many dependencies and they’re so deeply nested. The resulting tests can be bloated and hard to read, but even worse they can be quite brittle, requiring fixes when seemingly unrelated components change.

If your code is separated with events, this becomes easier and cleaner. All you need is a reference to the event emitter, then you can intercept whichever events you’re interested in and leave the others to continue as normal.
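
For example, using node:test and the promisified once helper from node:events, a test might assert that signup emits the right event (fakeSessions and fakeUsers are hypothetical stubs):

import assert from 'node:assert';
import { EventEmitter, once } from 'node:events';
import { test } from 'node:test';

test('signup emits user-created', async () => {
  const emitter = new EventEmitter();
  const handler = initSignup(emitter, fakeSessions, fakeUsers);

  // `once` resolves with the event's arguments the next time it fires
  const created = once(emitter, 'user-created');
  await handler.signup('jo@example.com');

  // Assuming the stubbed user record carries its email address
  const [user] = await created;
  assert.equal(user.emailAddress, 'jo@example.com');
});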

Putting the sin into single-threading

A common concern when people start out with Node.js is performance. How can a single-threaded process possibly scale to handle thousands of requests per second? The key is to keep the single-threadedness front-and-centre, then design your production infrastructure around it explicitly.

Firstly, avoid blocking IO. It should go without saying that blocking IO is an absolute performance-killer in a single-threaded application. Perhaps the most fundamental premise of this essay is that all your IO must be non-blocking. Be hyper-vigilant about it in your projects. That means linting for known blocking methods, testing for blocking behaviour where possible, regular load testing to check throughput, and monitoring and alerting in production to make sure all processes are operating efficiently. It’s almost impossible to put too much effort into this. I’ve seen multiple cases where a performance issue in production, or even a full-blown outage, was caused by unintentional blocking IO.
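
The classic offenders are the synchronous variants of the core APIs. Compare:

import { readFileSync } from 'node:fs';
import { readFile } from 'node:fs/promises';

// Bad: blocks the event loop; nothing else is handled until the read completes
const report = readFileSync('big-report.csv', 'utf8');

// Good: control returns to the event loop while the read is in flight
const sameReport = await readFile('big-report.csv', 'utf8');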

Secondly, make your compute instances small: just one core and sufficient memory to handle the typical workload of a single, fully-utilised core. Then set autoscaling parameters to provision and deprovision instances in line with CPU and memory usage. The perfect settings will depend on cold-start times and how bursty your traffic is, but autoscaling on 80% utilisation often works well for me. The best thing about small instances is how cheap they are. When traffic is low, you autoscale down to just a handful of instances (or even zero) to save money. As long as you’re not doing something silly on startup (like blocking IO or heavy dependency trees) your Node processes will start very quickly, so your cold-start time for autoscaling back up is effectively the minimum your infrastructure allows.

With instances scaling up and down like this, you want to make sure they’re truly stateless. Sequential requests from a single user may be handled by different instances, so don’t hold any state outside the request object. Instead use a memory store like Redis as your synchronisation layer. Treat it as shared memory and use it for tasks such as caching, rate-limiting, distributed timers, fraud detection and so on. Redis implements some beautiful data structures, so look beyond vanilla GET/SET and use the most appropriate abstraction for whatever you want to do. In particular, sorted sets are a feature I’ve returned to time and again, for a wide range of uses.
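
As one example, a sliding-window rate limiter falls out of a sorted set almost for free, because scores can be timestamps. A sketch, assuming the ioredis client:

import Redis from 'ioredis';

const redis = new Redis();

async function isRateLimited(
  userId: string,
  limit: number,
  windowMs: number,
): Promise<boolean> {
  const key = `rate:${userId}`;
  const now = Date.now();

  // Trim entries that have fallen outside the window, then count what's left
  await redis.zremrangebyscore(key, 0, now - windowMs);
  const count = await redis.zcard(key);
  if (count >= limit) {
    return true;
  }

  // Record this request and let idle keys expire on their own
  await redis.zadd(key, now, `${now}:${Math.random()}`);
  await redis.pexpire(key, windowMs);
  return false;
}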

Statelessness also extends to global variables. If a module has top-level state that’s set or unset asynchronously, it could cause a race condition or a resource leak. Consider this (bad) example of a global variable that is used to store a database connection pool:

let dbConnections: ConnectionPool;

async function init(options: DatabaseOptions): Promise<void> {
  // Bad: this is a race condition and possible resource leak
  if (!dbConnections) {
    dbConnections = await db.connect(options);
  }
}

The problem here is re-entrancy. If init is called concurrently, it might leak database connections. Remember that async means concurrency without parallelism. Just because a process is single-threaded, it doesn’t mean your code will run atomically. Every time you await, you give another event the opportunity to enter the same code path that’s already running. So it’s better to make functions pure, pass dependencies around locally, avoid global state and avoid side-effects.
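
Concretely, that means connecting once at startup and injecting the pool, in the same style as initSignup above (initUsers and initSessions are hypothetical factories):

async function main(options: DatabaseOptions) {
  // Connect exactly once, before any request handling starts
  const dbConnections = await db.connect(options);

  // Inject the pool into the factories that need it; no global state required
  const users = initUsers(dbConnections);
  const sessions = initSessions(dbConnections);
}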

If for some reason you absolutely can’t avoid global state, there is a pattern that can protect you. The key is to modify the global state first, before you perform the async operation:

let dbConnections: ConnectionPool;
let isConnecting = false;

async function init(options: DatabaseOptions): Promise<void> {
  // Less bad: not a race condition because `isConnecting` protects against re-entrancy
  if (!dbConnections && !isConnecting) {
    isConnecting = true;
    dbConnections = await db.connect(options);
  }
}

Here init can be invoked concurrently without problems. It’s less clean than a pure function would be, but single-threaded execution guarantees there will be no race conditions because the state is updated before control is passed to db.connect.
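
One caveat: a caller that arrives while isConnecting is true returns before the pool is ready. A common refinement is to cache the promise itself rather than a boolean flag, so every concurrent caller shares (and can await) the same connection attempt:

let dbConnections: Promise<ConnectionPool> | undefined;

function init(options: DatabaseOptions): Promise<ConnectionPool> {
  // The promise is stored synchronously, before control yields, so
  // re-entrant calls all receive the same in-flight connection attempt
  if (!dbConnections) {
    dbConnections = db.connect(options);
  }
  return dbConnections;
}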

Conclusion

To recap: