Node.js multithreading with worker threads: pros and cons

Author: James Walker

February 27, 2023

Node.js presents a single-threaded event loop to your application, which means CPU-bound operations can block the main thread and cause delays. The worker_threads module addresses this problem by providing a mechanism for running code in parallel using a form of threading.
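To make the blocking problem concrete, here is a small sketch (not part of the original article) that busy-waits on the main thread for about 200 ms. A setTimeout() callback scheduled with a 0 ms delay can't run until the loop finishes, because both compete for the same single JavaScript thread. The 200 ms figure and the helper names blockFor() and measureTimerDelay() are arbitrary choices for illustration.

```javascript
// Busy-wait on the main thread for roughly `ms` milliseconds
function blockFor(ms) {
    const end = Date.now() + ms;
    while (Date.now() < end) {
        // Spin: no other callback on the event loop can run during this loop
    }
}

// Schedule an "as soon as possible" timer, then hog the thread
function measureTimerDelay(blockMs) {
    return new Promise(resolve => {
        const scheduled = Date.now();
        setTimeout(() => resolve(Date.now() - scheduled), 0);
        // CPU-bound work that holds the single JavaScript thread
        blockFor(blockMs);
    });
}

const delay = measureTimerDelay(200);
delay.then(ms => {
    console.log(`setTimeout(..., 0) fired after ${ms} ms`);
});
```

The exact delay varies slightly between runs, but it is always at least as long as the blocking loop.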

In a previous post, you learned what worker threads are, their common use cases, and how to add them to your project. In this article, we'll look at the pitfalls of worker threads and how they differ from the multithreading implementations in other programming languages. We'll also tour three prominent libraries that make the worker_threads module easier to use.

Worker thread pitfalls and gotchas

The benefits of worker threads can be easily summarized: They're the only way to get something similar to multithreading when you're programming with Node.js. CPU-intensive operations, background processing, and any parallel code execution, other than async I/O, can be implemented using worker threads.

The module and the concept it implements come with several caveats though. You need to be aware of these before you start implementing your workers, as some situations shouldn't be parallelized with this mechanism.

Worker threads aren't true threads

The first and most prominent restriction of worker threads is that they aren't threads in the conventional sense. Truly multithreaded applications run multiple threads concurrently, and those threads share the same state by default: memory updated in one thread is visible to the others, so multithreaded code requires careful memory management to prevent race conditions. In Node.js, by contrast, memory updated in one worker isn't visible to the others.

Node.js worker threads operate independently of the JavaScript code on the main thread. Each worker gets an isolated instance of the V8 JavaScript engine, with its own heap and event loop, running on a separate thread within the same process. The new runtime then executes a JavaScript file outside the main event loop.

Because the file is executed in this way, there's no implicit memory sharing between the main program and the worker "thread." Instead, an event-based messaging system lets the two sides exchange values, which are copied by default rather than shared.

```javascript
const {
    Worker,
    isMainThread,
    parentPort,
    workerData
} = require("worker_threads");

if (isMainThread) {
    // Start a worker that runs this same file, passing it a value
    const worker = new Worker(__filename, {workerData: "hello"});
    worker.on("message", msg => console.log(`Worker message received: ${msg}`));
    worker.on("error", err => console.error(err));
    worker.on("exit", code => console.log(`Worker exited with code ${code}.`));
}
else {
    // Inside the worker: read the value and reply to the main thread
    const data = workerData;
    parentPort.postMessage(`You said "${data}".`);
}
```

This code, which you looked at in detail in part one, creates a worker thread that receives a value (hello) from the main thread and sends it back in a different form (You said "hello".). The postMessage() function sends data across the divide between the main thread and the worker. Variables set on one side aren't visible to the other.
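The copy semantics behind postMessage() can be seen without starting a worker at all. This sketch, an illustration rather than code from part one, uses a MessageChannel from the worker_threads module, whose ports follow the same messaging rules as the main-worker connection. Mutating the received object leaves the sender's object untouched, because postMessage() delivers a structured clone rather than a reference.

```javascript
const {MessageChannel} = require("worker_threads");

// Both ends of a MessageChannel live in this thread, but postMessage()
// copies values exactly as it does between the main thread and a worker.
const {port1, port2} = new MessageChannel();

const original = {count: 1, tags: ["a"]};

const received = new Promise(resolve => {
    port2.on("message", copy => {
        // The receiver gets a structured clone, not a reference to `original`
        copy.count += 1;
        copy.tags.push("b");
        port1.close(); // closing either end closes the channel so the process can exit
        resolve(copy);
    });
});

port1.postMessage(original);

received.then(copy => {
    console.log(copy);     // { count: 2, tags: [ 'a', 'b' ] }
    console.log(original); // { count: 1, tags: [ 'a' ] }
});
```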

There is an exception to this rule: you can use a SharedArrayBuffer to directly share memory between the threads by specifically allocating it as shared memory:

```javascript
const {Worker, isMainThread, parentPort, workerData} = require("worker_threads");

if (isMainThread) {
    // Allocate shared memory for 4 integers
    const sab = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT * 4);
    const arr = new Int32Array(sab);

    const worker = new Worker(__filename, {workerData: arr});
    worker.on("message", msg => {
        if (msg.type === "update") {
            console.log(arr);
        }
    });
}
else {
    // workerData is a view over the same SharedArrayBuffer, not a copy
    const arr = workerData;
    arr[0] = 1001;
    parentPort.postMessage({type: "update"});
}
```

Saving this code to shared.js and executing the file emits the following output:

```
$ node shared.js
Int32Array(4) [ 1001, 0, 0, 0 ]
```

This works because the worker thread can access the shared memory region created by the main thread. When the main thread later inspects the data, it sees the changes written by the worker. This still isn't true state-sharing, because you have to manually make the array available to the worker using its workerData constructor option.

Spawning too many worker threads is expensive

Creating a worker thread is more expensive than spawning a new thread in a language with native multithreading. Each worker runs its own instance of the V8 JavaScript engine, with its own heap and event loop, so using too many workers will consume significant resources on your host.

Although workers start up quickly, there's always an associated overhead. Creating one is a relatively expensive operation, which makes worker threads unsuitable for lightweight tasks. They're best reserved for parallelizing CPU-bound work where the performance gains easily outweigh the startup cost.
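If you want a rough feel for that overhead on your own machine, a sketch like the following times the full lifecycle of a worker that runs a do-nothing inline script (using the Worker constructor's eval option). Averaging over five runs is an arbitrary choice, and the numbers depend on your hardware and Node.js version, so treat them as a ballpark rather than a benchmark.

```javascript
const {Worker} = require("worker_threads");
const {performance} = require("perf_hooks");

// Time from constructing a Worker until it exits. The worker body is a
// trivial inline script (eval: true), so this measures pure overhead.
function timeWorkerLifecycle() {
    return new Promise((resolve, reject) => {
        const start = performance.now();
        const worker = new Worker("void 0;", {eval: true});
        worker.once("error", reject);
        worker.once("exit", () => resolve(performance.now() - start));
    });
}

async function averageLifecycle(runs) {
    let total = 0;
    for (let i = 0; i < runs; i++) {
        total += await timeWorkerLifecycle();
    }
    return total / runs;
}

const avgMs = averageLifecycle(5);
avgMs.then(ms => {
    // Compare this against the cost of the task you plan to offload
    console.log(`Average worker start-to-exit time: ${ms.toFixed(2)} ms`);
});
```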

You can mitigate the inefficiencies by reusing a pool of worker threads, which allows you to avoid repeatedly incurring the expense of creating new ones. Libraries such as Piscina and Poolifier abstract away the complexity of managing a worker pool.
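To show what those libraries are abstracting, here is a deliberately minimal fixed-size pool built directly on worker_threads. It's a hypothetical sketch: the SimplePool class, its run() and close() methods, and the squaring task are all invented for illustration and don't mirror either library's API. The key idea is that workers are created once and reused, and tasks wait in a queue until a worker is idle.

```javascript
const {Worker} = require("worker_threads");

// Hypothetical task: square a number. The worker body is an inline string
// (eval: true) so the whole sketch fits in one file.
const workerSource = `
const {parentPort} = require("worker_threads");
parentPort.on("message", n => parentPort.postMessage(n * n));
`;

// Minimal fixed-size pool for illustration only (no timeouts, no worker replacement)
class SimplePool {
    constructor(size) {
        this.workers = Array.from({length: size}, () => new Worker(workerSource, {eval: true}));
        this.idle = [...this.workers];
        this.queue = []; // tasks waiting for a free worker
    }

    run(task) {
        return new Promise((resolve, reject) => {
            this.queue.push({task, resolve, reject});
            this.dispatch();
        });
    }

    // Hand queued tasks to idle workers; each worker is reused across tasks
    dispatch() {
        while (this.idle.length > 0 && this.queue.length > 0) {
            const worker = this.idle.pop();
            const {task, resolve, reject} = this.queue.shift();

            const onMessage = result => {
                cleanup();
                this.idle.push(worker); // return the worker to the pool
                resolve(result);
                this.dispatch();
            };
            const onError = err => {
                cleanup(); // a crashed worker is not returned to the pool
                reject(err);
            };
            const cleanup = () => {
                worker.off("message", onMessage);
                worker.off("error", onError);
            };

            worker.on("message", onMessage);
            worker.on("error", onError);
            worker.postMessage(task);
        }
    }

    close() {
        return Promise.all(this.workers.map(worker => worker.terminate()));
    }
}

// Two workers handle six tasks, so each worker is reused several times
const pool = new SimplePool(2);
const squares = Promise.all([1, 2, 3, 4, 5, 6].map(n => pool.run(n)))
    .then(async results => {
        await pool.close(); // idle workers keep the process alive until terminated
        return results;
    });

squares.then(results => console.log(results)); // [ 1, 4, 9, 16, 25, 36 ]
```

A production pool would also need to replace crashed workers, enforce timeouts, and cap the queue length, which is the kind of bookkeeping that dedicated pool libraries handle for you.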

Using worker threads for I/O is wasteful

The nature of worker threads makes them unsuitable for I/O tasks. You don't need worker threads to read a file or fetch data over the network, because Node.js already includes more efficient async APIs for that work.

The worker_threads documentation specifically advises against using the module for these situations. Creating and maintaining a worker with its own V8 instance costs far more than using Node's built-in async I/O. You'll end up harming performance, wasting resources, and writing redundant code if you implement these tasks as worker threads.
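For comparison, this sketch reads several files concurrently on the main thread using the promise-based fs API, with no workers involved. The temporary directory and file names are purely illustrative. Node.js hands the underlying file operations to libuv, which performs them off the JavaScript thread, so the event loop stays free while the reads are in flight.

```javascript
const fs = require("fs/promises");
const os = require("os");
const path = require("path");

// Read several files concurrently without any worker threads
function readAll(paths) {
    return Promise.all(paths.map(p => fs.readFile(p, "utf8")));
}

async function main() {
    // Create a few throwaway files in a temporary directory for the demo
    const dir = await fs.mkdtemp(path.join(os.tmpdir(), "io-demo-"));
    const files = ["a.txt", "b.txt", "c.txt"].map(name => path.join(dir, name));
    await Promise.all(files.map((file, i) => fs.writeFile(file, `file ${i}`)));

    const contents = await readAll(files);
    console.log(contents); // [ 'file 0', 'file 1', 'file 2' ]

    // Clean up the temporary files
    await fs.rm(dir, {recursive: true});
    return contents;
}

const readResult = main();
```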

Debugging worker threads can be challenging

Pooled worker threads can be challenging to debug because there's not always a clear link between an event, the worker it's handled by, and the effect that's created. Trying to debug what's happening using console.log() statements is tedious and error-prone.

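You can produce more useful diagnostic information by associating each pooled task with an AsyncResource. This ties the callback that receives a task's result back to the async context in which the task was scheduled, rather than to the worker's generic message event, so async_hooks-based diagnostic tools (and APIs such as AsyncLocalStorage) can follow the full sequence of activities that led up to an effect.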
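The following sketch adapts the worker pool pattern from the Node.js documentation on asynchronous context tracking. A TaskInfo object (the name is an assumption, as is the single worker standing in for a real pool) extends AsyncResource, so it captures the async context at the moment a task is scheduled and restores it when the result comes back. AsyncLocalStorage is used here only to make that context visible: the callback can still read the request ID that was active when runTask() was called.

```javascript
const {Worker} = require("worker_threads");
const {AsyncResource, AsyncLocalStorage} = require("async_hooks");

// Per-request context, e.g. a request ID for logging (illustrative)
const requestContext = new AsyncLocalStorage();

// Captures the async context at the moment a task is scheduled
class TaskInfo extends AsyncResource {
    constructor(callback) {
        super("WorkerPoolTaskInfo");
        this.callback = callback;
    }

    done(err, result) {
        // Run the callback in the context captured by the constructor
        this.runInAsyncScope(this.callback, null, err, result);
        this.emitDestroy(); // TaskInfo objects are single-use
    }
}

// A single worker that doubles numbers (stand-in for a pooled worker)
const worker = new Worker(
    `const {parentPort} = require("worker_threads");
     parentPort.on("message", n => parentPort.postMessage(n * 2));`,
    {eval: true}
);

let pending = null;
worker.on("message", result => pending.done(null, result));

function runTask(n, callback) {
    pending = new TaskInfo(callback);
    worker.postMessage(n);
}

const observed = new Promise(resolve => {
    requestContext.run({requestId: "req-42"}, () => {
        runTask(21, (err, result) => {
            // Without TaskInfo, this callback would run in the context of the
            // worker's "message" event, not the request that scheduled the task
            const store = requestContext.getStore();
            console.log(store.requestId, result); // req-42 42
            worker.terminate();
            resolve({store, result});
        });
    });
});
```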

Sharing memory using a SharedArrayBuffer also creates opportunities for problems to arise. You must use atomics or implement your own concurrency management system to prevent race conditions when accessing and modifying the shared memory. If race conditions do occur, they can cause strange symptoms in your application, and are often hard to identify, especially when they relate to memory that's used in many different places.
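Here is a sketch of the atomics approach. Four workers (an arbitrary count) each increment the same slot of a SharedArrayBuffer-backed Int32Array 100,000 times. Because Atomics.add() performs the read, add, and write as one indivisible operation, no increments are lost; replacing it with a plain counter[0]++ could produce a smaller total when workers interleave.

```javascript
const {Worker} = require("worker_threads");

const INCREMENTS = 100000; // increments per worker
const WORKERS = 4;

// Inline worker body: increments index 0 of the shared Int32Array
const workerSource = `
const {workerData, parentPort} = require("worker_threads");
const {counter, increments} = workerData;
for (let i = 0; i < increments; i++) {
    // Atomic read-modify-write: safe even when other workers do the same
    Atomics.add(counter, 0, 1);
}
parentPort.postMessage("done");
`;

function runWorkers() {
    // Shared memory for a single 32-bit counter
    const counter = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
    const finished = Array.from({length: WORKERS}, () => new Promise((resolve, reject) => {
        // The Int32Array is passed via workerData; its SharedArrayBuffer is shared, not copied
        const worker = new Worker(workerSource, {
            eval: true,
            workerData: {counter, increments: INCREMENTS}
        });
        worker.once("message", resolve);
        worker.once("error", reject);
    }));
    return Promise.all(finished).then(() => Atomics.load(counter, 0));
}

const total = runWorkers();
total.then(value => console.log(value)); // 400000
```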

Top Node.js threading libraries

Node's built-in worker_threads module focuses on the basics of creating worker threads and exchanging data with them. Here are three popular libraries that wrap the module to provide a more convenient interface or higher-level features, such as thread pooling.

Piscina

Piscina makes it easier to work with pools of workers. You can create your own task queues, track their completion, and cancel a task executing on a worker if it turns out to be redundant.

Here's a simple Piscina example. Save this code to main.js:

```javascript
const Piscina = require("piscina");
const piscina = new Piscina({filename: __dirname + "/worker.js"});

(async function() {
    const result = await piscina.run({msg: "hello"});
    console.log(result);    // "You said hello"
})();
```

Now add this code to worker.js:

```javascript
module.exports = ({msg}) => `You said ${msg}`;
```

Install the Piscina package with the following command:

```
$ npm install piscina
```

When you run node main.js, you'll see You said hello appear in your terminal. Piscina provides a more convenient interface around the worker threads API.

Bree

Bree is a job scheduler for Node.js. It lets you execute async tasks at a specified interval. You can configure each task with concurrency limits, retry support, and cancellation. Bree uses worker threads internally to run task code outside the main loop.

Install Bree using npm:

```
$ npm install bree
```

Now create a file called bree-main.js with the following code:

```javascript
const Bree = require("bree");

const bree = new Bree({
    jobs: [
        {
            name: "bree-job",
            interval: "5s",
            timeout: 0
        }
    ]
});

(async () => {
    await bree.start();
})();
```

Add the following code to jobs/bree-job.js:

```javascript
const d = new Date();
console.log(`The time is ${d.getHours()}:${d.getMinutes()}:${d.getSeconds()}`);
```

Running node bree-main.js will emit the time immediately and then every five seconds:

```
Worker for job "bree-job" online
The time is 11:45:30
Worker for job "bree-job" exited with code 0
Worker for job "bree-job" online
The time is 11:45:35
Worker for job "bree-job" exited with code 0
```

Poolifier

Poolifier is another worker pool implementation. It lets you handle multiple workers without the complexity of managing the pool yourself. Pools can be either fixed, meaning they contain a set number of reused workers, or dynamic, which means workers are added as required until the user-configured limit is reached.

You can create a simple pool to run a specified file in a worker thread by adding the following code to main.js:

```javascript
const {FixedThreadPool, DynamicThreadPool} = require("poolifier");

// 4 fixed workers
const fixedPool = new FixedThreadPool(4, __dirname + "/fixed-worker.js");
fixedPool.execute({}).then(res => {
    console.log(res);
}).catch(e => {
    console.error(e);
});

// Between 2 and 12 workers, dynamically
const dynamicPool = new DynamicThreadPool(2, 12, __dirname + "/dynamic-worker.js");
dynamicPool.execute({}).then(res => {
    console.log(res);
}).catch(e => {
    console.error(e);
});
```

Define the code to run in the fixed pool by adding the following content in fixed-worker.js:

```javascript
const {ThreadWorker} = require("poolifier");

module.exports = new ThreadWorker(
    () => "Running in the fixed thread pool",
    {
        async: false
    }
);
```

Now add the code to run in the dynamic pool to dynamic-worker.js:

```javascript
const {ThreadWorker} = require("poolifier");

module.exports = new ThreadWorker(
    () => "Running in the dynamic thread pool",
    {
        async: false
    }
);
```

Install the poolifier package from npm, and then run main.js with Node. You should see the following output as both thread pools start and run their jobs:

```
Running in the fixed thread pool
Running in the dynamic thread pool
```

The process will keep running until you terminate it by pressing Ctrl+C. Poolifier keeps the thread pools available to handle new tasks, which prevents the process from exiting while the pools are still active.

Worker threads vs. other programming languages

Different programming languages implement multithreading in different ways. In the case of Node.js, it's the isolated-worker model provided by the worker_threads module. Here's what some other languages offer.

  • C/C++: As low-level languages, C and C++ both feature true multithreading via the POSIX threads (pthreads) library. C++ also has the std::thread class in its standard library for simpler concurrency. You're responsible for using atomics and mutexes to properly synchronize memory access and avoid race conditions.

  • Java: Java also has real multithreading using the Thread class or the Runnable interface. These are supported by a comprehensive concurrency suite, the java.util.concurrent package, that helps you manage the threads you've created.

  • Python: Python has a threading module, but CPython, the most popular Python implementation, can only execute Python code in one thread at a time because of its global interpreter lock (GIL). This means that although threading is available, it will not speed up CPU-intensive code. Python also offers the multiprocessing module, which uses an approach similar to Node.js worker threads.

  • Ruby: Ruby is in a similar situation to Python. The language has comprehensive thread support, but MRI Ruby, the most popular implementation, only executes one thread at a time. Alternative interpreters, such as JRuby and Rubinius, do have true multithreading support.

  • Rust: Rust has comprehensive multithreading support. Rust lets you easily create threads and share data between them with a low risk of errors. The language's design renders many common concurrency bugs impossible, so it's a great choice for projects that will heavily rely on multithreading.

Threading in C/C++, Rust, and Java will generally be much faster than Node's worker model. These languages expose real threads, with shared state and all the memory management concerns it brings. Higher-level runtimes like CPython, MRI Ruby, and Node.js don't let you run CPU-bound code in parallel on ordinary shared-state threads, so they rely on heavier alternatives such as isolated workers or subprocesses.

Wrapping up

Worker threads give Node.js developers a way to run code in parallel by starting isolated JavaScript runtimes on separate threads. This isn't real multithreading, however: each "thread" is an independent V8 instance that lacks access to its parent's context. Communication between threads is only possible using explicitly allocated shared memory and messages exchanged via event listeners.

The worker_threads module is still an invaluable part of the Node.js ecosystem. There's no other built-in way to achieve multithreading and parallel processing within the confines of JavaScript's single-threaded execution model. However, it's important to recognize the limitations of worker threads so you can make an informed choice about when to use them. Adding a worker in the wrong situation could reduce your app's performance and increase resource utilization.

Highly CPU-intensive code where performance is critical will run faster using real threads in another programming language. Worker threads are sufficient for most Node.js use cases, though, such as job queues in web apps or background video processing on your desktop.