Tutorial on WebAssembly and Rust: Flawless Audio Processing

With support across all modern web browsers, WebAssembly (Wasm) is revolutionizing web user experiences. It provides a streamlined binary format for running libraries, or entire programs, written in diverse programming languages directly within a web browser.

Developers are always seeking ways to boost productivity, such as:

  • Creating a single application codebase that performs seamlessly across multiple platforms
  • Delivering a smooth and visually appealing user experience on both desktop and mobile devices
  • Leveraging open-source libraries to prevent redundant development efforts

WebAssembly empowers front-end developers with all three, offering a path to web app UIs that rival native mobile or desktop experiences. It even enables the use of libraries written in languages beyond JavaScript, including C++ or Go!

This Wasm/Rust tutorial guides you through building a simple pitch-detector application, similar to a guitar tuner. This application utilizes the inherent audio capabilities of web browsers and maintains a consistent 60 frames per second (FPS) performance, even on mobile devices. Familiarity with JavaScript is assumed, but prior knowledge of the Web Audio API or Rust is not required.

Note: Currently, the Web Audio API technique employed in this article is not compatible with Firefox. Therefore, Chrome, Chromium, or Edge are recommended for this tutorial, despite Firefox’s strong Wasm and Web Audio API support.

Exploring This WebAssembly/Rust Tutorial

  • Learn to create a basic function in Rust and invoke it from JavaScript through WebAssembly
  • Utilize the browser’s modern AudioWorklet API for high-performance audio processing
  • Establish communication between the main JavaScript thread and worklet threads
  • Combine these elements within a minimal React application

Note: If you’re more interested in the practical implementation, feel free to skip ahead to the tutorial.

Why Choose Wasm?

Several compelling reasons justify incorporating WebAssembly into your projects:

  • Execute code within the browser that was written in virtually any programming language.
    • This includes integrating existing libraries (numerical, audio processing, machine learning, etc.) written in languages other than JavaScript.
  • Depending on the language, Wasm can achieve near-native execution speeds. This has the potential to bring web application performance much closer to native experiences for both mobile and desktop.

When Wasm Might Not Be the Best Fit

While WebAssembly’s popularity is undeniable, it’s not a one-size-fits-all solution for every web development scenario:

  • For simple projects, sticking with JavaScript, HTML, and CSS is likely to be faster.
  • Older browsers like Internet Explorer lack native Wasm support.
  • Using WebAssembly typically introduces additional tools, such as language compilers, into your development workflow. This might not align with teams prioritizing simplicity in their development and CI/CD processes.

Why This Wasm/Rust Tutorial?

Among the many languages that compile to Wasm, Rust was chosen for this example. Created by Mozilla in 2010, Rust is steadily growing in popularity, and it secured the top spot for “most-loved language” in Stack Overflow’s 2020 developer survey. However, its appeal for this Wasm tutorial extends beyond mere trends:

  • Rust’s small runtime minimizes the code transmitted to the browser, resulting in a smaller website footprint.
  • Excellent Wasm support in Rust ensures seamless interoperability with JavaScript.
  • Rust delivers near C/C++-level performance while maintaining a highly secure memory model. Unlike some languages, Rust’s compiler performs rigorous safety checks during compilation, greatly reducing the risk of crashes due to null or uninitialized variables. This often translates to simpler error handling and a higher likelihood of preserving a positive UX, even when unexpected issues arise.
  • Rust is not garbage-collected. This grants Rust code full control over memory allocation and deallocation, leading to consistent performance—a crucial aspect in real-time applications.

Despite its benefits, Rust has a steep learning curve. Selecting the right programming language ultimately hinges on various factors, including the expertise and preferences of the development team responsible for maintaining the codebase.

Maintaining Smooth Web Apps with WebAssembly

Since we’re working with WebAssembly and Rust, how can we leverage Rust’s capabilities to achieve the performance gains that initially drew us to Wasm? For an application with a dynamic, frequently updating GUI to feel “smooth,” it needs to refresh the display in sync with the screen’s refresh rate. Typically, this is 60 FPS, meaning our application must redraw its user interface within approximately 16.7 ms (1,000 ms / 60 FPS).

Our application needs to detect and display the current pitch in real time, implying that the combined computation for detection and drawing should ideally remain within this 16.7 ms per-frame budget. In the upcoming section, we’ll explore how to leverage browser support for offloading audio analysis to a separate thread, allowing the main thread to proceed with its tasks. This separation is a significant performance win, as both computation and drawing then have the full 16.7 ms timeframe available.
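If you want to see this budget in practice, a few lines of plain JavaScript (a standalone sketch, independent of the tutorial code) can flag dropped frames:

// With a 60 Hz display, requestAnimationFrame callbacks arrive roughly
// every 16.7 ms; a much larger gap means at least one frame was dropped.
let lastTimestamp = performance.now();

function onFrame(timestamp) {
  const elapsed = timestamp - lastTimestamp;
  if (elapsed > 2 * (1000 / 60)) {
    console.warn(`Dropped frame(s): ${elapsed.toFixed(1)} ms since last paint`);
  }
  lastTimestamp = timestamp;
  requestAnimationFrame(onFrame);
}

requestAnimationFrame(onFrame);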

Understanding Web Audio Fundamentals

Our application will utilize a high-performance WebAssembly audio module for pitch detection, ensuring this computation doesn’t burden the main thread.

Why not simplify things and perform pitch detection on the main thread?

  • Audio processing tends to be computationally intensive due to the sheer volume of samples processed each second. Reliable audio pitch detection, for instance, often involves analyzing the spectra of 44,100 samples per second.
  • JavaScript’s JIT compilation and garbage collection occur on the main thread, and we aim to prevent these from impacting the audio processing code to maintain consistent performance.
  • If audio frame processing consumes a significant portion of the 16.7 ms frame budget, the UX will suffer, resulting in choppy animations.
  • We want our app to run without hiccups, even on less powerful mobile devices.

Web Audio worklets enable applications to maintain a smooth 60 FPS because audio processing is offloaded and cannot impede the main thread. If audio processing lags, it might lead to audio delays, but the UX will remain responsive.

WebAssembly/Rust Tutorial: Getting Started

This tutorial requires Node.js and npx to be installed. If you don’t have npx, use npm (included with Node.js) to install it:

npm install -g npx

Setting Up Our Web App

We’ll use React for this Wasm/Rust tutorial.

In your terminal, execute the following commands:

npx create-react-app wasm-audio-app
cd wasm-audio-app

This utilizes npx to run the create-react-app command (provided by Facebook’s corresponding package) to set up a new React application in the wasm-audio-app directory.

The create-react-app CLI simplifies the process of generating React-based single-page applications (SPAs), making it easy to initiate new React projects. However, the generated project includes boilerplate code that needs modification.

While unit testing throughout development is recommended, it falls outside this tutorial’s scope. Therefore, we’ll remove src/App.test.js and src/setupTests.js.

Application Structure Overview

Our application will consist of five primary JavaScript components:

  • public/wasm-audio/wasm-audio.js: Houses JavaScript bindings for the Wasm module responsible for pitch detection.
  • public/PitchProcessor.js: This is where audio processing happens. It executes in the Web Audio rendering thread and interacts with the Wasm API.
  • src/PitchNode.js: Implements a Web Audio node, which connects to the Web Audio graph and runs on the main thread.
  • src/setupAudio.js: Uses web browser APIs to obtain access to an available audio recording device.
  • src/App.js and src/App.css: These files constitute the user interface of the application.
A flowchart for the pitch detection app. Blocks 1 and 2 run on the Web Audio thread. Block 1 is the Wasm (Rust) Pitch Detector, in the file wasm-audio/lib.rs. Block 2 is Web Audio Detection + Communication, in the file PitchProcessor.js. It asks the detector to initialize, and the detector sends detected pitches back to the Web Audio interface. Blocks 3, 4, and 5 run on the main thread. Block 3 is the Web Audio Controller, in the file PitchNode.js. It sends the Wasm module to PitchProcessor.js, and receives detected pitches from it. Block 4 is Web Audio Setup, in setupAudio.js. It creates a PitchNode object. Block 5 is the Web Application UI, comprised of App.js and App.css.  It calls setupAudio.js at startup. It also pauses or resumes audio recording by sending a message to PitchNode, from which it receives detected pitches to display to the user.
Wasm audio app overview.

Let’s dive directly into the core of our application and define the Rust code for the Wasm module. We’ll then proceed to code the different Web Audio-related JavaScript components and conclude with the UI.

1. Implementing Pitch Detection with Rust and WebAssembly

Our Rust code will determine the musical pitch from an array of audio samples.

Obtaining Rust

Follow these instructions to set up your Rust development environment.

Installing Tools for WebAssembly Development in Rust

wasm-pack enables building, testing, and publishing Rust-generated WebAssembly components. If you haven’t already, install wasm-pack.

We’ll utilize cargo-generate to bootstrap a basic audio analyzer in Rust, accessible through WebAssembly in the browser. cargo-generate streamlines creating new Rust projects by using existing Git repositories as templates.

Use the cargo tool that comes with your Rust installation to install cargo-generate:

cargo install cargo-generate

Once the installation completes (which may take a while), you’re ready to create the Rust project.

Creating the WebAssembly Module

From your app’s root directory, clone the project template:

cargo generate --git https://github.com/rustwasm/wasm-pack-template

When prompted for a project name, enter wasm-audio.

A Cargo.toml file should now be present in the wasm-audio directory, containing:

[package]
name = "wasm-audio"
version = "0.1.0"
authors = ["Your Name <you@example.com"]
edition = "2018"

[lib]
crate-type = ["cdylib", "rlib"]

[features]
default = ["console_error_panic_hook"]

[dependencies]
wasm-bindgen = "0.2.63"

...

Cargo.toml defines a Rust package (or “crate” in Rust terminology), similar to what package.json does for JavaScript applications.

The [package] section defines metadata that is used when publishing the package to the official package registry of Rust.

The [lib] section specifies the output format for the Rust compilation process. “cdylib” instructs Rust to generate a dynamic system library loadable from other languages (JavaScript in this case), while “rlib” includes a static library with metadata about the generated library. Although not strictly necessary for our purpose, “rlib” can be helpful when developing additional Rust modules that depend on this crate.

The [features] section enables the optional console_error_panic_hook feature, which translates Rust’s unhandled error mechanism (panic) into console errors viewable in browser developer tools for easier debugging.

Finally, [dependencies] lists the crates this package depends on. The only dependency initially present is wasm-bindgen, which automates generating JavaScript bindings for our Wasm module.

Pitch Detection in Rust

Our app aims to detect a musician’s voice or an instrument’s pitch in real time. To maximize speed, we’ll offload the pitch calculation to a WebAssembly module. We’ll use the “McLeod” pitch method, implemented in the existing Rust pitch-detection library, for single-voice pitch detection.

Similar to Node.js’s npm, Rust has its own package manager called Cargo. This simplifies installing packages from the Rust crate registry.

Add the pitch-detection dependency by modifying the Cargo.toml file, adding the relevant line to the dependencies section:

[dependencies]
wasm-bindgen = "0.2.63"
pitch-detection = "0.1"

This tells Cargo to download and install the pitch-detection dependency during the next cargo build, or in our case, the next wasm-pack build since we’re targeting WebAssembly.

Creating a JavaScript-Callable Pitch Detector in Rust

First, let’s add a file containing a utility function that will be useful later:

Create wasm-audio/src/utils.rs and paste the contents of this file into it.

Replace the generated code in wasm-audio/src/lib.rs with the following code, which performs pitch detection using a fast Fourier transform (FFT) algorithm:

use pitch_detection::{McLeodDetector, PitchDetector};
use wasm_bindgen::prelude::*;
mod utils;

#[wasm_bindgen]
pub struct WasmPitchDetector {
  sample_rate: usize,
  fft_size: usize,
  detector: McLeodDetector<f32>,
}

#[wasm_bindgen]
impl WasmPitchDetector {
  pub fn new(sample_rate: usize, fft_size: usize) -> WasmPitchDetector {
    utils::set_panic_hook();

    let fft_pad = fft_size / 2;

    WasmPitchDetector {
      sample_rate,
      fft_size,
      detector: McLeodDetector::<f32>::new(fft_size, fft_pad),
    }
  }

  pub fn detect_pitch(&mut self, audio_samples: Vec<f32>) -> f32 {
    if audio_samples.len() < self.fft_size {
      panic!("Insufficient samples passed to detect_pitch(). Expected an array containing {} elements but got {}", self.fft_size, audio_samples.len());
    }

    // Include only notes that exceed a power threshold which relates to the
    // amplitude of frequencies in the signal. Use the suggested default
    // value of 5.0 from the library.
    const POWER_THRESHOLD: f32 = 5.0;

    // The clarity measure describes how coherent the sound of a note is. For
    // example, the background sound in a crowded room would typically have
    // low clarity, and a ringing tuning fork would have high clarity.
    // This threshold is used to accept detected notes that are clear enough
    // (valid values are in the range 0-1).
    const CLARITY_THRESHOLD: f32 = 0.6;

    let optional_pitch = self.detector.get_pitch(
      &audio_samples,
      self.sample_rate,
      POWER_THRESHOLD,
      CLARITY_THRESHOLD,
    );

    match optional_pitch {
      Some(pitch) => pitch.frequency,
      None => 0.0,
    }
  }
}

Let’s break down the code:

#[wasm_bindgen]

wasm_bindgen is a Rust macro that facilitates binding between JavaScript and Rust. When compiled to WebAssembly, this macro instructs the compiler to generate JavaScript bindings that act as lightweight wrappers for calling into and out of the Wasm module. This minimal abstraction layer, coupled with the shared memory between JavaScript and Wasm, contributes to Wasm’s impressive performance.
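Before returning to the pitch detector, here is a minimal standalone illustration of what #[wasm_bindgen] enables, namely exporting a plain Rust function that JavaScript can call through the generated bindings:

use wasm_bindgen::prelude::*;

// After building with wasm-pack, JavaScript can import the generated
// bindings and simply call add(2, 3).
#[wasm_bindgen]
pub fn add(a: i32, b: i32) -> i32 {
  a + b
}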

#[wasm_bindgen]
pub struct WasmPitchDetector {
  sample_rate: usize,
  fft_size: usize,
  detector: McLeodDetector<f32>,
}

#[wasm_bindgen]
impl WasmPitchDetector {
...
}

Rust doesn’t have the concept of classes. Instead, an object’s data is defined using a struct, and its behavior is defined through impl blocks or traits.

We expose the pitch-detection functionality through an object rather than a simple function to initialize the internal data structures of the McLeodDetector only once during the WasmPitchDetector creation. This approach avoids costly memory allocation during operation, ensuring the detect_pitch function remains fast.

pub fn new(sample_rate: usize, fft_size: usize) -> WasmPitchDetector {
  utils::set_panic_hook();

  let fft_pad = fft_size / 2;

  WasmPitchDetector {
    sample_rate,
    fft_size,
    detector: McLeodDetector::<f32>::new(fft_size, fft_pad),
  }
}

The panic! macro in Rust signals an unrecoverable error, prompting Rust to report the error and terminate the application. Using panic! can be particularly beneficial during early development before error handling is fully implemented, as it helps identify incorrect assumptions quickly.

Calling utils::set_panic_hook() once during setup guarantees that panic messages appear in the browser’s developer tools.
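As an aside, once error handling matures, panicking checks can give way to functions that return a Result. Here is a hypothetical sketch of such a validation helper (not part of this tutorial’s code):

// Hypothetical helper: report the problem to the caller instead of
// terminating. In an exported function, wasm-bindgen can convert an Err
// into a JavaScript exception.
fn validate_samples(samples: &[f32], fft_size: usize) -> Result<(), String> {
  if samples.len() < fft_size {
    return Err(format!(
      "Expected at least {} samples but got {}",
      fft_size,
      samples.len()
    ));
  }
  Ok(())
}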

Next, we define fft_pad, representing the zero-padding applied to each analysis FFT. Padding, along with the windowing function employed by the algorithm, helps smooth the results as the analysis moves across the incoming audio samples. A pad length equal to half the FFT length is suitable for many instruments.

Finally, Rust functions automatically return the result of their final expression, so the WasmPitchDetector struct expression serves as the return value of new().
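For instance, in this small standalone function, the final expression (note the absent semicolon) is the value returned:

fn square(x: f32) -> f32 {
  x * x // No semicolon: this expression is returned automatically.
}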

The remaining Rust code in the impl WasmPitchDetector block defines the API for pitch detection:

pub fn detect_pitch(&mut self, audio_samples: Vec<f32>) -> f32 {
  ...
}

This snippet demonstrates a member function definition in Rust. It adds a public detect_pitch member function to WasmPitchDetector. The first argument, &mut self, is a mutable reference to the instance the function is called on, and is passed automatically at the call site.

Additionally, our member function accepts an array of 32-bit floating-point numbers of arbitrary size and returns a single number representing the calculated pitch in Hz.
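As a standalone sketch of the same member-function pattern, consider this hypothetical Counter type (not part of this app):

struct Counter {
  count: u32,
}

impl Counter {
  // `&mut self` gives the method mutable access to the instance it is
  // called on; Rust passes it automatically at the call site, e.g.,
  // counter.increment().
  pub fn increment(&mut self) -> u32 {
    self.count += 1;
    self.count
  }
}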

if audio_samples.len() < self.fft_size {
  panic!("Insufficient samples passed to detect_pitch(). Expected an array containing {} elements but got {}", self.fft_size, audio_samples.len());
}

This part checks if enough samples are provided for a valid pitch analysis. If not, the panic! macro is invoked, terminating Wasm execution and logging the error message to the browser console.

let optional_pitch = self.detector.get_pitch(
  &audio_samples,
  self.sample_rate,
  POWER_THRESHOLD,
  CLARITY_THRESHOLD,
);

This invokes the third-party library to calculate the pitch from the provided audio samples. You can adjust POWER_THRESHOLD and CLARITY_THRESHOLD to fine-tune the algorithm’s sensitivity.

Using the match expression, which works much like a switch statement in other languages, the function implicitly returns a floating-point value. The Some() and None arms let us handle both outcomes (pitch detected or not) gracefully, without risking a null pointer exception.
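As a standalone illustration, match requires every variant of an Option to be handled, so a missing value cannot be accidentally dereferenced:

fn frequency_or_default(pitch: Option<f32>, default: f32) -> f32 {
  match pitch {
    Some(frequency) => frequency, // A pitch was detected.
    None => default,              // No pitch detected; fall back.
  }
}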

Building WebAssembly Applications

While cargo build is typically used to build Rust applications, we’ll use wasm-pack for generating our Wasm module as it offers a more convenient syntax for targeting Wasm. Additionally, it allows publishing the resulting JavaScript bindings to the npm registry, which is beyond the scope of this tutorial.

wasm-pack supports various build targets. We’ll use the web target since we’re consuming the module directly from a Web Audio worklet. Other targets include bundlers like webpack or Node.js environments. Execute the following command from the wasm-audio/ subdirectory:

wasm-pack build --target web

If successful, this creates an npm module under ./pkg.
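The pkg directory typically contains files along these lines (exact contents can vary between wasm-pack versions):

pkg/
├── package.json
├── wasm_audio.js           # JavaScript bindings for the Wasm module
├── wasm_audio.d.ts         # TypeScript type declarations
├── wasm_audio_bg.wasm      # The compiled WebAssembly binary
└── wasm_audio_bg.wasm.d.ts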

This newly generated module comes with its own auto-generated package.json file. You could publish this to the npm registry. For simplicity, we’ll directly copy this pkg directory into our project under the public/wasm-audio folder:

cp -R ./wasm-audio/pkg ./public/wasm-audio

This completes our Rust Wasm module, which is now ready to be used by our web application, specifically by PitchProcessor.

2. Implementing the PitchProcessor Class (Extending Native AudioWorkletProcessor)

This application leverages a recently standardized audio-processing approach with broad browser support: the Web Audio API’s AudioWorklet. We’ll perform computationally intensive tasks within a custom AudioWorkletProcessor and create a corresponding custom AudioWorkletNode class (named PitchNode) to bridge back to the main thread.

Create a new file called public/PitchProcessor.js and add the following code:

import init, { WasmPitchDetector } from "./wasm-audio/wasm_audio.js";

class PitchProcessor extends AudioWorkletProcessor {
  constructor() {
    super();

    // Initialized to an array holding a buffer of samples for analysis later -
    // once we know how many samples need to be stored. Meanwhile, an empty
    // array is used, so that early calls to process() with empty channels
    // do not break initialization.
    this.samples = [];
    this.totalSamples = 0;

    // Listen to events from the PitchNode running on the main thread.
    this.port.onmessage = (event) => this.onmessage(event.data);

    this.detector = null;
  }

  onmessage(event) {
    if (event.type === "send-wasm-module") {
      // PitchNode has sent us a message containing the Wasm library to load into
      // our context as well as information about the audio device used for
      // recording.
      init(WebAssembly.compile(event.wasmBytes)).then(() => {
        this.port.postMessage({ type: 'wasm-module-loaded' });
      });
    } else if (event.type === 'init-detector') {
      const { sampleRate, numAudioSamplesPerAnalysis } = event;

      // Store this because we use it later to detect when we have enough recorded
      // audio samples for our first analysis.
      this.numAudioSamplesPerAnalysis = numAudioSamplesPerAnalysis;

      this.detector = WasmPitchDetector.new(sampleRate, numAudioSamplesPerAnalysis);

      // Holds a buffer of audio sample values that we'll send to the Wasm module
      // for analysis at regular intervals.
      this.samples = new Array(numAudioSamplesPerAnalysis).fill(0);
      this.totalSamples = 0;
    }
  };

  process(inputs, outputs) {
    // inputs contains incoming audio samples for further processing. outputs
    // contains the audio samples resulting from any processing performed by us.
    // Here, we are performing analysis only to detect pitches so do not modify
    // outputs.

    // inputs holds one or more "channels" of samples. For example, a microphone
    // that records "in stereo" would provide two channels. For this simple app,
    // we assume mono input, or use the "left" channel if the microphone
    // records in stereo.

    const inputChannels = inputs[0];

    // inputSamples holds an array of new samples to process.
    const inputSamples = inputChannels[0];

    // In the AudioWorklet spec, process() is called whenever exactly 128 new
    // audio samples have arrived. We simplify the logic for filling up the
    // buffer by making an assumption that the analysis size is 128 samples or
    // larger and is a power of 2.
    if (this.totalSamples < this.numAudioSamplesPerAnalysis) {
      for (const sampleValue of inputSamples) {
        this.samples[this.totalSamples++] = sampleValue;
      }
    } else {
      // Buffer is already full. We do not want the buffer to grow continually,
      // so instead will "cycle" the samples through it so that it always
      // holds the latest ordered samples of length equal to
      // numAudioSamplesPerAnalysis.

      // Shift the existing samples left by the length of new samples (128).
      const numNewSamples = inputSamples.length;
      const numExistingSamples = this.samples.length - numNewSamples;
      for (let i = 0; i < numExistingSamples; i++) {
        this.samples[i] = this.samples[i + numNewSamples];
      }
      // Add the new samples onto the end, into the 128-wide slot vacated by
      // the previous copy.
      for (let i = 0; i < numNewSamples; i++) {
        this.samples[numExistingSamples + i] = inputSamples[i];
      }
      this.totalSamples += inputSamples.length;
    }

    // Once our buffer has enough samples, pass them to the Wasm pitch detector.
    if (this.totalSamples >= this.numAudioSamplesPerAnalysis && this.detector) {
      const result = this.detector.detect_pitch(this.samples);

      if (result !== 0) {
        this.port.postMessage({ type: "pitch", pitch: result });
      }
    }

    // Returning true tells the Audio system to keep going.
    return true;
  }
}

registerProcessor("PitchProcessor", PitchProcessor);

The PitchProcessor works in tandem with PitchNode but operates on a separate thread to avoid blocking the main thread during audio processing.

Here are the PitchProcessor’s primary responsibilities:

  • Handles the "send-wasm-module" event from PitchNode by compiling and loading the Wasm module within the worklet. It then notifies PitchNode of completion by sending a "wasm-module-loaded" event. This asynchronous callback-based communication is necessary due to the thread boundary between PitchNode and PitchProcessor.
  • Responds to the "init-detector" event from PitchNode by configuring the WasmPitchDetector.
  • Processes audio samples received from the browser’s audio graph, delegates pitch detection to the Wasm module, and sends back any detected pitch to PitchNode, which then forwards it to the React layer through its onPitchDetectedCallback.
  • Registers itself under a unique name. This allows the browser to identify and instantiate our PitchProcessor later when PitchNode is constructed, guided by PitchNode’s base class, the native AudioWorkletNode. This process is further illustrated in setupAudio.js.

The following diagram illustrates the event flow between PitchNode and PitchProcessor:

A more detailed flowchart comparing the interactions between the PitchNode and PitchProcess objects at runtime. During initial setup, PitchNode sends the Wasm module as an array of bytes to PitchProcessor, which compiles them and sends them back to PitchNode, which finally responds with an event message requesting that PitchProcessor initialize itself. While recording audio, PitchNode sends nothing, and receives two types of event messages from PitchProcessor: A detected pitch or an error, if one occurs from either Wasm or the worklet.
Runtime event messages.

3. Integrating Web Audio Worklet Code

The PitchNode.js file provides an interface to our custom pitch-detection audio processing. The PitchNode object facilitates the delivery of detected pitches from the WebAssembly module operating within the AudioWorklet thread to the main thread and subsequently to React for rendering.

In src/PitchNode.js, we’ll subclass the built-in AudioWorkletNode from the Web Audio API:

export default class PitchNode extends AudioWorkletNode {
  /**
   * Initialize the Audio processor by sending the fetched WebAssembly module to
   * the processor worklet.
   *
   * @param {ArrayBuffer} wasmBytes Sequence of bytes representing the entire
   * WASM module that will handle pitch detection.
   * @param {number} numAudioSamplesPerAnalysis Number of audio samples used
   * for each analysis. Must be a power of 2.
   */
  init(wasmBytes, onPitchDetectedCallback, numAudioSamplesPerAnalysis) {
    this.onPitchDetectedCallback = onPitchDetectedCallback;
    this.numAudioSamplesPerAnalysis = numAudioSamplesPerAnalysis;

    // Listen to messages sent from the audio processor.
    this.port.onmessage = (event) => this.onmessage(event.data);

    this.port.postMessage({
      type: "send-wasm-module",
      wasmBytes,
    });
  }

  // Handle an uncaught exception thrown in the PitchProcessor.
  onprocessorerror(err) {
    console.log(
      `An error from AudioWorkletProcessor.process() occurred: ${err}`
    );
  };

  onmessage(event) {
    if (event.type === 'wasm-module-loaded') {
      // The Wasm module was successfully sent to the PitchProcessor running on the
      // AudioWorklet thread and compiled. This is our cue to configure the pitch
      // detector.
      this.port.postMessage({
        type: "init-detector",
        sampleRate: this.context.sampleRate,
        numAudioSamplesPerAnalysis: this.numAudioSamplesPerAnalysis
      });
    } else if (event.type === "pitch") {
      // A pitch was detected. Invoke our callback which will result in the UI updating.
      this.onPitchDetectedCallback(event.pitch);
    }
  }
}

PitchNode performs the following crucial tasks:

  • Sends the WebAssembly module as a byte stream, received from setupAudio.js, to the PitchProcessor running in the AudioWorklet thread. This allows the PitchProcessor to load the pitch-detection Wasm module.
  • Handles the event emitted by PitchProcessor upon successful Wasm compilation and sends another event to configure pitch detection.
  • Receives detected pitches from the PitchProcessor and forwards them to the UI function setLatestPitch() via onPitchDetectedCallback().

Note: This code executes on the main thread, so refrain from performing extensive processing on detected pitches within this object to prevent frame rate drops.

4. Setting Up Web Audio

To enable our web application to access and process live microphone input from the client machine, we need to:

  1. Obtain user permission for browser microphone access
  2. Access the microphone output as an audio stream object
  3. Attach code to process the incoming audio stream, producing a sequence of detected pitches

We’ll accomplish these steps in src/setupAudio.js. Additionally, we’ll asynchronously load the Wasm module to initialize our PitchNode before attaching it:

import PitchNode from "./PitchNode";

async function getWebAudioMediaStream() {
  if (!window.navigator.mediaDevices) {
    throw new Error(
      "This browser does not support web audio or it is not enabled."
    );
  }

  try {
    const result = await window.navigator.mediaDevices.getUserMedia({
      audio: true,
      video: false,
    });

    return result;
  } catch (e) {
    switch (e.name) {
      case "NotAllowedError":
        throw new Error(
          "A recording device was found but has been disallowed for this application. Enable the device in the browser settings."
        );

      case "NotFoundError":
        throw new Error(
          "No recording device was found. Please attach a microphone and click Retry."
        );

      default:
        throw e;
    }
  }
}

export async function setupAudio(onPitchDetectedCallback) {
  // Get the browser audio. Awaits user "allowing" it for the current tab.
  const mediaStream = await getWebAudioMediaStream();

  const context = new window.AudioContext();
  const audioSource = context.createMediaStreamSource(mediaStream);

  let node;

  try {
    // Fetch the WebAssembly module that performs pitch detection.
    const response = await window.fetch("wasm-audio/wasm_audio_bg.wasm");
    const wasmBytes = await response.arrayBuffer();

    // Add our audio processor worklet to the context.
    const processorUrl = "PitchProcessor.js";
    try {
      await context.audioWorklet.addModule(processorUrl);
    } catch (e) {
      throw new Error(
        `Failed to load audio analyzer worklet at url: ${processorUrl}. Further info: ${e.message}`
      );
    }

    // Create the AudioWorkletNode which enables the main JavaScript thread to
    // communicate with the audio processor (which runs in a Worklet).
    node = new PitchNode(context, "PitchProcessor");

    // numAudioSamplesPerAnalysis specifies the number of consecutive audio samples that
    // the pitch detection algorithm calculates for each unit of work. Larger values tend
    // to produce slightly more accurate results but are more expensive to compute and
    // can lead to notes being missed in faster passages i.e. where the music note is
    // changing rapidly. 1024 is usually a good balance between efficiency and accuracy
    // for music analysis.
    const numAudioSamplesPerAnalysis = 1024;

    // Send the Wasm module to the audio node which in turn passes it to the
    // processor running in the Worklet thread. Also, pass any configuration
    // parameters for the Wasm detection algorithm.
    node.init(wasmBytes, onPitchDetectedCallback, numAudioSamplesPerAnalysis);

    // Connect the audio source (microphone output) to our analysis node.
    audioSource.connect(node);

    // Connect our analysis node to the output. Required even though we do not
    // output any audio. Allows further downstream audio processing or output to
    // occur.
    node.connect(context.destination);
  } catch (err) {
    throw new Error(
      `Failed to load audio analyzer WASM module. Further info: ${err.message}`
    );
  }

  return { context, node };
}

This assumes the WebAssembly module is located at public/wasm-audio, which we set up earlier in the Rust section.

5. Creating the Application UI

Let’s define a basic user interface for our pitch detector. Replace the contents of src/App.js with the following code:

import React from "react";
import "./App.css";
import { setupAudio } from "./setupAudio";

function PitchReadout({ running, latestPitch }) {
  return (
    <div className="Pitch-readout">
      {latestPitch
        ? `Latest pitch: ${latestPitch.toFixed(1)} Hz`
        : running
        ? "Listening..."
        : "Paused"}
    </div>
  );
}

function AudioRecorderControl() {
  // Ensure the latest state of the audio module is reflected in the UI
  // by defining some variables (and a setter function for updating them)
  // that are managed by React, passing their initial values to useState.

  // 1. audio is the object returned from the initial audio setup that
  //    will be used to start/stop the audio based on user input. While
  //    this is initialized once in our simple application, it is good
  //    practice to let React know about any state that _could_ change
  //    again.
  const [audio, setAudio] = React.useState(undefined);

  // 2. running holds whether the application is currently recording and
  //    processing audio and is used to provide button text (Start vs Stop).
  const [running, setRunning] = React.useState(false);

  // 3. latestPitch holds the latest detected pitch to be displayed in
  //    the UI.
  const [latestPitch, setLatestPitch] = React.useState(undefined);

  // Initial state. Initialize the web audio once a user gesture on the page
  // has been registered.
  if (!audio) {
    return (
      <button
        onClick={async () => {
          setAudio(await setupAudio(setLatestPitch));
          setRunning(true);
        }}
      >
        Start listening
      </button>
    );
  }

  // Audio already initialized. Suspend / resume based on its current state.
  const { context } = audio;
  return (
    <div>
      <button
        onClick={async () => {
          if (running) {
            await context.suspend();
            setRunning(context.state === "running");
          } else {
            await context.resume();
            setRunning(context.state === "running");
          }
        }}
        disabled={context.state !== "running" && context.state !== "suspended"}
      >
        {running ? "Pause" : "Resume"}
      </button>
      <PitchReadout running={running} latestPitch={latestPitch} />
    </div>
  );
}

function App() {
  return (
    <div className="App">
      <header className="App-header">
        Wasm Audio Tutorial
      </header>
      <div className="App-content">
        <AudioRecorderControl />
      </div>
    </div>
  );
}

export default App;

And replace App.css with some basic styles:

.App {
  display: flex;
  flex-direction: column;
  align-items: center;
  text-align: center;
  background-color: #282c34;
  min-height: 100vh;
  color: white;
  justify-content: center;
}

.App-header {
  font-size: 1.5rem;
  margin: 10%;
}

.App-content {
  margin-top: 15vh;
  height: 85vh;
}

.Pitch-readout {
  margin-top: 5vh;
  font-size: 3rem;
}

button {
  background-color: rgb(26, 115, 232);
  border: none;
  outline: none;
  color: white;
  margin: 1em;
  padding: 10px 14px;
  border-radius: 4px;
  width: 190px;
  text-transform: capitalize;
  cursor: pointer;
  font-size: 1.5rem;
}

button:hover {
  background-color: rgb(45, 125, 252);
}

With that, our app should be ready to run, but there’s a potential issue we need to address first.

WebAssembly/Rust Tutorial: Addressing a Final Hurdle

Upon running yarn followed by yarn start, switching to the browser (using Chrome or Chromium with developer tools open), and attempting to record audio, we encounter errors:

At wasm_audio.js line 24 there's the error, "Uncaught ReferenceError: TextDecoder is not defined," followed by one at setupAudio.js line 84 triggered by the async onClick from App.js line 43, which reads, "Uncaught (in promise) Error: Failed to load audio analyzer WASM module. Further info: Failed to construct 'AudioWorkletNode': AudioWorkletNode cannot be created: The node name 'PitchProcessor' is not defined in AudioWorkletGlobalScope."
Wasm requirements have wide support—just not yet in the Worklet spec.

The first error, TextDecoder is not defined, arises when the browser attempts to execute wasm_audio.js. This, in turn, prevents the Wasm JavaScript wrapper from loading, leading to the second error in the console.

The root cause is that the Wasm package generator in Rust expects TextDecoder (and TextEncoder) to be globally available. This assumption holds true for modern browsers when the Wasm module runs on the main thread or even a worker thread. However, worklets (like the AudioWorklet context used here) don’t yet include TextDecoder and TextEncoder in their specification and are therefore unavailable.

The Rust Wasm code generator relies on TextDecoder to convert the flat, packed, shared-memory representation of Rust strings into the string format JavaScript uses. In short, to handle strings produced by the Wasm code generator, TextEncoder and TextDecoder must be defined.

This issue highlights the relative novelty of WebAssembly. As browser support matures and common WebAssembly patterns are natively supported, such issues are likely to disappear.

For now, we can work around this by providing a polyfill for TextDecoder.

Create a new file named public/TextEncoder.js and import it into public/PitchProcessor.js:

import "./TextEncoder.js";

Ensure this import statement precedes the wasm_audio import.

Finally, paste the contents of this implementation into TextEncoder.js (code courtesy of @Yaffle on GitHub).
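To give a sense of what that file provides, here is a minimal sketch of such a polyfill. It assumes well-formed UTF-8 input and omits the edge-case handling of the linked implementation, so prefer the full version in practice:

// Minimal TextEncoder/TextDecoder polyfill sketch (UTF-8 only).
if (typeof globalThis.TextEncoder === "undefined") {
  globalThis.TextEncoder = class TextEncoder {
    encode(str) {
      const bytes = [];
      for (const ch of str) {
        const cp = ch.codePointAt(0);
        if (cp < 0x80) bytes.push(cp);
        else if (cp < 0x800) bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
        else if (cp < 0x10000)
          bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
        else
          bytes.push(
            0xf0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3f),
            0x80 | ((cp >> 6) & 0x3f),
            0x80 | (cp & 0x3f)
          );
      }
      return new Uint8Array(bytes);
    }
  };
}

if (typeof globalThis.TextDecoder === "undefined") {
  globalThis.TextDecoder = class TextDecoder {
    decode(buffer) {
      const bytes = new Uint8Array(buffer);
      let out = "";
      let i = 0;
      while (i < bytes.length) {
        const byte = bytes[i++];
        let cp;
        if (byte < 0x80) cp = byte;
        else if (byte < 0xe0) cp = ((byte & 0x1f) << 6) | (bytes[i++] & 0x3f);
        else if (byte < 0xf0)
          cp = ((byte & 0x0f) << 12) | ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
        else
          cp =
            ((byte & 0x07) << 18) |
            ((bytes[i++] & 0x3f) << 12) |
            ((bytes[i++] & 0x3f) << 6) |
            (bytes[i++] & 0x3f);
        out += String.fromCodePoint(cp);
      }
      return out;
    }
  };
}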

The Firefox Question

As mentioned earlier, the way we combine Wasm with Web Audio worklets in our app will not work in Firefox. Even with the above shim, clicking the “Start Listening” button will result in this:

Unhandled Rejection (Error): Failed to load audio analyzer WASM module. Further info: Failed to load audio analyzer worklet at url: PitchProcessor.js. Further info: The operation was aborted.

That’s because Firefox doesn’t yet support importing modules from AudioWorklets—for us, that’s PitchProcessor.js running in the AudioWorklet thread.

Our Completed Application

Reload the page. The app should now load without errors. Click “Start Listening” and grant browser permission to access your microphone. You’ll now have a functional, albeit basic, pitch detector built with JavaScript and Rust-powered Wasm:

A screenshot of the app showing its title, "Wasm Audio Tutorial," a blue button with the word Pause on it, and the text "Latest pitch: 1380.1 Hz" underneath that.
Real-time pitch detection.

Programming with WebAssembly and Rust: A Real-time Web Audio Solution

This tutorial guided you through building a web application from scratch that performs computationally demanding audio processing using WebAssembly. WebAssembly allowed us to leverage Rust’s near-native performance for pitch detection. Furthermore, this processing was offloaded to a separate thread, freeing the main JavaScript thread to concentrate on rendering and ensuring smooth frame rates, even on mobile devices.

Key Takeaways: Wasm/Rust and Web Audio

  • Modern browsers provide high-performance APIs for capturing and processing audio (and video) within web applications.
  • Rust’s combination of performance, safety, and great tooling for Wasm makes it a strong contender for projects integrating WebAssembly.
  • Wasm enables efficient execution of compute-intensive tasks within the browser.

While advantageous, Wasm also presents a couple of challenges:

  • Tooling for Wasm within worklets is still under development. As seen in our example, we had to implement our own versions of TextEncoder and TextDecoder for communication between JavaScript and Wasm within the AudioWorklet context. Additionally, importing JavaScript bindings for Wasm modules within an AudioWorklet is not yet supported in Firefox.
  • Although our application was relatively simple, setting up and loading the WebAssembly module from the AudioWorklet involved a fair amount of configuration. Integrating Wasm into projects does introduce additional tooling complexity, which is crucial to consider.

For your convenience, this GitHub repo contains the complete project code. If you’re interested in back-end development, you might also find it valuable to explore using Rust with WebAssembly within Node.js.

Licensed under CC BY-NC-SA 4.0