React Instrument Tuner with Machine Learning

Try it!

GitHub

I'm also a musician, so when I was considering a challenging project for myself, I noticed that there were no reliable, modern instrument tuners for the web. There are mobile apps galore that solve this problem, but an accurate web-based tuner was missing, despite the wealth of open source tools available to help build one.

I am open to any and all constructive criticism, so please open an issue, contact me on GitHub, or email me!

Goals and Scope

The goals for this iteration of my app were as follows:

  • Implement an instrument tuner in "equal temperament" at A440 (440 Hz).
  • There are a plethora of temperaments out there; I chose equal temperament because it is the primary system used in modern Western music, and limiting the initial scope to equal temperament alone reduced the amount of work significantly.
  • Equal temperament calculates each note from a defined starting point. In most modern genres that starting point is 440 Hz for the A above middle C. Limiting the project to this single option initially will help get everything off the ground quicker. Adding other choices, for example A442 (a popular standard in concert halls and orchestras), would be an ideal next step. I may even add A432 - the conspiracy theorists love that one!
  • Implement a fast and accurate algorithm for determining current frequency.
  • A note played on any given instrument doesn't simply produce the fundamental frequency of that note - that would be much too easy! Every instrument and voice produces a unique range of frequencies that creates the sound we are used to hearing. A saxophone's frequency spectrum playing the note A looks hugely different to an opera singer singing the same A, and even Frank Sinatra's A looks different to Dean Martin's. But we still need to determine its intonation in an objective way, so using a well-researched method to find the fundamental pitch being played is extremely important.
  • Build it in React!
  • I am currently enjoying using React in other applications and for my own learning - this site is built with Next.js, and my other personal sites are built with Gatsby, so employing React here felt like a good opportunity to practice some more advanced techniques.
  • Make the animations intuitive and natural.
  • Luckily, React has quite a few excellent animation libraries built on CSS animations, so I had a range of options. I went with Framer Motion.

Ml5.js and the CREPE model for pitch detection

While researching possible options for frequency detection, I came across ml5. It is a great machine learning library for the web that includes an excellent pitch detection algorithm using the CREPE model.

The model itself was developed by researchers at NYU in February 2018 and addresses a weakness of most pitch estimation methods: when given poor quality audio, unusual instruments, or rapidly fluctuating pitch curves, the results can be inaccurate. Their CREPE (Convolutional Representation for Pitch Estimation) model achieves excellent results, within 10 cents, even with poor audio quality - a massive help for professional musicians working on precise intonation control.

Setting up the audio stream

Audio Context

The first thing I did was set up an AudioContext interface. I've created a custom React Hook for this, using useRef to hold the AudioContext: it is a mutable object, and I needed somewhere mutable to keep it between renders - useState, or anything else tied to state, wouldn't work here. The construction itself happens inside useEffect, so the interface is requested when the page loads.

import { useRef, useEffect } from "react";

// Holds a single AudioContext in a ref so it survives re-renders
// without triggering them. The webkit prefix covers older Safari.
export default function useAudioContext() {
  const audio = useRef();
  useEffect(() => {
    audio.current = new (window.AudioContext || window.webkitAudioContext)();
  }, []);
  return audio;
}
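
As a quick illustration of how a component consumes this hook (the component name here is just for the example, not from the app's source):

  import useAudioContext from "./useAudioContext";

  function Tuner() {
    // undefined on the first render; holds the AudioContext once the effect runs
    const audioContextRef = useAudioContext();
    // ...later, pass audioContextRef.current to the pitch detector
    return null; // render omitted in this sketch
  }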

getUserMedia & ml5

More fun with useEffect - here I'm using async/await to ask for a mic stream, then setting up how I'll get the pitch from ml5. Finally, I list my tunerStarted state (a useState hook) as a dependency so the effect runs again when the tuner is started.

  useEffect(() => {
      (async () => {
        // Ask the browser for microphone access (audio only).
        const micStream = await navigator.mediaDevices.getUserMedia({
          audio: true,
          video: false,
        });
        // Point ml5 at the locally hosted CREPE model and flag when it's loaded.
        pitchDetectorRef.current = window.ml5.pitchDetection(
          "/crepe",
          audioContextRef.current,
          micStream,
          () => setModelLoaded(true)
        );
        audioStream.current = micStream;
      })();
  }, [audioContextRef, tunerStarted]);

Notice the use of another useRef here, pitchDetectorRef, which holds the pitch detector object that ml5 returns.

Also, I'll be including the ml5.js file via a link to its CDN with react-helmet in our JSX, as I've found that more reliable than using the ml5 npm package. Additionally, I'll be including the model data that the pitch detection algorithm needs, this time shipping it with our own code in the public folder as ./crepe. I found this to be the most reliable way to deploy the model and the ml5 script.
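
As a rough sketch, the include looks something like this (the CDN URL and version are assumptions on my part - check ml5's releases for the current one):

  import { Helmet } from "react-helmet";

  // Inside the component's JSX. The unpkg URL below is illustrative.
  <Helmet>
    <script src="https://unpkg.com/ml5@0.12.2/dist/ml5.min.js" type="text/javascript" />
  </Helmet>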

useInterval

After some experimenting, I found that too much data was typically being sent to the pitch detection algorithm, which resulted in too many rapidly changing frequency readings. This custom hook is an attempt to smooth that out. Ideally I would just use setInterval to call the function every x milliseconds, but React and setInterval don't play nicely together according to my research. There is a way to get something similar, though - this took some looking into, but I eventually landed on a great post on the topic.

import { useEffect, useRef } from "react";

export default function useInterval(callback, delay) {
  const savedCallback = useRef();

  // Remember the latest callback.
  useEffect(() => {
    savedCallback.current = callback;
  }, [callback]);

  // Set up the interval.
  useEffect(() => {
    function tick() {
      savedCallback.current();
    }
    if (delay !== null) {
      let id = setInterval(tick, delay);
      return () => clearInterval(id);
    }
  }, [delay]);
}

Semitones, cents, and calculating intended notes

Because the model only returns the calculated fundamental frequency, there are still quite a few things to address in order to present usable information to the user. Namely, I need to calculate which note that frequency is closest to, and then convert the difference into cents, the standard measure of intonation for musicians.

First I've set the note A equal to 440 Hz as discussed, then equal temperament's frequency ratio, and a scale of all twelve notes in an array. At this point I've chosen to use only sharps to identify notes. An alternative would be to also show each note's enharmonic equivalent in flats and give the user a choice - that will come in a future version.

  const A = 440; // reference pitch in Hz
  const equalTemperment = 1.059463; // 2^(1/12), the ratio between adjacent semitones
  const scale = [
    "A",
    "A#",
    "B",
    "C",
    "C#",
    "D",
    "D#",
    "E",
    "F",
    "F#",
    "G",
    "G#",
  ];

I have decided to approach this by working backwards. Here are the functions I have written to adequately separate the work:

Find the number of semitones away from A440, rounding to the nearest semitone.

There are 12 notes in modern Western music, spread across multiple octaves and each separated by a semitone (half step), hence the decision to work out the distance from A440 in semitones as an easy way of figuring out the intended note. Here is the base equation, although we'll be solving for n:

fn = f0 × a^n

  • f0 = the frequency of a fixed reference note, which must be defined. A common choice is setting the A above middle C (A4) at f0 = 440 Hz.
  • n = the number of half steps away from the fixed note. If you are on a higher note, n is positive; if you are on a lower note, n is negative.
  • fn = the frequency of the note n half steps away.
  • a = 2^(1/12), the twelfth root of 2 - the number which, when multiplied by itself 12 times, equals 2 ≈ 1.059463094359...
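
Solving for n gives n = log(fn / f0) / log(a), which is exactly what the function below computes. As a quick sanity check: a detected frequency of 261.6 Hz (middle C) gives n = log(261.6 / 440) / log(1.059463) ≈ −9, i.e. nine semitones below A440.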

  function getNumSemitonesFromA(freq) {
    if (!freq) {
      return null;
    }
    // n = log(fn / f0) / log(a), rounded to the nearest whole semitone
    return Math.round(Math.log(freq / A) / Math.log(equalTemperment));
  }

Get our note from the difference in semitones

Here I'm matching the result of the previous function to the relevant item in the scale array. Remember, we have to handle negative numbers here, because the equation returns a negative number when the detected pitch is below A and a positive one when it is above.

  function getNoteFromSemitones(freq, diffInSemitones) {
    if (diffInSemitones === null) {
      return null;
    }
    const scaleSize = scale.length;
    // The double modulo maps any positive or negative semitone offset
    // onto a 0-11 index into the scale array.
    const normalizedDiff =
      ((diffInSemitones % scaleSize) + scaleSize) % scaleSize;
    return scale[normalizedDiff];
  }
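
For example, a pitch nine semitones below A440 gives diffInSemitones = −9, so normalizedDiff = ((−9 % 12) + 12) % 12 = 3, and scale[3] is "C" - the middle C from the earlier example.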

Find the difference in cents from the intended note

Now I have to figure out how far away I actually am from our desired note in cents.

The difference in cents depends on where one is in the audio spectrum - the formula for this was found here.

A cent is a logarithmic unit of measure of an interval, based on the dimensionless frequency ratio f2 / f1.

c = 1200 × log2 (f2 / f1)

f1 is our "correct" or intended frequency, which we need to solve for in our function, and f2 is our current frequency.

This function also works out what the player's correct frequency should be, using the diffInSemitones value from our first function.

  function getDifferenceInCents(freq, diffInSemitones) {
    if (!freq) {
      return null;
    }
    // Use the difference in semitones to work out what the correct
    // (intended) frequency should be: f1 = A × a^n.
    const correctFreq = A * Math.pow(equalTemperment, diffInSemitones);
    // Convert the ratio of detected to intended frequency into cents,
    // rounded to the nearest whole cent.
    return Math.round(1200 * Math.log2(freq / correctFreq));
  }
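
As a quick check: a detected pitch of 263 Hz is closest to middle C, so correctFreq ≈ 261.63 Hz, and 1200 × log2(263 / 261.63) ≈ 9 - the player is about 9 cents sharp.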

Implementing my custom hook, useInterval

In addition to the useRef used earlier to hold the AudioContext object, I also set up quite a few useState hooks to hold my semitones, pitch frequencies, and other values, all initialised at the top of my component - please see the source code.

	useInterval(() => {
    if (!tunerStarted) {
      return;
    }
    if (!pitchDetectorRef.current) {
      return;
    }
    pitchDetectorRef.current.getPitch((err, detectedPitch) => {
      // Derive everything from the freshly detected pitch, not the
      // pitchFreq state, which would lag one tick behind.
      setNote(
        getNoteFromSemitones(detectedPitch, getNumSemitonesFromA(detectedPitch))
      );
      setPitchFreq(Math.round(detectedPitch * 10) / 10);
      setDiff(
        getDifferenceInCents(detectedPitch, getNumSemitonesFromA(detectedPitch))
      );
    });
  }, 1000 / 80); // poll the detector 80 times per second

Framer Motion and JSX

For animating the intonation line and colours, I decided to use a wonderful animation library called Framer Motion. Its flexibility and ease of use were vital, as I plan to build this out into a more feature-rich application in the future.

There is significant use of styled-components here as well, which is what allows me to make the custom component wrappers (AnimationWrapper and InfoDiv) below.

      <AnimationWrapper>
        <InfoDiv animate={{ backgroundColor: color }}>
          <h2>{note}</h2>
          <p>{diff}</p>
        </InfoDiv>
        <motion.hr
          className="diff-hr"
          animate={{
            y: -diff * 4.7,
            backgroundColor: color,
          }}
        />
        <h2 className="small-note">{note}</h2>
      </AnimationWrapper>
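
The color value isn't derived in this snippet; as a minimal sketch, one way to compute it from diff might look like the line below (the threshold and hex values are my own assumptions, not necessarily what the app ships with):

  // Hypothetical colour mapping: green when within 5 cents of the target, red otherwise.
  const color = Math.abs(diff) <= 5 ? "#2ecc71" : "#e74c3c";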

Conclusion

This has been a fairly challenging project for me. I started my self-directed coding journey not long ago, and choosing this as my first big project was ambitious, to say the least. I would therefore really appreciate any feedback at all - even if you just don't like my function names, I want to hear it! So please do email me!

Written by Tom Caraher on 2023-05-17