
Real-time Audio Classification on Beatbox (Full Pipeline)

Raphael Khalid
25 min read · May 31, 2024


The human mouth is capable of producing a whole range of sounds one might not have thought possible. I was inspired by Tom Thum and his video on beatboxing around seven years ago, and so began my journey of turning my mouth into an orchestra. This article walks through the steps needed to build a full pipeline that classifies beatbox sounds in real time using convolutional neural networks.

Where did the data come from?

This project focuses on audio files in the .wav format, each belonging to one of five classes: kick drum, hi-hat, synth, trumpet, or snare. The full dataset comprises 234 datapoints with some class imbalance. The files were obtained as follows: 30 per class from Seth Adams's GitHub (originally sourced from Kaggle), 49 recorded by myself (9 hi-hats and 10 each for the other classes), and the remaining 35 from the Echo Sound Works "Aura One Shots" free sample pack.
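To make the dataset concrete, here is a minimal sketch of how files like these might be loaded and tallied per class. The directory layout (data/<class>/*.wav, one subfolder per class) and the 16 kHz sample rate are assumptions for illustration, not the project's actual setup, and librosa is just one common choice for reading audio in Python.

from pathlib import Path

import librosa
import numpy as np

# Hypothetical layout: data/kick/*.wav, data/hihat/*.wav, and so on
# (an assumption for this sketch, not the author's actual folder structure).
CLASSES = ["kick", "hihat", "synth", "trumpet", "snare"]
DATA_DIR = Path("data")

def load_dataset(sr=16000):
    """Return (waveform, label_index) pairs for every .wav file found."""
    samples = []
    for label, name in enumerate(CLASSES):
        for wav_path in sorted((DATA_DIR / name).glob("*.wav")):
            # librosa resamples to `sr` and downmixes to mono by default
            y, _ = librosa.load(wav_path, sr=sr, mono=True)
            samples.append((y, label))
    return samples

if __name__ == "__main__":
    data = load_dataset()
    counts = np.bincount([label for _, label in data], minlength=len(CLASSES))
    for name, n in zip(CLASSES, counts):
        print(f"{name}: {n} files")  # across all five classes this should total 234

Printing the per-class counts up front is a quick way to surface the class imbalance mentioned above before any training begins.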

And why is it just this data?

Rather than expanding the number of classes to include the more precise classifications used in beatboxing, such as throat bass, lip rolls, inward snares, outward snares, or the myriad variations on kick drums and synths, this project aims to create a tool for the beginner beatboxer, and so focuses on the five most distinct and useful sounds to learn: kick drum, hi-hat, synth, trumpet, and snare.
