Audio multi-class segmentation, a tutorial

Earlier this year, I published an audio segmentation tool on GitHub: auditok. In its simplest form, audio segmentation tells us where a sound starts and where it ends within an audio stream.

auditok uses the log energy of the raw audio signal to detect acoustic activities (Figure 1), but it cannot tell which class of sound (bird, speech, phone ring, etc.) a given acoustic activity belongs to.


Figure 1: example of auditok audio activity detection output
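To make the idea of energy-based detection concrete, here is a minimal sketch, not auditok's actual implementation. It assumes a mono signal stored as a NumPy float array, non-overlapping 50 ms frames, and log energy computed as 10·log10 of the mean squared amplitude. `frame_log_energy`, `detect_activities` and the -30 dB threshold are hypothetical choices for illustration. auditok itself adds further constraints, such as minimum and maximum activity length and tolerated silence.

```python
import numpy as np

def frame_log_energy(signal, frame_len):
    """Log energy (dB) of each non-overlapping frame of `signal`."""
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=np.float64)
    frames = frames.reshape(n_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)
    # Small epsilon avoids log10(0) on digital silence.
    return 10.0 * np.log10(power + 1e-12)

def detect_activities(signal, sample_rate, frame_dur=0.05, energy_threshold=-30.0):
    """Return (start_s, end_s) for each run of frames whose energy reaches the threshold."""
    frame_len = int(round(sample_rate * frame_dur))
    active = frame_log_energy(signal, frame_len) >= energy_threshold
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i  # an activity begins
        elif not is_active and start is not None:
            segments.append((start * frame_dur, i * frame_dur))  # the activity ends
            start = None
    if start is not None:  # activity still running at the end of the stream
        segments.append((start * frame_dur, len(active) * frame_dur))
    return segments
```

Run on half a second of silence, half a second of a 440 Hz tone and another half second of silence (16 kHz), this returns a single activity spanning 0.5 s to 1.0 s. Note that it says nothing about *what* that activity is.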

Since recognizing the nature of the sounds in an audio stream is an exciting idea and a highly desirable feature, a lot of research has been carried out on this topic over the past decade.

I initially wanted to publish several introductory articles on this subject before presenting a practical application. However, after a recent experiment in which I used auditok to try segmentation by classification on audio streams, I ended up with an article in the form of an interactive Jupyter notebook. Segmentation by classification is a more advanced form of audio segmentation: not only does it detect the presence of audio activities, it also assigns each of them to an audio class (Figure 2).


Figure 2: example of output when auditok is “tuned” with a GMM classifier
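The classification step can be sketched as follows. This is a simplified stand-in for the GMM classifier used in the notebook: each class is modelled by a single diagonal-covariance Gaussian (a one-component GMM), fitted directly from per-frame feature vectors. A detected segment is then assigned to the class whose model gives it the highest total log-likelihood. The feature vectors (for instance MFCCs), the class names and `DiagonalGaussianClassifier` are assumptions made for illustration. With several mixture components you would fit each model with EM instead, for example with scikit-learn's `GaussianMixture`.

```python
import numpy as np

class DiagonalGaussianClassifier:
    """Hypothetical classifier: one diagonal Gaussian per class (a 1-component GMM)."""

    def fit(self, features_by_class):
        # features_by_class: {label: array of shape (n_frames, n_features)}
        self.params = {}
        for label, feats in features_by_class.items():
            feats = np.asarray(feats, dtype=np.float64)
            mean = feats.mean(axis=0)
            var = feats.var(axis=0) + 1e-6  # variance floor keeps log-likelihoods finite
            self.params[label] = (mean, var)
        return self

    def log_likelihood(self, feats, label):
        mean, var = self.params[label]
        feats = np.asarray(feats, dtype=np.float64)
        # Per-frame log N(x | mean, diag(var)), then summed over the segment's frames.
        per_frame = -0.5 * (np.log(2 * np.pi * var) + (feats - mean) ** 2 / var).sum(axis=1)
        return per_frame.sum()

    def classify_segment(self, feats):
        # Pick the class whose model best explains the whole segment.
        return max(self.params, key=lambda label: self.log_likelihood(feats, label))
```

In practice, each segment found by the energy detector (or each short window of the stream) would be turned into feature frames and passed to `classify_segment`.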

As you can see, segmentation by classification has several advantages over energy-based segmentation. Energy-based segmentation is simple, but it can’t recognize the class of a sound, it systematically misses low-energy audio activities (e.g. breathing), and it treats adjacent audio activities as one single activity (see the end of the audio stream).
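The last point is easy to see once frames carry class labels. The sketch below collapses a sequence of per-frame labels into segments and starts a new segment whenever the label changes. Two adjacent activities of different classes therefore stay separate, whereas a binary active/inactive labelling (what an energy detector produces) merges them. The helper `labels_to_segments` and the `"silence"` label name are hypothetical.

```python
from itertools import groupby

def labels_to_segments(frame_labels, frame_dur, ignore=("silence",)):
    """Collapse per-frame labels into (label, start_s, end_s) segments."""
    segments = []
    pos = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        n = len(list(run))  # length of this run of identical labels
        if label not in ignore:
            segments.append((label, pos * frame_dur, (pos + n) * frame_dur))
        pos += n
    return segments

# Speech immediately followed by a phone ring, with 0.5 s frames:
labels = ["silence"] * 2 + ["speech"] * 3 + ["phone"] * 2 + ["silence"]
print(labels_to_segments(labels, 0.5))
# [('speech', 1.0, 2.5), ('phone', 2.5, 3.5)]

# The same stream seen by an energy detector: one merged activity.
binary = ["silence"] * 2 + ["active"] * 5 + ["silence"]
print(labels_to_segments(binary, 0.5))
# [('active', 1.0, 3.5)]
```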

If you want to check out a static (read-only) version of the notebook, you can find it here.

If you want to try everything out yourself, please visit this repository, where you will find the interactive notebook, installation instructions, training data and models.

