PixelPlayer – Identify and Extract Musical Instrument Sounds from Videos with MIT’s AI

Pranav Dar · Last Updated: 06 Jul, 2018
3 min read

Overview

  • Researchers from MIT have developed an AI system, called PixelPlayer, that identifies and isolates instrument sounds from videos
  • The system, developed through self-supervised learning, was trained on over 60 hours of video
  • Three neural networks are at play in this system – one for video, one for audio, and a third for separating the sound


Introduction

There are countless times when I've been listening to music on YouTube and found myself mesmerized by one of the instruments in the video. But isolating and extracting that instrument's sound has so far been a difficult and cumbersome task for casual listeners and amateur musicians. Unless you own, and know how to use, a sophisticated audio tool, you're out of luck.

This is where machine learning and AI have become so useful. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a deep learning model that takes a video as input and identifies and isolates the sound of specific instruments. It even has the ability to make that instrument’s sound louder or softer.

The system was built using self-supervised learning, which doesn't require any pre-labelled data. Of course, this makes it difficult to fully interpret how the system arrives at a given result (how it isolated a particular instrument, in this case), but that is something the researchers are working to understand.
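To get a feel for how self-supervision can work here, consider a "mix-and-separate" style of training: blend the audio of two videos, then train the model to pull each track back out using the corresponding frames as the cue. The mixture itself provides the training signal. Here is a minimal Python sketch of that idea; the `model` interface and loss are illustrative assumptions, not the authors' exact code:

```python
import torch
import torch.nn.functional as F

def mix_and_separate_loss(model, audio_a, audio_b, frames_a, frames_b):
    # Blend two clips; the known source tracks become free training targets.
    mixture = audio_a + audio_b
    est_a = model(mixture, frames_a)   # separate, cued by video A's frames
    est_b = model(mixture, frames_b)   # separate, cued by video B's frames
    # No human labels needed: the loss compares against the original tracks.
    return F.mse_loss(est_a, audio_a) + F.mse_loss(est_b, audio_b)
```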

So how does it all work?

The system, called PixelPlayer, was trained on over 60 hours of video and can identify 20 different instruments. The deep learning model first locates the image regions that are producing sound. It then separates the audio into a set of components, each representing the sound coming from a single pixel in the image (this is where the system's name comes from).
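Since the output assigns a separated sound to every pixel, one natural use is to rank pixels by acoustic energy to localize the sounding regions. The helper below is a hypothetical sketch, assuming a per-pixel spectrogram tensor of shape (H, W, F, T):

```python
import torch

def locate_sound_sources(pixel_specs: torch.Tensor, top_k: int = 5):
    """Rank pixels by the energy of their separated spectrograms.
    pixel_specs: (H, W, F, T) - one separated spectrogram per image pixel."""
    energy = pixel_specs.pow(2).sum(dim=(-2, -1))    # (H, W) energy map
    flat = energy.flatten().topk(top_k).indices      # loudest pixel indices
    h, w = energy.shape
    return [(i // w, i % w) for i in flat.tolist()]  # (row, col) coordinates
```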

Three neural networks are at play within the system – one that analyzes the visuals in the video, another that analyzes the audio, and a third that first "associates specific pixels with specific soundwaves" and then separates the different sounds.
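Here is a rough sketch of how those three pieces could be wired together, assuming the video network emits a K-dimensional feature per pixel and the audio network splits the mixture spectrogram into K components. The shapes and names are illustrative, not the published architecture:

```python
import torch
import torch.nn as nn

class PixelPlayerSketch(nn.Module):
    """Illustrative wiring of the three networks; shapes are assumptions."""
    def __init__(self, video_net: nn.Module, audio_net: nn.Module):
        super().__init__()
        self.video_net = video_net   # frames -> (K, H, W) per-pixel features
        self.audio_net = audio_net   # mixture spectrogram -> (K, F, T) components

    def forward(self, frames, mix_spec):
        pix = self.video_net(frames)        # (K, H, W)
        comps = self.audio_net(mix_spec)    # (K, F, T)
        # "Synthesizer" step: associate pixels with the components they
        # activate, then mask the mixture to get one spectrogram per pixel.
        masks = torch.sigmoid(torch.einsum('khw,kft->hwft', pix, comps))
        return masks * mix_spec             # broadcasts over (H, W, F, T)
```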

What surprised the researchers is that the system even recognizes actual musical elements. Their research found that "certain harmonic frequencies seem to correlate to instruments like violin, while quick pulse-like patterns correspond to instruments like the xylophone".
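That finding is easy to see for yourself in a spectrogram: sustained harmonic instruments draw long horizontal bands, while mallet hits show up as short vertical pulses. A quick look using librosa's harmonic/percussive split (the file path is a placeholder):

```python
import librosa
import numpy as np

# Load any short instrument clip; "clip.wav" is a placeholder path.
y, sr = librosa.load("clip.wav", sr=None)
S = np.abs(librosa.stft(y))                       # magnitude spectrogram
harmonic, percussive = librosa.decompose.hpss(S)  # split by structure
print("harmonic energy:  ", float(np.sum(harmonic ** 2)))
print("percussive energy:", float(np.sum(percussive ** 2)))
```

A violin clip should carry most of its energy in the harmonic part; a xylophone clip shifts noticeably toward the percussive part.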

You should read the research paper, which outlines PixelPlayer in more detail, including the experiments and their results. Check out the video below, which shows this technology working its magic:


Our take on this

To put things into context, this isn't the first attempt at using machine learning and AI in the music industry. We have previously seen Google's entry into this sphere with NSynth, and a data science music challenge from the University of Michigan, among other things. A lot of professional musicians are using AI not only to make music, but to create videos from scratch as well!

This kind of AI could potentially be used to understand environmental sounds as well – I can see it being incorporated into self-driving car technology to make it even safer. I personally can't wait for MIT to release the code on GitHub. Have you ever worked on any sound processing projects or datasets? Connect with me in the comments below.


Subscribe to AVBytes here to get regular data science, machine learning and AI updates in your inbox!


Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.

Responses From Readers


Steve Pitt

Once this is perfected, it will be a game changer in the world of music production, and remixing in particular. For years now we've had decent algorithms for changing the tempo of a music file without impacting pitch (for the most part), which was a breakthrough on its own. But the ability to demix tracks into their constituent instruments will allow for an unprecedented level of flexibility in production. It will also give music educators and students the ability to focus on specific instruments, to deconstruct arrangements, and to isolate tough-to-hear parts that might be of interest to composers. Looking forward to more development of this, and to seeing it in action on files with a large number of instruments.


Sisulu Yenkong

Hey please how do I download the app?
