The music industry is in the middle of a machine learning revolution. From Google’s NSynth to projects reimagining every corner of digital music, machine learning has changed how music is created. Now comes another breakthrough: an AI that can translate one piece of music into a completely different one, and you won’t be able to tell which version was generated by a machine.
Researchers at Facebook AI Research (FAIR) have developed an algorithm that can translate one style, instrument or genre of music into a completely different one. In a scintillating video demonstration (at the end of this article), you’ll see how a Bach composition is turned into a Beethoven-style orchestral piece, without a glitch! The system even accepts whistling as input and converts it into music!
The developers built their model by leveraging two techniques that have only recently become available, both described below.
At the core of this system is, as you might have guessed, a deep neural network. The researchers employed a single, universal encoder and applied it to all the input sounds. This had two distinct advantages: fewer networks to train, and the ability to translate from musical domains that were never heard during the training phase. The image below shows the framework of the model. That dashed line you see? It is used only during training.
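To make the shared-encoder idea concrete, here is a minimal sketch in PyTorch (the framework the researchers mention later). This is not the authors’ code: the paper’s WaveNet-style encoder and decoders are replaced with tiny convolutional stand-ins, and every layer size and the domain count below are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of the shared-encoder idea:
# one universal encoder is trained on every input domain, and each
# musical domain gets its own decoder. All sizes are illustrative.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, channels=64, latent_dim=64):
        super().__init__()
        # A small 1-D convolutional stack standing in for the paper's
        # WaveNet-style encoder.
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, stride=4, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, stride=4, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, latent_dim, kernel_size=1),
        )

    def forward(self, audio):            # audio: (batch, 1, samples)
        return self.net(audio)           # latent: (batch, latent_dim, frames)

class DomainDecoder(nn.Module):
    def __init__(self, latent_dim=64, channels=64):
        super().__init__()
        # One decoder per output domain (e.g. piano, harpsichord, orchestra).
        self.net = nn.Sequential(
            nn.ConvTranspose1d(latent_dim, channels, kernel_size=8, stride=4, padding=2),
            nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, latent):
        return self.net(latent)

encoder = SharedEncoder()
decoders = nn.ModuleList([DomainDecoder() for _ in range(3)])  # 3 hypothetical domains

waveform = torch.randn(2, 1, 16000)      # a dummy batch of 1-second clips
latent = encoder(waveform)               # one shared representation...
translated = decoders[1](latent)         # ...rendered by any domain's decoder
```

Because every domain shares the same encoder, translating to a new target is just a matter of running the latent representation through a different decoder.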
FAIR has become the first AI research arm to create an unsupervised learning process for translating one form of music into another using a neural network. But to allow the system to learn this translation in an unsupervised manner, the researchers intentionally distorted the input music. This forced the system to ignore the surface features that make a recording unique, such as the style, genre and instrument used, and concentrate on the core structure of the song.
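One simple way to realise this kind of distortion is to randomly pitch-shift each training clip before it reaches the encoder, so the network can never just memorise the source audio. The sketch below is a hedged illustration of that idea, not the paper’s exact augmentation: the shift range and clip length are assumptions, and the pitch shift itself is approximated by crude resampling.

```python
# A hedged sketch of the "distort the input" idea: randomly pitch-shift
# each clip before encoding, so the network must rely on the underlying
# musical structure rather than the exact source signal.
import random
import torch
import torch.nn.functional as F

def distort(waveform: torch.Tensor, max_semitones: float = 0.5) -> torch.Tensor:
    """Crudely pitch-shift a (batch, 1, samples) waveform by resampling it.

    Playing a stretched/compressed signal back at the original rate raises
    or lowers its pitch; this stands in for a proper pitch-shifting routine.
    """
    semitones = random.uniform(-max_semitones, max_semitones)
    rate = 2.0 ** (semitones / 12.0)                 # resampling ratio
    new_len = int(waveform.shape[-1] / rate)
    shifted = F.interpolate(waveform, size=new_len, mode="linear", align_corners=False)
    # Pad or trim back to the original length so batches keep a fixed shape.
    orig_len = waveform.shape[-1]
    if new_len < orig_len:
        shifted = F.pad(shifted, (0, orig_len - new_len))
    else:
        shifted = shifted[..., :orig_len]
    return shifted

clip = torch.randn(4, 1, 16000)
augmented = distort(clip)     # feed this, not the clean clip, to the encoder
```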
According to the researchers, “The system was implemented in the PyTorch framework, and trained on eight Tesla V100 GPUs for a total of 6 days. We used the ADAM optimization algorithm with a learning rate of 10⁻³ and a decay factor of 0.98 every 10,000 samples. We weighted the confusion loss with λ = 10⁻²”.
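Those hyper-parameters translate almost directly into a few lines of PyTorch. The sketch below only wires up the quoted settings (Adam, a 10⁻³ learning rate, a 0.98 decay every 10,000 samples, and a 10⁻² weight on the confusion loss); the model, data and loss terms are placeholders, not the actual translation network.

```python
# Wiring up the quoted training settings; everything except the optimiser
# and scheduler values is a placeholder for illustration.
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(64, 64)             # placeholder for the translation network
optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = StepLR(optimizer, step_size=10_000, gamma=0.98)  # decay 0.98 per 10k steps
confusion_weight = 1e-2                      # λ for the domain-confusion loss

dummy_batches = [(torch.randn(8, 64), torch.randn(8, 64))] * 3
for latent, target in dummy_batches:
    reconstruction_loss = torch.nn.functional.mse_loss(model(latent), target)
    confusion_loss = torch.zeros(())         # stands in for the adversarial domain term
    loss = reconstruction_loss + confusion_weight * confusion_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                         # one scheduler step per training step
```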
The final output and evaluation of the system have been promising so far. In fact, the system is so good at translating music that many human listeners were unable to tell the original input apart from the AI-generated output.
As is the norm with FAIR’s research, the team has published an official paper on this study, and you can read it in full here. Also be sure to check out the video below, which shows this AI in action:
This is yet another example of how far AI has penetrated the music industry. We have seen previous efforts in this field, but none have been able to turn an input into something this spectacular. This system has the potential to turn even amateurs into musicians. And if you are producing a video on a small budget, you could use this research to generate music yourself without paying a high cost for it.
If you have an interest in deep learning, GANs and music, go through the paper I have linked above. This is an amazing time to be a data scientist, as more and more breakthroughs like this keep happening.