Music generation using AI has emerged as a significant field, transforming the way music is produced and enjoyed. This project introduces the concept and purpose behind employing artificial intelligence in music creation. We aim to explore the process of generating music with AI algorithms and the potential it holds.
Our project focuses on understanding and implementing AI techniques that facilitate music composition. AI can compose music by learning from large collections of musical pieces, using algorithms to capture the patterns, rhythms, and structures in the data and then producing new material based on what it has learned. By training models on musical data, we enable AI systems to learn and produce new, original compositions. We will also examine recent developments in AI-generated music, particularly highlighting MusicGen by Meta.
By exploring the scope of AI in music generation, the objective of this project is to inspire musicians, researchers, and music enthusiasts to explore the possibilities of this innovative technology. Together, let us embark on this musical expedition and uncover the melodies AI can generate.
By working on this project, we stand to gain new technical skills and an understanding of how AI algorithms can be implemented to build innovative applications.
The purpose of this project is to explore the intriguing domain of music generation using AI. We aim to investigate how artificial intelligence techniques create unique musical pieces. By leveraging machine learning algorithms, our objective is to train an AI model capable of producing melodies and harmonies across various musical genres.
The project’s focus is on gathering a diverse range of musical data, specifically .mp3 files, which will serve as the foundation for training the AI model. These files will undergo preprocessing to convert them into MIDI format using specialized tools like Spotify’s Basic Pitch. This conversion is essential as MIDI files provide a structured representation of musical elements that the AI model can easily interpret.
The subsequent phase involves building an AI model tailored for music generation. The model is trained on the prepared MIDI data, with the aim of capturing the underlying patterns and structures present in the music.
We then conduct a performance evaluation to assess the model’s proficiency. This involves generating music samples and assessing their quality, allowing us to refine the process and enhance the model’s ability to produce creative music.
The final outcome of this project will be the ability to generate original compositions using the trained AI model. These compositions can be further refined through post-processing techniques to enrich their musicality and coherence.
The project endeavours to tackle the issue of limited accessibility to music composition tools. Traditional methods of music creation can be laborious and demand specialized knowledge. Moreover, generating fresh and distinct musical concepts can pose a formidable challenge. The aim of this project is to employ artificial intelligence to circumvent these obstacles and offer a seamless solution for music generation, even for non-musicians. Through the development of an AI model with the capability to compose melodies and harmonies, the project aims to democratize the process of music creation, empowering musicians, hobbyists, and novices to unleash their creative potential and craft unique compositions with ease.
The story of AI in music composition goes back to the 1950s, with the Illiac Suite for String Quartet being the first piece composed with the help of a computer. However, it is only in the last few years that AI has really started to shine in this area. Today, AI can generate music in many genres, from classical to pop, and can even imitate the style of famous musicians.
The current state of AI music generation has advanced considerably in recent times. Recently, Meta released a new AI-powered music generator called MusicGen. Built on a powerful Transformer model, MusicGen predicts and generates sections of music in much the same way a language model predicts the next tokens in a sentence. It uses an audio tokenizer called EnCodec to break audio data down into smaller parts for easier processing.
One of MusicGen’s distinctive features is its ability to handle both text descriptions and melodic cues at the same time, resulting in a smooth blend of artistic expression. It was trained on a large dataset of 20,000 hours of licensed music, which helps it create tunes that connect with listeners. Further, companies like OpenAI have built models such as MuseNet, and Jukin Media has developed Jukin Composer, both capable of generating music in a wide range of styles. Moreover, AI can now produce music that is nearly indistinguishable from music made by humans, making it a powerful tool in the music world.
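To make this concrete, here is a minimal, hedged sketch of text-to-music generation with MusicGen using Meta’s open-source audiocraft library. The model name, prompt, duration, and output handling below are assumptions drawn from the library’s public examples and may differ across versions; treat it as an illustration rather than part of this project’s pipeline.
# A minimal sketch of text-to-music generation with MusicGen via audiocraft.
# Assumes `pip install audiocraft`; model names and API details may vary by version.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
# Load a pretrained MusicGen checkpoint (the small variant is chosen here as an example)
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # generate roughly 8 seconds of audio
# Generate audio from a text prompt
descriptions = ["calm instrumental piano with soft strings"]
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]
# Save each result as an audio file with loudness normalization
for idx, one_wav in enumerate(wav):
    audio_write(f"musicgen_sample_{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")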
Discussing the ethical aspects of AI-generated music is crucial when exploring this field. One pertinent area of concern involves potential copyright and intellectual property infringements. AI models are trained on extensive musical datasets, which could result in generated compositions bearing similarities to existing works. It is vital to respect copyright laws and attribute original artists appropriately to uphold fair practices.
Moreover, the advent of AI-generated music may disrupt the music industry, posing challenges for musicians seeking recognition in a landscape inundated with AI compositions. Striking a balance between utilizing AI as a creative tool and safeguarding the artistic individuality of human musicians is an essential consideration.
For the purpose of this project, we will try to generate some original instrumental music using AI. Personally, I am a big fan of renowned instrumental music channels like Fluidified, MusicLabChill, and FilFar on YouTube, which have excellent tracks for all kinds of moods. Taking inspiration from these channels, we will attempt to generate music along similar lines, which we will finally share on YouTube.
To assemble the necessary data for our project, we focus on sourcing the relevant .mp3 files that align with our desired musical style. Through extensive exploration of online platforms and websites, we discover legal and freely available instrumental music tracks. These tracks serve as invaluable assets for our dataset, encompassing a diverse assortment of melodies and harmonies to enrich the training process of our model.
Once we have successfully acquired the desired .mp3 files, we proceed to transform them into MIDI files. MIDI files represent musical compositions in a digital format, enabling efficient analysis and generation by our models. For this conversion, we rely on the practical and user-friendly functionality provided by Spotify’s Basic Pitch.
With the assistance of Spotify’s Basic Pitch, we upload the acquired .mp3 files, initiating the transformation process. The tool harnesses advanced algorithms to decipher the audio content, extracting crucial musical elements such as notes and structures to generate corresponding MIDI files. These MIDI files serve as the cornerstone of our music generation models, empowering us to manipulate and produce fresh, innovative compositions.
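While we used Basic Pitch’s web interface for this project, the conversion can also be scripted with the open-source basic-pitch Python package. The sketch below is a hedged illustration; the function names and return values are based on the package’s documented interface and may differ between versions, and the file names are placeholders.
# A minimal sketch of converting an .mp3 file to MIDI with Spotify's Basic Pitch package.
# Assumes `pip install basic-pitch`; the exact API may change between releases.
from basic_pitch.inference import predict
# Run pitch detection on the audio file; this returns the raw model output,
# a PrettyMIDI object with the transcription, and a list of detected note events.
model_output, midi_data, note_events = predict("input_track.mp3")
# Save the transcription as a MIDI file for downstream processing
midi_data.write("input_track.mid")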
To develop our music generation model, we utilize a specialized architecture tailored specifically for this purpose. The chosen architecture comprises two LSTM (Long Short-Term Memory) layers, each consisting of 256 units. LSTM, a type of recurrent neural network (RNN), excels in handling sequential data, making it an excellent choice for generating music with its inherent temporal characteristics.
The first LSTM layer processes input sequences with a fixed length of 100, as determined by the sequence_length variable. By returning sequences, this layer effectively preserves the temporal relationships present in the musical data. To prevent overfitting and improve the model’s adaptability to new data, a dropout layer with a dropout rate of 0.3 is incorporated.
The second LSTM layer, which does not return sequences, receives the outputs from the previous layer and further learns intricate patterns within the music. Finally, a dense layer with a softmax activation function generates output probabilities for the subsequent note.
Having established our model architecture, let’s dive straight into building it. We will break down the code into sections and explain each part for the reader’s sake.
We start by importing the necessary libraries that provide useful functionality for our project. In addition to the standard libraries required for general operations, we will be using TensorFlow for deep learning and music21 for music manipulation.
import numpy as np
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.utils import to_categorical
from music21 import converter, instrument, stream, note, chord
from google.colab import files
Next, we define the directory where our MIDI files are located. The code then goes through each file in the directory, extracts the notes and chords, and stores them for further processing. The ‘converter’ module from the music21 library is used to parse the MIDI files and retrieve the musical elements. As an experiment, we will first use just one MIDI file to train the model and then compare the result by using five MIDI files for training.
# Directory containing the MIDI files
midi_dir = "/content/Midi Files"
notes = []

# Process each MIDI file in the directory
for filename in os.listdir(midi_dir):
    if filename.endswith((".mid", ".midi")):  # accept both common MIDI extensions
        file = converter.parse(os.path.join(midi_dir, filename))
        # Find all the notes and chords in the MIDI file
        try:
            # If the MIDI file has instrument parts
            s2 = file.parts.stream()
            notes_to_parse = s2[0].recurse()
        except Exception:
            # If the MIDI file only has notes (no chords or instrument parts)
            notes_to_parse = file.flat.notes
        # Extract pitch and duration information from notes and chords
        for element in notes_to_parse:
            if isinstance(element, note.Note):
                notes.append(str(element.pitch))
            elif isinstance(element, chord.Chord):
                notes.append('.'.join(str(n) for n in element.normalOrder))

# Print the number of notes and some example notes
print("Total notes:", len(notes))
print("Example notes:", notes[:10])
To convert the notes into numerical sequences that our model can process, we create a dictionary that maps each unique note or chord to a corresponding integer. This step allows us to represent the musical elements in a numerical format.
# Create a dictionary to map unique notes to integers
unique_notes = sorted(set(notes))
note_to_int = {n: i for i, n in enumerate(unique_notes)}
In order to train our model, we need to create input and output sequences. This is done by sliding a fixed-length window over the list of notes. The input sequence consists of the preceding notes and the output sequence is the next note. These sequences are stored in separate lists.
# Convert the notes to numerical sequences
sequence_length = 100  # Length of each input sequence
input_sequences = []
output_sequences = []

# Generate input/output sequences with a sliding window
for i in range(0, len(notes) - sequence_length, 1):
    # Extract the input sequence
    input_sequence = notes[i:i + sequence_length]
    input_sequences.append([note_to_int[n] for n in input_sequence])
    # Extract the output note that follows the input sequence
    output_sequence = notes[i + sequence_length]
    output_sequences.append(note_to_int[output_sequence])
Before feeding the input sequences to our model, we reshape them to match the expected input shape of the LSTM layer. Additionally, we normalize the sequences by dividing them by the total number of unique notes. This step ensures that the input values fall within a suitable range for the model to learn effectively.
# Reshape and normalize the input sequences
num_sequences = len(input_sequences)
num_unique_notes = len(unique_notes)
# Reshape the input sequences
X = np.reshape(input_sequences, (num_sequences, sequence_length, 1))
# Normalize the input sequences
X = X / float(num_unique_notes)
The output sequences, representing the next note to predict, are converted into a one-hot encoded format. This encoding allows the model to output a probability distribution over the available notes.
# One-hot encode the output sequences
y = to_categorical(output_sequences)
We define our RNN (Recurrent Neural Network) model using the Sequential class from the tensorflow.keras.models module. The model consists of two LSTM (Long Short-Term Memory) layers with a dropout layer in between to prevent overfitting. The final layer is a Dense layer with a softmax activation function that outputs the probability of each note.
# Define the RNN model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]),
return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(256))
model.add(Dense(y.shape[1], activation='softmax'))
We compile the model by specifying the loss function and optimizer. We then proceed to train the model on the input sequences (X) and output sequences (y) for a specific number of epochs and with a given batch size.
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Train the model
model.fit(X, y, batch_size=64, epochs=100)
Once we train the model, we can generate new music sequences. We define a function named generate_music that takes three inputs: the trained model, seed_sequence, and length. It uses the model to predict the next note in the sequence based on the previous notes and repeats this process to generate the desired length of music.
To start, we create a copy of the seed_sequence to prevent any modifications to the original sequence. This seed_sequence serves as the initial point for generating the music.
We then enter a loop that runs length times. Within each iteration, we perform the following steps:
After normalizing the input_sequence, we use the model to predict the probabilities of the next note. The model.predict method takes the input_sequence as input and returns the predicted probabilities.
To select the next note, the np.random.choice function is used, which randomly picks an index based on the probabilities obtained. This randomness introduces diversity and unpredictability into the generated music.
The selected index represents the new note, which is appended to the generated_sequence. The generated_sequence is then updated by removing the first element to maintain the desired length. Once the loop completes, the generated_sequence is returned, representing the newly generated music.
The seed_sequence and the desired generated_length need to be set to generate the music. The seed_sequence should be a valid input sequence that the model has been trained on, and the generated_length determines the number of notes the generated music should contain.
# Generate new music
def generate_music(model, seed_sequence, length):
    generated_sequence = seed_sequence.copy()
    for _ in range(length):
        # Prepare the current window as model input
        input_sequence = np.array(generated_sequence)
        input_sequence = np.reshape(input_sequence, (1, len(input_sequence), 1))
        input_sequence = input_sequence / float(num_unique_notes)  # Normalize input sequence
        # Predict the probability distribution of the next note
        predictions = model.predict(input_sequence)[0]
        # Sample the next note from the predicted distribution
        new_note = np.random.choice(range(len(predictions)), p=predictions)
        generated_sequence.append(new_note)
        # Drop the oldest note to keep the window at a fixed length
        generated_sequence = generated_sequence[1:]
    return generated_sequence

# Set the seed sequence and length of the generated music
seed_sequence = input_sequences[0]  # Replace with your own seed sequence
generated_length = 100  # Replace with the desired length of the generated music

generated_music = generate_music(model, seed_sequence, generated_length)
generated_music
# Output of the above code
[1928, 1916, 1959, 1964, 1948, 1928, 1190, 873, 1965, 1946,
 1928, 1970, 1947, 1946, 1964, 1948, 1022, 1945, 1916, 1653,
 873, 873, 1960, 1946, 1959, 1942, 1348, 1960, 1961, 1971,
 1966, 1927, 705, 1054, 150, 1935, 864, 1932, 1936, 1763,
 1978, 1949, 1946, 351, 1926, 357, 363, 864, 1965, 357,
 1928, 1949, 351, 1928, 1949, 1662, 1352, 1034, 1021, 977,
 150, 325, 1916, 1960, 363, 943, 1949, 553, 1917, 1962,
 1917, 1916, 1947, 1021, 1021, 1051, 1648, 873, 977, 1959,
 1927, 1959, 1947, 434, 1949, 553, 360, 1916, 1190, 1022,
 1348, 1051, 325, 1965, 1051, 1917, 1917, 407, 1948, 1051]
The generated output, as seen, is a sequence of integers representing the notes or chords in our generated music. In order to listen to it, we will have to convert this back into music by reversing the mapping we created earlier to recover the original notes/chords. To do this, we first create a dictionary called int_to_note, where the integers are the keys and the corresponding notes are the values.
Next, we create a stream called output_stream to store the generated notes and chords. This stream acts as a container to hold the musical elements that will constitute the generated music.
We then iterate through each element in the generated_music sequence. Each element is a number representing a note or a chord. We use the int_to_note dictionary to convert the number back to its original note or chord string representation.
If the pattern is a chord, identified by the presence of a dot or by the pattern being a digit, we split the pattern string into individual notes. For each note, we create a note.Note object, assign it a piano instrument, and add it to a temporary list of chord notes. Finally, we create a chord.Chord object from that list, representing the chord, and append it to the output_stream.
If the pattern is a single note, we create a note.Note object for that note, assign it a piano instrument, and add it directly to the output_stream.
Once all the patterns in the generated_music sequence have been processed, we write the output_stream to a MIDI file named ‘generated_music.mid’. Finally, we download the generated music file from Colab using the files.download function.
# Reverse the mapping from notes to integers
int_to_note = {i: n for n, i in note_to_int.items()}

# Create a stream to hold the generated notes/chords
output_stream = stream.Stream()

# Convert the output from the model into notes/chords
for pattern in generated_music:
    # pattern is a number, so we convert it back to a note/chord string
    pattern = int_to_note[pattern]
    # If the pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        output_stream.append(new_chord)
    # If the pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.storedInstrument = instrument.Piano()
        output_stream.append(new_note)

# Write the stream to a MIDI file
output_stream.write('midi', fp='generated_music.mid')

# Download the generated music file from Colab
files.download('generated_music.mid')
Now, it’s time to listen to the outcome of our AI-generated music. You can find the link to listen to the music below.
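If you would like to render the generated MIDI file to audio yourself (for example, before uploading it to YouTube), one possible approach is the midi2audio package, which wraps FluidSynth. The sketch below is only an illustration under the assumption that FluidSynth and a General MIDI SoundFont are installed on the system; the SoundFont path shown is a typical Linux location and may need to be adjusted.
# A minimal sketch for rendering the generated MIDI file to audio.
# Assumes FluidSynth and a SoundFont are installed (e.g. `apt-get install fluidsynth fluid-soundfont-gm`)
# and `pip install midi2audio`; the SoundFont path below is an assumption.
from midi2audio import FluidSynth

fs = FluidSynth("/usr/share/sounds/sf2/FluidR3_GM.sf2")
fs.midi_to_audio("generated_music.mid", "generated_music.wav")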
To be honest, the initial result may sound like someone with limited experience playing musical instruments. This is primarily because we trained our model using only a single MIDI file. However, we can enhance the quality of the music by repeating the process and training our model on a larger dataset. In this case, we will train our model using five MIDI files, all of which will be instrumental music of a similar style.
The difference in the quality of the music generated from the expanded dataset is quite remarkable. It clearly demonstrates that training the model on a more diverse range of MIDI files leads to significant improvements in the generated music. This emphasizes the importance of increasing the size and variety of the training dataset to achieve better musical results.
Though we managed to generate music using a sophisticated model, there are certain limitations to scaling such a system.
In this project, we embarked on the fascinating journey of generating music using AI. Our goal was to explore the capabilities of AI in music composition and unleash its potential in creating unique musical pieces. Through the implementation of AI models and deep learning techniques, we successfully generated music that closely resembled the style of the input MIDI files. The project showcased the ability of AI to assist and inspire in the creative process of music composition.
Here are some of the key takeaways from this project:
Q1. How does AI generate music?
A. AI creates music by understanding patterns and structures in a vast collection of music data. It learns how notes, chords, and rhythms are related and applies this understanding to generate new melodies, harmonies, and rhythms.
Q2. Can AI compose music in different styles?
A. Yes, AI can compose music in a wide range of styles. By training AI models on different styles of music, it can learn the distinct characteristics and elements of each style. This enables it to generate music that captures the essence of various styles like classical, jazz, rock, or electronic.
Q3. Who owns the rights to AI-generated music?
A. AI-generated music can involve copyright complexities. Although AI algorithms create the music, the input data often includes copyrighted material. The legal protection and ownership of AI-generated music depend on the jurisdiction and specific situations. Proper attribution and knowledge of copyright laws are crucial when using or sharing AI-generated music.
Q4. Can AI-generated music be used in commercial projects?
A. Yes, AI-created music can be used in business projects, but it’s important to consider copyright aspects. Certain AI models are trained on copyrighted music, which might necessitate acquiring appropriate licenses or permissions for commercial usage. Consulting legal experts or copyright specialists is advisable to ensure adherence to copyright laws.
Q5. Can AI-generated music replace human musicians?
A. AI-created music cannot completely replace human musicians. Although AI can compose music with impressive outcomes, it lacks the emotional depth, creativity, and interpretive skills of human musicians. AI serves as a valuable tool for inspiration and collaboration, but the unique artistry and expression of human musicians cannot be replicated.