In this article, we analyze the zero-crossing rates (ZCRs) of tracks from different music genres. This post is inspired by Valerio Velardo’s work. I highly encourage you to check out his YouTube channel for his outstanding work in the field of ML/DL for audio.
This article was published as a part of the Data Science Blogathon.
A zero-crossing is an instantaneous point at which the sign of a mathematical function changes (e.g. from positive to negative). It is represented by an intercept of the axis (zero value) in the graph of the function.
A zero-crossing in a line graph of a waveform representing voltage over time
The zero-crossing rate (ZCR) is the rate at which a signal transitions from positive to zero to negative or negative to zero to positive. Its value has been extensively used in both speech recognition and music information retrieval for classifying percussive sounds.
ZCR is defined as:
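A commonly used form of the definition is given here as an assumption, since the formula itself does not appear in the text. For a signal $s$ of length $T$:

$$
\mathrm{ZCR} = \frac{1}{T-1} \sum_{t=1}^{T-1} \mathbb{1}_{\mathbb{R}_{<0}}\left(s_t \, s_{t-1}\right)
$$

Here $\mathbb{1}_{\mathbb{R}_{<0}}(x)$ is an indicator function that equals 1 when $x$ is negative (the two neighbouring samples have opposite signs) and 0 otherwise.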
The zero-crossing rate can also serve as a basic pitch detection algorithm for monophonic tonal signals. Voice activity detection (VAD), which determines whether or not human speech is present in an audio segment, also makes use of zero-crossing rates.
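To make the pitch-detection idea concrete, here is a minimal NumPy sketch (it is not librosa's implementation). It assumes a clean, monophonic sine tone, which crosses zero twice per cycle, so the frequency is roughly ZCR × sample rate / 2. The function names and the 440 Hz test tone are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(signal)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings / (len(signal) - 1)

def estimate_pitch_hz(signal, sample_rate):
    """Rough pitch estimate for a pure tone: two zero crossings per cycle."""
    return zero_crossing_rate(signal) * sample_rate / 2.0

# Synthetic one-second 440 Hz tone (illustrative test signal)
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t + 0.1)

estimated = estimate_pitch_hz(tone, sample_rate)
print(f"Estimated pitch: {estimated:.1f} Hz")
```

On real recordings with noise, harmonics, or several instruments, this estimate breaks down quickly, which is why it is only considered a basic method.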
Now, let’s take a closer look at it using the librosa library. To begin, we will import all of the required libraries and load the audio files from different music genres with the help of librosa.
#Importing all the necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import IPython.display as ipd
%matplotlib inline
#Specifying the path to audio files
classical_music_file = "/content/drive/MyDrive/trytheseaudios/classical.00000.wav"
blues_music_file = "/content/drive/MyDrive/trytheseaudios/blues.00000.wav"
reggae_music_file = "/content/drive/MyDrive/trytheseaudios/reggae.00000.wav"
rock_music_file = "/content/drive/MyDrive/trytheseaudios/rock.00000.wav"
jazz_music_file = "/content/drive/MyDrive/trytheseaudios/jazz.00000.wav"
country_music_file = "/content/drive/MyDrive/trytheseaudios/country.00000.wav"
disco_music_file = "/content/drive/MyDrive/trytheseaudios/disco.00000.wav"
hiphop_music_file = "/content/drive/MyDrive/trytheseaudios/hiphop.00000.wav"
metal_music_file = "/content/drive/MyDrive/trytheseaudios/metal.00000.wav"
pop_music_file = "/content/drive/MyDrive/trytheseaudios/pop.00000.wav"
# load audio files with librosa
classical, sr = librosa.load(classical_music_file, duration=30)
blues, _ = librosa.load(blues_music_file, duration=30)
reggae, _ = librosa.load(reggae_music_file, duration=30)
rock, _ = librosa.load(rock_music_file, duration=30)
jazz, _ = librosa.load(jazz_music_file, duration=30)
country, _ = librosa.load(country_music_file, duration=30)
disco, _ = librosa.load(disco_music_file, duration=30)
hiphop, _ = librosa.load(hiphop_music_file, duration=30)
metal, _ = librosa.load(metal_music_file, duration=30)
pop, _ = librosa.load(pop_music_file, duration=30)
Following that, we will evaluate and compare the lowest and highest instantaneous ZCR values, as well as the lowest and highest average ZCR values of various music genre samples.
#Finding the lowest instantaneous ZCR value across all music genre tracks
min([
    librosa.feature.zero_crossing_rate(classical).min(),
    librosa.feature.zero_crossing_rate(blues).min(),
    librosa.feature.zero_crossing_rate(reggae).min(),
    librosa.feature.zero_crossing_rate(rock).min(),
    librosa.feature.zero_crossing_rate(jazz).min(),
    librosa.feature.zero_crossing_rate(country).min(),
    librosa.feature.zero_crossing_rate(disco).min(),
    librosa.feature.zero_crossing_rate(hiphop).min(),
    librosa.feature.zero_crossing_rate(metal).min(),
    librosa.feature.zero_crossing_rate(pop).min(),
])
Output: 0.00585 ---> which is for the Jazz music genre track!
#Finding the highest instantaneous ZCR value across all music genre tracks
max([
    librosa.feature.zero_crossing_rate(classical).max(),
    librosa.feature.zero_crossing_rate(blues).max(),
    librosa.feature.zero_crossing_rate(reggae).max(),
    librosa.feature.zero_crossing_rate(rock).max(),
    librosa.feature.zero_crossing_rate(jazz).max(),
    librosa.feature.zero_crossing_rate(country).max(),
    librosa.feature.zero_crossing_rate(disco).max(),
    librosa.feature.zero_crossing_rate(hiphop).max(),
    librosa.feature.zero_crossing_rate(metal).max(),
    librosa.feature.zero_crossing_rate(pop).max(),
])
Output: 0.67675 ----> pop music genre track!
#Finding the LOWEST AVERAGE ZCR value across all music genre tracks
min([
    librosa.feature.zero_crossing_rate(classical).mean(),
    librosa.feature.zero_crossing_rate(blues).mean(),
    librosa.feature.zero_crossing_rate(reggae).mean(),
    librosa.feature.zero_crossing_rate(rock).mean(),
    librosa.feature.zero_crossing_rate(jazz).mean(),
    librosa.feature.zero_crossing_rate(country).mean(),
    librosa.feature.zero_crossing_rate(disco).mean(),
    librosa.feature.zero_crossing_rate(hiphop).mean(),
    librosa.feature.zero_crossing_rate(metal).mean(),
    librosa.feature.zero_crossing_rate(pop).mean(),
])
Output: 0.07846 ---> Jazz music genre track!
#Finding the HIGHEST AVERAGE ZCR value across all music genre tracks
max([
    librosa.feature.zero_crossing_rate(classical).mean(),
    librosa.feature.zero_crossing_rate(blues).mean(),
    librosa.feature.zero_crossing_rate(reggae).mean(),
    librosa.feature.zero_crossing_rate(rock).mean(),
    librosa.feature.zero_crossing_rate(jazz).mean(),
    librosa.feature.zero_crossing_rate(country).mean(),
    librosa.feature.zero_crossing_rate(disco).mean(),
    librosa.feature.zero_crossing_rate(hiphop).mean(),
    librosa.feature.zero_crossing_rate(metal).mean(),
    librosa.feature.zero_crossing_rate(pop).mean(),
])
Output: 0.18307 ---> Metal music genre track!
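The min() and max() calls above return only a value, so the genre has to be matched by hand. One possible way to get the genre name and the value together is to keep the per-frame ZCR arrays in a dictionary. The sketch below uses made-up arrays and hypothetical genre names in place of the real librosa.feature.zero_crossing_rate(...) outputs, and the helper extreme_genre is an illustrative name, not part of librosa.

```python
import numpy as np

# Made-up per-frame ZCR values standing in for
# librosa.feature.zero_crossing_rate(track)[0]; genre names are placeholders.
zcr_by_genre = {
    "genre_a": np.array([0.05, 0.08, 0.11]),
    "genre_b": np.array([0.02, 0.30, 0.10]),
    "genre_c": np.array([0.15, 0.20, 0.19]),
}

def extreme_genre(zcr_by_genre, statistic, pick):
    """Apply `statistic` to each track, then return (genre, value) chosen by `pick` (min or max)."""
    stats = {genre: float(statistic(values)) for genre, values in zcr_by_genre.items()}
    genre = pick(stats, key=stats.get)
    return genre, stats[genre]

lowest_instantaneous = extreme_genre(zcr_by_genre, np.min, min)
highest_instantaneous = extreme_genre(zcr_by_genre, np.max, max)
lowest_average = extreme_genre(zcr_by_genre, np.mean, min)
highest_average = extreme_genre(zcr_by_genre, np.mean, max)

print(lowest_instantaneous, highest_instantaneous)
print(lowest_average, highest_average)
```

With the real tracks, the dictionary values would be the arrays returned by librosa for each genre, and the same four calls would report the genre alongside each number.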
Further investigation revealed that the classical genre audio sample track has a low ZCR.
print(f"Minimum Instantaneous ZCR for Classical Genre song:{librosa.feature.zero_crossing_rate(classical).min()}, Maximum Instantaneous ZCR for Classical Genre song:{librosa.feature.zero_crossing_rate(classical).max()}, Average ZCR for Classical Genre song: {librosa.feature.zero_crossing_rate(classical).mean()}")
Output: Minimum Instantaneous ZCR for Classical Genre song:0.02685, Maximum Instantaneous ZCR for Classical Genre song:0.1767, Average ZCR for Classical Genre song: 0.0982
#Determining Minimum instantaneous, Maximum instantaneous and average ZCR for pop music genre track
print(f"Minimum Instantaneous ZCR for Pop Genre song:{librosa.feature.zero_crossing_rate(pop).min()}, Maximum Instantaneous ZCR for Pop Genre song:{librosa.feature.zero_crossing_rate(pop).max()}, Average ZCR for Pop Genre song: {librosa.feature.zero_crossing_rate(pop).mean()}")
Output: Minimum Instantaneous ZCR for Pop Genre song:0.00683, Maximum Instantaneous ZCR for Pop Genre song:0.6767, Average ZCR for Pop Genre song: 0.12676
Observation: According to the results, the jazz track has the lowest instantaneous and average ZCR. Further analysis revealed that the classical track also has very low ZCR values. As a result, we cannot generalize and declare that jazz songs have the lowest ZCR, because the values vary with song composition. At the other end, the metal track has the highest average ZCR (0.18307), and the pop track has the highest instantaneous ZCR (0.67675).
Let us now dig a little deeper with the help of visuals. We will first use librosa to extract the frame-wise zero-crossing rate for each music genre track, then plot the normalized ZCR for each genre, followed by the actual (non-normalized) ZCR for each genre.
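Before extracting the values with librosa, it may help to see what a frame size and a hop length mean in practice. The following is a simplified NumPy sketch, not librosa's implementation: it skips the centering and padding that librosa applies by default, so the frame count and edge values will differ slightly. The function name framewise_zcr and the two-tone test signal are assumptions made for illustration.

```python
import numpy as np

def framewise_zcr(signal, frame_length=1024, hop_length=512):
    """ZCR per frame: sign changes inside each frame divided by the frame length."""
    signs = np.signbit(signal)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    rates = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_length
        frame = signs[start:start + frame_length]
        crossings = np.count_nonzero(frame[1:] != frame[:-1])
        rates[i] = crossings / frame_length
    return rates

# Illustrative signal: one second at 220 Hz followed by one second at 2200 Hz
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 220.0 * t + 0.1)
high_tone = np.sin(2 * np.pi * 2200.0 * t + 0.1)
signal = np.concatenate([low_tone, high_tone])

rates = framewise_zcr(signal, frame_length=1024, hop_length=512)
# Start time of each frame in seconds
frame_times = np.arange(len(rates)) * 512 / sample_rate

print(len(rates), rates[0], rates[-1])
```

Consecutive frames overlap by half here because the hop length is half the frame size. The higher-pitched second half of the signal yields a clearly higher ZCR than the first half.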
#Specifying frame size and hop length
FRAME_SIZE = 1024
HOP_LENGTH = 512
#Extracting zero crossing rate for each music genre song using Librosa
zcr_classical = librosa.feature.zero_crossing_rate(classical, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_blues = librosa.feature.zero_crossing_rate(blues, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_reggae = librosa.feature.zero_crossing_rate(reggae, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_rock = librosa.feature.zero_crossing_rate(rock, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_jazz = librosa.feature.zero_crossing_rate(jazz, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_country = librosa.feature.zero_crossing_rate(country, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_disco = librosa.feature.zero_crossing_rate(disco, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_hiphop = librosa.feature.zero_crossing_rate(hiphop, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_metal = librosa.feature.zero_crossing_rate(metal, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_pop = librosa.feature.zero_crossing_rate(pop, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
frames = range(len(zcr_classical))
t = librosa.frames_to_time(frames, hop_length=HOP_LENGTH)
#Visualizing normalized Zero-crossing rate (ZCR) of different music genre songs
#Note: librosa.display.waveplot was removed in librosa 0.10; on newer versions use librosa.display.waveshow instead
plt.figure(figsize=(20, 20))
ax = plt.subplot(5, 2, 1)
librosa.display.waveplot(classical, alpha=0.5)
plt.plot(t, zcr_classical, color="b")
plt.ylim((-1, 1))
plt.title("Classical Music Genre song")
plt.subplot(5, 2, 2)
librosa.display.waveplot(blues, alpha=0.5)
plt.plot(t, zcr_blues, color="g")
plt.ylim((-1, 1))
plt.title("Blues Music Genre song")
plt.subplot(5, 2, 3)
librosa.display.waveplot(reggae, alpha=0.5)
plt.plot(t, zcr_reggae, color="k")
plt.ylim((-1, 1))
plt.title("Reggae Music Genre song")
plt.subplot(5, 2, 4)
librosa.display.waveplot(rock, alpha=0.5)
plt.plot(t, zcr_rock, color="#E9967A")
plt.ylim((-1, 1))
plt.title("Rock Music Genre song")
plt.subplot(5, 2, 5)
librosa.display.waveplot(jazz, alpha=0.5)
plt.plot(t, zcr_jazz, color="m")
plt.ylim((-1, 1))
plt.title("Jazz Music Genre song")
plt.subplot(5, 2, 6)
librosa.display.waveplot(country, alpha=0.5)
plt.plot(t, zcr_country, color="y")
plt.ylim((-1, 1))
plt.title("Country Music Genre song")
plt.subplot(5, 2, 7)
librosa.display.waveplot(disco, alpha=0.5)
plt.plot(t, zcr_disco, color="r")
plt.ylim((-1, 1))
plt.title("Disco Music Genre song")
plt.subplot(5, 2, 8)
librosa.display.waveplot(hiphop, alpha=0.5)
plt.plot(t, zcr_hiphop, color="#7FFF00")
plt.ylim((-1, 1))
plt.title("Hiphop Music Genre song")
plt.subplot(5, 2, 9)
librosa.display.waveplot(metal, alpha=0.5)
plt.plot(t, zcr_metal, color="#FFB90F")
plt.ylim((-1, 1))
plt.title("Metal Music Genre song")
plt.subplot(5, 2, 10)
librosa.display.waveplot(pop, alpha=0.5)
plt.plot(t, zcr_pop, color="#458B00")
plt.ylim((-1, 1))
plt.title("Pop Music Genre song")
plt.subplots_adjust(hspace = 0.75)
Waveplots illustrating the zero-crossing rates of various music genre tracks
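#Visualizing NORMALIZED Zero-crossing rates of different music genre tracks
#Labels and a legend are added so that each line can be matched to its genre
plt.figure(figsize=(25, 25))
plt.plot(t, zcr_classical, color="b", label="Classical")
plt.plot(t, zcr_blues, color="g", label="Blues")
plt.plot(t, zcr_reggae, color="k", label="Reggae")
plt.plot(t, zcr_rock, color="#E9967A", label="Rock")
plt.plot(t, zcr_jazz, color="m", label="Jazz")
plt.plot(t, zcr_country, color="y", label="Country")
plt.plot(t, zcr_disco, color="r", label="Disco")
plt.plot(t, zcr_hiphop, color="#7FFF00", label="Hiphop")
plt.plot(t, zcr_metal, color="#FFB90F", label="Metal")
plt.plot(t, zcr_pop, color="#458B00", label="Pop")
plt.ylim(0, 1)
plt.xlabel("Time (s)")
plt.ylabel("Zero-crossing rate (fraction of frame)")
plt.legend()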
Graph depicting the NORMALIZED Zero-crossing rates of different music genre tracks
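#Visualizing ACTUAL (NON-NORMALIZED) Zero-crossing rate of different music genre tracks
#Multiplying the normalized rate by FRAME_SIZE gives the number of zero crossings per frame
plt.figure(figsize=(25, 25))
plt.plot(t, zcr_classical*FRAME_SIZE, color="b", label="Classical")
plt.plot(t, zcr_blues*FRAME_SIZE, color="g", label="Blues")
plt.plot(t, zcr_reggae*FRAME_SIZE, color="k", label="Reggae")
plt.plot(t, zcr_rock*FRAME_SIZE, color="#E9967A", label="Rock")
plt.plot(t, zcr_jazz*FRAME_SIZE, color="m", label="Jazz")
plt.plot(t, zcr_country*FRAME_SIZE, color="y", label="Country")
plt.plot(t, zcr_disco*FRAME_SIZE, color="r", label="Disco")
plt.plot(t, zcr_hiphop*FRAME_SIZE, color="#7FFF00", label="Hiphop")
plt.plot(t, zcr_metal*FRAME_SIZE, color="#FFB90F", label="Metal")
plt.plot(t, zcr_pop*FRAME_SIZE, color="#458B00", label="Pop")
plt.ylim(0, 600)
plt.xlabel("Time (s)")
plt.ylabel("Zero crossings per frame")
plt.legend()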
Graph depicting the ACTUAL (NON-NORMALIZED) zero-crossing rates of different music genre tracks
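The relationship between the normalized and the actual values is simple arithmetic, sketched below. The two rates are the lowest and highest average ZCR values reported earlier in this article; the sample rate of 22050 Hz is librosa.load's default, which applies here because no sr argument was passed. The helper names are illustrative assumptions.

```python
FRAME_SIZE = 1024
SAMPLE_RATE = 22050  # librosa.load default when sr is not specified

def crossings_per_frame(normalized_zcr, frame_size=FRAME_SIZE):
    """Convert a normalized ZCR (fraction of a frame) to a count per frame."""
    return normalized_zcr * frame_size

def crossings_per_second(normalized_zcr, sample_rate=SAMPLE_RATE):
    """Convert a normalized ZCR to an approximate number of crossings per second."""
    return normalized_zcr * sample_rate

# Average ZCR values reported above for the jazz and metal tracks
for name, rate in [("jazz", 0.07846), ("metal", 0.18307)]:
    per_frame = crossings_per_frame(rate)
    per_second = crossings_per_second(rate)
    print(f"{name}: {per_frame:.1f} crossings per frame, {per_second:.0f} per second")
```

Because the normalized rate is a fraction, the same value can be rescaled to any frame size, which makes it easier to compare across different analysis settings.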
Conclusion
Upon mathematical and visual inspection, we can say that the jazz and classical tracks have low ZCR values, while the pop and metal tracks have high ZCR values. However, with only one sample track per genre, we cannot extrapolate these findings to entire genres. Even so, the analysis offers a concise summary, a form of intuition, about how ZCR differs across music genres.
Thanks for reading. If you have any questions or concerns, please leave them in the comments section below. Happy Learning!
Analytics Vidhya does not own the media shown in this article; the Author uses it at their discretion.