This article was published as a part of the Data Science Blogathon.
In this article, we are going to analyze the zero-crossing rates (ZCRs) of tracks from different music genres. This post is inspired by Valerio Velardo’s work. I highly encourage you to check out his YouTube channel for his outstanding work in the field of ML/DL for audio.
Zero-Crossing: A zero-crossing is an instantaneous point at which the sign of a mathematical function changes (e.g. from positive to negative). It is represented by an intercept of the axis (zero value) in the graph of the function.
Zero-Crossing Rate: The zero-crossing rate (ZCR) is the rate at which a signal transitions from positive to zero to negative or negative to zero to positive. Its value has been extensively used in both speech recognition and music information retrieval for classifying percussive sounds.
ZCR is defined as:
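A common formulation, following the Wikipedia article on the zero-crossing rate listed in the references, is given below. The symbols are assumptions taken from that article: $s$ is a signal of length $T$, and $s_t$ is its sample at index $t$.

```latex
\mathrm{zcr} = \frac{1}{T-1} \sum_{t=1}^{T-1} \mathbb{1}_{\mathbb{R}_{<0}}\left(s_t \, s_{t-1}\right)
```

Here $\mathbb{1}_{\mathbb{R}_{<0}}$ is an indicator function that equals 1 when its argument is negative (two consecutive samples have opposite signs) and 0 otherwise, so the sum counts sign changes and the leading factor turns that count into a rate.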
The zero-crossing rate can be utilized as a basic pitch detection algorithm for monophonic tonal signals. Voice activity detection (VAD), which determines whether or not human speech is present in an audio segment, also makes use of zero-crossing rates.
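To make the pitch-detection idea concrete, here is a minimal sketch that assumes a clean, synthetic sine wave rather than real audio. The helper names `zero_crossing_rate` and `estimate_pitch_hz` are hypothetical and are not part of librosa. A pure tone crosses zero twice per cycle, so its frequency is roughly half the number of crossings per second.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(signal)            # True where the sample is negative
    crossings = signs[1:] != signs[:-1]   # True at every sign change
    return crossings.mean()

def estimate_pitch_hz(signal, sr):
    """Rough pitch estimate for a monophonic tone: two crossings per cycle."""
    return zero_crossing_rate(signal) * sr / 2

# Synthetic test tone (values assumed for illustration only)
sr = 22050                                  # sampling rate in Hz
f0 = 440.0                                  # true frequency in Hz
t = np.arange(sr) / sr                      # one second of sample times
tone = np.sin(2 * np.pi * f0 * t + 0.1)     # small phase offset avoids exact zeros

estimate = estimate_pitch_hz(tone, sr)
print(f"Estimated pitch: {estimate:.1f} Hz")
```

On real recordings the estimate degrades quickly with noise, harmonics, or polyphony, which is why this approach is usually described as basic.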
Now, let’s take a closer look at it using the librosa library. To begin, we will import all of the required libraries and load the audio files from different music genres with the help of librosa.
# Importing all the necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import IPython.display as ipd
%matplotlib inline

# Specifying the path to audio files
classical_music_file = "/content/drive/MyDrive/trytheseaudios/classical.00000.wav"
blues_music_file = "/content/drive/MyDrive/trytheseaudios/blues.00000.wav"
reggae_music_file = "/content/drive/MyDrive/trytheseaudios/reggae.00000.wav"
rock_music_file = "/content/drive/MyDrive/trytheseaudios/rock.00000.wav"
jazz_music_file = "/content/drive/MyDrive/trytheseaudios/jazz.00000.wav"
country_music_file = "/content/drive/MyDrive/trytheseaudios/country.00000.wav"
disco_music_file = "/content/drive/MyDrive/trytheseaudios/disco.00000.wav"
hiphop_music_file = "/content/drive/MyDrive/trytheseaudios/hiphop.00000.wav"
metal_music_file = "/content/drive/MyDrive/trytheseaudios/metal.00000.wav"
pop_music_file = "/content/drive/MyDrive/trytheseaudios/pop.00000.wav"

# Load the first 30 seconds of each audio file with librosa
classical, sr = librosa.load(classical_music_file, duration=30)
blues, _ = librosa.load(blues_music_file, duration=30)
reggae, _ = librosa.load(reggae_music_file, duration=30)
rock, _ = librosa.load(rock_music_file, duration=30)
jazz, _ = librosa.load(jazz_music_file, duration=30)
country, _ = librosa.load(country_music_file, duration=30)
disco, _ = librosa.load(disco_music_file, duration=30)
hiphop, _ = librosa.load(hiphop_music_file, duration=30)
metal, _ = librosa.load(metal_music_file, duration=30)
pop, _ = librosa.load(pop_music_file, duration=30)
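The ten path variables and ten `librosa.load` calls above all follow one pattern, so they could also be generated from a list of genre names. The sketch below is only an alternative layout: the names `BASE_DIR`, `GENRES`, `genre_path` and `paths` are hypothetical, and the folder and file pattern are simply copied from the paths used in this article.

```python
from pathlib import PurePosixPath

# Folder and file pattern copied from the paths used in this article
BASE_DIR = "/content/drive/MyDrive/trytheseaudios"
GENRES = ["classical", "blues", "reggae", "rock", "jazz",
          "country", "disco", "hiphop", "metal", "pop"]

def genre_path(genre, base_dir=BASE_DIR):
    """Build the path to the first sample file of a genre."""
    return str(PurePosixPath(base_dir) / f"{genre}.00000.wav")

# One dictionary entry per genre instead of ten separate variables
paths = {genre: genre_path(genre) for genre in GENRES}

# With librosa installed and the files present, the tracks could then be loaded as:
# tracks = {genre: librosa.load(path, duration=30)[0] for genre, path in paths.items()}
```

Keeping the tracks in a dictionary keyed by genre name also makes it easier to loop over them in the later steps.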
Following that, we will evaluate and compare the lowest and highest instantaneous ZCR values, as well as the lowest and highest average ZCR values of various music genre samples.
# Determining the music genre with the lowest instantaneous value of ZCR
min([librosa.feature.zero_crossing_rate(classical).min(),
     librosa.feature.zero_crossing_rate(blues).min(),
     librosa.feature.zero_crossing_rate(reggae).min(),
     librosa.feature.zero_crossing_rate(rock).min(),
     librosa.feature.zero_crossing_rate(jazz).min(),
     librosa.feature.zero_crossing_rate(country).min(),
     librosa.feature.zero_crossing_rate(disco).min(),
     librosa.feature.zero_crossing_rate(hiphop).min(),
     librosa.feature.zero_crossing_rate(metal).min(),
     librosa.feature.zero_crossing_rate(pop).min()])
Output: 0.00585, which belongs to the jazz music genre track.
# Determining the music genre with the highest instantaneous value of ZCR
max([librosa.feature.zero_crossing_rate(classical).max(),
     librosa.feature.zero_crossing_rate(blues).max(),
     librosa.feature.zero_crossing_rate(reggae).max(),
     librosa.feature.zero_crossing_rate(rock).max(),
     librosa.feature.zero_crossing_rate(jazz).max(),
     librosa.feature.zero_crossing_rate(country).max(),
     librosa.feature.zero_crossing_rate(disco).max(),
     librosa.feature.zero_crossing_rate(hiphop).max(),
     librosa.feature.zero_crossing_rate(metal).max(),
     librosa.feature.zero_crossing_rate(pop).max()])
Output: 0.67675, which belongs to the pop music genre track.
# Determining the music genre with the LOWEST AVERAGE value of ZCR
min([librosa.feature.zero_crossing_rate(classical).mean(),
     librosa.feature.zero_crossing_rate(blues).mean(),
     librosa.feature.zero_crossing_rate(reggae).mean(),
     librosa.feature.zero_crossing_rate(rock).mean(),
     librosa.feature.zero_crossing_rate(jazz).mean(),
     librosa.feature.zero_crossing_rate(country).mean(),
     librosa.feature.zero_crossing_rate(disco).mean(),
     librosa.feature.zero_crossing_rate(hiphop).mean(),
     librosa.feature.zero_crossing_rate(metal).mean(),
     librosa.feature.zero_crossing_rate(pop).mean()])
Output: 0.07846, which belongs to the jazz music genre track.
# Determining the music genre with the HIGHEST AVERAGE value of ZCR
max([librosa.feature.zero_crossing_rate(classical).mean(),
     librosa.feature.zero_crossing_rate(blues).mean(),
     librosa.feature.zero_crossing_rate(reggae).mean(),
     librosa.feature.zero_crossing_rate(rock).mean(),
     librosa.feature.zero_crossing_rate(jazz).mean(),
     librosa.feature.zero_crossing_rate(country).mean(),
     librosa.feature.zero_crossing_rate(disco).mean(),
     librosa.feature.zero_crossing_rate(hiphop).mean(),
     librosa.feature.zero_crossing_rate(metal).mean(),
     librosa.feature.zero_crossing_rate(pop).mean()])
Output: 0.18307, which belongs to the metal music genre track.
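The `min(...)` and `max(...)` calls above return only the value, so the matching genre has to be looked up separately. One possible way to get the genre name directly is to keep the values in a dictionary keyed by genre, as in the sketch below. The dictionary name `mean_zcr` is hypothetical, and it holds only the four average values reported in this article, not all ten tracks.

```python
# Average ZCR values reported in this article for four of the tracks
mean_zcr = {
    "jazz": 0.07846,
    "classical": 0.0982,
    "pop": 0.12676,
    "metal": 0.18307,
}

# min/max with key=... return the dictionary key whose value is smallest/largest
lowest = min(mean_zcr, key=mean_zcr.get)
highest = max(mean_zcr, key=mean_zcr.get)

print(f"Lowest average ZCR: {lowest} ({mean_zcr[lowest]})")
print(f"Highest average ZCR: {highest} ({mean_zcr[highest]})")
```

The same pattern would work for the instantaneous minimum and maximum once those values are stored per genre.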
Further investigation shows that the classical genre audio sample track also has a low ZCR.
# Determining minimum instantaneous, maximum instantaneous and average ZCR for the classical music genre track
print(f"Minimum Instantaneous ZCR for Classical Genre song:{librosa.feature.zero_crossing_rate(classical).min()}, "
      f"Maximum Instantaneous ZCR for Classical Genre song:{librosa.feature.zero_crossing_rate(classical).max()}, "
      f"Average ZCR for Classical Genre song: {librosa.feature.zero_crossing_rate(classical).mean()}")
Output: Minimum Instantaneous ZCR for Classical Genre song:0.02685, Maximum Instantaneous ZCR for Classical Genre song:0.1767, Average ZCR for Classical Genre song: 0.0982
# Determining minimum instantaneous, maximum instantaneous and average ZCR for the pop music genre track
print(f"Minimum Instantaneous ZCR for Pop Genre song:{librosa.feature.zero_crossing_rate(pop).min()}, "
      f"Maximum Instantaneous ZCR for Pop Genre song:{librosa.feature.zero_crossing_rate(pop).max()}, "
      f"Average ZCR for Pop Genre song: {librosa.feature.zero_crossing_rate(pop).mean()}")
Output: Minimum Instantaneous ZCR for Pop Genre song:0.00683, Maximum Instantaneous ZCR for Pop Genre song:0.6767, Average ZCR for Pop Genre song: 0.12676
Observation: According to the results, the jazz track has the lowest instantaneous and the lowest average ZCR. Further analysis showed that the classical track also has low ZCR values. As a result, we can’t generalize and declare that jazz songs have the lowest ZCR, because the observation varies depending on song composition. At the other end, the metal track has the highest average ZCR (0.18307), while the pop track has the highest instantaneous ZCR (0.67675).
Let us now look a little closer with the help of visuals. We will first use librosa to extract the zero-crossing rate of each track with an explicit frame size and hop length, then plot the normalized ZCR for each music genre, followed by the actual (non-normalized) ZCR for each music genre.
# Specifying frame size and hop length
FRAME_SIZE = 1024
HOP_LENGTH = 512

# Extracting zero crossing rate for each music genre song using librosa
zcr_classical = librosa.feature.zero_crossing_rate(classical, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_blues = librosa.feature.zero_crossing_rate(blues, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_reggae = librosa.feature.zero_crossing_rate(reggae, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_rock = librosa.feature.zero_crossing_rate(rock, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_jazz = librosa.feature.zero_crossing_rate(jazz, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_country = librosa.feature.zero_crossing_rate(country, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_disco = librosa.feature.zero_crossing_rate(disco, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_hiphop = librosa.feature.zero_crossing_rate(hiphop, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_metal = librosa.feature.zero_crossing_rate(metal, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_pop = librosa.feature.zero_crossing_rate(pop, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]

# Converting frame indices to timestamps (in seconds) for the x-axis
frames = range(len(zcr_classical))
t = librosa.frames_to_time(frames, hop_length=HOP_LENGTH)
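The time axis `t` simply maps each frame index to the time at which that frame starts. The sketch below reproduces the arithmetic with NumPy only. It assumes the default sampling rate of 22050 Hz that `librosa.load` uses when `sr` is not given, and it assumes librosa's default centered framing when computing the frame count. The helper name `frames_to_seconds` is hypothetical.

```python
import numpy as np

SR = 22050          # default sampling rate of librosa.load (assumed here)
HOP_LENGTH = 512    # same hop length as in the article

def frames_to_seconds(frame_indices, sr=SR, hop_length=HOP_LENGTH):
    """Frame i starts at sample i * hop_length, i.e. at i * hop_length / sr seconds."""
    return np.asarray(frame_indices) * hop_length / sr

n_samples = 30 * SR                       # a 30-second clip
n_frames = 1 + n_samples // HOP_LENGTH    # frame count, assuming centered framing
t = frames_to_seconds(np.arange(n_frames))

print(n_frames, t[1], t[-1])
```

With these settings consecutive ZCR values are about 23 ms apart, which is the time resolution of the plots that follow.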
# Visualizing normalized zero-crossing rate (ZCR) of different music genre songs
# Note: librosa 0.10 removed librosa.display.waveplot; use librosa.display.waveshow there.
plt.figure(figsize=(20, 20))

ax = plt.subplot(5, 2, 1)
librosa.display.waveplot(classical, alpha=0.5)
plt.plot(t, zcr_classical, color="b")
plt.ylim((-1, 1))
plt.title("Classical Music Genre song")

plt.subplot(5, 2, 2)
librosa.display.waveplot(blues, alpha=0.5)
plt.plot(t, zcr_blues, color="g")
plt.ylim((-1, 1))
plt.title("Blues Music Genre song")

plt.subplot(5, 2, 3)
librosa.display.waveplot(reggae, alpha=0.5)
plt.plot(t, zcr_reggae, color="k")
plt.ylim((-1, 1))
plt.title("Reggae Music Genre song")

plt.subplot(5, 2, 4)
librosa.display.waveplot(rock, alpha=0.5)
plt.plot(t, zcr_rock, color="#E9967A")
plt.ylim((-1, 1))
plt.title("Rock Music Genre song")

plt.subplot(5, 2, 5)
librosa.display.waveplot(jazz, alpha=0.5)
plt.plot(t, zcr_jazz, color="m")
plt.ylim((-1, 1))
plt.title("Jazz Music Genre song")

plt.subplot(5, 2, 6)
librosa.display.waveplot(country, alpha=0.5)
plt.plot(t, zcr_country, color="y")
plt.ylim((-1, 1))
plt.title("Country Music Genre song")

plt.subplot(5, 2, 7)
librosa.display.waveplot(disco, alpha=0.5)
plt.plot(t, zcr_disco, color="r")
plt.ylim((-1, 1))
plt.title("Disco Music Genre song")

plt.subplot(5, 2, 8)
librosa.display.waveplot(hiphop, alpha=0.5)
plt.plot(t, zcr_hiphop, color="#7FFF00")
plt.ylim((-1, 1))
plt.title("Hiphop Music Genre song")

plt.subplot(5, 2, 9)
librosa.display.waveplot(metal, alpha=0.5)
plt.plot(t, zcr_metal, color="#FFB90F")
plt.ylim((-1, 1))
plt.title("Metal Music Genre song")

plt.subplot(5, 2, 10)
librosa.display.waveplot(pop, alpha=0.5)
plt.plot(t, zcr_pop, color="#458B00")
plt.ylim((-1, 1))
plt.title("Pop Music Genre song")

plt.subplots_adjust(hspace=0.75)
# Visualizing NORMALIZED zero-crossing rates of different music genre tracks
plt.figure(figsize=(25, 25))
plt.plot(t, zcr_classical, color="b")
plt.plot(t, zcr_blues, color="g")
plt.plot(t, zcr_reggae, color="k")
plt.plot(t, zcr_rock, color="#E9967A")
plt.plot(t, zcr_jazz, color="m")
plt.plot(t, zcr_country, color="y")
plt.plot(t, zcr_disco, color="r")
plt.plot(t, zcr_hiphop, color="#7FFF00")
plt.plot(t, zcr_metal, color="#FFB90F")
plt.plot(t, zcr_pop, color="#458B00")
plt.ylim(0, 1)
# Visualizing ACTUAL (NON-NORMALIZED) zero-crossing rate of different music genre tracks
plt.figure(figsize=(25, 25))
plt.plot(t, zcr_classical*FRAME_SIZE, color="b")
plt.plot(t, zcr_blues*FRAME_SIZE, color="g")
plt.plot(t, zcr_reggae*FRAME_SIZE, color="k")
plt.plot(t, zcr_rock*FRAME_SIZE, color="#E9967A")
plt.plot(t, zcr_jazz*FRAME_SIZE, color="m")
plt.plot(t, zcr_country*FRAME_SIZE, color="y")
plt.plot(t, zcr_disco*FRAME_SIZE, color="r")
plt.plot(t, zcr_hiphop*FRAME_SIZE, color="#7FFF00")
plt.plot(t, zcr_metal*FRAME_SIZE, color="#FFB90F")
plt.plot(t, zcr_pop*FRAME_SIZE, color="#458B00")
plt.ylim(0, 600)
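Multiplying the normalized rate by `FRAME_SIZE` turns the fraction of a frame into an approximate count of zero crossings per frame. The sketch below shows that relationship with a simplified, NumPy-only framing loop. It is an illustration under stated assumptions rather than librosa's implementation: the helper `framed_zcr` is hypothetical and does no padding or centering, so its values would not match librosa's output exactly.

```python
import numpy as np

FRAME_SIZE = 1024
HOP_LENGTH = 512

def framed_zcr(signal, frame_size=FRAME_SIZE, hop_length=HOP_LENGTH):
    """Normalized ZCR per frame: number of sign changes divided by the frame size."""
    signs = np.signbit(signal)
    rates = []
    for start in range(0, len(signal) - frame_size + 1, hop_length):
        frame = signs[start:start + frame_size]
        changes = np.count_nonzero(frame[1:] != frame[:-1])
        rates.append(changes / frame_size)
    return np.array(rates)

# Extreme test signal: the sign flips between every pair of samples
signal = np.tile([1.0, -1.0], 2048)

normalized = framed_zcr(signal)      # values between 0 and 1
counts = normalized * FRAME_SIZE     # crossings per frame

print(normalized[0], counts[0])
```

Even in this extreme case the count stays just below the frame size, which is why the upper limit of 600 on the y-axis leaves plenty of room for real music.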
Conclusion
Upon mathematical and visual inspection, we can say that the jazz and classical tracks have low ZCR values, while the pop and metal tracks have high ZCR values. However, with only one sample per genre, we cannot extrapolate these findings to entire genres. Still, the analysis offers a concise summary, a form of intuition, about how ZCR differs across music genres.
Thanks for reading. If you have any questions or concerns, please leave them in the comments section below. Happy Learning!
Link to GitHub repo: Click here!
References:
1. https://www.youtube.com/watch?v=EycaSbIRx-0&t=1352s
2. https://en.wikipedia.org/wiki/Zero_crossing
3. https://en.wikipedia.org/wiki/Zero-crossing_rate
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.