Analysis of Zero Crossing Rates of Different Music Genre Tracks

Drishti Last Updated : 03 Dec, 2024

In this article, we analyze the zero-crossing rates (ZCRs) of tracks from different music genres. This post is inspired by Valerio Velardo’s work. I highly encourage you to check out his YouTube channel for his outstanding work in the field of ML/DL for audio.

This article was published as a part of the Data Science Blogathon.

Tools used

  • Python
  • Librosa (librosa.feature.zero_crossing_rate)
  • One 30-second audio clip from each of 10 distinct music genres (Classical, Blues, Reggae, Rock, Jazz, Pop, Hip-hop, Country, Disco, and Metal) from the GTZAN dataset

Zero-Crossing

A zero-crossing is an instantaneous point at which the sign of a mathematical function changes (e.g. from positive to negative). It is represented by an intercept of the axis (zero value) in the graph of the function. 

 A zero-crossing in a line graph of a waveform representing voltage over time
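
As a small illustration of the definition above, the sketch below locates the sign changes in a short, made-up sequence with NumPy. The sample values and variable names are assumptions chosen for the example; they do not come from the GTZAN clips.

```python
import numpy as np

# Hypothetical samples, chosen only to illustrate the idea
signal = np.array([1.0, 2.0, -1.0, -3.0, 2.0, 0.5, -0.5])

# True where a sample is negative
negative = np.signbit(signal)

# A crossing occurs wherever consecutive samples differ in sign;
# the +1 gives the index of the first sample after each sign change
crossing_indices = np.flatnonzero(negative[1:] != negative[:-1]) + 1

print(crossing_indices)       # [2 4 6]
print(len(crossing_indices))  # 3
```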

Zero-Crossing Rate

The zero-crossing rate (ZCR) is the rate at which a signal transitions from positive to zero to negative or negative to zero to positive. Its value has been extensively used in both speech recognition and music information retrieval for classifying percussive sounds.

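ZCR is defined as:

$$\mathrm{zcr} = \frac{1}{T-1} \sum_{t=1}^{T-1} \mathbb{1}_{\mathbb{R}_{<0}}\left(s_t \, s_{t-1}\right)$$

where s is a signal of length T and the indicator function $\mathbb{1}_{\mathbb{R}_{<0}}$ equals 1 when its argument is negative and 0 otherwise.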
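
A minimal NumPy sketch of the commonly cited definition, in which the ZCR is the fraction of consecutive sample pairs whose product is negative. The function name `zero_crossing_rate_simple` and the sample values are assumptions for illustration; librosa's own implementation differs in details such as framing and how exact zeros are treated.

```python
import numpy as np

def zero_crossing_rate_simple(s):
    """Fraction of consecutive sample pairs (s[t-1], s[t]) with opposite signs."""
    s = np.asarray(s, dtype=float)
    # A negative product means the two neighbours lie on opposite sides of zero
    return np.mean(s[1:] * s[:-1] < 0)

samples = [1.0, 2.0, -1.0, -3.0, 2.0, 0.5, -0.5]  # made-up values
print(zero_crossing_rate_simple(samples))  # 0.5 (3 sign changes over 6 pairs)
```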

The zero-crossing rate can serve as a basic pitch detection algorithm for monophonic tonal signals. Voice activity detection (VAD), which determines whether or not human speech is present in an audio segment, also makes use of zero-crossing rates.
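
To see why ZCR can act as a rough pitch detector, note that a pure tone crosses zero twice per cycle, so the number of crossings per second is about twice its frequency. The sketch below checks this on a synthetic 440 Hz sine wave. The sample rate, duration, and function name are assumptions for the example, and the approach is only meaningful for clean, monophonic tones.

```python
import numpy as np

def estimate_pitch_from_zcr(signal, sample_rate):
    """Rough pitch estimate in Hz: zero crossings per second divided by two."""
    negative = np.signbit(signal)
    n_crossings = np.count_nonzero(negative[1:] != negative[:-1])
    duration = len(signal) / sample_rate
    return n_crossings / (2.0 * duration)

sample_rate = 22050                               # assumed sample rate
t = np.arange(sample_rate) / sample_rate          # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t + 0.1)        # synthetic 440 Hz tone with a small phase offset

estimated = estimate_pitch_from_zcr(tone, sample_rate)
print(f"Estimated pitch: {estimated:.1f} Hz")     # close to 440 Hz
```

On real recordings with noise or several simultaneous notes, extra crossings inflate the estimate, which is why this is usually treated as a baseline rather than a robust pitch tracker.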

Now, let’s take a closer look at it using the librosa library. To begin, we will import all of the required libraries and load the audio files from different music genres with the help of librosa.

A Brief Analysis using Librosa

# Importing all the necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import IPython.display as ipd
%matplotlib inline

# Specifying the path to audio files
classical_music_file = "/content/drive/MyDrive/trytheseaudios/classical.00000.wav"
blues_music_file = "/content/drive/MyDrive/trytheseaudios/blues.00000.wav"
reggae_music_file = "/content/drive/MyDrive/trytheseaudios/reggae.00000.wav"
rock_music_file = "/content/drive/MyDrive/trytheseaudios/rock.00000.wav"
jazz_music_file = "/content/drive/MyDrive/trytheseaudios/jazz.00000.wav"
country_music_file = "/content/drive/MyDrive/trytheseaudios/country.00000.wav"
disco_music_file = "/content/drive/MyDrive/trytheseaudios/disco.00000.wav"
hiphop_music_file = "/content/drive/MyDrive/trytheseaudios/hiphop.00000.wav"
metal_music_file = "/content/drive/MyDrive/trytheseaudios/metal.00000.wav"
pop_music_file = "/content/drive/MyDrive/trytheseaudios/pop.00000.wav"

# Load audio files with librosa (sr is the sample rate shared by all clips)
classical, sr = librosa.load(classical_music_file, duration=30)
blues, _ = librosa.load(blues_music_file, duration=30)
reggae, _ = librosa.load(reggae_music_file, duration=30)
rock, _ = librosa.load(rock_music_file, duration=30)
jazz, _ = librosa.load(jazz_music_file, duration=30)
country, _ = librosa.load(country_music_file, duration=30)
disco, _ = librosa.load(disco_music_file, duration=30)
hiphop, _ = librosa.load(hiphop_music_file, duration=30)
metal, _ = librosa.load(metal_music_file, duration=30)
pop, _ = librosa.load(pop_music_file, duration=30)

Following that, we will evaluate and compare the lowest and highest instantaneous ZCR values, as well as the lowest and highest average ZCR values of various music genre samples.

# Compute the frame-wise ZCR once per genre (librosa defaults: frame_length=2048, hop_length=512)
tracks = {"classical": classical, "blues": blues, "reggae": reggae, "rock": rock, "jazz": jazz,
          "country": country, "disco": disco, "hiphop": hiphop, "metal": metal, "pop": pop}
zcrs = {genre: librosa.feature.zero_crossing_rate(y) for genre, y in tracks.items()}

# Determining the music genre with the lowest instantaneous value of ZCR
lowest_inst = min(zcrs, key=lambda g: zcrs[g].min())  # genre name
zcrs[lowest_inst].min()
Output: 0.00585 ---> Jazz music genre track!

# Determining the music genre with the highest instantaneous value of ZCR
highest_inst = max(zcrs, key=lambda g: zcrs[g].max())  # genre name
zcrs[highest_inst].max()
Output: 0.67675 ---> Pop music genre track!

# Determining the music genre with the LOWEST AVERAGE value of ZCR
lowest_avg = min(zcrs, key=lambda g: zcrs[g].mean())  # genre name
zcrs[lowest_avg].mean()
Output: 0.07846 ---> Jazz music genre track!

# Determining the music genre with the HIGHEST AVERAGE value of ZCR
highest_avg = max(zcrs, key=lambda g: zcrs[g].mean())  # genre name
zcrs[highest_avg].mean()
Output: 0.18307 ---> Metal music genre track!

Further investigation revealed that the classical track also has a low ZCR.

# Determining minimum instantaneous, maximum instantaneous, and average ZCR for the classical music genre track
print(f"Minimum Instantaneous ZCR for Classical Genre song:{librosa.feature.zero_crossing_rate(classical).min()}, Maximum Instantaneous ZCR for Classical Genre song:{librosa.feature.zero_crossing_rate(classical).max()}, Average ZCR for Classical Genre song: {librosa.feature.zero_crossing_rate(classical).mean()}")
Output: Minimum Instantaneous ZCR for Classical Genre song:0.02685, Maximum Instantaneous ZCR for Classical Genre song:0.1767, Average ZCR for Classical Genre song: 0.0982

# Determining minimum instantaneous, maximum instantaneous, and average ZCR for the pop music genre track
print(f"Minimum Instantaneous ZCR for Pop Genre song:{librosa.feature.zero_crossing_rate(pop).min()}, Maximum Instantaneous ZCR for Pop Genre song:{librosa.feature.zero_crossing_rate(pop).max()}, Average ZCR for Pop Genre song: {librosa.feature.zero_crossing_rate(pop).mean()}")
Output: Minimum Instantaneous ZCR for Pop Genre song:0.00683, Maximum Instantaneous ZCR for Pop Genre song:0.6767, Average ZCR for Pop Genre song: 0.12676

Observation: According to the results, the jazz track has both the lowest instantaneous ZCR (0.00585) and the lowest average ZCR (0.07846). Further analysis showed that the classical track also has low ZCR values (average 0.0982). As a result, we can’t generalize and declare that jazz songs have the lowest ZCR, because the observation varies depending on song composition. At the other end, the metal track has the highest average ZCR (0.18307), while the pop track has the highest instantaneous ZCR (0.67675).

Demystification via Visualization

Let us now demystify a little more with the help of visuals. In this regard, we will first use Librosa to extract the zero-crossing rate for each music genre track, and then plot the normalized ZCR for each music genre, followed by the actual (non-normalized) ZCR for each music genre.
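
Before extracting the features, it may help to see what frame size and hop length control. The sketch below is a simplified NumPy re-implementation of frame-wise ZCR, not librosa's actual code (librosa also pads the signal by default, so its frame counts differ). Each frame covers `frame_length` samples, successive frames start `hop_length` samples apart, and the normalized rate multiplied by the frame length gives the number of crossings in that frame. The function name and the synthetic tone are assumptions for illustration.

```python
import numpy as np

def framewise_zcr(signal, frame_length=1024, hop_length=512):
    """Normalized ZCR per frame: sign changes in the frame divided by frame_length."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    rates = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_length
        frame = signal[start:start + frame_length]
        negative = np.signbit(frame)
        rates[i] = np.count_nonzero(negative[1:] != negative[:-1]) / frame_length
    return rates

sample_rate = 22050                                   # assumed sample rate
time = np.arange(sample_rate) / sample_rate           # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * time + 0.1)         # synthetic 440 Hz tone

rates = framewise_zcr(tone)
counts = rates * 1024                                 # crossings per frame
print(len(rates))                                     # 42 frames
print(rates.mean(), 2 * 440.0 / sample_rate)          # both close to 0.04
```

A larger frame size smooths the curve because each value averages over more samples, while a smaller hop length produces more, and more closely spaced, values.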

#Specifying frame size and hop length
FRAME_SIZE = 1024
HOP_LENGTH = 512
#Extracting zero crossing rate for each music genre song using Librosa 
zcr_classical = librosa.feature.zero_crossing_rate(classical, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_blues = librosa.feature.zero_crossing_rate(blues, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_reggae = librosa.feature.zero_crossing_rate(reggae, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_rock = librosa.feature.zero_crossing_rate(rock, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_jazz = librosa.feature.zero_crossing_rate(jazz, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_country = librosa.feature.zero_crossing_rate(country, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_disco = librosa.feature.zero_crossing_rate(disco, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_hiphop = librosa.feature.zero_crossing_rate(hiphop, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_metal = librosa.feature.zero_crossing_rate(metal, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_pop = librosa.feature.zero_crossing_rate(pop, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
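# Time stamp (in seconds) of each ZCR frame; this assumes every clip yields the same number of frames
frames = range(len(zcr_classical))
t = librosa.frames_to_time(frames, sr=sr, hop_length=HOP_LENGTH)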
# Visualizing the normalized zero-crossing rate (ZCR) on top of each genre's waveform
# (librosa.display.waveplot was removed in librosa 0.10; waveshow is its replacement)
panels = [
    ("Classical", classical, zcr_classical, "b"),
    ("Blues", blues, zcr_blues, "g"),
    ("Reggae", reggae, zcr_reggae, "k"),
    ("Rock", rock, zcr_rock, "#E9967A"),
    ("Jazz", jazz, zcr_jazz, "m"),
    ("Country", country, zcr_country, "y"),
    ("Disco", disco, zcr_disco, "r"),
    ("Hiphop", hiphop, zcr_hiphop, "#7FFF00"),
    ("Metal", metal, zcr_metal, "#FFB90F"),
    ("Pop", pop, zcr_pop, "#458B00"),
]
plt.figure(figsize=(20, 20))
for i, (name, signal, zcr, color) in enumerate(panels, start=1):
    ax = plt.subplot(5, 2, i)
    librosa.display.waveshow(signal, sr=sr, alpha=0.5, ax=ax)
    ax.plot(t, zcr, color=color)
    ax.set_ylim(-1, 1)
    ax.set_title(f"{name} Music Genre song")
plt.subplots_adjust(hspace=0.75)

Waveplots illustrating the zero-crossing rates of various music genre tracks
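
The time axis `t` used in these plots comes from `librosa.frames_to_time`, which, when no `n_fft` offset is given, amounts to multiplying each frame index by the hop length and dividing by the sample rate. A quick NumPy check of that arithmetic follows, assuming librosa's default sample rate of 22050 Hz (the rate `librosa.load` resamples to unless told otherwise) and an arbitrary handful of frames.

```python
import numpy as np

HOP_LENGTH = 512
SAMPLE_RATE = 22050  # assumed: librosa.load's default target rate

frame_indices = np.arange(5)
frame_times = frame_indices * HOP_LENGTH / SAMPLE_RATE  # seconds

print(frame_times)
# Consecutive ZCR values are about 23 ms apart (512 / 22050 s)
print(f"{HOP_LENGTH / SAMPLE_RATE * 1000:.1f} ms")  # 23.2 ms
```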

# Visualizing NORMALIZED zero-crossing rates of different music genre tracks
plt.figure(figsize=(25, 25))
plt.plot(t, zcr_classical, color="b", label="Classical")
plt.plot(t, zcr_blues, color="g", label="Blues")
plt.plot(t, zcr_reggae, color="k", label="Reggae")
plt.plot(t, zcr_rock, color="#E9967A", label="Rock")
plt.plot(t, zcr_jazz, color="m", label="Jazz")
plt.plot(t, zcr_country, color="y", label="Country")
plt.plot(t, zcr_disco, color="r", label="Disco")
plt.plot(t, zcr_hiphop, color="#7FFF00", label="Hiphop")
plt.plot(t, zcr_metal, color="#FFB90F", label="Metal")
plt.plot(t, zcr_pop, color="#458B00", label="Pop")
plt.ylim(0, 1)
plt.xlabel("Time (s)")
plt.ylabel("Normalized ZCR")
plt.legend()

Graph depicting the NORMALIZED Zero-crossing rates of different music genre tracks

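# Visualizing ACTUAL (NON-NORMALIZED) zero-crossing rates of different music genre tracks
# Multiplying the normalized rate by the frame size gives the number of crossings per frame
plt.figure(figsize=(25, 25))
plt.plot(t, zcr_classical*FRAME_SIZE, color="b", label="Classical")
plt.plot(t, zcr_blues*FRAME_SIZE, color="g", label="Blues")
plt.plot(t, zcr_reggae*FRAME_SIZE, color="k", label="Reggae")
plt.plot(t, zcr_rock*FRAME_SIZE, color="#E9967A", label="Rock")
plt.plot(t, zcr_jazz*FRAME_SIZE, color="m", label="Jazz")
plt.plot(t, zcr_country*FRAME_SIZE, color="y", label="Country")
plt.plot(t, zcr_disco*FRAME_SIZE, color="r", label="Disco")
plt.plot(t, zcr_hiphop*FRAME_SIZE, color="#7FFF00", label="Hiphop")
plt.plot(t, zcr_metal*FRAME_SIZE, color="#FFB90F", label="Metal")
plt.plot(t, zcr_pop*FRAME_SIZE, color="#458B00", label="Pop")
plt.ylim(0, 600)
plt.xlabel("Time (s)")
plt.ylabel("Zero crossings per frame")
plt.legend()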

Graph depicting the ACTUAL (NON-NORMALIZED) zero-crossing rates of different music genre tracks
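
As a worked example of moving between the two scales, take the highest average ZCR reported earlier (0.18307, for the metal track). Multiplying by the frame size gives the typical number of crossings per frame, and multiplying by the sample rate gives crossings per second. The 22050 Hz sample rate is an assumption (librosa's default when loading audio).

```python
FRAME_SIZE = 1024
SAMPLE_RATE = 22050  # assumed: librosa.load's default target rate

mean_zcr_metal = 0.18307  # average normalized ZCR reported earlier for the metal track

crossings_per_frame = mean_zcr_metal * FRAME_SIZE
crossings_per_second = mean_zcr_metal * SAMPLE_RATE

print(f"{crossings_per_frame:.1f} crossings per {FRAME_SIZE}-sample frame")  # 187.5
print(f"{crossings_per_second:.0f} crossings per second")                    # 4037
```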

Conclusion

Upon numerical and visual inspection, we can say that the jazz and classical tracks have low ZCR values, while the pop and metal tracks have high ZCR values. However, with only one clip per genre, we cannot extrapolate these findings to entire genres. The preceding analysis may nevertheless offer a concise summary, a form of intuition, about how ZCR differs across music genres.

Thanks for reading. If you have any questions or concerns, please leave them in the comments section below. Happy Learning!


Analytics Vidhya does not own the media shown in this article; the Author uses it at their discretion.

I'm a Researcher who works primarily on various Acoustic DL, NLP, and RL tasks. Here, my writing predominantly revolves around topics related to Acoustic DL, NLP, and RL, as well as new emerging technologies. In addition to all of this, I also contribute to open-source projects @Hugging Face.
