!ffmpeg -i Test.m4a Test.wav -v quiet -y
—author: Nipun Batrabadges: truecategories:- signal-processing- audio- filtering- digital-signal-processing- python- scipydate: ’2021-06-18’description: Audio filtering techniques and applicationsoutput-file: 2021-06-18-audio-filters.htmltitle: Audio filterstoc: true—
Introduction
In this post I will look into some filters for audio processing in ffmpeg, sox, and Python. I have recorded a small 6 second audio clip where for the first couple of seconds I was not speaking, but background noise is present.
I had recorded the audio on my Apple device and it was default recorded in .m4a
format. I convert it to the wav
format. I use ffmpeg for the same. In addition, I am using two flags: -v quiet
to reduce the amount of information printed on the console. Second, I am using -y
to overwrite an existing file with the same name.
from IPython.display import Audio
import matplotlib.pyplot as plt
%matplotlib inline
"Test.wav") Audio(
!ffmpeg -i Test.wav -lavfi showspectrumpic=s=720x540:color='magma' ../images/input-spectogram.png -y -v quiet
As can be seen in the above image, I am speaking somewhere close to 3.70 seconds onwards. However, the audio is pretty noisy before this even though I am not speaking. This is due to the background noise coming in from the fans and the air conditioning system.
!sox Test.wav -n spectrogram -o ../images/sox-sg.png
!sox Test.wav -n rate 32k spectrogram -o ../images/sox-sg-trimmed.png
I’ll now get some attributes of the post that are required for processing, such as the recording rate. ## Getting attributes of the recorded file
!ffmpeg -i Test.wav
ffmpeg version 4.4 Copyright (c) 2000-2021 the FFmpeg developers built with Apple clang version 12.0.5 (clang-1205.0.22.9) configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4_2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox libavutil 56. 70.100 / 56. 70.100 libavcodec 58.134.100 / 58.134.100 libavformat 58. 76.100 / 58. 76.100 libavdevice 58. 13.100 / 58. 13.100 libavfilter 7.110.100 / 7.110.100 libavresample 4. 0. 0 / 4. 0. 0 libswscale 5. 9.100 / 5. 9.100 libswresample 3. 9.100 / 3. 9.100 libpostproc 55. 9.100 / 55. 9.100 Guessed Channel Layout for Input Stream #0.0 : mono Input #0, wav, from 'Test.wav': Metadata: title : Test encoder : Lavf58.76.100 Duration: 00:00:06.63, bitrate: 768 kb/s Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s At least one output file must be specified
As can be seen from the cell above, the recording rate is 48 kHz. We will need this when we do some processing in Python.
Building a noise profile from first 3 second
!ffmpeg -i Test.wav -ss 0 -to 3.5 -c copy Noise-Test.wav -v quiet -y
'Noise-Test.wav') Audio(
!sox Noise-Test.wav -n rate 32k spectrogram -o ../images/sox-noise.png
!sox Noise-Test.wav -n noiseprof noise.prof
!sox Noise-Test.wav Noise-Test-cleaned.wav noisered noise.prof 0.21
"Noise-Test-cleaned.wav") Audio(
!sox Test.wav Test-cleaned-05.wav noisered noise.prof 0.05
!sox Test.wav Test-cleaned-18.wav noisered noise.prof 0.18
!sox Test.wav Test-cleaned-21.wav noisered noise.prof 0.21
"Test-cleaned-05.wav") Audio(
"Test-cleaned-18.wav") Audio(
"Test-cleaned-21.wav") Audio(
!sox Test-cleaned-21.wav -n rate 32k spectrogram -o ../images/sox-cleaned-21.png
!sox Test-cleaned-05.wav -n rate 32k spectrogram -o ../images/sox-cleaned-05.png
"Test-audacity.wav") Audio(
!sox Test-audacity.wav -n rate 32k spectrogram -o ../images/sg-audacity.png
!ffmpeg -i Test.wav -filter:a "highpass=f=300" high-passed.wav -y -v quiet
"high-passed.wav") Audio(
!sox high-passed.wav -n rate 32k spectrogram -o ../images/highpass.png
"test-imovie.wav") Audio(
!sox test-imovie.wav -n remix 1 rate 32k spectrogram -o ../images/imovie.png
import mediapy
= mediapy.read_image('../images/sox-sg-trimmed.png')
orig = mediapy.read_image('../images/sg-audacity.png')
audacity = mediapy.read_image('../images/sox-cleaned-21.png')
sox_21 = mediapy.read_image('../images/sox-cleaned-05.png')
sox_05 = mediapy.read_image('../images/highpass.png')
high_pass_300 = mediapy.read_image('../images/imovie.png')
imovie
'Original':orig,
mediapy.show_images({'Audacity':audacity,
'Sox:0.21':sox_21,
'Sox:0.05':sox_05,
'HPF:300': high_pass_300,
'imovie':imovie},
='magma', columns=4, height=200 ) cmap
Original
|
Audacity
|
Sox:0.21
|
Sox:0.05
|
HPF:300
|
imovie
|
!sox test-audacity.wav output.dat
import pandas as pd
= pd.read_csv("output.dat", skiprows=2, index_col=0, names=['values'],delim_whitespace=True)
df = df.astype('float64') df
df.plot()