Audio Filtering on the command line and Python

ML
Author

Nipun Batra

Published

June 18, 2021

Introduction

In this post I will look into some filters for audio processing in ffmpeg, sox, and Python. I have recorded a small 6 second audio clip where for the first couple of seconds I was not speaking, but background noise is present.

I had recorded the audio on my Apple device and it was default recorded in .m4a format. I convert it to the wav format. I use ffmpeg for the same. In addition, I am using two flags: -v quiet to reduce the amount of information printed on the console. Second, I am using -y to overwrite an existing file with the same name.

!ffmpeg -i Test.m4a Test.wav -v quiet -y
from IPython.display import Audio
import matplotlib.pyplot as plt
%matplotlib inline
Audio("Test.wav")
!ffmpeg -i Test.wav -lavfi showspectrumpic=s=720x540:color='magma' ../images/input-spectogram.png -y -v quiet

As can be seen in the above image, I am speaking somewhere close to 3.70 seconds onwards. However, the audio is pretty noisy before this even though I am not speaking. This is due to the background noise coming in from the fans and the air conditioning system.

!sox Test.wav -n spectrogram -o ../images/sox-sg.png

!sox Test.wav -n rate 32k spectrogram  -o ../images/sox-sg-trimmed.png 

I’ll now get some attributes of the post that are required for processing, such as the recording rate. ## Getting attributes of the recorded file

!ffmpeg -i Test.wav
ffmpeg version 4.4 Copyright (c) 2000-2021 the FFmpeg developers
  built with Apple clang version 12.0.5 (clang-1205.0.22.9)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4_2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'Test.wav':
  Metadata:
    title           : Test
    encoder         : Lavf58.76.100
  Duration: 00:00:06.63, bitrate: 768 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s
At least one output file must be specified

As can be seen from the cell above, the recording rate is 48 kHz. We will need this when we do some processing in Python.

Building a noise profile from first 3 second

!ffmpeg -i Test.wav -ss 0 -to 3.5 -c copy Noise-Test.wav -v quiet -y
Audio('Noise-Test.wav')