Augmented reality in sound is used in hearing aids, surveillance, and other applications that harness the power of sound. Audio has not received as much attention as other forms of information, such as images and text. However, there is still much value in audio - whether for journalists in the field recording interviews, in walkie-talkies, or in improving speech-to-text. The main reason audio is not used as much for information retrieval is that speech-to-text systems often fail to convert audio files to text, much of which is due to background noise.

Intuitively, the blind source separation (BSS) problem is: given a mixed signal, how can you extract the original signals? Technically, the BSS problem can be written as $$X = As$$ where $s$ is the original signals, $A$ is the mixing matrix, and $X$ is the mixed signals. The aim is to find a matrix $W$ that inverts $A$, so that $\hat{s} = WX$ recovers the sources. The product $$P = WA$$ is the gauge of how good $W$ is. Ideally, $P$ is $I$. However, as the separated sources may not come out in the same order and at the same scale as the original sources, $P$ should ideally be an identity matrix up to a permutation and scaling. See the repository with all the code here
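To make the permutation-and-scale ambiguity concrete, here is a small numpy sketch (my own toy example, not from the repository): it mixes two synthetic sources and inspects the product $P = WA$, which is what the estimated sources $\hat{s} = WX$ effectively see.

```python
import numpy as np

# Toy instance of the BSS model X = A s with two sources, two mixtures.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))          # two super-Gaussian "sources"
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])               # mixing matrix
X = A @ s                                # observed mixtures

# A perfect unmixing matrix gives P = W A = I.
W = np.linalg.inv(A)
P = W @ A
print(np.allclose(P, np.eye(2)))         # True

# BSS can only recover sources up to permutation and scale: a W that
# swaps and rescales the outputs is just as valid, and P reflects that.
W_perm = np.array([[0.0, 2.0],
                   [1.0, 0.0]]) @ W      # swap rows, scale one by 2
P_perm = W_perm @ A
print(P_perm)                            # a scaled permutation, not I
```

Both `W` and `W_perm` separate the sources perfectly; only the ordering and amplitudes differ, which is why evaluation must compare $P$ to a permutation matrix rather than strictly to $I$.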
I had never dealt with digital signal processing before, so the below is a tour through the domain knowledge.
First, let us look at the Fourier transform.
The Fourier transform reveals the structure behind signals, decomposing them into their constituent sinusoids.
Here is the Fourier transform (both the DFT and the FFT that I implemented) running on a signal as a sanity check. In the post-FFT plot, the x-axis is frequency and the y-axis is amplitude.
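The full code is linked further down; as a minimal sketch, a naive $O(N^2)$ DFT and a radix-2 Cooley-Tukey FFT can be written and sanity-checked against numpy like this (my own stand-in versions, on a two-tone test signal):

```python
import numpy as np

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

def fft(x):
    """Radix-2 Cooley-Tukey FFT; length must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N <= 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

# Sanity check on a two-tone signal: both agree with numpy's FFT.
t = np.arange(256) / 256.0
sig = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
print(np.allclose(dft(sig), np.fft.fft(sig)))  # True
print(np.allclose(fft(sig), np.fft.fft(sig)))  # True
```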
The Fourier transform by itself does not do blind source separation, but it is a crucial building block.
Here is the spectrum of the crowded-bar audio file above.
Find the code for FFTs here
Let us start with some definitions of the types of priors that are examined when evaluating separability:
- Kurtosis
- Entropy
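As a concrete illustration of both priors (my own sketch, with a Laplacian signal as a stand-in for heavy-tailed speech), they can be estimated directly from samples:

```python
import numpy as np

def kurtosis(x):
    """Excess kurtosis: ~0 for a Gaussian, > 0 for peaked
    (super-Gaussian) signals like speech, < 0 for flat ones."""
    x = (x - x.mean()) / x.std()
    return np.mean(x**4) - 3.0

def entropy(x, bins=64):
    """Shannon entropy (bits) of a histogram estimate of the amplitude
    distribution of the signal."""
    p, _ = np.histogram(x, bins=bins)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
gaussian = rng.normal(size=100_000)
speechy = rng.laplace(size=100_000)      # heavy-tailed, speech-like
print(kurtosis(gaussian))                # close to 0
print(kurtosis(speechy))                 # close to 3: super-Gaussian
```

The separation algorithms below exploit exactly this contrast: a mixture of independent sources looks more Gaussian than the sources themselves, so maximizing non-Gaussianity (high kurtosis, low entropy) pulls the sources back apart.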
To test, we will artificially mix various sound samples. Insert interactive audio mixer here --> {TO DO} Here, we have two sounds - let us mix them together with this code
Man with threatening voice (Voice 2)
Mixed Audio (Microphone 1)
Mixed Audio (Microphone 2)
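The mixing step can be sketched as follows. Synthetic signals stand in for the actual recordings (in the real experiment the voices would be loaded from WAV files, e.g. with `scipy.io.wavfile.read`), and the mixing matrix `A` here is an assumed example, not the one used above:

```python
import numpy as np

# Stand-ins for the two voice recordings, one second at 16 kHz.
rng = np.random.default_rng(0)
n = 16000
voice1 = np.sin(2 * np.pi * 220 * np.arange(n) / 16000)  # stand-in tone
voice2 = rng.laplace(size=n) * 0.1                       # noisy stand-in

# Each "microphone" hears a different weighted sum of the two voices
# (instantaneous mixing, no delays or echoes).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])               # assumed example mixing matrix
mic1, mic2 = A @ np.vstack([voice1, voice2])
```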
I will draw guidelines from this paper, looking at distortion. By distortion we mean how much the original signals are distorted in the mixed signals, measured in the absence of the other source signals. The equations are below.
And here is the code
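As a rough stand-in, here is one plausible scale-invariant distortion measure: project the estimate onto the original and report the residual-to-signal power ratio. This is an assumption on my part, not necessarily the paper's exact definition (its equations are above):

```python
import numpy as np

def distortion(original, estimate):
    """Sketch of a scale-invariant distortion measure (my assumption,
    not the paper's exact equations): best least-squares scaling of the
    original to the estimate, then residual power over signal power."""
    alpha = np.dot(estimate, original) / np.dot(original, original)
    residual = estimate - alpha * original
    return np.sum(residual**2) / np.sum(original**2)

rng = np.random.default_rng(0)
s = rng.normal(size=1000)
print(distortion(s, 3.0 * s))            # ~0: perfect up to scale
```

A rescaled copy scores ~0, which matters because BSS only recovers sources up to scale; any measure that penalized amplitude alone would be misleading here.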
Voice 1: before, mixed, and estimated
Voice 2: before, mixed, and estimated
Estimated Voice 1
Estimated Voice 2 (which it failed to extract)
Notes: Simulated annealing could be used instead of backpropagation.
Evaluation: the deep learning method produced an average distortion of 72.9566240393 for voice 1, and 40.26824952131 for voice 2. Here is all of my work for the code.

Compressed sensing (CS) remarkably reduces the amount of sampling needed to restore a signal exactly - instead of sampling at at least twice the highest frequency of a signal, CS needs a number of samples that depends on the number of non-zero frequencies. It is based on the assumption that audio signals are sparse. Here, the basis used is the Discrete Cosine Transform (DCT), and using L1 norms, we can reconstruct the original signals.

The literature review for this: other than a few papers by Michael Z , who explores Bayesian priors in BSS to tackle the case where we do not know A, there is not much research on BSS using compressed sensing - most CS papers are on the reconstruction of a single signal. A whole other question is how to find the basis functions for each audio stream, especially for human voices. There is one demo online of CS for BSS, which fails pretty badly for voices.
Here are some cool papers:
- CS applied with ICA (there isn't that much)

Here are the results of compressed sensing, after modifying it to use the L1 norm to retrieve more than one signal at a time.
Here is the Matlab code. I also started translating it to Python.
For compressed sensing, we must represent the audio sources as a combination of basis functions, such that $x = \Phi c$, where $\Phi$ holds the basis functions (here, DCT atoms) and the coefficient vector $c$ is sparse.
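Here is a minimal, self-contained sketch of the idea (my own toy example, not the Matlab code above): a signal that is sparse in an orthonormal DCT basis is recovered from far fewer random time-domain samples by L1 minimization, solved with iterative soft-thresholding (ISTA).

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
N, m = 256, 100                              # 256 samples, only 100 kept

# Ground truth: a signal that is 2-sparse in the orthonormal DCT basis.
c_true = np.zeros(N)
c_true[10], c_true[50] = 1.0, 0.5
x = idct(c_true, norm='ortho')

idx = rng.choice(N, size=m, replace=False)   # random time-domain samples
y = x[idx]                                   # far fewer than N samples

def A(c):                                    # coefficients -> samples
    return idct(c, norm='ortho')[idx]

def At(r):                                   # adjoint: samples -> coeffs
    full = np.zeros(N)
    full[idx] = r
    return dct(full, norm='ortho')

# ISTA: gradient step on the data fit, then soft-threshold (L1 shrink).
lam, c = 0.01, np.zeros(N)
for _ in range(500):
    c = c + At(y - A(c))
    c = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

print(np.argsort(-np.abs(c))[:2])            # the two true DCT bins
```

The two largest recovered coefficients land on the true DCT bins (10 and 50), even though fewer than half the samples were observed; this is the mechanism the multi-signal L1 modification builds on.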
In the case of overcomplete ICA, it is still possible to identify the mixing matrix from knowledge of x alone, although it is not possible to uniquely recover the sources s. An area to delve deeper into is how to best reconstruct the unique sources. Here, we have only considered instantaneous mixtures. There is also BSS in the presence of noise. There are two major approaches: blind source separation and spatial filtering. The first relies on the statistical independence and super-Gaussian distribution of the speech signals. Spatial filtering uses the fact that speech sources are separated in space, which is an active field of research at Microsoft Research.
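For contrast with the overcomplete case, the square, noise-free case can be handled by a FastICA-style fixed-point iteration. Here is a minimal toy sketch (my own, not a library implementation): whiten the mixtures, then find orthogonal directions that maximize non-Gaussianity with the tanh nonlinearity.

```python
import numpy as np

# Square, noise-free BSS: x = A s, with super-Gaussian sources.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])
x = A @ s

# Whitening: center, decorrelate, and rescale so cov(z) = I.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E / np.sqrt(d)) @ E.T @ x

# Symmetric FastICA fixed-point updates with g = tanh.
W = np.linalg.qr(rng.normal(size=(2, 2)))[0]     # random orthogonal start
for _ in range(200):
    g = np.tanh(W @ z)
    W = (g @ z.T) / z.shape[1] - np.diag((1 - g**2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W)                  # symmetric decorrelation
    W = U @ Vt

s_hat = W @ z
# Each recovered row should match one source up to sign and scale,
# so the |correlation| matrix is approximately a permutation matrix.
corr = np.abs(np.corrcoef(np.vstack([s, s_hat]))[:2, 2:])
print(corr.round(2))
```

The correlation check makes the permutation-and-scale ambiguity visible again: each true source is strongly correlated with exactly one estimate, in no guaranteed order.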
I want to continue with this - perhaps explore it further my senior year, looking specifically at how to find basis-function representations for compressed sensing, since that seems to be the key to making blind source separation real-time.