sklearn.decomposition.FastICA

Fast ICA is a version of independent component analysis developed by A. Hyvarinen and E. Oja. In trying to understand how ICA can be used, I created this notebook.ICA is used to find components that contribute to a joint signal. An analogy that is often used is to use several microphones to distinguish the contributions of each instrument to a symphony. You need to have as many microphones as there instruments. I guess it would help if the microphones capture the signals from maximally different locations.First we import some stuff.
In [1]:
%pylab inline
import scipy as sp
from scipy import signal
import pylab as pl
from sklearn.decomposition import FastICA
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.
Here we will create 'recordings' from two microphones of two separate 'instruments'. We start by creating two signals, one from each instrument.
In [2]:
t = sp.arange(0,50,0.01)
instrument1 = sp.sin(t/2)
instrument2 = signal.square(t, duty=.5) / 2
pl.plot(instrument1)
pl.plot(instrument2)
pl.show()
Then we mix them into two different 'recordings', each with their own noise added.
In [3]:
rec1 = instrument1 + instrument2 + sp.randn(len(t))/15
rec2 = instrument1/3 + instrument2*1.1 + sp.randn(len(t))/15
# pl.plot(rec1)
# pl.plot(rec2)
# pl.show()
symphony = sp.reshape(sp.concatenate((rec1,rec2)),[2,len(t)])
pl.plot(symphony[0,:])
pl.plot(symphony[1,:])
pl.show()
Now we will use ICA to try to recover the two original instruments. We have to initialize an ICA object, telling it at least how many components (source 'instruments') to use. The data has to be transposed for FastICA to be able to use it.
In [4]:
ica = FastICA(n_components=2, whiten=True)
ica.fit(symphony.T)
Out[4]:
FastICA(algorithm='parallel', fun='logcosh', fun_args={}, fun_prime='',
    max_iter=200, n_components=2, random_state=None, tol=0.0001,
    w_init=None, whiten=True)
We could now recover the sources.
In [5]:
sources = ica.transform(symphony.T)
pl.plot(sources[:,0])
pl.plot(sources[:,1])
pl.show()
If you plot this, notice that the scale is completely off, it could even be inverted for one or more sources and the instruments could be switched in order. However, the main characteristics of the two 'instruments' are very clear, even though the signals are noisy. Supposedly this scales well with more and more 'instruments' which is good for my purposes. However, I'm more interested in recreating the recordings minus one of the sources. Since we have only two 'instruments' that would look the same as recovering one of them, but I'm hoping to get appropriately scaled signals.
In [6]:
mixing_matrix = ica.get_mixing_matrix()
print mixing_matrix
[[-50.18887779  35.52088765]
 [-16.1972163   39.47261258]]
The columns are the source components (instruments), and the rows are the measured features (recordings). The numbers are the relative weights of the instruments in the recordings. I will set one instrument to zero and then multiply it with the reconstructed sources matrix, so that only one signal (instrument) is left.
In [7]:
# rerec1 = mixing_matrix[0,0] * sources[:,0]
# rerec2 = mixing_matrix[1,0] * sources[:,0]
# pl.plot(rerec1)
# pl.plot(rerec2)
# pl.show()
mixing_matrix[:,1] = 0
resymphony = sp.dot(mixing_matrix,sources.T)
print shape(resymphony)
pl.plot(resymphony[0,:])
pl.plot(resymphony[1,:])
pl.show()
(2, 5000)
If the first component is the sinusoid signal, the second recording should have a lower amplitude (one third of the first). If it is the square wave the second recording should have a higher amplitude (10% higher than the first). Using ICA this way, signals that are not interesting to you can be removed from data.