Vision:
Imagine a surveillance system that listens for keywords (for example: airports) and shows, on a camera recording, where in a distant crowd the words are coming from…
The idea:
is to add extra dimensions to the short-time Fourier transform (STFT), leading to a time-frequency-space representation. Why? To assign spectral bins not only to time instants but also to points in space. Such a multidimensional representation decomposes the signal over frequency (DFT), time (STFT) and “space”, which leads to the definition of the Short-Time Spatial Fourier Transform (ST-SpFT).
How?
It is important to note that the Spatial Fourier Transform (SFT) can only be applied to multi-channel audio whose channels are very precisely synchronised. One way to obtain this multidimensional decomposition is through WEIGHTING THE FFT OF INDEPENDENT MICROPHONE CHANNELS by rotating their phase vectors according to the scanned direction-of-arrival (DoA) candidates.
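The phase-rotation idea above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not part of the original idea: a linear microphone array with known positions `mic_x`, a far-field source, a speed of sound of 343 m/s, and a plain Hann-windowed STFT. The function names `stft` and `st_spft` are made up for this sketch. For every candidate angle, the STFT of each channel is multiplied by a phase term that undoes the delay that angle would cause, and the channels are then averaged.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def st_spft(channels, fs, mic_x, angles_deg, n_fft=512, hop=256):
    """Time-frequency-DoA decomposition (hypothetical sketch).

    channels   : array (n_mics, n_samples), sample-synchronised recordings
    mic_x      : microphone positions along a line, in metres (assumed known)
    angles_deg : candidate directions of arrival to scan
    returns    : complex array of shape (frames, bins, angles)
    """
    X = np.stack([stft(ch, n_fft, hop) for ch in channels])  # (mics, frames, bins)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)               # (bins,)
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Far-field delay of each microphone for each candidate angle.
    tau = np.outer(mic_x, np.sin(theta)) / C                 # (mics, angles)
    # Phase rotation that compensates this delay in every frequency bin.
    steer = np.exp(2j * np.pi * freqs[None, :, None] * tau[:, None, :])
    # Rotate each channel and average over microphones.
    return np.einsum('mtf,mfa->tfa', X, steer) / len(mic_x)
```

With this sign convention, a source at angle θ reaches the microphone at position x delayed by x·sin(θ)/c, so the magnitude of the output is largest at the candidate angle that matches the true direction. Scanning every angle for every bin is the brute-force version; a fast implementation is a separate question.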
Applications:
- (NOT BLIND!) source separation (we know the spectrum of the sound and its origin in space)
- Acoustic source localisation
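Both applications can be read off a time-frequency-DoA array. The sketch below assumes such an array `tfd` of shape (frames, bins, angles) is already available. The helper names `localise` and `direction_mask` are hypothetical, and binary masking by dominant direction is just one simple stand-in for the separation step, not necessarily the intended method.

```python
import numpy as np


def localise(tfd, angles_deg):
    """Per-frame DoA estimate: the angle with the most energy summed over frequency."""
    power = np.abs(tfd) ** 2
    spatial_spectrum = power.sum(axis=1)  # (frames, angles)
    return np.asarray(angles_deg)[spatial_spectrum.argmax(axis=1)]


def direction_mask(tfd, angles_deg, target_deg, tol_deg=10.0):
    """Binary time-frequency mask keeping bins whose dominant DoA is near the target."""
    dominant = np.asarray(angles_deg)[(np.abs(tfd) ** 2).argmax(axis=2)]  # (frames, bins)
    return np.abs(dominant - target_deg) <= tol_deg
```

The mask has shape (frames, bins) and could be applied to the STFT of a reference channel before inverting it, which is where knowing the origin of the sound in space makes the separation non-blind.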
The challenges:
- one is to implement the decomposition in a way that provides clear time-frequency-space (simply: time-frequency-DoA (TFD)) atoms
- another one is the fast implementation of this TFD decomposition
Possible derivatives of the idea:
- Spatial Wavelet Transform (replace FFT with DWT)
- Spatial Harmonic Chirp Transform (possibly a spatial Short-Time Harmonic Chirp Transform)
Nice images and code coming soon.
RELATED PROJECTS:
1) MarPanning from Marsyas provides a visualisation of spectral bins across the “pan” space, derived from stereo musical recordings. MarPanning seems to use the angle information of cross-spectrum bins, meaning that every frequency bin is assigned ONLY TO ONE SPATIAL (Left-Right) position.
Nice video here.






