Short-Time Chirp Transform

December 7, 2008

Problem:
The FFT representation of harmonic lines is smeared when the pitch changes within the analysis window.

Method Description:
Replace the harmonics in the Fourier transform with chirps.

stft

stcht2

Replacing the harmonics in the Fourier transform with properly designed chirps provides a new orthogonal transform, which significantly outperforms the Fourier transform in cases like the one mentioned above. The resulting gain in time-frequency (T-F) representation can be put to good use in speech enhancement and other methods, as discussed in our papers.
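To make the smearing effect concrete, here is a minimal NumPy sketch. It is not the implementation from the papers: it assumes a linear warping of time, phi(t) = (1 + alpha*t/2)*t, in the spirit of the fan-chirp formulation of [6], and the function name `chirp_spectrum`, the chirp rate `alpha` and the test signal are all hypothetical. With `alpha = 0` the same function reduces to a plain windowed Fourier analysis, so the two results can be compared directly.

```python
import numpy as np

def chirp_spectrum(x, fs, freqs, alpha):
    """Correlate one frame with chirps whose frequency evolves as f * (1 + alpha * t).

    alpha = 0 reduces to a plain (windowed) Fourier analysis of the frame.
    The warping phi(t) is an assumption of this sketch.
    """
    n = len(x)
    t = (np.arange(n) - n / 2) / fs          # time axis centred on the frame
    phi = (1.0 + 0.5 * alpha * t) * t        # warped time (assumed form)
    dphi = 1.0 + alpha * t                   # derivative of phi, positive here
    weighted = x * np.hanning(n) * np.sqrt(np.abs(dphi))
    kernel = np.exp(-2j * np.pi * np.outer(freqs, phi))
    return np.abs(kernel @ weighted)

# Synthetic voiced frame: 10 harmonics of a fundamental that rises linearly.
fs, n = 8000, 512
t = (np.arange(n) - n / 2) / fs
f0, alpha = 200.0, 4.0                       # hypothetical pitch (Hz) and chirp rate (1/s)
phase = 2 * np.pi * f0 * (1.0 + 0.5 * alpha * t) * t
x = sum(np.cos(k * phase) for k in range(1, 11))

freqs = np.arange(0.0, 2500.0, 5.0)
fourier = chirp_spectrum(x, fs, freqs, 0.0)    # harmonics smear, the higher the worse
chirped = chirp_spectrum(x, fs, freqs, alpha)  # harmonics collapse back into lines

idx = int(np.argmin(np.abs(freqs - 10 * f0)))  # position of the 10th harmonic
print(fourier[idx], chirped[idx])
```

In this toy case the chirp rate is known in advance. In practice it has to be estimated for every frame, which this sketch does not attempt.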

References:
[1] Weruaga, L. and Képesi, M.: “Speech analysis with the Short-time Chirp transform”, 8th European Conf. on Speech, EUROSPEECH 2003, Geneva, Sept 2003, vol. I, pp. 53-56.
[2] Képesi, M. and Weruaga, L.: “Speech Analysis with the Fast Chirp Transform,” EUSIPCO 2004, the 12th European Signal Processing Conference, Wien, Austria, 7-10 September 2004.
[3] Weruaga, L. and Képesi, M.: “EM-driven Stereo-like Gaussian Chirplet Mixture Estimation”, ICASSP 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing, March 19-23, vol. IV, pp. 473-476, 2005, Philadelphia, USA.
[4] L. Weruaga and M. Képesi, “Self-organizing chirp-sensitive artificial auditory cortical model,” Interspeech 2005, pp. 705-708, Lisboa (P), Sep. 2005.
[5] M. Képesi, L. Weruaga, “Adaptive chirp-based time-frequency analysis of speech signals”, Speech Comm., vol.48, pp. 474-492, 2006.
[6] L. Weruaga, M. Képesi, “The fan-chirp transform for non-stationary harmonic sounds”, Signal Proc., vol. 87, pp. 1504-1522, 2007.

[7] R. Dunn, T. F. Quatieri, “Sinewave Analysis/Synthesis Based on the Fan-Chirp Transform,” IEEE Workshop on Applications of Signal Processing to …, 2007.
[8] Maciej Bartkowiak, “Application of the Fan-Chirp Transform to Hybrid Sinusoidal+Noise Modeling of Polyphonic Audio,” EUSIPCO 2008, Lausanne, Switzerland.
[9] Pei Zhao; Zhiping Zhang; Xihong Wu, “Monaural speech separation based on multi-scale Fan-Chirp Transform,” Acoustics, Speech and Signal Processing, 2008. ICASSP 2008, March 31 2008-April 4 2008 Page(s):161 – 164
[10] Ha Nguyen, Luis Weruaga: “Time–Frequency Analysis of Vietnamese Speech Inspired on Chirp Auditory Selectivity,” Book Series Lecture Notes in Computer Science, pp. 284-295, Springer Berlin / Heidelberg, Volume 5351/2008, 2008, ISBN 978-3-540-89196-3.

[11] P. Cancela, E. Lopez, M. Rocamora, “Fan Chirp Transform for Music Representation,” DAFx 2010.

Auditory Feedback for Simulating Attention

December 7, 2008

The Trigger:
A need for a pitch estimation method that keeps track of the speaker of interest. In other words, a method that knows what to follow, so that it does not lose the pitch trajectory even when other speakers are active in the background (cocktail party effect).

Method Description:
“What to follow” was the main question… I took the auditory model-based pitch estimation method, whose block diagram is depicted below..

auditory_pitch_estim

.. and designed the “Enhance + summa” module so that it can accept feedback from the estimated pitch value and boost the estimate if it was correct:

enhance_plus_sum

The block diagram clearly indicates that the answer to the “what to follow?” question is the “formant envelopes”, sampled at the estimated pitch value. Below is an example showing the internal states of the estimation module while it enhances the channels belonging to one speaker, with another active speaker in the background.

processing_example
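The feedback idea can be illustrated with a small NumPy sketch. This is only one reading of the block diagram, not the published algorithm: each auditory channel is represented by its normalised autocorrelation, the channel weights are taken from the autocorrelation value at the previously estimated pitch lag, and the weighted channels are summed before the pitch is picked. The function names, the weight floor and the synthetic two-speaker channels are all assumptions made for the illustration.

```python
import numpy as np

def channel_acf(channels, max_lag):
    """Normalised autocorrelation of every channel (one row each), lags 0..max_lag."""
    n = channels.shape[1]
    acf = np.empty((channels.shape[0], max_lag + 1))
    for lag in range(max_lag + 1):
        acf[:, lag] = np.sum(channels[:, :n - lag] * channels[:, lag:], axis=1)
    return acf / acf[:, :1]

def enhance_and_sum(acf, prev_lag, floor=0.05):
    """Weight each channel by its periodicity at the fed-back pitch lag, then sum."""
    if prev_lag is None:                       # no feedback yet: all channels count equally
        weights = np.ones(acf.shape[0])
    else:
        weights = np.maximum(acf[:, prev_lag], floor)
    return weights @ acf, weights

def pick_lag(summary, min_lag):
    """Pitch lag = position of the largest peak above min_lag."""
    return min_lag + int(np.argmax(summary[min_lag:]))

# Toy mixture: 3 channels dominated by speaker A (100 Hz), 4 by speaker B (160 Hz).
fs = 8000
n = np.arange(400)

def harmonics(f0, ks):
    return sum(np.cos(2 * np.pi * f0 * k * n / fs) for k in ks)

channels = np.stack([
    harmonics(100, (2, 3)), harmonics(100, (4, 5)), harmonics(100, (6, 7)),
    harmonics(160, (1, 2)), harmonics(160, (3, 4)),
    harmonics(160, (5, 6)), harmonics(160, (7, 8)),
])
acf = channel_acf(channels, max_lag=200)

summary_free, _ = enhance_and_sum(acf, prev_lag=None)
lag_free = pick_lag(summary_free, min_lag=20)        # follows the dominant speaker B

summary_tracked, weights = enhance_and_sum(acf, prev_lag=fs // 100)
lag_tracked = pick_lag(summary_tracked, min_lag=20)  # stays locked on speaker A

print(fs / lag_free, fs / lag_tracked)
```

Without feedback the summed autocorrelation follows whichever speaker occupies more channels. With the previous pitch fed back, the channels that are periodic at that lag dominate the sum and the tracker keeps its target.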

References:
[1] Képesi, M.: “Auditory Model-Based Tracking of Mixed Acoustic sources,” Proc. of SPRA 2003, Rhodos, Greece 2003.

Links:
More detailed description here

Binary Spectrogram Mapping

December 7, 2008

The idea:
In 1999, after implementing the RASTA speech enhancement method on a Motorola DSP, I started thinking about modifying the spectrogram not only along the time axis, but by creating time-frequency islands (masks) that represent speech activity in the time-frequency plane. The shape of such islands can be arbitrary, but they must be continuous, as they represent formant evolutions in the TF space.

Method description:
The method consists of 6 steps:
1. Transforming the 1D speech signal into a 2D spectrogram (i.e. a time-frequency-energy representation).
2. Continuous evaluation of the signal-to-noise ratio (SNR) and voice activity detection (VAD) in every FFT frequency band.
3. Creating the island-like continuous binary mask based on the common VAD onsets and offsets of several frequency bands.

image024
4. Smoothing the binary mask by 2D filtering of the island edges.

image026
5. Weighting the original spectrogram by the created mask, which (hopefully) represents the speech part in TF.
6. Re-synthesis of the 1D speech signal from the spectrogram by using the IFFT and overlap-add (OLA).

image042
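The six steps above can be sketched end to end in a few lines of NumPy. This is a simplified illustration, not the original implementation: a per-bin SNR threshold stands in for the band-based VAD, the island forming from common onsets and offsets is not reproduced, and the noise level is estimated from the first frames, which are assumed to contain no speech. The frame length, hop, threshold and function names are hypothetical choices.

```python
import numpy as np

N_FFT, HOP = 256, 64                       # hypothetical frame length and hop

def stft(x):
    """Step 1: 1D signal -> 2D spectrogram (frames x bins)."""
    win = np.hanning(N_FFT + 1)[:-1]       # periodic Hann window
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def binary_mask(spec, noise_frames=20, snr_db=6.0):
    """Steps 2-3 (simplified): per-bin SNR against a noise estimate, then threshold."""
    power = np.abs(spec) ** 2
    noise = power[:noise_frames].mean(axis=0, keepdims=True)  # assumes speech-free start
    snr = 10.0 * np.log10((power + 1e-12) / (noise + 1e-12))
    return (snr > snr_db).astype(float)

def smooth_mask(mask, size=3):
    """Step 4: 2D moving average that softens the island edges."""
    pad = size // 2
    padded = np.pad(mask, pad, mode="edge")
    out = np.zeros_like(mask)
    for dt in range(size):
        for df in range(size):
            out += padded[dt:dt + mask.shape[0], df:df + mask.shape[1]]
    return out / size ** 2

def istft(spec, length):
    """Step 6: IFFT of every frame followed by overlap-add."""
    out = np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(spec, n=N_FFT, axis=1)):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out / (N_FFT / (2 * HOP))       # overlapped Hann windows sum to this constant

# Toy input: white noise with a 1 kHz tone burst standing in for speech.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
tone = np.where((t >= 0.4) & (t < 0.8), np.sin(2 * np.pi * 1000 * t), 0.0)
x = 0.1 * rng.standard_normal(fs) + tone

spec = stft(x)
mask = smooth_mask(binary_mask(spec))
y = istft(spec * mask, len(x))             # step 5: weight the spectrogram, then re-synthesise
```

Plain overlap-add without a synthesis window is used here because the Hann analysis windows at a hop of a quarter frame already sum to a constant, so an all-ones mask returns the input unchanged away from the signal edges.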

Credits:
Martin Plšek (BUT), for implementing the band-based VAD in 2000.

References:
This method was first published in 2000 at the TSP conference under the name “Spectrogram Mapping Method (SMM)”. Since then, other teams have targeted the same problem and come up with similar solutions under different names (YY, ZZ).

[1] Képesi, M. – Plšek, M.: “One-Channel Speech Separation Using Spectrogram Modifications,” Proc. of the Czech-German Workshop on speech processing, Prague, September 2001, pp.75-7x, ISBN 8086269078.
[2] Képesi, M. Macku, J.: “One-Channel Speech Separation Techniques,” In Proc. of Telecommunications and Signal Processing 2000, Brno, 6-7. 9.2000, ISBN:8072041614, pp.130-133.

Related Links:
DESCRIPTION and audio examples

Hello world!

December 7, 2008

The aim of these pages is to present the ideas I have been working on since 1998, including my stay at BUT, ftw and SPSC. These research activities are from the field of Digital Signal Processing (mostly Digital Speech Processing), and are related to the following topics:

- speech signal quality enhancement (noise suppression),
- high-resolution time-frequency representations,
- high-resolution pitch estimation, and
- new signal representations for speaker localization and acoustic beam-forming, in the 3D Time-Frequency-Space.

Posts will be refined over time, and links will be added with related activities of other teams and related open-source applications and toolboxes.

