Spectrum Interpolation Synthesis for the Compression of Musical Signals

This demonstration accompanies the M.Tech. dissertation by Ashish Kumar Malot, carried out under the guidance of Prof. Preeti Rao and Prof. V. M. Gadre. The title and abstract of the thesis are given below, followed by the results.

Title: MPEG-4 Structured Audio: A Compression Model for Harmonic Musical Sounds
Abstract
In our study we have investigated algorithmic sound models for low bit rate coding of musical signals. We have presented a technique for completely automatic analysis/synthesis of harmonic signals. The technique uses the spectral interpolation synthesis (SIS) model for harmonic signals. The parameters needed for sound synthesis using the SIS model are the fundamental frequency and the amplitude spectrum of each pitch period. We have presented effective methods for automatic pitch extraction, spectrum estimation and synthesis of harmonic musical signals. A pitch-adaptive frequency-domain estimation procedure is applied to the input music signal to extract the pitch contour and the harmonic spectrum amplitudes for each period. An algorithm for the smoothing of the pitch contour to eliminate coarse errors arising at note transitions has been proposed. A modified error criterion has been proposed for the selection of spectra for synthesis. The signal is synthesized on a period-by-period basis by adjusting the phases of the contributing harmonics to ensure continuity and absence of audible artifacts across pitch transitions. The technique has been applied to several examples of segments from wind and bowed string instruments. It is found that typically a perceived quality matching the original is obtained even when large portions of the waveform are generated by interpolation, implying a high degree of compression. Further, there is a graceful degradation in quality as the extent of interpolation is increased, which makes the model well suited for use in a scalable coding framework.
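The period-by-period synthesis step described above can be sketched in code. This is a minimal illustration, not the thesis implementation: it assumes each period is given by a fundamental frequency and a list of harmonic amplitudes, and it carries a running phase per harmonic across period boundaries so the waveform stays continuous at pitch transitions. The function name and parameters are hypothetical.

```python
import numpy as np

def synthesize_sis(f0_contour, amp_spectra, fs=16000.0):
    """Sketch of SIS-style period-by-period additive synthesis.

    f0_contour  : fundamental frequency (Hz) of each pitch period
    amp_spectra : per-period lists of harmonic amplitudes
    The phase of each harmonic is carried over from the end of the
    previous period so that the waveform is continuous across pitch
    transitions (a simplification of the method in the thesis).
    """
    out = []
    phases = {}  # running phase, keyed by harmonic index
    for f0, amps in zip(f0_contour, amp_spectra):
        n = int(round(fs / f0))        # samples in this pitch period
        t = np.arange(n) / fs
        period = np.zeros(n)
        for k, a in enumerate(amps, start=1):
            phi = phases.get(k, 0.0)
            period += a * np.cos(2 * np.pi * k * f0 * t + phi)
            # advance the running phase to the end of this period
            phases[k] = (phi + 2 * np.pi * k * f0 * n / fs) % (2 * np.pi)
        out.append(period)
    return np.concatenate(out)
```

Carrying the phase forward, rather than restarting each period at zero phase, is what avoids waveform discontinuities when the pitch (and hence the period length) changes from one period to the next.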

Performance of Pitch Contour Smoothing

Sr. No.   Name (*.wav)   Pitch contour before and after smoothing
1         flute          [plot]
2         clarinet       [plot]
3         saxophone      [plot]
4         trumpet        [plot]
5         flute1         [plot]

Performance of SIS in Scalable Framework

BL stands for the base layer.

EL stands for the enhancement layer.

R is the ratio of the number of spectra selected for synthesis to the total number of spectra, expressed as a percentage.

SNR is the segmental SNR computed with respect to the waveform synthesized using all the spectra.
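The segmental SNR figure used in the table below can be illustrated as follows. This is a generic sketch: the segment length of 256 samples is an assumption, as the value used in the thesis is not stated here. The reference is the waveform synthesized from all the spectra, and the test signal is the waveform synthesized from the reduced (BL/EL) spectrum set.

```python
import numpy as np

def segmental_snr(reference, synthesized, seg_len=256):
    """Segmental SNR in dB: the mean of the per-segment SNRs.

    `reference`   : waveform synthesized using all the spectra
    `synthesized` : waveform from the reduced spectrum set
    seg_len is an assumed segment length, not the thesis value.
    """
    n = min(len(reference), len(synthesized)) // seg_len * seg_len
    ref = np.asarray(reference[:n], dtype=float).reshape(-1, seg_len)
    syn = np.asarray(synthesized[:n], dtype=float).reshape(-1, seg_len)
    sig = np.sum(ref ** 2, axis=1)          # per-segment signal energy
    err = np.sum((ref - syn) ** 2, axis=1)  # per-segment error energy
    eps = 1e-12  # guard against silent or error-free segments
    snr_db = 10.0 * np.log10((sig + eps) / (err + eps))
    return float(np.mean(snr_db))
```

Averaging per-segment SNRs, rather than computing one global SNR, weights quiet passages and loud passages more evenly, which is why segmental SNR is the usual objective measure for speech and audio waveform coders.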


Sr. No.  Name (*.wav)  Layer       R (%)  SNR (dB)  Synthesized waveform (*.wav)
1        flute         BL+EL1+EL2  10.96  28.61     s_flutebpe12
                       BL+EL1       7.80  24.44     s_flutebpe1
                       BL           4.64  20.37     s_fluteb
2        clarinet      BL+EL1+EL2   1.98  34.14     s_clarinetbpe12
                       BL+EL1       1.46  31.39     s_clarinetbpe1
                       BL           0.95  28.75     s_clarinetb
3        saxophone     BL+EL1+EL2   6.29  29.26     s_saxophonebpe12
                       BL+EL1       4.29  25.49     s_saxophonebpe1
                       BL           2.28  21.81     s_saxophoneb
4        trumpet       BL+EL1+EL2  15.64  27.59     s_trumpetbpe12
                       BL+EL1      11.00  23.96     s_trumpetbpe1
                       BL           6.37  19.90     s_trumpetb
5        flute1        BL+EL1+EL2  17.47  29.66     s_flute1bpe12
                       BL+EL1      12.51  24.59     s_flute1bpe1
                       BL           7.55  19.11     s_flute1b