VaPar Synth - A Variational Parametric Model for Audio Synthesis

Krishna Subramani$^{1}$, Preeti Rao$^{1}$, Alexandre D'Hooge$^{2}$

IIT Bombay$^{1}$ , ENS Paris-Saclay$^{2}$

This accompanying notebook contains audio examples and illustrations supporting our paper

Current Method in Literature:

Paper Flowchart

The framewise log-magnitude spectral reconstruction procedure shown above is described in [1]. We implemented it with an autoencoder, and observe that the reconstruction does not generalize to pitches the network has not been trained on. To demonstrate this, we train the network on MIDI 63 and its three neighbouring pitches on either side (MIDI 60&ndash;66), once including MIDI 63 and once excluding it:
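As background, the input representation is a framewise log-magnitude spectrum computed from the waveform. A minimal sketch of that feature extraction (plain NumPy, with illustrative frame and FFT sizes that are our assumptions, not necessarily those of [1]):

```python
import numpy as np

def frame_log_mag(x, n_fft=1024, hop=256):
    """Framewise log-magnitude spectra: window, FFT, log-compress."""
    n_frames = 1 + (len(x) - n_fft) // hop
    win = np.hanning(n_fft)
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(mag)          # log(1 + |X|) keeps values finite at 0

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 311.13 * t)   # MIDI 63 (D#4) is about 311.13 Hz
spec = frame_log_mag(x)
print(spec.shape)                    # (59, 513): frames x frequency bins
```

Each row of `spec` is one training example for the autoencoder.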

(a) Including MIDI 63

| MIDI | 60 | 61 | 62 | 63 | 64 | 65 | 66 |
|------|----|----|----|----|----|----|----|
| Kept | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |

(b) Excluding MIDI 63

| MIDI | 60 | 61 | 62 | 63 | 64 | 65 | 66 |
|------|----|----|----|----|----|----|----|
| Kept | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\times$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |

We then give MIDI 63 as input in both cases and listen to how well the network reconstructs it.
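To make the held-out experiment concrete, here is a toy sketch of case (b) using a linear bottleneck (PCA via SVD) as a stand-in for the paper's autoencoder, and synthetic harmonic tones as a stand-in for the dataset; all sizes and the 32-dimensional latent are our assumptions:

```python
import numpy as np

def midi_to_hz(m):
    return 440.0 * 2 ** ((m - 69) / 12)

def log_mag_frames(f0, sr=16000, n_fft=1024, hop=256, dur=0.5):
    # Toy harmonic tone -> framewise log-magnitude spectra
    t = np.arange(int(sr * dur)) / sr
    x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))
    n = 1 + (len(x) - n_fft) // hop
    win = np.hanning(n_fft)
    F = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n)])
    return np.log1p(np.abs(np.fft.rfft(F, axis=1)))

# Case (b): train with MIDI 63 excluded
train_midis = [60, 61, 62, 64, 65, 66]
X = np.vstack([log_mag_frames(midi_to_hz(m)) for m in train_midis])
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V = Vt[:32].T                       # 32-dim linear "bottleneck"

# Encode-decode the unseen pitch and measure reconstruction error
test = log_mag_frames(midi_to_hz(63))
recon = (test - mu) @ V @ V.T + mu
err = np.mean((recon - test) ** 2)
```

In this linear analogue, the held-out pitch's harmonic peaks fall between those spanned by the training pitches, so the reconstruction error is noticeably worse than for a trained pitch; the paper's autoencoder exhibits the same failure mode.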

In [1]:
import IPython.display as ipd

# Play the original note, then the reconstructions from cases (a) and (b)
print('Input MIDI 63 Note')
ipd.display(ipd.Audio('./ex/D#_19_og.wav'))
print('(a) Reconstructed MIDI 63 Note when trained Including MIDI 63')
ipd.display(ipd.Audio('./ex/D#_19_trained_recon_stft_AE.wav'))
print('(b) Reconstructed MIDI 63 Note when trained Excluding MIDI 63')
ipd.display(ipd.Audio('./ex/D#_19_skipped_recon_stft_AE.wav'))
Input MIDI 63 Note
(a) Reconstructed MIDI 63 Note when trained Including MIDI 63