# VaPar Synth - A Variational Parametric Model for Audio Synthesis

## Current Method in Literature

The above framewise log-magnitude spectra reconstruction procedure is described in [1]. We implement it using an autoencoder and observe that the reconstruction does not generalize to pitches the network has not been trained on. For demonstration, we train the network twice on MIDI 63 and its three neighbouring pitches on either side (MIDI 60–66), once including and once excluding MIDI 63:

(a) Including MIDI 63

| MIDI | 60 | 61 | 62 | 63 | 64 | 65 | 66 |
|------|----|----|----|----|----|----|----|
| Kept | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |

(b) Excluding MIDI 63

| MIDI | 60 | 61 | 62 | 63 | 64 | 65 | 66 |
|------|----|----|----|----|----|----|----|
| Kept | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\times$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
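The setup above can be sketched as follows. This is a minimal illustration, not the paper's architecture: the dense layer sizes (513 → 32 → 513) and the random, untrained weights are assumptions chosen only to show the shape of a framewise log-magnitude autoencoder and the two training splits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dense autoencoder over one log-magnitude spectral frame.
# 513 FFT bins -> 32-dim bottleneck -> 513 bins (sizes are assumptions).
n_bins, n_latent = 513, 32
W_enc = rng.standard_normal((n_bins, n_latent)) * 0.01
W_dec = rng.standard_normal((n_latent, n_bins)) * 0.01

def encode(frame):
    """Compress one log-magnitude frame into the latent code."""
    return np.tanh(frame @ W_enc)

def decode(code):
    """Map a latent code back to a log-magnitude frame."""
    return code @ W_dec

frame = rng.standard_normal(n_bins)   # stand-in for a real spectral frame
recon = decode(encode(frame))
print(recon.shape)                    # (513,)

# The two training splits from tables (a) and (b):
midi = list(range(60, 67))                 # MIDI 60..66
train_a = midi                             # (a) keep all pitches
train_b = [m for m in midi if m != 63]     # (b) hold out MIDI 63
print(train_b)                             # [60, 61, 62, 64, 65, 66]
```

In the real experiment the weights are of course learned by minimizing reconstruction error over the training frames; the sketch only fixes the data flow.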

We then give MIDI 63 as input to both of the above networks and compare how well each reconstructs it.

In [1]:
import IPython.display as ipd

# Play the original note and the reconstructions from cases (a) and (b)
print('Input MIDI 63 Note')
ipd.display(ipd.Audio('./ex/D#_19_og.wav'))
print('(a) Reconstructed MIDI 63 Note when trained Including MIDI 63')
ipd.display(ipd.Audio('./ex/D#_19_trained_recon_stft_AE.wav'))
print('(b) Reconstructed MIDI 63 Note when trained Excluding MIDI 63')
ipd.display(ipd.Audio('./ex/D#_19_skipped_recon_stft_AE.wav'))
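Beyond listening, the comparison can be made quantitative. The sketch below scores a reconstruction by the mean squared error between framewise log-magnitude spectra; the frame length, hop size, and the synthetic sine-wave stand-ins (MIDI 63 / D#4 ≈ 311.13 Hz) are assumptions for illustration, not the paper's evaluation.

```python
import numpy as np

def log_mag_frames(x, frame_len=1024, hop=256, eps=1e-7):
    """Framewise log-magnitude spectrum of a 1-D signal (assumed sizes)."""
    frames = [x[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(x) - frame_len + 1, hop)]
    return np.log(np.abs(np.fft.rfft(frames, axis=-1)) + eps)

def spectral_mse(x, y):
    """Mean squared error between two signals' log-magnitude spectra."""
    return float(np.mean((log_mag_frames(x) - log_mag_frames(y)) ** 2))

# Synthetic stand-ins: a perfect reconstruction vs. a detuned one
t = np.linspace(0, 1, 16000, endpoint=False)
ref = np.sin(2 * np.pi * 311.13 * t)          # MIDI 63 (D#4)
perfect = spectral_mse(ref, ref)
detuned = spectral_mse(ref, np.sin(2 * np.pi * 330.0 * t))
print(perfect < detuned)                      # True
```

Applied to the audio files above, the case (b) reconstruction would be expected to score a markedly higher error than case (a), mirroring what the listening test shows.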
