The framewise log-magnitude spectra reconstruction procedure above is described in [1]. We implemented it as shown using an autoencoder, and observed that the reconstruction does not generalize to pitches the network has not been trained on. To demonstrate this, we train the network in two configurations, including and excluding MIDI 63, each time together with its 3 neighbouring pitches on either side:
(a) Including MIDI 63
MIDI | 60 | 61 | 62 | 63 | 64 | 65 | 66 |
---|---|---|---|---|---|---|---|
Kept | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
(b) Excluding MIDI 63
MIDI | 60 | 61 | 62 | 63 | 64 | 65 | 66 |
---|---|---|---|---|---|---|---|
Kept | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\times$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
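The two training configurations in the tables above can be written down as a short sketch. The helper names `training_pitches` and `midi_to_hz` are hypothetical and not part of the original implementation; the frequency conversion assumes standard equal-temperament tuning with A4 (MIDI 69) at 440 Hz.

```python
from typing import List


def midi_to_hz(midi_note: int) -> float:
    """Convert a MIDI note number to Hz (assumes A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


def training_pitches(target: int = 63, neighbours: int = 3,
                     include_target: bool = True) -> List[int]:
    """Return the MIDI pitches kept for training (hypothetical helper).

    The set spans `neighbours` pitches on either side of `target`;
    the target itself is dropped when `include_target` is False.
    """
    pitches = list(range(target - neighbours, target + neighbours + 1))
    if not include_target:
        pitches.remove(target)
    return pitches


# Case (a): MIDI 63 is part of the training set.
case_a = training_pitches(include_target=True)
# Case (b): MIDI 63 is held out, only its neighbours are seen in training.
case_b = training_pitches(include_target=False)

print("(a)", case_a)
print("(b)", case_b)
print("MIDI 63 fundamental: %.2f Hz" % midi_to_hz(63))
```

In case (b) the closest pitches seen during training are MIDI 62 and 64, one semitone on either side of the held-out note.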
We then feed a MIDI 63 note as input to the networks trained in both cases above, and listen to how well each one reconstructs it:
import IPython.display as ipd
print('Input MIDI 63 Note')
ipd.display(ipd.Audio('./ex/D#_19_og.wav'))
print('(a) Reconstructed MIDI 63 Note when trained Including MIDI 63')
ipd.display(ipd.Audio('./ex/D#_19_trained_recon_stft_AE.wav'))
print('(b) Reconstructed MIDI 63 Note when trained Excluding MIDI 63')
ipd.display(ipd.Audio('./ex/D#_19_skipped_recon_stft_AE.wav'))