Here is an example of the 2-dimensional latent space we obtained when training on brass and organ sounds from NSynth.
Note: All sounds are sustained, both those used for training and those sampled from the network.
CAUTION: Some of the sounds can be loud; adjust your volume accordingly!
We play examples of sounds sampled from each cluster of the latent space, along with similar sounds from the training set for comparison.
The network was trained only on odd MIDI pitches, and we generate samples conditioned on even MIDI pitches, which it never saw during training. (The latent space differs from the previous one but has a very similar structure.)
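To make the odd/even split concrete, here is a minimal sketch of how the pitches could be partitioned. The pitch range is a hypothetical choice for illustration (the range actually used is not stated here), and the names `midi_to_hz`, `train_pitches` and `heldout_pitches` are ours. Only the MIDI-to-frequency conversion is standard (equal temperament, A4 = MIDI 69 = 440 Hz).

```python
def midi_to_hz(pitch: int) -> float:
    """Equal-temperament conversion with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12)


# Hypothetical pitch range, chosen only for illustration.
pitches = range(48, 73)

# Odd pitches are used for training; even pitches are held out and
# only used as conditioning values when sampling.
train_pitches = [p for p in pitches if p % 2 == 1]
heldout_pitches = [p for p in pitches if p % 2 == 0]

for p in heldout_pitches[:3]:
    print(f"held-out MIDI pitch {p}: {midi_to_hz(p):.2f} Hz")
```

Each held-out pitch lies one semitone away from its nearest training pitches, so conditioning on it asks the network to fill in between pitches it has seen rather than to extrapolate far outside the training range.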
Final note: For now, the sampled sounds should be considered of subpar quality. These are early results, merely a proof of concept of our network, which reduces the audio frame representation to only 2 dimensions. What we want to emphasize is how smooth the interpolation between the two sounds is, and that the network produces a somewhat consistent timbre even across a continuous frequency sweep.
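One way to picture the interpolation is as a walk along a straight line between two points of the 2-dimensional latent space, decoding one sound per step. The sketch below only computes the latent path. It assumes linear interpolation, which is our assumption and not necessarily the scheme used for the examples, and the coordinates `z_brass` and `z_organ` are hypothetical. The decoder itself is not shown.

```python
def interpolate_latent(z_start, z_end, steps):
    """Return `steps` points evenly spaced on the segment from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 to 1.0 inclusive
        path.append(tuple((1 - t) * a + t * b for a, b in zip(z_start, z_end)))
    return path


# Hypothetical 2-D latent coordinates for one brass and one organ sound.
z_brass = (-1.0, 0.5)
z_organ = (1.0, -0.5)

path = interpolate_latent(z_brass, z_organ, steps=5)
for z in path:
    print(z)
```

Each point in `path` would then be passed to the decoder, together with the pitch condition, to produce one step of the morph between the two timbres.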