Impressive as a demo, but ultimately useless for musicians because of the audio quality (8-bit mu-law encoding). The artifacts are audible in the examples, especially in the bass.
Granted, it's difficult to get much beyond the 8-bit output layer. Without a manageable output layer the network would be impossible to train, and 256 states is already pushing it. However, it should be possible to make better use of the 8 bits.
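For reference, here is a minimal sketch of what an 8-bit mu-law output means: each sample is companded logarithmically and then mapped to one of 256 codes. The function names `mulaw_encode` and `mulaw_decode` are my own, and this is the textbook continuous mu-law formula, not necessarily the exact preprocessing used in the demo.

```python
import math

MU = 255  # 8-bit mu-law: 256 output states

def mulaw_encode(x: float) -> int:
    """Map a sample in [-1, 1] to an integer code in 0..255."""
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    # Logarithmic companding: fine steps near zero, coarse steps near full scale.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * MU))

def mulaw_decode(code: int) -> float:
    """Invert the companding and return a sample in [-1, 1]."""
    y = 2.0 * code / MU - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Round-trip error is much larger for loud samples than for quiet ones.
for sample in (0.001, 0.1, 0.9):
    restored = mulaw_decode(mulaw_encode(sample))
    print(sample, abs(restored - sample))
```

The error grows with amplitude, which is the trade-off the 256 states buy: quiet passages stay fairly clean, while loud material gets coarse steps.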
For instance, a differential encoding/noise shaping filter run at a higher sampling rate (88.2 kHz) would shift most of the compression artifacts to frequencies above the audible range. Since the noise shaping filter is linear, it can be embedded in the NN in a straightforward way.
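To make the idea concrete, here is a rough sketch of first-order noise shaping (error feedback) at 88.2 kHz. It is only an illustration under simplifying assumptions of mine: a uniform 8-bit quantizer instead of mu-law, a 440 Hz test tone, no neural network at all, and a 16-tap moving average as a crude stand-in for "the low frequencies". All names are hypothetical.

```python
import math

FS = 88_200          # oversampled rate suggested above (Hz)
LEVELS = 256         # 8-bit output
STEP = 2.0 / LEVELS  # uniform step over [-1, 1]; illustration only, not mu-law

def quantize(v: float) -> float:
    """Round to the nearest uniform 8-bit level."""
    return round(v / STEP) * STEP

def plain(x):
    """Quantize each sample independently."""
    return [quantize(v) for v in x]

def noise_shaped(x):
    """First-order error feedback: output noise becomes e[n] - e[n-1]."""
    out, err = [], 0.0
    for v in x:
        u = v - err      # subtract the previous quantization error
        y = quantize(u)
        err = y - u      # error made on this sample
        out.append(y)
    return out

def lowband_noise_power(x, y, taps=16):
    """Mean power of (y - x) after a moving average, a crude low-pass proxy."""
    n = [b - a for a, b in zip(x, y)]
    sm = [sum(n[i:i + taps]) / taps for i in range(len(n) - taps + 1)]
    return sum(s * s for s in sm) / len(sm)

x = [0.5 * math.sin(2 * math.pi * 440 * i / FS) for i in range(8192)]
p_plain = lowband_noise_power(x, plain(x))
p_shaped = lowband_noise_power(x, noise_shaped(x))
print(p_plain, p_shaped)
```

The shaped output carries the same 8 bits per sample, but its error is a first difference of the quantization error, so less of it survives the low-pass. A real design would use a higher-order filter and a proper psychoacoustic weighting.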
I'm sure they've tried something like this, but the training examples may need non-garbage information at very high frequencies, which would require a secondary learning phase to reconstruct that information. This could be done as multi-objective learning.
Still, it's disappointing that so little attention is paid to audio quality, or indeed to any use case beyond clickbait articles or a 30-second demo.
Probably, the most interesting uses of this technology will be those that emphasize the artifacts.