Once in a while, a paper comes out that is ambitious yet simple (simple to read, certainly not to come up with first), and overall just very, very cool! And when the experiments would fit the MSD perfectly, it gives me a reason to blog about it. So, how do Google researchers learn everything about music at once?
The work is Multi-Tasking with Joint Semantic Spaces for Large-Scale Music Annotation and Retrieval by Jason Weston, Samy Bengio and Philippe Hamel, recently published in the Journal of New Music Research.
The main idea? Let's learn a single representation of audio that is suitable for solving ALL of the following tasks at the same time:
- artist prediction
- song prediction
- similar songs
- tag prediction
- similar artists
All of the above are framed as ranking problems: for instance, rank the artists, starting with the one most likely to have performed a given song. The learning meta-algorithm is roughly:
- pick a random task
- pick a random pair for that task
- update your whole model with a small gradient step
and that's it! OK, I'm still trying to figure out how much magic there is in the "WARP loss" they use, and also how powerful their audio representation is.
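To make the three-step loop above concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the data is synthetic, only two of the tasks (artist and tag prediction) are included, the audio representation is a plain linear map into a shared space, and the loss is an ordinary pairwise hinge loss with one random negative, which is a stand-in for the WARP loss and not the same thing. The names (`init_params`, `sgd_step`, `train`, ...) are mine, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, DIM = 20, 8      # toy sizes, not taken from the paper
N_ARTISTS, N_TAGS = 5, 4


def init_params():
    """One shared audio map plus one embedding table per label type."""
    return {
        "audio": rng.normal(scale=0.1, size=(DIM, N_FEATURES)),
        "artist": rng.normal(scale=0.1, size=(N_ARTISTS, DIM)),
        "tag": rng.normal(scale=0.1, size=(N_TAGS, DIM)),
    }


def make_toy_data(n_songs=300, noise=0.1):
    """Synthetic songs: each artist has a prototype feature vector."""
    prototypes = rng.normal(size=(N_ARTISTS, N_FEATURES))
    artists = rng.integers(N_ARTISTS, size=n_songs)
    feats = prototypes[artists] + noise * rng.normal(size=(n_songs, N_FEATURES))
    tags = artists % N_TAGS  # made-up rule tying tags to artists
    return feats, {"artist": artists, "tag": tags}


def sgd_step(params, task, x, pos, lr=0.01, margin=1.0):
    """One pairwise hinge update: push the true label above a random wrong one."""
    table = params[task]
    neg = int(rng.integers(len(table) - 1))
    if neg >= pos:
        neg += 1             # guarantees neg != pos
    z = params["audio"] @ x  # the song embedded in the joint space
    if margin - table[pos] @ z + table[neg] @ z > 0:
        grad_z = table[neg] - table[pos]  # computed before the tables change
        table[pos] += lr * z
        table[neg] -= lr * z
        # The audio map is shared, so every task updates it.
        params["audio"] -= lr * np.outer(grad_z, x)


def train(params, feats, labels, n_steps=20000):
    tasks = list(labels)
    for _ in range(n_steps):
        task = tasks[rng.integers(len(tasks))]             # pick a random task
        i = rng.integers(len(feats))                       # pick a random example for it
        sgd_step(params, task, feats[i], labels[task][i])  # small gradient step
    return params


def accuracy(params, task, feats, labels):
    """Fraction of songs whose top-ranked label is the true one."""
    scores = feats @ params["audio"].T @ params[task].T    # (n_songs, n_labels)
    return float(np.mean(scores.argmax(axis=1) == labels[task]))


feats, labels = make_toy_data()
params = train(init_params(), feats, labels)
print("artist accuracy:", accuracy(params, "artist", feats, labels))
print("tag accuracy:", accuracy(params, "tag", feats, labels))
```

The point of the sketch is only the structure: every task reads and writes the same audio map, so a gradient step taken for tag prediction also moves the representation used for artist prediction. No regularization or norm constraint is applied here, and the accuracies are measured on the training songs, so they say nothing about how the real method generalizes.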
They do present very positive results on the MagnaTagATune dataset, so it works. And this model seems easy to scale and parallelize.
At this point you should read the paper and form your own opinion, but I hope to see more of this kind of work at ISMIR. And imagine all the other tasks (to be learned at once) we could add to the list: the sky is the limit! And if you're willing to try Echo Nest features as a starting block, this algorithm seems made for the MSD. OK, we're missing song-level tags and song-to-song similarity, but let's just say we are working on that.
By the way, it also makes me think of this other ambitious work (A Roadmap Towards Versatile MIR) by teams at INRIA and the University of Tokyo, where all MIR data is fit by a graphical model. Is it the same goal, but approached from a Bayesian rather than a predictive point of view?
Congrats to the authors, this paper should be included in all MIR reading groups.
--TBM