As recently announced, we were very fortunate to partner with www.secondhandsongs.com to identify cover songs in the Million Song Dataset.
The SecondHandSongs dataset is now officially released, and we are confident that it is the largest cover song dataset out there. It has about 18K songs, versus ~700 songs for the same task at MIREX.
We hope that it brings cover song recognition to a whole new level. First, we believe that many algorithms will simply break and never finish on such an amount of data. For instance, the test set would imply doing 1,500 x 1e6 comparisons. Enjoy running your DTW-based algorithm on that! Second, we are looking forward to algorithms that actually train on cover sets (or "cliques"). This is an amazing challenge for machine learning, similar to learning classes of objects from totally different pictures. No one knows exactly where to start, and this is exciting!
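To put the 1,500 x 1e6 figure in perspective, here is a quick back-of-the-envelope sketch. The per-comparison costs are pure assumptions for illustration (we have not timed any particular algorithm), and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400


def brute_force_comparisons(n_queries: int, n_candidates: int) -> int:
    """Number of pairwise comparisons if every query is matched against every candidate."""
    return n_queries * n_candidates


def wall_clock_days(n_comparisons: int, seconds_per_comparison: float) -> float:
    """Single-core running time in days, given an assumed cost per comparison."""
    return n_comparisons * seconds_per_comparison / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1,500 queries against the full million songs, as in the post.
    total = brute_force_comparisons(1_500, 1_000_000)
    print(f"comparisons: {total:,}")

    # Hypothetical per-comparison costs, NOT measurements of any real system.
    for cost in (0.001, 0.01, 0.1):
        days = wall_clock_days(total, cost)
        print(f"{cost:g} s per comparison: about {days:,.0f} days on one core")
```

Even at an optimistic one millisecond per pair, the brute-force approach takes weeks on a single core, which is why an indexing or hashing step before any expensive alignment looks unavoidable at this scale.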
A few words on releasing additional datasets: linking different sources of data through the MSD is a unique opportunity, and there is some momentum building up. Many companies have heard about the project, thanks to articles like the one in Ars Technica. This is a nice introduction, and far more efficient than a random lab contacting a random company that doesn't care much about academia. Riding that momentum, we hope to announce a lyrics dataset soon, and maybe more. And obviously, we would like to extend the invitation to join this project to any company out there that has some data about music. If you know someone (a startup, an online radio in your region, ...), please send us their contact information. We'll work with them, respect whatever copyright requirements they have, do all we can to generate publicity for their contribution, and give something useful to the MIR community.
Otherwise, enjoy the SecondHandSongs dataset, and please give us feedback on how you use it! We'll be happy to advertise your work on this website, and we're just curious in general ;)
-TBM
P.S. Let me plug something unrelated: please see our "Call For Suggestions" for f(MIR) 2011!