As a blog must start somewhere, I decided to revisit papers from ISMIR 2010 in search of recent work on large-scale MIR. My search is not exhaustive; my first criterion is a paper's title and abstract. My definition of large-scale is also loose; all I know is that using GTZAN or CAL-500 is not large enough. But here are some of my findings, somewhat organized.
Starting with work involving audio features.
Schnitzer et al. test their Islands of Gaussians on 16K songs. Gaussians are used to reduce dimensionality and enable the use of a self-organizing map (SOM) with larger datasets.
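To give a flavor of the trick, here is a minimal sketch, not the authors' exact pipeline: summarize all of a song's MFCC frames by a single Gaussian, which gives a fixed-size descriptor no matter how long the song is. The frame counts and data below are random stand-ins.

```python
# Minimal sketch (assumption: not the authors' exact method, random
# stand-in data): collapse a song's MFCC frames into one Gaussian,
# the compact summary that makes SOM training feasible at scale.
import numpy as np

def song_to_gaussian(mfcc_frames):
    """Collapse an (n_frames, n_coeffs) MFCC matrix into mean + covariance."""
    mu = mfcc_frames.mean(axis=0)
    sigma = np.cov(mfcc_frames, rowvar=False)  # rows are observations
    return mu, sigma

# Stand-in for a real song: 3000 frames of 20 MFCCs
frames = np.random.randn(3000, 20)
mu, sigma = song_to_gaussian(frames)
# Mean plus the upper triangle of the covariance gives a fixed-size
# vector (20 + 210 = 230 dims) regardless of song length.
descriptor = np.concatenate([mu, sigma[np.triu_indices(20)]])
```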
Hamel and Eck use MajorMiner to train their deep belief network; I hope to see additional work with more data, as these networks have been used on very large collections of images.
M. Mandel's new work on automatic tagging uses several datasets, including MajorMiner and one built with Mechanical Turk, which could easily scale.
From the same lab, Bergstra et al. study covariance features for automatic tagging and test their results on MajorMiner and Magnatagatune.
E. Vincent's Roadmap Towards Versatile MIR does not talk about large-scale data, but the Bayesian network it presents will necessarily need HUGE AMOUNTS of data to be trained. We look forward to follow-ups on this work.
As a commercial demonstration, Meemix showed a system based on 100K manually annotated songs.
Finally, our own work on clustering was done using 40,000 songs whose features came from The Echo Nest API.
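For readers curious what clustering at that scale can look like, here is a minimal sketch, not our actual code: it assumes the Echo Nest features have already been fetched into a matrix with one row per song, and 'features.npy' is a hypothetical file name.

```python
# Minimal sketch of clustering 40,000 songs, assuming the features
# already sit in a (40000, n_features) matrix saved as 'features.npy'
# (hypothetical file name; fetching from the API is not shown).
import numpy as np
from scipy.cluster.vq import whiten, kmeans2

features = np.load('features.npy')   # one row per song
features = whiten(features)          # scale each dimension to unit variance
centroids, labels = kmeans2(features, k=50, minit='points')
print(np.bincount(labels))           # how many songs fell in each cluster
```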
Now, work not involving audio features.
Van Zaanen and Kanters' paper on mood classification uses lyrics and user data from the Moody application by Crayonroom. The evaluation is done on 10,000 entries, and it seems more could be available.
In Schedl et al.'s paper on evaluating artist popularity per country, the authors mined the web and tracked 201,135 unique artist names. No audio features are used in this work, but it is a large number of artists! (Three times as many as in this dataset.)
On a similar note, Markus Schedl also shows how to use microblogs, easily retrieving 640K terms from them.
Another set of experiments on P2P networks is Koenigstein et al.'s work on P2P-based recommendations. The final experiment is based on 400 artists, but they gathered information on half a million songs after a 24-hour crawl.
Knees et al. aim to do recommendation on a large corpus of music based on related text found online.
Miller et al.'s prototype geoshuffle uses GPS data to improve music recommendation. Experiments were done using Magnatagatune.
Finally, two tools to work with large-scale data.
Gordon is an interesting exploration tool for large collections of music; we plan to integrate it with the MSD. YAAFE is a promising feature extraction tool: it takes 11 minutes to run on 40 hours of music (mono at 32 kHz).
All the work above is impressive and a good starting point, but I'm really looking forward to someone simply crunching a lot of features in their model.
TBM
Totally unrelated: cool pics of the day