A few words on the upcoming ISMIR conference in Miami: we just received the news that our tutorial on the Million Song Dataset was accepted!
We felt (and some emails confirmed it) that many people were at least intrigued by the dataset and wanted to play with it, but did not know where to start. It is true that if you don't usually use Python or SQL, have never heard of HDF5, and don't have a lab computer with 300 GB of storage, the MSD can seem a little... overwhelming?
There are also people who got on board early with the MSD but are still not aware of all the available resources (cover songs, lyrics, USPOP and the Beatles dataset, ...). And there is talk of new additional corpora before ISMIR.
The main goal of the tutorial is to answer these questions. We will go through the data, show what's in there, and share lessons learned on how to deal with it efficiently. If things are "fast enough", we hope to run live machine learning demos, and if so, all the code will be provided. Emphasis will be put on connecting different types of data (lyrics -> tags, tags -> years, years -> audio features, etc.). So, no matter what your knowledge of the MSD is, consider joining us for the tutorial!
Plus, if I use the word "HDF5" more than a hundred times, I think Brian Whitman will fire me from my internship, so there will even be some drama!
By the way, I (Thierry Bertin-Mahieux) will give the tutorial with Matt Hoffman; Dan Ellis will be active in the preparation, and hopefully he'll be there too.
Since I'm talking about ISMIR, a few words on the paper we ask people to cite if they use the MSD. Yes, at the moment it is a submitted paper that you can't read, so yes, it is a little awkward. We faced the following dilemma and constraints:
1) we felt that this dataset should absolutely be presented at ISMIR; we want people (especially from other fields) to refer to ISMIR when they work on it;
2) if we had waited for ISMIR to announce the dataset, well... it would have been pretty mean to hold back all that data for 10 months. Plus, one of the few requests The Echo Nest made in exchange for giving away their data was some academic feedback within a year. Hence, we had to release it.
All this means that, yes, we are betting that we prepared a good paper (and a good dataset) and that it will be accepted at ISMIR. And yes, we ask people to cite it even though they will only be able to read it once it is accepted (before the final revision, though).
We also concluded that our submission could not be anonymous no matter what; our apologies to the reviewers.
Put simply, ISMIR 2011 will be awesome (my intuition tells me that f(MIR) is also preparing great stuff for you), and we hope to see you all there!
-TBM
P.S. nothing to do with anything, but here is the cutest kitten video ever