The intern team at Infobright ported the Million Song Dataset to a relational database, and you can easily get it for yourself!
A million HDF5 files is a good way to store all the data and create subsets, but it is an awkward format for trend analysis, since every query means opening a large number of files. A database like Infobright's is a great answer to this issue. Plus it's faster! If you're currently using our SQLite databases and find them rather limited, this project might be for you.
Two remarks:
- as far as we can tell, not all the data was ported: tags were compressed into 'genres', only averages and standard deviations were kept for pitch and timbre data, etc. Make sure it contains the information you need before moving a whole project to that format.
- the MSD team was not involved in the creation of the Infobright database (aside from a few emails), so all questions should be addressed to the Infobright team directly.
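To make the first remark concrete, here is a minimal sketch of what keeping only averages and standard deviations means for per-segment data such as pitch or timbre. This is our own illustration, not Infobright's code: the function name `summarize_segments` and the toy matrix are hypothetical, and we do not know whether their port used the population or the sample standard deviation (the sketch assumes population).

```python
from statistics import mean, pstdev


def summarize_segments(segments):
    """Collapse per-segment feature vectors into per-dimension (mean, std) pairs.

    `segments` is a list of equal-length rows, one row per segment.
    Assumption: population standard deviation; the actual port may differ.
    """
    # Transpose so that each tuple holds one feature dimension across all segments.
    columns = list(zip(*segments))
    return [(mean(col), pstdev(col)) for col in columns]


# Toy stand-in for one track's timbre matrix (3 segments x 2 dimensions).
# Real MSD tracks carry 12 pitch and 12 timbre values per segment.
toy_timbre = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
summary = summarize_segments(toy_timbre)
```

Once the data is reduced this way, the order of the segments and their individual values are gone, so anything that depends on how a song evolves over time still needs the original HDF5 files.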
Congratulations to the Infobright team! Cool data should come in many forms.
-TBM