Single summary file containing both analysis and metadata?
Hi all,
I've only been working with the subset so far, trying to get my head around how to use this data. What I'd really like to do is compare the metadata (e.g. location, artist name etc.) with the analysis information (e.g. density, beats, segments etc.).
I have no idea how to do this. As far as I can tell, these kinds of information are stored in different groups within the HDF5 summary file, and the only open-source programme I've found that can read HDF5 files and let me filter them (ViTables on Ubuntu) won't let me filter across groups.
I've been looking for ways to combine the data into a single summary group, but so far no luck. Has anyone out there already done this successfully?
I'm a real newbie at this data mining game, but I hope this can lead me onto some really interesting research opportunities, and give me something to write a thesis on.
Thanks in advance for any and all assistance.
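One possible workaround is to export the groups into a single flat table (e.g. CSV) and filter that in any spreadsheet instead of in ViTables. The sketch below is only a minimal illustration under stated assumptions: it assumes that row i of each group's `songs` table describes the same track (the MSD `hdf5_getters.py` helpers read e.g. `h5.root.metadata.songs.cols.artist_name[songidx]` and analysis fields with the same `songidx`, but check this on your copy), and that you have already read each group's columns into Python sequences, for instance with PyTables. `merge_groups` and `write_csv` are hypothetical helpers, not part of any MSD tool, and the data in the usage example is made up.

```python
import csv
import io
from typing import Dict, List, Sequence


def merge_groups(groups: Dict[str, Dict[str, Sequence]]) -> List[dict]:
    """Join per-group column data into one record per song.

    `groups` maps a group name (e.g. "metadata") to {column_name: values}.
    ASSUMPTION: row i in every group refers to the same song.
    """
    lengths = {len(col) for cols in groups.values() for col in cols.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    n_rows = lengths.pop() if lengths else 0

    records = []
    for i in range(n_rows):
        record = {}
        for group_name, cols in groups.items():
            for col_name, values in cols.items():
                # Prefix with the group name so equal column names cannot collide.
                record[f"{group_name}/{col_name}"] = values[i]
        records.append(record)
    return records


def write_csv(records: List[dict], out) -> None:
    """Write merged records as CSV so they can be filtered in one place."""
    if not records:
        return
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    # Made-up example data standing in for columns read from the summary file.
    groups = {
        "metadata": {"artist_name": ["Artist A", "Artist B"],
                     "artist_location": ["London", "Paris"]},
        "analysis": {"tempo": [120.0, 95.5]},
    }
    buf = io.StringIO()
    write_csv(merge_groups(groups), buf)
    print(buf.getvalue())
```

Prefixing each column with its group name keeps the origin of every field visible in the flat table; if the row-alignment assumption does not hold for your file, you would need to join on an ID column instead of by position.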
Are the checksums correct?
Hi!
I downloaded the whole dataset through infochimps and compared the checksums I got with md5sum (on Ubuntu 10.10) against the checksums you published. None of them matched. For example, I got:
md5sum A.tar.gz
2a4a4d7230ddd9739e00a20cae848000 A.tar.gz
And your checksum is:
s3://clients.infochimps.com/millionsong/A.tar.gz
2011-01-28 18:55 8251685808 a4ebc00350644bf21bc065c782dd0e0d-16
Of course, it might be that all of my files are corrupted. Am I doing something wrong when comparing checksums?
/Rasmus
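A likely reason for the mismatch: the S3 value above ends in `-16`, which is the format S3 uses for the ETag of an object uploaded in parts (apparently 16 here). Such an ETag is not the MD5 of the whole file; it is widely reported to be the MD5 of the concatenated binary MD5 digests of each part, followed by `-` and the part count. AWS does not formally document this formula, and the part size used for this upload is unknown, so the sketch below takes `part_size` as an assumption you have to guess (for instance, 512 MiB would yield 16 parts for the 8251685808-byte file listed above). `file_md5` and `multipart_etag` are hypothetical helper names.

```python
import hashlib
from typing import BinaryIO


def file_md5(f: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Plain MD5 of the whole stream: the same value `md5sum` prints."""
    h = hashlib.md5()
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def multipart_etag(f: BinaryIO, part_size: int, chunk_size: int = 1 << 20) -> str:
    """S3-style multipart ETag (as commonly reported, not an official spec):
    MD5 of the concatenated per-part MD5 digests, plus "-<number of parts>".
    `part_size` must match the size used at upload time, which is an assumption.
    Parts are hashed in chunks so a large part never sits in memory at once."""
    digests = []
    while True:
        h = hashlib.md5()
        remaining = part_size
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # end of file
                break
            h.update(chunk)
            remaining -= len(chunk)
        if remaining == part_size:  # nothing was read: no more parts
            break
        digests.append(h.digest())
        if remaining > 0:  # short final part: end of file reached
            break
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

Usage would look like `with open("A.tar.gz", "rb") as f: print(multipart_etag(f, 512 * 1024 * 1024))`; if the guessed part size is wrong, the result will not match even for a perfect download, so a mismatch here proves nothing on its own.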
you're probably fine
Hi Rasmus, you're probably alright; you're the second person to tell us that the checksums seem wrong. We don't know how infochimps stores the data; they might have moved, uncompressed, or otherwise changed the files since they gave us the checksums. Soon, the checksums will be included directly in their API.
In the meantime, the best way to check your download is to run this Python code:
https://github.com/tb2332/MSongsDB/blob/master/PythonSrc/DatasetCreation...
It simply opens every file and reads all the fields it expects to find. It can take a few hours, but if it doesn't crash, your files are fine.
We'll try to solve this MD5 issue once and for all.
TBM
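The linked script is the reference check. Purely to illustrate the idea (this is not that script), here is a minimal sketch: walk the download, fully read every HDF5 file, and report the ones that raise an error. It assumes the per-track files end in `.h5` and uses h5py (`h5py.File`, `Group.visititems`), which must be installed separately; `iter_h5_files`, `read_all_datasets`, and `check_files` are hypothetical names.

```python
import os
from typing import Callable, Iterator, List, Tuple


def iter_h5_files(root: str, ext: str = ".h5") -> Iterator[str]:
    """Yield every file under `root` ending in `ext`, in a stable order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in sorted order
        for name in sorted(filenames):
            if name.endswith(ext):
                yield os.path.join(dirpath, name)


def read_all_datasets(path: str) -> None:
    """Open an HDF5 file and read every dataset in full; raises on bad data."""
    import h5py  # imported lazily so the walker works without h5py installed

    def _read(name, obj):
        if isinstance(obj, h5py.Dataset):
            obj[()]  # force the whole dataset to be read from disk
        return None  # returning None tells visititems to keep going

    with h5py.File(path, "r") as f:
        f.visititems(_read)


def check_files(root: str,
                reader: Callable[[str], None] = read_all_datasets
                ) -> List[Tuple[str, str]]:
    """Apply `reader` to every file; return (path, error) for each failure."""
    failures = []
    for path in iter_h5_files(root):
        try:
            reader(path)
        except Exception as exc:  # any read error marks the file as suspect
            failures.append((path, repr(exc)))
    return failures
```

Calling `check_files("/path/to/MillionSong/data")` (path hypothetical) returns an empty list when every file could be read end to end. The `reader` parameter is there so a stricter check, such as reading specific expected fields, can be swapped in.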
List of unique terms (Echo Nest Tags)
I'd like to download the list of unique terms (Echo Nest tags) for the Automatic Tagging task (http://labrosa.ee.columbia.edu/projects/millionsong/files/unique_terms.txt), but the link keeps sending me to the home page. Is the file missing?
Indeed, a wrong link...
Sorry about that, here is the list:
http://millionsongdataset.com/sites/default/files/Additiona...
I'll try to find the broken link you hit. If you can point me to the page where you found it, that would make it even easier.
Cheers!
Is the following possible?
Is the following possible with the dataset?
- a user uploads a short sample (10 sec) of an unknown song/piece of music
- the server does some magic with the dataset
- the server outputs: artist, song name, etc.?
fingerprinter
No, you're looking for a fingerprinter, for instance Shazam: http://www.shazam.com/
The Echo Nest is building one too, but I don't believe it is available at the moment.
You guys rock!
Which is your favorite track out of the million?
Favorite track
And how would you measure it?