Following advice from our Quality Assessment office (i.e., Dan), we have added the information from the SecondHandSongs dataset to the track_metadata.db SQLite database. You can download the new version from this site (not from infochimps!).
What did we do exactly? We added two columns to the database: 'shs_perf' and 'shs_work'. The first one is simply the performance number on the SHS website; the default value is -1. If you want to know more, or you think we made a matching error, look at:
http://www.secondhandsongs.com/performance/<performance_ID>
The second column, 'shs_work', contains the clique numbers from the SHS train and test files. If we know the work on SHS, the number is positive and you can learn more about it at:
http://www.secondhandsongs.com/work/<work_ID>
If the song is associated with multiple works, we take the one with the lowest number. If we don't know the work, the value is negative. The default value is 0, meaning we don't have any cover song information for this song.
The previous statement is slightly wrong: we did not propagate the information through the known MSD duplicates. Therefore, if you really think a song is an obvious cover of another one, you might want to check its duplicates; maybe one of them carries the cover work information.
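Distilled into code, the 'shs_work' convention reads like this (a hypothetical helper of our own, not part of the dataset tools):

```python
# Hypothetical helper, not part of the release: interpret an 'shs_work'
# value following the convention described above.
def interpret_shs_work(shs_work):
    if shs_work > 0:
        # Known SHS work; details at
        # http://www.secondhandsongs.com/work/<shs_work>
        return 'cover, known work %d' % shs_work
    if shs_work < 0:
        # The song is in a clique, but the SHS work is unknown.
        return 'cover, unknown work'
    # Default value 0: no cover song information for this track.
    return 'no cover information'
```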
OK, what does all this mean in practice? You can explore cover songs more easily (and quickly)! Let's grab a random one in IPython:
In [1]: import sqlite3

In [2]: conn = sqlite3.connect('/home/thierry/Desktop/track_metadata.db')

In [3]: q = "SELECT track_id,artist_name,title FROM songs WHERE shs_work>0 LIMIT 1"

In [4]: res = conn.execute(q)

In [5]: track_id,artist_name,title = res.fetchone()

In [6]: print track_id+': '+artist_name+' -> '+title
TRCEEUH128F42646A8: Dan Hartman -> Vertigo/Relight My Fire
Ok, we got Dan Hartman, let's check if we have his performance on SHS:
In [10]: q = "SELECT shs_perf FROM songs WHERE track_id='" + track_id + "'"

In [11]: res = conn.execute(q)

In [12]: res.fetchone()
Out[12]: (3,)
Let's learn more at:
http://www.secondhandsongs.com/performance/3
It sounds good! Let's find other covers from that song. We start by getting the work number:
In [13]: q = "SELECT shs_work FROM songs WHERE track_id='" + track_id + "'"

In [14]: res = conn.execute(q)

In [15]: res.fetchone()
Out[15]: (3,)
The work number is the same as the performance number; this is a coincidence. Let's get all tracks that have the same work; they should all be covers:
In [16]: q = "SELECT track_id,artist_name,title FROM songs WHERE shs_work=3"

In [17]: res = conn.execute(q)

In [18]: res.fetchall()
Out[18]:
[(u'TRCEEUH128F42646A8', u'Dan Hartman', u'Vertigo/Relight My Fire'),
 (u'TRIURBO128F425A1F3', u'Take That Featuring Lulu', u'Relight My Fire')]
We find two cover songs in total: the one we knew about, and another one by "Take That".
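Since every track with the same positive 'shs_work' belongs to the same clique, a single GROUP BY query finds the largest cliques in the dataset. Here is a sketch (the function name and the commented-out usage are our own; the 'songs' table and 'shs_work' column are the ones used in the queries above):

```python
import sqlite3

def biggest_cliques(conn, limit=5):
    # Count how many tracks share each known SHS work, largest
    # cliques first. 'songs' and 'shs_work' are the table and
    # column shown in the queries above.
    q = ("SELECT shs_work, COUNT(*) AS n FROM songs "
         "WHERE shs_work > 0 GROUP BY shs_work "
         "ORDER BY n DESC LIMIT %d" % limit)
    return conn.execute(q).fetchall()

# Usage on the real database (path from the session above):
# conn = sqlite3.connect('/home/thierry/Desktop/track_metadata.db')
# for work, n in biggest_cliques(conn):
#     print('work %d -> %d tracks' % (work, n))
```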
What are you missing to start your cover song experiments? Knowing what is in the training set. It is not included in the SQLite database at the moment, but here's how to get the list of 'training works' from the train file:
In [30]: import os

In [31]: train_works = set()

In [32]: f = open('/MSongsDB/Tasks_Demos/CoverSongs/shs_dataset_train.txt')

In [33]: for line in f.xreadlines():
   ....:     if line[0] == '%':
   ....:         train_works.add(min(map(lambda w: int(w), line[1:].split(' ')[0].split(',')[:-1])))
   ....:

In [36]: f.close()

In [37]: len(train_works)
Out[37]: 4128
We found 4128 cliques, consistent with the dataset description on the official webpage. Are the two cover songs of "Relight My Fire" in the training set?
In [38]: 3 in train_works
Out[38]: True
Yes!
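Putting the pieces together, you can split the cover tracks you pull from the database into train and test sets. A minimal sketch, where the function is a hypothetical helper of our own:

```python
def split_covers(rows, train_works):
    # rows: (track_id, shs_work) pairs with shs_work > 0, as returned
    # by a query like the ones above; train_works: the set built from
    # the train file. Hypothetical helper, not part of the dataset code.
    train, test = [], []
    for track_id, shs_work in rows:
        (train if shs_work in train_works else test).append(track_id)
    return train, test
```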
Please leave us feedback on this mini demo.
-TBM