Following advice from our Quality Assessment office (i.e., Dan), we have added the information from the SecondHandSongs dataset to the track_metadata.db SQLite database. You can download the new version from this site (not from infochimps!).
What did we do exactly? We added two columns to the database: 'shs_perf' and 'shs_work'. The first one is simply the performance number on the SHS website; the default value is -1. If you want to know more, or you think we made a matching error, look at:
http://www.secondhandsongs.com/performance/<performance_ID>
The second column, 'shs_work', contains the clique numbers from the SHS train and test files. If we know the work on SHS, the number is positive and you can learn more about it at:
http://www.secondhandsongs.com/work/<work_ID>
If the song is associated with multiple works, we take the one with the lowest number. If we don't know the work, the value is negative. The default value is 0, meaning we don't have any cover song information for this song.
The previous statement is slightly wrong: we did not propagate the information through the known MSD duplicates. Therefore, if you really think a song is an obvious cover of another one, you might want to check its duplicates; maybe one of them carries the cover work information.
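Distilled into code, the 'shs_work' convention reads like this (a hypothetical helper of our own, not part of the dataset tools):

```python
# Hypothetical helper, not part of the release: interpret an 'shs_work'
# value following the convention described above.
def interpret_shs_work(shs_work):
    if shs_work > 0:
        # Known SHS work; details at
        # http://www.secondhandsongs.com/work/<shs_work>
        return 'cover, known work %d' % shs_work
    if shs_work < 0:
        # The song is in a clique, but the SHS work is unknown.
        return 'cover, unknown work'
    # Default value 0: no cover song information for this track.
    return 'no cover information'
```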
OK, what does all this mean in practice? You can explore cover songs more easily (and quickly)! Let's grab a random one in IPython:
In [1]: import sqlite3

In [2]: conn = sqlite3.connect('/home/thierry/Desktop/track_metadata.db')

In [3]: q = "SELECT track_id,artist_name,title FROM songs WHERE shs_work>0 LIMIT 1"

In [4]: res = conn.execute(q)

In [5]: track_id,artist_name,title = res.fetchone()

In [6]: print track_id+': '+artist_name+' -> '+title
TRCEEUH128F42646A8: Dan Hartman -> Vertigo/Relight My Fire
Ok, we got Dan Hartman, let's check if we have his performance on SHS:
In [10]: q = "SELECT shs_perf FROM songs WHERE track_id='" + track_id + "'"

In [11]: res = conn.execute(q)

In [12]: res.fetchone()
Out[12]: (3,)
Let's learn more at:
http://www.secondhandsongs.com/performance/3
It sounds good! Let's find other covers from that song. We start by getting the work number:
In [13]: q = "SELECT shs_work FROM songs WHERE track_id='" + track_id + "'"

In [14]: res = conn.execute(q)

In [15]: res.fetchone()
Out[15]: (3,)
The work number is the same as the performance number; this is a coincidence. Let's get all tracks that have the same work; they should all be covers:
In [16]: q = "SELECT track_id,artist_name,title FROM songs WHERE shs_work=3"

In [17]: res = conn.execute(q)

In [18]: res.fetchall()
Out[18]:
[(u'TRCEEUH128F42646A8', u'Dan Hartman', u'Vertigo/Relight My Fire'),
 (u'TRIURBO128F425A1F3', u'Take That Featuring Lulu', u'Relight My Fire')]
We find two cover songs in total: the one we knew about, and another one by "Take That".
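Since every track with the same positive 'shs_work' belongs to the same clique, a single GROUP BY query finds the largest cliques in the dataset. Here is a sketch (the function name and the commented-out usage are our own; the 'songs' table and 'shs_work' column are the ones used in the queries above):

```python
import sqlite3

def biggest_cliques(conn, limit=5):
    # Count how many tracks share each known SHS work, largest
    # cliques first. 'songs' and 'shs_work' are the table and
    # column shown in the queries above.
    q = ("SELECT shs_work, COUNT(*) AS n FROM songs "
         "WHERE shs_work > 0 GROUP BY shs_work "
         "ORDER BY n DESC LIMIT %d" % limit)
    return conn.execute(q).fetchall()

# Usage on the real database (path from the session above):
# conn = sqlite3.connect('/home/thierry/Desktop/track_metadata.db')
# for work, n in biggest_cliques(conn):
#     print('work %d -> %d tracks' % (work, n))
```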
What are you missing to start your cover song experiments? Knowing what is in the training set. It is not included in the SQLite database at the moment, but here's how to get the list of 'training works' from the train file:
In [30]: import os

In [31]: train_works = set()

In [32]: f = open('/MSongsDB/Tasks_Demos/CoverSongs/shs_dataset_train.txt')

In [33]: for line in f.xreadlines():
   ....:     if line[0] == '%':
   ....:         train_works.add(min(map(lambda w: int(w), line[1:].split(' ')[0].split(',')[:-1])))
   ....:

In [36]: f.close()

In [37]: len(train_works)
Out[37]: 4128
We found 4128 cliques, consistent with the dataset description on the official webpage. Are the two cover songs of "Relight My Fire" in the training set?
In [38]: 3 in train_works
Out[38]: True
Yes!
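Putting the pieces together, you can split the cover tracks you pull from the database into train and test sets. A minimal sketch, where the function is a hypothetical helper of our own:

```python
def split_covers(rows, train_works):
    # rows: (track_id, shs_work) pairs with shs_work > 0, as returned
    # by a query like the ones above; train_works: the set built from
    # the train file. Hypothetical helper, not part of the dataset code.
    train, test = [], []
    for track_id, shs_work in rows:
        (train if shs_work in train_works else test).append(track_id)
    return train, test
```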
Please leave us feedback on this mini demo.
-TBM