Find all songs from a list of artists

We explain how to efficiently get all the songs from a list of artists using SQLite temporary tables. It is particularly useful for tasks such as tag prediction where train and test lists have been established.

We assume we have the file 'artists.txt' whose content is:


We load it with the following code using iPython:

In [2]: artists = []
In [3]: f = open('artists.txt','r')
In [4]: for line in f.xreadlines():
   ...:     if line == '' or line.strip() == '':
   ...:         continue
   ...:     artists.append( line.strip() )
In [5]: f.close()

We then connect to the SQLite database track_metadata.db:

In [7]: import sqlite3
In [8]: conn = sqlite3.connect('/track_metadata.db')

We create a temporary table with our list of artists. Temporary tables are kept in memory, and disappear when the connection is closed. This table will be called 'myartists' and have one column: 'artist_id'.

In [12]: q = "CREATE TEMP TABLE myartists (artist_id TEXT)"
In [13]: res = conn.execute(q)
In [14]: conn.commit()
In [15]: for aid in artists:
   ....:     q = "INSERT INTO myartists VALUES ('"+aid+"')"
   ....:     res = conn.execute(q)
In [16]: conn.commit()

Now, we select from the main tables all the track_id whose artist_id is also in 'myartists'. We use an SQL JOIN for this.

In [17]: q = "SELECT track_id FROM songs JOIN myartists"
In [18]: q += " ON myartists.artist_id=songs.artist_id"
In [19]: res = conn.execute(q)
In [20]: data = res.fetchall()

We know have the information in data. We found 131 tracks:

In [21]: len(data)
Out[21]: 131
In [22]: data[:3]
Out[22]: [(u'TRNPEOL12903CBC2B8',), (u'TRPVEJE12903CBC2BC',), (u'TRXICUN12903CEE131',)]

Don't forget to close the connection!


And that's it! Of course, this could be modified to search for other things, for instance all the tracks from specific years.