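This might be considered trivial, but dealing with a million songs requires some helper code. In particular, the following is not a good idea, because the shell expands the wildcard into one argument per file, and a million arguments can exceed the system's limit on command-line length ("Argument list too long"):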
ls main_dir/*/*/*/*.h5
In the following Python code, we count all .h5 files in a given directory, including ALL subdirectories. If you start from the top of the MSD directory, the result should be 1 million.
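import os
import glob


def count_all_files(basedir, ext='.h5'):
    """Count all files ending with ext in basedir and all of its subdirectories."""
    cnt = 0
    # os.walk visits basedir and every directory below it
    for root, dirs, files in os.walk(basedir):
        # keep only the files in this directory that have the right extension
        files = glob.glob(os.path.join(root, '*' + ext))
        cnt += len(files)
    return cnt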
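A possible variation, offered as a sketch rather than the MSD's own code: since os.walk already returns the file names of each directory, we can filter them by suffix instead of listing each directory a second time with glob, and yield the paths one at a time so a million-entry list is never built. The names iter_files and count_all_files_lazy are hypothetical. One small difference: glob's '*' skips names starting with a dot, while this version does not.

```python
import os


def iter_files(basedir, ext='.h5'):
    """Yield the path of every file under basedir whose name ends with ext.

    Paths are produced lazily, one at a time, so memory use stays small
    even when there are a million matching files.
    """
    for root, dirs, files in os.walk(basedir):
        # os.walk already listed this directory; just filter by suffix
        for name in files:
            if name.endswith(ext):
                yield os.path.join(root, name)


def count_all_files_lazy(basedir, ext='.h5'):
    """Count matching files without materializing a list of their paths."""
    return sum(1 for _ in iter_files(basedir, ext))
```

Starting from the top of the MSD directory, count_all_files_lazy should return the same 1 million as count_all_files.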
This code can easily be adapted to apply a function to every file, for instance to get the title of each song. We use hdf5_getters.py, the Python wrapper for the HDF5 song files. Make sure this file is in your PYTHONPATH so it can be imported.
import os
import glob
import hdf5_getters


def get_all_titles(basedir, ext='.h5'):
    """Return the title of every song file ending with ext under basedir."""
    titles = []
    for root, dirs, files in os.walk(basedir):
        files = glob.glob(os.path.join(root, '*' + ext))
        for f in files:
            h5 = hdf5_getters.open_h5_file_read(f)
            try:
                titles.append(hdf5_getters.get_title(h5))
            finally:
                # always release the HDF5 file handle, even if reading fails
                h5.close()
    return titles
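The "apply a function to all files" idea can also be written once, generically, with the per-file work passed in as a callback. This is a minimal sketch; apply_to_all_files is a hypothetical name, and it filters os.walk's file names by suffix rather than using glob.

```python
import os


def apply_to_all_files(basedir, func, ext='.h5'):
    """Call func(path) on every file under basedir whose name ends with ext.

    Returns the list of results, in the order os.walk visits the files.
    """
    results = []
    for root, dirs, files in os.walk(basedir):
        for name in files:
            if name.endswith(ext):
                results.append(func(os.path.join(root, name)))
    return results
```

For example, moving the open_h5_file_read / get_title / close calls from get_all_titles into a small function title_of(path) (also hypothetical) and calling apply_to_all_files(basedir, title_of) should give the same titles as get_all_titles.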
Can you do the same in Matlab? Argh... Look at this post; it is the best hack we know. For an example, look at tutorial 2.