Saturday, January 15, 2011

Downloading all of South By Southwest

One of my annual traditions is to download all the preview music from the South by Southwest (SXSW) conference in Austin, TX. Then I can listen to it at my leisure and find new artists I like. Every year, they change the website slightly and make it a bit harder to do. In fact, last year they did this shortly after I downloaded everything, so maybe they are on to me.

Music for SXSW 2011 is already appearing on their site (the festival is in March). Over the next few months I'll update my collection and figure out what I like and don't like. So here's the quick-and-dirty script I worked out to grab everything. This should work on any Unix-like OS with Python 2 and wget.


#!/usr/bin/env python
import commands
import re
import time
import os

letters = ('a','b','c','d','e','f','g','h','i','j','k','l','m',
           'n','o','p','q','r','s','t','u','v','w','x','y','z')

for letter in letters:
    print "Fetching schedule for %s" % letter
    command = 'wget --random-wait -w 1 -nc -r -l1 -X film/,interactive/,music/conference,register_to_attend/ -H -D schedule.sxsw.com "http://schedule.sxsw.com/?conference=music&lsort=name&day=ALL&a=%s#"' % letter
    commands.getstatusoutput(command)


(status,output) = commands.getstatusoutput("grep -r 2011/mp3 schedule.sxsw.com")

mp3Lines = output.split('\n')

print len(mp3Lines),"files to try to fetch"

# Raw strings with escaped dots; non-greedy so two links on one line are not merged.
findMp3 = re.compile(r'(http://audio\.sxsw\.com/2011/mp3/.*?\.mp3)')
findFile = re.compile(r'(audio\.sxsw\.com/2011/mp3/.*?\.mp3)')

for mp3Line in mp3Lines:
    result = findMp3.search(mp3Line)
    if result:
        destination = findFile.search(mp3Line).group(0)
        if os.path.exists(destination):
            print "Existing file",result.group(0)
            continue

        print "Fetching file",result.group(0)
        command = 'wget -m "%s"' % result.group(0)
        (status,output) = commands.getstatusoutput(command)
        time.sleep(2)
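
The `commands` module used above exists only in Python 2; it was removed in Python 3. As a rough sketch of how the extraction step might look on Python 3, the snippet below pulls MP3 links out of grep-style output and works out the local path that `wget -m` would write to. The names `extract_mp3_urls`, `local_path_for` and `fetch` are made up for illustration, and the sample lines are invented, not real SXSW data.

```python
import os
import re
import subprocess

# Non-greedy match so that two links on one line are not merged into one.
MP3_URL = re.compile(r'http://audio\.sxsw\.com/2011/mp3/.*?\.mp3')


def extract_mp3_urls(lines):
    """Return the unique MP3 URLs found in the lines, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        for url in MP3_URL.findall(line):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def local_path_for(url):
    """Path where `wget -m` stores the file: the URL without its scheme."""
    return url.split('://', 1)[1]


def fetch(url):
    """Download one file with wget unless it is already on disk (hypothetical helper)."""
    if os.path.exists(local_path_for(url)):
        return False
    subprocess.call(['wget', '-m', url])
    return True


# Invented sample lines in the shape `grep -r` produces (filename:match).
sample = [
    'schedule.sxsw.com/a.html:<a href="http://audio.sxsw.com/2011/mp3/Band_One.mp3">',
    'schedule.sxsw.com/b.html:<a href="http://audio.sxsw.com/2011/mp3/Band_Two.mp3">',
    'schedule.sxsw.com/b.html:<a href="http://audio.sxsw.com/2011/mp3/Band_One.mp3">',
]
for url in extract_mp3_urls(sample):
    print(url, '->', local_path_for(url))
```

Passing the arguments to `subprocess.call` as a list avoids building a shell command string, so odd characters in a file name cannot break the quoting. The sample loop only prints; `fetch` is defined but not called here, so nothing is downloaded.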

2 comments:

  1. It's a little weird that you have to type out all of the letters. Python doesn't seem to have a much better way of doing this, however. The best I found was this:

    [chr(n) for n in range(ord('a'), ord('z')+1)]

  2. Yes, Perl just has (a..z), I think, and I was hoping Python had xrange('a','z'), but no dice.

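
On the letters question raised in the comments: the standard library's `string` module exposes the lowercase alphabet as a constant, which may be the closest thing Python has to a built-in range of letters. A minimal sketch, reusing the schedule URL from the script (the names `BASE` and `urls` are just for illustration):

```python
import string

# string.ascii_lowercase is the constant 'abcdefghijklmnopqrstuvwxyz'.
letters = string.ascii_lowercase

# Same URL pattern as the script in the post, one URL per letter.
BASE = 'http://schedule.sxsw.com/?conference=music&lsort=name&day=ALL&a=%s#'
urls = [BASE % letter for letter in letters]

print(len(urls))
print(urls[0])
```

A string is iterable character by character, so `for letter in letters:` works the same way it does with the hand-typed tuple.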