Technological Trials, Tribulations, and Triumphs: Downloading all of South By Southwest

One of my annual traditions is to download all the preview music from the South by Southwest (SXSW) conference in Austin, TX. Then I can listen to it at my leisure and find new artists I like. Every year, they change the website slightly and make it a bit harder to do. In fact, last year they did this shortly after I downloaded everything, so maybe they are on to me.

Music for SXSW 2011 is already appearing on their site (the festival is in March). Over the next few months I'll update my collection and figure out what I like and don't like. So here's the quick-and-dirty script I worked out to grab everything. This should work on any Unix-like OS with Python 2 and wget.

#!/usr/bin/env python
# Python 2 script: relies on the commands module and print statements.

import commands
import re
import time
import os

letters = ('a','b','c','d','e','f','g','h','i','j','k','l','m',
           'n','o','p','q','r','s','t','u','v','w','x','y','z')

# Mirror the schedule listing for each starting letter, skipping unrelated sections.
for letter in letters:
    print "Fetching schedule for %s" % letter
    command = ('wget --random-wait -w 1 -nc -r -l1 '
               '-X film/,interactive/,music/conference,register_to_attend/ '
               '-H -D schedule.sxsw.com '
               '"http://schedule.sxsw.com/?conference=music&lsort=name&day=ALL&a=%s#"') % letter
    commands.getstatusoutput(command)

# Find every line in the mirrored pages that mentions a 2011 MP3.
(status,output) = commands.getstatusoutput("grep -r 2011/mp3 schedule.sxsw.com")
mp3Lines = output.split('\n')
print len(mp3Lines),"files to try to fetch"

# Non-greedy match, so a line holding several links yields only the first URL.
findMp3 = re.compile(r'(http://audio\.sxsw\.com/2011/mp3/.*?\.mp3)')
findFile = re.compile(r'(audio\.sxsw\.com/2011/mp3/.*?\.mp3)')

for mp3Line in mp3Lines:
    result = findMp3.search(mp3Line)
    if result:
        # wget -m saves under a directory named after the host, so this is the local path.
        destination = findFile.search(mp3Line).group(0)
        if os.path.exists(destination):
            print "Existing file",result.group(0)
            continue
        print "Fetching file",result.group(0)
        command = 'wget -m "%s"' % result.group(0)
        (status,output) = commands.getstatusoutput(command)
        time.sleep(2)  # pause between downloads
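
The commands module used above only exists in Python 2. For anyone on Python 3, the link-extraction step could be sketched roughly as below. This is not the script I actually ran: the helper names extract_mp3_urls and local_path are made up for illustration, and the sample grep output is invented, not real SXSW data.

```python
import re

# Same idea as the script's pattern: capture each 2011 MP3 URL, non-greedily.
MP3_URL = re.compile(r'http://audio\.sxsw\.com/2011/mp3/.*?\.mp3')


def extract_mp3_urls(grep_output):
    """Return unique MP3 URLs, in first-seen order, from grep-style text."""
    seen = []
    for line in grep_output.splitlines():
        for url in MP3_URL.findall(line):
            if url not in seen:
                seen.append(url)
    return seen


def local_path(url):
    """Path a recursive wget writes to by default: the URL minus its scheme."""
    return url[len('http://'):]


# Hypothetical sample lines, invented for illustration only.
sample = (
    'schedule.sxsw.com/a.html:<a href="http://audio.sxsw.com/2011/mp3/Band_One.mp3">\n'
    'schedule.sxsw.com/b.html:<a href="http://audio.sxsw.com/2011/mp3/Band_Two.mp3">'
    ' <a href="http://audio.sxsw.com/2011/mp3/Band_One.mp3">\n'
)

for url in extract_mp3_urls(sample):
    print(url, '->', local_path(url))
```

Unlike the loop above, this picks up every link on a line rather than just the first, and it drops duplicates before anything is fetched.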

2 comments:

Charles Plager, January 21, 2011 at 1:14 PM
It's a little weird that you have to type out all of the letters. Python doesn't seem to have a much better way of doing this, however. The best I found was this:

[chr(n) for n in range(ord('a'), ord('z')+1)]

Eric Vaandering, January 21, 2011 at 1:15 PM
Yes, Perl just has ('a'..'z'), I think, and I was hoping Python had xrange('a','z'), but no dice.
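
For what it's worth, the standard library does offer a shortcut here: the string module has a constant, string.ascii_lowercase, holding the letters a through z. A minimal sketch of building the same letters tuple from it (the loop only prints the first three letters and is there purely for illustration):

```python
import string

# string.ascii_lowercase is the constant 'abcdefghijklmnopqrstuvwxyz'.
letters = tuple(string.ascii_lowercase)

# Illustrative loop over the first few letters only.
for letter in letters[:3]:
    print("Fetching schedule for %s" % letter)
```

A string is itself iterable, so the tuple() call is optional; it is kept only to match the tuple used in the script.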

Saturday, January 15, 2011