Perl automatic mp3 audiobook downloader

The Dutch public broadcasting services offer a number of nice audiobooks that you can listen to for free. This is a great service, but most content is limited to online listening and the site tries to prevent you from downloading the MP3s.
Since I would like to take the audiobooks with me on my mp3 player and listen to them when I am offline (for instance in an airplane), I built a Perl script to download the audio files.

This is a two-step approach: first capture the real download URLs, then feed them to a script. The website with the audiobooks mostly uses a Flash player to play the audio content. On a Linux or Mac computer you can install the package “dsniff”, which contains “urlsnarf”, a tool that shows the URLs your computer requests. For one of the audiobooks, for instance, it shows the URLs below when you click on all parts:

[Dumper ~]# urlsnarf -i en0
urlsnarf: listening on en0 [tcp port 80 or port 8080 or port 3128]
192.168.0.81 - - [27/Dec/2014:17:51:23 +0100] "GET http://audio.omroep.nl/nps/mp3/bommel/728.mp3 HTTP/1.1" - - "http://www.nps.nl/nps/bommel/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
192.168.0.81 - - [27/Dec/2014:17:51:23 +0100] "GET http://content1b.omroep.nl/urishieldv2/l27m197ec2b3199d940300549ee38a000000.4d9438a4b00d03fe7216163718cb969c/nps/mp3/bommel/728.mp3 HTTP/1.1" - - "http://www.nps.nl/nps/bommel/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
192.168.0.81 - - [27/Dec/2014:17:51:25 +0100] "GET http://audio.omroep.nl/nps/mp3/bommel/729.mp3 HTTP/1.1" - - "http://www.nps.nl/nps/bommel/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
192.168.0.81 - - [27/Dec/2014:17:51:25 +0100] "GET http://content1b.omroep.nl/urishieldv2/l27m1f5f84ba73bd18fe00549ee38d000000.633a459a44db1793d6d37aa0d9989da2/nps/mp3/bommel/729.mp3 HTTP/1.1" - - "http://www.nps.nl/nps/bommel/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
etcetera... 

The “-i en0” after urlsnarf means: listen on interface en0. This name is typical for a MacBook; on Linux the interface might be named “eth0” for a wired network connection or “wlan0” for the Wi-Fi connection.

The relevant URLs, the ones that contain the audio, are those with urishieldv2 in them. This probably refers to a mechanism that tries to prevent you from easily predicting the download links, since every link contains a hash-like value that is different for every file.
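
As an illustration of that structure, the sketch below splits one of the captured URLs into host, token, hash and original path. The helper name split_urishield_url and the labels “token” and “hash” are assumptions made for this example, not names used by the broadcaster. One observation from the sample data, not documented behaviour: eight hex digits inside the token decode to a Unix time within a second of the capture time in the urlsnarf log, which suggests the links are generated per request.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: split a urishieldv2 URL into its visible parts.
# Returns a hash reference, or nothing if the URL has a different shape.
sub split_urishield_url {
	my ($url) = @_;
	# Layout assumed from the capture: host/urishieldv2/<token>.<hash>/<path>
	return unless $url =~ m{^http://([^/]+)/urishieldv2/(\w+)\.([0-9a-f]+)(/\S+)$};
	my ($host, $token, $hash, $path) = ($1, $2, $3, $4);
	my ($file) = $path =~ m{([^/]+)$};    # last path component, e.g. 728.mp3
	return { host => $host, token => $token, hash => $hash, path => $path, file => $file };
}

# One of the URLs from the urlsnarf capture
my $url = 'http://content1b.omroep.nl/urishieldv2/l27m197ec2b3199d940300549ee38a000000.4d9438a4b00d03fe7216163718cb969c/nps/mp3/bommel/728.mp3';

my $parts = split_urishield_url($url) or die "not a urishield URL\n";
print "$_: $parts->{$_}\n" for qw(host token hash path file);

# Assumption: characters 23 to 30 of the token are a hex Unix timestamp
my $epoch = hex(substr($parts->{token}, 22, 8));
print "token time (UTC): ", strftime('%Y-%m-%d %H:%M:%S', gmtime($epoch)), "\n";
```

For this URL the last line prints 2014-12-27 16:51:22 in UTC, which is 17:51:22 in the +0100 time zone of the log. The file field could also serve as a more recognisable name for the saved mp3.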
So, when we are on the page with all the parts, we click on every part until all of them are captured by urlsnarf.
This output is copied and saved in a file, for instance urlsnarf.txt.
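
Before downloading anything it can help to check what the capture contains. The sketch below lists each urishield URL once, keyed on the path after the token, so a part that was clicked twice is not listed twice. The helper name unique_urishield_urls is hypothetical, the sample lines are abbreviated versions of the capture above, and the last sample line is a made-up repeat added only to show the de-duplication.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the urishield URLs found in urlsnarf lines,
# keeping only the first URL seen for each path after the token.
sub unique_urishield_urls {
	my (@lines) = @_;
	my (%seen, @urls);
	for my $line (@lines) {
		next unless $line =~ m{(http:\S*/urishieldv2/[^/\s]+(/\S+))};
		my ($url, $path) = ($1, $2);
		push @urls, $url unless $seen{$path}++;
	}
	return @urls;
}

# Abbreviated sample lines; the last one is a made-up repeat of part 728
my @sample = (
	'192.168.0.81 - - [27/Dec/2014:17:51:23 +0100] "GET http://audio.omroep.nl/nps/mp3/bommel/728.mp3 HTTP/1.1" - - "http://www.nps.nl/nps/bommel/"',
	'192.168.0.81 - - [27/Dec/2014:17:51:23 +0100] "GET http://content1b.omroep.nl/urishieldv2/l27m197ec2b3199d940300549ee38a000000.4d9438a4b00d03fe7216163718cb969c/nps/mp3/bommel/728.mp3 HTTP/1.1" - - "http://www.nps.nl/nps/bommel/"',
	'192.168.0.81 - - [27/Dec/2014:17:51:25 +0100] "GET http://content1b.omroep.nl/urishieldv2/l27m1f5f84ba73bd18fe00549ee38d000000.633a459a44db1793d6d37aa0d9989da2/nps/mp3/bommel/729.mp3 HTTP/1.1" - - "http://www.nps.nl/nps/bommel/"',
	'192.168.0.81 - - [27/Dec/2014:17:52:01 +0100] "GET http://content1b.omroep.nl/urishieldv2/l27m00000000000000000000000000000000.00000000000000000000000000000000/nps/mp3/bommel/728.mp3 HTTP/1.1" - - "http://www.nps.nl/nps/bommel/"',
);

# Prints two URLs: the first one seen for 728.mp3 and the one for 729.mp3
print "$_\n" for unique_urishield_urls(@sample);
```

To check a real capture, the sample list can be replaced by the lines read from urlsnarf.txt.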
Then the Perl script is run with this data as input, for instance:

./parser_getter.pl < urlsnarf.txt

#!/usr/bin/perl
#
# 2014-12-26 version 1.0 (c) Ewald

use strict;
use warnings;

use LWP::UserAgent;

my $DEBUG = 1;
my $count = 1;

# Create the user agent used for the HTTP fetches
my $ua = LWP::UserAgent->new();

# Pretend to be Chrome on a Mac
$ua->agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36");

while (<>) {
	# Select URLs with urishield in them, discard the rest
	next unless /(http:\S*urishield\S+)/;
	my $URL = $1;
	$DEBUG && print "found URL: $URL\n";

	# Get the mp3
	my $response = $ua->get($URL);

	# Report a failed download and skip it, so no error page is saved as mp3
	unless ($response->is_success) {
		print "Error: " . $response->status_line . "\n";
		next;
	}

	# Save the output, every name gets an incrementing part number
	my $save = "part_" . $count++ . ".mp3";

	open(my $fh, '>', $save) or die "Cannot create output file $save: $!";
	binmode($fh);    # mp3 data is binary
	print $fh $response->content;
	close($fh);

	$DEBUG && print "Saved " . length($response->content) . " bytes of data to '$save'\n\n";
}

You can download it here: https://www.oiepoie.nl/source/parser_getter.pl
When you run the script, the output should be something like this:

[Dumper]% ~/perl/parser_getter.pl < urlsnarf.txt
found URL: http://content1a.omroep.nl/urishieldv2/l27m5c68424d3983d34a00549edd0b000000.888031fd36578e71965fd87fe4c47691/nps/mp3/bommel/736.mp3
Saved 26060333 bytes of data to 'part_1.mp3'

found URL: http://content1a.omroep.nl/urishieldv2/l27m6daf9f5b7fa2e2ab00549eddb3000000.08e4416181ef0929a56f61c29e631010/nps/mp3/bommel/737.mp3
Saved 27389443 bytes of data to 'part_2.mp3'

found URL: http://content1b.omroep.nl/urishieldv2/l27m255eef306a86a3a200549edd0e000000.f74dd78d1b80d166ea63332169f77b43/nps/mp3/bommel/738.mp3
Saved 28820536 bytes of data to 'part_3.mp3'

found URL: http://content1b.omroep.nl/urishieldv2/l27m703f9ed30dc0c86900549edd10000000.77a82e78b996ded8d01f8b358842ea45/nps/mp3/bommel/739.mp3
Saved 28018054 bytes of data to 'part_4.mp3'

found URL: http://content1c.omroep.nl/urishieldv2/l27m448b988943e1683b00549edd12000000.c86acdb1faa882f8eba785b06c0859d7/nps/mp3/bommel/740.mp3
Saved 27468856 bytes of data to 'part_5.mp3'

found URL: http://content1a.omroep.nl/urishieldv2/l27m31eb2dca29a8561b00549edd14000000.51573901492484e6775ff27900eaf84c/nps/mp3/bommel/741.mp3
Saved 28096630 bytes of data to 'part_6.mp3'

found URL: http://content1c.omroep.nl/urishieldv2/l27m72acd31f5e2b5b3800549edd15000000.fe03a7383128d51fed8c739a4ca20cf4/nps/mp3/bommel/742.mp3
Saved 27464676 bytes of data to 'part_7.mp3'

found URL: http://content1a.omroep.nl/urishieldv2/l27m692578b42b7f0e4c00549edd17000000.4d1b1b8d94b8c570cb727c77782b55c6/nps/mp3/bommel/743.mp3
Saved 26377982 bytes of data to 'part_8.mp3'

found URL: http://content1c.omroep.nl/urishieldv2/l27m2c27b9f73b52190e00549edd1a000000.8368a180c635f2b970010f932e3b2fcc/nps/mp3/bommel/744.mp3
Saved 27148699 bytes of data to 'part_9.mp3'
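
Each part above is roughly 26 to 28 MB, and the script keeps a complete response in memory before writing it out. A possible variation, sketched here and not part of the original script, lets LWP::UserAgent write the body straight to disk through the ':content_file' option of get. The helper name fetch_to_file is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: download $url straight into $file.
# Returns the size of the saved file in bytes, or nothing after a failed request.
sub fetch_to_file {
	my ($ua, $url, $file) = @_;

	# With ':content_file' LWP writes the body to $file as it arrives,
	# so $response->content stays empty for a successful request
	my $response = $ua->get($url, ':content_file' => $file);

	unless ($response->is_success) {
		warn "Error fetching $url: " . $response->status_line . "\n";
		return;
	}
	return -s $file;
}

# Possible use inside the download loop of the script:
#   my $save  = "part_" . $count++ . ".mp3";
#   my $bytes = fetch_to_file($ua, $URL, $save);
#   $DEBUG && defined $bytes && print "Saved $bytes bytes of data to '$save'\n\n";
```

The file size is taken from disk because the response object no longer holds the content in this mode.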

Have Fun!

Author: Ewald

The grey haired professor