Friday 4 March 2011

10.00pm. Okay, so after having quite a productive day in the office, I decided to continue my brain workout by picking up some more Perl code.

I wrote a couple of scripts that download data from search pages, including a version that passes login information. The login made the whole challenge slightly more difficult, but in the end I was able to use wget's --keep-session-cookies and --save-cookies/--load-cookies arguments to achieve my goals.
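For anyone wanting to try the same thing, here is a minimal sketch of the cookie handling rather than the actual scripts. It only builds and prints the two wget commands: one that logs in and saves the session cookie, and one that reuses it for a search. The URLs, the form field names (user, pass, q), the file names and the helper subs are all hypothetical placeholders; only the wget options are real.

```perl
use strict;
use warnings;

# Hypothetical file name for the cookie jar shared by both requests.
my $cookie_jar = 'cookies.txt';

# Percent-encode a form value so it is safe inside --post-data.
# Handles plain ASCII input only.
sub form_escape {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Step 1: log in and save the cookies, session cookies included.
# The field names "user" and "pass" are assumptions.
sub login_command {
    my ($login_url, $user, $pass) = @_;
    return (
        'wget', '--quiet',
        '--keep-session-cookies',
        "--save-cookies=$cookie_jar",
        '--post-data=user=' . form_escape($user) . '&pass=' . form_escape($pass),
        '-O', 'login.html',
        $login_url,
    );
}

# Step 2: reuse the saved cookies for a search request.
# The field name "q" is an assumption.
sub search_command {
    my ($search_url, $query) = @_;
    return (
        'wget', '--quiet',
        "--load-cookies=$cookie_jar",
        '--post-data=q=' . form_escape($query),
        '-O', 'results.html',
        $search_url,
    );
}

# Print the commands instead of running them.
my @login  = login_command('http://example.com/login', 'alice', 'secret');
my @search = search_command('http://example.com/search', 'perl scripts');
print join(' ', @login),  "\n";
print join(' ', @search), "\n";
```

To actually run a command, pass the list to system(@login) and check that it returns 0. Using the list form avoids any shell quoting problems with the POST data.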

The next step was to read in an array of search variables to pass to wget, which was easily done. I then started automating the whole process by looking at search forms that use unique IDs in their POST data; with some friendly $int++ I was able to generate a whole bunch of searches successfully.
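The loop might look something like the sketch below, which is an illustration rather than the original script. It walks an array of search terms, gives each request the next ID via $int++, and collects one wget command per search. The search terms, the starting ID, the field names (id, q), the URL and the file names are all made up for the example, and the terms are assumed to be URL-safe already.

```perl
use strict;
use warnings;

# Hypothetical inputs: search terms and the first unique ID to try.
my @terms = ('perl', 'wget', 'cookies');
my $int   = 100;

# Build the POST body for one search. Field names are assumptions.
sub build_post_data {
    my ($id, $term) = @_;
    return "id=$id&q=$term";
}

# One wget command (as an array ref) per search term.
my @commands;
for my $term (@terms) {
    push @commands, [
        'wget', '--quiet',
        '--load-cookies=cookies.txt',
        '--post-data=' . build_post_data($int, $term),
        '-O', "result_$int.html",
        'http://example.com/search',
    ];
    $int++;    # move on to the next unique ID
}

# Print the commands instead of running them.
print join(' ', @$_), "\n" for @commands;
```

Each result is written to its own file named after the ID, so the saved pages can be picked up later for extraction.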

Next step: Trawling through the pages to extract the data I require.
