## Saturday, September 12, 2009

### Stop With The Emails!

As most of you have noticed, my Oracal program stopped working. Google recently redesigned their results page, and that broke it. For one thing, they removed the file size that used to follow the URL ( - 18k - ), which is what I used to find the URL. They also shortened 'Similar Pages' to just 'Similar', which is how I found the lines with the URLs on them. I could probably have worked around this, but I decided to change the method for harvesting URLs instead. Rather than grabbing them directly from the search result page, I now get them from the friendly list of links lynx puts at the end of the dump. I resisted this at first because that list contains every link on the page, so there would probably be lots of links that weren't search results. Well, after actually looking at the links, almost all the non-result links can be filtered out with a few simple rules (just removing anything with 'google' in it got rid of most). So I've updated the code, and the program should now work again.
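For the curious, the filtering idea can be sketched roughly like this. This is not the actual Oracal code, just a minimal Python illustration. The function name `extract_result_urls`, the `blocked` parameter, and the assumption that the dump ends in a 'References' section made of numbered lines are all mine.

```python
import re

def extract_result_urls(dump_text, blocked=("google",)):
    """Pull URLs from the link list at the end of a lynx dump.

    Assumes the list follows a line reading 'References' and that each
    entry looks like '   3. http://example.com/'. Both are assumptions
    about the dump layout, not something taken from the Oracal source.
    """
    # Keep only the text after the last 'References' heading.
    _, sep, refs = dump_text.rpartition("\nReferences\n")
    if not sep:
        return []  # no link list found

    urls = []
    for line in refs.splitlines():
        match = re.match(r"\s*\d+\.\s+(\S+)", line)
        if not match:
            continue  # blank lines, sub-headings, etc.
        url = match.group(1)
        # The simple rule: drop anything with a blocked word in it.
        if any(word in url.lower() for word in blocked):
            continue
        if url not in urls:  # skip duplicates, keep original order
            urls.append(url)
    return urls
```

Extra rules are just more entries in `blocked`, so any stragglers that slip past the 'google' check could be handled the same way.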

In celebration of this glorious occasion, I've asked Oracal some questions. Here's what it had to say:
