Tuesday, January 8, 2008

Set Timeout While Spidering A Site


I depend heavily on the urllib2 module for developing web crawlers, but sometimes the crawlers just get stuck ... :(. So it's necessary to set a timeout, but unfortunately urllib2 doesn't provide anything for this purpose, so we have to rely on the socket module. Here is the code that I use:


import socket

timeout = 300 # seconds
socket.setdefaulttimeout(timeout)
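
Once the default timeout is set, a stuck request should raise an error instead of hanging forever, so the crawler needs to catch it and move on. Below is a minimal sketch of how that might look. The fetch helper is a made-up name for illustration (it is not part of urllib2), and the import fallback is only an assumption added so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
import socket

try:
    import urllib2 as urlrequest          # Python 2, the module used in this post
    from urllib2 import URLError
except ImportError:
    import urllib.request as urlrequest   # Python 3 equivalent (assumption)
    from urllib.error import URLError

timeout = 300 # seconds
socket.setdefaulttimeout(timeout)         # applies to sockets created after this call

def fetch(url):
    """Hypothetical helper: return the page body, or None on failure or timeout."""
    try:
        response = urlrequest.urlopen(url)
        try:
            return response.read()
        finally:
            response.close()
    except (URLError, socket.timeout) as e:
        # A timeout can surface either wrapped in URLError or as socket.timeout
        print("skipping %s: %s" % (url, e))
        return None
```

With this in place the spider can call fetch(url) for each link and simply skip any page that comes back as None.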
