Error Handling

Web scraping is notoriously unreliable because it depends on many external factors over which you have no control. A target website may fail because of defects in its web application, or there may be connectivity problems anywhere between you and the target website. These problems may seem negligible when you browse a few websites with a normal web browser, but a web-scraping agent can navigate more web pages in a few hours than a human can view in an entire year, so small glitches can become significant problems that prevent reliable data extraction. Error handling reduces the impact of these problems, and it is especially valuable for your critical web-scraping tasks.

Error handling can be difficult to implement properly. If your agent only collects non-critical data, you may decide to skip error handling and simply re-run the agent when it fails. Error handling matters most when an agent needs to deliver reliable data every time it runs.

For web scraping, there are two aspects to error handling: agent error handling, and error logs and notifications. Agent error handling happens automatically when specific errors occur during the execution of an agent. For example, an agent could retry a command if it fails to load a new web page. Error logs and notifications warn an administrator when an agent encounters trouble and requires attention. Consider a target website that has a new layout, so that the agent can no longer extract data correctly. You can configure error handling for this agent so that an email notification is sent to the administrator, who can then decide how to update the agent to accommodate the new layout.
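
The two aspects can be illustrated with a minimal Python sketch. It is a generic illustration, not the product's own implementation: the function `run_with_retry`, its parameters, and the `notify` callback are hypothetical names chosen for this example. The command is retried a fixed number of times, and only if every attempt fails is a notification passed to the callback, which in practice might send an email to the administrator.

```python
import time
from typing import Callable, List, Optional


def run_with_retry(
    command: Callable[[], str],
    notify: Callable[[str], None],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Optional[str]:
    """Run a command, retrying on failure (hypothetical helper).

    command: any callable that loads a page and returns its content.
    notify:  any callable that delivers a message to an administrator.
    Returns the command's result, or None if every attempt failed.
    """
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            # Agent error handling: a success ends the loop immediately.
            return command()
        except Exception as exc:
            errors.append(f"attempt {attempt}: {exc}")
            if attempt < max_attempts:
                # Wait before retrying; transient glitches often clear up.
                time.sleep(delay_seconds)
    # Error notification: all attempts failed, so a person must intervene.
    notify(f"Agent failed after {max_attempts} attempts: " + "; ".join(errors))
    return None
```

A transient connection failure would typically be absorbed by the retries, whereas a persistent failure, such as a changed page layout that makes extraction raise an error on every attempt, would reach the administrator through `notify`.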

We encourage you to learn more about error handling in these sections:

• Agent error handling

• Error logs and notifications