Reliability
Web scraping is notoriously unreliable and will often fail because of problems beyond your control. We understand that reliability is critical in many situations, so we have tackled this difficult issue head-on and added strong support for debugging, error handling and logging.
Debugging
Content Grabber has one of the best debuggers of any web automation software, helping you build reliable agents in which every issue that can be resolved at design time is resolved at design time.
Error handling
Many web scraping errors are unavoidable even with the best-designed agents, and this is where error handling comes into play. One example is an unreliable website that suddenly starts returning only error pages and requires a web browser restart before it functions again.
Error recovery
Many dynamic websites have bugs causing errors that are impossible to handle gracefully. Dynamic websites are small applications running in your web browser, and they may crash, hang, leak memory or cause many other fatal issues.
Content Grabber uses a health monitor process that looks for problems in the running web browsers, and restarts browsers that have run into trouble. A restarted web browser will continue from the point where it failed, so in most situations, this will not cause any interruption to the web scraping process.
Logging & notifications
Some website errors occur very rarely and may be impossible to catch during debugging. An example could be CAPTCHA protection that appears after hours of web scraping, or simply a broken Internet connection. Content Grabber can log all activity and errors, including the full HTML of the web pages causing problems, making it much easier to identify runtime errors and take appropriate action to resolve them.
Notifications can alert an administrator to specific problems, such as missing web content or other errors.
Content Grabber can email status reports to an administrator when errors or notifications have occurred during web scraping.
Ease-of-Use
The Content Grabber agent editor has a typical point-and-click user interface in which you click the content you want to extract, or the buttons and links you want to follow.
The agent editor sets itself apart from the crowd with its built-in smarts that automatically detect and configure all commands. It will automatically create lists of content and links, handle pagination and web forms, download or upload files, and configure any other action you perform on a web page. At the same time, you always have the option to manually fine-tune the commands, so Content Grabber gives you both simplicity and control.
The Content Grabber agent editor is simple enough for beginners to use, and its built-in smarts enable users to quickly build large numbers of web scraping agents.
Data
Data is everything when it comes to web scraping. Content Grabber allows you to load data from any source and use it in your agents for anything you need. You can also export extracted data to almost anywhere. This flexibility is key, enabling your technology to grow with your business.
Once data has been extracted and exported, it can be distributed by email, FTP or a custom defined destination.
Agent Management Tools
Content Grabber is designed to manage hundreds of agents in a professional web scraping environment with development, testing and production servers.
Logs, schedules and status information for all agents can be managed in one centralized location, and all proxies, database connections and script libraries can be managed on a per server basis.
Scripting
No one wants to write scripts to get things done, and with Content Grabber you rarely have to. However, if you have unusual requirements or need to fine-tune a process, it's nice to know the ability is there.
Content Grabber has a fully fledged built-in script editor with IntelliSense that is more than capable for building smaller scripts.
Distribute Executable Agents Royalty Free
Build royalty-free, self-contained web scraping agents that can run anywhere without the Content Grabber software. A self-contained agent is a single executable file that is easy to send or copy anywhere, and it has a multitude of powerful configuration options.
You are free to sell or give away your self-contained agents and you can add promotional messages and advertisements to the agents' user interface. Content Grabber imagery / adverts are also included. Note: If you want to white-label your self-contained agent you will need to use the Premium Edition of Content Grabber.
Command-line
You can run agents from the command line using the Content Grabber command-line program, and you can specify command-line parameters that your agents can easily use as input data.
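As a hypothetical sketch only, the general pattern of an agent accepting its input data as command-line parameters looks like this. The parameter names --url and --output are illustrative assumptions for this sketch, not Content Grabber's actual flags:

```python
# Hypothetical sketch: a script that, like a self-contained agent,
# takes its input data as command-line parameters.
# The --url and --output parameter names are illustrative only.
import argparse

def parse_agent_args(argv):
    """Parse command-line input data for a single scraping run."""
    parser = argparse.ArgumentParser(description="Run a scraping agent")
    parser.add_argument("--url", required=True, help="Start URL for the agent")
    parser.add_argument("--output", default="result.csv", help="Export file path")
    return parser.parse_args(argv)

# Input data supplied on the command line flows straight into the run.
args = parse_agent_args(["--url", "https://example.com", "--output", "data.csv"])
print(args.url)     # https://example.com
print(args.output)  # data.csv
```

In this pattern, a scheduler or another program can drive the same agent with different inputs simply by changing the parameters it passes on the command line.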
The Premium Edition includes all of the features listed above under the Professional Edition, as well as those below.
Visual Studio 2013 Integration
Content Grabber can integrate with Visual Studio 2013 for the most powerful script editing, debugging and unit testing features.
Custom Display Templates
The standard configuration screens of a self-contained agent include promotional messages for Content Grabber. Custom HTML display templates allow you to remove these promotional messages and add your own designs to the screens, effectively allowing you to white-label your self-contained agent.
Command-line (royalty free distribution)
The command-line program can run without the Content Grabber software and can be distributed royalty free.
Programming Interface
The Content Grabber API can be used to add web automation capabilities to your own desktop and web applications.
The dedicated web API has minimal dependencies and requires no special security privileges, so it's very easy to use in a web environment. The web API does, however, require access to the Content Grabber Windows service, which is part of the Content Grabber software and must be installed either on the web server itself or on a server accessible to it.