Content Grabber Basics

Web-scraping tools generally take either a macro approach or a configuration approach, and in both cases the resulting agent follows a sequential list of commands. The macro approach is more user-friendly: it automatically records the actions of a user in a browser, but it typically restricts access to the code behind the agent. The configuration approach lets the user configure each part of the agent directly, so they can introduce more code structure, controls, and data refinements, or apply their own naming conventions.

 

Content Grabber lets you either follow the simple macro automation method or take direct control over how each element and command in your agent is handled.

 

Content Grabber Agent Development

With Content Grabber, you can visually browse the website and click on the data elements in the order that you want to collect them. Based on the content elements selected, Content Grabber will automatically determine the corresponding action type and provide default names for each command as it builds the agent for you.

 

Content Grabber main screen - building CarPoint Agent

 

A Content Grabber agent is a collection of commands that are executed sequentially until the agent completes. The commands can be either actions (such as jumping to a URL) or data capture commands (such as capturing text). They are listed in order of execution in the Agent Explorer panel of the Content Grabber screen.
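
To make the idea of an agent as an ordered list of commands more concrete, here is a minimal conceptual sketch in Python. It is not Content Grabber code: the `Command` and `Agent` classes, the command names, and the example URL are all hypothetical and only illustrate action commands and capture commands running strictly in order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical model: these names are illustrative, not Content Grabber APIs.
@dataclass
class Command:
    name: str                              # name shown for the command
    kind: str                              # "action" or "capture"
    run: Callable[[Dict[str, str]], None]  # works on the shared agent state


@dataclass
class Agent:
    commands: List[Command] = field(default_factory=list)

    def execute(self) -> Dict[str, str]:
        """Run every command once, in the order it was recorded."""
        state: Dict[str, str] = {}
        for command in self.commands:
            command.run(state)
        return state


def navigate(state: Dict[str, str]) -> None:
    # Action command: pretend to load a page (placeholder URL).
    state["url"] = "https://example.com/cars"


def capture_title(state: Dict[str, str]) -> None:
    # Capture command: derive a value from what the earlier action produced.
    state["title"] = "Listing from " + state["url"]


agent = Agent([
    Command("Navigate URL", "action", navigate),
    Command("Title", "capture", capture_title),
])
result = agent.execute()
```

Because the capture command reads what the action command wrote, swapping the two would fail, which is why the order of commands matters.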

 

Agent Explorer panel with New Agent commands

 

If you want to make further adjustments or gain more control over your commands, you can make changes in the Configure Agent Command panel.

 

Configure Agent Command panel

 

You can also add new commands to your agent, or configure existing ones. To do this, you simply click twice on any web element (content item) and the Content Grabber Message window will appear. From here you can select the command type you want and add it to the Agent Explorer.

 

Content Grabber Message window pop-up

 

Content Grabber Data Outputs

After you have finished building your agent and have run it for the first time, Content Grabber saves the data locally in a structured database format. Content Grabber can then export the extracted web data as a report or to a number of database types. Data output options include CSV, Excel, XML, SQL Server, MySQL, Oracle and OleDB.
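
As a rough illustration of what exporting structured data to CSV involves, the following Python sketch writes a list of captured rows to CSV text. It does not reflect how Content Grabber implements its export. The field names (`make`, `model`, `price`) and the sample rows are assumptions made up for the example.

```python
import csv
import io

# Hypothetical captured rows; the field names are illustrative only.
rows = [
    {"make": "Toyota", "model": "Corolla", "price": "15000"},
    {"make": "Ford", "model": "Focus", "price": "13500"},
]


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),  # column order follows the first row
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
```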

 

Content Grabber's Data Configuration window

 

You can also use a Content Grabber export script to completely customize the data export to your own database structures.
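
The general idea behind a custom export is to map captured fields onto a table layout of your own. The sketch below shows that idea in Python, using SQLite as a stand-in for the target database. It is not Content Grabber's export script API. The table name `vehicle`, its columns, and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical captured rows, with prices as raw scraped text.
captured = [
    {"make": "Toyota", "model": "Corolla", "price": "$15,000"},
    {"make": "Ford", "model": "Focus", "price": "$13,500"},
]


def export_to_custom_table(conn, rows):
    """Map captured fields onto a user-defined table structure."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vehicle ("
        "id INTEGER PRIMARY KEY, "
        "manufacturer TEXT, model_name TEXT, price_usd INTEGER)"
    )
    for row in rows:
        # Refine the scraped text into a numeric value before storing it.
        price = int(row["price"].replace("$", "").replace(",", ""))
        conn.execute(
            "INSERT INTO vehicle (manufacturer, model_name, price_usd) "
            "VALUES (?, ?, ?)",
            (row["make"], row["model"], price),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
export_to_custom_table(conn, captured)
stored = conn.execute(
    "SELECT manufacturer, model_name, price_usd FROM vehicle ORDER BY id"
).fetchall()
```

The target columns are renamed and the price is stored as an integer, which is the kind of restructuring a fixed-format export would not do for you.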

 

Scheduling

Content Grabber provides an agent scheduling facility that runs your agent automatically at predetermined times. Runs can recur at regular intervals, such as every hour, day, month or year.
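
To illustrate what a recurring schedule amounts to, this small Python sketch computes the next few run times for a fixed interval. It is a conceptual example only and says nothing about how Content Grabber's scheduler works internally. The `next_runs` function and the interval names are hypothetical. Monthly and yearly intervals are left out because they are not fixed-length durations.

```python
from datetime import datetime, timedelta

# Hypothetical interval names mapped to fixed-length durations.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
}


def next_runs(start, interval, count):
    """Return the next `count` run times after `start` for a fixed interval."""
    step = INTERVALS[interval]
    return [start + step * i for i in range(1, count + 1)]


runs = next_runs(datetime(2024, 1, 1, 9, 0), "hourly", 3)
```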

 

Content Grabber's Scheduling Window