The Internal Database

Content Grabber uses two layers of internal data: internal data and export data. While the agent is running, it continuously saves data in an internal format. When the agent finishes, it converts the internal data into export data and then sends this data to the export target, where it becomes external data.

 

The default internal database is a SQLite database, but this can be changed to SQL Server or MySQL. Oracle and OleDB are not currently supported as internal databases.

 

Internal Data

Content Grabber stores internal data in the internal database. This data contains all the data extraction elements for the agent, together with the data that corresponds to the agent's property settings. For example, if an agent stops, crashes, or otherwise fails, the internal data is used to restart the agent at exactly the point of failure. The internal data always contains a data table for each container command in the agent and at least one data field for each capture command in the agent. You can view the internal data at any time by clicking the Data > View Internal Data menu.

 

If you configure an agent to add new data to the existing data, then the internal data store will continue to grow larger. Otherwise, the internal data store will be recreated every time the agent is run.

 

Export Data

The agent stores the export data in the internal database, and this data doesn't contain any of the property data for the agent itself. Also, the export data is more readable than the internal data. The export data will usually not contain a data table for each container command in the agent, and some capture commands may not have corresponding data fields.

 

Export data is always overwritten, so you cannot add data to existing export data. You can only add data to existing internal data. You can view the export data at any time by clicking the Data > View Export Data menu.

 

External Data

External data is the same as the export data, and the agent delivers it to an external data store chosen by the user. Typically, external data is overwritten, except when you choose Export last data segment only and you are exporting to a database. In that case, the operation simply adds the data to the external database tables.
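The difference between overwriting and adding to external data can be illustrated with a small sketch. This is not Content Grabber code: the `export_rows` function, the `export_target` table, and the `append` flag are all hypothetical names, and an in-memory SQLite database stands in for the external database.

```python
import sqlite3


def export_rows(conn, rows, append):
    """Write rows to a hypothetical external table, overwriting unless append is True."""
    conn.execute("CREATE TABLE IF NOT EXISTS export_target (title TEXT, price TEXT)")
    if not append:
        # Typical behaviour: existing external data is replaced by the new export.
        conn.execute("DELETE FROM export_target")
    conn.executemany("INSERT INTO export_target VALUES (?, ?)", rows)
    # Report how many rows the external table holds after this export.
    return conn.execute("SELECT COUNT(*) FROM export_target").fetchone()[0]


conn = sqlite3.connect(":memory:")
first = export_rows(conn, [("Widget", "10.00"), ("Gadget", "15.00")], append=False)
second = export_rows(conn, [("Gizmo", "7.00")], append=False)
third = export_rows(conn, [("Doohickey", "3.00")], append=True)
print(first, second, third)
```

The second call replaces the two rows written by the first, while the third call adds to what is already there, which mirrors the behaviour described for exporting only the last data segment to a database.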

 

You can also use a Data Export Script to customize the export process, which allows you to update or add to existing data. The export data is set in a fixed format, but you have some options for changing this format. You need to use a Data Export Script if you want to automatically create or configure your own data structures within this data.

 

Internal Database

The Internal Database window can be opened from the Data menu in the Content Grabber editor. You can use this window to change the internal database. The default internal database is a SQLite file database, but you can change it to either a SQL Server or MySQL database. Changing the internal database from SQLite to SQL Server or MySQL can increase performance of agents significantly.

 

Figure: Internal database settings.

 

 

The Old data option on the Internal Database window allows you to control how long extracted data is kept in the internal database. The following options are available.

 

Option | Description

Delete | All previously extracted data is deleted when an agent starts a new run.

Keep All and Export | Extracted data is never deleted from the internal database, and all extracted data is exported to the chosen export target. This option is often used when an agent has been configured to extract only new data. The agent can check previously extracted data and stop when it reaches data that has already been extracted.

Keep Some and Don't Export | The agent keeps data from the last successful run, but it exports only data from the current run. This option is often used when previously extracted data can be used to increase the performance of an agent. For example, if the agent downloads large files, it may be able to use information on the website to see whether a file has changed. If the file has not changed, the agent copies it from the previously extracted data rather than downloading it again.
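The idea of stopping at data that has already been extracted can be sketched as follows. This is only an illustration of the logic, not Content Grabber's implementation: the `extract_new_items` function and the item identifiers are hypothetical, and the sketch assumes the website lists items newest first.

```python
from typing import Iterable, List, Set


def extract_new_items(listing: Iterable[str], already_extracted: Set[str]) -> List[str]:
    """Collect items from a newest-first listing until a known item is reached."""
    new_items: List[str] = []
    for item_id in listing:
        if item_id in already_extracted:
            # Everything from this point on was captured in an earlier run.
            break
        new_items.append(item_id)
    return new_items


# Identifiers kept from earlier runs (hypothetical values).
previous_run = {"post-101", "post-102", "post-103"}
# The current listing, newest first.
current_listing = ["post-105", "post-104", "post-103", "post-102"]

new_items = extract_new_items(current_listing, previous_run)
print(new_items)
```

Here only the two newest items are collected, because the loop stops as soon as it meets an identifier that is already in the kept data.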

 

The Embed files in database option is used to control whether extracted files are stored in the internal database or on disk.

 

The Track latest changes option can be used to keep track of the latest changes that have been made to extracted data. The agent will mark extracted data as deleted, modified or added. See the Change Tracking topic for more information.
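As a rough illustration of what marking data as deleted, modified or added involves, the following sketch compares two runs keyed by a record identifier. It is an assumption-based example only: the `track_changes` function, the record keys, and the values are hypothetical and do not reflect how Content Grabber stores change information.

```python
from typing import Dict


def track_changes(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Label each record key as 'added', 'modified' or 'deleted'; unchanged keys are omitted."""
    status: Dict[str, str] = {}
    for key, value in current.items():
        if key not in previous:
            status[key] = "added"
        elif previous[key] != value:
            status[key] = "modified"
    for key in previous:
        if key not in current:
            status[key] = "deleted"
    return status


# Hypothetical data from two consecutive runs.
previous_run = {"A1": "Widget, $10", "A2": "Gadget, $15", "A3": "Gizmo, $7"}
current_run = {"A1": "Widget, $10", "A2": "Gadget, $18", "A4": "Doohickey, $3"}

changes = track_changes(previous_run, current_run)
print(changes)
```

A comparison like this only works while the data from the previous run is still available, which is why recreating the internal database tables resets change tracking.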

 

When changes are made to an agent, the internal database tables may need to change as well. For example, if the agent is modified to extract more data fields, the internal database tables need more columns to store the new data. Content Grabber needs to recreate the internal database tables when such agent changes are made, and this will remove all existing data in those tables. This will reset change tracking, and could have other serious consequences if an agent is configured to use previously extracted data.

 

The option Allow automatic overwrite of internal database can be used to control whether an agent is allowed to recreate the database tables. An error occurs if a change is made to an agent that requires the database tables to be recreated, and the agent is not allowed to do so. In this case, a user may be able to manually change the internal database tables to accommodate the modified agent without losing existing data.
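One kind of manual change is adding the columns that a modified agent now expects while leaving existing rows in place. The sketch below shows the general technique against an in-memory SQLite database. The `products` table, its columns, and the `add_missing_columns` helper are hypothetical; the real internal tables may need other adjustments, and SQL Server or MySQL use different catalog queries than the SQLite `PRAGMA` shown here.

```python
import sqlite3


def add_missing_columns(conn, table, required_columns):
    """Add any columns in required_columns that the table lacks, keeping existing rows."""
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    added = []
    for name, sql_type in required_columns.items():
        if name not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')
            added.append(name)
    return added


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "products" (title TEXT, price TEXT)')
conn.execute('INSERT INTO "products" VALUES (?, ?)', ("Widget", "10.00"))

# The modified agent (hypothetically) extracts one extra field, "sku".
added = add_missing_columns(
    conn, "products", {"title": "TEXT", "price": "TEXT", "sku": "TEXT"}
)
rows = conn.execute('SELECT title, price, sku FROM "products"').fetchall()
print(added, rows)
```

The existing row survives the change, and the new column is empty for data extracted before the agent was modified.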

 

Important Notice About SQL Server and MySQL

WARNING: It is important to ensure that no two agents that use the same internal database have the same agent name.

 

When an agent recreates the internal database, it first removes all tables for that agent by deleting every table whose name starts with the name of the agent.

 

For example, if the agent uses the following internal tables, it would remove all tables starting with SWSCR [Agent Name].

 

SWSCR [Agent Name] AGENT PROCESS

SWSCR [Agent Name] AGENT BROWSER

SWSCR [Agent Name]

SWSCR [Agent Name] [Command Name 1]

SWSCR [Agent Name] [Command Name 2]

 

This approach creates an obvious problem when two agents use the same internal database and their table names start with the same agent name, because Content Grabber would remove the tables for both agents when it should remove the tables for only one. Give every agent that shares an internal database a unique name.
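The effect of removing tables by name prefix can be sketched as follows. This is an illustration under assumptions, not Content Grabber code: the `tables_removed_for` helper and the agent names "News", "News Archive" and "Prices" are hypothetical, and the sketch assumes a plain starts-with match on the table naming pattern listed above.

```python
def tables_removed_for(agent_name, all_tables, prefix="SWSCR"):
    """Return the tables whose names start with '<prefix> <agent_name>'."""
    stem = f"{prefix} {agent_name}"
    return [t for t in all_tables if t.startswith(stem)]


# Tables for three hypothetical agents sharing one internal database.
tables = [
    "SWSCR News AGENT PROCESS",
    "SWSCR News AGENT BROWSER",
    "SWSCR News",
    "SWSCR News Articles",
    "SWSCR News Archive AGENT PROCESS",
    "SWSCR News Archive",
    "SWSCR News Archive Pages",
    "SWSCR Prices",
]

removed = tables_removed_for("News", tables)
for name in removed:
    print(name)
```

Under that assumption, recreating the tables for "News" would also match every table that belongs to "News Archive", while "Prices" is untouched. Two agents with exactly the same name would collide in the same way, since all of their tables share one prefix.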