Using the Content Grabber Agent Service

Content Grabber includes a Windows service that can be used to run agents. Your web application communicates with the Windows service using a small proxy assembly that requires no special security privileges and depends on no other files. You simply add the proxy assembly to your web application's assembly references and use the proxy to call the API functions. The proxy assembly contains the same methods as the standard API, except for functions used to load and save agents. The proxy can only execute agents, not load agents, because the proxy is designed to rely on no other assemblies and loading an agent would require the agent definition classes found in the full Content Grabber API.

 

Before you can use the proxy to communicate with the Windows Service, you need to connect to the service. You use the proxy method Connect to connect to the service. The default connection string is:

 

http://localhost:8001/ContentGrabber

 

If your Content Grabber service is located on a remote server, you can change localhost to the name or IP address of the remote server. The service is listening on port 8001 by default, but you can change the port in the Content Grabber editor. The Content Grabber service is stopped by default and configured to start manually. If you are going to use this service, you should configure the service to start automatically.
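As a rough illustration of how the connection string changes with the server and port, the following sketch builds an address in the same shape as the default shown above. The class name ServiceEndpoint and the method Build are hypothetical helpers for this example and are not part of the proxy assembly.

```csharp
using System;

// Hypothetical helper, not part of the Content Grabber proxy assembly.
public static class ServiceEndpoint
{
    // Builds an address in the same shape as the documented default,
    // http://localhost:8001/ContentGrabber, for the given host and port.
    public static string Build(string host = "localhost", int port = 8001)
    {
        if (string.IsNullOrWhiteSpace(host))
            throw new ArgumentException("Host must not be empty.", nameof(host));
        if (port < 1 || port > 65535)
            throw new ArgumentOutOfRangeException(nameof(port));

        return $"http://{host}:{port}/ContentGrabber";
    }
}
```

The resulting string would then be passed to the proxy method Connect, for example after changing the port in the Content Grabber editor.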

 

The Content Grabber service is configured to log on as the local System account by default. You can change this account if you wish, but if you continue using the local System account, you must make sure this account has proper access to the Internet through Internet Explorer. For example, Internet cookies may have been disabled for the System account, or there may be other Internet restrictions for this account. To test and change Internet settings for the System account, open the Content Grabber editor and click the Test IE as System button in the Application menu. This opens Internet Explorer using the System account, so you can test Internet access and change settings in Internet Explorer if required.

 

All Windows services, including the Content Grabber agent service, run in a special Windows session that cannot interact with users and cannot display user interfaces. JavaScript on some websites does not work correctly without at least a hidden web browser window. Such websites are rare, but if you need to run such a website through the Content Grabber agent service, the service needs to run the agent in a normal user session. In order for the Windows service to start an agent in a normal user session the following three conditions must be met.

 

1. You must set the agent option Run Interactively.

2. The Content Grabber agent service must run under the System account.

3. A user must be logged on to the computer while the agent runs. The agent will run in the Windows session of the logged-in user, but in the security context of the System account.

 

Example

The following example connects to a local Content Grabber agent service and runs an agent in a new session. It also sets the log level to high and specifies that log information should be written to file.

 

// A new GUID gives this run its own session ID.
string sessionId = Guid.NewGuid().ToString();

AgentProxy proxy = new AgentProxy(@"C:\Users\Public\Documents\Content Grabber\Agents\qantas\qantas.scg", sessionId);

proxy.Connect("http://localhost:8001/ContentGrabber");

// Set the log level to high and write log information to file.
AgentSettings settings = new AgentSettings();
settings.LogLevel = AgentLogLevel.High;
settings.IsLogToFile = true;

// The agent runs asynchronously once started.
proxy.StartAgent(settings);

 

You can connect to the Content Grabber agent service again at any time to get status information about a specified agent running in a specified session.

 

// Use the same agent path and session ID as the run you want to inspect.
AgentProxy qantasProxy = new AgentProxy(@"C:\Users\Public\Documents\Content Grabber\Agents\qantas\qantas.scg", sessionId);

qantasProxy.Connect("http://localhost:8001/ContentGrabber");

AgentStatus status = qantasProxy.GetAgentStatus();

 

If you are running an agent from a web application, you could use AJAX callbacks to get status information about a running agent. You only have to keep track of the session ID between callbacks.

 

Proxy Functions

The following functions are available in the proxy assembly:

Function

Description

AgentProxy(string agentNameOrPath, string sessionId)

Instantiates a new proxy with the specified agent and session ID. You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:

 

C:\Users\Public\Documents\Content Grabber\Agents

 

AgentProxy(string agentNameOrPath)

Instantiates a new proxy without a session. You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:

 

C:\Users\Public\Documents\Content Grabber\Agents

 

void Connect(string endPointAddress)

Connects to the Content Grabber agent service. You can specify the server name or IP address and port number. The default connection string for a local service is:

 

http://localhost:8001/ContentGrabber

 

void CloseConnection()

Closes the connection to the Content Grabber agent service.

void StartAgent()

Starts the agent specified when instantiating the proxy. The agent will run asynchronously.

void StartAgent(AgentSettings settings)

Starts the agent with additional settings. The agent will run asynchronously. See below for more information about the AgentSettings class.

void StopAgent()

Stops the agent if it is currently running asynchronously.

void CloseAgentSession()

Closes an agent session after the agent has been run asynchronously. When you close an agent session, all data associated with that session is removed and you will not be able to retrieve status information about the agent that ran in this session. You can only close a session if an agent is not currently running in the session.

 

You don't need to close a session. Session data will be removed automatically after the agent has completed running and the session timeout has elapsed. The default session timeout is 30 minutes, so by default session data will be removed automatically 30 minutes after the agent has completed.

AgentStatus GetAgentStatus()

Returns status information about an agent that has been run asynchronously. See below for more information about the AgentStatus class.

DataTable GetAgentProgressAsDataTable()

Returns progress information in a DataTable about an agent running asynchronously. See below for more information about the information returned.

string GetAgentProgressAsJson()

Returns progress information as JSON about an agent running asynchronously. See below for more information about the information returned.

DataTable GetAgentLogAsDataTable(int offset, int limit)

Returns log data in a DataTable for an agent that has been run asynchronously. This function does not return any data if logging is disabled or if logging is written to file. See below for more information about the information returned.

 

offset (optional): Index of the first log entry to return.

limit (optional): Index of the last log entry to return.

string GetAgentLogAsJson(int offset, int limit)

Returns log data as JSON for an agent that has been run asynchronously. This function does not return any data if logging is disabled or if logging is written to file. See below for more information about the information returned.

 

offset (optional): Index of the first log entry to return.

limit (optional): Index of the last log entry to return.

DataSet GetAgentDataAsDataSet(int offset, int limit)

Returns extracted data in a DataSet for an agent that has been run asynchronously.

 

offset (optional): Index of the first data entry to return.

limit (optional): Index of the last data entry to return.

string GetAgentDataAsJson(int offset, int limit)

Returns extracted data as a JSON string for an agent that has been run asynchronously.

 

offset (optional): Index of the first data entry to return.

limit (optional): Index of the last data entry to return.

string GetAgentDataAsXml(int offset, int limit)

Returns extracted data as an XML string for an agent that has been run asynchronously.

 

offset (optional): Index of the first data entry to return.

limit (optional): Index of the last data entry to return.

string RunAgentReturnJson(string agentNameOrPath, string sessionId, int limit)

Runs an agent synchronously and returns extracted data as a JSON string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.

 

limit (optional): Maximum number of data rows to return.

string RunAgentReturnXml(string agentNameOrPath, string sessionId, int limit)

Runs an agent synchronously and returns extracted data as an XML string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.

 

limit (optional): Maximum number of data rows to return.

DataSet RunAgentReturnDataSet(string agentNameOrPath, string sessionId, int limit)

Runs an agent synchronously and returns extracted data in a DataSet. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.

 

limit (optional): Maximum number of data rows to return.

string RunAgentReturnJson(AgentSettings settings, int limit)

Runs an agent synchronously with additional settings and returns extracted data as a JSON string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.

 

See below for more information about the AgentSettings class.

 

limit (optional): Maximum number of data rows to return.

string RunAgentReturnXml(AgentSettings settings, int limit)

Runs an agent synchronously with additional settings and returns extracted data as an XML string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.

 

See below for more information about the AgentSettings class.

 

limit (optional): Maximum number of data rows to return.

DataSet RunAgentReturnDataSet(AgentSettings settings, int limit)

Runs an agent synchronously with additional settings and returns extracted data in a DataSet. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.

 

See below for more information about the AgentSettings class.

 

limit (optional): Maximum number of data rows to return.

 

AgentSettings

The following agent settings can be specified when running an agent:

Property

Description

AgentLogLevel LogLevel

Log detail level. Set the log level to None to turn off logging.

bool IsLogHtml

Logs the raw HTML of all web pages processed by the agent.

bool IsLogToFile

Logs data to a file instead of a database table.

int Timeout

This value specifies the session timeout in minutes when an agent is run asynchronously. All session data is removed automatically when the agent has completed and this timeout has elapsed. The default session timeout is 30 minutes.

 

This value specifies the maximum number of seconds an agent will run when it's run synchronously. When the timeout is reached, the agent will stop and close its session if it's run in a session. The default timeout is 30 seconds.

Dictionary<string, string> InputParameters

A list of input parameters.

GlobalData

Any serializable data object can be stored in this dictionary and will be available to all scripts in an agent. Notice that input parameters will eventually be stored in this dictionary as well, so it doesn't matter if you use input parameters or global data to store your input data.

ProxyList

A list of web proxies that will be used by the agent. If a proxy list is specified, it overrides any default proxy settings in the agent.

 

AgentStatus

An agent can provide the following status information:

Property

Description

AgentRunningStatus RunStatus

The RunStatus can be one of the following values.

 

Completed. The agent has completed successfully.

Incomplete. The agent has completed, but stopped prematurely. The agent may have been stopped manually.

Failed. The agent has completed, but a critical error occurred.

Idle. The agent has never been run.

Starting. The agent is starting.

ExportingData. The agent is exporting data to the specified export target.

Stopping. The agent is in the process of stopping.

Restarting. The agent is restarting. This usually occurs when the agent needs to clear JavaScript memory leaks.

ExportFailed. The agent completed, but failed to export data.

int PageLoads

The number of page loads. This includes AJAX calls triggered by agent actions.

TimeSpan Runtime

The amount of time the agent has run.

int MissingElements

The number of times an agent command could not find its specified content where the content was not specified as optional.

int PageErrors

The number of page load errors. This includes errors loading content from AJAX calls that were triggered by agent actions.

DateTime StartTime

The time the agent started.
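When polling for status, it can help to separate the RunStatus values that mean the run is over from those that mean it is still in progress. The sketch below is one possible reading of the descriptions above and treats Completed, Incomplete, Failed and ExportFailed as finished. RunStatusHelper is a hypothetical helper that works on the status name as a string, so it compiles without the proxy assembly. It assumes the names match the values listed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Content Grabber proxy assembly.
public static class RunStatusHelper
{
    // Status names described above as "has completed" in some form.
    private static readonly HashSet<string> Finished =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "Completed", "Incomplete", "Failed", "ExportFailed"
        };

    // True when the named status indicates the run has ended.
    public static bool IsFinished(string runStatusName)
    {
        return runStatusName != null && Finished.Contains(runStatusName);
    }

    // True only when the run ended without a reported problem.
    public static bool IsSuccess(string runStatusName)
    {
        return runStatusName == "Completed";
    }
}
```

Assuming RunStatus is an enumeration, a call such as RunStatusHelper.IsFinished(status.RunStatus.ToString()) could decide when a web page should stop polling.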

 

Agent Progress Data

An agent can provide progress data in a DataTable. The DataTable contains a DataRow for each web browser the agent is using to extract data. Each DataRow contains a status column and a description column. The progress data is the same information displayed when running an agent in the Content Grabber agent editor.

 

Agent Log Data

An agent can provide log data in a DataTable. The DataTable contains a log level column and a description column. A log level of 1 means an error, 2 means a warning and 3 means information. The log data is the same data you can view in the Content Grabber agent editor.
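The numeric log levels can be turned into readable labels and summarized with a few lines of code. The following is a minimal sketch and not part of the API: LogSummary is a hypothetical helper, and the name of the log level column is passed in by the caller because the actual column name is not given here.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper, not part of the Content Grabber proxy assembly.
public static class LogSummary
{
    // Maps the documented levels: 1 = error, 2 = warning, 3 = information.
    public static string Describe(int level)
    {
        switch (level)
        {
            case 1: return "Error";
            case 2: return "Warning";
            case 3: return "Information";
            default: return "Unknown";
        }
    }

    // Counts log rows per level label. levelColumn is the name of the
    // log level column, which is an assumption supplied by the caller.
    public static Dictionary<string, int> CountByLevel(DataTable log, string levelColumn)
    {
        var counts = new Dictionary<string, int>();
        foreach (DataRow row in log.Rows)
        {
            string label = Describe(Convert.ToInt32(row[levelColumn]));
            counts.TryGetValue(label, out int current);
            counts[label] = current + 1;
        }
        return counts;
    }
}
```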

 

Agent Export Data

The API can provide extracted data in a DataSet, as an XML string or as a JSON string. For large amounts of data, use the offset and limit parameters to page through the data: offset is the index of the first data entry to return and limit is the index of the last data entry to return. The API method GetAgentStatus returns a value, Export Row Count, which contains the total number of data entries available. See Data Counting for more information about the Export Row Count value.
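Because limit is described as the index of the last entry rather than a row count, the values for each page need a small calculation. The sketch below shows one way to compute them. It assumes zero-based, inclusive indexes, which is not confirmed here, and Paging is a hypothetical helper that is not part of the API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Content Grabber proxy assembly.
public static class Paging
{
    // Yields one (offset, limit) pair per page, where offset is the index of
    // the first entry and limit is the index of the last entry on that page.
    // Assumes zero-based, inclusive indexes.
    public static IEnumerable<(int Offset, int Limit)> Pages(int totalRows, int pageSize)
    {
        if (totalRows < 0) throw new ArgumentOutOfRangeException(nameof(totalRows));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        for (int offset = 0; offset < totalRows; offset += pageSize)
        {
            // The final page may be shorter than pageSize.
            int limit = Math.Min(offset + pageSize, totalRows) - 1;
            yield return (offset, limit);
        }
    }
}
```

With 25 entries and a page size of 10, this yields the pairs (0, 9), (10, 19) and (20, 24). Each pair could then be passed to a call such as GetAgentDataAsJson(offset, limit), with the total taken from the Export Row Count value.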