API Reference
The Content Grabber API consists of two components, the Agent API Library and the Agent Proxy Library. The two libraries provide similar functionality, but the Agent Proxy library must connect to the Content Grabber agent service while the Agent API is a stand-alone library that can either connect to a Content Grabber agent service or work on its own.
The Agent API contains agent definition classes that allow you to load, modify and save agents. You cannot add or remove commands from an agent, but you can modify properties of existing commands. The Proxy API can only run agents and cannot load or save agents.
The Proxy API depends on no files other than the Proxy API assembly file. The Agent API depends on the Content Grabber runtime files, which the Content Grabber application can generate when you choose Runtime Package in the Application menu. This generates a zip file with all required files and folders, all of which must be copied to the folder of your executable program.
Agents can also be run using simple web requests. Web requests can be sent from most applications and require no special Content Grabber API libraries.
The following functions are available in the Agent API assembly:
Function |
Description |
AgentApi(string agentNameOrPath, string sessionId) |
Instantiates a new API class with the specified agent and session ID. You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:
C:\Users\Public\Documents\Content Grabber\Agents
|
AgentApi(string agentNameOrPath) |
Instantiates a new API class without a session. You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:
C:\Users\Public\Documents\Content Grabber\Agents
|
void Connect(string endPointAddress) |
Connects to a Content Grabber agent service. You can specify the server name or IP address and port number. The default connection string for a local service is:
http://localhost:8000/ContentGrabber
|
void CloseConnection() |
Closes the connection to the Content Grabber agent service. |
void StartAgent() |
Runs the agent specified when instantiating the API. The agent will run asynchronously. |
void StartAgent(AgentSettings settings) |
Runs the agent with additional settings. The agent will run asynchronously. See below for more information about the AgentSettings class. |
void StopAgent() |
Stops the agent if it is currently running. |
void CloseAgentSession() |
Closes an agent session. When you close an agent session, all data associated with that session is removed and you will not be able to retrieve status information about the agent that ran in this session. You can only close a session if an agent is not currently running in the session.
You don't need to close a session. Session data will be removed automatically after the agent has completed running and the session timeout has elapsed. The default session timeout is 30 minutes, so by default session data will be removed automatically 30 minutes after the agent has completed. |
AgentStatus GetAgentStatus() |
Returns status information about an agent that has been run asynchronously. See below for more information about the AgentStatus class. |
DataTable GetAgentProgressAsDataTable() |
Returns progress information in a DataTable about an agent running asynchronously. See below for details of the information returned. |
string GetAgentProgressAsJson() |
Returns progress information as a JSON string about an agent running asynchronously. See below for details of the information returned. |
DataTable GetAgentLogAsDataTable(string agentNameOrPath, string sessionId) |
Returns log data in a DataTable for an agent that has been run asynchronously. This function does not return any data if logging is disabled or if logging is written to file. See below for more information about the information returned. |
string GetAgentLogAsJson(string agentNameOrPath, string sessionId) |
Returns log data as a JSON string for an agent that has been run asynchronously. This function does not return any data if logging is disabled or if logging is written to file. See below for more information about the information returned. |
DataSet GetAgentExportDataAsDataSet() |
Returns extracted data in a DataSet for an agent that has been run asynchronously. |
string GetAgentExportDataAsJson() |
Returns extracted data as a JSON string for an agent that has been run asynchronously. |
string GetAgentExportDataAsXml() |
Returns extracted data as an XML string for an agent that has been run asynchronously. |
RunAgentReturnJson() |
Runs an agent synchronously and returns extracted data as a JSON string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run. |
RunAgentReturnXml() |
Runs an agent synchronously and returns extracted data as an XML string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run. |
RunAgentReturnDataSet() |
Runs an agent synchronously and returns extracted data in a DataSet. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run. |
RunAgentReturnJson(AgentSettings settings) |
Runs an agent synchronously with additional settings and returns extracted data as a JSON string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.
See below for more information about the AgentSettings class. |
RunAgentReturnXml(AgentSettings settings) |
Runs an agent synchronously with additional settings and returns extracted data as an XML string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.
See below for more information about the AgentSettings class. |
RunAgentReturnDataSet(AgentSettings settings) |
Runs an agent synchronously with additional settings and returns extracted data in a DataSet. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.
See below for more information about the AgentSettings class. |
public Agent GetAgent() |
Returns the agent specified when instantiating the API class. |
public void SaveAgent(Agent agent) |
Saves the specified agent. |
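The session cleanup rule described for CloseAgentSession is simple date arithmetic, which the following sketch illustrates. The function names are hypothetical and are not part of the Content Grabber API. The text only says that data is removed after the timeout has elapsed, so the computed time should be read as the earliest possible removal time, not an exact one.

```python
from datetime import datetime, timedelta

# Default session timeout stated in the text above.
DEFAULT_SESSION_TIMEOUT_MINUTES = 30


def session_expiry(completed_at, timeout_minutes=DEFAULT_SESSION_TIMEOUT_MINUTES):
    """Return the earliest time at which session data may be removed."""
    return completed_at + timedelta(minutes=timeout_minutes)


def session_data_available(now, completed_at,
                           timeout_minutes=DEFAULT_SESSION_TIMEOUT_MINUTES):
    """Return True while the session timeout has not yet elapsed."""
    return now < session_expiry(completed_at, timeout_minutes)
```

For an agent that completed at 12:00 with the default timeout, `session_expiry` gives 12:30, so status, log and export data should be retrieved before then unless a longer timeout was set.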
The following functions are available in the proxy assembly:
Function |
Description |
AgentProxy(string agentNameOrPath, string sessionId) |
Instantiates a new proxy with the specified agent and session ID. You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:
C:\Users\Public\Documents\Content Grabber\Agents
|
AgentProxy(string agentNameOrPath) |
Instantiates a new proxy without a session. You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:
C:\Users\Public\Documents\Content Grabber\Agents
|
void Connect(string endPointAddress) |
Connects to the Content Grabber agent service. You can specify the server name or IP address and port number. The default connection string for a local service is:
http://localhost:8000/ContentGrabber
|
void CloseConnection() |
Closes the connection to the Content Grabber agent service. |
void StartAgent() |
Starts the agent specified when instantiating the proxy. The agent will run asynchronously. |
void StartAgent(AgentSettings settings) |
Starts the agent with additional settings. The agent will run asynchronously. See below for more information about the AgentSettings class. |
void StopAgent() |
Stops the agent if it is currently running asynchronously. |
void CloseAgentSession() |
Closes an agent session after the agent has been run asynchronously. When you close an agent session, all data associated with that session is removed and you will not be able to retrieve status information about the agent that ran in this session. You can only close a session if an agent is not currently running in the session.
You don't need to close a session. Session data will be removed automatically after the agent has completed running and the session timeout has elapsed. The default session timeout is 30 minutes, so by default session data will be removed automatically 30 minutes after the agent has completed. |
AgentStatus GetAgentStatus() |
Returns status information about an agent that has been run asynchronously. See below for more information about the AgentStatus class. |
DataTable GetAgentProgressAsDataTable() |
Returns progress information in a DataTable about an agent running asynchronously. See below for details of the information returned. |
string GetAgentProgressAsJson() |
Returns progress information as JSON about an agent running asynchronously. See below for details of the information returned. |
DataTable GetAgentLogAsDataTable(string agentNameOrPath, string sessionId) |
Returns log data in a DataTable for an agent that has been run asynchronously. This function does not return any data if logging is disabled or if logging is written to file. See below for more information about the information returned. |
string GetAgentLogAsJson(string agentNameOrPath, string sessionId) |
Returns log data as JSON for an agent that has been run asynchronously. This function does not return any data if logging is disabled or if logging is written to file. See below for more information about the information returned. |
DataSet GetAgentDataAsDataSet() |
Returns extracted data in a DataSet for an agent that has been run asynchronously. |
string GetAgentDataAsJson() |
Returns extracted data as a JSON string for an agent that has been run asynchronously. |
string GetAgentDataAsXml() |
Returns extracted data as an XML string for an agent that has been run asynchronously. |
RunAgentReturnJson() |
Runs an agent synchronously and returns extracted data as a JSON string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run. |
RunAgentReturnXml() |
Runs an agent synchronously and returns extracted data as an XML string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run. |
RunAgentReturnDataSet() |
Runs an agent synchronously and returns extracted data in a DataSet. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run. |
RunAgentReturnJson(AgentSettings settings) |
Runs an agent synchronously with additional settings and returns extracted data as a JSON string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.
See below for more information about the AgentSettings class. |
RunAgentReturnXml(AgentSettings settings) |
Runs an agent synchronously with additional settings and returns extracted data as an XML string. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.
See below for more information about the AgentSettings class. |
RunAgentReturnDataSet(AgentSettings settings) |
Runs an agent synchronously with additional settings and returns extracted data in a DataSet. The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.
See below for more information about the AgentSettings class. |
The following agent settings can be specified when running an agent:
Property |
Description |
AgentLogLevel LogLevel |
Log detail level. Set the log level to None to turn off logging. |
bool IsLogHtml |
Logs the HTML of all loaded web pages to files. |
bool IsLogToFile |
Logs information to a file instead of a database. |
int Timeout |
Session timeout. All session data is removed automatically when the agent has completed and this timeout has elapsed. |
Dictionary<string, string> InputParameters |
A list of input parameters. |
GlobalData |
Any serializable data object can be stored in this dictionary and will be available to all scripts in an agent. Note that input parameters are eventually stored in this dictionary as well, so it doesn't matter whether you use input parameters or global data to store your input data. |
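The web request interface documented in this reference takes query parameters (logLevel, logHtml, logToFile, sessionTimeout, pars) whose descriptions closely match the settings above. The sketch below shows one way such settings might be turned into a query string. `AgentSettingsSketch` and `to_query_string` are hypothetical names, not part of the Content Grabber libraries. The one-to-one mapping between properties and query parameters, the `True`/`False` spelling of Booleans, and the timeout unit of minutes are all assumptions.

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class AgentSettingsSketch:
    """Hypothetical Python mirror of the AgentSettings properties listed above."""
    log_level: str = "None"       # None, Low, Medium or High
    is_log_html: bool = False
    is_log_to_file: bool = False
    timeout: int = 30             # session timeout, assumed to be in minutes
    input_parameters: dict = field(default_factory=dict)


def to_query_string(agent, settings):
    """Build a URL-encoded query string for a StartAgent style request."""
    params = {
        "agent": agent,
        "sessionTimeout": settings.timeout,
        "logLevel": settings.log_level,
        "logHtml": settings.is_log_html,
        "logToFile": settings.is_log_to_file,
    }
    if settings.input_parameters:
        # Input values are sent as JSON; urlencode applies the URL encoding.
        params["pars"] = json.dumps(settings.input_parameters)
    return urlencode(params)
```

The `pars` entry is omitted when there are no input parameters, since the parameter is optional.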
An agent can provide the following status information:
Property |
Description |
AgentRunningStatus RunStatus |
The RunStatus can be one of the following values.
• Completed. The agent has completed successfully.
• Incomplete. The agent has completed, but stopped prematurely. The agent may have been stopped manually.
• Failed. The agent has completed, but a critical error occurred.
• Idle. The agent has never been run.
• Starting. The agent is starting.
• ExportingData. The agent is exporting data to the specified export target.
• Stopping. The agent is in the process of stopping.
• Restarting. The agent is restarting. This usually occurs when the agent needs to clear JavaScript memory leaks.
• ExportFailed. The agent completed, but failed to export data.
|
int PageLoads |
The number of page loads. This includes AJAX calls triggered by agent actions. |
TimeSpan Runtime |
The amount of time the agent has run. |
int MissingElements |
The number of times an agent command could not find its specified content where the content was not specified as optional. |
int PageErrors |
The number of page load errors. This includes errors loading content from AJAX calls that were triggered by agent actions. |
DateTime StartTime |
The time the agent started. |
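Because agents started with StartAgent run asynchronously, a caller typically polls the status until the run is over. The sketch below is one way to do that. It assumes that the four statuses described above as "completed" (Completed, Incomplete, Failed and ExportFailed) are the final ones. `get_status` is a hypothetical callable supplied by the caller, for example a wrapper around GetAgentStatus that returns the RunStatus name as a string.

```python
import time

# Status names taken from the RunStatus list above; treating exactly these
# four as final is an assumption based on their descriptions.
FINISHED_STATUSES = {"Completed", "Incomplete", "Failed", "ExportFailed"}


def is_finished(run_status):
    """Return True if the status name indicates the run is over."""
    return run_status in FINISHED_STATUSES


def wait_until_finished(get_status, poll_seconds=2.0, max_polls=100,
                        sleep=time.sleep):
    """Poll get_status() until a final status is seen or max_polls is reached.

    Returns the last status observed, which may not be final if the poll
    limit was hit first.
    """
    status = get_status()
    polls = 1
    while not is_finished(status) and polls < max_polls:
        sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status
```

The poll limit keeps the loop from running forever if the agent never reaches a final status. The caller should check the returned value with `is_finished` before requesting export data.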
An agent can provide progress data in a DataTable. The DataTable contains a DataRow for each web browser the agent is using to extract data. Each DataRow contains a status column and a description column. The progress data is the same information displayed when running an agent in the Content Grabber agent editor.
An agent can provide log data in a DataTable. The DataTable contains a log level column and a description column. A log level of 1 means an error, 2 means a warning and 3 means information. The log data is the same data you can view in the Content Grabber agent editor.
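The numeric log levels above can be translated into names when processing log data. The sketch below assumes the log rows have already been converted to dictionaries with the hypothetical keys `level` and `description`. The actual column names in the DataTable or JSON output are not given in this reference.

```python
# Mapping stated in the text: 1 = error, 2 = warning, 3 = information.
LOG_LEVEL_NAMES = {1: "Error", 2: "Warning", 3: "Information"}


def describe(rows):
    """Yield (level name, description) pairs for each log row."""
    for row in rows:
        yield LOG_LEVEL_NAMES.get(row["level"], "Unknown"), row["description"]


def errors_only(rows):
    """Return the descriptions of all rows logged as errors."""
    return [row["description"] for row in rows if row["level"] == 1]
```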
The following functions are available when using web requests:
Function |
Description |
StartAgent?agent={agentNameOrPath}&sessionId={sessionId}&sessionTimeout={sessionTimeout}&logLevel={logLevel}&logHtml={isLogHtml}&logToFile={isLogToFile}&pars={inputParameters} |
Starts an agent that will run asynchronously.
This function supports both GET and POST requests. If you need to specify a long list of input parameters you must use POST requests, since GET requests are limited in length.
This function accepts the following parameters:
agent (required): You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:
C:\Users\Public\Documents\Content Grabber\Agents
sessionId (optional): The agent will run in a session with the specified session ID.
sessionTimeout (optional): If the agent is running in a session, this value specifies the number of minutes the agent data will be available before it's deleted. The default session timeout is 30 minutes.
logLevel (optional): Log detail level. Set the log level to None to turn off logging. The default log level is None. Accepted values are None, Low, Medium and High.
logHtml (optional): Logs the raw HTML of all web pages processed by the agent. The default value is False.
logToFile (optional): Logs data to a file instead of a database table. The default value is False.
pars (optional): A JSON-formatted list of input values that can be used by the agent. The JSON string should be URL encoded. |
RunAgentReturnJson?agent={agentNameOrPath}&timeout={timeout}&logLevel={logLevel}&logHtml={isLogHtml}&logToFile={isLogToFile}&pars={inputParameters} |
Runs an agent synchronously and returns the extracted data as a JSON string.
The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.
This function supports both GET and POST requests. If you need to specify a long list of input parameters you must use POST requests, since GET requests are limited in length.
This function accepts the following parameters:
agent (required): You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:
C:\Users\Public\Documents\Content Grabber\Agents
timeout (optional): The maximum number of seconds an agent will run. When the timeout is reached, the agent will stop and close its session if it's run in a session. The default timeout is 30 seconds.
logLevel (optional): Log detail level. Set the log level to None to turn off logging. The default log level is None. Accepted values are None, Low, Medium and High.
logHtml (optional): Logs the raw HTML of all web pages processed by the agent. The default value is False.
logToFile (optional): Logs data to a file instead of a database table. The default value is False.
pars (optional): A JSON-formatted list of input values that can be used by the agent. The JSON string should be URL encoded. |
RunAgentReturnXml?agent={agentNameOrPath}&timeout={timeout}&logLevel={logLevel}&logHtml={isLogHtml}&logToFile={isLogToFile}&pars={inputParameters} |
Runs an agent synchronously and returns the extracted data as an XML string.
The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.
This function supports both GET and POST requests. If you need to specify a long list of input parameters you must use POST requests, since GET requests are limited in length.
This function accepts the following parameters:
agent (required): You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:
C:\Users\Public\Documents\Content Grabber\Agents
timeout (optional): The maximum number of seconds an agent will run. When the timeout is reached, the agent will stop and close its session if it's run in a session. The default timeout is 30 seconds.
logLevel (optional): Log detail level. Set the log level to None to turn off logging. The default log level is None. Accepted values are None, Low, Medium and High.
logHtml (optional): Logs the raw HTML of all web pages processed by the agent. The default value is False.
logToFile (optional): Logs data to a file instead of a database table. The default value is False.
pars (optional): A JSON-formatted list of input values that can be used by the agent. The JSON string should be URL encoded. |
StopAgent?agent={agentNameOrPath}&sessionId={sessionId} |
Stops the agent if it is currently running asynchronously.
This function supports only GET requests. |
CloseAgentSession?agent={agentNameOrPath}&sessionId={sessionId} |
Closes an agent session after the agent has been run asynchronously. When you close an agent session, all data associated with that session is removed and you will not be able to retrieve status information about the agent that ran in this session. You can only close a session if an agent is not currently running in the session.
You don't need to close a session. Session data will be removed automatically after the agent has completed running and the session timeout has elapsed. The default session timeout is 30 minutes, so by default session data will be removed automatically 30 minutes after the agent has completed.
This function supports only GET requests. |
GetAgentStatus?agent={agentNameOrPath}&sessionId={sessionId} |
Returns status information about an agent that has been run asynchronously. See above for more information about the AgentStatus class.
This function supports only GET requests. |
GetAgentProgressAsJson?agent={agentNameOrPath}&sessionId={sessionId} |
Returns progress information as JSON about an agent running asynchronously. See above for details of the information returned.
This function supports only GET requests. |
GetAgentLogAsJson?agent={agentNameOrPath}&sessionId={sessionId} |
Returns log data as JSON for an agent that has been run asynchronously. This function does not return any data if logging is disabled or if logging is written to file. See above for details of the information returned.
This function supports only GET requests. |
DataSet GetAgentDataAsDataSet() |
Returns extracted data in a DataSet for an agent that has been run asynchronously.
This function supports only GET requests. |
GetAgentDataAsJson?agent={agentNameOrPath}&sessionId={sessionId} |
Returns extracted data as a JSON string for an agent that has been run asynchronously.
This function supports only GET requests. |
GetAgentDataAsXml?agent={agentNameOrPath}&sessionId={sessionId} |
Returns extracted data as an XML string for an agent that has been run asynchronously.
This function supports only GET requests. |
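The web request functions above can be called from any HTTP client. The sketch below builds a request with the Python standard library and switches from GET to POST when the URL would become long, as recommended above for long input parameter lists. Several details are assumptions and are not confirmed by this reference: that the functions are served under the default service address shown for Connect, that a POST body is form-encoded, and the 2000-character threshold. `build_request` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: web requests use the default local service address.
BASE_URL = "http://localhost:8000/ContentGrabber"
# Assumption: a conservative limit for GET URLs; the real limit is not stated.
MAX_GET_URL_LENGTH = 2000


def build_request(function, agent, input_parameters=None, **options):
    """Build a GET request, or a POST request if the URL would be too long.

    Only StartAgent and the RunAgentReturn functions are documented as
    accepting POST, so long parameter lists only make sense for those.
    """
    params = {"agent": agent, **options}
    if input_parameters:
        # pars is a JSON string; urlencode applies the URL encoding.
        params["pars"] = json.dumps(input_parameters)
    query = urlencode(params)
    url = f"{BASE_URL}/{function}"
    get_url = f"{url}?{query}"
    if len(get_url) <= MAX_GET_URL_LENGTH:
        return Request(get_url, method="GET")
    # Assumption: the service accepts the same fields as a form-encoded body.
    return Request(url, data=query.encode("utf-8"), method="POST")
```

A request built this way can be sent with `urllib.request.urlopen(request)`. For RunAgentReturnJson, the response body can then be parsed with `json.loads`.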