Using Simple Web Requests


The Content Grabber Windows service supports simple web requests, so you can execute agents and retrieve extracted data from non-Windows environments, such as from a PHP page on a Linux server.

 

The Windows service listens for web requests on port 8002 by default. You can change this port number in the Content Grabber editor. The service is stopped by default and configured to start manually, so if you plan to use it, configure it to start automatically. See Using the Content Grabber Agent Service for more information about the Windows service.

 

This is an example of a web request that executes an agent named Sequentum and provides some input values for the agent.

 

http://localhost:8002/ContentGrabber/RunAgentReturnJson?agent=sequentum&pars={"StartDate":"2015-10-15","EndDate":"2015-12-15"}

 

The above web request executes the agent synchronously and returns the extracted data as a JSON string.
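
The request above embeds raw JSON in the query string, while the pars value is expected to be URL encoded. A minimal Python sketch of building the same request with the input values encoded follows. It assumes the service is reachable at the localhost address used in the example; the helper name build_run_url is made up for illustration and is not part of Content Grabber.

```python
import json
from urllib.parse import urlencode

# Base address taken from the example request above.
BASE_URL = "http://localhost:8002/ContentGrabber"


def build_run_url(agent, pars):
    """Return a RunAgentReturnJson URL with the input values URL encoded."""
    query = urlencode({
        "agent": agent,
        # Compact JSON keeps the encoded query string short.
        "pars": json.dumps(pars, separators=(",", ":")),
    })
    return f"{BASE_URL}/RunAgentReturnJson?{query}"


url = build_run_url("sequentum", {"StartDate": "2015-10-15", "EndDate": "2015-12-15"})
print(url)
```

The resulting URL can be requested from any HTTP client, for example with urllib.request.urlopen(url) in Python.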

 

Web Request Functions

The following functions are available when using web requests:

Function

Description

StartAgent?agent={agentNameOrPath}&sessionId={sessionId}&sessionTimeout={sessionTimeout}&logLevel={logLevel}&logHtml={isLogHtml}&logToFile={isLogToFile}&pars={inputParameters}

Starts an agent that will run asynchronously.

 

This function supports both GET and POST requests. If you need to specify a long list of input parameters, use a POST request, because GET requests are limited in length.

 

This function accepts the following parameters:

 

agent (required): You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:

 

C:\Users\Public\Documents\Content Grabber\Agents

 

sessionId (optional): The agent will run in a session with the specified session ID.

 

sessionTimeout (optional): If the agent is running in a session, this value specifies the number of minutes the agent data will be available before it's deleted. The default session timeout is 30 minutes.

 

logLevel (optional): Log detail level. Set the log level to None to turn off logging. The default log level is None. Accepted values are None, Low, Medium and High.

 

logHtml (optional): Logs the raw HTML of all web pages processed by the agent. The default value is False.

 

logToFile (optional): Logs data to a file instead of a database table. The default value is False.

 

pars (optional): A JSON-formatted list of input values that can be used by the agent. The JSON string should be URL encoded.
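
As a rough illustration of the parameters above, the following Python sketch builds a StartAgent POST request and leaves out any optional value that is not set. The text does not say how POST parameters are transmitted, so sending them as a form-encoded body is an assumption here. The helper name build_start_request and the session ID job-1 are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8002/ContentGrabber"  # from the example request


def build_start_request(agent, session_id=None, session_timeout=None,
                        log_level=None, pars=None):
    """Build a POST request for StartAgent, leaving out unset optional values."""
    fields = {
        "agent": agent,
        "sessionId": session_id,
        "sessionTimeout": session_timeout,
        "logLevel": log_level,
        "pars": json.dumps(pars) if pars is not None else None,
    }
    # urlencode takes care of URL encoding the JSON string.
    body = urlencode({k: v for k, v in fields.items() if v is not None})
    # ASSUMPTION: the service reads the parameters from a form-encoded body.
    return Request(f"{BASE_URL}/StartAgent", data=body.encode("ascii"), method="POST")


request = build_start_request("sequentum", session_id="job-1", log_level="Low",
                              pars={"StartDate": "2015-10-15"})
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would send it. If the service expects the parameters in the query string even for POST requests, the same encoded string can be appended to the URL instead.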

RunAgentReturnJson?agent={agentNameOrPath}&timeout={timeout}&logLevel={logLevel}&logHtml={isLogHtml}&logToFile={isLogToFile}&pars={inputParameters}&limit={limit}

Runs an agent synchronously and returns the extracted data as a JSON string.

 

The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.

 

This function supports both GET and POST requests. If you need to specify a long list of input parameters, use a POST request, because GET requests are limited in length.

 

This function accepts the following parameters:

 

agent (required): You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:

 

C:\Users\Public\Documents\Content Grabber\Agents

 

timeout (optional): The maximum number of seconds an agent will run. When the timeout is reached, the agent will stop and close its session if it's run in a session. The default timeout is 30 seconds.

 

logLevel (optional): Log detail level. Set the log level to None to turn off logging. The default log level is None. Accepted values are None, Low, Medium and High.

 

logHtml (optional): Logs the raw HTML of all web pages processed by the agent. The default value is False.

 

logToFile (optional): Logs data to a file instead of a database table. The default value is False.

 

pars (optional): A JSON-formatted list of input values that can be used by the agent. The JSON string should be URL encoded.

 

limit (optional): The maximum number of results to return.
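
Because this function blocks until the agent finishes or the timeout is reached, a client may want its own HTTP timeout to be slightly longer than the agent timeout. The Python sketch below shows one way to do that. The helper name run_agent_json, the 10 second margin, and the UTF-8 response encoding are assumptions, and the structure of the returned JSON depends on the agent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8002/ContentGrabber"  # from the example request


def run_agent_json(agent, pars=None, timeout=30, limit=None, opener=urlopen):
    """Run an agent synchronously and return the decoded JSON response."""
    query = {"agent": agent, "timeout": timeout}
    if pars is not None:
        query["pars"] = json.dumps(pars)
    if limit is not None:
        query["limit"] = limit
    url = f"{BASE_URL}/RunAgentReturnJson?{urlencode(query)}"
    # Wait a little longer than the agent timeout so the service has a
    # chance to respond before the client gives up (margin is arbitrary).
    with opener(url, timeout=timeout + 10) as response:
        # ASSUMPTION: the response body is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))
```

A call such as run_agent_json("sequentum", pars={"StartDate": "2015-10-15"}, timeout=60, limit=100) would run the agent for at most 60 seconds and ask for at most 100 results. The opener argument exists only so the function can be exercised without a running service.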

RunAgentReturnXml?agent={agentNameOrPath}&timeout={timeout}&logLevel={logLevel}&logHtml={isLogHtml}&logToFile={isLogToFile}&pars={inputParameters}&limit={limit}

Runs an agent synchronously and returns the extracted data as an XML string.

 

The agent is always run in a session when the agent supports sessions, and the session is closed automatically after the agent has completed its run.

 

This function supports both GET and POST requests. If you need to specify a long list of input parameters, use a POST request, because GET requests are limited in length.

 

This function accepts the following parameters:

 

agent (required): You can specify the full path to an agent file or just the name of the agent. If you only specify the agent name, Content Grabber will look for the agent in the default location for the user running the agent service. The default agent location for the local System account is:

 

C:\Users\Public\Documents\Content Grabber\Agents

 

timeout (optional): The maximum number of seconds an agent will run. When the timeout is reached, the agent will stop and close its session if it's run in a session. The default timeout is 30 seconds.

 

logLevel (optional): Log detail level. Set the log level to None to turn off logging. The default log level is None. Accepted values are None, Low, Medium and High.

 

logHtml (optional): Logs the raw HTML of all web pages processed by the agent. The default value is False.

 

logToFile (optional): Logs data to a file instead of a database table. The default value is False.

 

pars (optional): A JSON-formatted list of input values that can be used by the agent. The JSON string should be URL encoded.

 

limit (optional): The maximum number of results to return.

StopAgent?agent={agentNameOrPath}&sessionId={sessionId}

Stops the agent if it is currently running asynchronously.

 

This function supports only GET requests.

CloseAgentSession?agent={agentNameOrPath}&sessionId={sessionId}

Closes an agent session after the agent has been run asynchronously. When you close an agent session, all data associated with that session is removed and you will not be able to retrieve status information about the agent that ran in this session. You can only close a session if an agent is not currently running in the session.

 

You don't need to close a session. Session data will be removed automatically after the agent has completed running and the session timeout has elapsed. The default session timeout is 30 minutes, so by default session data will be removed automatically 30 minutes after the agent has completed.

 

This function supports only GET requests.

GetAgentStatus?agent={agentNameOrPath}&sessionId={sessionId}

Returns status information about an agent that has been run asynchronously. See below for more information about the AgentStatus class.

 

This function supports only GET requests.
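
An agent started with StartAgent is typically followed by repeated status requests until it has finished. The Python sketch below shows one possible polling loop. The format of the status response is not described in this section, so the loop takes a caller-supplied is_finished check rather than guessing at field names. The names fetch_status and wait_until, the polling interval, and the maximum wait are all assumptions.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8002/ContentGrabber"  # from the example request


def fetch_status(agent, session_id):
    """Return the raw GetAgentStatus response body as text."""
    query = urlencode({"agent": agent, "sessionId": session_id})
    with urlopen(f"{BASE_URL}/GetAgentStatus?{query}") as response:
        # ASSUMPTION: the response body is UTF-8 encoded text.
        return response.read().decode("utf-8")


def wait_until(fetch, is_finished, interval=5.0, max_wait=600.0, sleep=time.sleep):
    """Call fetch() until is_finished(result) is true or max_wait seconds pass."""
    waited = 0.0
    while True:
        result = fetch()
        if is_finished(result):
            return result
        if waited >= max_wait:
            raise TimeoutError("agent did not finish within max_wait seconds")
        sleep(interval)
        waited += interval
```

A caller would pass something like lambda: fetch_status("sequentum", "job-1") as fetch, together with an is_finished function that inspects the returned status. Keeping max_wait below the session timeout avoids polling a session whose data has already been removed.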

GetAgentProgressAsJson?agent={agentNameOrPath}&sessionId={sessionId}

Returns progress information as JSON about an agent that is running asynchronously. See below for more information about the information returned.

 

This function supports only GET requests.

GetAgentLogAsJson?agent={agentNameOrPath}&sessionId={sessionId}&offset={offset}&limit={limit}

Returns log data as JSON for an agent that has been run asynchronously. This function does not return any data if logging is disabled or if logging is written to file. See below for more information about the information returned.

 

This function supports only GET requests.

 

offset (optional): Index of the first log entry to return.

 

limit (optional): Index of the last log entry to return.

GetAgentDataAsDataSet?agent={agentNameOrPath}&sessionId={sessionId}&offset={offset}&limit={limit}

Returns extracted data in a DataSet for an agent that has been run asynchronously.

 

This function supports only GET requests.

 

offset (optional): Index of the first data entry to return.

 

limit (optional): Index of the last data entry to return.

GetAgentDataAsJson?agent={agentNameOrPath}&sessionId={sessionId}&offset={offset}&limit={limit}

Returns extracted data as a JSON string for an agent that has been run asynchronously.

 

This function supports only GET requests.

 

offset (optional): Index of the first data entry to return.

 

limit (optional): Index of the last data entry to return.
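
Note that limit is described here as the index of the last entry, not as a number of entries, so retrieving data in pages means advancing both values together. The Python sketch below computes such pairs. It assumes that indexes are zero-based, that the entry at the limit index is included, and that the total number of entries is known in advance; none of this is confirmed by the text. The helper name page_ranges is made up for illustration.

```python
def page_ranges(total, page_size):
    """Yield (offset, limit) index pairs that together cover `total` entries."""
    for first in range(0, total, page_size):
        # ASSUMPTION: limit is the inclusive, zero-based index of the last entry.
        last = min(first + page_size, total) - 1
        yield first, last


print(list(page_ranges(250, 100)))  # [(0, 99), (100, 199), (200, 249)]
```

Each pair can then be passed as the offset and limit values of a GetAgentDataAsJson request.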

GetAgentDataAsXml?agent={agentNameOrPath}&sessionId={sessionId}&offset={offset}&limit={limit}

Returns extracted data as an XML string for an agent that has been run asynchronously.

 

This function supports only GET requests.

 

offset (optional): Index of the first data entry to return.

 

limit (optional): Index of the last data entry to return.