Content Transformation Scripts

Content transformation scripts transform content after it has been extracted from a web page. They are often applied to HTML elements to extract information that does not sit in an element of its own and therefore cannot be selected in the web browser. For example, a content transformation can extract part of an address, such as a postal code, from a single HTML element containing the full address.

 

Content transformation scripts can be used in most Capture Commands to transform the content the commands extract. They can also be used in other types of commands, for example in a Navigate Link command to transform an extracted URL.

 

A content transformation can be defined as a regular expression or as a C# or VB.NET script. Regular expressions are often used when you wish to extract sub-text from a larger piece of extracted text.

 

The following regular expression example extracts all text up to, but not including, the first '<' character:

 

(.*?)<

return $1

 

See the topic Script Languages for information about how to use regular expressions in Content Grabber.

 

The following script also extracts all text until the first '<' character, but uses C# instead of regular expressions:

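using System;
using Sequentum.ContentGrabber.Api;

public class Script
{
    public static string TransformContent(ContentTransformationArguments args)
    {
        // IndexOf returns -1 when the content contains no '<' character.
        // In that case, return the content unchanged instead of throwing.
        int index = args.Content.IndexOf('<');
        return index < 0 ? args.Content : args.Content.Remove(index);
    }
}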

 

The script must have a static method with the following signature:

public static string TransformContent(ContentTransformationArguments args)

 

The method returns the transformed content as a string.
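
As a sketch of the address example from the introduction, the following script pulls a postal code out of a full address. The US ZIP code pattern, the helper name ExtractPostalCode, and the choice to return an empty string when nothing matches are assumptions made for illustration, not part of the Content Grabber API. Adjust them to the address format you actually extract. The script is intended to run inside Content Grabber, which supplies the Sequentum.ContentGrabber.Api types.

```csharp
using System;
using System.Text.RegularExpressions;
using Sequentum.ContentGrabber.Api;

public class Script
{
    // Hypothetical helper: returns the last US ZIP code (12345 or 12345-6789)
    // found in the address, or an empty string if there is none.
    // The last match is used so that a five-digit street number is not
    // mistaken for the postal code.
    public static string ExtractPostalCode(string address)
    {
        if (string.IsNullOrEmpty(address))
            return string.Empty;

        MatchCollection matches = Regex.Matches(address, @"\b\d{5}(?:-\d{4})?\b");
        return matches.Count > 0 ? matches[matches.Count - 1].Value : string.Empty;
    }

    public static string TransformContent(ContentTransformationArguments args)
    {
        return ExtractPostalCode(args.Content);
    }
}
```

Keeping the string logic in a separate static method makes it possible to try the pattern on sample addresses without running an agent.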

 

An instance of the ContentTransformationArguments class is provided by Content Grabber and has the following functions and properties:

 

Property or Function

Description

Agent Agent

The current agent.

ScriptUtils ScriptUtilities

A script utility class with helper methods. See Script Utilities for more information.

Command Command

The current agent command being executed.

IConnection DatabaseConnection

The current internal database connection used by the agent. This connection is already open and should not be closed by your script.

string Content

The extracted content that should be transformed.

IHtmlNode HtmlNode

The extracted HTML node.

IInternalDataRow DataRow

The current internal data row containing the data that has been extracted so far in the current container command.

bool IsDebug

True if the agent is running in debug mode.

IInputData InputDataCache

All input data available to the current command.

void WriteDebug(string debugMessage, DebugMessageType messageType = DebugMessageType.Information)

Writes log information to the agent log. This method has no effect if agent logging is disabled, or if called during design time.

void WriteDebug(string debugMessage, bool showMessageInDesignMode, DebugMessageType messageType = DebugMessageType.Information)

Writes log information to the agent log. This method has no effect if agent logging is disabled. The showMessageInDesignMode parameter controls whether the message is also shown at design time.

void Notify(bool alwaysNotify)

Triggers a notification at the end of an agent run. If alwaysNotify is set to false, this method only triggers a notification if the agent has been configured to send notifications on critical errors.

void Notify(string message, bool alwaysNotify)

Triggers a notification at the end of an agent run and adds the message to the notification email. If alwaysNotify is set to false, this method only triggers a notification if the agent has been configured to send notifications on critical errors.

GlobalDataDictionary GlobalData

Global data dictionary that can be used to store data that needs to be available in all scripts and after agent restarts. Input Parameters are also stored in this dictionary.

IConnection GetDatabaseConnection(string connectionName)

Returns the specified database connection. The database connection must have been previously defined for the agent or be a shared connection for all agents on the computer. Your script is responsible for opening and closing the connection by calling the OpenDatabase and CloseDatabase methods.

IInputDataRow GetInputData()

If the current command is a data provider, the data for that command is returned. Otherwise this function searches the command's parents and returns the first found input data.

IInputDataRow GetInputData(Command command)

If the specified command is a data provider, the data for that command is returned. Otherwise this function searches the command's parents and returns the first found input data.

IInputDataRow GetInputData(string commandName)

If the specified command is a data provider, the data for that command is returned. Otherwise this function searches the command's parents and returns the first found input data.

IInputDataRow GetInputData(Guid commandId)

If the specified command is a data provider, the data for that command is returned. Otherwise the function throws an error.
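
The following sketch shows how a few of the members listed above might be combined in one script. It uses only Content, IsDebug, WriteDebug, and Notify with the signatures given in the table. The helper NormalizeWhitespace, the message texts, and the decision to notify on empty output are assumptions made for illustration.

```csharp
using System;
using Sequentum.ContentGrabber.Api;

public class Script
{
    // Hypothetical helper: collapses runs of whitespace into single spaces
    // and trims both ends. Returns an empty string for null or empty input.
    public static string NormalizeWhitespace(string text)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        // A null separator array makes Split treat all whitespace as separators.
        string[] parts = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    public static string TransformContent(ContentTransformationArguments args)
    {
        string result = NormalizeWhitespace(args.Content);

        // Build the log message only when the agent runs in debug mode.
        if (args.IsDebug)
            args.WriteDebug("Normalized content: '" + result + "'");

        // Assumption: empty output is worth reporting. With alwaysNotify set
        // to false, a notification is only sent if the agent is configured
        // to notify on critical errors.
        if (result.Length == 0)
            args.Notify("Content transformation produced empty output.", false);

        return result;
    }
}
```

Whether an empty result should trigger a notification depends on the agent; remove the Notify call if empty content is expected.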