Download Document


The Download Document command extracts a document from a web page. The command downloads the document and then saves it to the file system or sends it to a database, depending on your chosen export target.

 

The web selection path for this command normally points to the document link itself, but it could also point to a web element that contains information from which the document URL can be derived by using Content Transformation.
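As an illustration of deriving a URL, suppose the selected element is a button whose `onclick` attribute holds a document ID rather than a link. The Python sketch below shows the derivation logic. In Content Grabber the equivalent would be written as a .NET content transformation. The `openDocument(...)` attribute format and the `https://example.com/docs/{doc_id}.pdf` URL template are hypothetical and used only for illustration.

```python
import re
from typing import Optional

# Hypothetical attribute format: the element's onclick handler embeds a numeric
# document ID, e.g. onclick="openDocument('48213'); return false;"
DOC_ID_PATTERN = re.compile(r"openDocument\('(\d+)'\)")

# Hypothetical URL scheme; a real site defines its own.
URL_TEMPLATE = "https://example.com/docs/{doc_id}.pdf"


def derive_document_url(onclick_value: str) -> Optional[str]:
    """Build the document URL from the ID in onclick_value, or return None if none is found."""
    match = DOC_ID_PATTERN.search(onclick_value)
    if match is None:
        return None
    return URL_TEMPLATE.format(doc_id=match.group(1))


# Usage: the attribute value as it might be captured from the selected element.
url = derive_document_url("openDocument('48213'); return false;")
```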

 

The figure below shows the Command Properties panel after choosing Download Document from the New Command drop-down:

 

Choose Download Document Command

 

Data Fields

If the agent saves the document to a database, the command by default generates two data fields: one for the binary document data and one for the document name. If the agent saves the document to the file system, the command generates only one data field, containing the full file path to the document. Use the Export URL command property to generate an additional data field containing the document URL.
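The rules above can be summarized as a small Python model. It is only illustrative. The field names (`Document`, `DocumentName`, `DocumentPath`, `DocumentUrl`) are hypothetical because the names Content Grabber actually assigns are not given here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataField:
    name: str     # hypothetical field name
    content: str  # description of what the field holds


def generated_fields(export_target: str, export_url: bool = False) -> List[DataField]:
    """Return the data fields generated for an export target ("database" or "file_system")."""
    if export_target == "database":
        fields = [
            DataField("Document", "binary document data"),
            DataField("DocumentName", "document name"),
        ]
    elif export_target == "file_system":
        fields = [DataField("DocumentPath", "full file path to the saved document")]
    else:
        raise ValueError(f"unknown export target: {export_target!r}")
    if export_url:
        # The Export URL property adds one more field holding the document URL.
        fields.append(DataField("DocumentUrl", "document URL"))
    return fields
```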

 

Command Configuration

The Common tab in the Configure Agent Command panel has three tabs:

File URL - contains the URL of the document.

File Name - contains the name of the downloaded document.

Convert to HTML - specifies if the downloaded document should be converted to HTML.

 

We explain the details of each below.

 

File URL

The entry in this tab determines the URL of the document, which the agent uses to download the document at run time.

 

You can choose the HTML attribute that the command should extract to get the document URL. The default value is URL, which extracts the href HTML attribute (if the chosen web element is a link).
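To make the default concrete: for a link element, the URL attribute corresponds to the element's `href`. The Python sketch below reads `href` from `<a>` elements using the standard library's `html.parser`. Resolving relative paths against the page URL with `urljoin` is an assumption added for illustration, and the page URL and file paths are placeholders.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkHrefParser(HTMLParser):
    """Collects the href attribute of every <a> element in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def document_urls(html: str, page_url: str) -> List[str]:
    """Return absolute URLs for all links in html, resolved against page_url."""
    parser = LinkHrefParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]
```

For example, `document_urls('<a href="/files/report.pdf">Annual report</a>', "https://example.com/reports/")` yields the absolute URL of `report.pdf` on the same host.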

 

Click the Transformation Script button to enter .NET regular expressions or write a .NET script that transforms the document URL to meet your requirements. See the Content Transformation Script topic for more information about content transformation scripts.
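For example, a transformation might turn a document preview link into a direct download link. The sketch below expresses that logic in Python with a regular expression and the standard `urllib.parse` helpers. In Content Grabber itself you would write the equivalent as a .NET regular expression or script. The `/viewer/` and `/download/` path convention and the tracking parameter names are hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site convention: documents are previewed at /viewer/<name>
# and served directly at /download/<name>.
VIEWER_SEGMENT = re.compile(r"/viewer/")

# Hypothetical tracking parameters that should not be part of the download URL.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def transform_document_url(url: str) -> str:
    """Rewrite a viewer URL into a direct download URL and drop tracking parameters."""
    parts = urlsplit(url)
    path = VIEWER_SEGMENT.sub("/download/", parts.path, count=1)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment))
```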

 

Use the Data Value option to specify that an agent data value will be used as the file URL. The agent data can come from a data provider, an input parameter, or captured data.

 

File Name

The entry in this tab determines the file name of the downloaded document. From the drop-down menu, you can choose the HTML attribute to use as the name.

 

Click the Transformation Script button to enter .NET regular expressions or write a .NET script that transforms the document name to meet your requirements. See the Content Transformation Script topic for more information about content transformation scripts.
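When link text is used as the document name, it may contain characters the file system rejects. The Python sketch below shows one way to transform such a name, assuming Windows file-naming rules. The 100-character limit and the `document` fallback are arbitrary choices for illustration, not Content Grabber defaults.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def to_safe_file_name(link_text: str, max_length: int = 100) -> str:
    """Turn link text into a usable file name."""
    # Collapse runs of whitespace (including tabs and newlines) into single spaces.
    name = re.sub(r"\s+", " ", link_text).strip()
    # Replace characters the file system rejects with underscores.
    name = INVALID_CHARS.sub("_", name)
    # Truncate, then remove trailing dots and spaces, which Windows also rejects.
    name = name[:max_length].rstrip(". ")
    return name or "document"
```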

 

Use the Data Value option to specify that an agent data value will be used as the file name. The agent data can come from a data provider, an input parameter, or captured data.

 

Use the Detect File Extension option to specify whether the agent should try to detect the file type of the downloaded document, or whether a transformation script or data value will provide a file name that already includes a file extension.
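This page does not describe how Content Grabber detects the file type. One common approach, sketched below in Python, is to inspect the leading bytes of the downloaded data for known file signatures. The signature table is deliberately small. An HTTP `Content-Type` header is another source a real implementation might consult.

```python
from typing import Optional

# A few well-known file signatures ("magic numbers") and the extensions they imply.
SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"PK\x03\x04", ".zip"),  # also the container format for .docx, .xlsx and .pptx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", ".doc"),  # OLE2 container: legacy .doc, .xls, .ppt
    (b"\x89PNG\r\n\x1a\n", ".png"),
]


def detect_extension(data: bytes) -> Optional[str]:
    """Return a file extension guessed from the first bytes of data, or None if unknown."""
    for signature, extension in SIGNATURES:
        if data.startswith(signature):
            return extension
    return None


def with_extension(file_name: str, data: bytes) -> str:
    """Append the detected extension unless file_name already ends with it."""
    extension = detect_extension(data)
    if extension is None or file_name.lower().endswith(extension):
        return file_name
    return file_name + extension
```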

 

Convert To HTML

A downloaded document can be converted into an HTML page as it is being downloaded, and a URL command can later be used to open the HTML page. Capture commands can then be used to extract data from the HTML page.

 


 

Please see Extracting Data From Non-HTML Documents for more information.

 

Click To Download

Check this box when no direct URL is available for the document and the document must instead be downloaded by clicking a web element, such as a button. When you enable this option:

 

The File URL tab becomes unavailable

Content Grabber assigns a unique identifier as the file name

The Action configuration tab becomes available, where you can fine-tune the behavior of the click action

 

See the topic Action Commands for more information.
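With Click To Download enabled, there is no URL from which to derive a file name, so a generated identifier is used instead. The Python sketch below shows one way to produce such a name with a random UUID. The identifier format Content Grabber actually uses is not specified here, so treat this as an assumption.

```python
import uuid


def click_download_file_name(extension: str = "") -> str:
    """Build a file name from a random unique identifier, optionally adding an extension."""
    # uuid4() is random; .hex gives its 32 hexadecimal digits without hyphens.
    return uuid.uuid4().hex + extension
```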