

When you browse the web yourself, you rarely need to think about the sequence of complex activities that occurs while a web page loads, since you simply wait for the content that you want to see. The most critical content on a web page will likely load well before you actually get around to viewing a specific part of the page. Usually, all features function correctly as you fill in web forms or click links.


However, it's very different with web-scraping agents, since these agents are very fast. An agent will attempt to process a web page as quickly as possible and then continue on to the next page. A web-scraping agent is so fast that it could easily start processing a page before all of the essential content has loaded. So, it's important that you configure an action command to wait until all important browser activities have completed and all the content has loaded before web page processing begins.


When an action command executes, it waits for certain activities to complete in the web browser. For example, if a command executes a click on a link, it may wait for a page to load or an AJAX call to complete. Some actions may result in a very complex set of activities. An action may load a new page that then uses AJAX to load additional dynamic content onto the page.


Discovering Activities

Action commands automatically discover web browser activities. After a command fires the action events, it will monitor all activity in the web browser and wait for critical activities to complete. Once no new activities have started for a little while, it will consider the action to be complete.
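
The quiet-period idea described above can be illustrated with a minimal Python sketch. This is not Content Grabber's implementation. The Activity type, the completion_time function, and the 0.5 second quiet period are all hypothetical, and the sketch assumes the quiet period starts once every discovered activity has finished.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """A hypothetical browser activity with start and end times in seconds."""
    name: str
    start: float
    end: float


def completion_time(activities, quiet_period):
    """Return the time at which the action would be considered complete.

    Assumption: the action completes once nothing is running and no new
    activity has started for `quiet_period` seconds.
    """
    idle_at = 0.0  # time at which all activities seen so far have finished
    for activity in sorted(activities, key=lambda a: a.start):
        if activity.start >= idle_at + quiet_period:
            # The command has already stopped waiting, so this activity
            # occurs after the action completes.
            break
        idle_at = max(idle_at, activity.end)
    return idle_at + quiet_period


timeline = [
    Activity("page load", 0.0, 1.25),
    Activity("AJAX call", 1.5, 2.0),
    Activity("late tracker", 4.0, 4.25),
]
print(completion_time(timeline, quiet_period=0.5))  # prints 2.5
```

In this timeline the late tracker starts after the quiet period has already elapsed, so it does not delay completion. This corresponds to activity that appears after the completion point on the Browser Activity screen.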


You can specify which activities an action command should wait for. The command can wait for activities in the main web page and in sub-pages that are loaded in web frames.


Page load activities can be optional or required. An error will be reported if a page load activity is required, but no page load occurs. If Wait for page load is set to None, the command will not wait for any page load to occur, which is slightly faster than setting Wait for page load to Optional.


An AJAX activity occurs when a web page loads content from the web server asynchronously. A Script activity occurs when a JavaScript file is loaded by the web page asynchronously. AJAX and Script activities are always optional, which means no error will be reported if a command is configured to wait for AJAX, but no AJAX activities occur.


Action Command - Activities

Wait configuration panel.


Complex Website Activities

Some websites have very complex activities. For example, many travel websites that provide hotel and flight search functionality will load a waiting page and after a while load the actual search result. An action command will often complete the action after the waiting page is loaded, since it doesn't know that more content will be loaded later. If the website redirects from the waiting page to the search result page, then the Wait option Delayed redirect can often be used successfully, but sometimes websites use other techniques and it can be very difficult for the action command to tell when an action has completed.


Sometimes it's possible to determine that a website action has completed when a specific URL has been loaded. This URL could be from a full page load, a frame page load, or an asynchronous AJAX call. A Regular Expression can be specified to wait for a URL that matches that Regular Expression.


To wait for a URL, set the option Wait for URL to Optional or Required, and set the option Wait URL Regex to a Regular Expression that the URL must match. If the option Wait for URL is set to Required, a page load error occurs if no matching URL is ever loaded.
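
As a rough illustration of how a Wait URL Regex selects the URL that signals completion, the following Python sketch tests a pattern against a list of loaded URLs. The URLs, the pattern, and the first_matching_url helper are hypothetical. The sketch uses Python's re module and assumes the pattern may match anywhere in the URL, so check the pattern against the regular expression syntax that Content Grabber itself accepts before relying on it.

```python
import re

# Hypothetical pattern: the search result URL carries a session parameter.
WAIT_URL_REGEX = r"/search/results\?.*\bsession=\w+"


def first_matching_url(loaded_urls, pattern):
    """Return the first loaded URL that matches the pattern, or None."""
    regex = re.compile(pattern)
    for url in loaded_urls:
        # Assumption: a match anywhere in the URL counts.
        if regex.search(url):
            return url
    return None


# Hypothetical sequence of URLs loaded after a search form is submitted.
loaded = [
    "https://example.com/search/wait",
    "https://example.com/static/app.js",
    "https://example.com/search/results?session=abc123&page=1",
]
print(first_matching_url(loaded, WAIT_URL_REGEX))
```

The waiting page and the script file do not match, so only the search result URL would end the wait.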


Additionally, a website may attempt to load the contents of sub-pages before loading the content on the main page. This is because the pages may exist within different domains. The options Wait for External Sub-Page AJAX, Wait for External Sub-Page Load and Wait for External Sub-Page Scripts allow commands to wait for page loads from sub-pages that are on a different domain from the main page, depending on the type of request being used. If these options are not used, then elements inside the external sub-page will not be loaded.


Wait for URL options.

Sometimes the only reliable way to determine when an action has completed is to wait for certain content to appear on the web page. The action command can check a specified selection XPath and make sure the corresponding content exists on the main web page before it considers the action completed. This check takes place directly on the web page before the page has been parsed by Content Grabber, so only pure XPath version 1.0 syntax can be used.


To wait for one or more selection XPaths, set the option Wait for XPaths to Optional or Required, and set the option Wait XPaths to one or more XPaths. If the option Wait for XPaths is set to Required, a page load error occurs if the content selected by the XPaths never appears on the page.


Multiple XPaths can be used to cover different scenarios. For example, if an action command is waiting for a search result to appear on the web page, but sometimes the search result is empty and a message is displayed instead of the search result, then one XPath can be used to select the search result, and another XPath can be used to select the message that appears when the search result is empty. This means the command will not keep waiting for the search result to appear on the page if the search result is empty, but will instead stop waiting as soon as the empty search message appears on the page.
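
The two-XPath scenario can be sketched in Python as follows. The element ids, class names, markup, and the matched_xpath helper are hypothetical. The sketch uses the limited XPath subset of Python's xml.etree.ElementTree on well-formed markup, whereas Content Grabber evaluates XPath 1.0 directly on the live web page.

```python
import xml.etree.ElementTree as ET

# Hypothetical Wait XPaths: one for the search result, one for the
# message shown when the search result is empty.
WAIT_XPATHS = [
    ".//table[@id='results']",
    ".//p[@class='no-results']",
]


def matched_xpath(page_markup, xpaths):
    """Return the first XPath that selects content on the page, or None."""
    root = ET.fromstring(page_markup)
    for xpath in xpaths:
        if root.find(xpath) is not None:
            return xpath
    return None


waiting_page = "<html><body><div id='spinner'/></body></html>"
empty_result = "<html><body><p class='no-results'>No hotels found.</p></body></html>"
full_result = "<html><body><table id='results'><tr><td>Hotel A</td></tr></table></body></html>"

for page in (waiting_page, empty_result, full_result):
    print(matched_xpath(page, WAIT_XPATHS))
```

The waiting page matches neither XPath, so the wait would continue. Either of the other two pages ends the wait as soon as one XPath selects content.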


Wait for XPaths options.

Wait Timeouts

Action commands will wait for browser activities for a certain period of time before the wait times out, and the command either considers the action completed or reports the timeout as a page load error. The default timeout values are usually appropriate, but there will sometimes be situations where some timeout values should be modified. For example, timeout values may need to be increased for a very slow website in order for the agent to work properly, or timeout values could be decreased for a very fast website in order to increase agent performance.
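
The two possible outcomes of a timeout can be sketched as a simple polling loop in Python. This is an illustration only, not how Content Grabber is implemented. The wait_for function, the PageLoadError class, and the poll interval are hypothetical, and the required flag stands in for the difference between a Required and an Optional wait option.

```python
import time


class PageLoadError(Exception):
    """Hypothetical error raised when a required wait times out."""


def wait_for(condition, timeout, required, poll_interval=0.05):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met and False if an optional wait
    timed out. Raises PageLoadError if a required wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)
    if required:
        raise PageLoadError(f"wait timed out after {timeout} seconds")
    return False  # optional wait: treat the action as completed anyway


# An optional wait that never succeeds simply gives up after the timeout.
print(wait_for(lambda: False, timeout=0.2, required=False))
```

A longer timeout makes the loop more tolerant of a slow website, while a shorter timeout means less time is spent waiting for optional activity that never occurs.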


Action timeouts.

Browser Activity Screen

This feature shows all browser activities that occur after the current action executes. You can use this information to determine potential issues with the configuration of the action. Use the Activity button on the Content Grabber status bar to open the Browser Activity screen, as shown in the figure below:



The Activity button on the status bar


Critical activities have dark coloring and other activities have light coloring. A blue row appears in the sequence at the point where the command recognizes completion of the action. Activities that occur after the action completes may not necessarily indicate a problem. If the agent does not work as you expect, then you may need to reconfigure your action in such a way that it waits for some or all of those activities.



Activity is seen after the action completes, which may not be a problem