Using the HTML Parser

When Content Grabber extracts data from a web page, it first loads the web page into a web browser. The web browser parses and renders the web page, and executes any JavaScript the page contains. This is a very safe approach, since Content Grabber uses an embedded version of Chrome as its web browser. If the target website works in Chrome, Content Grabber can usually extract data from it. However, the approach is also slow and may cause instability.

 

If you have been using Chrome to browse the web, you may have sometimes experienced problems, such as hanging websites or program crashes. This may occur very rarely (say once a year), so it may not be a problem during normal usage of Chrome. When Content Grabber uses Chrome to browse a website, it may access more web pages in a few hours than you access in a year, so stability issues are magnified significantly.

 

The main source of website instability is JavaScript. A website developer can use JavaScript to implement dynamic features on the website, but JavaScript bugs may lead to memory leaks, hanging websites or even program crashes.

 

Every action command in a Content Grabber agent that opens a new web browser can be configured to open a specific type of web browser. The default browser is an embedded version of Chrome, but you can change this to an HTML Parser. The HTML Parser does not use Chrome at all and completely ignores JavaScript, so it is generally much more reliable.

 

JavaScript is always single-threaded, so many operations cannot be performed simultaneously when using Chrome web browsers. Since the HTML Parser does not execute JavaScript, it can often process web pages much faster than a Chrome web browser.

 

Many websites don't work properly if JavaScript is disabled, so the HTML Parser will not work for all websites. However, many websites can be at least partly processed with an HTML Parser, so you should switch to an HTML Parser whenever a particular web page can be processed without JavaScript.
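To illustrate what "processed without JavaScript" means in practice, here is a minimal sketch in Python. It is not how Content Grabber's HTML Parser is implemented. It only uses Python's standard `html.parser` module to show that a parser that never runs scripts sees just the text already present in the HTML source. The names `TextCollector`, `static_text`, `available_without_javascript`, and `SAMPLE_PAGE` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def static_text(html):
    """Return the text fragments found in the HTML source, without running scripts."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def available_without_javascript(html, required_values):
    """Return True if every required value is already present in the static text."""
    text = " ".join(static_text(html))
    return all(value in text for value in required_values)


# Hypothetical page: the title is in the HTML source, the price is injected by a script.
SAMPLE_PAGE = """
<html><body>
  <h1>Blue Widget</h1>
  <div id="price"></div>
  <script>
    document.getElementById("price").textContent = "$19.99";
  </script>
</body></html>
"""

if __name__ == "__main__":
    print(static_text(SAMPLE_PAGE))
    print(available_without_javascript(SAMPLE_PAGE, ["Blue Widget"]))
    print(available_without_javascript(SAMPLE_PAGE, ["Blue Widget", "$19.99"]))
```

In this sample, the title can be read without JavaScript, but the price cannot, because the script that fills it in is never executed. A page like this could only be partly processed with an HTML Parser. A full browser would still be needed for the price.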

 

Figure: Configure an Action command to use a specific web browser type

 

If you want an agent to use the HTML Parser by default, set the web browser type under Agent Settings > Browser > HTML Parser.