Multithreading

Multithreading refers to multiple tasks running concurrently. In Content Grabber, that usually means multiple HTML parsers and web browsers loading and processing multiple web pages from a target website at the same time.

By default, Content Grabber uses a maximum of 5 active HTML parsers and 1 active dynamic browser to process a website. You can change these limits by setting the agent options Max Active Parsers and Max Active Browsers. An agent will sometimes use an additional active parser or browser to avoid deadlocks while waiting for an available parser or browser.

The Max Active Parsers option specifies the maximum number of active HTML, XML and JSON parsers, and the Max Active Browsers option specifies the maximum number of active dynamic web browsers.
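
As a conceptual illustration only, a cap on active parsers or browsers behaves much like a counting semaphore: a worker must take a slot before it loads a page and gives the slot back when it is done. The Python sketch below models that idea. It does not use any Content Grabber API; the names ActiveLimit and run_pages are hypothetical, the two constants simply mirror the defaults mentioned above, and the occasional extra parser or browser used to avoid deadlocks is not modeled.

```python
import threading
import time

# Hypothetical values mirroring the defaults described above; this models
# the idea only and does not call any Content Grabber API.
MAX_ACTIVE_PARSERS = 5
MAX_ACTIVE_BROWSERS = 1


class ActiveLimit:
    """Counting limit that also records the peak number of concurrent users."""

    def __init__(self, limit):
        self._slots = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        self._slots.acquire()  # blocks while `limit` workers are already active
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._active -= 1
        self._slots.release()
        return False


def run_pages(limit, page_count, work_seconds=0.01):
    """Process simulated pages in threads, never exceeding the limit at once."""

    def load_page():
        with limit:
            time.sleep(work_seconds)  # stands in for loading and parsing a page

    threads = [threading.Thread(target=load_page) for _ in range(page_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return limit.peak


parser_peak = run_pages(ActiveLimit(MAX_ACTIVE_PARSERS), page_count=20)
browser_peak = run_pages(ActiveLimit(MAX_ACTIVE_BROWSERS), page_count=5)
print("peak active parsers:", parser_peak)
print("peak active browsers:", browser_peak)
```

However many pages are queued, the recorded peak never exceeds the configured limit, which is the effect the two agent options are described as having.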

The best values for Max Active Parsers and Max Active Browsers depend on how hard you can hit the target website without making it unstable and without getting blocked. They also depend on how much memory and how many CPU cores are available on the computer running Content Grabber.

A dynamic web browser uses a significant amount of memory and CPU, and often uses many concurrent web connections to download content from a website, so the Max Active Browsers option should not be set too high. An HTML parser uses little memory and CPU, and only a single web connection, so the Max Active Parsers option can often be set much higher.
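
The trade-offs above can be turned into a rough starting estimate. The Python sketch below is only an illustration of that reasoning: the function suggest_limits is hypothetical, and the per-browser memory, per-parser memory and connections-per-browser figures are placeholder assumptions, not measurements or Content Grabber recommendations. Each limit is estimated on its own, and real values should be found by testing against the target website.

```python
def suggest_limits(free_memory_mb, cpu_cores, site_connection_budget,
                   browser_memory_mb=400, parser_memory_mb=10,
                   connections_per_browser=6):
    """Return (max_active_parsers, max_active_browsers) as a rough starting point.

    All default figures are illustrative assumptions. site_connection_budget
    is the number of concurrent connections you believe the target website
    tolerates without becoming unstable or blocking you.
    """
    # Browsers are heavy: bound them by memory, by CPU cores, and by the
    # share of the connection budget that each browser is assumed to consume.
    browsers = min(free_memory_mb // browser_memory_mb,
                   cpu_cores,
                   site_connection_budget // connections_per_browser)

    # Parsers are light and use a single connection each, so the website's
    # connection budget is usually the binding constraint.
    parsers = min(free_memory_mb // parser_memory_mb, site_connection_budget)

    # Always allow at least one of each so that the agent can make progress.
    return max(1, parsers), max(1, browsers)


parsers, browsers = suggest_limits(free_memory_mb=4096, cpu_cores=4,
                                   site_connection_budget=20)
print(parsers, browsers)  # prints "20 3" with the assumed figures above
```

With these assumed figures the browser limit comes out far lower than the parser limit, which matches the guidance that Max Active Parsers can often be set much higher than Max Active Browsers.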