Dynamic Websites

A client-side dynamic web page uses scripts to keep loading content after the initial content has loaded and the page elements are available to the user. The most common language for client-side scripting is JavaScript, which may use AJAX (Asynchronous JavaScript and XML) to load additional content onto a web page asynchronously. Scripts may also modify existing content on a web page, such as enabling or disabling content when you click on particular web elements.

To extract data correctly, Content Grabber needs to detect any dynamic changes on a web page. For example, if you want to extract additional data that AJAX loads onto a web page, you need to configure Content Grabber to wait for AJAX to finish loading the new content before it starts extracting it.
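
The idea of waiting for dynamic content before extracting it can be illustrated with a minimal Python sketch. This is not Content Grabber's API or its internal method; `wait_until` and `FakePage` are hypothetical names, and the timer merely stands in for an AJAX call that delivers content some time after the page first loads.

```python
import threading
import time
from typing import Callable, List


def wait_until(condition: Callable[[], bool],
               timeout: float = 5.0,
               poll_interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page whose rows arrive asynchronously."""

    def __init__(self, load_delay: float) -> None:
        self.rows: List[str] = []
        # Simulate an AJAX load that finishes `load_delay` seconds later.
        threading.Timer(load_delay, self._finish_load).start()

    def _finish_load(self) -> None:
        self.rows.extend(["row 1", "row 2"])


if __name__ == "__main__":
    page = FakePage(load_delay=0.2)
    # Extract only once the dynamic rows are actually present.
    if wait_until(lambda: len(page.rows) > 0, timeout=2.0):
        print(page.rows)
    else:
        print("Timed out waiting for dynamic content")
```

Extracting straight after constructing `FakePage` would find no rows; polling with a deadline avoids both reading too early and waiting forever.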

Content Grabber detects most dynamic changes automatically. However, JavaScript sometimes behaves unusually, and you may need to make adjustments to extract dynamic content properly. For example, Content Grabber can detect when an AJAX load of dynamic content completes, but it cannot detect exactly when the JavaScript has finished displaying that content, so it simply waits for a few milliseconds. If the JavaScript takes an unusually long time to display the dynamic content, you may need to use the timeout feature of Content Grabber to insert a short extra interval (typically a few additional milliseconds) that gives the JavaScript time to finish.
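
The gap between an AJAX request completing and its content appearing on the page can be sketched as follows. This is a simulation under stated assumptions, not Content Grabber's implementation: `SlowRenderingPage`, `extract_rows`, and the `extra_delay` parameter are hypothetical, and the delays are exaggerated so the effect is easy to see.

```python
import threading
import time
from typing import List


class SlowRenderingPage:
    """Hypothetical page: the AJAX response arrives first, and the script
    displays the rows only after a further rendering delay."""

    def __init__(self, response_delay: float, render_delay: float) -> None:
        self.request_done = threading.Event()
        self.rendered_rows: List[str] = []
        self._render_delay = render_delay
        threading.Timer(response_delay, self._on_response).start()

    def _on_response(self) -> None:
        # The request has completed, but nothing is displayed yet.
        self.request_done.set()
        threading.Timer(self._render_delay, self._render).start()

    def _render(self) -> None:
        self.rendered_rows.extend(["row 1", "row 2"])


def extract_rows(page: SlowRenderingPage,
                 extra_delay: float,
                 request_timeout: float = 5.0) -> List[str]:
    """Wait for the request to complete, pause for `extra_delay` seconds,
    then read whatever rows have been displayed so far."""
    if not page.request_done.wait(timeout=request_timeout):
        raise TimeoutError("AJAX request did not complete in time")
    time.sleep(extra_delay)  # extra interval for the script to finish displaying
    return list(page.rendered_rows)
```

With no extra delay, `extract_rows` reads the page as soon as the request completes and may find nothing. A delay longer than the rendering time lets the rows appear first, at the cost of a slower extraction.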

Familiarity with JavaScript can make it much easier to configure a web-scraping agent to extract data from dynamic websites when Content Grabber is unable to configure the agent automatically. You can learn more from various JavaScript tutorials available on the web, such as:

http://www.w3schools.com/js/