Optimizing Selections

An excessively long XPath is unlikely to keep working if the target web page changes, even slightly. Every step in an XPath must match an HTML tag on the web page, so the XPath fails if any of those tags moves to another location or disappears entirely.
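To see why every step matters, the following minimal sketch uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and, unlike a browser, needs well-formed markup and matches tag names case-sensitively (hence lowercase `div` rather than `DIV`). The two page strings are made-up stand-ins for one page before and after a small change.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-ins for two versions of one web page.
# In version 2 the site has inserted a banner <div> above the list.
PAGE_V1 = """<html><body><div>
  <div id="listView"><p>Item A</p></div>
</div></body></html>"""

PAGE_V2 = """<html><body><div>
  <div class="banner">Sale!</div>
  <div id="listView"><p>Item A</p></div>
</div></body></html>"""

# Position-based path: every step must line up with the page exactly.
LONG_PATH = "body/div[1]/div[1]/p"

def select_text(page, path):
    """Return the text of every element matched by path (relative to <html>)."""
    root = ET.fromstring(page)
    return [e.text for e in root.findall(path)]

print(select_text(PAGE_V1, LONG_PATH))  # ['Item A']
print(select_text(PAGE_V2, LONG_PATH))  # []  (div[1] is now the banner)
```

Nothing about the list itself changed in version 2; a single inserted sibling shifted the position that `div[1]` refers to.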

 

When you click an HTML element in the web browser, Content Grabber automatically tries to create an optimal selection XPath by making it as short as possible. For example, the full selection XPath to a <div> HTML element could be:

 

DIV[1]/DIV[5]/TABLE[2]/TBODY[1]/TR[2]/TD[1]/DIV

 

If the DIV tag has an ID attribute with the unique value listView, the optimal XPath is:

 

//DIV[@id='listView']

 

Content Grabber searches the entire web page for a DIV tag whose ID attribute has the value listView. This XPath is robust: it keeps working as long as this element exists on the web page, even if the rest of the page changes.
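The sketch below checks this against three made-up versions of a page, again using Python's `xml.etree.ElementTree` as a stand-in for a real HTML engine. The id-based path finds the element in both layouts and only stops matching once the element itself is gone.

```python
import xml.etree.ElementTree as ET

# Simplified versions of a page: original, redesigned, and one where
# the list has been removed entirely.
PAGES = {
    "original": """<html><body><div>
      <div id="listView"><p>Item A</p></div>
    </div></body></html>""",
    "redesigned": """<html><body>
      <div class="banner">Sale!</div>
      <table><tr><td><div id="listView"><p>Item A</p></div></td></tr></table>
    </body></html>""",
    "list removed": """<html><body><div><p>No items</p></div></body></html>""",
}

# ElementTree needs a leading '.' to search all descendants of the root
# element, so //DIV[@id='listView'] is written as .//div[@id='listView'].
ID_PATH = ".//div[@id='listView']"

def count_matches(page, path):
    """Number of elements in page matched by path."""
    return len(ET.fromstring(page).findall(path))

for name, page in PAGES.items():
    print(name, count_matches(page, ID_PATH))
# original 1
# redesigned 1
# list removed 0
```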

 

Content Grabber is designed to prefer ID attributes when optimizing XPaths, because any given ID is expected to be unique on a web page. However, some websites use IDs that are unique to a specific content item, such as a product ID, and such IDs are not appropriate for use in an XPath. If you use an XPath to extract the product title from every product detail page in a catalog, you do not want that XPath to depend on a specific product ID, as in this example:

 

//H1[@id='sku_245865']

 

Such an XPath would work for only one specific product and not for the others. Content Grabber tries to avoid IDs that look like content identifiers, such as product identifiers, but if it gets this wrong, your only option is to optimize the XPath manually.
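The exact rules Content Grabber uses are not described here, so the following is only a hypothetical sketch of the idea: walk up from the selected element, emit positional steps, and stop at the nearest element whose id is unique on the page and does not look like a content identifier. The "four or more digits" test (`ID_LIKE`) and the function names `usable_id` and `short_xpath` are assumptions made for illustration, and Python's `xml.etree.ElementTree` stands in for a real HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heuristic (not Content Grabber's actual rule): treat an id as
# a content identifier, such as a product SKU, if it has 4+ consecutive digits.
ID_LIKE = re.compile(r"\d{4,}")

def usable_id(root, elem):
    """Return elem's id if it is unique on the page and not ID-like, else None."""
    elem_id = elem.get("id")
    if not elem_id or ID_LIKE.search(elem_id):
        return None
    if sum(e.get("id") == elem_id for e in root.iter()) != 1:
        return None
    return elem_id

def short_xpath(root, target):
    """Walk up from target, emitting positional steps such as 'h1[1]', and
    anchor with //tag[@id='...'] at the nearest usable id. Without a usable
    id, the result is a positional path relative to the root element."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    elem = target
    while elem is not root:
        elem_id = usable_id(root, elem)
        if elem_id:
            steps.append(f"//{elem.tag}[@id='{elem_id}']")
            break
        parent = parent_of[elem]
        same_tag = [c for c in parent if c.tag == elem.tag]
        steps.append(f"{elem.tag}[{same_tag.index(elem) + 1}]")
        elem = parent
    return "/".join(reversed(steps))

page = ET.fromstring("""<html><body>
  <div id="productDetail"><h1 id="sku_245865">Blue Kettle</h1></div>
</body></html>""")
print(short_xpath(page, page.find(".//h1")))
# //div[@id='productDetail']/h1[1]
```

Because the `sku_245865` id is rejected, the path anchors on the enclosing `div` instead, so it would also match the title on other product pages that share the same structure.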

 

To optimize a selection XPath manually, use the Exact Selection tool. This tool generates selection XPaths with as much detail as possible; you can then manually remove the detail that is causing problems, such as an ID attribute containing a product identifier.
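As an illustration of that manual edit, the sketch below evaluates a detailed path and the same path with the product-specific `@id` predicate removed, over two made-up product pages. The detailed path is hypothetical, written in the spirit of an exact selection rather than copied from the tool's output, and evaluation again uses Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Two simplified product detail pages from the same (made-up) catalog.
PRODUCT_PAGES = [
    """<html><body><div class="product">
         <h1 id="sku_245865" class="title">Blue Kettle</h1>
       </div></body></html>""",
    """<html><body><div class="product">
         <h1 id="sku_310442" class="title">Red Toaster</h1>
       </div></body></html>""",
]

# Hypothetical detailed path that includes the product-specific id.
DETAILED = "body/div[@class='product']/h1[@id='sku_245865'][@class='title']"
# The same path with the product-specific id predicate removed.
EDITED = "body/div[@class='product']/h1[@class='title']"

def titles(path):
    """Collect the text matched by path across all product pages."""
    found = []
    for page in PRODUCT_PAGES:
        found += [e.text for e in ET.fromstring(page).findall(path)]
    return found

print(titles(DETAILED))  # ['Blue Kettle']
print(titles(EDITED))    # ['Blue Kettle', 'Red Toaster']
```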

 

The Exact Selection tool