Regular Expressions

With Regular Expressions, you can write patterns that match specific character sequences within a string, and then extract smaller text strings from larger ones.
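As a minimal illustration of the idea, the sketch below uses Python's standard `re` module rather than Content Grabber itself. Regular expression syntax differs slightly between engines, so treat this as an example of the technique, not of Content Grabber's exact syntax. The sample text and the phone-number pattern are made up for illustration.

```python
import re

# Hypothetical input: the full text of a web element.
text = "Call our Springfield office on 555-0142 between 9am and 5pm."

# \d{3}-\d{4} matches three digits, a hyphen, then four digits.
# re.search returns the first place in the string where the pattern matches.
match = re.search(r"\d{3}-\d{4}", text)

if match:
    # group(0) is the whole matched substring, extracted from the larger text.
    print(match.group(0))  # prints 555-0142
```

Only the matched sequence is returned; the rest of the sentence is ignored.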

Content Grabber uses XPath to select web elements on a web page, and then extracts content from those elements. Often you want only part of the extracted content, or you want to transform it. For example, a single web element may contain a company's entire address, while you want to store the street address, city, state and zip code as separate elements. You can use Regular Expressions to split the address text into these separate text strings.
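The address example can be sketched with capturing groups, where each pair of parentheses captures one field. This is a minimal Python sketch, not Content Grabber's own configuration. It assumes a single US-style address written as "street, city, ST zip" with no commas inside the street or city; the pattern and the `split_address` helper are hypothetical and would need adjusting for other address formats.

```python
import re

# Assumed format: "street, city, ST zip" (zip may be 5 digits or ZIP+4).
# [^,]+ means "one or more characters that are not commas".
ADDRESS_PATTERN = re.compile(
    r"\s*([^,]+),\s*([^,]+),\s*([A-Z]{2})\s+(\d{5}(?:-\d{4})?)\s*"
)


def split_address(address):
    """Hypothetical helper: split one address string into its parts.

    Returns a dict of fields, or None if the text does not fit the pattern.
    """
    match = ADDRESS_PATTERN.fullmatch(address)
    if match is None:
        return None
    # groups() returns the four captured substrings in order.
    street, city, state, zip_code = match.groups()
    return {"street": street, "city": city, "state": state, "zip": zip_code}


print(split_address("123 Main Street, Springfield, IL 62704"))
# prints {'street': '123 Main Street', 'city': 'Springfield', 'state': 'IL', 'zip': '62704'}
```

Using `fullmatch` means the whole string must fit the pattern, so text that is not an address returns None instead of a partial match.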

There are many tutorial websites that teach Regular Expressions. Here is one example:

http://www.regular-expressions.info/reference.html