3rd Party Tools

We provide download links to 3rd party tools for your convenience. We did not develop these tools and we do not support them.

The tools are open source projects, and you must ensure you comply with the license associated with each tool. Some of the licenses are not compatible with the Content Grabber license, so we cannot include the tools in the Content Grabber installation package, and you must download them separately from here or from the authors' websites.

Document Converters

Document converters are used to convert documents into HTML, so the documents can be processed like any other HTML page. For example, if a website links to PDF or MS Word files, you may want to extract text or images from within these documents instead of downloading the documents.

Important: Most files, including PDF and Word files, are not intended to be converted into HTML, so the converted HTML pages will be much more difficult to work with than standard HTML pages. In most cases, you'll have to select the entire HTML page and use regular expressions to extract the target text.
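As a rough illustration of that approach, the sketch below treats a converted page as one block of text and pulls a single value out with a regular expression. It is written in Python purely for illustration. The sample HTML, the "Invoice No" label, and the helper names page_text and extract_invoice_number are assumptions made for this example. They are not actual converter output and not part of Content Grabber.

```python
import re
from html import unescape

# Hypothetical converter output: layout markup with the target text buried inside.
SAMPLE_HTML = (
    '<div style="position:absolute;top:120px"><span>Invoice&nbsp;No:</span>'
    '<span> INV-20431</span></div>'
    '<div style="position:absolute;top:140px"><span>Total: $1,250.00</span></div>'
)

def page_text(html):
    """Flatten an HTML page to plain text by dropping tags and decoding entities."""
    text = re.sub(r"<[^>]+>", " ", html)      # replace every tag with a space
    text = unescape(text)                      # decode entities such as &nbsp;
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace

def extract_invoice_number(html):
    """Return the value following the assumed 'Invoice No:' label, or None."""
    match = re.search(r"Invoice No:\s*([A-Z]+-\d+)", page_text(html))
    return match.group(1) if match else None

print(extract_invoice_number(SAMPLE_HTML))
```

Flattening the page first means the pattern only has to anchor on a visible label, not on markup that can change from one converted document to the next.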

Please read the Content Grabber manual for more information about how to use a document converter with Content Grabber.

PDF to HTML

You can use this document converter to convert PDF documents into HTML.

Download the zip file below and extract its contents into the Converters folder in the default Content Grabber document folder
(My Documents\Content Grabber\Converters)
pdftohtml.zip
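
If you prefer to script the extraction step, a minimal Python sketch could look like the one below. The helper name install_converter and the download location are assumptions for illustration. The Documents path assumes the default document folder has not been moved, so adjust both paths to match your system.

```python
import zipfile
from pathlib import Path

def install_converter(zip_path, converters_dir):
    """Extract a downloaded converter archive into the Converters folder."""
    target = Path(converters_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())      # names of the extracted entries

if __name__ == "__main__":
    # Assumed locations; change these if your folders differ.
    converters = Path.home() / "Documents" / "Content Grabber" / "Converters"
    downloaded = Path.home() / "Downloads" / "pdftohtml.zip"
    if downloaded.exists():
        for name in install_converter(downloaded, converters):
            print("extracted", name)
    else:
        print("Download the zip file first:", downloaded)
```

The same helper works for the other converter archives on this page, since they all extract into the same folder.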

The PDF to HTML converter requires the Microsoft Visual C++ 2010 runtime. If it is not already installed on your computer, you can download it from here:

https://www.microsoft.com/en-au/download/details.aspx?id=5555

The pdftohtml utility is part of the poppler library (http://poppler.freedesktop.org).

The source code for poppler is available on the above website. You can also download the source code here: poppler-0.18.1.tar.gz

Docx to HTML

You can use this document converter to convert Office 2007+ Word documents into HTML.

Download the zip file below and extract its contents into the Converters folder in the default Content Grabber document folder
(My Documents\Content Grabber\Converters)
docxtohtml.zip

The docxtohtml utility uses PowerTools for Open XML. Please visit this link for more information about the PowerTools for Open XML library.

Excel to HTML

You can use this document converter to convert Excel and CSV documents into HTML.

Download the zip file below and extract its contents into the Converters folder in the default Content Grabber document folder
(My Documents\Content Grabber\Converters)
exceltohtml.zip

IFilter Converter

You can use this document converter to convert most files into simple HTML if the appropriate IFilter is installed. The Windows indexing server uses IFilters to convert documents into plain text, so IFilters for many different documents will already be installed on most computers.

Download the zip file below and extract its contents into the Converters folder in the default Content Grabber document folder
(My Documents\Content Grabber\Converters)
ifilter.zip

IFilters are used to convert documents into plain text, so this converter cannot retain any content structures in a document, and you will only be able to extract the entire page content. You will sometimes be able to use regular expressions to extract pieces of information from the entire page content.
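
Because the converted page has no structure to select against, a pattern usually has to anchor on labels that appear in the text itself. The Python sketch below shows one way to pull repeated pieces of information out of such text. The sample text, the "Contact:" and "Phone:" labels, and the helper name extract_contacts are assumptions for illustration only and do not represent real IFilter output.

```python
import re

# Hypothetical plain text, as a structure-free conversion might produce it.
PAGE_TEXT = (
    "Quarterly report Contact: jane.doe@example.com Phone: 555-0134 "
    "Contact: sam@example.org Phone: 555-0199"
)

# Named groups keep each extracted piece identifiable without relying on position.
CONTACT = re.compile(
    r"Contact:\s*(?P<email>\S+@\S+)\s+Phone:\s*(?P<phone>[\d-]+)"
)

def extract_contacts(text):
    """Return one dict per 'Contact: ... Phone: ...' pair found in the text."""
    return [m.groupdict() for m in CONTACT.finditer(text)]

for contact in extract_contacts(PAGE_TEXT):
    print(contact["email"], contact["phone"])
```

This only works when the labels appear consistently in the source documents, which is why extraction from the entire page content will sometimes not be possible.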