We provide download links to third-party tools for your convenience. These tools were not developed by us, and we do not provide support for them.
The tools are open-source projects, and you must comply with the license associated with each tool. Some of these licenses are not compatible with the Content Grabber license, so we cannot include the tools in the Content Grabber installation package. You must download them separately, either from this page or from the authors' websites.
Document Converters
Document converters convert documents into HTML so that they can be processed like any other HTML page. For example, if a website links to PDF or MS Word files, you may want to extract text or images from within these documents instead of downloading the documents.
Important: Most files, including PDF and Word files, are not designed to be converted into HTML, so the converted pages will be much more difficult to work with than standard HTML pages. In most cases, you will have to select the entire HTML page and use regular expressions to extract the target text.
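As a rough illustration of the regular-expression approach, the Python sketch below pulls a single value out of a whole converted page. The sample HTML, the "Invoice number:" label and the value format are hypothetical; real converter output varies by document and by converter, so you would need to inspect the actual HTML and adjust the pattern. The sketch only demonstrates the pattern logic outside Content Grabber; see the Content Grabber manual for how regular expressions are applied inside an agent.

```python
import html
import re
from typing import Optional

# Hypothetical fragment of a converted PDF page: the label and the value
# ended up in separate elements, as can happen after conversion.
SAMPLE_PAGE = """
<p style="position:absolute;top:120px;left:40px">Invoice&#160;number:</p>
<p style="position:absolute;top:120px;left:200px"><b>INV-2041</b></p>
"""

# Match the label, skip any tags and whitespace that follow it,
# then capture the value. Label text and value format are assumptions.
PATTERN = re.compile(
    r"Invoice\s+number:\s*(?:<[^>]+>\s*)*([A-Z]+-\d+)",
    re.IGNORECASE,
)


def extract_invoice_number(page_html: str) -> Optional[str]:
    """Return the first invoice number found in the whole page, or None."""
    # Decode entities such as &#160; so \s can match the resulting spaces.
    text = html.unescape(page_html)
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_invoice_number(SAMPLE_PAGE))  # INV-2041
```

Skipping intermediate tags with `(?:<[^>]+>\s*)*` keeps the pattern working even when the converter wraps the value in extra elements.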
Please read the Content Grabber manual for more information about how to use a document converter with Content Grabber.
PDF to HTML
You can use this document converter to convert PDF documents into HTML.
Download the zip file below and extract its contents into the Converters folder in the default Content Grabber document folder
(My Documents\Content Grabber\Converters)
pdftohtml.zip
The PDF to HTML converter requires the Microsoft Visual C++ 2010 runtime. If it is not already installed on your computer, you can download it here:
https://www.microsoft.com/en-au/download/details.aspx?id=5555
The pdftohtml utility is part of the poppler library (http://poppler.freedesktop.org).
The source code for poppler is available on that website. You can also download the source code here: poppler-0.18.1.tar.gz
Docx to HTML
You can use this document converter to convert Word 2007 and later (.docx) documents into HTML.
Download the zip file below and extract its contents into the Converters folder in the default Content Grabber document folder
(My Documents\Content Grabber\Converters)
docxtohtml.zip
The docxtohtml utility uses PowerTools for Open XML. Please visit this link for more information about the PowerTools for Open XML library.
Excel to HTML
You can use this document converter to convert Excel and CSV documents into HTML.
Download the zip file below and extract its contents into the Converters folder in the default Content Grabber document folder
(My Documents\Content Grabber\Converters)
exceltohtml.zip
IFilter Converter
You can use this document converter to convert most file types into simple HTML, provided the appropriate IFilter is installed. The Windows indexing service uses IFilters to convert documents into plain text, so IFilters for many document types are already installed on most computers.
Download the zip file below and extract its contents into the Converters folder in the default Content Grabber document folder
(My Documents\Content Grabber\Converters)
ifilter.zip
Because IFilters convert documents into plain text, this converter cannot retain any of a document's structure, so you will only be able to extract the entire page content. In some cases, you can then use regular expressions to extract individual pieces of information from that content.
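Since the IFilter output is flat text, a pattern usually has to scan the whole page and collect every match. The Python sketch below illustrates this with `re.findall`; the sample text and the date and e-mail patterns are hypothetical choices for illustration, not anything produced by or built into the converter.

```python
import re

# Hypothetical plain text as it might come out of an IFilter conversion:
# tables and headings are gone, only running text remains.
PAGE_TEXT = (
    "Quarterly report Contact: sales@example.com Issued 2023-04-01 "
    "Revised 2023-05-15 Support: help@example.com"
)

# Example patterns (assumptions): ISO-style dates and simple e-mail addresses.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_fields(text: str) -> dict:
    """Collect every date and e-mail address found anywhere in the text."""
    return {
        "dates": DATE_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }


print(extract_fields(PAGE_TEXT))
# {'dates': ['2023-04-01', '2023-05-15'], 'emails': ['sales@example.com', 'help@example.com']}
```

Because no structure survives, the patterns must rely on the shape of the values themselves rather than on their position in the document.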