IP Blocking & Proxy Servers

When you visit a website with a web browser or a web scraping tool, such as Content Grabber, the website owner can record your IP address and may be able to use this information to identify you or block your access to the website.

 

If you do not want a website owner to be able to identify you while you visit a website, you can use a proxy server to hide your IP address. When you use a proxy server, you do not visit the target website directly, but instead request that the proxy server visit the website for you.
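As an illustration of what routing a request through an HTTP proxy looks like outside Content Grabber, here is a minimal Python sketch using only the standard library. The helper names proxy_url and build_proxy_opener and the address 203.0.113.10:8080 are hypothetical, chosen for this example; the sketch does not describe how Content Grabber itself is implemented.

```python
import urllib.request
from urllib.parse import quote


def proxy_url(address, username=None, password=None):
    """Build an HTTP proxy URL from 'host:port' and optional credentials."""
    if username is not None and password is not None:
        # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
        user = quote(username, safe="")
        secret = quote(password, safe="")
        return f"http://{user}:{secret}@{address}"
    return f"http://{address}"


def build_proxy_opener(address, username=None, password=None):
    """Return an opener that sends HTTP and HTTPS requests via the given HTTP proxy."""
    url = proxy_url(address, username, password)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address (reserved documentation range); replace with a real proxy.
opener = build_proxy_opener("203.0.113.10:8080")

# With a working proxy, the target website would see the proxy's IP address, not yours:
# html = opener.open("http://example.com/", timeout=10).read()
```

The request itself is left commented out because it only succeeds when a real, reachable proxy address is supplied.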

 

There are many types of proxy server, but Content Grabber supports only HTTP proxy servers. Other types, such as SOCKS proxies, are not supported.

 

How to Configure Proxy Servers

After you have purchased proxy server access or found freely available proxy servers on the web, you will have one or more proxy server IP addresses and possibly a username and password to access the proxies. Enter this information into the Content Grabber agent by selecting Agent Settings from the ribbon menu, or set up default proxies in the Application Settings ribbon menu.

 

Figure: Agent proxy settings

 

Figure: Application proxy settings

 

The proxy menus allow you to set the Proxy source. You can select one of the following proxy sources:

Proxy source: Description

Default: The agent will use the proxy configured in Internet Explorer, or no proxy if none is configured there.

Application: The agent will use the default proxy settings configured in the Application Settings menu.

Proxy List: The agent will use a specific list of proxies. Click the Proxy List button to add proxies to the list.

Gateway: Use this proxy setting in Application Settings if you must connect to a specific proxy to access the Internet.

3rd Party: Content Grabber can integrate with the following 3rd party proxy providers: Luminati, Nohodo, and Private Proxy Switch. If you set the 3rd party proxy to Proxy API, you can specify an API configuration to download a list of proxies from a 3rd party API. You can also set the 3rd party proxy to Fiddler, which lets you view web traffic between Content Grabber and the target web server. This can be useful when debugging hard-to-process websites, and it requires the Fiddler software to be running on the computer.

Direct: The agent will not use any proxies.

 

The Proxy List screen is used to specify a list of proxies and set the proxy verification option:

 

Figure: Add proxies on the Proxy List screen

 

You must specify the Proxy Address in the following format, including the port number:

 

206.118.215.245:60099

 

In the example above, 206.118.215.245 is the IP address of the proxy server and 60099 is the port number.
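If you generate or check proxy lists with a script before entering them, the address format above can be validated with a short Python sketch like the following. The function name parse_proxy_address is hypothetical, and the validation rules (a non-empty host and a port between 1 and 65535) are assumptions for illustration rather than Content Grabber's own checks.

```python
def parse_proxy_address(text):
    """Split 'address:port' into (host, port) and validate the port number."""
    host, sep, port_text = text.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected 'address:port', got {text!r}")
    if not port_text.isdecimal() or not 1 <= int(port_text) <= 65535:
        raise ValueError(f"invalid port number in {text!r}")
    return host, int(port_text)


host, port = parse_proxy_address("206.118.215.245:60099")
print(host, port)  # prints: 206.118.215.245 60099
```

An address without a port, such as 206.118.215.245 on its own, raises a ValueError in this sketch.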

 

Proxy Verification

Content Grabber can automatically verify that a proxy is available before switching to it. If a proxy is unavailable, the agent switches to the next proxy instead of stopping prematurely.

 

Proxy verification option: Description

Verify proxy before use: The agent will verify a proxy before switching to the proxy.

Verification timeout: The number of seconds the agent will wait for a successful proxy verification.
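The idea behind verification with a timeout can be sketched in Python as follows. This documentation does not describe how Content Grabber performs the check, so the TCP connection test used here is an assumption, and the names proxy_is_reachable and first_available_proxy are hypothetical.

```python
import socket


def proxy_is_reachable(host, port, timeout_seconds):
    """Assumed check: the proxy counts as available if a TCP connection opens in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def first_available_proxy(proxies, timeout_seconds=5.0, check=proxy_is_reachable):
    """Return the first (host, port) that passes verification, or None if none does."""
    for host, port in proxies:
        if check(host, port, timeout_seconds):
            return host, port
        # Verification failed: move on to the next proxy instead of stopping.
    return None


# Example call (performs real network connections, so it is left commented out):
# first_available_proxy([("173.244.220.185", 8800), ("50.21.10.78", 8800)], timeout_seconds=3)
```

Passing the check as a parameter keeps the switching logic separate from the verification method, so a different test, such as fetching a known page through the proxy, could be substituted.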

 

Importing Proxy Servers

If you are using a large number of proxy servers, it can be tedious to add them all to each project. You can use the Import Proxies button to import a list of proxies from a CSV file. The CSV file must have the following format:

 

Proxy Address, Username, Password

 

The Username and Password are optional columns.

 

CSV Example 1:

proxy
173.244.220.185:8800
50.21.10.78:8800
134.22.166.242:8800

 

CSV Example 2:

proxy, username, password
173.244.220.185:8800, user1, pass1
50.21.10.78:8800, user1, pass1
134.22.166.242:8800, user1, pass1
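Before importing, it can help to check a CSV file with a small script. The Python sketch below reads the format shown in the examples; it assumes the first row is a header, as in both examples, and the function name read_proxy_csv and the dictionary keys are hypothetical. It does not reproduce Content Grabber's own import logic.

```python
import csv
import io


def read_proxy_csv(text):
    """Parse proxy CSV text into dicts; Username and Password columns are optional."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Ignore empty lines, then skip the header row (assumed to be present).
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    proxies = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row]
        proxies.append({
            "address": cells[0],
            "username": cells[1] if len(cells) > 1 and cells[1] else None,
            "password": cells[2] if len(cells) > 2 and cells[2] else None,
        })
    return proxies


example = """proxy, username, password
173.244.220.185:8800, user1, pass1
50.21.10.78:8800, user1, pass1
"""
for proxy in read_proxy_csv(example):
    print(proxy["address"], proxy["username"])
```

A file with only the proxy column, as in CSV Example 1, yields entries whose username and password are None.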

 

Proxy Gateway

Some company network configurations require all users to access the Internet through a specific proxy. Use the Gateway proxy option in Application Settings to open the Proxy Gateway screen.

 

Figure: Some companies force Internet access only through a proxy gateway

 

The Proxy address can specify a proxy directly, or it can be a URL to a proxy auto-configuration (PAC) script.