IP Blocking & Proxy Servers
When you visit a website with a web browser or a web scraping tool, such as Content Grabber, the website owner can record your IP address and may be able to use this information to identify you or block your access to the website.
If you do not want a website owner to be able to identify you while you visit a website, you can use a proxy server to hide your IP address. When you use a proxy server, you do not visit the target website directly, but instead request that the proxy server visit the website for you.
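Content Grabber handles this for you once the proxy settings are entered. Purely to illustrate what "asking the proxy to visit the website for you" means at the HTTP level, here is a minimal sketch using Python's standard `urllib`. It is not Content Grabber's code. The function name `build_proxy_opener` and the example address and credentials are hypothetical. Some proxy services also require a username and password, which this sketch places in the proxy URL.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_proxy_opener(address: str, username: Optional[str] = None,
                       password: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Return an opener that sends http and https requests through one HTTP proxy.

    address is "IP:port", for example "206.118.215.245:60099" (hypothetical).
    """
    if username and password:
        # Credentials are URL-quoted into the proxy URL; urllib sends them to the
        # proxy as a Basic Proxy-Authorization header.
        proxy_url = f"http://{quote(username, safe='')}:{quote(password, safe='')}@{address}"
    else:
        proxy_url = f"http://{address}"
    # The target website sees the proxy's IP address instead of yours.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (requires a real, reachable proxy):
# opener = build_proxy_opener("206.118.215.245:60099", "user1", "pass1")
# html = opener.open("http://example.com/", timeout=10).read()
```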
There are many types of proxy server, but Content Grabber supports only HTTP proxy servers. Other types, such as SOCKS proxies, are not supported.
After you have purchased proxy server access or found freely available proxy servers on the web, you will have one or more proxy server IP addresses and possibly a username and password for accessing the proxies. Enter this information in the Content Grabber agent by selecting Agent Settings from the ribbon menu, or set up default proxies by selecting Application Settings from the ribbon menu.
Agent proxy settings
Application proxy settings
The proxy menus allow you to set the Proxy source. You can select one of the following proxy sources:
Proxy source | Description
Default | The agent will use the proxy configured in Internet Explorer, or no proxy if none has been configured in Internet Explorer.
Application | The agent will use the default proxy settings configured in the Application Settings menu.
Proxy List | The agent will use a specific list of proxies. Click the Proxy List button to add proxies to the list.
Gateway | Use this proxy setting in Application Settings if you must connect to a specific proxy to access the Internet.
3rd Party | Content Grabber can integrate with the following 3rd party proxy providers: Luminati, Nohodo, and Private Proxy Switch. If you set the 3rd party proxy to Proxy API, you can specify an API configuration that downloads a list of proxies from a 3rd party API. You can also set the 3rd party proxy to Fiddler, which lets you view the web traffic between Content Grabber and the target web server. This can be useful when debugging hard-to-process websites, and it requires the Fiddler software to be running on the computer.
Direct | The agent will not use any proxies.
The Proxy List screen is used to specify a list of proxies and to set the proxy verification options:
Add proxies on the Proxy List screen
You must specify the Proxy Address in the following format, including the port number:
206.118.215.245:60099
In the example above, 206.118.215.245 is the IP address of the proxy server and 60099 is the port number.
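To make the required format concrete, the following minimal Python sketch checks an entry before it is added to a list. It assumes IPv4 addresses, as in the example above. The function name `parse_proxy_address` is hypothetical and does not reflect how Content Grabber validates the field.

```python
import ipaddress
from typing import Tuple


def parse_proxy_address(address: str) -> Tuple[str, int]:
    """Split 'IP:port' into (ip, port); reject entries without a valid port number."""
    host, sep, port_text = address.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"missing port number in {address!r}")
    ipaddress.IPv4Address(host)  # raises ValueError if host is not an IPv4 address
    port = int(port_text)  # raises ValueError if the port is empty or not a number
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range in {address!r}")
    return host, port
```

For example, `parse_proxy_address("206.118.215.245:60099")` returns `("206.118.215.245", 60099)`, while `"206.118.215.245"` on its own is rejected because the port number is missing.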
Content Grabber can automatically verify whether a proxy is available before switching to it. This lets an agent move on to the next proxy when one is unavailable, instead of stopping prematurely just because a single proxy is down.
Proxy verification option | Description
Verify proxy before use | The agent will verify a proxy before switching to the proxy.
Verification timeout | The number of seconds the agent will wait for a successful proxy verification.
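The verify-then-switch behavior can be sketched as below. This is an illustration only: how Content Grabber actually verifies a proxy is not described here, and this sketch assumes a plain TCP connection test with a timeout. A TCP connect is a weaker check than sending a real HTTP request through the proxy. The function names and the 5-second default timeout are hypothetical.

```python
import socket
from typing import Iterable, Optional


def is_proxy_reachable(address: str, timeout: float) -> bool:
    """Return True if a TCP connection to 'IP:port' succeeds within timeout seconds."""
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError, OverflowError):
        # Refused, timed out, unresolvable, or a malformed address/port.
        return False


def next_available_proxy(proxies: Iterable[str], timeout: float = 5.0) -> Optional[str]:
    """Skip unreachable proxies and return the first one that passes verification."""
    for address in proxies:
        if is_proxy_reachable(address, timeout):
            return address
    return None  # no proxy passed verification
```

Skipping a failed proxy instead of raising an error mirrors the goal stated above: one dead proxy should not stop the whole agent.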
If you are using a large number of proxy servers, it can be tedious to add them all to each project. You can use the Import Proxies button to import a list of proxies from a CSV file. The CSV file must have the following format:
Proxy Address, Username, Password
The Username and Password columns are optional.
CSV Example 1:
proxy
173.244.220.185:8800
50.21.10.78:8800
134.22.166.242:8800
CSV Example 2:
proxy, username, password
173.244.220.185:8800, user1, pass1
50.21.10.78:8800, user1, pass1
134.22.166.242:8800, user1, pass1
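Both examples start with a header row and may omit the credential columns. The minimal Python sketch below reads files in either shape using the standard `csv` module. It assumes the first row is always a header, as in both examples. `ProxyEntry` and `read_proxy_csv` are hypothetical names and do not describe Content Grabber's importer.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProxyEntry:
    address: str
    username: Optional[str] = None
    password: Optional[str] = None


def read_proxy_csv(text: str) -> List[ProxyEntry]:
    """Parse 'Proxy Address, Username, Password' rows; the credentials are optional."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(rows, None)  # assumption: the first row is a header ("proxy" or "proxy, username, password")
    entries = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # ignore blank lines
        username = cells[1] if len(cells) > 1 and cells[1] else None
        password = cells[2] if len(cells) > 2 and cells[2] else None
        entries.append(ProxyEntry(cells[0], username, password))
    return entries
```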
Some company network configurations require all users to access the Internet through a specific proxy. Use the Gateway proxy option in Application Settings to open the Proxy Gateway screen.
Some companies allow Internet access only through a proxy gateway
The Proxy address can either specify a proxy directly or be the URL of a proxy auto-config (PAC) script.