This article addresses TrustedProxies with Selenium or Scripting. Here’s what we covered in previous articles:
- Introduction to Using TrustedProxies with Rank Tracker
- Set Up Your Computer for TrustedProxies
- Using Rank Tracker with the Big-G Stealth Extractor
In this article, you’ll read about TrustedProxies with Selenium and custom code for SERP extraction.
You can use simple software for direct URL queries – a long-employed method – or you can use a headless browser scraping system, often controlled via Selenium. Below is a popular method for using the Big-G to scrape Google with these tools:
- Launch a fresh headless browser instance, using a random User Agent each time. With an incognito/private session of the browser, you don’t need to worry about cookies.
- Visit Google’s homepage (the ccTLD you’ve selected).
- Type the keyword in search and click the Google Search button.
- Scrape the information you need from the first page of results, and visit subsequent pages, if needed, by scrolling to the bottom and clicking on the relevant page number.
To configure the Selenium Python webdriver to use a proxy with Chrome, see “How do I set proxy for chrome in python webdriver.” Be sure to set up IP authentication before configuring Selenium.
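The browser steps listed above, combined with a proxy setting, can be sketched in Python roughly as follows. This is a minimal sketch, not TrustedProxies' own code: the proxy host and port, the User-Agent strings, and the function names are placeholders, and it assumes Selenium 4 with a local Chrome install and IP authentication already in place (Chrome's `--proxy-server` switch does not carry a username or password). Pressing Enter stands in for clicking the Google Search button, and the `q` field name is an assumption to check against the live page.

```python
import random

# Hypothetical values: replace with the proxy address and port from your account.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8080

# Illustrative User-Agent strings only; maintain your own, current list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def build_chrome_arguments(proxy_host, proxy_port, user_agent):
    """Return Chrome switches for a headless, incognito, proxied session."""
    return [
        "--headless=new",
        "--incognito",
        f"--proxy-server=http://{proxy_host}:{proxy_port}",
        f"--user-agent={user_agent}",
    ]


def search_google(keyword, google_url="https://www.google.com"):
    """Open a fresh browser, submit one keyword, and return the page HTML."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    options = webdriver.ChromeOptions()
    user_agent = random.choice(USER_AGENTS)  # new random User Agent per launch
    for argument in build_chrome_arguments(PROXY_HOST, PROXY_PORT, user_agent):
        options.add_argument(argument)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(google_url)  # use the ccTLD you selected
        # Assumption: the search input is named "q".
        box = driver.find_element(By.NAME, "q")
        box.send_keys(keyword)
        box.send_keys(Keys.RETURN)  # stands in for clicking the search button
        return driver.page_source
    finally:
        driver.quit()  # always close the browser so each query starts fresh
```

Calling `search_google("your keyword")` would return the HTML of the first results page for parsing. Pagination and consent dialogs are left out of this sketch.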
An alternative to a headless browser is to send data requests in code, via TrustedProxies. Before doing so, make sure your IP address is authenticated to the proxy or that you’re set up for username:password authentication. TrustedProxies provides a number of code examples in Python, PHP, NodeJS, C#, and Java. Choose whichever language you want to get started.
Things to consider for your script
Authentication: With Python requests, you can pass the proxy’s IP address and port number, along with a username and password, as parameters. Which of these you need depends on how authentication is set up on your account: if you’re set up for username:password authentication, you may need to pass one or more of these parameters to the proxy server when you connect.
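As a rough illustration of the two authentication modes, the sketch below builds the `proxies` mapping that the Python `requests` library accepts. The host, port, credentials, and function names are hypothetical placeholders, not values or helpers supplied by TrustedProxies.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a proxies mapping in the shape the requests library accepts."""
    if username and password:
        # Percent-encode credentials so characters like "@" or ":" survive.
        credentials = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    else:
        # IP authentication: the proxy recognizes your address, so the URL
        # carries no credentials.
        credentials = ""
    proxy_url = f"http://{credentials}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxies, timeout=30):
    """Fetch a URL through the proxy and return the response body as text."""
    import requests  # third-party: pip install requests

    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.text
```

With IP authentication you would call `build_proxies("proxy.example.com", 8080)`. With username:password authentication you would also pass the two credentials.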
Timeouts: Sometimes on the Internet, a data response to your script is delayed. Your script must be able to anticipate delays and handle them appropriately. In addition, the connection your script opens has its own timeout setting. If you lengthen the delay your script is willing to tolerate, you may also want to raise that connection timeout to match.
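One way to handle delayed responses is to retry a request a limited number of times, waiting a little longer after each failure. The sketch below is an assumption about how you might structure this, not a prescribed method. `fetch` is any function you supply that takes a URL and raises on timeout, and the attempt count and backoff values are arbitrary. If you use `requests`, you could pass `retry_on=(requests.exceptions.Timeout,)` and set the connection's own limit with its `timeout` argument.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, backoff_seconds=5,
                       retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fetch(url), retrying when it raises one of the retry_on errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                # Wait longer after each failure before trying again.
                sleep(backoff_seconds * attempt)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

The `sleep` parameter exists only so the wait can be swapped out when testing. In normal use, leave it at its default.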
Human Emulation: You’ll need to code some form of human emulation in your script. Human emulation delays help to prevent your proxy servers from being blocked by websites and keep your script from returning errors. It’s good practice to code in a random delay of 15 to 20 seconds between queries, or to increase from that point if necessary.
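A random delay of this kind can be sketched in a few lines of Python. The function names here are illustrative, and `run_query` stands for whatever function performs a single search in your script. The 15 to 20 second range comes from the guidance above and can be raised if you start seeing blocks.

```python
import random
import time


def human_delay(min_seconds=15.0, max_seconds=20.0, sleep=time.sleep):
    """Pause for a random interval and return the number of seconds waited."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def run_queries(keywords, run_query, sleep=time.sleep):
    """Run one query per keyword, pausing randomly between queries."""
    results = []
    for index, keyword in enumerate(keywords):
        if index > 0:
            # No need to wait before the very first query.
            human_delay(sleep=sleep)
        results.append(run_query(keyword))
    return results
```

For example, `run_queries(["red shoes", "blue shoes"], my_search)` would call the hypothetical `my_search` twice with one randomized pause in between.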