To the Swift: Proxies on Your Relay Team, Part Two


In part one of this article, we noted how much proxy speed contributes to the benefits of proxy services. We asked, “Can a proxy server speed up your internet transactions?” and discussed objective measures of proxy speed, with links to some speed-testing services. Now let’s talk about what you can do to optimize proxy transmission speed, or at least keep from slowing down transmissions.

Manage Latency

Latency is the delay in connecting and transmitting data between two computers communicating over the Internet. The delay generally grows with the geographical distance and the number of “hops” between the connecting servers.

The locations you choose can greatly affect a proxy’s speed in processing your requests. Optimum locations are close to you, and also close to the target site. A good choice of locations can help minimize latency.

Proxy and Target in the Same Country

You can reduce latency by choosing a proxy as close as possible to your target server. Services like ProxyMesh and WonderProxy provide proxy servers in many different locations. If you use proxies on a different continent from your target servers, you may encounter much higher latency.
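One way to compare candidate locations is simply to time a request through each proxy and keep the quickest. The sketch below is a minimal illustration using only Python's standard library; the function names are our own, and you would supply your own proxy URLs and target URL.

```python
import time
import urllib.request


def time_request(proxy_url, target_url, timeout=10.0):
    """Return the seconds taken to fetch target_url through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    start = time.perf_counter()
    with opener.open(target_url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def fastest_proxy(proxy_urls, target_url, measure=time_request):
    """Time each proxy once and return (proxy_url, seconds) for the quickest.

    Proxies that raise OSError (connection failures, timeouts, HTTP errors)
    are skipped. Returns None if every proxy fails.
    """
    timings = []
    for proxy_url in proxy_urls:
        try:
            timings.append((measure(proxy_url, target_url), proxy_url))
        except OSError:
            continue
    if not timings:
        return None
    seconds, proxy_url = min(timings)
    return proxy_url, seconds
```

A single measurement is noisy, so in practice you may want to repeat the timing a few times per proxy before settling on one.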

Try a World Proxy

A world proxy is a proxy server that has outgoing IPs located all around the world. If a world proxy server is physically located in the US, and the outgoing IP is in your target country, requests through the world proxy will likely take at least 1 second longer than a direct request. But world proxies offer advantages. If you can’t find a proxy dedicated to a specific country, you may still find a world proxy that includes IP addresses for that country. ProxyMesh, for example, offers world proxy access along with a custom header that targets requests to some 37 countries.
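As a rough illustration of the custom-header approach, the sketch below attaches a country-selection header to a request before sending it through a world proxy. The header name is deliberately left as a parameter: the real name and the accepted country values come from your provider's documentation, and the helper names here are our own.

```python
import urllib.request


def build_country_request(target_url, country_code, header_name):
    """Return a Request carrying the provider's country-selection header.

    header_name is whatever your proxy provider documents; nothing here
    is specific to one service.
    """
    request = urllib.request.Request(target_url)
    request.add_header(header_name, country_code.upper())
    return request


def open_via_proxy(request, proxy_url, timeout=30.0):
    """Send the prepared request through the given proxy (plain HTTP targets)."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    opener = urllib.request.build_opener(handler)
    return opener.open(request, timeout=timeout)
```

This works as written only for http:// targets. With https:// targets the request headers travel inside the encrypted tunnel, where the proxy cannot read them, so check how your provider expects the header to be delivered in that case.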

Try an Open Proxy

Open proxies are free and available to all Internet users, and can forward requests from and to any site. Like other types of proxies, an open proxy offers online anonymity and privacy by concealing your IP address from web servers, since requests appear to originate from the proxy server. With an open proxy, however, this anonymity is not total. Also, open proxies are slower and more error-prone than commercial proxies. But the tradeoff is a huge increase in the quantity and diversity of IP addresses.

Check Geolocation

You can help keep connections fast by confirming that your proxy and the remote site are geographically close. But keep in mind that server locations may change. Some hosting providers may reassign blocks of IP addresses to a data center in a different geolocation from their original one. And it can take some time for the geo IP databases to update, so location tests may report the old location for a while.

It’s good practice to periodically check the location of your proxy. You can use services like WhatIsMyIP.com to check the current geo IPs of a proxy server. Make sure you check geo IP over HTTPS to get an accurate reading.
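A periodic check like this is easy to script. The sketch below assumes a geo IP service that returns JSON containing a `country` field; that response shape, the `geo_url` you pass in, and the function names are all assumptions for illustration, so adapt them to whichever lookup service you actually use.

```python
import json
import urllib.request


def lookup_country(proxy_url, geo_url, timeout=10.0):
    """Ask a geo IP service, through the proxy, which country it sees.

    Assumes the service answers with JSON like {"country": "US"}.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(geo_url, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["country"]


def find_relocated_proxies(expected, geo_url, lookup=lookup_country):
    """Return {proxy_url: reported_country} for proxies not where we expect.

    expected maps each proxy URL to the country code it should report.
    """
    relocated = {}
    for proxy_url, wanted in expected.items():
        reported = lookup(proxy_url, geo_url).upper()
        if reported != wanted.upper():
            relocated[proxy_url] = reported
    return relocated
```

Passing an https:// address as `geo_url` keeps the check in line with the advice above about reading geo IP over HTTPS.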

Use Data Compression

To speed transmission and help control bandwidth usage, include an Accept-Encoding header to take advantage of compression options such as gzip. Most remote sites support at least one of these content encoding methods.

Compression may be especially useful in speeding high-traffic research with sizeable amounts of data per request. Although many proxies strip out identifying headers, they do not alter the content of your request. Whichever compression headers you send are passed through the proxy to the remote server, and the proxy returns the requested data in compressed form. This article on HTTP Compression provides more details.
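Here is a minimal sketch of asking for gzip and unpacking the reply with Python's standard library. Note that `urllib` does not decompress response bodies for you, so the second helper checks the `Content-Encoding` value before decoding. The helper names are our own.

```python
import gzip
import urllib.request


def build_compressed_request(url):
    """Return a Request that tells the server we accept gzip-encoded bodies."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw_body, content_encoding):
    """Decompress the body if the server says it is gzip; otherwise pass it through."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(raw_body)
    return raw_body
```

After opening the request, you would call `decode_body(response.read(), response.headers.get("Content-Encoding"))`. Servers are free to ignore the header, which is why the pass-through branch matters.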

Minimize Requests

Avoid requesting images, JavaScript, and CSS files. These files usually aren’t necessary for web crawling or screen scraping. Skipping them won’t speed up individual transmissions, but it keeps you from slowing down the overall job while optimizing bandwidth use and controlling the total number of requests.
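If your crawler collects links before fetching them, a simple extension filter does most of this work. The sketch below is one possible approach; the extension list is our own starting point, not an exhaustive one.

```python
import posixpath
from urllib.parse import urlparse

# File types a text-focused crawl can usually skip (extend as needed).
SKIPPED_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico", ".js", ".css",
}


def is_worth_fetching(url):
    """Return False for URLs whose path ends in a skipped asset extension."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


def filter_crawl_urls(urls):
    """Keep only the URLs that are worth fetching, preserving their order."""
    return [url for url in urls if is_worth_fetching(url)]
```

The filter looks only at the URL path, so query strings such as `?v=2` do not hide an asset from it.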

Distribute Requests

Distribute requests over many IPs to reduce delayed responses and timeouts. A rotating proxy server can help you avoid rate limits and blocking by choosing a random IP for each request.

Rate limits cap the number of HTTP requests a user can make in a given period. When you encounter a site or API that uses IP throttling or IP address rate limiting, you can get around the limit by changing your IP address frequently.
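If you manage your own pool rather than using a rotating proxy server, the same idea can be sketched in a few lines. The class below picks a random proxy for each request and avoids using the same one twice in a row; the class name and its behavior are our own illustration, not any provider's API.

```python
import random


class ProxyRotator:
    """Hand out a random proxy per request, never the same one twice running."""

    def __init__(self, proxy_urls, rng=None):
        if not proxy_urls:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxy_urls)
        self._rng = rng if rng is not None else random.Random()
        self._last = None

    def next_proxy(self):
        # Exclude the previous pick; fall back to the full list if that
        # would leave nothing to choose from (a single-proxy pool).
        candidates = [p for p in self._proxies if p != self._last] or self._proxies
        self._last = self._rng.choice(candidates)
        return self._last
```

Call `next_proxy()` once per request and pass the result to your HTTP client's proxy setting.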

Avoid Timeouts

Proxy servers typically have request and response timeouts. If the remote server does not respond within that time, you will get a 408 response code. If you need to wait longer for a complete response, you may be able to use a custom header to specify the number of seconds you want to wait.

If a significant portion of your requests are timing out (the 408 response code), here are a few possible causes:

  • The network connections between you and the proxy and/or between the proxy and the remote site could be unreliable. Try some different proxies and see if that fixes the problem.
  • The pages you’re requesting take a long time to load. If a proxy server normally waits for 20 seconds, you could increase that time with a custom timeout header.
  • The proxy IPs have been blocked by the remote site. If you think this is the case, then you’ll want to switch proxies, and ideally use multiple proxies to distribute your requests.
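The "try some different proxies" advice can be automated. In the sketch below, `fetch` is a stand-in for whatever function you already use to make a request through a given proxy; we assume it returns a `(status_code, body)` pair. The function name and that calling convention are our own assumptions.

```python
def fetch_with_failover(url, proxy_urls, fetch, timeout_status=408):
    """Try each proxy in turn until one answers without a timeout status.

    fetch(url, proxy_url) must return (status_code, body).
    Returns (proxy_url, status_code, body) for the first non-timeout reply.
    Raises TimeoutError if every proxy times out (or none were given).
    """
    tried = 0
    for proxy_url in proxy_urls:
        status, body = fetch(url, proxy_url)
        if status != timeout_status:
            return proxy_url, status, body
        tried += 1
    raise TimeoutError(f"{url} timed out through {tried} proxies")
```

Knowing which proxy finally succeeded is useful for spotting a proxy that times out consistently and should be dropped from your list.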

Some other time-saving strategies:

  • Use rotating proxies to reduce the number of requests from any one IP address, which helps you avoid rate limits and blocking.
  • Avoid timeouts by using proxies located near your target sites. With a world proxy, try configuring a custom request header to achieve this.
  • Anticipate 301 responses (i.e., a site has been permanently moved) by scripting your request to follow redirection.
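Many HTTP clients, including Python's `urllib.request`, follow redirects by default. If yours does not, or you want control over how many hops to allow, the logic looks roughly like the sketch below. Here `fetch` is a hypothetical stand-in for your own request function, assumed to return `(status, headers, body)` with lowercase header names.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_hops=5):
    """Follow Location headers until a non-redirect response arrives.

    fetch(url) must return (status, headers, body), where headers is a
    dict with lowercase keys. Returns (final_url, status, body).
    """
    visited = []
    for _ in range(max_hops + 1):
        visited.append(url)
        status, headers, body = fetch(url)
        location = headers.get("location")
        if status not in REDIRECT_STATUSES or not location:
            return url, status, body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in visited:
            raise RuntimeError(f"redirect loop at {url}")
    raise RuntimeError(f"more than {max_hops} redirects")
```

For a 301 in particular, it can pay to store the final URL and request it directly next time, which saves a round trip through the proxy.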

Other Good Practices

Here are more recommended practices that can speed your proxy responses and minimize timeouts.

  • Reduce the number of concurrent requests from a single IP. This could involve using an additional IP for crawling, or slowing your crawl rate.
  • With added proxies, you have more connection strategies available, such as putting all of your authorized proxies in a list in your code or script, then randomly choosing one proxy for each request.
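Both practices can be combined in one small crawl loop: keep your proxies in a list, pick one at random for each URL, and cap how many requests run through any one proxy at a time. The sketch below is one way to do it, with `fetch(url, proxy_url)` standing in for your own request function; all names here are our own.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


def crawl(urls, proxy_urls, fetch, per_proxy_limit=2, rng=None):
    """Fetch every URL through a randomly chosen proxy.

    At most per_proxy_limit requests run through a single proxy at once.
    Returns the fetch results in the same order as urls.
    """
    rng = rng if rng is not None else random.Random()
    # One semaphore per proxy enforces the per-IP concurrency cap.
    limits = {p: threading.Semaphore(per_proxy_limit) for p in proxy_urls}
    # Choose proxies up front so the random generator is used by one thread only.
    jobs = [(url, rng.choice(proxy_urls)) for url in urls]

    def run(job):
        url, proxy_url = job
        with limits[proxy_url]:
            return fetch(url, proxy_url)

    workers = max(1, per_proxy_limit * len(limits))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, jobs))
```

Lowering `per_proxy_limit` slows the crawl but makes each IP look less aggressive to the remote site, which is the tradeoff described above.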

In Conclusion

Speed is essential in a proxy operation. Measuring proxy speed can be complex, but reliable services can help you understand and measure it. Try using the strategies we’ve outlined here to maximize the benefits of proxies.