Ever tried running a web scraper only to get blocked after a few hundred requests? Or maybe you're conducting security audits and need to rotate your IP address constantly? The solution lies in proxy automation, but manually managing proxy lists is about as fun as watching paint dry.
Let me walk you through building a Python tool that handles everything automatically: finding proxies, testing them, and deploying them in a way that actually works. No fluff, just practical code you can use today.
When you're crawling websites, performing security audits, or accessing geo-restricted content, your real IP becomes a liability. Make too many requests and you're blocked. The problem is that managing proxies manually is tedious—you need to find them, verify they work, and configure your tools to use them.
There are two main approaches here: free infrastructure (public proxies, Tor) or paid services (rotating proxies, cloud providers). While paid options offer better stability and faster speeds, public proxies work surprisingly well if you automate the filtering process properly.
The first challenge is finding a reliable source of proxy data. You want something that provides structured data—JSON, CSV, anything but HTML that you'd have to parse manually. After testing various sources, I found that Geonode offers a straightforward API for accessing free proxy lists.
The beauty of their service is the clean JSON format. When you query their API, you get back structured data with IP addresses, ports, latency measurements, and protocol types. Here's what matters: you can filter by protocol (HTTP, HTTPS, SOCKS4, SOCKS5) and get up to 500 proxies per request.
The API response includes everything you need: the proxy IP, port number, latency in milliseconds, and the total count of available proxies. For our automation, we'll focus on three key data points—IP address, latency, and the total count for pagination.
Here's how the extraction works. First, fetch the total proxy count for your chosen protocol. Then iterate through the pages, collecting proxies that meet your latency threshold. A latency filter of 50 ms is reasonable; anything slower tends to time out during actual use. Store each proxy in the format `protocol ip port` for easy configuration later.
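A minimal sketch of that extraction step follows. The endpoint path, query parameter names, and response fields (`data`, `total`, `ip`, `port`, `latency`, `protocols`) are assumptions based on the description above; check Geonode's API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against Geonode's documentation.
GEONODE_URL = "https://proxylist.geonode.com/api/proxy-list"

def filter_by_latency(entries, max_latency_ms=50):
    """Keep entries at or below the latency threshold, formatted as
    'protocol ip port' strings for later configuration."""
    kept = []
    for entry in entries:
        if entry.get("latency", float("inf")) <= max_latency_ms:
            for protocol in entry.get("protocols", []):
                kept.append(f"{protocol} {entry['ip']} {entry['port']}")
    return kept

def fetch_proxies(protocol="socks5", max_latency_ms=50, page_size=500):
    """Page through the API, collecting proxies under the latency threshold."""
    collected, page = [], 1
    while True:
        query = urllib.parse.urlencode(
            {"protocols": protocol, "limit": page_size, "page": page})
        with urllib.request.urlopen(f"{GEONODE_URL}?{query}", timeout=10) as resp:
            body = json.load(resp)
        collected.extend(filter_by_latency(body.get("data", []), max_latency_ms))
        # The 'total' field drives pagination: stop once every page is fetched.
        if page * page_size >= body.get("total", 0):
            break
        page += 1
    return collected
```

The latency filter is kept as a pure function so it can be tuned and tested independently of the network call.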
Having a list of proxies is one thing. Having proxies that actually function from your location is another story entirely. Many public proxies only work from specific geographic regions or are simply dead.
The straightforward approach would be testing each proxy sequentially—make a request through the proxy, check if it succeeds. But test 2,000 proxies one at a time and you'll be waiting for hours. Multi-threading solves this problem.
Python's threading library lets you spin up multiple workers simultaneously. Each worker grabs a proxy, tests it against a target URL, and reports back whether it's alive. The trick is creating a custom Thread class that returns values, since the standard threading module doesn't support this directly.
The testing function is simple: configure your request library to use the proxy, set a reasonable timeout (5 seconds works well), and attempt to fetch a page. If you get a successful response, the proxy is good. Any exception means it's dead or unreachable.
Running 12 workers concurrently provides a good balance between speed and not overwhelming your network connection. The function maintains two lists: the original untested proxies and the verified working ones. As workers complete their checks, successful proxies get added to the verified list while the original list shrinks.
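The threading pattern described above might be sketched like this. The value-returning `Thread` subclass is the standard workaround mentioned earlier; `check_proxy`, the test URL, and the use of stdlib `urllib` (which handles HTTP(S) proxies only, so SOCKS testing would need an extra dependency such as PySocks) are assumptions.

```python
import threading
import urllib.request

class ReturnThread(threading.Thread):
    """Thread subclass whose join() returns the target's return value,
    which the standard threading module doesn't support directly."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._result = None

    def run(self):
        if self._target is not None:
            self._result = self._target(*self._args, **self._kwargs)

    def join(self, timeout=None):
        super().join(timeout)
        return self._result

def check_proxy(proxy_line, test_url="https://example.com", timeout=5):
    """Return the proxy string if a request through it succeeds, else None.
    proxy_line is in 'protocol ip port' form."""
    protocol, ip, port = proxy_line.split()
    handler = urllib.request.ProxyHandler({
        "http": f"{protocol}://{ip}:{port}",
        "https": f"{protocol}://{ip}:{port}",
    })
    opener = urllib.request.build_opener(handler)
    try:
        opener.open(test_url, timeout=timeout)
        return proxy_line
    except Exception:
        # Any failure (refused, timeout, bad response) counts as dead.
        return None

def verify_all(proxies, workers=12):
    """Test proxies in batches of `workers` concurrent threads."""
    alive = []
    for i in range(0, len(proxies), workers):
        threads = [ReturnThread(target=check_proxy, args=(p,))
                   for p in proxies[i:i + workers]]
        for t in threads:
            t.start()
        for t in threads:
            result = t.join()
            if result:
                alive.append(result)
    return alive
```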
Now you've got a list of verified, working proxies. The next step is making them useful. Instead of manually specifying which proxy to use for each request, a load balancer distributes your traffic across all available proxies automatically.
The go-dispatch-proxy project fits perfectly here. It's a Go port of the original Node.js dispatch-proxy, which means it's fast and handles concurrent connections well. The tool creates a single SOCKS5 endpoint that automatically rotates through your proxy list.
Installation requires libpcap-dev for packet capture functionality. Once installed, compile the project from source. The resulting binary accepts a list of proxies as arguments and opens a local port (default 8080) that acts as your load balancing proxy.
The configuration is straightforward—pass your verified proxies as command-line arguments using the format `-proxy protocol://ip:port`. The tool handles the rotation logic internally, so your applications just point to `localhost:8080` and traffic gets distributed automatically.
One critical note: this load balancer only supports SOCKS5 proxies. Make sure your filtering code validates the protocol type before attempting to start the balancer. Running it with HTTP proxies will fail and waste time.
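Launching the balancer from Python, with the SOCKS5-only validation folded in, might look like the sketch below. The binary path and the `-lport`/`-proxy` flag syntax are assumptions based on the description above; verify them against go-dispatch-proxy's README.

```python
import subprocess

def build_balancer_cmd(proxies, binary="./go-dispatch-proxy", lport=8080):
    """Build the load-balancer command line from 'protocol ip port' entries,
    keeping only SOCKS5 proxies (the balancer supports nothing else)."""
    socks5 = [p for p in proxies if p.split()[0] == "socks5"]
    if not socks5:
        # Starting the balancer with HTTP proxies would fail anyway.
        raise ValueError("no SOCKS5 proxies in list; balancer would fail")
    cmd = [binary, "-lport", str(lport)]
    for p in socks5:
        protocol, ip, port = p.split()
        cmd += ["-proxy", f"{protocol}://{ip}:{port}"]
    return cmd

def start_balancer(proxies):
    """Run the balancer in the background; apps then point at localhost:8080."""
    return subprocess.Popen(build_balancer_cmd(proxies))
```

Filtering before launch means a misconfigured proxy list fails fast in Python instead of at the balancer.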
For scenarios requiring port scanning, combine your load balancer with tools like naabu that support SOCKS5 proxies. Simply configure the scanner to use localhost:8080 as its proxy endpoint.
Not everyone needs a load balancer. Sometimes you just want to use proxychains with a fresh list of working proxies. Adding this functionality takes one additional function that formats your verified proxies into ProxyChains configuration syntax.
The function iterates through your active proxy list and prints each entry in the format `protocol ip port`. You can specify whether to use dynamic or random chain modes. Dynamic chains try proxies in order and skip dead ones, while random chains select proxies randomly from your list.
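That generator might be sketched as follows. It emits only the pieces discussed here (the chain mode and the `[ProxyList]` section); a full proxychains.conf also carries timeout and DNS options, so merge the output into your existing config rather than replacing it wholesale.

```python
def proxychains_config(proxies, mode="dynamic_chain"):
    """Render verified 'protocol ip port' entries as proxychains.conf lines,
    with the chosen chain mode at the top."""
    if mode not in ("dynamic_chain", "random_chain", "strict_chain"):
        raise ValueError(f"unknown chain mode: {mode}")
    lines = [mode, "proxy_dns", "[ProxyList]"]
    lines.extend(proxies)  # entries are already in 'protocol ip port' form
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Redirect this output to a file, then copy it into proxychains.conf.
    print(proxychains_config(["socks5 1.2.3.4 1080"], mode="random_chain"))
```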
To use the generated configuration, redirect your script's output to a file and copy it into your proxychains.conf. Update the chain mode at the top of the config file based on your preference.
Testing is simple—run a command through proxychains that reveals your IP address. Set up a listener on a remote server and watch connections arrive from different IP addresses as proxychains rotates through your list. Each request should originate from a different proxy.
This automated setup works well for specific scenarios. Public proxies with a load balancer excel at port scanning and security testing where you need SOCKS5 protocol support. The rotating nature prevents detection while the automation saves hours of manual work.
For web scraping or HTTP-based tasks, consider alternatives. Docker containers running multiple Tor instances with HAProxy provide similar functionality using the HTTP protocol. These rotate automatically without needing proxy list management.
Cloud services like Amazon API Gateway offer another option for web fuzzing—extremely fast, highly reliable, but limited to HTTP/HTTPS traffic. Choose based on your protocol requirements and whether you need the speed and stability of paid services versus the zero-cost approach of public proxies.
Let's be honest about the limitations here. Public proxies log your traffic. Every request, including session cookies and credentials, passes through servers you don't control. Use them for testing and development, never for sensitive operations.
The proxies you find will have varying lifespans. Some die within minutes, others last for hours. This is why automation matters: you can quickly rebuild your proxy list when quality degrades.
The code we've built here handles the complete workflow: fetching proxy lists, filtering by latency, verifying functionality, and deploying through either a load balancer or ProxyChains. It eliminates the manual tedium while giving you control over the entire process. Whether you're conducting security research, developing scrapers, or testing geo-restrictions, having automated proxy management changes how efficiently you work.