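Processing 10,000 CAPTCHAs per hour means sustaining about 2.8 solves per second. With the right architecture, this is achievable. This guide covers the math, code, and tuning needed to reach that throughput with CaptchaAI.
The math
If a single reCAPTCHA v2 solve takes 15 seconds (median):
- Sequential: 3,600 s / 15 s = 240 solves/hour
- To reach 10,000/hour: 10,000 / 240 ≈ 41.7, so you need ~42 concurrent solves in flight at all times
Key insight: you don't wait for CaptchaAI to get faster. You overlap enough requests that 42 solves complete within the same 15-second window.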
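The arithmetic above is an instance of Little's law (work in flight = arrival rate × time per item). A minimal sketch follows; the function names are illustrative and not part of any CaptchaAI client.

```python
import math


def sequential_per_hour(solve_seconds: float) -> float:
    """Throughput of a single worker that solves one CAPTCHA at a time."""
    return 3600 / solve_seconds


def required_concurrency(target_per_hour: float, solve_seconds: float) -> int:
    """Little's law: solves in flight = solves per second x seconds per solve."""
    rate_per_second = target_per_hour / 3600
    return math.ceil(rate_per_second * solve_seconds)


print(sequential_per_hour(15))           # one worker, 15 s per solve
print(required_concurrency(10_000, 15))  # workers needed for 10,000/hour
```

Plugging in your own measured median solve time instead of 15 gives the concurrency floor for your workload. Headroom for failures and retries comes on top of that.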
Architecture
┌──────────┐     ┌────────────┐     ┌─────────────┐     ┌──────────┐
│ Task     │────▶│ Submit     │────▶│ CaptchaAI   │────▶│ Result   │
│ Queue    │     │ Workers    │     │ API         │     │ Store    │
│ (Redis)  │     │ (async)    │     │             │     │ (DB)     │
└──────────┘     └────────────┘     └─────────────┘     └──────────┘
                       │                    ▲
                       │    ┌──────────┐    │
                       └───▶│ Poll     │────┘
                            │ Workers  │
                            └──────────┘
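Components:
- Task queue: holds pending CAPTCHA tasks with their sitekey and page URL
- Submit workers: send tasks to the CaptchaAI API concurrently
- Poll workers: check for results at tuned intervals
- Result store: saves tokens as they arrive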
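The four components can be sketched in-process with `asyncio.Queue` standing in for Redis and a plain dict standing in for the database. `fake_submit` and `fake_poll` are hypothetical stand-ins for the real `in.php` and `res.php` calls, so the sketch shows only how work moves between the stages.

```python
import asyncio


async def fake_submit(task):
    """Hypothetical stand-in for the in.php call; returns a made-up task ID."""
    await asyncio.sleep(0)
    return f"id-{task['n']}"


async def fake_poll(task_id):
    """Hypothetical stand-in for the res.php call; returns a made-up token."""
    await asyncio.sleep(0)
    return f"token-for-{task_id}"


async def submit_worker(task_queue, pending_queue):
    """Take a task from the task queue, submit it, hand the ID to the pollers."""
    while True:
        task = await task_queue.get()
        await pending_queue.put(await fake_submit(task))
        task_queue.task_done()


async def poll_worker(pending_queue, result_store):
    """Take a submitted ID, fetch its result, write it to the result store."""
    while True:
        task_id = await pending_queue.get()
        result_store[task_id] = await fake_poll(task_id)
        pending_queue.task_done()


async def run_pipeline(tasks, n_submit=2, n_poll=2):
    task_queue, pending_queue = asyncio.Queue(), asyncio.Queue()
    result_store = {}
    for task in tasks:
        task_queue.put_nowait(task)
    workers = [asyncio.create_task(submit_worker(task_queue, pending_queue))
               for _ in range(n_submit)]
    workers += [asyncio.create_task(poll_worker(pending_queue, result_store))
                for _ in range(n_poll)]
    await task_queue.join()     # every task has been submitted
    await pending_queue.join()  # every submitted ID has a stored result
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return result_store


results = asyncio.run(run_pipeline([{"n": i} for i in range(5)]))
print(len(results))
```

Keeping submit and poll workers separate lets each pool be sized independently. In a multi-machine deployment the two in-memory queues would be replaced by a shared queue.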
Python: async pipeline
# high_throughput_solver.py
import os
import asyncio
import time
import aiohttp

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")
BASE_URL = "https://ocr.captchaai.com"
MAX_CONCURRENT = 50   # Max simultaneous solves
POLL_INTERVAL = 5     # Seconds between polls
INITIAL_WAIT = 12     # Seconds before first poll

semaphore = asyncio.Semaphore(MAX_CONCURRENT)
stats = {"submitted": 0, "solved": 0, "failed": 0, "start": 0}


async def solve_one(session, sitekey, pageurl, task_num):
    """Submit and poll a single CAPTCHA."""
    async with semaphore:
        try:
            # Submit
            async with session.get(f"{BASE_URL}/in.php", params={
                "key": API_KEY, "method": "userrecaptcha",
                "googlekey": sitekey, "pageurl": pageurl, "json": "1",
            }) as resp:
                result = await resp.json(content_type=None)

            if result.get("status") != 1:
                stats["failed"] += 1
                return None

            stats["submitted"] += 1
            task_id = result["request"]

            # Wait before first poll
            await asyncio.sleep(INITIAL_WAIT)

            # Poll
            for _ in range(25):
                async with session.get(f"{BASE_URL}/res.php", params={
                    "key": API_KEY, "action": "get",
                    "id": task_id, "json": "1",
                }) as resp:
                    poll_result = await resp.json(content_type=None)

                if poll_result.get("status") == 1:
                    stats["solved"] += 1
                    return poll_result["request"]

                if poll_result.get("request") != "CAPCHA_NOT_READY":
                    stats["failed"] += 1
                    return None

                await asyncio.sleep(POLL_INTERVAL)

            stats["failed"] += 1
            return None

        except Exception as e:
            stats["failed"] += 1
            return None


async def run_batch(tasks):
    """Process a batch of CAPTCHA tasks concurrently."""
    connector = aiohttp.TCPConnector(
        limit=MAX_CONCURRENT,
        keepalive_timeout=60,
    )
    async with aiohttp.ClientSession(connector=connector) as session:
        coros = [
            solve_one(session, task["sitekey"], task["pageurl"], i)
            for i, task in enumerate(tasks)
        ]
        results = await asyncio.gather(*coros)
    return results


async def main():
    # Generate test tasks (replace with your task source)
    tasks = [
        {
            "sitekey": "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
            "pageurl": "https://www.google.com/recaptcha/api2/demo",
        }
        for _ in range(100)  # Start with 100 tasks
    ]

    stats["start"] = time.time()
    print(f"Processing {len(tasks)} tasks with {MAX_CONCURRENT} concurrent workers")

    results = await run_batch(tasks)

    elapsed = time.time() - stats["start"]
    print(f"\nCompleted in {elapsed:.0f}s")
    print(f"Submitted: {stats['submitted']}")
    print(f"Solved: {stats['solved']}")
    print(f"Failed: {stats['failed']}")
    print(f"Throughput: {stats['solved'] / (elapsed / 3600):.0f} solves/hour")


asyncio.run(main())
JavaScript: concurrent pipeline
// high_throughput_solver.js
const axios = require('axios');
const https = require('https');

const API_KEY = process.env.CAPTCHAAI_KEY || 'YOUR_API_KEY';
const BASE = 'https://ocr.captchaai.com';
const MAX_CONCURRENT = 50;

const agent = new https.Agent({ keepAlive: true, maxSockets: MAX_CONCURRENT });
const api = axios.create({ baseURL: BASE, httpsAgent: agent, timeout: 30000 });

const stats = { submitted: 0, solved: 0, failed: 0 };

async function solveOne(sitekey, pageurl) {
  try {
    const submit = await api.get('/in.php', {
      params: { key: API_KEY, method: 'userrecaptcha', googlekey: sitekey, pageurl, json: '1' },
    });
    if (submit.data.status !== 1) { stats.failed++; return null; }
    stats.submitted++;

    await new Promise(r => setTimeout(r, 12000));

    for (let i = 0; i < 25; i++) {
      const poll = await api.get('/res.php', {
        params: { key: API_KEY, action: 'get', id: submit.data.request, json: '1' },
      });
      if (poll.data.status === 1) { stats.solved++; return poll.data.request; }
      if (poll.data.request !== 'CAPCHA_NOT_READY') { stats.failed++; return null; }
      await new Promise(r => setTimeout(r, 5000));
    }
    stats.failed++;
    return null;
  } catch { stats.failed++; return null; }
}

async function runWithConcurrency(tasks, limit) {
  const results = [];
  const executing = new Set();

  for (const task of tasks) {
    const p = solveOne(task.sitekey, task.pageurl).then(r => {
      executing.delete(p);
      return r;
    });
    executing.add(p);
    results.push(p);

    if (executing.size >= limit) {
      await Promise.race(executing);
    }
  }
  return Promise.all(results);
}

(async () => {
  const tasks = Array.from({ length: 100 }, () => ({
    sitekey: '6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-',
    pageurl: 'https://www.google.com/recaptcha/api2/demo',
  }));

  const start = Date.now();
  console.log(`Processing ${tasks.length} tasks, ${MAX_CONCURRENT} concurrent`);

  await runWithConcurrency(tasks, MAX_CONCURRENT);

  const elapsed = (Date.now() - start) / 1000;
  console.log(`\nDone in ${elapsed.toFixed(0)}s`);
  console.log(`Solved: ${stats.solved}, Failed: ${stats.failed}`);
  console.log(`Throughput: ${(stats.solved / (elapsed / 3600)).toFixed(0)} solves/hour`);
  agent.destroy();
})();
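Tuning parameters
| Parameter | Conservative | Balanced | Aggressive |
|---|---|---|---|
| MAX_CONCURRENT | 20 | 50 | 100 |
| INITIAL_WAIT | 15 s | 12 s | 10 s |
| POLL_INTERVAL | 7 s | 5 s | 3 s |
| Max poll attempts | 30 | 25 | 20 |
| Expected throughput | ~4,800/hr | ~10,000/hr | ~18,000/hr |
Start conservative and raise MAX_CONCURRENT until you see diminishing returns or a rising error rate.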
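To see how the wait and poll settings interact with concurrency, here is a rough estimate of the idealized ceiling for each profile. It assumes every solve takes exactly the 15-second median and is noticed at the first poll on or after it finishes. It ignores submit latency, failures, and any slowdown under load, so measured throughput will be lower. The function names are illustrative.

```python
import math


def detection_time(solve_s, initial_wait, poll_interval):
    """Seconds until the first poll that lands at or after the solve finishes."""
    if solve_s <= initial_wait:
        return initial_wait
    extra_polls = math.ceil((solve_s - initial_wait) / poll_interval)
    return initial_wait + extra_polls * poll_interval


def ideal_throughput_per_hour(max_concurrent, solve_s, initial_wait, poll_interval):
    """Each slot completes one solve per detection_time seconds."""
    return max_concurrent * 3600 / detection_time(solve_s, initial_wait, poll_interval)


# (MAX_CONCURRENT, INITIAL_WAIT, POLL_INTERVAL) for each profile in the table
profiles = {
    "conservative": (20, 15, 7),
    "balanced": (50, 12, 5),
    "aggressive": (100, 10, 3),
}
for name, (conc, wait, interval) in profiles.items():
    print(name, round(ideal_throughput_per_hour(conc, 15, wait, interval)))
```

The polling grid matters because a token that is ready at 15 s but first polled at 17 s holds its concurrency slot for the extra 2 s.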
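Monitoring throughput
Track these metrics in real time:
- Solves per minute: should hold at ~167 for a 10K/hour target
- Error rate: keep it under 5%. If it spikes, reduce concurrency
- Queue depth: if it keeps growing, add workers. If it stays empty, you are over-provisioned
- P90 solve time: if it rises, CaptchaAI may be rate-limiting you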
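Three of these metrics (solves per minute, error rate, P90 solve time) can be derived from a rolling window of solve outcomes. The sketch below is one possible shape; the `ThroughputMonitor` class and its field names are illustrative. It assumes events are recorded in time order and that the caller supplies timestamps (for example from `time.monotonic()`).

```python
import statistics
from collections import deque


class ThroughputMonitor:
    """Rolling window of solve outcomes (illustrative helper, not a library API)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (finished_at, ok, solve_seconds)

    def record(self, finished_at, ok, solve_seconds):
        """Store one finished attempt; call this where stats are updated."""
        self.events.append((finished_at, ok, solve_seconds))

    def _trim(self, now):
        """Drop events that fell out of the window."""
        while self.events and self.events[0][0] < now - self.window_seconds:
            self.events.popleft()

    def snapshot(self, now):
        self._trim(now)
        solved = [secs for _, ok, secs in self.events if ok]
        total = len(self.events)
        # statistics.quantiles needs at least two data points
        p90 = statistics.quantiles(solved, n=10)[-1] if len(solved) >= 2 else None
        return {
            "solves_per_minute": len(solved) * 60.0 / self.window_seconds,
            "error_rate": (total - len(solved)) / total if total else 0.0,
            "p90_solve_seconds": p90,
        }
```

Queue depth is not covered here because it lives in the queue itself (for an in-process `asyncio.Queue`, `qsize()` reports it).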
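Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| Throughput plateaus at ~5K/hr | Not enough concurrency | Raise MAX_CONCURRENT to 80-100 |
| Error rate > 10% | Overloaded API or bad proxies | Reduce concurrency, check proxy health |
| Memory usage keeps growing | Unbounded task accumulation | Process results as they arrive instead of buffering them |
| ERROR_NO_SLOT_AVAILABLE | CaptchaAI queue is full | Back off and retry after 5 seconds |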
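The back-off advice for `ERROR_NO_SLOT_AVAILABLE` could look like the sketch below. The 5-second starting delay comes from the table; doubling the delay on each retry is an added assumption, not something the table prescribes. `submit` is a hypothetical coroutine function that performs the `in.php` call and returns the parsed JSON.

```python
import asyncio

# Error codes worth retrying; only the one named in the table is listed.
RETRYABLE = {"ERROR_NO_SLOT_AVAILABLE"}


async def submit_with_backoff(submit, max_attempts=5, base_delay=5.0,
                              sleep=asyncio.sleep):
    """Call submit() until it succeeds, hits a non-retryable error, or runs out of attempts.

    Returns the task ID on success, otherwise None.
    """
    for attempt in range(max_attempts):
        result = await submit()
        if result.get("status") == 1:
            return result["request"]
        if result.get("request") not in RETRYABLE:
            return None  # a different error: retrying will not help
        if attempt < max_attempts - 1:
            # 5 s, 10 s, 20 s, ... (the doubling is an assumption)
            await sleep(base_delay * (2 ** attempt))
    return None
```

The `sleep` parameter exists so the delay can be replaced in tests. In the pipeline, the default `asyncio.sleep` yields to other in-flight solves while waiting.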
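FAQ
What is the CaptchaAI concurrency limit?
There is no hard limit on concurrent requests, but very high concurrency (500+) may trigger rate limiting. Start at 50 and scale up.
Can I run this across multiple machines?
Yes. Use a shared queue (Redis, RabbitMQ) and run the worker script on multiple servers. Each worker pulls tasks independently.
What about balance consumption at this rate?
At 10,000 solves/hour, monitor your balance closely. Use the balance check endpoint (res.php?action=getbalance) and set up alerts.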
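A balance alert could be wired up roughly as follows. The response format is an assumption: the sketch accepts either a bare number or the same `{"status": 1, "request": ...}` JSON shape the polling calls use, so check the actual response before relying on it. The helper names and the threshold are illustrative.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://ocr.captchaai.com"


def parse_balance(body: str) -> float:
    """Parse a balance response (assumed: bare number or status/request JSON)."""
    text = body.strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return float(text)  # raises ValueError for error strings
    if isinstance(payload, dict):
        if payload.get("status") != 1:
            raise ValueError(f"balance request failed: {payload.get('request')}")
        return float(payload["request"])
    return float(payload)


def fetch_balance(api_key: str) -> float:
    """Query res.php?action=getbalance; the json=1 flag is assumed to be honored."""
    query = urllib.parse.urlencode(
        {"key": api_key, "action": "getbalance", "json": "1"}
    )
    with urllib.request.urlopen(f"{BASE_URL}/res.php?{query}", timeout=30) as resp:
        return parse_balance(resp.read().decode())


def balance_is_low(balance: float, threshold: float) -> bool:
    """True when the balance has dropped below the alert threshold."""
    return balance < threshold
```

Calling `fetch_balance` on a timer and alerting when `balance_is_low` returns true is enough for a first version. The right threshold depends on your per-solve price, which this guide does not state.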
Next steps
Build your high-throughput CAPTCHA pipeline: get your CaptchaAI API key.
Related guides: