Apify is a cloud scraping platform that runs Crawlee-based Actors. This guide shows how to add CaptchaAI CAPTCHA solving to an Apify Actor.
Actor Setup
Input Schema
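{
  "title": "CAPTCHA Scraper Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "URLs to start scraping from.",
      "editor": "requestListSources"
    },
    "captchaaiApiKey": {
      "title": "CaptchaAI API Key",
      "type": "string",
      "description": "API key used to submit CAPTCHAs to CaptchaAI.",
      "editor": "textfield",
      "isSecret": true
    },
    "maxConcurrency": {
      "title": "Max Concurrency",
      "type": "integer",
      "description": "Maximum number of pages processed in parallel.",
      "default": 3
    }
  },
  "required": ["startUrls", "captchaaiApiKey"]
}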
Actor Code
// Requires Node.js 18+ for the global fetch used by CaptchaAISolver.
const { Actor } = require('apify');
const { PlaywrightCrawler } = require('crawlee');

Actor.main(async () => {
  const input = await Actor.getInput();
  const { startUrls, captchaaiApiKey, maxConcurrency = 3 } = input;
  const solver = new CaptchaAISolver(captchaaiApiKey);

  const crawler = new PlaywrightCrawler({
    maxConcurrency,
    // Leave room for the solver's initial wait and polling loop
    requestHandlerTimeoutSecs: 180,
    async requestHandler({ request, page, log }) {
      // The crawler has already navigated to request.url; wait for it to settle
      await page.waitForLoadState('networkidle');

      // Check for a CAPTCHA widget
      const sitekey = await page.evaluate(() => {
        const el = document.querySelector('[data-sitekey]');
        return el ? el.getAttribute('data-sitekey') : null;
      });

      if (sitekey) {
        log.info(`Solving CAPTCHA on ${request.url}`);
        const token = await solver.solve(sitekey, request.url);

        // Inject the token and fire the widget callback, if one is declared
        await page.evaluate((t) => {
          const field = document.querySelector('[name="g-recaptcha-response"]');
          if (field) field.value = t;
          const cb = document.querySelector('.g-recaptcha')?.getAttribute('data-callback');
          if (cb && typeof window[cb] === 'function') window[cb](t);
        }, token);

        // Start waiting before clicking so the navigation is not missed
        await Promise.all([
          page.waitForNavigation({ timeout: 15000 }),
          page.click('button[type="submit"]'),
        ]);
      }

      // Extract data
      const title = await page.title();
      const items = await page.$$eval('.item', els =>
        els.map(el => ({
          name: el.querySelector('.name')?.textContent?.trim(),
          price: el.querySelector('.price')?.textContent?.trim(),
          url: el.querySelector('a')?.href,
        }))
      );

      // Push to the Apify dataset
      await Actor.pushData({
        url: request.url,
        title,
        items,
        scrapedAt: new Date().toISOString(),
      });
      log.info(`Scraped ${items.length} items from ${request.url}`);
    },
  });

  await crawler.run(startUrls);
});
class CaptchaAISolver {
  constructor(apiKey) {
    this.apiKey = apiKey;
  }

  // Submits a reCAPTCHA task, then polls until a token is returned.
  async solve(sitekey, pageurl) {
    const params = new URLSearchParams({
      key: this.apiKey,
      method: 'userrecaptcha',
      googlekey: sitekey,
      pageurl: pageurl,
      json: '1',
    });
    const submitResp = await fetch('https://ocr.captchaai.com/in.php', {
      method: 'POST',
      body: params,
    });
    const submitResult = await submitResp.json();
    if (submitResult.status !== 1) {
      throw new Error(`Submit: ${submitResult.request}`);
    }
    const taskId = submitResult.request;

    // Initial wait before the first poll
    await new Promise(r => setTimeout(r, 15000));

    // URLSearchParams encodes the key and task ID safely for the query string
    const query = new URLSearchParams({
      key: this.apiKey,
      action: 'get',
      id: taskId,
      json: '1',
    });
    for (let i = 0; i < 24; i++) {
      const pollResp = await fetch(`https://ocr.captchaai.com/res.php?${query}`);
      const result = await pollResp.json();
      if (result.status === 1) return result.request;
      // Any response other than "not ready" is treated as a failure
      if (result.request !== 'CAPCHA_NOT_READY') {
        throw new Error(`Solve: ${result.request}`);
      }
      await new Promise(r => setTimeout(r, 5000));
    }
    throw new Error('Timeout');
  }
}
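The polling half of the solver can be exercised without touching the network if the HTTP call is passed in as a function. The following is a minimal sketch under that assumption: `pollForToken` and `fetchJson` are hypothetical names that are not part of CaptchaAI or Apify, and the stub responses only mirror the `status` / `request` fields the solver code reads.

```javascript
// Sketch only: pollForToken and fetchJson are illustrative names, not a library API.
// fetchJson is any async function that resolves to a parsed res.php-style response.
async function pollForToken(fetchJson, { maxPolls = 24, intervalMs = 5000 } = {}) {
  for (let i = 0; i < maxPolls; i++) {
    const result = await fetchJson();
    // status 1 means the token is ready
    if (result.status === 1) return result.request;
    // Anything other than "not ready" is a hard failure
    if (result.request !== 'CAPCHA_NOT_READY') {
      throw new Error(`Solve: ${result.request}`);
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error('Timeout');
}

// Offline demo with stubbed responses (hypothetical data)
const stubResponses = [
  { status: 0, request: 'CAPCHA_NOT_READY' },
  { status: 1, request: 'stub-token' },
];
pollForToken(async () => stubResponses.shift(), { intervalMs: 10 })
  .then((token) => console.log('token:', token));
```

Injecting the fetch function keeps the retry logic testable with short intervals, while the real Actor can pass a function that calls `res.php`.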
Environment Variables on Apify
Store your CaptchaAI key securely:
- Go to Actor Settings → Environment variables
- Add: CAPTCHAAI_API_KEY=your-key (mark it as secret)
- Access it in code: process.env.CAPTCHAAI_API_KEY
// Alternative: use env var instead of input
const apiKey = input.captchaaiApiKey || process.env.CAPTCHAAI_API_KEY;
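If neither source provides a key, the fallback above yields `undefined` and the failure only surfaces later as a CaptchaAI submit error. One possible way to fail early is sketched below; `resolveApiKey` is a hypothetical helper name, and the `env` parameter exists only so the function can be tried without real environment variables.

```javascript
// Sketch only: resolveApiKey is an illustrative helper, not part of the Apify SDK.
function resolveApiKey(input, env = process.env) {
  // Prefer the Actor input, then fall back to the environment variable
  const key = (input && input.captchaaiApiKey) || env.CAPTCHAAI_API_KEY;
  if (!key) {
    throw new Error('CaptchaAI API key missing: set captchaaiApiKey or CAPTCHAAI_API_KEY');
  }
  return key;
}

// Example calls with made-up values
console.log(resolveApiKey({ captchaaiApiKey: 'from-input' }, {}));
console.log(resolveApiKey(null, { CAPTCHAAI_API_KEY: 'from-env' }));
```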
Apify Proxy + CaptchaAI
const crawler = new PlaywrightCrawler({
  proxyConfiguration: await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
  }),
  // ... rest of config
});
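FAQ
Can I use CaptchaAI on Apify's free plan?
Yes. CaptchaAI is an external API call, so it works on any Apify plan. Your cost is CaptchaAI's per-solve pricing plus Apify's compute cost.
Should I use Apify Proxy or CaptchaAI's proxy parameter?
Use Apify Proxy for the scraping requests and call CaptchaAI without a proxy for solving. For most use cases this is the most cost-effective approach.
How do I handle Apify Actor timeouts when solving CAPTCHAs?
Set requestHandlerTimeoutSecs to at least 180 seconds to leave time for CAPTCHA solving.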
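The 180-second figure can be sanity-checked against the solver's own waits: a 15-second initial sleep plus up to 24 polls spaced 5 seconds apart. The sketch below does that arithmetic; the function name is hypothetical, and the result ignores the time spent on the HTTP requests themselves, so treat it as a lower bound on the worst case.

```javascript
// Sketch only: worstCaseSolveSecs is an illustrative helper.
// The constants mirror the sleeps in CaptchaAISolver.solve.
const INITIAL_WAIT_MS = 15000; // sleep before the first poll
const POLL_INTERVAL_MS = 5000; // sleep after each "not ready" poll
const MAX_POLLS = 24;          // loop iterations before giving up

function worstCaseSolveSecs(initialWaitMs, pollIntervalMs, maxPolls) {
  return (initialWaitMs + pollIntervalMs * maxPolls) / 1000;
}

const solveSecs = worstCaseSolveSecs(INITIAL_WAIT_MS, POLL_INTERVAL_MS, MAX_POLLS);
const headroomSecs = 180 - solveSecs;
console.log(`solver sleeps: ${solveSecs}s, left for page work: ${headroomSecs}s`);
// prints "solver sleeps: 135s, left for page work: 45s"
```

If you raise the poll count or interval, raise requestHandlerTimeoutSecs by the same amount so the handler is not cut off mid-solve.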
Related Guides
- Crawlee + CaptchaAI Integration
- Building a Custom Scraping Framework
Deploy CAPTCHA-solving Actors: get CaptchaAI.