Master every parameter in a GeeTest CAPTCHA submission. Learn how to extract `gt`, `challenge`, and the other values, and how to submit them correctly to CaptchaAI.
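GeeTest v3 parameters

| Parameter | Required | Description |
|---|---|---|
| `gt` | Yes | GeeTest account ID (32-character hex string). Found in the page source or in an API response |
| `challenge` | Yes | Session-specific challenge string. Must be fresh for every solve |
| `pageurl` | Yes | Full URL of the page that shows the CAPTCHA |
| `api_server` | No | Custom GeeTest API server subdomain |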
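
Before spending a solve on bad input, it can help to sanity-check the values against the table above. The following is a minimal sketch, not part of any CaptchaAI client: `validate_geetest_params` is a hypothetical helper name, the 32-character hex rule for `gt` comes from the table, and the remaining checks are assumptions about what counts as obviously malformed input.

```python
import re
from urllib.parse import urlparse

GT_PATTERN = re.compile(r"[a-f0-9]{32}")  # 32-character hex, per the table


def validate_geetest_params(gt, challenge, pageurl, api_server=None):
    """Return a list of problems found; an empty list means the values look plausible."""
    problems = []
    if not gt or not GT_PATTERN.fullmatch(gt):
        problems.append("gt must be a 32-character hex string")
    if not challenge:
        problems.append("challenge is missing")
    parsed = urlparse(pageurl or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("pageurl must be a full http(s) URL")
    # Assumption: api_server is passed as a bare host name, without a scheme.
    if api_server is not None and "://" in api_server:
        problems.append("api_server should be a host name, not a URL")
    return problems
```

Returning a list instead of raising on the first problem lets a caller log everything that is wrong in one pass.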
Extracting parameters from the page

```python
# extract_geetest_params.py
import re

import requests


def extract_geetest_v3(page_url, session=None):
    """Extract GeeTest v3 gt and challenge from a page."""
    if session is None:
        session = requests.Session()
        session.headers["User-Agent"] = (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36"
        )

    resp = session.get(page_url, timeout=15)
    html = resp.text

    # Method 1: Extract gt from the HTML
    gt_match = re.search(r'gt["\']?\s*[:=]\s*["\']([a-f0-9]{32})', html)
    gt = gt_match.group(1) if gt_match else None

    # Method 2: Find the API endpoint that returns the challenge
    api_match = re.search(r'(https?://[^"\']+register-slide[^"\']*)', html)
    challenge = None
    if api_match:
        api_url = api_match.group(1)
        api_resp = session.get(api_url, timeout=10)
        try:
            data = api_resp.json()
        except ValueError:
            # requests raises a ValueError subclass for non-JSON bodies;
            # fall through to the embedded challenge below
            data = {}
        challenge = data.get("challenge")
        gt = gt or data.get("gt")

    if not challenge:
        # Fall back to a challenge embedded in the HTML
        ch_match = re.search(r'challenge["\']?\s*[:=]\s*["\']([a-f0-9]+)', html)
        challenge = ch_match.group(1) if ch_match else None

    return {"gt": gt, "challenge": challenge, "pageurl": page_url}


if __name__ == "__main__":
    # Usage (guarded so that importing this module does not trigger a request)
    params = extract_geetest_v3("https://staging.example.com/qa-login")
    print(f"gt: {params['gt']}")
    print(f"challenge: {params['challenge']}")
```
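
The endpoint regex in the extractor only matches absolute `http(s)` URLs, so a site that references its register endpoint with a relative path would be missed. As a hedged sketch of one way to handle that case, the hypothetical helper `find_register_url` below looks for any quoted string containing `register-slide` (the same path fragment assumed in the example above; real sites may use a different name) and resolves it against the page URL.

```python
import re
from urllib.parse import urljoin

# Assumption: the endpoint path contains "register-slide", as in the example above.
REGISTER_RE = re.compile(r'["\']([^"\']*register-slide[^"\']*)["\']')


def find_register_url(html, page_url):
    """Return the absolute register endpoint URL found in html, or None."""
    match = REGISTER_RE.search(html)
    if not match:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones
    return urljoin(page_url, match.group(1))
```

Because `urljoin` passes absolute URLs through unchanged, the same helper covers both the absolute and the relative case.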
Submitting GeeTest to CaptchaAI

```python
# solve_geetest.py
import os
import time

import requests


def solve_geetest(gt, challenge, pageurl, api_server=None):
    """Solve GeeTest v3 slide CAPTCHA via CaptchaAI."""
    api_key = os.environ["CAPTCHAAI_API_KEY"]

    payload = {
        "key": api_key,
        "method": "geetest",
        "gt": gt,
        "challenge": challenge,
        "pageurl": pageurl,
        "json": 1,
    }
    if api_server:
        payload["api_server"] = api_server

    # Submit the task
    resp = requests.post(
        "https://ocr.captchaai.com/in.php",
        data=payload,
        timeout=30,
    )
    result = resp.json()
    if result.get("status") != 1:
        raise RuntimeError(f"Submit failed: {result.get('request')}")
    task_id = result["request"]

    # Poll for the result: GeeTest typically solves in 10-20 seconds
    time.sleep(10)
    for _ in range(30):
        resp = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": api_key,
            "action": "get",
            "id": task_id,
            "json": 1,
        }, timeout=15)
        data = resp.json()
        if data.get("status") == 1:
            return data["request"]  # Returns challenge, validate, seccode
        if data.get("request") != "CAPCHA_NOT_READY":
            # Any other value is an error code from the API
            raise RuntimeError(data.get("request"))
        time.sleep(5)

    raise TimeoutError("GeeTest solve timeout")
```
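
The polling loop mixes network calls with the decision about what each response means. One way to make that decision testable without hitting the API is to pull it into a pure function. This is a sketch only: `classify_poll_response` is a hypothetical name, and it assumes nothing about the response beyond the `status` and `request` fields already used in the loop above.

```python
def classify_poll_response(data):
    """Map a res.php JSON body to ("ready", solution), ("pending", None) or ("error", code)."""
    if data.get("status") == 1:
        return ("ready", data.get("request"))
    if data.get("request") == "CAPCHA_NOT_READY":
        return ("pending", None)
    # Anything else is treated as an error code, as in the loop above
    return ("error", data.get("request"))
```

A polling loop can then switch on the first element of the tuple, and the three outcomes can be unit-tested with plain dictionaries.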
Using the solution

The solution contains three values that must be submitted to the target site's validation endpoint:

```python
# submit_solution.py
import json

import requests

from extract_geetest_params import extract_geetest_v3
from solve_geetest import solve_geetest


def submit_geetest_solution(session, validation_url, solution, original_challenge):
    """Submit GeeTest solution to the target site."""
    # Parse the solution if it arrived as a JSON string
    if isinstance(solution, str):
        solution = json.loads(solution)

    payload = {
        "geetest_challenge": solution.get("challenge", original_challenge),
        "geetest_validate": solution.get("validate", ""),
        "geetest_seccode": solution.get("seccode", ""),
    }
    resp = session.post(validation_url, data=payload, timeout=30)
    return resp


# Complete flow
def full_geetest_flow(page_url, validation_url):
    session = requests.Session()
    session.headers["User-Agent"] = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36"
    )

    # Step 1: Extract parameters
    params = extract_geetest_v3(page_url, session)
    if not params["gt"] or not params["challenge"]:
        raise ValueError("Could not extract gt and challenge from the page")
    print(f"gt: {params['gt']}, challenge: {params['challenge'][:16]}...")

    # Step 2: Solve
    solution = solve_geetest(
        params["gt"], params["challenge"], params["pageurl"],
    )
    print("Solved!")

    # Step 3: Submit
    resp = submit_geetest_solution(
        session, validation_url, solution, params["challenge"],
    )
    print(f"Validation response: {resp.status_code}")
    return resp
```
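
Since a missing field causes the target site to reject the submission, a stricter variant of the payload construction may be useful. The sketch below uses a hypothetical helper, `build_validation_payload`, and assumes the solver result uses the keys `challenge`, `validate`, and `seccode`, as the submission code above does. Unlike that code, it raises instead of silently sending empty strings.

```python
import json

REQUIRED_FIELDS = ("challenge", "validate", "seccode")


def build_validation_payload(solution):
    """Turn a solver result (dict or JSON string) into geetest_* form fields."""
    if isinstance(solution, str):
        solution = json.loads(solution)
    # Collect every absent or empty field so the error names all of them
    missing = [name for name in REQUIRED_FIELDS if not solution.get(name)]
    if missing:
        raise ValueError(f"solution is missing: {', '.join(missing)}")
    return {f"geetest_{name}": solution[name] for name in REQUIRED_FIELDS}
```

Failing early keeps an incomplete solution from being mistaken for a rejection by the site.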
Challenge freshness

The `challenge` parameter is session-specific and expires quickly:

```python
# fresh_challenge.py
from solve_geetest import solve_geetest


def get_fresh_challenge(session, register_url):
    """Always fetch a fresh challenge before solving."""
    resp = session.get(register_url, timeout=10)
    data = resp.json()
    challenge = data.get("challenge")
    if not challenge:
        raise ValueError("No challenge returned")
    return challenge


def solve_with_fresh_challenge(session, gt, register_url, pageurl):
    """Ensure challenge is fresh before submitting to CaptchaAI."""
    challenge = get_fresh_challenge(session, register_url)
    # Submit immediately so the challenge does not expire
    solution = solve_geetest(gt, challenge, pageurl)
    return solution
```

Key rule: extract the challenge and submit it to CaptchaAI within seconds. A stale challenge always fails.
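
To make the "within seconds" rule enforceable in code, the challenge can carry the time at which it was fetched. The following is a minimal sketch: the class name `TimedChallenge` and the 10-second default threshold are assumptions chosen for illustration, not values from GeeTest or CaptchaAI.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TimedChallenge:
    """A challenge string plus the monotonic time at which it was fetched."""
    value: str
    fetched_at: float = field(default_factory=time.monotonic)

    def age(self, now=None):
        """Seconds elapsed since the challenge was fetched."""
        current = time.monotonic() if now is None else now
        return current - self.fetched_at

    def is_stale(self, max_age=10.0, now=None):
        """True once the challenge is older than max_age seconds (assumed threshold)."""
        return self.age(now) > max_age
```

A caller could check `is_stale()` right before submitting and fetch a new challenge when it returns `True`. The optional `now` argument exists so the logic can be tested without sleeping.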
Custom API server

Some sites use a custom GeeTest subdomain:

```python
from solve_geetest import solve_geetest

# The api_server parameter specifies a custom GeeTest backend
# Default: api.geetest.com
# Custom examples: api-na.geetest.com, api.geetest.com/ajax-custom
solution = solve_geetest(
    gt="abc123...",
    challenge="def456...",
    pageurl="https://staging.example.com/qa-login",
    api_server="api-na.geetest.com",  # North America endpoint
)
```
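
Whether a site needs `api_server` can often be read off the hosts it contacts. As a rough sketch, the hypothetical helper `detect_api_server` below scans a list of request URLs, for example copied from the browser's network panel, and returns the first GeeTest API host that is not the default. The `api` prefix test is an assumption based on the host names shown above.

```python
from urllib.parse import urlparse

DEFAULT_API_SERVER = "api.geetest.com"  # default named in the comment above


def detect_api_server(request_urls):
    """Return the first non-default api*.geetest.com host seen, or None."""
    for url in request_urls:
        host = (urlparse(url).hostname or "").lower()
        is_geetest_api = host.startswith("api") and host.endswith(".geetest.com")
        if is_geetest_api and host != DEFAULT_API_SERVER:
            return host
    return None
```

A `None` result suggests the default server is in use and the parameter can be omitted.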
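Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| `ERROR_CAPTCHA_UNSOLVABLE` | Stale challenge | Fetch a new challenge immediately before submitting |
| `validate` is empty | Wrong API version | Use `version=4` for GeeTest v4 sites |
| Solution rejected by the site | Missing `seccode` | Make sure all three fields are submitted |
| `gt` parameter not found | Loaded via JavaScript | Use Selenium, or inspect the XHR response from the register endpoint |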
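
The first row of the table suggests a simple recovery strategy: on `ERROR_CAPTCHA_UNSOLVABLE`, fetch a new challenge and try again. The sketch below is one possible shape for that, under stated assumptions: `solve_with_retry` is a hypothetical wrapper, it takes two callables so it stays independent of the network code, and it relies on the solver raising `RuntimeError` with the error code in the message, as `solve_geetest` does above.

```python
def solve_with_retry(fetch_challenge, solve, max_attempts=3):
    """Fetch a new challenge for every attempt; retry only on ERROR_CAPTCHA_UNSOLVABLE."""
    last_error = None
    for _ in range(max_attempts):
        challenge = fetch_challenge()  # never reuse a challenge across attempts
        try:
            return solve(challenge)
        except RuntimeError as exc:
            if "ERROR_CAPTCHA_UNSOLVABLE" not in str(exc):
                raise  # other errors will not be fixed by retrying
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

With the functions from this guide, a call might look like `solve_with_retry(lambda: get_fresh_challenge(session, register_url), lambda ch: solve_geetest(gt, ch, pageurl))`.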
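Frequently asked questions

What is the difference between `gt` and `challenge`?
`gt` is the site's GeeTest account ID and stays the same. `challenge` is generated for each session and must be extracted again every time.

How long is a challenge valid?
Typically 60-120 seconds. Extract it and submit it to CaptchaAI immediately.

What does `api_server` do?
It tells CaptchaAI which GeeTest API server to use. It is only needed when the site uses a non-default GeeTest endpoint. Check the page's network requests for hosts matching `api-*.geetest.com`.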
Related guides

- GeeTest v4 changes
- GeeTest vs. BLS comparison

Master GeeTest parameters: get started with CaptchaAI.