# Error Handling and Retry Strategies
Scraping jobs fail. Websites go down, anti-bot systems block requests, and network connections drop. A reliable scraping pipeline needs proper error handling to recover from failures without losing data.
## Common Error Types

### HTTP-Level Errors
| Status Code | Meaning | Action |
|---|---|---|
| 400 | Bad request (invalid parameters) | Fix the request, don't retry |
| 401 | Invalid or missing API key | Check your API key |
| 429 | Rate limited | Back off and retry after delay |
| 500 | Server error | Retry with backoff |
| 503 | Service temporarily unavailable | Retry with backoff |
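In client code, the table above collapses to a small retry predicate. A minimal sketch (the function name is ours, not part of the API):

```typescript
// Returns true when a request with this HTTP status is worth retrying.
// 429 and all 5xx are transient; other 4xx indicate a problem with the
// request itself, so retrying would just repeat the same failure.
function isRetryableStatus(status: number): boolean {
  if (status === 429) return true;                // rate limited: back off and retry
  if (status >= 500 && status < 600) return true; // server-side failure
  return false;                                   // 4xx (except 429): fix the request
}
```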
### Scrape-Level Errors
These appear in the job result when using the async API:
- `TIMEOUT`: The target page took too long to load. Try increasing the timeout or simplifying the `waitForSelector`.
- `BLOCKED`: The target site blocked the request. Try a different mode (upgrade to `browser_stealth` or `http_stealth`).
- `NAVIGATION_ERROR`: The URL failed to load (DNS error, SSL error, etc.). Verify the URL is correct and accessible.
- `SELECTOR_TIMEOUT`: The `waitForSelector` element never appeared. The page structure may have changed.
## Retry Strategy: Exponential Backoff
The most effective retry strategy is exponential backoff — each retry waits longer than the last. This prevents hammering a failing service and gives transient issues time to resolve.
Pattern: Wait 2^attempt seconds between retries (1s, 2s, 4s, 8s...).
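The schedule is easy to compute directly. A sketch, with two common refinements the full examples below don't include (a cap so delays don't grow unbounded, and optional jitter so many clients don't all retry in lockstep):

```typescript
// Exponential backoff delay in milliseconds: 2^attempt seconds,
// capped at maxMs, with optional full jitter (uniform in [0, delay)).
function backoffDelayMs(attempt: number, maxMs = 30_000, jitter = false): number {
  const base = Math.min(Math.pow(2, attempt) * 1000, maxMs);
  return jitter ? Math.floor(Math.random() * base) : base;
}
```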
### Node.js / TypeScript
```typescript
interface ScrapeResult {
  success: boolean;
  data?: { html: string; jobId: string };
  error?: { code: string; message: string };
}

async function scrapeWithRetry(
  url: string,
  options: { mode?: string; waitForSelector?: string } = {},
  maxRetries = 3
): Promise<ScrapeResult> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(
        'https://api.fastwebscraper.com/v1/scrape/sync',
        {
          method: 'POST',
          headers: {
            'X-API-Key': 'YOUR_API_KEY',
            'Content-Type': 'application/json',
          },
          // Spread options first so the explicit defaults below win
          body: JSON.stringify({
            ...options,
            url,
            mode: options.mode ?? 'auto',
          }),
          signal: AbortSignal.timeout(60000),
        }
      );

      // Don't retry client errors (4xx except 429)
      if (response.status >= 400 && response.status < 500 && response.status !== 429) {
        const error = await response.json();
        return { success: false, error: error.error };
      }

      // Retry on 429 (rate limited) and 5xx errors
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      const result = await response.json();

      // Check if the scrape itself succeeded
      if (result.data?.html) {
        return { success: true, data: result.data };
      }
      throw new Error(result.error?.message ?? 'Empty response');
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      console.warn(`Attempt ${attempt + 1}/${maxRetries} failed for ${url}: ${message}`);
      if (attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000;
        console.warn(`Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }
  return {
    success: false,
    error: { code: 'MAX_RETRIES', message: `Failed after ${maxRetries} attempts` },
  };
}

// Usage
const result = await scrapeWithRetry('https://example.com/product/123', {
  mode: 'auto',
  waitForSelector: '.price',
});

if (result.success) {
  console.log('Scraped successfully:', result.data!.html.length, 'chars');
} else {
  console.error('Scrape failed:', result.error!.message);
}
```

### Python
```python
import requests
import time
from typing import Optional


def scrape_with_retry(
    url: str,
    mode: str = 'auto',
    wait_for_selector: Optional[str] = None,
    max_retries: int = 3,
) -> dict:
    """Scrape a URL with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            body = {'url': url, 'mode': mode}
            if wait_for_selector:
                body['waitForSelector'] = wait_for_selector

            response = requests.post(
                'https://api.fastwebscraper.com/v1/scrape/sync',
                headers={'X-API-Key': 'YOUR_API_KEY'},
                json=body,
                timeout=60,
            )

            # Don't retry client errors (except 429)
            if 400 <= response.status_code < 500 and response.status_code != 429:
                return {'success': False, 'error': response.json().get('error')}

            response.raise_for_status()
            result = response.json()
            if result.get('data', {}).get('html'):
                return {'success': True, 'data': result['data']}
            raise Exception(result.get('error', {}).get('message', 'Empty response'))
        except Exception as e:
            print(f'Attempt {attempt + 1}/{max_retries} failed for {url}: {e}')
            if attempt < max_retries - 1:
                delay = 2 ** attempt
                print(f'Retrying in {delay}s...')
                time.sleep(delay)

    return {
        'success': False,
        'error': {'code': 'MAX_RETRIES', 'message': f'Failed after {max_retries} attempts'},
    }


# Usage
result = scrape_with_retry(
    'https://example.com/product/123',
    mode='auto',
    wait_for_selector='.price',
)
if result['success']:
    print(f'Scraped: {len(result["data"]["html"])} chars')
else:
    print(f'Failed: {result["error"]["message"]}')
```

### C#
```csharp
using System.Net.Http.Json;
using System.Text.Json;

async Task<(bool Success, JsonElement? Data, string? Error)> ScrapeWithRetry(
    HttpClient client,
    string url,
    string mode = "auto",
    string? waitForSelector = null,
    int maxRetries = 3)
{
    for (int attempt = 0; attempt < maxRetries; attempt++)
    {
        try
        {
            var request = new Dictionary<string, string>
            {
                ["url"] = url,
                ["mode"] = mode,
            };
            if (waitForSelector != null)
                request["waitForSelector"] = waitForSelector;

            var response = await client.PostAsJsonAsync(
                "https://api.fastwebscraper.com/v1/scrape/sync", request);

            // Don't retry client errors (except 429)
            if ((int)response.StatusCode >= 400
                && (int)response.StatusCode < 500
                && (int)response.StatusCode != 429)
            {
                var errorResult = await response.Content
                    .ReadFromJsonAsync<JsonElement>();
                return (false, null, errorResult.ToString());
            }

            response.EnsureSuccessStatusCode();
            var result = await response.Content
                .ReadFromJsonAsync<JsonElement>();
            if (result.TryGetProperty("data", out var data)
                && data.TryGetProperty("html", out var html)
                && html.GetString()?.Length > 0)
            {
                return (true, data, null);
            }
            throw new Exception("Empty response");
        }
        catch (Exception ex)
        {
            Console.WriteLine(
                $"Attempt {attempt + 1}/{maxRetries} failed: {ex.Message}");
            if (attempt < maxRetries - 1)
            {
                var delay = (int)Math.Pow(2, attempt) * 1000;
                await Task.Delay(delay);
            }
        }
    }
    return (false, null, $"Failed after {maxRetries} attempts");
}

// Usage
using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(60) };
client.DefaultRequestHeaders.Add("X-API-Key", "YOUR_API_KEY");

var (success, data, error) = await ScrapeWithRetry(
    client,
    "https://example.com/product/123",
    waitForSelector: ".price");

if (success)
{
    var html = data!.Value.GetProperty("html").GetString()!;
    Console.WriteLine($"Scraped: {html.Length} chars");
}
else
{
    Console.WriteLine($"Failed: {error}");
}
```

## Handling Specific Failure Modes
### Rate Limiting (429)

When you receive a 429 response, the API is asking you to slow down. The correct response is to:
- Wait longer before retrying (respect the `Retry-After` header if present)
- Reduce your request concurrency
- Spread requests more evenly over time
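Per RFC 9110, a `Retry-After` header can carry either a number of seconds or an HTTP date. A sketch of turning it into a wait in milliseconds (whether this API sends the header isn't guaranteed, so fall back to exponential backoff when it's absent):

```typescript
// Parse a Retry-After header value into a delay in milliseconds.
// Returns null when the header is absent or unparseable, so the
// caller can fall back to its own backoff schedule.
function retryAfterMs(headerValue: string | null, now: Date = new Date()): number | null {
  if (!headerValue) return null;
  const seconds = Number(headerValue);                 // delta-seconds form, e.g. "5"
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = new Date(headerValue);                  // HTTP-date form
  if (!Number.isNaN(date.getTime())) return Math.max(0, date.getTime() - now.getTime());
  return null;
}
```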
### Blocked Requests

If a scrape returns a `BLOCKED` error, the target site detected and rejected the request. Try:
- Upgrading to a stealth mode (`http_stealth` or `browser_stealth`) if using `http` or `browser`
- Adding `country` targeting to match the site's expected visitors
- Adding a `waitForSelector` to handle CAPTCHA challenge pages
- Reducing request frequency to the same domain
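The last point, limiting request frequency to a single domain, can be enforced with a small scheduler. A minimal sketch (the class and interval are illustrative, not part of any SDK):

```typescript
// Tracks the earliest allowed request time per domain and returns how
// long the caller should wait before hitting that domain again.
class DomainThrottle {
  private nextAllowed = new Map<string, number>();

  constructor(private minIntervalMs: number) {}

  // Milliseconds to wait before requesting `url` (0 = go now).
  delayFor(url: string, now: number = Date.now()): number {
    const domain = new URL(url).hostname;
    const earliest = this.nextAllowed.get(domain) ?? now;
    const wait = Math.max(0, earliest - now);
    // Reserve the slot: the next request must wait a further interval
    this.nextAllowed.set(domain, now + wait + this.minIntervalMs);
    return wait;
  }
}
```

Before each request, `await new Promise(r => setTimeout(r, throttle.delayFor(url)))` spaces out traffic per domain without slowing down requests to other domains.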
### Timeout Errors

If pages consistently time out:
- Check whether the site is actually accessible (try it in a browser)
- Simplify or remove `waitForSelector`; the element may not exist
- Consider using the async API for long-running scrapes
## Batch Error Handling
For batch scraping, process URLs in limited-concurrency chunks and track how many succeed and fail:
```typescript
interface BatchResult {
  url: string;
  success: boolean;
  html?: string;
  error?: string;
}

async function scrapeBatch(urls: string[]): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  const concurrency = 5;

  for (let i = 0; i < urls.length; i += concurrency) {
    const batch = urls.slice(i, i + concurrency);
    const batchResults = await Promise.allSettled(
      batch.map(async (url) => {
        const result = await scrapeWithRetry(url);
        return {
          url,
          success: result.success,
          html: result.data?.html,
          error: result.error?.message,
        };
      })
    );

    batchResults.forEach((settled, j) => {
      if (settled.status === 'fulfilled') {
        results.push(settled.value);
      } else {
        // Index j maps the rejected promise back to its URL in this batch
        results.push({
          url: batch[j],
          success: false,
          error: String(settled.reason),
        });
      }
    });
  }

  // Log summary
  const succeeded = results.filter(r => r.success).length;
  const failed = results.length - succeeded;
  console.log(`Batch complete: ${succeeded} succeeded, ${failed} failed`);
  return results;
}
```

## Key Takeaways
- Always implement retries — transient failures are normal in scraping
- Use exponential backoff — don't hammer a failing service
- Don't retry 4xx errors (except 429) — they indicate a problem with your request
- Track failure rates — sudden drops in success rate indicate a problem to investigate
- Log failures — you need the context to debug issues later
For more on the API endpoints and error codes, see the API Reference.