Error Handling and Retry Strategies

FastWebScraper Team · 3 min read


Scraping jobs fail. Websites go down, anti-bot systems block requests, and network connections drop. A reliable scraping pipeline needs proper error handling to recover from failures without losing data.

Common Error Types

HTTP-Level Errors

| Status Code | Meaning | Action |
| --- | --- | --- |
| 400 | Bad request (invalid parameters) | Fix the request; don't retry |
| 401 | Invalid or missing API key | Check your API key |
| 429 | Rate limited | Back off and retry after a delay |
| 500 | Server error | Retry with backoff |
| 503 | Service temporarily unavailable | Retry with backoff |

Scrape-Level Errors

These appear in the job result when using the async API:

  • TIMEOUT: The target page took too long to load. Try increasing timeout or simplifying the waitForSelector.
  • BLOCKED: The target site blocked the request. Try a different mode (upgrade to browser_stealth or http_stealth).
  • NAVIGATION_ERROR: The URL failed to load (DNS error, SSL error, etc.). Verify the URL is correct and accessible.
  • SELECTOR_TIMEOUT: The waitForSelector element never appeared. The page structure may have changed.
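Before resubmitting a failed job, it helps to map each error code to a handling decision. A minimal sketch (the classification below is our suggestion based on the descriptions above, not an official API contract):

```typescript
// How to react to each scrape-level error code (suggested mapping).
type ErrorAction = "retry" | "retry_stealth" | "fix_request";

function classifyScrapeError(code: string): ErrorAction {
  switch (code) {
    case "TIMEOUT":
    case "SELECTOR_TIMEOUT":
      return "retry";         // transient, or needs a longer timeout / looser selector
    case "BLOCKED":
      return "retry_stealth"; // retry with a stealth mode
    case "NAVIGATION_ERROR":
    default:
      return "fix_request";   // verify the URL first; retrying as-is won't help
  }
}
```

A dispatcher like this keeps retry policy in one place instead of scattering `if` checks through your pipeline.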

Retry Strategy: Exponential Backoff

The most effective retry strategy is exponential backoff — each retry waits longer than the last. This prevents hammering a failing service and gives transient issues time to resolve.

Pattern: Wait 2^attempt seconds between retries (1s, 2s, 4s, 8s...).
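The delay computation is a one-liner. This sketch adds a cap and optional jitter; both are common extensions (jitter helps avoid many clients retrying in lockstep), not part of the base pattern described above:

```typescript
// Exponential backoff delay: base * 2^attempt, capped at maxMs.
// With jitter enabled, returns a random value in [delay/2, delay).
function backoffDelay(
  attempt: number,
  { baseMs = 1000, maxMs = 30000, jitter = false } = {}
): number {
  const delay = Math.min(baseMs * 2 ** attempt, maxMs);
  return jitter ? delay / 2 + Math.random() * (delay / 2) : delay;
}
```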

Node.js / TypeScript

```typescript
interface ScrapeResult {
  success: boolean;
  data?: { html: string; jobId: string };
  error?: { code: string; message: string };
}

async function scrapeWithRetry(
  url: string,
  options: { mode?: string; waitForSelector?: string } = {},
  maxRetries = 3
): Promise<ScrapeResult> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(
        'https://api.fastwebscraper.com/v1/scrape/sync',
        {
          method: 'POST',
          headers: {
            'X-API-Key': 'YOUR_API_KEY',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            url,
            mode: options.mode ?? 'auto',
            ...options,
          }),
          signal: AbortSignal.timeout(60000),
        }
      );

      // Don't retry client errors (4xx except 429)
      if (response.status >= 400 && response.status < 500 && response.status !== 429) {
        const error = await response.json();
        return { success: false, error: error.error };
      }

      // Retry on 429 (rate limited) and 5xx errors
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      const result = await response.json();

      // Check if the scrape itself succeeded
      if (result.data?.html) {
        return { success: true, data: result.data };
      }
      throw new Error(result.error?.message ?? 'Empty response');
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      console.warn(`Attempt ${attempt + 1}/${maxRetries} failed for ${url}: ${message}`);
      if (attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000;
        console.warn(`Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }

  return {
    success: false,
    error: { code: 'MAX_RETRIES', message: `Failed after ${maxRetries} attempts` },
  };
}

// Usage
const result = await scrapeWithRetry('https://example.com/product/123', {
  mode: 'auto',
  waitForSelector: '.price',
});

if (result.success) {
  console.log('Scraped successfully:', result.data!.html.length, 'chars');
} else {
  console.error('Scrape failed:', result.error!.message);
}
```

Python

```python
import time
from typing import Optional

import requests


def scrape_with_retry(
    url: str,
    mode: str = 'auto',
    wait_for_selector: Optional[str] = None,
    max_retries: int = 3,
) -> dict:
    """Scrape a URL with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            body = {'url': url, 'mode': mode}
            if wait_for_selector:
                body['waitForSelector'] = wait_for_selector

            response = requests.post(
                'https://api.fastwebscraper.com/v1/scrape/sync',
                headers={
                    'X-API-Key': 'YOUR_API_KEY',
                    'Content-Type': 'application/json',
                },
                json=body,
                timeout=60,
            )

            # Don't retry client errors (except 429)
            if 400 <= response.status_code < 500 and response.status_code != 429:
                return {'success': False, 'error': response.json().get('error')}

            response.raise_for_status()
            result = response.json()

            if result.get('data', {}).get('html'):
                return {'success': True, 'data': result['data']}

            raise Exception(result.get('error', {}).get('message', 'Empty response'))
        except Exception as e:
            delay = 2 ** attempt
            print(f'Attempt {attempt + 1}/{max_retries} failed for {url}: {e}')
            if attempt < max_retries - 1:
                print(f'Retrying in {delay}s...')
                time.sleep(delay)

    return {
        'success': False,
        'error': {'code': 'MAX_RETRIES', 'message': f'Failed after {max_retries} attempts'},
    }


# Usage
result = scrape_with_retry(
    'https://example.com/product/123',
    mode='auto',
    wait_for_selector='.price',
)
if result['success']:
    print(f'Scraped: {len(result["data"]["html"])} chars')
else:
    print(f'Failed: {result["error"]["message"]}')
```

C#

```csharp
using System.Net.Http.Json;
using System.Text.Json;

async Task<(bool Success, JsonElement? Data, string? Error)> ScrapeWithRetry(
    HttpClient client,
    string url,
    string mode = "auto",
    string? waitForSelector = null,
    int maxRetries = 3)
{
    for (int attempt = 0; attempt < maxRetries; attempt++)
    {
        try
        {
            var request = new Dictionary<string, string>
            {
                ["url"] = url,
                ["mode"] = mode,
            };
            if (waitForSelector != null)
                request["waitForSelector"] = waitForSelector;

            var response = await client.PostAsJsonAsync(
                "https://api.fastwebscraper.com/v1/scrape/sync", request);

            // Don't retry client errors (except 429)
            if ((int)response.StatusCode >= 400 &&
                (int)response.StatusCode < 500 &&
                (int)response.StatusCode != 429)
            {
                var errorResult = await response.Content
                    .ReadFromJsonAsync<JsonElement>();
                return (false, null, errorResult.ToString());
            }

            response.EnsureSuccessStatusCode();
            var result = await response.Content
                .ReadFromJsonAsync<JsonElement>();

            var data = result.GetProperty("data");
            if (data.TryGetProperty("html", out var html) &&
                html.GetString()?.Length > 0)
            {
                return (true, data, null);
            }
            throw new Exception("Empty response");
        }
        catch (Exception ex)
        {
            Console.WriteLine(
                $"Attempt {attempt + 1}/{maxRetries} failed: {ex.Message}");
            if (attempt < maxRetries - 1)
            {
                var delay = (int)Math.Pow(2, attempt) * 1000;
                await Task.Delay(delay);
            }
        }
    }
    return (false, null, $"Failed after {maxRetries} attempts");
}

// Usage
using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(60) };
client.DefaultRequestHeaders.Add("X-API-Key", "YOUR_API_KEY");

var (success, data, error) = await ScrapeWithRetry(
    client, "https://example.com/product/123", waitForSelector: ".price");

if (success)
{
    var html = data!.Value.GetProperty("html").GetString()!;
    Console.WriteLine($"Scraped: {html.Length} chars");
}
else
{
    Console.WriteLine($"Failed: {error}");
}
```

Handling Specific Failure Modes

Rate Limiting (429)

When you receive a 429 response, the API is asking you to slow down. The correct response is to:

  1. Wait longer before retrying (respect Retry-After header if present)
  2. Reduce your request concurrency
  3. Spread requests more evenly over time
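Step 1 can be implemented by parsing the Retry-After header, which per HTTP semantics (RFC 9110) is either a number of seconds or an HTTP-date. A sketch that falls back to your own backoff delay when the header is absent or unparseable:

```typescript
// Parse a Retry-After header value into a wait in milliseconds.
// Accepts delta-seconds ("120") or an HTTP-date; falls back otherwise.
function retryAfterMs(
  header: string | null,
  fallbackMs: number,
  now: Date = new Date()
): number {
  if (!header) return fallbackMs;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now.getTime());
}
```

In the retry loop, `retryAfterMs(response.headers.get('retry-after'), delay)` would replace the plain exponential delay for 429 responses.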

Blocked Requests

If a scrape returns a BLOCKED error, the target site detected and rejected the request. Try:

  1. Upgrade to a stealth mode (http_stealth or browser_stealth) if using http or browser
  2. Add country targeting to match the site's expected visitors
  3. Add a waitForSelector to handle CAPTCHA challenge pages
  4. Reduce request frequency to the same domain
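Step 1 can be automated with a small escalation ladder using the mode names from this article. The `http_stealth` → `browser_stealth` step below is our assumption about which mode is strongest, not documented API behavior:

```typescript
// Next mode to try after a BLOCKED error; null means nothing stronger to try.
function escalateMode(mode: string): string | null {
  const next: Record<string, string> = {
    http: "http_stealth",
    browser: "browser_stealth",
    http_stealth: "browser_stealth", // assumption: browser stealth as last resort
  };
  return next[mode] ?? null;
}
```

On a BLOCKED result, retry with `escalateMode(currentMode)` and give up (or alert) once it returns null.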

Timeout Errors

If pages consistently time out:

  1. Check if the site is actually accessible (try in a browser)
  2. Simplify or remove waitForSelector — the element may not exist
  3. Consider using the async API for long-running scrapes
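For step 3, an async-API workflow submits a job and polls for its result instead of holding one long request open. The job status values and response shape below are assumptions (the async endpoints are not documented in this article), so the status fetcher is injected rather than hard-coded:

```typescript
// Assumed job result shape; adjust to the actual async API response.
interface JobStatus {
  status: "running" | "completed" | "failed";
  html?: string;
  error?: string;
}

// Poll an injected status fetcher (e.g. a wrapper around GET <job-status endpoint>)
// until the job leaves the "running" state or maxPolls is exhausted.
async function pollJob(
  fetchStatus: () => Promise<JobStatus>,
  { intervalMs = 2000, maxPolls = 30 } = {}
): Promise<JobStatus> {
  for (let i = 0; i < maxPolls; i++) {
    const job = await fetchStatus();
    if (job.status !== "running") return job;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  return { status: "failed", error: "POLL_TIMEOUT" };
}
```

Injecting `fetchStatus` also makes the polling logic easy to unit-test with a stub.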

Batch Error Handling

For batch scraping, track success and failure rates per domain:

```typescript
interface BatchResult {
  url: string;
  success: boolean;
  html?: string;
  error?: string;
}

async function scrapeBatch(urls: string[]): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  const concurrency = 5;

  for (let i = 0; i < urls.length; i += concurrency) {
    const batch = urls.slice(i, i + concurrency);
    const batchResults = await Promise.allSettled(
      batch.map(async (url) => {
        const result = await scrapeWithRetry(url);
        return { url, ...result };
      })
    );

    // allSettled preserves input order, so index j maps back to batch[j]
    for (let j = 0; j < batchResults.length; j++) {
      const settled = batchResults[j];
      if (settled.status === 'fulfilled') {
        const { url, success, data, error } = settled.value;
        results.push({ url, success, html: data?.html, error: error?.message });
      } else {
        results.push({
          url: batch[j],
          success: false,
          error: settled.reason.message,
        });
      }
    }
  }

  // Log summary
  const succeeded = results.filter(r => r.success).length;
  const failed = results.length - succeeded;
  console.log(`Batch complete: ${succeeded} succeeded, ${failed} failed`);

  return results;
}
```

Key Takeaways

  1. Always implement retries — transient failures are normal in scraping
  2. Use exponential backoff — don't hammer a failing service
  3. Don't retry 4xx errors (except 429) — they indicate a problem with your request
  4. Track failure rates — sudden drops in success rate indicate a problem to investigate
  5. Log failures — you need the context to debug issues later

For more on the API endpoints and error codes, see the API Reference.