Error Handling and Retry Strategies

FastWebScraper Team · 3 min read


Scraping jobs fail. Websites go down, anti-bot systems block requests, and network connections drop. A reliable scraping pipeline needs proper error handling to recover from failures without losing data.

Common Error Types

HTTP-Level Errors

| Status Code | Meaning | Action |
| --- | --- | --- |
| 400 | Bad request (invalid parameters) | Fix the request; don't retry |
| 401 | Invalid or missing API key | Check your API key |
| 429 | Rate limited | Back off and retry after a delay |
| 500 | Server error | Retry with backoff |
| 503 | Service temporarily unavailable | Retry with backoff |

Scrape-Level Errors

These appear in the job result when using the async API:

  • TIMEOUT: The target page took too long to load. Try increasing timeout or simplifying the waitForSelector.
  • BLOCKED: The target site blocked the request. Try a different mode (upgrade to browser_stealth or http_stealth).
  • NAVIGATION_ERROR: The URL failed to load (DNS error, SSL error, etc.). Verify the URL is correct and accessible.
  • SELECTOR_TIMEOUT: The waitForSelector element never appeared. The page structure may have changed.
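Before resubmitting a failed job, it helps to map each error code to a handling decision. A minimal sketch (the classification below is our suggestion based on the descriptions above, not an official API contract):

```typescript
// How to react to each scrape-level error code (suggested mapping).
type ErrorAction = "retry" | "retry_stealth" | "fix_request";

function classifyScrapeError(code: string): ErrorAction {
  switch (code) {
    case "TIMEOUT":
    case "SELECTOR_TIMEOUT":
      return "retry";         // transient, or needs a longer timeout / looser selector
    case "BLOCKED":
      return "retry_stealth"; // retry with a stealth mode
    case "NAVIGATION_ERROR":
    default:
      return "fix_request";   // verify the URL first; retrying as-is won't help
  }
}
```

A dispatcher like this keeps retry policy in one place instead of scattering `if` checks through your pipeline.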

Retry Strategy: Exponential Backoff

The most effective retry strategy is exponential backoff — each retry waits longer than the last. This prevents hammering a failing service and gives transient issues time to resolve.

Pattern: Wait 2^attempt seconds between retries (1s, 2s, 4s, 8s...).
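The delay computation is a one-liner. This sketch adds a cap and optional jitter; both are common extensions (jitter helps avoid many clients retrying in lockstep), not part of the base pattern described above:

```typescript
// Exponential backoff delay: base * 2^attempt, capped at maxMs.
// With jitter enabled, returns a random value in [delay/2, delay).
function backoffDelay(
  attempt: number,
  { baseMs = 1000, maxMs = 30000, jitter = false } = {}
): number {
  const delay = Math.min(baseMs * 2 ** attempt, maxMs);
  return jitter ? delay / 2 + Math.random() * (delay / 2) : delay;
}
```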

Node.js / TypeScript

```typescript
interface ScrapeResult {
  success: boolean;
  data?: { html: string; jobId: string };
  error?: { code: string; message: string };
}

async function scrapeWithRetry(
  url: string,
  options: { mode?: string; waitForSelector?: string } = {},
  maxRetries = 3
): Promise<ScrapeResult> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(
        'https://api.fastwebscraper.com/v1/scrape/sync',
        {
          method: 'POST',
          headers: {
            'X-API-Key': 'YOUR_API_KEY',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            url,
            mode: options.mode ?? 'auto',
            ...options,
          }),
          signal: AbortSignal.timeout(60000),
        }
      );

      // Don't retry client errors (4xx except 429)
      if (response.status >= 400 && response.status < 500 && response.status !== 429) {
        const error = await response.json();
        return { success: false, error: error.error };
      }

      // Retry on 429 (rate limited) and 5xx errors
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      const result = await response.json();

      // Check if the scrape itself succeeded
      if (result.data?.html) {
        return { success: true, data: result.data };
      }
      throw new Error(result.error?.message ?? 'Empty response');
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      console.warn(`Attempt ${attempt + 1}/${maxRetries} failed for ${url}: ${message}`);
      if (attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000;
        console.warn(`Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }

  return {
    success: false,
    error: { code: 'MAX_RETRIES', message: `Failed after ${maxRetries} attempts` },
  };
}

// Usage
const result = await scrapeWithRetry('https://example.com/product/123', {
  mode: 'auto',
  waitForSelector: '.price',
});

if (result.success) {
  console.log('Scraped successfully:', result.data!.html.length, 'chars');
} else {
  console.error('Scrape failed:', result.error!.message);
}
```

Python

```python
import time
from typing import Optional

import requests


def scrape_with_retry(
    url: str,
    mode: str = 'auto',
    wait_for_selector: Optional[str] = None,
    max_retries: int = 3,
) -> dict:
    """Scrape a URL with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            body = {'url': url, 'mode': mode}
            if wait_for_selector:
                body['waitForSelector'] = wait_for_selector

            response = requests.post(
                'https://api.fastwebscraper.com/v1/scrape/sync',
                headers={
                    'X-API-Key': 'YOUR_API_KEY',
                    'Content-Type': 'application/json',
                },
                json=body,
                timeout=60,
            )

            # Don't retry client errors (except 429)
            if 400 <= response.status_code < 500 and response.status_code != 429:
                return {'success': False, 'error': response.json().get('error')}

            response.raise_for_status()
            result = response.json()

            if result.get('data', {}).get('html'):
                return {'success': True, 'data': result['data']}

            raise Exception(result.get('error', {}).get('message', 'Empty response'))
        except Exception as e:
            delay = 2 ** attempt
            print(f'Attempt {attempt + 1}/{max_retries} failed for {url}: {e}')
            if attempt < max_retries - 1:
                print(f'Retrying in {delay}s...')
                time.sleep(delay)

    return {
        'success': False,
        'error': {'code': 'MAX_RETRIES', 'message': f'Failed after {max_retries} attempts'},
    }


# Usage
result = scrape_with_retry(
    'https://example.com/product/123',
    mode='auto',
    wait_for_selector='.price',
)
if result['success']:
    print(f'Scraped: {len(result["data"]["html"])} chars')
else:
    print(f'Failed: {result["error"]["message"]}')
```

C#

```csharp
using System.Net.Http.Json;
using System.Text.Json;

async Task<(bool Success, JsonElement? Data, string? Error)> ScrapeWithRetry(
    HttpClient client,
    string url,
    string mode = "auto",
    string? waitForSelector = null,
    int maxRetries = 3)
{
    for (int attempt = 0; attempt < maxRetries; attempt++)
    {
        try
        {
            var request = new Dictionary<string, string>
            {
                ["url"] = url,
                ["mode"] = mode,
            };
            if (waitForSelector != null)
                request["waitForSelector"] = waitForSelector;

            var response = await client.PostAsJsonAsync(
                "https://api.fastwebscraper.com/v1/scrape/sync", request);

            // Don't retry client errors (except 429)
            if ((int)response.StatusCode >= 400 &&
                (int)response.StatusCode < 500 &&
                (int)response.StatusCode != 429)
            {
                var errorResult = await response.Content
                    .ReadFromJsonAsync<JsonElement>();
                return (false, null, errorResult.ToString());
            }

            response.EnsureSuccessStatusCode();
            var result = await response.Content
                .ReadFromJsonAsync<JsonElement>();

            var data = result.GetProperty("data");
            if (data.TryGetProperty("html", out var html) &&
                html.GetString()?.Length > 0)
            {
                return (true, data, null);
            }
            throw new Exception("Empty response");
        }
        catch (Exception ex)
        {
            Console.WriteLine(
                $"Attempt {attempt + 1}/{maxRetries} failed: {ex.Message}");
            if (attempt < maxRetries - 1)
            {
                var delay = (int)Math.Pow(2, attempt) * 1000;
                await Task.Delay(delay);
            }
        }
    }
    return (false, null, $"Failed after {maxRetries} attempts");
}

// Usage
using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(60) };
client.DefaultRequestHeaders.Add("X-API-Key", "YOUR_API_KEY");

var (success, data, error) = await ScrapeWithRetry(
    client, "https://example.com/product/123", waitForSelector: ".price");

if (success)
{
    var html = data!.Value.GetProperty("html").GetString()!;
    Console.WriteLine($"Scraped: {html.Length} chars");
}
else
{
    Console.WriteLine($"Failed: {error}");
}
```

Handling Specific Failure Modes

Rate Limiting (429)

When you receive a 429 response, the API is asking you to slow down. The correct response is to:

  1. Wait longer before retrying (respect Retry-After header if present)
  2. Reduce your request concurrency
  3. Spread requests more evenly over time
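Step 1 can be implemented by parsing the Retry-After header, which per HTTP semantics (RFC 9110) is either a number of seconds or an HTTP-date. A sketch that falls back to your own backoff delay when the header is absent or unparseable:

```typescript
// Parse a Retry-After header value into a wait in milliseconds.
// Accepts delta-seconds ("120") or an HTTP-date; falls back otherwise.
function retryAfterMs(
  header: string | null,
  fallbackMs: number,
  now: Date = new Date()
): number {
  if (!header) return fallbackMs;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now.getTime());
}
```

In the retry loop, `retryAfterMs(response.headers.get('retry-after'), delay)` would replace the plain exponential delay for 429 responses.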

Blocked Requests

If a scrape returns a BLOCKED error, the target site detected and rejected the request. Try:

  1. Upgrade to a stealth mode (http_stealth or browser_stealth) if using http or browser
  2. Add country targeting to match the site's expected visitors
  3. Add a waitForSelector to handle CAPTCHA challenge pages
  4. Reduce request frequency to the same domain
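Step 1 can be automated with a small escalation ladder using the mode names from this article. The `http_stealth` → `browser_stealth` step below is our assumption about which mode is strongest, not documented API behavior:

```typescript
// Next mode to try after a BLOCKED error; null means nothing stronger to try.
function escalateMode(mode: string): string | null {
  const next: Record<string, string> = {
    http: "http_stealth",
    browser: "browser_stealth",
    http_stealth: "browser_stealth", // assumption: browser stealth as last resort
  };
  return next[mode] ?? null;
}
```

On a BLOCKED result, retry with `escalateMode(currentMode)` and give up (or alert) once it returns null.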

Timeout Errors

If pages consistently time out:

  1. Check if the site is actually accessible (try in a browser)
  2. Simplify or remove waitForSelector — the element may not exist
  3. Consider using the async API for long-running scrapes
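For step 3, an async-API workflow submits a job and polls for its result instead of holding one long request open. The job status values and response shape below are assumptions (the async endpoints are not documented in this article), so the status fetcher is injected rather than hard-coded:

```typescript
// Assumed job result shape; adjust to the actual async API response.
interface JobStatus {
  status: "running" | "completed" | "failed";
  html?: string;
  error?: string;
}

// Poll an injected status fetcher (e.g. a wrapper around GET <job-status endpoint>)
// until the job leaves the "running" state or maxPolls is exhausted.
async function pollJob(
  fetchStatus: () => Promise<JobStatus>,
  { intervalMs = 2000, maxPolls = 30 } = {}
): Promise<JobStatus> {
  for (let i = 0; i < maxPolls; i++) {
    const job = await fetchStatus();
    if (job.status !== "running") return job;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  return { status: "failed", error: "POLL_TIMEOUT" };
}
```

Injecting `fetchStatus` also makes the polling logic easy to unit-test with a stub.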

Batch Error Handling

For batch scraping, track success and failure rates per domain:

```typescript
interface BatchResult {
  url: string;
  success: boolean;
  html?: string;
  error?: string;
}

async function scrapeBatch(urls: string[]): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  const concurrency = 5;

  for (let i = 0; i < urls.length; i += concurrency) {
    const batch = urls.slice(i, i + concurrency);
    const batchResults = await Promise.allSettled(
      batch.map(async (url) => {
        const result = await scrapeWithRetry(url);
        return { url, ...result };
      })
    );

    // allSettled preserves input order, so index j maps back to batch[j]
    for (let j = 0; j < batchResults.length; j++) {
      const settled = batchResults[j];
      if (settled.status === 'fulfilled') {
        const { url, success, data, error } = settled.value;
        results.push({ url, success, html: data?.html, error: error?.message });
      } else {
        results.push({
          url: batch[j],
          success: false,
          error: settled.reason.message,
        });
      }
    }
  }

  // Log summary
  const succeeded = results.filter(r => r.success).length;
  const failed = results.length - succeeded;
  console.log(`Batch complete: ${succeeded} succeeded, ${failed} failed`);

  return results;
}
```

Key Takeaways

  1. Always implement retries — transient failures are normal in scraping
  2. Use exponential backoff — don't hammer a failing service
  3. Don't retry 4xx errors (except 429) — they indicate a problem with your request
  4. Track failure rates — sudden drops in success rate indicate a problem to investigate
  5. Log failures — you need the context to debug issues later

For more on the API endpoints and error codes, see the API Reference.