How to Scrape JavaScript-Heavy Websites

FastWebScraper Team · 5 min read

If you've ever scraped a website and gotten back empty <div> tags where data should be, you've hit the JavaScript rendering problem. Most modern websites — built with React, Vue, Angular, or Next.js — load their content dynamically after the initial page load.

A simple HTTP GET request only returns the server-rendered HTML shell. The actual data is fetched and rendered by JavaScript running in the browser. To scrape this data, you need a different approach.

The Problem: Static HTML vs. Dynamic Content

When you make a request with fetch, requests, or curl, you get what the server sends in the initial HTTP response. For a JavaScript-heavy site, that often looks like this:

```html
<html>
  <body>
    <div id="root"></div>
    <script src="/bundle.js"></script>
  </body>
</html>
```

The actual product listings, prices, and text content are loaded by bundle.js after the page loads in a browser. Your HTTP request never executes that JavaScript, so the data simply isn't there.

Solution 1: Headless Browser Rendering

The most reliable approach is to render the page in a real browser environment. FastWebScraper runs headless Chromium instances behind the scenes, so you get the fully rendered DOM — including all dynamically loaded content.

```javascript
const response = await fetch('https://api.fastwebscraper.com/v1/scrape/sync', {
  method: 'POST',
  headers: {
    'X-API-Key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://react-spa.example.com/products',
    mode: 'auto',
    waitForSelector: '.product-card', // Wait until product cards render
  }),
});

const { data } = await response.json();
// data.html contains the fully rendered page with all products
```

Using waitForSelector

The waitForSelector parameter is critical for JavaScript-heavy pages. It tells the scraper to wait until a specific CSS selector appears in the DOM before capturing the HTML.

Tips for choosing selectors:

  • Use selectors that only appear after data loads (e.g., .product-card, [data-loaded="true"])
  • Avoid generic selectors like div or body — they exist in the initial HTML
  • Combine selectors with commas for pages with variable structure: '.product-card, .no-results'
  • Use data attributes when class names are auto-generated (common in React/CSS-in-JS)
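Putting those tips together, here is a sketch of a request body for a page with auto-generated class names. The URL and `data-testid` values are hypothetical; the point is targeting stable data attributes and including a fallback selector so the wait also resolves when the page shows an empty-results state instead of cards.

```javascript
// Hypothetical page: class names are hashed by CSS-in-JS, so we target
// data attributes instead, with a comma-separated fallback selector.
const body = JSON.stringify({
  url: 'https://react-spa.example.com/search?q=widgets',
  mode: 'auto',
  // Resolves when either results or the "no results" notice renders
  waitForSelector: '[data-testid="result-card"], [data-testid="empty-state"]',
});
```

Without the fallback, a query with no results would make the scraper wait until it times out.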

Solution 2: Intercept API Calls

Many JavaScript-heavy sites fetch data from internal APIs. Instead of scraping the rendered HTML, you can sometimes call those APIs directly.

How to find internal APIs:

  1. Open Chrome DevTools → Network tab
  2. Load the page and filter by XHR/Fetch
  3. Look for JSON responses containing the data you need
  4. Note the URL, headers, and parameters

If the API is accessible without authentication, you can call it directly — no rendering needed. This approach is faster and more reliable than parsing HTML — but it doesn't always work. Many sites protect their APIs with authentication tokens, CORS restrictions, or anti-bot measures. In those cases, headless browser rendering is the fallback.
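As a sketch, here is what calling an internal API directly can look like. The host, path, and parameter names below are assumptions for illustration; substitute whatever you observed in the Network tab.

```javascript
// Builds the request URL for a hypothetical internal products API
// discovered via DevTools. Replace host, path, and parameters with
// the ones you actually observed.
function buildProductsUrl(category, page) {
  const url = new URL('https://shop.example.com/api/products');
  url.searchParams.set('category', category);
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Calling it returns structured JSON — no rendering or HTML parsing:
//   const res = await fetch(buildProductsUrl('electronics', 1));
//   const products = await res.json();
```

Mirroring the browser's request headers (especially `Accept` and any custom headers) often helps when the API is picky about callers.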

Solution 3: Pre-rendering Detection

Some sites use server-side rendering (SSR) or static site generation (SSG), which means the initial HTML already contains the data. But they might still load additional content dynamically.

Identifying SSR vs. CSR:

  • View the page source (right-click → View Page Source), not the DevTools Elements panel (DevTools shows the live DOM after JavaScript has run). If the actual data appears in the source, the page is server-rendered.
  • If the source is mostly empty <div> containers, it's client-side rendered and needs a headless browser.

SSR sites are simpler to scrape — a basic HTTP request gets you the data without JavaScript rendering. FastWebScraper automatically detects this and optimizes accordingly.
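The checks above can be approximated in code. This is a rough heuristic, not a definitive test, and the 50-character threshold is an arbitrary assumption you would tune per site:

```javascript
// Rough heuristic: given raw page source from a plain HTTP request,
// guess whether the page is client-rendered. A body that is little
// more than an empty mount node plus script tags is almost certainly CSR.
function looksClientRendered(html) {
  const body = html.match(/<body[^>]*>([\s\S]*)<\/body>/i)?.[1] ?? html;
  // Strip script tags, then measure how much visible text remains
  const withoutScripts = body.replace(/<script[\s\S]*?<\/script>/gi, '');
  const visibleText = withoutScripts.replace(/<[^>]+>/g, '').trim();
  return visibleText.length < 50; // arbitrary threshold — tune per site
}
```

Run it against the raw response body: `true` suggests you need headless rendering, `false` suggests a plain request may be enough.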

Handling Common JavaScript Frameworks

React / Next.js

React apps are single-page applications (SPAs) that render entirely in the browser. Next.js adds server-side rendering, but client-side navigation and some data fetching still happen in the browser after the initial load.

Strategy: Use waitForSelector targeting the rendered content container. Avoid relying on React-generated class names (they change between builds).

Vue / Nuxt

Similar to React, but Vue apps often use more predictable class naming. Nuxt provides SSR by default, so check if the initial HTML contains your data before adding wait selectors.

Angular

Angular apps are typically fully client-rendered. They tend to be slower to render than React or Vue, so you may need longer timeouts.

Infinite Scroll

Sites that load content as you scroll (like social media feeds or product listings) require special handling:

  1. The initial render only shows the first batch of items
  2. More items load as the user scrolls down
  3. You typically get 10-20 items per "page"

Approach: Look for pagination parameters in the underlying API calls rather than trying to simulate scrolling. This is more reliable and faster.
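A sketch of that approach, assuming you've found a paginated JSON endpoint in the Network tab. The endpoint shape and parameter names (`page`, `limit`) are assumptions; adapt them to the API you actually observed.

```javascript
// Sketch: page through a hypothetical feed API instead of simulating
// scrolling. Stops on an empty batch or a short (final) batch.
async function fetchAllItems(baseUrl, pageSize = 20, maxPages = 50) {
  const items = [];
  for (let page = 1; page <= maxPages; page++) {
    const res = await fetch(`${baseUrl}?page=${page}&limit=${pageSize}`);
    const batch = await res.json();
    if (!Array.isArray(batch) || batch.length === 0) break; // no more pages
    items.push(...batch);
    if (batch.length < pageSize) break; // short batch = last page
  }
  return items;
}
```

The `maxPages` cap is a safety valve so a misbehaving API can't loop forever.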

Practical Example: Scraping a React E-Commerce Site

Here's a complete example of scraping a JavaScript-heavy product page:

```javascript
import * as cheerio from 'cheerio';

// Step 1: Scrape the rendered page
const response = await fetch('https://api.fastwebscraper.com/v1/scrape/sync', {
  method: 'POST',
  headers: {
    'X-API-Key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://react-store.example.com/category/electronics',
    mode: 'auto',
    waitForSelector: '[data-product-id]',
  }),
});

const result = await response.json();
const html = result.data.html;

// Step 2: Parse the rendered HTML
const $ = cheerio.load(html);
$('[data-product-id]').each((_, el) => {
  const name = $(el).find('.product-name').text().trim();
  const price = $(el).find('.product-price').text().trim();
  console.log(`${name}: ${price}`);
});
```

Key Takeaways

  1. Check the source first — if data is in the initial HTML, you don't need JavaScript rendering
  2. Use waitForSelector to ensure dynamic content is loaded before capturing
  3. Look for underlying APIs — they're often simpler to scrape than rendered HTML
  4. Use data attributes over class names for more stable selectors
  5. FastWebScraper handles rendering — you don't need to manage headless browsers yourself

For more on configuring scrape jobs, see the API Reference. For specific industry examples, check out our use cases.