HeadlinesBriefing.com

Cloudflare's Browser Rendering Launches Async Website Crawling API in Beta

Hacker News

Cloudflare's Browser Rendering service has introduced a beta /crawl endpoint that lets developers crawl an entire website with a single API call. Submit a starting URL, and the system automatically discovers pages, renders them in headless browsers, and returns the content as HTML, Markdown, or structured JSON. The endpoint is aimed at tasks such as collecting training data for AI models, building RAG pipelines, and monitoring site-wide content changes.

Crawling operates asynchronously: users submit a request (for example with curl), receive a job ID, and then poll for results as pages are processed. Both initiation and status checks are plain HTTP requests, so the API can be called from any HTTP client. Configuration options include adjustable crawl depth and URL path exclusions.
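The submit-then-poll flow described above can be sketched in Python. This is a minimal illustration rather than Cloudflare's client: `crawl_and_wait`, the `submit` and `get_status` callables, and the `"status"` field with its `"completed"` and `"failed"` values are all hypothetical names chosen for the example. In practice the two callables would wrap authenticated HTTP requests to the /crawl endpoint, whose exact response shape should be taken from the official documentation.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal states; the real API's status values may differ.
TERMINAL_STATES = {"completed", "failed"}


def crawl_and_wait(
    submit: Callable[[], str],
    get_status: Callable[[str], Dict[str, Any]],
    poll_interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a crawl job, then poll until it reaches a terminal state.

    submit      -- starts the job and returns its job ID
    get_status  -- takes a job ID and returns the latest status payload
    """
    job_id = submit()
    for _ in range(max_polls):
        result = get_status(job_id)
        if result.get("status") in TERMINAL_STATES:
            return result
        # Job still running: wait before asking again.
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing the transport in as callables keeps the polling logic independent of any particular HTTP library and makes it easy to exercise with stubs before pointing it at a real account.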

Key technical highlights: automatic URL discovery from sitemaps or page links, incremental crawling that skips unchanged pages via the modifiedSince and maxAge parameters, and a static mode (render: false) that fetches HTML without the overhead of launching a browser. The crawler honors robots.txt directives, including crawl-delay rules, which helps keep crawls within the limits a site has published.
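As a rough illustration of how these options might be combined, the sketch below assembles a JSON request body. Only the parameter names `modifiedSince`, `maxAge`, and `render` come from the announcement; the `url` key, the `build_crawl_request` helper, and the value formats used here are assumptions, so check the official API reference before relying on them.

```python
import json
from typing import Any, Dict, Optional


def build_crawl_request(
    start_url: str,
    render: bool = True,
    modified_since: Optional[Any] = None,
    max_age: Optional[Any] = None,
) -> Dict[str, Any]:
    """Build a request body for a crawl job (hypothetical shape)."""
    body: Dict[str, Any] = {"url": start_url}  # "url" key is an assumption
    if not render:
        # Static mode: fetch HTML without rendering in a browser.
        body["render"] = False
    if modified_since is not None:
        # Incremental crawl: value format is an assumption.
        body["modifiedSince"] = modified_since
    if max_age is not None:
        # Value units are an assumption.
        body["maxAge"] = max_age
    return body


# Example: a static, incremental crawl request serialized as JSON.
payload = json.dumps(
    build_crawl_request("https://example.com", render=False, max_age=3600)
)
```

Leaving optional keys out of the body, rather than sending nulls, lets the service apply its own defaults for anything the caller did not set.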

Available on Cloudflare's Workers Free and Paid plans, the API caters to SEO analysis, competitive intelligence, and large-scale data aggregation. Developers are advised to review robots.txt best practices when configuring crawls to avoid access issues.
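Before configuring a crawl, it can help to check what a site's robots.txt permits. The sketch below uses Python's standard `urllib.robotparser` module on an inline sample file; the rules, the `example-bot` user agent, and the `check_robots` helper are made up for illustration and do not reflect the user agent Cloudflare's crawler actually sends.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def check_robots(
    robots_txt: str, user_agent: str, url: str
) -> Tuple[bool, Optional[float]]:
    """Return (allowed, crawl_delay) for a URL under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


allowed, delay = check_robots(
    ROBOTS_TXT, "example-bot", "https://example.com/docs/intro"
)
blocked, _ = check_robots(
    ROBOTS_TXT, "example-bot", "https://example.com/private/report"
)
```

Here `allowed` is true, `blocked` is false, and `delay` is 5, which matches the sample file: everything outside /private/ may be fetched, with a five-second delay requested between requests.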