Web Scraper API
GET /scraper
Endpoint for interacting with the Web Scraper API.
Proxy server ID, which can be found in Dashboard / Proxies. You can specify either proxyId or proxyUrl, but not both.
Proxy server URL. HTTP and SOCKS5 protocols are supported. Use the format proto://user:password@host:port. If proxyUrl is omitted, the proxy from the profile is used.
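The two proxy rules above (proxyId and proxyUrl are mutually exclusive, and proxyUrl follows proto://user:password@host:port) can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the API: the helper name `check_proxy_params` is hypothetical, the scheme names `http` and `socks5` are assumed from the protocols listed above, and credentials are treated as required because the documented format includes them.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed scheme names for the supported HTTP and SOCKS5 protocols.
ALLOWED_SCHEMES = {"http", "socks5"}


def check_proxy_params(proxy_id: Optional[str] = None,
                       proxy_url: Optional[str] = None) -> dict:
    """Return the proxy parameters to send, or raise ValueError if invalid."""
    if proxy_id and proxy_url:
        raise ValueError("specify either proxyId or proxyUrl, not both")
    if proxy_id:
        return {"proxyId": proxy_id}
    if proxy_url:
        parts = urlsplit(proxy_url)
        if parts.scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported proxy protocol: {parts.scheme!r}")
        # Accessing parts.port raises ValueError itself for a malformed port.
        if not (parts.username and parts.password and parts.hostname and parts.port):
            raise ValueError("expected proto://user:password@host:port")
        return {"proxyUrl": proxy_url}
    # Neither was given: the proxy from the profile is used.
    return {}
```

If the service also accepts proxies without credentials, the username and password check would need to be relaxed.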
Profile ID to use for the browser session. If this parameter is omitted, a random available profile is picked automatically.
Array of field names to include in the scrape results. Leave empty to return all fields.
// possible values
[
  "requestHeaders",
  "responseHeaders",
  "responseBody",
  "responseBodyLengthBytes",
  "responseDurationMs",
  "responseStatus",
  "responseStatusText",
  "pageContent",
  "xhrs",
  "cookies",
  "cookiesString"
]
Whether to open a real browser for this scrape or send a raw request that mimics a browser request.
Clear profile data (cookies and other stored data) before navigating to the target URL.
Settings to filter XHRs in the scrape results. Leave empty to disable XHR capturing.
{
  "url": {
    // these keys could be used all together or one by one
    "startsWith": ["targetwebsite.com"],
    "includes": ["api/events"],
    "endsWith": [".json"]
  }
}
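One plausible reading of the filter above, sketched in Python: a URL is captured only if it satisfies every key that is present, and a key is satisfied when any pattern in its list matches. This interpretation, the helper name `url_matches`, and the sample URLs are assumptions for illustration only. The service's actual matching rules (for example, whether the scheme is stripped before `startsWith` is applied) are not specified here.

```python
from typing import Dict, List


def url_matches(url: str, rules: Dict[str, List[str]]) -> bool:
    """Assumed semantics: AND across keys, OR within each key's pattern list."""
    checks = {
        "startsWith": url.startswith,
        "includes": lambda pattern: pattern in url,
        "endsWith": url.endswith,
    }
    for key, patterns in rules.items():
        if key not in checks:
            raise ValueError(f"unknown filter key: {key!r}")
        # An empty list is treated as "no constraint" for that key.
        if patterns and not any(checks[key](p) for p in patterns):
            return False
    return True


# Hypothetical rules and URLs, used only to exercise the helper.
rules = {"includes": ["api/events"], "endsWith": [".json"]}
print(url_matches("https://targetwebsite.com/api/events/42.json", rules))
print(url_matches("https://targetwebsite.com/static/app.js", rules))
```

Under these assumed semantics, the first URL passes both keys and the second fails the `includes` check.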
Page resources settings. Leave empty to allow everything.
{
  // all possible resource types
  "blockTypes": [
    "document",
    "stylesheet",
    "image",
    "media",
    "font",
    "script",
    "texttrack",
    "xhr",
    "fetch",
    "eventsource",
    "websocket",
    "manifest",
    "other"
  ],
  "blockUrls": {
    // same rules as in xhrs filter
    "startsWith": [],
    "includes": [],
    "endsWith": []
  }
}
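The comments in the example above are explanatory, and standard JSON does not allow comments, so a value sent to the endpoint would presumably need to be comment-free. The Python sketch below builds such a value and rejects resource types outside the list shown above. The helper name `build_resources_setting` is hypothetical, and how the resulting string is passed to the endpoint is not covered here.

```python
import json
from typing import Dict, Iterable, List, Optional

# Resource types listed in the example above.
RESOURCE_TYPES = {
    "document", "stylesheet", "image", "media", "font", "script", "texttrack",
    "xhr", "fetch", "eventsource", "websocket", "manifest", "other",
}


def build_resources_setting(block_types: Iterable[str] = (),
                            block_urls: Optional[Dict[str, List[str]]] = None) -> str:
    """Serialize a page-resources setting as comment-free JSON."""
    types = list(block_types)
    unknown = sorted(set(types) - RESOURCE_TYPES)
    if unknown:
        raise ValueError(f"unknown resource types: {unknown}")
    setting: dict = {}
    if types:
        setting["blockTypes"] = types
    if block_urls:
        setting["blockUrls"] = block_urls
    return json.dumps(setting)


# Hypothetical usage: block images and fonts, plus any URL ending in .mp4.
print(build_resources_setting(["image", "font"], {"endsWith": [".mp4"]}))
```

Validating against the known types on the client catches typos such as "images" before a request is made.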
Settings to define what to wait for after the page is loaded.
{
  // wait for specific xhr to be loaded
  "xhr": {
    "url": {
      // same rules as in xhrs filter
      "startsWith": [],
      "includes": [],
      "endsWith": []
    }
  },
  // wait for a selector
  "elementSelector": "body",
  // sleep for X milliseconds
  "waitMs": 5000,
  /*
  wait for a lifecycle event, possible values:
  domcontentloaded (default): Wait until the DOM is loaded
  load: Wait until the page is fully loaded
  networkidle0: Wait until there are no more than 0 network connections for at least 500 ms
  networkidle2: Wait until there are no more than 2 network connections for at least 500 ms
  */
  "pageLifecycleEvent": "domcontentloaded"
}
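A small Python sketch of assembling the wait settings above, assuming the four lifecycle values listed in the comment are the only valid ones. The helper name `build_wait_setting` is hypothetical, and only keys that appear in the example are used; the `xhr` key is left out for brevity.

```python
from typing import Optional

# Lifecycle events listed in the example's comment.
LIFECYCLE_EVENTS = {"domcontentloaded", "load", "networkidle0", "networkidle2"}


def build_wait_setting(element_selector: Optional[str] = None,
                       wait_ms: Optional[int] = None,
                       page_lifecycle_event: Optional[str] = None) -> dict:
    """Build a wait setting, including only the options that were provided."""
    setting: dict = {}
    if element_selector is not None:
        setting["elementSelector"] = element_selector
    if wait_ms is not None:
        if wait_ms < 0:
            raise ValueError("waitMs must be non-negative")
        setting["waitMs"] = wait_ms
    if page_lifecycle_event is not None:
        if page_lifecycle_event not in LIFECYCLE_EVENTS:
            raise ValueError(f"unknown lifecycle event: {page_lifecycle_event!r}")
        setting["pageLifecycleEvent"] = page_lifecycle_event
    return setting


# Hypothetical usage: wait for <body>, then for the network to go idle.
print(build_wait_setting(element_selector="body", page_lifecycle_event="networkidle0"))
```

Checking the event name against the listed values guards against near-miss spellings that the service might silently ignore or reject.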
Headers to use in a raw request. Leave empty to use headers from a real browser on a Windows machine.