
Web Scraper API

Endpoint to interact with the Web Scraper API.

URL to scrape.

Proxy server ID; it can be found in Dashboard / Proxies. You can specify either proxyId or proxyUrl, but not both at the same time.

Proxy server URL. HTTP and SOCKS5 protocols are supported. Use the format proto://user:password@host:port. If proxyUrl is omitted, a proxy from the profile will be used.

Profile ID to use for the browser session. If this parameter is omitted, a random available profile will be picked automatically.

Maximum time to wait for the response, in milliseconds.

Enable debug mode when reporting issues to the support team.

Array of field names to include in the scrape results. Leave empty to return all fields.

// possible values
[
  "requestHeaders",
  "responseHeaders",
  "responseBody",
  "responseBodyLengthBytes",
  "responseDurationMs",
  "responseStatus",
  "responseStatusText",
  "pageContent",
  "xhrs",
  "cookies",
  "cookiesString"
]
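As a minimal sketch, a request body that limits the results to a few of the fields above might look like this. The endpoint details and the exact top-level parameter names (`url`, `timeoutMs`, `fields`) are assumptions based on this reference, not confirmed values:

```python
import json

# Hypothetical request body: scrape one page, return only the rendered
# content and the final HTTP status (field names from the list above).
payload = {
    "url": "https://targetwebsite.com/products",  # hypothetical target URL
    "timeoutMs": 30000,
    "fields": ["pageContent", "responseStatus", "responseStatusText"],
}

body = json.dumps(payload)
print(body)
```

Leaving `fields` out (or empty) returns every field, so trimming it mainly reduces response size.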

Open and use a real browser for this scrape, or just send a raw request that looks like a browser request.

Clear profile data (cookies and other stored data) before navigating to the target URL.

Settings to filter XHRs in the scrape results. Leave empty to disable XHR capturing.

{
  "url": {
    // these keys can be used together or individually
    "startsWith": ["targetwebsite.com"],
    "includes": ["api/events"],
    "endsWith": [".json"]
  }
}
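A sketch of how the filter keys above most plausibly combine: a URL matches when it satisfies every key that is present, and a key is satisfied when any of its values matches. This illustrates the documented rules; it is not the service's actual implementation, and the dashboard example suggests the scheme prefix may be ignored in real matching:

```python
# Illustrative matcher for the {startsWith, includes, endsWith} filter shape.
def url_matches(url: str, rules: dict) -> bool:
    checks = {
        "startsWith": lambda u, vals: any(u.startswith(v) for v in vals),
        "includes":   lambda u, vals: any(v in u for v in vals),
        "endsWith":   lambda u, vals: any(u.endswith(v) for v in vals),
    }
    # All present keys must pass; an empty rules dict matches everything.
    return all(check(url, rules[k]) for k, check in checks.items() if k in rules)

rules = {"includes": ["api/events"], "endsWith": [".json"]}
print(url_matches("https://x.com/api/events/list.json", rules))  # True
print(url_matches("https://x.com/other.json", rules))            # False
```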

Page resource settings. Leave empty to allow all resources.

{
  // all possible resource types
  "blockTypes": [
    "document",
    "stylesheet",
    "image",
    "media",
    "font",
    "script",
    "texttrack",
    "xhr",
    "fetch",
    "eventsource",
    "websocket",
    "manifest",
    "other"
  ],
  "blockUrls": {
    // same rules as in xhrs filter 
    "startsWith": [],
    "includes": [],
    "endsWith": []
  }
}
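A common use of this setting is blocking heavy assets to speed up a scrape. The sketch below uses resource types from the list above; the URL fragments in `blockUrls` are hypothetical examples:

```python
# Illustrative page-resource setting: block heavy asset types and any
# request whose URL contains the given (hypothetical) fragments.
block_heavy_assets = {
    "blockTypes": ["image", "media", "font", "stylesheet"],
    "blockUrls": {
        "includes": ["analytics", "tracking"],  # hypothetical fragments
    },
}
print(block_heavy_assets)
```

Note that blocking `script` or `xhr` can break pages that render content client-side, so block those only when you know the target works without them.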

Settings to define what to wait for after the page is loaded.

{
  // wait for specific xhr to be loaded
  "xhr": {
    "url": {
      // same rules as in xhrs filter 
      "startsWith": [],
      "includes": [],
      "endsWith": []
    }
  },
  // wait for a selector
  "elementSelector": "body",
  // sleep for X milliseconds
  "waitMs": 5000,
  /*
    wait for a lifecycle event, possible values:
    domcontentloaded (default): Wait until the DOM is loaded
    load: Wait until the page is fully loaded
    networkidle0: Wait until there are no more than 0 network connections for at least 500 ms
    networkidle2: Wait until there are no more than 2 network connections for at least 500 ms
  */
  "pageLifecycleEvent": "domcontentload"
}
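The conditions above can be combined; a sketch of a wait setting for a page that loads its data from a JSON XHR might look like this (values are illustrative, and the exact interaction between the conditions is an assumption):

```python
# Illustrative wait settings: wait for a .json XHR and the body selector,
# with a 5-second sleep and the networkidle2 lifecycle event as a backstop.
wait_for = {
    "xhr": {"url": {"endsWith": [".json"]}},
    "elementSelector": "body",
    "waitMs": 5000,
    "pageLifecycleEvent": "networkidle2",
}
print(wait_for)
```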

Headers to use in a raw request. Leave empty to use the headers of a real browser on a Windows machine.

HTTP method to use.

Request body for POST and PUT methods.

Script ID, used for custom scripts made by our team for enterprise clients. If the scraper doesn't work for you, let us know and we will figure something out.
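Putting the parameters together, a full request might be assembled as below. The endpoint URL, the bearer-token auth header, and the nested key names (`waitFor`, `profileId`) are assumptions for illustration; consult your dashboard for the real endpoint and credentials:

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/scrape"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                       # hypothetical auth scheme

# Request body built from the parameters in this reference.
payload = {
    "url": "https://targetwebsite.com",
    "profileId": "profile-123",              # optional; omit for a random profile
    "timeoutMs": 60000,
    "fields": ["pageContent", "cookies"],
    "waitFor": {"elementSelector": "body"},  # key name assumed from this reference
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment to actually send
```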