Web Scraper API

GET /scrape

Scrapes the given URL, either with a raw HTTP request or in a real browser session.

url (string, required)

URL to scrape.

proxyId (string)

Proxy server ID, found in Dashboard / Proxies. Specify either proxyId or proxyUrl, not both.

proxyUrl (string)

Proxy server URL. HTTP and SOCKS5 are supported. Use the format proto://user:password@host:port. If proxyUrl is omitted, the proxy from the profile is used.
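The proxyId / proxyUrl rules above can be checked on the client before a request is sent. This is a minimal sketch, not part of the API: the function name proxy_params is hypothetical, and rejecting every scheme other than http and socks5 is an assumption drawn from the "HTTP and SOCKS5" wording.

```python
from urllib.parse import urlsplit

# Schemes taken from the description above; treating anything else
# (including https) as unsupported is an assumption of this sketch.
SUPPORTED_PROXY_SCHEMES = {"http", "socks5"}


def proxy_params(proxy_id=None, proxy_url=None):
    """Return the proxy-related query parameters, or raise ValueError."""
    if proxy_id is not None and proxy_url is not None:
        raise ValueError("specify either proxyId or proxyUrl, not both")
    if proxy_id is not None:
        return {"proxyId": proxy_id}
    if proxy_url is None:
        # Nothing to send; per the docs the profile's proxy is used.
        return {}
    parts = urlsplit(proxy_url)
    if parts.scheme not in SUPPORTED_PROXY_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError("expected proto://user:password@host:port")
    return {"proxyUrl": proxy_url}


print(proxy_params(proxy_url="socks5://user:password@10.0.0.1:1080"))
```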

profileId (number)

Profile ID to use for the browser session. If this parameter is omitted, a random available profile is picked automatically.

timeoutMs (number, default: 60000)

Maximum time to wait for the response, in milliseconds.

debugMode (boolean, default: false)

Use debug mode when reporting issues to the support team.

fields (json)

Array of field names to include in the scrape results. Leave empty to return all fields.

// possible values
[
  "requestHeaders",
  "responseHeaders",
  "responseBody",
  "responseBodyLengthBytes",
  "responseDurationMs",
  "responseStatus",
  "responseStatusText",
  "pageContent",
  "xhrs",
  "cookies",
  "cookiesString"
]
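As an illustration of the fields parameter, the sketch below builds a query string for GET /scrape that asks for two fields only. It assumes that json-typed parameters are sent as JSON-encoded strings in the query; the reference does not state the encoding, and the helper name build_scrape_query is hypothetical.

```python
import json
from urllib.parse import urlencode

# Field names copied from the list of possible values above.
ALLOWED_FIELDS = {
    "requestHeaders", "responseHeaders", "responseBody",
    "responseBodyLengthBytes", "responseDurationMs", "responseStatus",
    "responseStatusText", "pageContent", "xhrs", "cookies", "cookiesString",
}


def build_scrape_query(url, fields=None):
    """Return a query string for GET /scrape.

    Assumption: json parameters travel as JSON-encoded strings.
    """
    params = {"url": url}
    if fields:
        unknown = sorted(set(fields) - ALLOWED_FIELDS)
        if unknown:
            raise ValueError(f"unknown field names: {unknown}")
        params["fields"] = json.dumps(fields)
    return urlencode(params)


print(build_scrape_query("https://example.com", ["responseStatus", "responseBody"]))
```

Leaving fields out of the call keeps the parameter out of the query, which per the description above returns all fields.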

useBrowser (boolean, default: false)

If true, opens a real browser for this scrape. If false, sends a raw request that looks like a browser request.

cleanProfileData (boolean; only when useBrowser = true)

Clear profile data (cookies and other stored data) before navigating to the target URL.

xhrsSettings (json; only when useBrowser = true)

Settings to filter XHRs in the scrape results. Leave empty to disable XHR capturing.

{
  "url": {
    // these keys could be used all together or one by one
    "startsWith": ["targetwebsite.com"],
    "includes": ["api/events"],
    "endsWith": [".json"]
  }
}
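A small sketch of how such a filter might be assembled and attached to a browser scrape. The helper url_filter is hypothetical; sending booleans as "true" / "false" and json parameters as JSON-encoded strings are assumptions, since the reference does not specify the encoding.

```python
import json

# The three matching keys shown in the example above.
URL_FILTER_KEYS = ("startsWith", "includes", "endsWith")


def url_filter(**rules):
    """Build a URL filter, keeping only the keys that were given."""
    unknown = sorted(set(rules) - set(URL_FILTER_KEYS))
    if unknown:
        raise ValueError(f"unknown filter keys: {unknown}")
    return {key: list(rules[key]) for key in URL_FILTER_KEYS if rules.get(key)}


xhrs_settings = {"url": url_filter(includes=["api/events"], endsWith=[".json"])}

# Query parameters for GET /scrape; the encoding is an assumption.
params = {
    "url": "https://targetwebsite.com",
    "useBrowser": "true",
    "fields": json.dumps(["xhrs"]),
    "xhrsSettings": json.dumps(xhrs_settings),
}
print(params["xhrsSettings"])
```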

resourcesSettings (json; only when useBrowser = true)

Page resources settings. Leave empty to allow everything.

{
  // all possible resource types
  "blockTypes": [
    "document",
    "stylesheet",
    "image",
    "media",
    "font",
    "script",
    "texttrack",
    "xhr",
    "fetch",
    "eventsource",
    "websocket",
    "manifest",
    "other"
  ],
  "blockUrls": {
    // same rules as in xhrs filter 
    "startsWith": [],
    "includes": [],
    "endsWith": []
  }
}
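For example, a scrape that only needs the HTML might block heavy resource types. The sketch below validates the types against the list above before serializing. The helper name resources_settings is hypothetical, and it assumes that blockTypes and blockUrls may each be omitted independently, which the reference does not confirm.

```python
import json

# Resource types copied from the list above.
RESOURCE_TYPES = {
    "document", "stylesheet", "image", "media", "font", "script",
    "texttrack", "xhr", "fetch", "eventsource", "websocket",
    "manifest", "other",
}


def resources_settings(block_types=(), block_urls=None):
    """Build a resourcesSettings object, rejecting unknown resource types."""
    unknown = sorted(set(block_types) - RESOURCE_TYPES)
    if unknown:
        raise ValueError(f"unknown resource types: {unknown}")
    settings = {}
    if block_types:
        settings["blockTypes"] = list(block_types)
    if block_urls:
        settings["blockUrls"] = dict(block_urls)
    return settings


settings = resources_settings(
    block_types=["image", "media", "font"],
    block_urls={"includes": ["analytics"]},
)
print(json.dumps(settings))
```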

waitForSettings (json; default: { "pageLifecycleEvent": "domcontentloaded" }; only when useBrowser = true)

Settings to define what to wait for after the page is loaded.

{
  // wait for specific xhr to be loaded
  "xhr": {
    "url": {
      // same rules as in xhrs filter 
      "startsWith": [],
      "includes": [],
      "endsWith": []
    }
  },
  // wait for a selector
  "elementSelector": "body",
  // sleep for X milliseconds
  "waitMs": 5000,
  /*
    wait for a lifecycle event, possible values:
    domcontentloaded (default): Wait until the DOM is loaded
    load: Wait until the page is fully loaded
    networkidle0: Wait until there are no more than 0 network connections for at least 500 ms
    networkidle2: Wait until there are no more than 2 network connections for at least 500 ms
  */
  "pageLifecycleEvent": "domcontentloaded"
}
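The wait conditions above can be assembled with a small helper that rejects lifecycle events outside the listed values. This is a sketch under assumptions: the function name wait_for_settings is hypothetical, and whether the API accepts any combination of these keys in one object is inferred from the sample, not stated.

```python
import json

# Lifecycle events listed in the comment above.
LIFECYCLE_EVENTS = {"domcontentloaded", "load", "networkidle0", "networkidle2"}


def wait_for_settings(lifecycle_event=None, element_selector=None,
                      wait_ms=None, xhr_url=None):
    """Build a waitForSettings object from the conditions that were given."""
    settings = {}
    if xhr_url:
        settings["xhr"] = {"url": dict(xhr_url)}
    if element_selector:
        settings["elementSelector"] = element_selector
    if wait_ms is not None:
        if wait_ms < 0:
            raise ValueError("waitMs must not be negative")
        settings["waitMs"] = int(wait_ms)
    if lifecycle_event is not None:
        if lifecycle_event not in LIFECYCLE_EVENTS:
            raise ValueError(f"unknown lifecycle event: {lifecycle_event!r}")
        settings["pageLifecycleEvent"] = lifecycle_event
    return settings


settings = wait_for_settings(
    lifecycle_event="networkidle2",
    element_selector="#results",
    xhr_url={"includes": ["api/events"]},
)
print(json.dumps(settings, indent=2))
```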

headers (json; only when useBrowser = false)

Headers to use in a raw request. Leave empty to use headers from a real browser on a Windows machine.

Important: a raw request does not use any cookies from the profile. Specify the cookie header manually if you need one.

method (string, default: GET; only when useBrowser = false)

HTTP method to use.

body (string; only when useBrowser = false)

Request body for POST and PUT methods.

followLocation (boolean, default: true; only when useBrowser = false)

Follow redirects.
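The raw-request parameters (headers, method, body, followLocation) could be combined as in the sketch below, which sends a POST with a manually supplied cookie header. The helper name raw_request_query is hypothetical; encoding booleans as "true" / "false" and headers as a JSON string are assumptions, and so is rejecting a body for methods other than POST and PUT.

```python
import json
from urllib.parse import urlencode


def raw_request_query(url, method="GET", headers=None, body=None,
                      follow_location=True):
    """Return a GET /scrape query string for a raw (non-browser) request."""
    method = method.upper()
    if body is not None and method not in {"POST", "PUT"}:
        raise ValueError("body is only described for POST and PUT")
    params = {"url": url, "useBrowser": "false", "method": method}
    if headers:
        # Profile cookies are not applied to raw requests, so any cookie
        # has to be passed here explicitly.
        params["headers"] = json.dumps(headers)
    if body is not None:
        params["body"] = body
    if not follow_location:
        params["followLocation"] = "false"
    return urlencode(params)


query = raw_request_query(
    "https://example.com/login",
    method="post",
    headers={"cookie": "session=abc"},
    body="a=1",
    follow_location=False,
)
print(query)
```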

mode (string, default: sync)

Scrape mode, possible values: sync, async.

scriptId (string)

Script ID, used for custom scripts made by our team for enterprise clients. If the scraper doesn't work for you, let us know and we will figure something out.

GET /getScrapeResult

Get result by scrape ID.

id (string, required)

Scrape ID.
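A sketch of how the two endpoints might fit together in async mode: start the scrape with mode=async, then fetch the result by ID. The base URL https://api.example.com is a placeholder, authentication is left out because this reference does not describe it, and the idea that the async call returns a scrape ID to pass to /getScrapeResult is an assumption.

```python
from urllib.parse import urlencode

API_BASE = "https://api.example.com"  # placeholder, not the real host


def scrape_url(target_url, mode="sync"):
    """Return the request URL for GET /scrape in the given mode."""
    if mode not in {"sync", "async"}:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{API_BASE}/scrape?{urlencode({'url': target_url, 'mode': mode})}"


def scrape_result_url(scrape_id):
    """Return the request URL for GET /getScrapeResult."""
    return f"{API_BASE}/getScrapeResult?{urlencode({'id': scrape_id})}"


print(scrape_url("https://example.com", mode="async"))
# "abc123" stands in for whatever ID the async call hands back.
print(scrape_result_url("abc123"))
```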
