
How to Access Main Context Objects from Isolated Context in Puppeteer & Playwright

published 12 days ago
by Jennifer Taylor

Our rebrowser-patches library has received really good feedback from the web automation community since its release. It significantly increased success rates and eliminated captcha challenges served by many of the major anti-automation players on the market.

However, our approach has one flaw: in always-isolated mode (REBROWSER_PATCHES_RUNTIME_FIX_MODE=alwaysIsolated), all your code is executed in a separate isolated JS context that has no access to the main context of the page.

For the detailed reasons behind this, see our previous post, How to fix Runtime.Enable CDP detection of Puppeteer, Playwright and other automation libraries?.

For many people, that's a dealbreaker because they need to work with JS objects that are defined in the main context.

Here is a real use case from one of our customers. They've been running this code to catch the moment when the recaptcha script was loaded:

await page.waitForFunction(`typeof window.grecaptcha.execute === 'function'`)

After the patch is applied, this code runs in a new isolated context, which will never contain the window.grecaptcha object.

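To get around this limitation, we can borrow an idea from the Chrome docs for extension developers, more specifically "Communication with the embedding page".

This approach uses window messages to communicate between contexts. The isolated and main contexts have separate JS globals but share the same DOM, so a message posted with window.postMessage in one context is delivered to 'message' listeners in the other. page.evaluateOnNewDocument is not patched and still executes its script in the main context, so we can inject some good stuff in there.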

Let's try to create a proof of concept.

// add event listener for window messages (executed in the main context)
await page.evaluateOnNewDocument(() => {
  window.addEventListener('message', (event) => {
    console.log('[main] msg', event)
    // event.data can be null for unrelated messages, so use optional chaining
    if (!event.data?.scriptId || event.data.fromMain) {
      // ignore messages without scriptId and from ourselves (from main context)
      return
    }

    const response = {
      scriptId: event.data.scriptId,
      fromMain: true,
    }
    try {
      response.result = eval(event.data.scriptText)
    } catch (err) {
      response.error = err.message
    }

    window.postMessage(JSON.parse(JSON.stringify(response)))
  })
})

await page.goto('https://bot-detector.rebrowser.net', { waitUntil: 'load' })

// add a helper that we can reuse (executed in an isolated context)
await page.evaluate(() => {
  // listen for messages from main and emit custom event with a response for a specific scriptId
  window.addEventListener('message', (event) => {
    // event.data can be null for unrelated messages, so use optional chaining
    if (!(event.data?.scriptId && event.data.fromMain)) {
      // ignore irrelevant messages
      return
    }
    console.log('[isolated] msg', event)
    window.dispatchEvent(new CustomEvent(`scriptId-${event.data.scriptId}`, { detail: event.data }))
  })

  // a helper that can be reused in other page.evaluate calls
  window.evaluateMain = (scriptFn) => {
    // generate unique scriptId for each call
    window.evaluateMainScriptId = (window.evaluateMainScriptId || 0) + 1
    const scriptId = window.evaluateMainScriptId
    return new Promise(resolve => {
      // listen for the response
      window.addEventListener(`scriptId-${scriptId}`, (event) => {
        resolve(event.detail)
      }, {
        once: true,
      })

      // prepare and send a message for the main context
      let scriptText = scriptFn
      if (typeof scriptText !== 'string') {
        scriptText = `(${scriptText.toString()})()`
      }
      window.postMessage({
        scriptId,
        scriptText,
      })
    })
  }
})

// use our helper to evaluate code in the main context
await page.evaluate(() => window.evaluateMain(() => document.getElementsByClassName('div')))

Boom. This code successfully passes the main world execution test in rebrowser-bot-detector.
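Coming back to the customer's page.waitForFunction case, here is a minimal sketch of how the same wait could be rebuilt on top of the helper. The name waitForMainFunction and its timeout and interval options are hypothetical and not part of Puppeteer or rebrowser-patches. It simply polls the main context until the expression returns a truthy result, and treats an error (for example, window.grecaptcha not being defined yet) as "not ready".

```javascript
// Hypothetical helper, not part of Puppeteer, Playwright, or rebrowser-patches.
// `evaluateInMain` is any async function that takes script text and resolves to
// the { result, error } response produced by the main-context listener.
async function waitForMainFunction(evaluateInMain, scriptText, options = {}) {
  const { timeout = 30000, interval = 100 } = options
  const deadline = Date.now() + timeout

  while (true) {
    const response = await evaluateInMain(scriptText)
    // an error (e.g. window.grecaptcha is still undefined) just means "not ready yet"
    if (response && !response.error && response.result) {
      return response.result
    }
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeout}ms waiting for: ${scriptText}`)
    }
    // wait a bit before polling the main context again
    await new Promise((resolve) => setTimeout(resolve, interval))
  }
}

// Usage with the evaluateMain helper from above (assumes `page` is a Puppeteer page):
// await waitForMainFunction(
//   (text) => page.evaluate((t) => window.evaluateMain(t), text),
//   `typeof window.grecaptcha.execute === 'function'`,
// )
```

Passing the evaluate function in as an argument keeps the polling logic independent of the page object, so it can be tried out without launching a browser.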

Issues with unsafe-eval

Using this approach, you might get this error:

Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the following Content Security Policy directive: ...

It means that the page has a CSP that prohibits eval(), which we used in the code above.

You could call page.setBypassCSP(true) to fix this issue, but it's not recommended because a remote website can detect it quite easily. You can read more in rebrowser-bot-detector.

Another way to fix it is to not use eval() at all. So, instead of:

response.result = eval(event.data.scriptText)

You can use more explicit code:

// in the isolated context: send a description of the call instead of code
await page.evaluate(() => window.evaluateMain(JSON.stringify({
  function: 'document.getElementById',
  args: ['detections-json'],
})))
// ...
// in the main context listener: replaces the eval() call
const scriptData = JSON.parse(event.data.scriptText)
if (scriptData.function === 'document.getElementById') {
  response.result = document.getElementById(...scriptData.args)
}

This code won't violate any CSP and will return the same result. Yes, it's more explicit and less flexible, as you need to extend it every time you introduce a new function, but it gets the job done.
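If the chain of if statements starts to grow, one option is an allowlist of named handlers. The sketch below is only an illustration: createDispatcher and the handler names are hypothetical, and the dispatcher would have to be defined inside the page.evaluateOnNewDocument callback so that it exists in the main context.

```javascript
// Hypothetical allowlist dispatcher: maps a name sent from the isolated
// context to a function that is allowed to run in the main context.
function createDispatcher(handlers) {
  return (scriptText) => {
    const response = {}
    try {
      const { function: name, args = [] } = JSON.parse(scriptText)
      // own-property check so inherited names like "constructor" are rejected
      if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
        throw new Error(`Function not allowed: ${name}`)
      }
      response.result = handlers[name](...args)
    } catch (err) {
      response.error = err.message
    }
    return response
  }
}

// Usage inside the main-context message listener (browser only, shown as a comment;
// returning textContent is an assumption about what the caller needs):
// const dispatch = createDispatcher({
//   'document.getElementById': (id) => document.getElementById(id)?.textContent,
// })
// Object.assign(response, dispatch(event.data.scriptText))
```

Adding a new capability then means adding one entry to the handlers object, and nothing is ever passed to eval().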

Can it be detected by anti-automation solutions?

Yes, but no.

Yes, because they can add window.addEventListener('message', ...) to their script and receive every message sent from your isolated context. They can then check each message for the scriptId property and flag you as a suspicious guy who reads Rebrowser blogs.

But no, because window messages are used on many major websites for legitimate reasons - to communicate with iframes and embedded widgets, for example. Also, a huge number of extensions use them for communication, too. So, the mere presence of window messages on the page is not enough to conclude that you're using any kind of automation.

So, you can adjust the code: use userId or anything else instead of scriptId, and rename scriptText to just text. It's practically impossible for an anti-automation script to know about all the message formats on all the websites. Even if you just copy-paste the code from this post, the chances that it's ever going to be detected are quite low. Unless it becomes so popular that this approach will be a default in any automation script 🤔
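To make that renaming less error-prone, the property names can live in one place. This is a rough sketch under the assumption that both contexts are given the same key names. createMessageCodec and the default keys userId, text, and ack are made up for illustration.

```javascript
// Hypothetical helper: keeps the message property names configurable so they
// are not hardcoded as scriptId / scriptText / fromMain in several places.
function createMessageCodec({ idKey = 'userId', textKey = 'text', replyKey = 'ack' } = {}) {
  return {
    // message sent from the isolated context to the main context
    encodeRequest: (id, text) => ({ [idKey]: id, [textKey]: text }),
    // message sent back from the main context
    encodeResponse: (id, payload) => ({ ...payload, [idKey]: id, [replyKey]: true }),
    // optional chaining because unrelated messages may carry null data
    isRequest: (data) => Boolean(data?.[idKey] && !data[replyKey]),
    isResponse: (data) => Boolean(data?.[idKey] && data[replyKey]),
    getId: (data) => data[idKey],
    getText: (data) => data[textKey],
  }
}

// Example: the same names must be used when creating the codec in both contexts
const codec = createMessageCodec({ idKey: 'widgetId', textKey: 'payload', replyKey: 'done' })
const request = codec.encodeRequest(1, 'document.title')
const reply = codec.encodeResponse(1, { result: 'Example' })
```

Because page.evaluate and page.evaluateOnNewDocument serialize their callbacks, the function itself would need to be defined inside each callback rather than shared from Node.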

What's next?

Now your code runs in an isolated context but still has access to main world objects. Congrats!

To test your code for automation detections and to try this approach, you can use rebrowser-bot-detector. Safe automation!
