Our rebrowser-patches library, which we released some time ago, got really good feedback from the web automation community. It significantly increased success rates and removed captcha challenges triggered by many major anti-automation players on the market.
However, our approach has one flaw: when you use always isolated mode (REBROWSER_PATCHES_RUNTIME_FIX_MODE=alwaysIsolated), all your code is executed in a separate isolated JS context that doesn't have access to the main context of the page.
For the detailed reasons behind that, you can read our previous post How to fix Runtime.Enable CDP detection of Puppeteer, Playwright and other automation libraries?.
For many people, it's a dealbreaker as they have to deal with some JS objects that are defined in the main context.
Here is a real use case from one of our customers. They've been running this code to catch the moment when the recaptcha script was loaded:
await page.waitForFunction(`typeof window.grecaptcha.execute === 'function'`)
After the patch is applied, this code will be executed in a new isolated context that will never have the object window.grecaptcha.
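To make the isolation easier to picture, here is a small analogy that runs in plain Node.js, without a browser. It uses Node's built-in vm module to create two contexts that share one object (standing in for the DOM) but keep separate globals. This is only an illustration of the idea, not how Chrome or rebrowser-patches implement isolated worlds, and the names mainWorld, isolatedWorld, and sharedDom are made up for this sketch.

```javascript
// Analogy only: Node's vm contexts stand in for the page's main and isolated worlds.
const vm = require('node:vm')

// One shared object plays the role of the DOM, which both worlds can see.
const sharedDom = { title: 'demo page' }

const mainWorld = vm.createContext({ document: sharedDom })
const isolatedWorld = vm.createContext({ document: sharedDom })

// The page's own script defines a global, but only in the main world.
vm.runInContext('globalThis.grecaptcha = { execute() { return "token" } }', mainWorld)

const seenFromMain = vm.runInContext('typeof grecaptcha', mainWorld)
const seenFromIsolated = vm.runInContext('typeof grecaptcha', isolatedWorld)
const sharedTitle = vm.runInContext('document.title', isolatedWorld)

console.log(seenFromMain)     // "object"
console.log(seenFromIsolated) // "undefined"
console.log(sharedTitle)      // "demo page"
```

The shared object is visible from both sides, while the global defined by the "page" exists only in the context where it was created. That is the same situation the waitForFunction call above runs into.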
To get around this limitation, we can borrow an idea from Chrome docs for extension developers, more specifically "Communication with the embedding page".
This approach leverages window messages to communicate between contexts. page.evaluateOnNewDocument is not patched and still uses the main context to execute the script, so we can inject some good stuff in there.
Let's try to create a proof of concept.
// add event listener for window messages (executed in the main context)
await page.evaluateOnNewDocument(() => {
  window.addEventListener('message', (event) => {
    console.log('[main] msg', event)
    if (!event.data.scriptId || event.data.fromMain) {
      // ignore messages without scriptId and from ourselves (from main context)
      return
    }
    const response = {
      scriptId: event.data.scriptId,
      fromMain: true,
    }
    try {
      response.result = eval(event.data.scriptText)
    } catch (err) {
      response.error = err.message
    }
    // round-trip through JSON so that only serializable data is posted
    window.postMessage(JSON.parse(JSON.stringify(response)))
  })
})

await page.goto('https://bot-detector.rebrowser.net', { waitUntil: 'load' })

// add a helper that we can reuse (executed in an isolated context)
await page.evaluate(() => {
  // listen for messages from main and emit custom event with a response for a specific scriptId
  window.addEventListener('message', (event) => {
    if (!(event.data.scriptId && event.data.fromMain)) {
      // ignore irrelevant messages
      return
    }
    console.log('[isolated] msg', event)
    window.dispatchEvent(new CustomEvent(`scriptId-${event.data.scriptId}`, { detail: event.data }))
  })

  // a helper that can be reused in other page.evaluate calls
  window.evaluateMain = (scriptFn) => {
    // generate unique scriptId for each call
    window.evaluateMainScriptId = (window.evaluateMainScriptId || 0) + 1
    const scriptId = window.evaluateMainScriptId
    return new Promise(resolve => {
      // listen for the response
      window.addEventListener(`scriptId-${scriptId}`, (event) => {
        resolve(event.detail)
      }, {
        once: true,
      })

      // prepare and send a message for the main context
      let scriptText = scriptFn
      if (typeof scriptText !== 'string') {
        scriptText = `(${scriptText.toString()})()`
      }
      window.postMessage({
        scriptId,
        scriptText,
      })
    })
  }
})

// use our helper to evaluate code in the main context
await page.evaluate(() => window.evaluateMain(() => document.getElementsByClassName('div')))
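One detail of the proof of concept is easy to miss: evaluateMain resolves with the whole response object in both cases, so a failure in the main context arrives as an error field rather than as a rejected promise. Below is a minimal sketch of a helper that turns that object back into a return value or a thrown error. The name unwrapMainResponse is hypothetical and not part of rebrowser-patches; it only assumes the response shape used above (scriptId, fromMain, and either result or error).

```javascript
// Hypothetical helper (not part of rebrowser-patches): convert the response object
// built by the main-context listener into a return value or a thrown error.
function unwrapMainResponse(response) {
  if (!response || response.fromMain !== true) {
    throw new TypeError('not a response from the main context')
  }
  if (typeof response.error === 'string') {
    // the main-context listener stored err.message here
    throw new Error(`main context error: ${response.error}`)
  }
  // may be undefined: JSON serialization drops an undefined result
  return response.result
}

// Possible usage on the Node.js side (assumes the helpers above are installed):
// const response = await page.evaluate(() => window.evaluateMain(() => document.title))
// const title = unwrapMainResponse(response)
```

Keep in mind that the result has passed through JSON, so DOM nodes and functions will not survive the trip. Return plain values such as strings, numbers, or simple objects from the main context.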
Boom. This code successfully passes the main world execution test in rebrowser-bot-detector.
Using this approach, you might get this error:
Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the following Content Security Policy directive: ...
It means that your page has a CSP that prohibits eval(), which we used in the code above.
You could use page.setBypassCSP(true) to fix this issue, but it's not recommended because a remote website can detect it quite easily. You can read more in rebrowser-bot-detector.
Another way to fix it is to not use eval() at all. So, instead of:
response.result = eval(event.data.scriptText)
You can use more explicit code:
// isolated context: send a description of the call instead of code
await page.evaluate(() => window.evaluateMain(JSON.stringify({
  function: 'document.getElementById',
  args: ['detections-json'],
})))

// ...

// main context listener: replace eval() with an explicit check
const scriptData = JSON.parse(event.data.scriptText)
if (scriptData.function === 'document.getElementById') {
  response.result = document.getElementById(...scriptData.args)
}
This code won't break any CSP and will return the same result. Yes, it's more explicit and less flexible as you need to edit it every time you need to introduce a new function, but it gets the job done.
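If the list of functions grows, a chain of if statements gets hard to maintain. One possible way to organize it is a small allowlist table that maps each permitted name to a handler. The sketch below is only an illustration: createMainDispatcher is a hypothetical name, and the document object is passed in as a parameter so that the logic can be tried outside a browser. In a real script, this function would have to be defined inside the page.evaluateOnNewDocument callback and called with the page's document.

```javascript
// Hypothetical allowlist dispatcher: only names listed here can be requested
// by the isolated context. No eval() is involved, so CSP is not affected.
function createMainDispatcher(doc) {
  const allowed = new Map([
    ['document.getElementById', (id) => doc.getElementById(id)],
    ['document.getElementsByClassName', (name) => doc.getElementsByClassName(name)],
  ])

  return function dispatch(scriptText) {
    const scriptData = JSON.parse(scriptText)
    const handler = allowed.get(scriptData.function)
    if (!handler) {
      throw new Error(`function is not allowed: ${scriptData.function}`)
    }
    return handler(...(scriptData.args || []))
  }
}

// Possible usage inside the main-context listener (within its try/catch):
// const dispatch = createMainDispatcher(document)
// response.result = dispatch(event.data.scriptText)
```

Adding a new function then means adding one line to the table, and any unknown name ends up in the existing catch branch as a regular error.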
Can an anti-automation script detect this approach? Yes, but no.
Yes, because it can just add window.addEventListener('message', ...) to its own code and receive all the messages from your isolated context. So, it can check each message for the scriptId property and flag you as a suspicious guy who reads Rebrowser blogs.
But no, because the messages mechanism is used on many major websites for legitimate reasons - to communicate with embedded iframes, for example. Also, a huge number of extensions use it for communication, too. So, the mere presence of window messages on the page is not enough to conclude that you're using any kind of automation.
So, you can adjust the code and instead of scriptId, use userId or anything else, and change scriptText to just text. It's practically impossible for an anti-automation script to know about all the cases on all the websites. The chances are quite low that it will ever be detected, even if you just copy-paste the code from this post. Unless it becomes so popular that this approach becomes the default in any automation script 🤔
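To make that renaming less error-prone, the field names can live in one place instead of being scattered across both listeners. Here is a rough sketch of the idea. All names in it (FIELD_NAMES, buildRequest, isRequest, isResponse, and the ack field) are hypothetical and chosen only for illustration; the filters mirror the checks from the proof of concept above.

```javascript
// Hypothetical field names: pick anything that blends in with the target website.
const FIELD_NAMES = { id: 'userId', text: 'text', fromMain: 'ack' }

// Build the message that the isolated context sends to the main context.
function buildRequest(id, text, names = FIELD_NAMES) {
  return { [names.id]: id, [names.text]: text }
}

// Main-context filter: accept messages that have an id and did not come from main.
function isRequest(data, names = FIELD_NAMES) {
  return Boolean(data && data[names.id] && !data[names.fromMain])
}

// Isolated-context filter: accept messages that have an id and came from main.
function isResponse(data, names = FIELD_NAMES) {
  return Boolean(data && data[names.id] && data[names.fromMain])
}

const request = buildRequest(1, 'document.title')
console.log(request) // { userId: 1, text: 'document.title' }
```

Since the two listeners run in different contexts, the same names have to be available in both of them, for example by passing the object as an argument to page.evaluateOnNewDocument and page.evaluate.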
Now you've got your code running in an isolated context but having access to the main world objects. Congrats!
To test your code for automation detections and to try this approach, you can use rebrowser-bot-detector. Safe automation!