Describe the bug
It looks like browserless sometimes doesn't launch Chrome with the ignoreHTTPSErrors flag, even though it's specified in the docker-compose.yml file and/or via the URL parameter.
To Reproduce
In my docker-compose.yml:
chrome:
  image: browserless/chrome:latest
  security_opt:
    - "seccomp=../chromium.json"
  deploy:
    replicas: 2
  expose:
    - 3000
  environment:
    # https://docs.browserless.io/docs/docker.html
    - DEFAULT_IGNORE_HTTPS_ERRORS=true
    - ENABLE_DEBUGGER=false
    - DEFAULT_IGNORE_DEFAULT_ARGS=["--no-sandbox"]
    - DEFAULT_STEALTH=true
    - FUNCTION_ENABLE_INCOGNITO_MODE=true
    - KEEP_ALIVE=true
    - PREBOOT_CHROME=true
    - EXIT_ON_HEALTH_FAILURE=true
Python script using playwright:
import asyncio
import logging
from urllib.parse import urlparse

from playwright.async_api import async_playwright
from playwright._impl._api_types import Error as PlaywrightError


URLS = [
    'https://bot.sannysoft.com/',
    'https://arh.antoinevastel.com/bots/areyouheadless',
    'https://200.70.58.134:8443/',
    'https://200.55.247.6:3000/',
    'https://190.227.183.117:8443/',
    'https://190.221.139.82:8443/',
    'https://181.30.162.226:4433/'
]

queue = asyncio.Queue()
log = logging.getLogger('play')
log.addHandler(logging.StreamHandler())
log.setLevel(logging.DEBUG)


async def screenshot(url):
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp('ws://chrome:3000/?ignoreHTTPSErrors=true&ignoreDefaultArgs=--no-sandbox')
        try:
            log.debug(url)
            parsed_url = urlparse(url)
            page = await browser.new_page()

            try:
                await page.goto(url, wait_until="networkidle")
            except PlaywrightError as e:
                # Workaround: retry once if the failure is a certificate error.
                if 'CERT' not in str(e):
                    raise
                log.debug('Caught certificate error')
                await page.goto(url, wait_until="networkidle")

            await page.screenshot(path=f'./play/screenshot_{parsed_url.scheme}_{parsed_url.netloc}_{parsed_url.port}.png', full_page=True)
        finally:
            await browser.close()


async def producer():
    for url in URLS:
        queue.put_nowait(url)


async def consumer():
    url = await queue.get()
    try:
        await screenshot(url)
    except Exception as e:
        log.error(f'{e}')
    finally:
        queue.task_done()


async def main():
    n_threads = 8
    asyncio.create_task(producer())
    # Wait until the producer has queued at least one URL.
    while queue.qsize() == 0:
        await asyncio.sleep(0.1)
    # Drain the queue in batches of at most n_threads concurrent consumers.
    while queue.qsize() > 0:
        tasks = [
            consumer()
            for _ in range(
                n_threads if queue.qsize() > n_threads else queue.qsize()
            )
        ]
        await asyncio.gather(*tasks)


asyncio.run(main())
Expected behavior
Browserless should instruct Chrome to ignore HTTPS certificate errors if DEFAULT_IGNORE_HTTPS_ERRORS=true is specified through Docker or via the connection URL.
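One way to keep the connection-URL flags consistent across scripts is to build the endpoint from keyword arguments instead of hand-writing the query string. The sketch below uses only the standard library. The helper name `build_ws_endpoint` is hypothetical, and the parameter names (`ignoreHTTPSErrors`, `stealth`) are just the ones used in this report, not a verified list of what browserless accepts.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_ws_endpoint(base_url, **flags):
    """Return base_url with the given flags appended as query parameters.

    Booleans are rendered as lowercase 'true'/'false', matching the
    `ignoreHTTPSErrors=true` form used in this report.
    """
    def render(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    parts = urlsplit(base_url)
    query = urlencode({name: render(value) for name, value in flags.items()})
    # Preserve any query string already present on the base URL.
    if parts.query:
        query = f"{parts.query}&{query}" if query else parts.query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


endpoint = build_ws_endpoint("ws://chrome:3000/", ignoreHTTPSErrors=True)
print(endpoint)
```

The resulting string can be passed to `connect_over_cdp` in place of the literal URL, which makes it easier to confirm that every run sends the same flags.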
Screenshots
First run of the Python script: everything is fine.
Third run of the script: a certificate error is thrown when calling page.goto().
Additional context
Currently working around this by catching any error with CERT in the string and calling page.goto() again (lines 35-40 in the Python script; no clue why this works). Obviously this isn't ideal: Chrome should ignore certificate errors every time it starts if the correct knobs are turned.
Using the following script, I'm able to reproduce it. You can see it ran fine 3 times, then failed on the 4th.
Using browserless/chrome:1.53-chrome-stable in two different places.
Tested with playwright 1.28.0 and 1.27.1, same outcome.
#!/usr/bin/python3

from playwright.sync_api import sync_playwright
import playwright._impl._api_types

# pip3 install playwright
# docker run -d --name browserless --rm -p 3000:3000 --shm-size="2g" browserless/chrome:1.53-chrome-stable

def letsgo():
    print ("Trying...")
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp('ws://127.0.0.1:3000/?ignoreHTTPSErrors=true&stealth=1&--disable-web-security=true', timeout=10000)

        context = browser.new_context(
            bypass_csp=True,
            service_workers='block',
            accept_downloads=False
        )

        page = context.new_page()
        page.on("console", lambda msg: print(f"Playwright console: Watch URL: {msg.type}: {msg.text} {msg.args}"))
        page.goto("https://untrusted-root.badssl.com/", wait_until='commit')
        page.wait_for_timeout(1 * 1000)

        context.close()
        browser.close()

if __name__ == '__main__':
    while True:
        letsgo()
Here is the output:
# ./test.py
Trying...
Trying...
Trying...
Traceback (most recent call last):
  File "/root/./test.py", line 30, in <module>
    letsgo()
  File "/root/./test.py", line 22, in letsgo
    page.goto("https://untrusted-root.badssl.com/", wait_until='commit')
  File "/usr/local/lib/python3.10/dist-packages/playwright/sync_api/_generated.py", line 8200, in goto
    self._sync(
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_sync_base.py", line 104, in _sync
    return task.result()
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_page.py", line 491, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_frame.py", line 147, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 44, in send
    return await self._connection.wrap_api_call(
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 419, in wrap_api_call
    return await cb()
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 79, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: net::ERR_CERT_AUTHORITY_INVALID at https://untrusted-root.badssl.com/
=========================== logs ===========================
navigating to "https://untrusted-root.badssl.com/", waiting until "commit"
============================================================
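To put a number on "fine 3 times, then on the 4th", the endless `while True` loop could be swapped for a bounded one that reports which run first hit a certificate error. The sketch below is an assumption-laden illustration: `is_cert_error` and `runs_until_cert_error` are hypothetical helpers, the `net::ERR_CERT_` prefix is taken from the error message in the traceback above, and the demo uses a fake attempt function rather than a real browser.

```python
def is_cert_error(message):
    """True if the error text looks like a Chromium certificate error."""
    return "net::ERR_CERT_" in message


def runs_until_cert_error(attempt, max_runs=20):
    """Call attempt() up to max_runs times.

    Returns the 1-based index of the first run that raises a certificate
    error, or None if every run succeeds. Other errors are re-raised.
    """
    for run in range(1, max_runs + 1):
        try:
            attempt()
        except Exception as exc:
            if is_cert_error(str(exc)):
                return run
            raise
    return None


# Fake attempt that mimics the behaviour reported above: three clean runs,
# then a certificate error on the fourth.
_state = {"calls": 0}


def fake_attempt():
    _state["calls"] += 1
    if _state["calls"] == 4:
        raise RuntimeError(
            "net::ERR_CERT_AUTHORITY_INVALID at https://untrusted-root.badssl.com/"
        )


first_failure = runs_until_cert_error(fake_attempt)
print(first_failure)
```

Passing the script's `letsgo` function as `attempt` would give a repeatable count to compare across browserless and Playwright versions.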
Owner Name | browserless |
Repo Name | chrome |
Full Name | browserless/chrome |
Language | TypeScript |
Created Date | 2017-11-17 |
Updated Date | 2023-03-22 |
Star Count | 5309 |
Watcher Count | 47 |
Fork Count | 516 |
Issue Count | 29 |
Issue Title | Created Date | Updated Date |
---|