Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix page_options for PlaywrightBrowserPlugin #796

Merged
merged 3 commits into from
Dec 16, 2024

Conversation

Mantisus
Copy link
Collaborator

@Mantisus Mantisus commented Dec 9, 2024

Description

  • fix page_options for PlaywrightBrowserPlugin

Issues

Testing

  • Add test for check workability page_options in PlaywrightBrowserPlugin

Checklist

  • CI passed

@Mantisus
Copy link
Collaborator Author

Mantisus commented Dec 9, 2024

My initial thought was to change the name of page_options to context_options. However, I think this might confuse users who are not so familiar with playwright. Especially considering that the parameters taken by new_context and new_page in playwright are identical.

So I've updated the docstring to explicitly state that these parameters are passed when the context is created.

@vdusek vdusek requested review from Pijukatel and vdusek December 10, 2024 08:32
@Pijukatel Pijukatel self-requested a review December 10, 2024 17:08
@Mantisus Mantisus self-assigned this Dec 12, 2024
@janbuchar janbuchar merged commit bd3bdd4 into apify:master Dec 16, 2024
23 checks passed
"""Create a new browser context with the specified proxy settings."""
if self._header_generator:
common_headers = self._header_generator.get_common_headers()
sec_ch_ua_headers = self._header_generator.get_sec_ch_ua_headers(browser_type=self.browser_type)
user_agent_header = self._header_generator.get_user_agent_header(browser_type=self.browser_type)
extra_http_headers = dict(common_headers | sec_ch_ua_headers | user_agent_header)
user_agent = user_agent_header.get('User-Agent')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was here to replace the default "headless"-something user agent. Why is it removed? Does it mean we're now using the default (headless) user agent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a test to double-check the default UA is not leaking, it burned as several times in the past in the JS version.

I think we have it tested on other places too, found this one now

https://github.com/apify/crawlee/blob/fdd895e94e848c7e5545a2e239f24130d341db61/test/core/crawlers/playwright_crawler.test.ts#L88

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace page_options with context_options for PlaywrightBrowserPlugin
5 participants