Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass optional arguments to requests.get #282

Merged
merged 3 commits into from
Sep 4, 2024
Merged

Conversation

jkawamoto
Copy link
Contributor

@jkawamoto jkawamoto commented Jul 31, 2024

It would be beneficial if users could pass optional arguments, such as authentication information and proxy configurations, to requests made by NewsPlease.from_url and NewsPlease.from_urls.

This PR adds **kwargs to the above functions and passes them to requests.get in SimpleCrawler._fetch_url.

With this change, users can add authentication information like this:

from newsplease import NewsPlease
from requests.auth import HTTPBasicAuth

NewsPlease.from_url("https://example.com", auth=HTTPBasicAuth("user", "password"))

and also add proxy configurations like this:

from newsplease import NewsPlease

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

NewsPlease.from_url("https://example.com"proxies=proxies)

Related to #234 and #254.

@fhamborg
Copy link
Owner

fhamborg commented Jul 31, 2024

Hi! Thanks for the suggestion. Sounds like an interesting extension, though I think it would be more future-proof if we add a kwargs parameter specifically for requests. For example:

from newsplease import NewsPlease

request_args = {
    "proxies": {
           "http": "http://10.10.1.10:3128",
           "https": "http://10.10.1.10:1080",
     }
}

NewsPlease.from_url("https://example.com", request_args=request_args)

This way, if at any point in the future we want to expose / forward arguments to / of other libraries used by NP, we could easily do so using the above.

It would be beneficial if users could pass optional arguments, such as authentication
information and proxy configurations, to requests made by `NewsPlease.from_url` and
`NewsPlease.from_urls`.

This commit adds `**kwargs` to the above functions and passes them to `requests.get`
in `SimpleCrawler._fetch_url`.
@jkawamoto
Copy link
Contributor Author

Thank you for your review. I added a commit so that the fetch functions now use request_args instead of **kwargs.

@fhamborg
Copy link
Owner

Thanks a lot for the changes. I'll have to think about what to do with the other two parameters of request that are directly exposed (timeout and user_agent). Currently have a tendency towards removing them to have a clean interface. I'll get back to you soon regarding this.

@fhamborg
Copy link
Owner

fhamborg commented Sep 2, 2024

For cleaner consistency of the changes introduced by the PR, could you remove timeout and user_agent from the method signature as they would be set through the new parameter the PR introduces?

Copy link
Owner

@fhamborg fhamborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comment

These parameters can now be passed via `request_args`.

For more details, see fhamborg#282 (comment).
@jkawamoto
Copy link
Contributor Author

I've just added a commit to remove timeout and user_agent. These parameters can now be passed via request_args.

@jkawamoto jkawamoto requested a review from fhamborg September 4, 2024 05:19
@fhamborg
Copy link
Owner

fhamborg commented Sep 4, 2024

hi @jkawamoto thanks for the quick changes! could you change the name to requests_args (note the s as in the original library name)? other than that this looks good!

edit: just noted that the function is called request, so this is actually fine. thank you!

@fhamborg fhamborg merged commit c4c3d0a into fhamborg:master Sep 4, 2024
@jkawamoto jkawamoto deleted the kwargs branch September 5, 2024 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants