fix: improved mediafusion validation and scrape logic #932
Conversation
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
src/program/services/scrapers/mediafusion.py (3)
**64-79: Document configuration options**

The payload configuration is well-structured, but could benefit from documentation explaining:

- Why `None` is included in the resolutions list
- The implications of an empty `torrent_sorting_priority`
- The available options for `nudity_filter` and `certification_filter`

Consider adding docstrings or comments explaining these configuration choices.
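As a minimal sketch of what that documentation could look like, the payload could carry the rationale inline. The key names follow the review comments above; the concrete resolution values and filter values are assumptions for illustration, not MediaFusion's documented options:

```python
# Hypothetical, commented version of the scraper payload discussed in
# this review. Keys mirror the ones named in the comments; the literal
# values are placeholders.
payload = {
    "selected_resolutions": [
        "2160p",
        "1080p",
        "720p",
        None,  # `None` also matches streams whose resolution could not be parsed
    ],
    "max_streams_per_resolution": 100,
    # An empty list keeps MediaFusion's default stream ordering
    "torrent_sorting_priority": [],
    # Assumed filter values; "disable" would mean no filtering is applied
    "nudity_filter": ["disable"],
    "certification_filter": ["disable"],
}
```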
**151-158: LGTM! Consider extracting the rate limit message constant**

The rate limit check is well-implemented with proper defensive programming. Consider extracting "rate-limit exceeded" as a constant for maintainability.

```diff
+RATE_LIMIT_EXCEEDED_MESSAGE = "rate-limit exceeded"
+
 if (
     not hasattr(stream, "description")
     and hasattr(stream, "title")
-    and "rate-limit exceeded" in stream.title
+    and RATE_LIMIT_EXCEEDED_MESSAGE in stream.title
 ):
```
**161-163: Document the series title transformation**

The code splits the raw title for series, but the reasoning isn't clear. Please add a comment explaining:

- Why this transformation is needed for series
- What the expected format of `raw_title` is before/after the split

```diff
 if scrape_type == "series":
+    # Extract the main title part before the episode information
     raw_title = raw_title.split("/")[0]
```
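To make the suggested comment concrete, here is a small illustration of what the split does. The sample title format (torrent name, then a `/`, then an episode file name) is an assumption based on the discussion later in this thread:

```python
# Illustrative only: a made-up series title in the
# "<torrent name> / <episode file name>" shape this code expects.
raw_title = "Show.Name.S01.1080p.WEB-DL / Show.Name.S01E03.mkv"

# Keep only the torrent-level title, dropping the per-episode file name
torrent_title = raw_title.split("/")[0].strip()
print(torrent_title)  # Show.Name.S01.1080p.WEB-DL
```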
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
src/program/services/scrapers/mediafusion.py (6 hunks)
🔇 Additional comments (4)
src/program/services/scrapers/mediafusion.py (4)
**86-92: LGTM! Clean and well-structured request handling**

The request execution is well-implemented with proper parameter specification and response type handling.
**122-124: LGTM! Clear and contextual error logging**

The multi-line format improves readability while maintaining all necessary context.
**168-170: LGTM! Improved logging format**

The multi-line logging format enhances readability while providing useful metrics about found streams.
**37-41: Verify the rate limit configuration**

The rate limit of 1 call per 10 seconds seems quite restrictive and might impact performance. Please confirm that these values align with the service's requirements.
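For reference, a 1-call-per-10-seconds guard can be sketched as below. This is a generic sliding-window limiter written for illustration; the actual code at lines 37-41 may use a shared rate-limiter utility with a different interface:

```python
import time


class SimpleRateLimiter:
    """Illustrative sliding-window limiter: at most `calls` per `period` seconds."""

    def __init__(self, calls: int, period: float) -> None:
        self.calls = calls
        self.period = period
        self._timestamps: list[float] = []

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        self._timestamps = [t for t in self._timestamps if now - t < self.period]
        if len(self._timestamps) >= self.calls:
            # Block until the oldest call leaves the window
            time.sleep(self.period - (now - self._timestamps[0]))
        self._timestamps.append(time.monotonic())


# The configuration under review: one request every ten seconds
limiter = SimpleRateLimiter(calls=1, period=10.0)
```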
```python
        None,
    ],
    "max_streams_per_resolution": 100,
    "live_search_streams": True,
```
@mhdzumair what is live search streams?
If you enable this, MediaFusion will fetch streams from its upstream scrapers, subject to a time limit. Note that it won't hit the scrapers on every request. If you disable it, MediaFusion will only respond with data already in its database.
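In other words, the flag only changes the request payload; whether a live scrape happens is decided server-side. A hedged sketch (the payload key comes from the diff above, but the surrounding request shape is assumed, not MediaFusion's documented API):

```python
import json


def build_payload(live_search: bool) -> str:
    """Illustrative payload builder; only the keys shown in the diff are real."""
    # live_search_streams=True: MediaFusion may run its upstream scrapers
    # (within a time limit) before answering.
    # live_search_streams=False: MediaFusion serves only cached DB results.
    return json.dumps({
        "live_search_streams": live_search,
        "max_streams_per_resolution": 100,
    })


print(build_payload(False))
```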
```python
description_split = stream.description.replace("📂 ", "")
raw_title = description_split.split("\n")[0]
info_hash = re.search(r"info_hash=([A-Za-z0-9]+)", stream.url).group(1)
if scrape_type == "series":
    raw_title = raw_title.split("/")[0]
```
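A worked example of that parsing on a made-up stream may help follow the discussion below. The description and URL formats here are assumptions inferred from the diff, not verified MediaFusion output:

```python
import re


class Stream:
    """Stand-in for a stream object with the two fields the parser reads."""
    description = "📂 Show.Name.S01.1080p / Show.Name.S01E02.mkv\n💾 1.4 GB"
    url = "https://example.com/stream?info_hash=abcdef0123456789abcdef0123456789abcdef01"


stream = Stream()
scrape_type = "series"

# Same steps as the diff above
description_split = stream.description.replace("📂 ", "")
raw_title = description_split.split("\n")[0]
info_hash = re.search(r"info_hash=([A-Za-z0-9]+)", stream.url).group(1)
if scrape_type == "series":
    # Keeps the torrent-level title, dropping the episode file name
    raw_title = raw_title.split("/")[0]

print(raw_title.strip())
print(info_hash)
```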
Hmm.. In previous updates I've been pretty back and forth on this one. Do we want the torrent or do we want the file in these scenarios? I think we would prefer the torrent, because that same torrent would have more episodes that might be needed.
I'll fix this one. Thanks man!
Here it's providing the episode file name. If you prefer to get the episode file name, that's also fine.
Looks good!
Pull Request Check List
Resolves: #issue-number-here
Description:
Summary by CodeRabbit
New Features
Bug Fixes
Refactor