-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Bluesky changes needed for backfeed #571
Conversation
Awesome! Looking now. And yeah sorry about the whole tag URI thing, seemed like a good idea at the time, something like 12 yrs ago. 🤷 Definitely not necessary in my book any more, hopefully didn't get in the way too much. |
Oh btw the vendored lexicons in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great!
My main thought is, let's just go with at://
URIs for ids, and ditch tag URIs? Is there anything in granary or Bridgy that required them?
Otherwise a few minor suggestions, but nothing else big.
granary/bluesky.py
Outdated
if type in ('app.bsky.actor.defs#profileView', | ||
'app.bsky.actor.defs#profileViewBasic'): | ||
images = [{'url': obj.get('avatar')}] | ||
banner = obj.get('banner') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was removing this intentional? Assuming not, let's keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I removed this because it's not used anywhere and it also doesn't parse properly- image
is expected to be an object with a url
member, not an array. I used mastodon.pyL490
as reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checking if my understanding is actually correct here as I think I've been looking at the AS2 spec by mistake. Is image in AS on a 'person' object supposed to be an array, an object, or either? I notice a lot of the Bluesky tests assume an array and that seems to match the as.json in the test data but it doesn't match what the Mastodon silo does so I'm a bit confused!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah yeah AS1 vs AS2 can be confusing. The two useful spec links for AS1 are https://activitystrea.ms/specs/json/schema/activity-schema.html and https://activitystrea.ms/specs/json/1.0/ . granary also extends/overloads AS1 a bit here and there when we've outgrown it, and we don't have an explicity catalog of those extensions.
In this case, technically AS1 says image
is an object with a url
field (https://activitystrea.ms/specs/json/1.0/#object , https://activitystrea.ms/specs/json/1.0/#media-link ), and/or a bare string URL (https://activitystrea.ms/specs/json/schema/activity-schema.html#rfc.section.4.6 ). We sometimes need an object to have multiple images, though, so we support both lists and singular objects. Examples:
- https://github.com/snarfed/granary/blob/main/granary/tests/testdata/note_with_composite_photo.as.json
- https://github.com/snarfed/granary/blob/main/granary/tests/testdata/note_with_multiple_photos.as.json
(AS2 does this too, all of the properties not marked "Functional" can be either singular or plural. Whee! https://www.w3.org/TR/activitystreams-vocabulary/#h-properties )
We use a few helpers to deal with single vs plural, eg:
util.get_list
always returns a listutil.get_list
returns the first valueas1.get_object
always returns an object,as1.get_objects
always a list. bare string values get wrapped in an object as theid
valueas1.object_urls
returns a de-duped list of theurl
(singular) +urls
(plural) values
etc...
Sorry this isn't simpler!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also AS1 is only used internally, so we get to be a bit looser with it. We used to use it externally with Google+ and OStatus, and maybe others, but those are all long dead.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool cool. I think I'll need to make a corresponding change in bridgy itself:
diff --git a/pages.py b/pages.py
index 7d4098a2..c17f6779 100644
--- a/pages.py
+++ b/pages.py
@@ -197,10 +197,12 @@ def user(site, id):
r.response['content'] = f'{r.actor.get("displayName") or ""} {phrase}.'
# convert image URL to https if we're serving over SSL
- image_url = r.actor.setdefault('image', {}).get('url')
+ # account for fact image might be a list
+ image = util.get_first(r.actor, 'image', {})
+ image_url = image.get('url')
if image_url:
- r.actor['image']['url'] = util.update_scheme(image_url, request)
-
+ image['url'] = util.update_scheme(image_url, request)
+ r.actor['image'] = image
# generate original post links
r.links = process_webmention_links(r)
r.original_links = [util.pretty_link(url, new_tab=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! You probably want image_url = util.get_url(image)
to handle bare string values as well as objects, otherwise looks good.
granary/bluesky.py
Outdated
@@ -596,6 +593,7 @@ def to_as1(obj, type=None): | |||
reason = obj.get('reason') | |||
if reason and reason.get('$type') == 'app.bsky.feed.defs#reasonRepost': | |||
ret = { | |||
'id': obj.get('post').get('viewer').get('repost'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind adding , {}
to these get
s to make them fully forgiving? eg:
'id': obj.get('post').get('viewer').get('repost'), | |
'id': obj.get('post', {}).get('viewer', {}).get('repost'), |
granary/bluesky.py
Outdated
@@ -645,6 +643,10 @@ class Bluesky(Source): | |||
BASE_URL = 'https://bsky.app' | |||
NAME = 'Bluesky' | |||
TRUNCATE_TEXT_LENGTH = 300 # TODO: load from feed.post lexicon | |||
URL_CANONICALIZER = util.UrlCanonicalizer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is unused here, let's move it to Bridgy? The ones in meetup.py and twitter.py are only there because they're actually used in granary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do!
granary/bluesky.py
Outdated
thread = resp.get('thread') | ||
if thread: | ||
ret = self._recurse_replies(thread) | ||
return sorted(ret, key = lambda thread: thread.get('post').get('record').get('createdAt')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto re , {}
granary/bluesky.py
Outdated
Returns: dict, AS1 like activity | ||
""" | ||
assert verb in ('like', 'share') | ||
label = 'favorited' if verb == 'like' else 'reblogged' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, my fault originally, hoping to gradually standardize these terms now
label = 'favorited' if verb == 'like' else 'reblogged' | |
label = 'liked' if verb == 'like' else 'reposted' |
granary/bluesky.py
Outdated
@@ -709,9 +711,9 @@ def post_url(cls, handle, tid): | |||
return f'{cls.user_url(handle)}/post/{tid}' | |||
|
|||
def get_activities_response(self, user_id=None, group_id=None, app_id=None, | |||
activity_id=None, fetch_replies=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep these defaults as is? Esp since fetch_*
all incur extra API calls.
granary/bluesky.py
Outdated
Args: | ||
uri: string, post uri | ||
|
||
Returns: array, Bluesky app.bsky.feed.defs#threadViewPost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit
Returns: array, Bluesky app.bsky.feed.defs#threadViewPost | |
Returns: list, Bluesky app.bsky.feed.defs#threadViewPost |
granary/bluesky.py
Outdated
def _recurse_replies(self, thread): | ||
""" | ||
Recurses through a Bluesky app.bsky.feed.defs#threadViewPost | ||
and returns its replies as an array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit
and returns its replies as an array. | |
and returns its replies as an list. |
granary/bluesky.py
Outdated
Args: | ||
thread: dict, Bluesky app.bsky.feed.defs#threadViewPost | ||
|
||
Returns: array, Bluesky app.bsky.feed.defs#threadViewPost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit
Returns: array, Bluesky app.bsky.feed.defs#threadViewPost | |
Returns: list, Bluesky app.bsky.feed.defs#threadViewPost |
Need to double check but I think yes, actually - the |
Ah! Right. Or the reverse, they extract post id(s) out of the URL and generate tag URIs from them to look up stored https://github.com/snarfed/bridgy/blob/8550bc286411bc4cfd6b541211cc4daa023c62db/handlers.py#L226 Would this work if |
yes please, not familiar with that bit of the codebase! |
Tried this - unfortunately some places e.g.
|
Done! Updated them in main.
Ugh ok. What do you think we should do? Considering all the options, |
Honestly my usual approach with this stuff, and my gut feeling here, is that consistency is better than reducing complexity in one specific bit of the codebase. In addition I think this would add additional abstraction across the whole codebase for the sake of reducing complexity in one small bit of it. IMO it's not worth the tradeoff right now - we'd be better getting rid of tag URLs wholesale across all of Granary later down the line (aware that would come with a migration necessity) |
Hmm, ok. So are you then proposing that we use tag URIs for Bluesky ids in granary/Bridgy? And convert back and forth to |
The tag URI contains the AT URI doesn't it? Or have I misunderstood what those tag URIs actually are? Worth noting I don't think we could use raw AT URIs everywhere anyway as e.g. a like doesn't have an AT URI |
Hmm, really? I think everything has an AT URI...? They may not always be exposed at the Bluesky API level, but we can probably still construct them from the user's DID, the collection, and the like ID/TID. Regardless, as for tag URI, we make up each silo's tag URI format ourselves, so we can definitely just stick the whole AT URI in there if we want, true! |
Yeah currently we create likes for bluesky by doing |
Ah, ok. I remember you had the same problem with reposts, snarfed/bridgy#1453 (comment) , but eventually found their id. Looking at https://github.com/snarfed/atproto/blob/main/lexicons/app/bsky/feed/like.json , If we can find the like URI or TID anywhere else, it'd be ideal to standardize on AT URIs for everything (wrapped in tag URIs where necessary), but ok if it's not possible. I'll also ask on their developer Discord, https://discord.com/invite/3srmDsHSZJ , feel free to join me. |
@ericvolp12 is awesome, he jumped on this and added |
Fab :) We'll need it for I guess it's a lil dependent on timelines then in terms of when this makes it into Bluesky itself, but I have tests to write in the meantime. Does this change our thinking about whether or not to refactor the tag URL stuff, do you think? |
We should definitely try for AT URIs for all Bluesky objects, but I think that's orthogonal to tag URIs. Here, I'm open to either wrapping AT URIs in tag URIs, or support both as in #571 (comment) , but if we did decide to migrate everything off tag URIs, that should probably be a separate project. |
Makes sense! I think I'm actually inclined more towards keeping tag URIs as it gives us a degree of consistency between silos that might otherwise be a bit hard to grok (might be a me problem but I find navigating unfamiliar inheritance hierarchies quite difficult) |
A question about handles - there's a test failing, |
Yes! Bluesky handles are all domains. |
Ok thanks, I’ll need to change the existing implementation then :) |
Just to check though, handle would get translated to/from AS1 |
Yeah that’s my understanding! |
Hey @JoelOtter! Just checking in. Anything I can help with here? |
Hey, sorry, been away from my computer for a few days (turned 30!). Honestly I'm kind of struggling with this handle stuff - I branched off main to try and just do it against the existing stuff without my changes complicating matters but I still can't untangle it. I think there's a mixup between my assumption of everything being Bluesky and what the Granary code actually is which is intended to be forward-compatible with a future federated version of the service, hence the URLs sometimes being e.g. |
Definitely understood! You're right about the future proofing for federation, and even right now since the web frontend is separate from the PDS (backend). https://bsky.app/ is indeed the (official) web frontend, but there are already others. Likewise, https://bsky.social/ is the (official) PDS, but we'll start seeing others in prod as soon as they turn on federation. If I remember right, we're going with handles in Apart from that, doing the handle work on its own definitely makes sense. Want to jump on a call sometime? You're in the UK, right? Happy to carve out some time in the morning PT sometime if that would help. |
Yeah agreed on those points - the issue I think is that we have code to try and interpret handles from usernames or URLs, which is going to be very fragile no matter which way we slice it. I'd suggest we just require the username contain a valid handle and error out if it's missing - think that would simplify things quite a bit. I am in the UK! Though things are a bit manic throughout this summer just in terms of having lots of stuff on. I could do tomorrow (Saturday) evening if that's not too short-notice? |
@JoelOtter Yes! Tomorrow (Sat) works. 7-10p UK time (11a-2p PT) is my best window, but I can meet as early as 4p UK time (8a PT). Lmk when works for you, I can send an invite to the email in your GitHub profile. |
7pm UK time works well for me, let’s do it :) |
Thanks :) If you have the time, a few suggestions of tests that are currently missing would help me get the ball rolling. |
Sure! Off the top of my head:
|
I'm a bit confused by how Python's dependency resolution works WRT GitHub repos - from what I can tell I appear to be getting a really old version of lexrpc. Is there a recommended workflow for this - should I clone that repo? |
Ugh yeah I had to fix this in the bridgy-fed repo recently, I think the fix is to manually uninstall it first in the circle config. |
Ah it was in arroba: snarfed/arroba@07b18ea |
Hmm, how about locally? Does this mean it's a pip-cache issue? I could try clearing out that cache - I'd assumed there was one cache per venv but perhaps that is not the case UPDATE: Getting the same thing when creating a clean venv and running pip with UPDATE THE SECOND: Replacing lexrpc with a local version (latest I'm assuming this is a problem with my Python setup rather than with the project itself! |
Ugh Python envs are the worst. Sorry for the trouble. Glad you got it working! |
77caed3
to
5725ee0
Compare
Don't really understand why ATproto appear to have deleted half their docs |
Awesome! Will look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Noticed a few minor things that can be removed, but otherwise this seems right, let's ship it.
granary/bluesky.py
Outdated
'actor': obj.get('author'), | ||
'object': obj, | ||
'objectType': 'activity', | ||
'context': {'inReplyTo': obj.get('inReplyTo')}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to drop this, it's mostly vestigial, we don't really depend on it in Bridgy or anywhere else any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which bit, sorry? The non-activity case? What should I do instead, just replace the whole method with
return to_as1(post, type='app.bsky.feed.defs#feedViewPost')
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, just the context
field!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, gotcha - done :)
def _make_share(self, post, actor): | ||
return self._make_like_or_share(post, actor, 'share') | ||
|
||
def _make_like_or_share(self, post, actor, verb): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have like/share support in to_as1
, so we may want to use that eventually instead. Low priority though, definitely doesn't have to be in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah I might leave this for now just until I untangle what we need to do with actors/IDs/tags and whether that stuff ought to live in to_as1
or out here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! Yeah I think to_as1
can do most of these, especially if we start using its existing did and handle kwargs for actor, but we can do that later.
granary/tests/test_bluesky.py
Outdated
@@ -58,9 +58,15 @@ | |||
'description': 'hi there', | |||
} | |||
|
|||
POST_OBJ_AS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops!
granary/tests/test_bluesky.py
Outdated
@patch('requests.get') | ||
def test_get_comment(self, mock_get): | ||
mock_get.return_value = requests_response({ | ||
'$type': 'app.bsky.feed.defs#threadViewPost', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop this and replies
, they're supposed to be inside thread
instead of outside it, like in test_get_activities_with_replies
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good spot!
Looks great! Merging now. Thanks for sticking with this! |
profileViewDetailed
get_activities_response
, with parsing for replies, likes and sharesuser_to_actor
andget_comment
There's likely some weirdness/inconsistency here, particularly due to a) my initial confusion about AS activities vs. objects and b) I'm probably not using tag URIs everywhere I should be. It does seem to work though!
Tests incoming.