Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Bluesky changes needed for backfeed #571

Merged
merged 12 commits into from
Oct 23, 2023
Merged

Conversation

JoelOtter
Copy link
Contributor

  • handle profileViewDetailed
  • implement get_activities_response, with parsing for replies, likes and shares
  • add IDs to reposts
  • implement user_to_actor and get_comment

There's likely some weirdness/inconsistency here, particularly due to a) my initial confusion about AS activities vs. objects and b) I'm probably not using tag URIs everywhere I should be. It does seem to work though!

Tests incoming.

@snarfed
Copy link
Owner

snarfed commented Jul 24, 2023

Awesome! Looking now. And yeah sorry about the whole tag URI thing, seemed like a good idea at the time, something like 12 yrs ago. 🤷 Definitely not necessary in my book any more, hopefully didn't get in the way too much.

@snarfed
Copy link
Owner

snarfed commented Jul 24, 2023

Oh btw the vendored lexicons in lexicons/ are out of date, I assume we should update them? I can do that in a separate PR if you want.

Copy link
Owner

@snarfed snarfed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

My main thought is, let's just go with at:// URIs for ids, and ditch tag URIs? Is there anything in granary or Bridgy that required them?

Otherwise a few minor suggestions, but nothing else big.

if type in ('app.bsky.actor.defs#profileView',
'app.bsky.actor.defs#profileViewBasic'):
images = [{'url': obj.get('avatar')}]
banner = obj.get('banner')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was removing this intentional? Assuming not, let's keep it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I removed this because it's not used anywhere and it also doesn't parse properly- image is expected to be an object with a url member, not an array. I used mastodon.pyL490 as reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checking if my understanding is actually correct here as I think I've been looking at the AS2 spec by mistake. Is image in AS on a 'person' object supposed to be an array, an object, or either? I notice a lot of the Bluesky tests assume an array and that seems to match the as.json in the test data but it doesn't match what the Mastodon silo does so I'm a bit confused!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah yeah AS1 vs AS2 can be confusing. The two useful spec links for AS1 are https://activitystrea.ms/specs/json/schema/activity-schema.html and https://activitystrea.ms/specs/json/1.0/ . granary also extends/overloads AS1 a bit here and there when we've outgrown it, and we don't have an explicity catalog of those extensions.

In this case, technically AS1 says image is an object with a url field (https://activitystrea.ms/specs/json/1.0/#object , https://activitystrea.ms/specs/json/1.0/#media-link ), and/or a bare string URL (https://activitystrea.ms/specs/json/schema/activity-schema.html#rfc.section.4.6 ). We sometimes need an object to have multiple images, though, so we support both lists and singular objects. Examples:

(AS2 does this too, all of the properties not marked "Functional" can be either singular or plural. Whee! https://www.w3.org/TR/activitystreams-vocabulary/#h-properties )

We use a few helpers to deal with single vs plural, eg:

  • util.get_list always returns a list
  • util.get_list returns the first value
  • as1.get_object always returns an object, as1.get_objects always a list. bare string values get wrapped in an object as the id value
  • as1.object_urls returns a de-duped list of the url (singular) + urls (plural) values
    etc...

Sorry this isn't simpler!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also AS1 is only used internally, so we get to be a bit looser with it. We used to use it externally with Google+ and OStatus, and maybe others, but those are all long dead.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool cool. I think I'll need to make a corresponding change in bridgy itself:

diff --git a/pages.py b/pages.py
index 7d4098a2..c17f6779 100644
--- a/pages.py
+++ b/pages.py
@@ -197,10 +197,12 @@ def user(site, id):
         r.response['content'] = f'{r.actor.get("displayName") or ""} {phrase}.'
 
       # convert image URL to https if we're serving over SSL
-      image_url = r.actor.setdefault('image', {}).get('url')
+      # account for fact image might be a list
+      image = util.get_first(r.actor, 'image', {})
+      image_url = image.get('url')
       if image_url:
-        r.actor['image']['url'] = util.update_scheme(image_url, request)
-
+        image['url'] = util.update_scheme(image_url, request)
+        r.actor['image'] = image
       # generate original post links
       r.links = process_webmention_links(r)
       r.original_links = [util.pretty_link(url, new_tab=True)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! You probably want image_url = util.get_url(image) to handle bare string values as well as objects, otherwise looks good.

@@ -596,6 +593,7 @@ def to_as1(obj, type=None):
reason = obj.get('reason')
if reason and reason.get('$type') == 'app.bsky.feed.defs#reasonRepost':
ret = {
'id': obj.get('post').get('viewer').get('repost'),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind adding , {} to these gets to make them fully forgiving? eg:

Suggested change
'id': obj.get('post').get('viewer').get('repost'),
'id': obj.get('post', {}).get('viewer', {}).get('repost'),

@@ -645,6 +643,10 @@ class Bluesky(Source):
BASE_URL = 'https://bsky.app'
NAME = 'Bluesky'
TRUNCATE_TEXT_LENGTH = 300 # TODO: load from feed.post lexicon
URL_CANONICALIZER = util.UrlCanonicalizer(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is unused here, let's move it to Bridgy? The ones in meetup.py and twitter.py are only there because they're actually used in granary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do!

thread = resp.get('thread')
if thread:
ret = self._recurse_replies(thread)
return sorted(ret, key = lambda thread: thread.get('post').get('record').get('createdAt'))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto re , {}

granary/bluesky.py Show resolved Hide resolved
Returns: dict, AS1 like activity
"""
assert verb in ('like', 'share')
label = 'favorited' if verb == 'like' else 'reblogged'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, my fault originally, hoping to gradually standardize these terms now

Suggested change
label = 'favorited' if verb == 'like' else 'reblogged'
label = 'liked' if verb == 'like' else 'reposted'

@@ -709,9 +711,9 @@ def post_url(cls, handle, tid):
return f'{cls.user_url(handle)}/post/{tid}'

def get_activities_response(self, user_id=None, group_id=None, app_id=None,
activity_id=None, fetch_replies=False,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep these defaults as is? Esp since fetch_* all incur extra API calls.

Args:
uri: string, post uri

Returns: array, Bluesky app.bsky.feed.defs#threadViewPost
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit

Suggested change
Returns: array, Bluesky app.bsky.feed.defs#threadViewPost
Returns: list, Bluesky app.bsky.feed.defs#threadViewPost

def _recurse_replies(self, thread):
"""
Recurses through a Bluesky app.bsky.feed.defs#threadViewPost
and returns its replies as an array.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit

Suggested change
and returns its replies as an array.
and returns its replies as an list.

Args:
thread: dict, Bluesky app.bsky.feed.defs#threadViewPost

Returns: array, Bluesky app.bsky.feed.defs#threadViewPost
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit

Suggested change
Returns: array, Bluesky app.bsky.feed.defs#threadViewPost
Returns: list, Bluesky app.bsky.feed.defs#threadViewPost

@JoelOtter
Copy link
Contributor Author

My main thought is, let's just go with at:// URIs for ids, and ditch tag URIs? Is there anything in granary or Bridgy that required them?

Need to double check but I think yes, actually - the /comment, /like etc endpoints do actually parse tag URIs iirc

@snarfed
Copy link
Owner

snarfed commented Jul 27, 2023

Need to double check but I think yes, actually - the /comment, /like etc endpoints do actually parse tag URIs iirc

Ah! Right. Or the reverse, they extract post id(s) out of the URL and generate tag URIs from them to look up stored Responses, right? eg:

https://github.com/snarfed/bridgy/blob/8550bc286411bc4cfd6b541211cc4daa023c62db/handlers.py#L226

Would this work if granary.bluesky.Bluesky.tag_uri was implemented to just be the identity and return the post id (which is an at:// URI) as is? Method name isn't ideal, but otherwise...?

@JoelOtter
Copy link
Contributor Author

Oh btw the vendored lexicons in lexicons/ are out of date, I assume we should update them? I can do that in a separate PR if you want.

yes please, not familiar with that bit of the codebase!

@JoelOtter
Copy link
Contributor Author

Need to double check but I think yes, actually - the /comment, /like etc endpoints do actually parse tag URIs iirc

Ah! Right. Or the reverse, they extract post id(s) out of the URL and generate tag URIs from them to look up stored Responses, right? eg:

https://github.com/snarfed/bridgy/blob/8550bc286411bc4cfd6b541211cc4daa023c62db/handlers.py#L226

Would this work if granary.bluesky.Bluesky.tag_uri was implemented to just be the identity and return the post id (which is an at:// URI) as is? Method name isn't ideal, but otherwise...?

Tried this - unfortunately some places e.g. /propagate on the backend do use util.parse_tag_uri which prevents this from working.

Traceback (most recent call last):
  File "/Users/joel/Code/bridgy/tasks.py", line 581, in do_send_webmentions
    source_url = self.source_url(target)
  File "/Users/joel/Code/bridgy/tasks.py", line 795, in source_url
    _, response_id = util.parse_tag_uri(self.entity.key.string_id())

@snarfed
Copy link
Owner

snarfed commented Aug 1, 2023

yes please, not familiar with that bit of the codebase!

Done! Updated them in main.

Tried this - unfortunately some places e.g. /propagate on the backend do use util.parse_tag_uri which prevents this from working.

Ugh ok. What do you think we should do? Considering all the options, at:// URIs still seem like the right idea, so maybe we should relax the expectation that ids are all tag URIs, and abstract their parsing into something like a new models.Source.uri_id_to_post_id() in Bridgy (open to other name ideas) and then override that in Bridgy's bluesky.Bluesky. Not sure if I feel strongly though. Thoughts?

@JoelOtter
Copy link
Contributor Author

Tried this - unfortunately some places e.g. /propagate on the backend do use util.parse_tag_uri which prevents this from working.

Ugh ok. What do you think we should do? Considering all the options, at:// URIs still seem like the right idea, so maybe we should relax the expectation that ids are all tag URIs, and abstract their parsing into something like a new models.Source.uri_id_to_post_id() in Bridgy (open to other name ideas) and then override that in Bridgy's bluesky.Bluesky. Not sure if I feel strongly though. Thoughts?

Honestly my usual approach with this stuff, and my gut feeling here, is that consistency is better than reducing complexity in one specific bit of the codebase. In addition I think this would add additional abstraction across the whole codebase for the sake of reducing complexity in one small bit of it. IMO it's not worth the tradeoff right now - we'd be better getting rid of tag URLs wholesale across all of Granary later down the line (aware that would come with a migration necessity)

@snarfed
Copy link
Owner

snarfed commented Aug 2, 2023

Hmm, ok. So are you then proposing that we use tag URIs for Bluesky ids in granary/Bridgy? And convert back and forth to at:// URIs as necessary?

@JoelOtter
Copy link
Contributor Author

The tag URI contains the AT URI doesn't it? Or have I misunderstood what those tag URIs actually are? Worth noting I don't think we could use raw AT URIs everywhere anyway as e.g. a like doesn't have an AT URI

@snarfed
Copy link
Owner

snarfed commented Aug 2, 2023

Hmm, really? I think everything has an AT URI...? They may not always be exposed at the Bluesky API level, but we can probably still construct them from the user's DID, the collection, and the like ID/TID.

Regardless, as for tag URI, we make up each silo's tag URI format ourselves, so we can definitely just stick the whole AT URI in there if we want, true!

@JoelOtter
Copy link
Contributor Author

Yeah currently we create likes for bluesky by doing {post id}_liked_by_{user id}, then send that through the tag URI function, so if we were determined to strip tag URIs out we could just continue doing that. It's not a real AT proto URI though!

@snarfed
Copy link
Owner

snarfed commented Aug 2, 2023

Ah, ok. I remember you had the same problem with reposts, snarfed/bridgy#1453 (comment) , but eventually found their id. Looking at https://github.com/snarfed/atproto/blob/main/lexicons/app/bsky/feed/like.json , tid is in there, so a like's URI would be something like at://[DID]/app.bsky.feed.like/[TID]...but looking at https://github.com/snarfed/atproto/blob/main/lexicons/app/bsky/feed/getLikes.json , all we get from getLikes on a per like basis is actor and some timestamps? Ugh.

If we can find the like URI or TID anywhere else, it'd be ideal to standardize on AT URIs for everything (wrapped in tag URIs where necessary), but ok if it's not possible. I'll also ask on their developer Discord, https://discord.com/invite/3srmDsHSZJ , feel free to join me.

@snarfed
Copy link
Owner

snarfed commented Aug 2, 2023

@ericvolp12 is awesome, he jumped on this and added uri to the getLikes response! bluesky-social/atproto#1422

@JoelOtter
Copy link
Contributor Author

Fab :) We'll need it for getRepostedBy too - have asked!

I guess it's a lil dependent on timelines then in terms of when this makes it into Bluesky itself, but I have tests to write in the meantime. Does this change our thinking about whether or not to refactor the tag URL stuff, do you think?

@snarfed
Copy link
Owner

snarfed commented Aug 2, 2023

We should definitely try for AT URIs for all Bluesky objects, but I think that's orthogonal to tag URIs. Here, I'm open to either wrapping AT URIs in tag URIs, or support both as in #571 (comment) , but if we did decide to migrate everything off tag URIs, that should probably be a separate project.

@JoelOtter
Copy link
Contributor Author

Makes sense! I think I'm actually inclined more towards keeping tag URIs as it gives us a degree of consistency between silos that might otherwise be a bit hard to grok (might be a me problem but I find navigating unfamiliar inheritance hierarchies quite difficult)

@JoelOtter
Copy link
Contributor Author

A question about handles - there's a test failing, test_from_as1_post_with_author, which fails on handle format - bsky.app/profile/alice.com vs alice.com. I think the correct handle here is alice.com? My reading of the spec is that handles should be a domain format, e.g. alice.bsky.social or alice.com. What do you think?

@snarfed
Copy link
Owner

snarfed commented Aug 6, 2023

Yes! Bluesky handles are all domains.

@JoelOtter
Copy link
Contributor Author

Ok thanks, I’ll need to change the existing implementation then :)

@snarfed
Copy link
Owner

snarfed commented Aug 6, 2023

Just to check though, handle would get translated to/from AS1 username, right? id would be DID, at least for actors, and url would stay the full URL that's viewable in a browser?

@JoelOtter
Copy link
Contributor Author

Yeah that’s my understanding!

@snarfed
Copy link
Owner

snarfed commented Aug 14, 2023

Hey @JoelOtter! Just checking in. Anything I can help with here?

@JoelOtter
Copy link
Contributor Author

Hey, sorry, been away from my computer for a few days (turned 30!). Honestly I'm kind of struggling with this handle stuff - I branched off main to try and just do it against the existing stuff without my changes complicating matters but I still can't untangle it. I think there's a mixup between my assumption of everything being Bluesky and what the Granary code actually is which is intended to be forward-compatible with a future federated version of the service, hence the URLs sometimes being e.g. alice.com instead of bsky.app/profile/alice.com, that kind of thing. Would love some help on that if you have time to look at it.

@snarfed
Copy link
Owner

snarfed commented Aug 15, 2023

Definitely understood! You're right about the future proofing for federation, and even right now since the web frontend is separate from the PDS (backend). https://bsky.app/ is indeed the (official) web frontend, but there are already others. Likewise, https://bsky.social/ is the (official) PDS, but we'll start seeing others in prod as soon as they turn on federation.

If I remember right, we're going with handles in username, DIDs and at:// URIs in id, and https://bsky.app/... URLs in url, right?

Apart from that, doing the handle work on its own definitely makes sense. Want to jump on a call sometime? You're in the UK, right? Happy to carve out some time in the morning PT sometime if that would help.

@JoelOtter
Copy link
Contributor Author

Yeah agreed on those points - the issue I think is that we have code to try and interpret handles from usernames or URLs, which is going to be very fragile no matter which way we slice it. I'd suggest we just require the username contain a valid handle and error out if it's missing - think that would simplify things quite a bit.

I am in the UK! Though things are a bit manic throughout this summer just in terms of having lots of stuff on. I could do tomorrow (Saturday) evening if that's not too short-notice?

@snarfed
Copy link
Owner

snarfed commented Aug 18, 2023

@JoelOtter Yes! Tomorrow (Sat) works. 7-10p UK time (11a-2p PT) is my best window, but I can meet as early as 4p UK time (8a PT). Lmk when works for you, I can send an invite to the email in your GitHub profile.

@JoelOtter
Copy link
Contributor Author

7pm UK time works well for me, let’s do it :)

@JoelOtter
Copy link
Contributor Author

Thanks :) If you have the time, a few suggestions of tests that are currently missing would help me get the ball rolling.

@snarfed
Copy link
Owner

snarfed commented Sep 20, 2023

Sure! Off the top of my head:

  • get_activities with:
    • default params
    • fetch_shares=True, with mocked out API response with some reposts
    • fetch_replies=True, with similar mocked API response
    • fetch_replies=True, with replies to replies to exercise _recurse_replies
    • fetch_likes=True, same
  • get_comment

@JoelOtter
Copy link
Contributor Author

I'm a bit confused by how Python's dependency resolution works WRT GitHub repos - from what I can tell I appear to be getting a really old version of lexrpc. Is there a recommended workflow for this - should I clone that repo?

@snarfed
Copy link
Owner

snarfed commented Oct 7, 2023

Ugh yeah I had to fix this in the bridgy-fed repo recently, I think the fix is to manually uninstall it first in the circle config.

@snarfed
Copy link
Owner

snarfed commented Oct 7, 2023

Ah it was in arroba: snarfed/arroba@07b18ea

@JoelOtter
Copy link
Contributor Author

JoelOtter commented Oct 8, 2023

Hmm, how about locally? Does this mean it's a pip-cache issue? I could try clearing out that cache - I'd assumed there was one cache per venv but perhaps that is not the case

UPDATE: Getting the same thing when creating a clean venv and running pip with --no-cache-dir

UPDATE THE SECOND: Replacing lexrpc with a local version (latest main) makes it work again.

I'm assuming this is a problem with my Python setup rather than with the project itself!

@snarfed
Copy link
Owner

snarfed commented Oct 8, 2023

Ugh Python envs are the worst. Sorry for the trouble. Glad you got it working!

@JoelOtter JoelOtter force-pushed the main branch 2 times, most recently from 77caed3 to 5725ee0 Compare October 21, 2023 11:56
@JoelOtter
Copy link
Contributor Author

Don't really understand why ATproto appear to have deleted half their docs </grumble>

@JoelOtter
Copy link
Contributor Author

OK @snarfed I think this might be ready for review! Have added the tests. It needs rebased on #616 once that goes in

@snarfed
Copy link
Owner

snarfed commented Oct 22, 2023

Awesome! Will look.

Copy link
Owner

@snarfed snarfed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Noticed a few minor things that can be removed, but otherwise this seems right, let's ship it.

'actor': obj.get('author'),
'object': obj,
'objectType': 'activity',
'context': {'inReplyTo': obj.get('inReplyTo')},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to drop this, it's mostly vestigial, we don't really depend on it in Bridgy or anywhere else any more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which bit, sorry? The non-activity case? What should I do instead, just replace the whole method with

return to_as1(post, type='app.bsky.feed.defs#feedViewPost')

?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, just the context field!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha - done :)

def _make_share(self, post, actor):
return self._make_like_or_share(post, actor, 'share')

def _make_like_or_share(self, post, actor, verb):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have like/share support in to_as1, so we may want to use that eventually instead. Low priority though, definitely doesn't have to be in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I might leave this for now just until I untangle what we need to do with actors/IDs/tags and whether that stuff ought to live in to_as1 or out here?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! Yeah I think to_as1 can do most of these, especially if we start using its existing did and handle kwargs for actor, but we can do that later.

@@ -58,9 +58,15 @@
'description': 'hi there',
}

POST_OBJ_AS = {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops!

@patch('requests.get')
def test_get_comment(self, mock_get):
mock_get.return_value = requests_response({
'$type': 'app.bsky.feed.defs#threadViewPost',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can drop this and replies, they're supposed to be inside thread instead of outside it, like in test_get_activities_with_replies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good spot!

@snarfed
Copy link
Owner

snarfed commented Oct 23, 2023

Looks great! Merging now. Thanks for sticking with this!

@snarfed snarfed merged commit 514f8ce into snarfed:main Oct 23, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants