feat: P2P ACP #3317

fredcarle · 2024-12-11T18:47:32Z

Relevant issue(s)

Resolves #3179

Description

This PR brings ACP functionality to the P2P system. It ensures that permissioned collections do not share blocks of registered documents unless the requesting node has been verified to have access. The check is bypassed on collection replication because the sending node is explicitly pushing the blocks to peers it has chosen.

Tasks

I made sure the code is well commented, particularly hard-to-understand areas.
I made sure the repository-held documentation is changed accordingly.
I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

Some new and updated integration tests.

Specify the platform(s) on which this was tested:

MacOS

codecov · 2024-12-11T19:04:42Z

Codecov Report

Attention: Patch coverage is 65.15152% with 92 lines in your changes missing coverage. Please review.

Project coverage is 77.85%. Comparing base (e1502c5) to head (eae46e0).

Files with missing lines	Patch %	Lines
net/server.go	59.85%	40 Missing and 15 partials ⚠️
net/grpc.go	45.45%	11 Missing and 1 partial ⚠️
net/client.go	52.63%	7 Missing and 2 partials ⚠️
acp/identity/identity.go	63.64%	6 Missing and 2 partials ⚠️
internal/db/permission/check.go	88.89%	2 Missing and 1 partial ⚠️
net/errors.go	0.00%	2 Missing ⚠️
internal/db/db.go	85.71%	1 Missing ⚠️
net/sync_dag.go	80.00%	0 Missing and 1 partial ⚠️
node/node.go	93.75%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3317      +/-   ##
===========================================
- Coverage    78.18%   77.85%   -0.33%     
===========================================
  Files          382      382              
  Lines        35448    35663     +215     
===========================================
+ Hits         27712    27763      +51     
- Misses        6106     6226     +120     
- Partials      1630     1674      +44

Flag	Coverage Δ
all-tests	`77.85% <65.15%> (-0.33%)`	⬇️

Files with missing lines	Coverage Δ
http/auth.go	`57.89% <100.00%> (-7.73%)`	⬇️
net/peer.go	`76.59% <100.00%> (+0.23%)`	⬆️
internal/db/db.go	`66.80% <85.71%> (+0.28%)`	⬆️
net/sync_dag.go	`60.00% <80.00%> (+0.74%)`	⬆️
node/node.go	`60.43% <93.75%> (+3.07%)`	⬆️
net/errors.go	`57.14% <0.00%> (-9.52%)`	⬇️
internal/db/permission/check.go	`87.14% <88.89%> (+1.10%)`	⬆️
acp/identity/identity.go	`67.74% <63.64%> (+0.17%)`	⬆️
net/client.go	`83.64% <52.63%> (-16.36%)`	⬇️
net/grpc.go	`56.00% <45.45%> (-7.33%)`	⬇️
... and 1 more

... and 18 files with indirect coverage changes

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1502c5...eae46e0.

islamaliev

Looks good overall. Good job, Fred!

I left some comments and suggestions.

In general, I think it would also be nice to have it written down somewhere how the new bitswap-level ACP works together with the pubsub KMS. Otherwise it's not clear when to use which.

internal/db/permission/check.go

islamaliev · 2024-12-12T06:46:52Z

net/acp.go

+type ACP interface {
+	acp.ACP
+	// GetCollections returns the list of collections according to the given options.
+	GetCollections(ctx context.Context, opts client.CollectionFetchOptions) ([]client.Collection, error)
+	// GetIdentityToken returns an identity token for the given audience.
+	GetIdentityToken(ctx context.Context, audience immutable.Option[string]) ([]byte, error)
+	// GetNodeIdentity returns the node's public raw identity.
+	GetNodeIdentity(ctx context.Context) (immutable.Option[identity.PublicRawIdentity], error)
+}


suggestion: GetCollections has nothing to do with ACP. I'd suggest renaming this interface to something more suitable. I can't suggest anything more elegant than CollectionsAndACPProvider.

Or you can segregate these interfaces, i.e. move GetCollections to another CollectionsProvider interface.

Update: after seeing how it's being used, I don't think it makes much sense for this interface to embed acp.ACP. You can pass acp.ACP to a new peer as is, without wrapping it with NewNetACP, and just instantiate a struct that implements this interface and provides node (or DB) information. These two really have no reason to be together other than that the Peer needs this information, but it can get it from two different objects.

In my opinion, wrappers make it a bit harder to understand the execution flow. So if we could avoid using an additional layer, I think it would be better.

I'll try to improve this.

Let me know what you think of the change.

net/server.go

tests/integration/acp.go

net/acp.go

tests/integration/acp/p2p/create_test.go

islamaliev · 2024-12-12T08:02:43Z

tests/integration/acp/p2p/create_test.go

+	test := testUtils.TestCase{
+		Description: "Test acp, p2p create private documents on different nodes, with source-hub",
+		SupportedACPTypes: immutable.Some(
+			[]testUtils.ACPType{
+				testUtils.SourceHubACPType,
+			},
+		),
+		Actions: []any{


question: so the only thing that makes this test different from the KMS ones is that you don't activate KMS here, right?

That seems to be the most important difference yes.

net/server.go

islamaliev · 2024-12-12T08:27:57Z

net/peer.go

@@ -149,6 +150,10 @@ func NewPeer(
 		return nil, err
 	}

+	bswapnet := network.NewFromIpfsHost(h, ddht)
+	bswap := bitswap.New(ctx, bswapnet, blockstore, bitswap.WithPeerBlockRequestFilter(p.server.hasAccess))


question: don't we want to have an option to turn it off? Maybe users would want just pubsub KMS or Orbis on its own. It might also be useful for testing to be able to turn things on and off.

I don't think we want to turn this off no. It's not a replacement for KMS or Orbis. If users want to turn it off they can turn off ACP. Having ACP on and this off would be like locking the front door of your house but having a garden level window wide open where anyone can get in. It doesn't really make sense.
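
For readers following the thread, here is a minimal sketch of the kind of predicate the diff above passes to bitswap.WithPeerBlockRequestFilter. It assumes the filter is called once per (peer, block) request and returns true when the block may be sent. PeerID, CID, accessChecker and its fields are hypothetical stand-ins, not the types used in net/server.go.

```go
package main

import "fmt"

// PeerID and CID are hypothetical stand-ins for the libp2p peer ID
// and the block CID types used by the real filter.
type PeerID string
type CID string

// accessChecker is a hypothetical stand-in for the server-side ACP check.
type accessChecker struct {
	acpEnabled bool
	// registered maps a block CID to the peers allowed to read it.
	// A CID missing from this map belongs to a public (unregistered) document.
	registered map[CID]map[PeerID]bool
}

// hasAccess has the shape of a peer block request filter: it returns true
// if the block identified by c may be sent to peer p.
func (a *accessChecker) hasAccess(p PeerID, c CID) bool {
	if !a.acpEnabled {
		// With ACP off there is nothing to enforce.
		return true
	}
	allowed, isRegistered := a.registered[c]
	if !isRegistered {
		// Public documents are shared with anyone.
		return true
	}
	return allowed[p]
}

func main() {
	a := &accessChecker{
		acpEnabled: true,
		registered: map[CID]map[PeerID]bool{"private": {"alice": true}},
	}
	fmt.Println(a.hasAccess("alice", "private"))
	fmt.Println(a.hasAccess("bob", "private"))
	fmt.Println(a.hasAccess("bob", "public"))
}
```

In this sketch the only way to disable the filter is to disable ACP itself, which mirrors the position taken in the reply above.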

internal/db/permission/check.go

shahzadlone · 2024-12-17T17:02:07Z

internal/db/db.go

@@ -104,7 +104,7 @@ func NewDB(
 	acp immutable.Option[acp.ACP],
 	lens client.LensRegistry,
 	options ...Option,
-) (client.DB, error) {
+) (*db, error) {


question: Why was this needed/done?

Ah is it because net.DB doesn't implement client.DB as GetNodeIdentityToken method is missing?

There are a few reasons. One of them is because otherwise we have to make the client.DB API support everything even though some methods may just be useful internally. The maintenance overhead of that is large since we have to update the CLI and HTTP clients and test wrappers every time we update the interface. On top of that, there is no good reason to return an interface if the returned value is always of the same type. It's an unnecessary abstraction.

> net.DB doesn't implement client.DB as GetNodeIdentityToken method is missing

This feels regressive, and to me it looks like we are forgetting the lessons learnt before client.DB was implemented by all packages - allowing the 3 clients to diverge (they now diverge in both directions, it is no longer a case of CLI contains embedded+some, CLI is now missing embedded stuff).

Returning a private type here, with public functions not declared in the client package, also smashes the reasoning behind the client package in the first place. GetNodeIdentityToken is essentially a hidden public function that lives in an unexpected location, one that we are very likely to forget about when making breaking changes (think semver). It is also unlikely to be documented to the same standard as client members.

The multiple clients act as complexity multipliers. The greater the divergence between clients, the greater that multiplier is. This increases our maintenance costs (including documentation) and increases the learning required by users, in a way that is far greater than the immediate X lines of code added.

Perhaps we can visualise this with (baseClient being the embedded client, ignoring purge command)(^ meaning to-the-power-of, not bit toggling stuff):

totalComplexity = complexityOfBaseClient*(divergence^numberOfClients)

IMO we need to be very very careful doing stuff like this, and seeing this so soon after another divergence (purge CLI command), makes it feel like we are falling back towards the time-sink that was our 3 un-interfaced, separate clients without making a conscious decision to do so.

I feel quite strongly about this, as I think it can get very expensive if we continue in this direction. But as I am basically on holiday for the next week I don't want to mark this as a todo/request-changes.

Maybe we can have the following:

todo: discuss this in the next standup (I probs won't be there)

I actually did the change because I believe returning an interface where only one concrete type is ever returned is a needlessly restrictive abstraction.

The goal of an interface is to ensure that a given type implements a set of methods. Not to limit that type to the set of methods. There is a set of functionalities that we decided had to be covered by all clients/types (Go, CLI, HTTP) and that's where the interface is useful. If anything changes with those functionalities, it needs to change for all clients.

There are some functionalities that will only ever make sense for one client. Purge for example only really makes sense for a server instance so it would be unnecessary overhead to require it for all clients through the client.DB interface.

What I think we can do is have a discussion on whether we want a given functionality to be part of the interface or not. But I really do think that some divergence between server and embedded is quite ok when it makes sense.

I see the signature of concrete type as an interface, by exposing the maximalist concrete interface you expose consuming code to problems in swapping it out for something else (essentially reducing the usefulness of defining the actual interface-interface).

That aside, IMO interface segregation goes beyond that one liner. By segregating via interfaces you control the number of pathways through the codebase and reduce complexity.

We don't want our users to be able to call all our internal funcs, and they probably don't want to see them either.

We end up with a public function that returns the public interface. The user-code that consumes that type only ever directly depends-on/has-access-to the client/interface funcs, instead of a maximal concrete implementation.

The contract/interface of the public NewDB function then also remains identical to all other NewDB functions, allowing them to be used interchangeably without the swap being a breaking change.

> I see the signature of concrete type as an interface, by exposing the maximalist concrete interface you expose consuming code to problems in swapping it out for something else (essentially reducing the usefulness of defining the actual interface-interface).

Why would internal/DB be swapped for something else? And if it is, what prevents it from supporting the client.DB interface?

> That aside, IMO interface segregation goes beyond that one liner. By segregating via interfaces you control the number of pathways through the codebase and reduce complexity.

That one maybe. But I find that overly constraining to the point where one has to create more complex patterns to bypass the restrictions.

> We end up with a public function that returns the public interface. The user-code that consumes that type only ever directly depends-on/has-access-to the client/interface funcs, instead of a maximal concrete implementation.

We don't. NewDB is an internal function. Users don't have access to it.

> The contract/interface of the public NewDB function then also remains identical to all other NewDB functions, allowing them to be used interchangeably without the swap being a breaking change.

There is only one NewDB function. But even if we created many other DB alternatives, the important part is that those DB types cover the client.DB interface so that our http and cli clients along with the other internal types that need client.DB can use the client.DB methods. It doesn't matter that those other DB types provide more features.

While I agree with the points raised by Andy, I don't think it's worth blocking the PR over. Especially in light of us doing the "moving fast experiment", I'm happy for this discussion to be resolved outside this PR. Just noting that the private helper function that was being called, newDB, was already using this signature before (irrelevant to the discussion, I know).

Side question: I noticed that since the discussion thread was started, *db has been made public (*DB). I'm curious about the reasons behind that?

> We don't. NewDB is an internal function. Users don't have access to it.

FFS lol - I forgot we moved it, sorry. 90% of me bothering about this is because I thought it was still a public func and you were exposing internals to users.

> There is only one NewDB function

There are ones for the other 2 clients, and one from the node package no? They may have different names atm, but they are all basically the same func (and should probs have the same name).

> Why would internal/DB be swapped for something else?

The callers to the NewDB func(s) may wish to swap them out - this becomes much harder if the type changes and extra/non-interface stuff on the return value is used.

But anyway, this is just internal stuff, the extra exposure does make stuff easier, and we (in theory :)) should really be able to tell what the newly exposed stuff does anyway.

Happy for this convo to be resolved lol

Just adding for historical sake since I think Andy has dropped this (and as shahzad pointed out, we're aiming not to get bogged down during this cycle) - I think adding compile time checks to the db package that make sure the DB type implements the client.DB is all that is really necessary. That along with the downstream callers (internal to the repo) having the option to assign/cast to client.DB for their own safety if they want should cover us for most of our cases.
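
A minimal sketch of the compile-time check mentioned above, assuming a trimmed-down interface. ClientDB, DB and their methods here are hypothetical stand-ins for client.DB and the concrete type in internal/db, not the real definitions.

```go
package main

import "fmt"

// ClientDB is a hypothetical, trimmed-down stand-in for client.DB.
type ClientDB interface {
	GetNodeIdentity() (string, bool)
}

// DB is a stand-in for the concrete type returned by NewDB.
type DB struct {
	nodeIdentity string
}

// GetNodeIdentity is part of the ClientDB interface.
func (db *DB) GetNodeIdentity() (string, bool) {
	return db.nodeIdentity, db.nodeIdentity != ""
}

// GetIdentityToken is an extra method that is not part of ClientDB.
func (db *DB) GetIdentityToken(audience string) string {
	return db.nodeIdentity + "->" + audience
}

// Compile-time assertion: the build fails if *DB stops satisfying ClientDB.
var _ ClientDB = (*DB)(nil)

// NewDB returns the concrete type, so callers may use the extra methods.
func NewDB(id string) *DB {
	return &DB{nodeIdentity: id}
}

func main() {
	db := NewDB("node-1")

	// Callers that only want the interface surface can narrow explicitly.
	var c ClientDB = db
	id, ok := c.GetNodeIdentity()
	fmt.Println(id, ok)

	// Callers that need the extra method keep the concrete type.
	fmt.Println(db.GetIdentityToken("peer-2"))
}
```

The assertion costs nothing at runtime, and the explicit assignment to ClientDB gives downstream callers the narrowing described above.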

Fred and I briefly discussed this whole return type a few weeks ago, and one thing it raised is that we currently don't have a proper "embedded" client flow ever since we moved the db package to internal. The way the integration tests access the embedded flow is to instantiate a node.Node instance and then get the client.DB instance from it. But we should have a dedicated, public flow for instantiating the client.DB without the complexity of the node.New constructor, as follow-up work to this.

shahzadlone · 2024-12-17T17:03:30Z

internal/db/db.go

@@ -339,13 +339,20 @@ func (db *db) DeleteDocActorRelationship(
 	return client.DeleteDocActorRelationshipResult{RecordFound: recordFound}, nil
 }

-func (db *db) GetNodeIdentity(context.Context) (immutable.Option[identity.PublicRawIdentity], error) {
+func (db *db) GetNodeIdentity(_ context.Context) (immutable.Option[identity.PublicRawIdentity], error) {


question: What is with the underscore _? Why was this added?

Because just context.Context seems like a mistake. The underscore says that the function parameters are this way because we're following some interface pattern or something like that but we aren't using this parameter within the function.

Is this a known Go idiom?

It is the same as the other areas we use _ to ignore a returned argument. https://go.dev/ref/spec#Blank_identifier

The only reason why just using context.Context worked is because it was the only parameter in the method. If there was more than one, it would be a syntax error.

Unused function parameters are allowed in Go and that's why we often have ctx context.Context as parameter with ctx never being used. I feel that's misleading. Doing _ context.Context at least tells the reader that this parameter needed to be there but we aren't using it.
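
A small sketch of the idiom under discussion. IdentityProvider, staticProvider and nodeID are hypothetical names used only for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// IdentityProvider is a hypothetical interface that requires a context.
type IdentityProvider interface {
	GetNodeIdentity(ctx context.Context) (string, error)
}

type staticProvider struct {
	id string
}

// GetNodeIdentity must accept a context to satisfy the interface, but it
// does no I/O, so the parameter is named `_` to show it is deliberately unused.
func (p staticProvider) GetNodeIdentity(_ context.Context) (string, error) {
	return p.id, nil
}

// nodeID is a small helper that calls through the interface.
func nodeID(p IdentityProvider) string {
	id, err := p.GetNodeIdentity(context.Background())
	if err != nil {
		return ""
	}
	return id
}

func main() {
	fmt.Println(nodeID(staticProvider{id: "node-1"}))
}
```

Go requires the parameters in a list to be either all named or all unnamed, so `_` also keeps the signature valid once a second, named parameter is added, as in the GetIdentityToken method in this PR.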

shahzadlone · 2024-12-17T17:26:00Z

internal/db/db.go

+func (db *db) GetIdentityToken(_ context.Context, audience immutable.Option[string]) ([]byte, error) {
+	if db.nodeIdentity.HasValue() {
+		return db.nodeIdentity.Value().NewToken(time.Hour*24, audience, immutable.None[string]())
+	}
+	return nil, nil
+}


suggestion: Remove the context.Context arg? doesn't seem to be needed anywhere?

It's a pattern we decided to keep after a discussion we had several months back about removing context. Happy to bring that up for discussion but for now leaving as is.

shahzadlone · 2024-12-17T17:36:02Z

internal/db/permission/check.go

+// - Document is public (unregistered), whether signatured request or not doesn't matter.
+func CheckDocAccessWithIdentityFunc(
+	ctx context.Context,
+	identityFunc func() immutable.Option[acpIdentity.Identity],


question: Why did we need to do this function thing? instead of just immutable.Option[acpIdentity.Identity]

Because in some cases it doesn't make sense to do an extra network call for the identity before knowing if the document is even registered.

Can you explain a bit more? Which document is registered check are we talking about? and which network call for identity?

In the net package, when we want to check if a peer has access to a doc we have to request the identity from the node requesting. That's a network call. If we request it before checking that the document is registered, then we might have done that network call for nothing. So the order of priority should be checking that the doc is registered and then requesting the identity to check if it has access.
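
The ordering described above can be sketched as follows. This is a simplified illustration, not the real CheckDocAccessWithIdentityFunc: the actual function takes a context and an immutable.Option, while Identity, checkDocAccess and countingFetcher here are hypothetical.

```go
package main

import "fmt"

// Identity is a hypothetical stand-in for acpIdentity.Identity.
type Identity struct {
	DID string
}

// checkDocAccess takes a function instead of a value so that the possibly
// expensive identity lookup only happens for registered documents.
func checkDocAccess(
	isRegistered bool,
	identityFunc func() (Identity, bool),
	allowed map[string]bool,
) bool {
	if !isRegistered {
		// Public document: identityFunc is never called.
		return true
	}
	id, ok := identityFunc()
	if !ok {
		// No identity available for a registered document: deny.
		return false
	}
	return allowed[id.DID]
}

// countingFetcher simulates the network call and counts how often it runs.
type countingFetcher struct {
	calls int
	id    Identity
}

func (f *countingFetcher) fetch() (Identity, bool) {
	f.calls++
	return f.id, true
}

func main() {
	f := &countingFetcher{id: Identity{DID: "did:example:alice"}}
	allowed := map[string]bool{"did:example:alice": true}

	fmt.Println(checkDocAccess(false, f.fetch, allowed), f.calls)
	fmt.Println(checkDocAccess(true, f.fetch, allowed), f.calls)
}
```

The counter makes the saving visible: the lookup is skipped for the unregistered document and runs once for the registered one.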

Is the identity not included from the node requesting from the get go? Maybe I am missing something, why does it need to be requested again?

I'd also like clarification here :)

net/grpc.go

node/node.go

internal/db/permission/check.go

net/server.go

shahzadlone · 2024-12-17T18:25:14Z

net/server.go

+	)
+	if err != nil {
+		log.ErrorE("Failed to check access", err)
+		return true


question: If access checking fails we will assume they have access?

It's not they. It's self. It's a best-effort check. If the check fails, the node receiving the request can make the decision.

I'm still not sure I follow. If the local receiving node (self) fails to check whether it has access, then the check will assume it does have access, and the sender node can still decide whether it wants to share or not?

Yes. The initial block received is just a composite so there is no protected info within it. The node receiving checks if it has access to the document to see if it's worth requesting the other block. Ultimately, the node sending is responsible for the verification.

Ah, I get it now: the best-effort check is there so the receiver doesn't make unnecessary requests (for its own sake, so it doesn't waste effort if it doesn't have access to the other blocks). I forget if this is documented somewhere already, but it might be worth having a one-line comment explaining this near here.
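
A minimal sketch of the best-effort semantics discussed in this thread, assuming the check either completes with a verdict or fails with an error. The checkFunc type and the three sample checks are hypothetical; only the name trySelfHasAccess comes from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// checkFunc is a hypothetical access check. It reports whether access is
// granted, or returns an error if the check could not be completed.
type checkFunc func(docID string) (bool, error)

// trySelfHasAccess returns false only when the check completes and
// explicitly denies access. On error it returns true, so the sync carries
// on and the sending node remains responsible for enforcement.
func trySelfHasAccess(check checkFunc, docID string) bool {
	hasAccess, err := check(docID)
	if err != nil {
		// Undetermined: do not block the sync on a failed local check.
		return true
	}
	return hasAccess
}

var (
	granted checkFunc = func(string) (bool, error) { return true, nil }
	denied  checkFunc = func(string) (bool, error) { return false, nil }
	failing checkFunc = func(string) (bool, error) {
		return false, errors.New("acp unavailable")
	}
)

func main() {
	fmt.Println(trySelfHasAccess(granted, "doc-1"))
	fmt.Println(trySelfHasAccess(denied, "doc-1"))
	fmt.Println(trySelfHasAccess(failing, "doc-1"))
}
```

Only the explicit denial stops the receiver from requesting further blocks. The other two outcomes proceed, and the sender makes the final decision.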

net/server.go

AndrewSisley · 2024-12-18T16:21:33Z

net/server.go

+
+// trySelfHasAccess checks if the local node has access to the given block.
+//
+// This is a best-effort check and returns true unless we explicitly find that we don't have access.


todo: Please document why we are even bothering to call/define this function if a 'best effort' is correct enough.

If the reason is that a proper check will be carried out somewhere else (e.g. another node), please also document why false-negatives are either impossible, or are tolerable, as it looks like we could refuse to process a block that should actually be processed.

> as it looks like we could refuse to process a block that should actually be processed.

I think you probably meant the opposite?

This method is for self check of access so it will only block processing if it can confirm no granted access. In any other case it will continue with the sync process.

> I think you probably meant the opposite?

Nope.

This function looks like it can falsely claim that we don't have access (e.g. local node doesn't have access (yet), but the other remote node does), in which case the block will not be processed.

> This function looks like it can falsely claim that we don't have access (e.g. local node doesn't have access (yet), but the other remote node does), in which case the block will not be processed.

"Doesn't have access yet" is a correct claim that we don't have access. It doesn't matter what the remote node has access to. Trying to process a block that we know we don't have access to will just result in the remote node not giving us access anyway.

Ah okay - can you add that to the func docs, along with a bit detailing as to why a 'best-effort' is relevant/okay here?

It is documented already no?

Ah yes it has been since I opened this thread, thanks! LGTM :)

nasdf

Implementation looks solid. One blocking issue to resolve.

nasdf · 2024-12-30T17:31:52Z

net/peer.go

 	bus *event.Bus,
+	acp immutable.Option[acp.ACP],


thought: would it be safer to use the same ACP implementation as the DB? For example: db.GetACP() instead of the parameter here so that there's no possibility for misconfiguration.

nasdf · 2024-12-30T17:35:27Z

net/server.go

@@ -447,3 +460,135 @@ func (s *server) SendPubSubMessage(
 	}
 	return t.Publish(ctx, data)
 }
+
+// hasAccess checks if the requesting peer has access to the given ci.


typo: cid instead of ci

nasdf · 2024-12-30T17:59:35Z

net/server.go

+	}
+
+	// If the requesting peer is in the replicators list for that collection, then they have access.
+	if peerList, ok := s.replicators[cols[0].SchemaRoot()]; ok {


todo: is this thread safe?
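
The thread does not show how s.replicators is guarded, so the following is only one possible way to make such a lookup safe for concurrent use. It assumes a map from schema root to a set of peer IDs; replicatorSet and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// replicatorSet is a hypothetical wrapper that guards a
// schemaRoot -> set-of-peer-IDs map with a read/write mutex.
type replicatorSet struct {
	mu           sync.RWMutex
	bySchemaRoot map[string]map[string]struct{}
}

func newReplicatorSet() *replicatorSet {
	return &replicatorSet{bySchemaRoot: make(map[string]map[string]struct{})}
}

// add registers peerID as a replicator for schemaRoot.
func (r *replicatorSet) add(schemaRoot, peerID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	peers, ok := r.bySchemaRoot[schemaRoot]
	if !ok {
		peers = make(map[string]struct{})
		r.bySchemaRoot[schemaRoot] = peers
	}
	peers[peerID] = struct{}{}
}

// has reports whether peerID is a replicator for schemaRoot.
// Readers take the shared lock, so lookups do not block each other.
func (r *replicatorSet) has(schemaRoot, peerID string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	_, ok := r.bySchemaRoot[schemaRoot][peerID]
	return ok
}

func main() {
	r := newReplicatorSet()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.add("schema-a", fmt.Sprintf("peer-%d", i))
			_ = r.has("schema-a", "peer-0")
		}(i)
	}
	wg.Wait()
	fmt.Println(r.has("schema-a", "peer-3"))
}
```

Running a program like this with `go run -race` is a quick way to confirm that reads and writes to the map are properly synchronized.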

shahzadlone

LGTM, thanks for answering all the questions. I just left some minor suggestions. Please do resolve the ongoing discussions with the others before merging.

todo: Please make the PR title more descriptive. Right now it only says P2P ACP, which is quite vague; for example, it can be confused with the initial P2P ACP work that was done for #2366.

fredcarle added the feature New feature or request label Dec 11, 2024

fredcarle requested a review from a team December 11, 2024 18:47

fredcarle self-assigned this Dec 11, 2024

fredcarle force-pushed the fredcarle/feat/3179-p2p-acp branch from 25d14f1 to 3de2aaf Compare December 11, 2024 20:40

islamaliev reviewed Dec 12, 2024

shahzadlone reviewed Dec 12, 2024

fredcarle requested review from shahzadlone and islamaliev December 12, 2024 22:44

fredcarle force-pushed the fredcarle/feat/3179-p2p-acp branch 3 times, most recently from d35d257 to eae46e0 Compare December 13, 2024 02:36

shahzadlone reviewed Dec 17, 2024

shahzadlone reviewed Dec 17, 2024

shahzadlone reviewed Dec 17, 2024

AndrewSisley reviewed Dec 18, 2024

AndrewSisley reviewed Dec 18, 2024

fredcarle added 6 commits December 20, 2024 14:08

add P2P ACP

c1a0e05

remove unused function

de1bf0e

apply feedback

6ada932

change db to internal public DB

18f4f13

clean up for net.DB interface

ace6e9b

apply feedback

137e97e

fredcarle force-pushed the fredcarle/feat/3179-p2p-acp branch from eae46e0 to 137e97e Compare December 20, 2024 20:33

nasdf requested changes Dec 30, 2024

shahzadlone approved these changes Dec 30, 2024
