-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: P2P ACP #3317
base: develop
Are you sure you want to change the base?
feat: P2P ACP #3317
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3317 +/- ##
===========================================
- Coverage 78.18% 77.85% -0.33%
===========================================
Files 382 382
Lines 35448 35663 +215
===========================================
+ Hits 27712 27763 +51
- Misses 6106 6226 +120
- Partials 1630 1674 +44
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 18 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
25d14f1
to
3de2aaf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look good overall. Good job, Fred!
I left some comments and suggestions.
In general I think it would be also nice to have it written down somewhere how the new bitswap-level ACP works together with the pubsub KMS. Otherwise it's not clear when to use what.
net/acp.go
Outdated
type ACP interface { | ||
acp.ACP | ||
// GetCollections returns the list of collections according to the given options. | ||
GetCollections(ctx context.Context, opts client.CollectionFetchOptions) ([]client.Collection, error) | ||
// GetIndentityToken returns an identity token for the given audience. | ||
GetIdentityToken(ctx context.Context, audience immutable.Option[string]) ([]byte, error) | ||
// GetNodeIdentity returns the node's public raw identity. | ||
GetNodeIdentity(ctx context.Context) (immutable.Option[identity.PublicRawIdentity], error) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: GetCollections
has nothing to do with ACP. I'd suggest either to rename this interface to something more suitable. I can't suggest anything elegant here other then CollectionsAndACPProvider
.
Or you can segregate these interfaces, i.e. move GetCollection
to another CollectionsProvider
interface.
Update: after seeing how it's being used. I don't think it makes much sense to make this interface embed acp.ACP
. You can pass acp.ACP
to a new peer as is without wrapping it with NewNetACP
and just instantiate a struct that implements this interface that provides node (or DB) information. These 2 have really no reason to be together other then the Peer
needs this information, but it can get if from 2 different objects.
In my opinion wrappers make it a bit harder to understand the execution flow. So if would could avoid using an additional layer, I think it would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to improve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think of the change.
test := testUtils.TestCase{ | ||
Description: "Test acp, p2p create private documents on different nodes, with source-hub", | ||
SupportedACPTypes: immutable.Some( | ||
[]testUtils.ACPType{ | ||
testUtils.SourceHubACPType, | ||
}, | ||
), | ||
Actions: []any{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: so the only thing that makes this test different from the KMS ones is that you don't activate KMS here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to be the most important difference yes.
net/peer.go
Outdated
@@ -149,6 +150,10 @@ func NewPeer( | |||
return nil, err | |||
} | |||
|
|||
bswapnet := network.NewFromIpfsHost(h, ddht) | |||
bswap := bitswap.New(ctx, bswapnet, blockstore, bitswap.WithPeerBlockRequestFilter(p.server.hasAccess)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: don't we want to have on option to turn it off? Maybe uses would want just purely pubsub KMS or Orbis. Might also be usefull for testing to be able to turn things on and off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to turn this off no. It's not a replacement for KMS or Orbis. If users want to turn it off they can turn off ACP. Having ACP on and this off would be like locking the front door of your house but having a garden level window wide open where anyone can get in. It doesn't really make sense.
d35d257
to
eae46e0
Compare
internal/db/db.go
Outdated
@@ -104,7 +104,7 @@ func NewDB( | |||
acp immutable.Option[acp.ACP], | |||
lens client.LensRegistry, | |||
options ...Option, | |||
) (client.DB, error) { | |||
) (*db, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why was this needed/done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah is it because net.DB
doesn't implement client.DB
as GetNodeIdentityToken
method is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few reasons. One of them is because otherwise we have to make the client.DB API support everything even though some methods may just be useful internally. The maintenance overhead of that is large since we have to update the CLI and HTTP clients and test wrappers every time we update the interface. On top of that, there is no good reason to return an interface if the returned value is always of the same type. It's an unnecessary abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
net.DB doesn't implement client.DB as GetNodeIdentityToken method is missing
This feels regressive, and to me it looks like we are forgetting the lessons learnt before client.DB
was implemented by all packages - allowing the 3 clients to diverge (they now diverge in both directions, it is no longer a case of CLI contains embedded+some, CLI is now missing embedded stuff).
Returning a private type here, with public functions not declared in the client
package also smashes the reasoning behind the client
package in the first place - GetNodeIdentityToken
is essentially a hidden public function that lives in an unexpected location, a location that we are very likely to forget about when making breaking changes (think semver). And will unlikely be documented to the same standards as client
members.
The multiple clients act as complexity multipliers. The greater the divergence between clients, the greater that multiplier is. This increases our maintenance costs (including documentation) and increases the learning required by users, in way that is far greater than the immediate X lines of code added.
Perhaps we can visualise this with (baseClient being the embedded client, ignoring purge command)(^
meaning to-the-power-of, not bit toggling stuff):
totalComplexity = complexityOfBaseClient*(divergence^numberOfClients)
IMO we need to be very very careful doing stuff like this, and seeing this so soon after another divergence (purge
CLI command), makes it feel like we are falling back towards the time-sink that was our 3 un-interfaced, separate clients without making a conscious decision to do so.
I feel quite strongly about this, as I think it can get very expensive if we continue in this direction. But as I am basically on holiday for the next week I don't want to mark this as a todo/request-changes.
Maybe we can have the following:
- todo: discuss this in the next standup (I probs wont be there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually did the change because I believe returning an interface where only one concrete type is ever returned is a needlessly restrictive abstraction.
The goal of an interface is to ensure that a given type implements a set of methods. Not to limit that type to the set of methods. There is a set of functionalities that we decided had to be covered by all clients/types (Go, CLI, HTTP) and that's where the interface is useful. If anything changes with those functionalities, it needs to change for all clients.
There are some functionalities that will only ever make sense for one client. Purge for example only really makes sense for a server instance so it would be unnecessary overhead to require it for all clients through the client.DB interface.
What I think we can do is have a discussion on whether we want a given functionality to be part of the interface or not. But I really do think that some divergence between server and embedded is quite ok when it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the signature of concrete type as an interface, by exposing the maximalist concrete interface you expose consuming code to problems in swapping it out for something else (essentially reducing the usefulness of defining the actual interface-interface).
That aside, IMO interface segregation goes beyond that one liner. By segregating via interfaces you control the number pathways through the codebase and reduce complexity.
We don't want our users to be able to call all our internal funcs, and they probably don't want to see them either.
We end up with a public function that returns the public interface. The user-code that consumes that type only ever directly depends-on/has-access-to the client/interface funcs, instead of a maximal concrete implementation.
The contract/interface of the public NewDB
function then also remains identical to all other NewDB
functions, allowing them to be used interchangeably without the swap being a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the signature of concrete type as an interface, by exposing the maximalist concrete interface you expose consuming code to problems in swapping it out for something else (essentially reducing the usefulness of defining the actual interface-interface).
Why would internal/DB
be swapped for something else? And if it is, what prevents it from supporting the client.DB
interface?
That aside, IMO interface segregation goes beyond that one liner. By segregating via interfaces you control the number pathways through the codebase and reduce complexity.
That one maybe. But I find that overly constraining to the point where one has to create more complex patterns to bypass the restrictions.
We end up with a public function that returns the public interface. The user-code that consumes that type only ever directly depends-on/has-access-to the client/interface funcs, instead of a maximal concrete implementation.
We don't. NewDB
is an internal function. Users don't have access to it.
The contract/interface of the public NewDB function then also remains identical to all other NewDB functions, allowing them to be used interchangeably without the swap being a breaking change.
There is only one NewDB
function. But even if we created many other DB alternatives, the important part is that those DB types cover the client.DB
interface so that our http and cli clients along with the other internal types that need client.DB
can use the client.DB
methods. It doesn't matter that those other DB types provide more features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree with the points raised by Andy, I don't think it's worth blocking the PR over. Specially in light of us doing the "moving fast experiment" I happy for this discussion to even resolve outside this PR. Just noting that the helper private function that was being called newDB
was already using this signature before (irrelevant to the discussion I know).
Side question: I noticed that since the discussion thread was started the *db
was made public (*DB
) now, curious for reasons behind that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't. NewDB is an internal function. Users don't have access to it.
FFS lol - I forgot we moved it, sorry. 90% of me bothering about this is because I thought it was still a public func and you were exposing internals to users.
There is only one NewDB function
There are ones for the other 2 clients, and one from the node package no? They may have different names atm, but they are all basically the same func (and should probs have the same name).
Why would internal/DB be swapped for something else?
The callers to the NewDB
func(s) may wish to swap them out - this becomes much harder if the type changes and extra/non-interface stuff on the return value is used.
But anyway, this is just internal stuff, the extra exposure does make stuff easier, and we (in theory :)) should really be able to tell what the newly exposed stuff does anyway.
Happy for this convo to be resolved lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding for historical sake since I think Andy has dropped this (and as shahzad pointed out, we're aiming not to get bogged down during this cycle) - I think adding compile time checks to the db
package that make sure the DB
type implements the client.DB
is all that is really necessary. That along with the downstream callers (internal to the repo) having the option to assign/cast to client.DB
for their own safety if they want should cover us for most of our cases.
Fred and I briefly discussed this whole return type a few weeks ago and one thing it raised is we currently don't have a proper "embedded" client flow ever since we moved the db
package to internal
. The way the integration tests access the embedded flow is to instanciate a node.Node
instance, then getting the client.DB
instance from it. But we should have a dedicated and public flow for instantiating the client.DB
without the complexitity of the node.New
constructor as follow up work to this
internal/db/db.go
Outdated
@@ -339,13 +339,20 @@ func (db *db) DeleteDocActorRelationship( | |||
return client.DeleteDocActorRelationshipResult{RecordFound: recordFound}, nil | |||
} | |||
|
|||
func (db *db) GetNodeIdentity(context.Context) (immutable.Option[identity.PublicRawIdentity], error) { | |||
func (db *db) GetNodeIdentity(_ context.Context) (immutable.Option[identity.PublicRawIdentity], error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: What is with the underscore _
? Why was this added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because just context.Context
seems like a mistake. The underscore says that the function parameters are this way because we're following some interface pattern or something like that but we aren't using this parameter within the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a known go idiom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same as the other areas we use _
to ignore a returned argument. https://go.dev/ref/spec#Blank_identifier
The only reason why just using context.Context
worked is because it was the only parameter in the method. If there was more than one, it would be a syntax error.
Unused function parameters are allowed in Go and that's why we often have ctx context.Context
as parameter with ctx
never being used. I feel that's misleading. Doing _ context.Context
at least tells the reader that this parameter needed to be there but we aren't using it.
internal/db/db.go
Outdated
func (db *db) GetIdentityToken(_ context.Context, audience immutable.Option[string]) ([]byte, error) { | ||
if db.nodeIdentity.HasValue() { | ||
return db.nodeIdentity.Value().NewToken(time.Hour*24, audience, immutable.None[string]()) | ||
} | ||
return nil, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Remove the context.Context
arg? doesn't seem to be needed anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a pattern we decided to keep after a discussion we had several months back about removing context. Happy to bring that up for discussion but for now leaving as is.
// - Document is public (unregistered), whether signatured request or not doesn't matter. | ||
func CheckDocAccessWithIdentityFunc( | ||
ctx context.Context, | ||
identityFunc func() immutable.Option[acpIdentity.Identity], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why did we need to do this function thing? instead of just immutable.Option[acpIdentity.Identity]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because in some cases in doesn't make sense to do an extra network call for the identity before knowing if the document is even registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain a bit more? Which document is registered check are we talking about? and which network call for identity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the net package, when we want to check if a peer has access to a doc we have to request the identity from the node requesting. That's a network call. If we request it before checking that the document is registered, then we might have done that network call for nothing. So the order of priority should be checking that the doc is registered and then requesting the identity to check if it has access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the identity not included from the node requesting from the get go? Maybe I am missing something, why does it need to be requested again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also like clarification here :)
net/server.go
Outdated
) | ||
if err != nil { | ||
log.ErrorE("Failed to check access", err) | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: If access checking fails we will assume they have access?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not they. It's self. It's a best effort check. If the check fails, the node receiving the request can make the decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still am not sure I follow, if the local node receiving (self) fails to check weather it has access then the check will assume it does have access and the sender node can still decide if they want to share or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The initial block received is just a composite so there is no protected info within it. The node receiving checks if it has access to the document to see if it's worth requesting the other block. Ultimately, the node sending is responsible for the verification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I got it now so the best effort check is there for the receiver to not make unnecessary requests (for it's own sake to not waste effort if it doesn't have access to other blocks). I forget if this is documented somewhere already, but might be worth it to have a one liner comment explaining this near here somewhere.
net/server.go
Outdated
|
||
// trySelfHasAccess checks if the local node has access to the given block. | ||
// | ||
// This is a best-effort check and returns true unless we explicitly find that we don't have access. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: Please document why we are even bothering to call/define this function if a 'best effort' is correct enough.
If the reason is that a proper check will be carried out somewhere else (e.g. another node), please also document why false-negatives are either impossible, or are tolerable, as it looks like we could refuse to process a block that should actually be processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as it looks like we could refuse to process a block that should actually be processed.
I think you probably meant the opposite?
This method is for self check of access so it will only block processing if it can confirm no granted access. In any other case it will continue with the sync process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you probably meant the opposite?
Nope.
This function looks like it can falsely claim that we dont have access (e.g. local node doesn't have access (yet), but the other remote node does), in which case the block will not be processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function looks like it can falsely claim that we dont have access (e.g. local node doesn't have access (yet), but the other remote node does), in which case the block will not be processed.
Doesn't have access yet
is a correct claim that we don't have access. It doesn't matter what the remote node has access to. Trying to process a block that we know we don't have access will just result in the remote node not giving us access anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay - can you add that to the func docs, along with a bit detailing as to why a 'best-effort' is relevant/okay here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is documented already no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes it has been since I opened this thread, thanks! LGTM :)
eae46e0
to
137e97e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation looks solid. One blocking issue to resolve.
bus *event.Bus, | ||
acp immutable.Option[acp.ACP], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: would it be safer to use the same ACP implementation as the DB? For example: db.GetACP()
instead of the parameter here so that there's no possibility for misconfiguration.
@@ -447,3 +460,135 @@ func (s *server) SendPubSubMessage( | |||
} | |||
return t.Publish(ctx, data) | |||
} | |||
|
|||
// hasAccess checks if the requesting peer has access to the given ci. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: cid
instead of ci
} | ||
|
||
// If the requesting peer is in the replicators list for that collection, then they have access. | ||
if peerList, ok := s.replicators[cols[0].SchemaRoot()]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: is this thread safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for answering all the questions, just left some minor suggestions. Please do resolve the on going discussions before merge from others.
todo: Please make the PR title more descriptive right now it only says P2P ACP
which is quite vague for example it can be confused with the initial P2P acp work that has been done for #2366
Relevant issue(s)
Resolves #3179
Description
This PR brings ACP functionality to the P2P system. This ensures that permissioned collections do not share blocks of registered documents without ensuring the requesting node has access. This is bypassed on collection replication because the node sending the blocks is sending to peers explicitly.
Tasks
How has this been tested?
Some new and updated integration tests.
Specify the platform(s) on which this was tested: