
Core db stubs and create flow #405

Draft · wants to merge 39 commits into base: go-1.18
Conversation

@neb42 (Member) commented May 9, 2022

Opening this early so people can have a look. It probably shouldn't be merged until we're happy with the architecture.

  • New layout for microservice code. Here the code related to core db is entirely isolated from the rest of the code base (save for some of the deployment commands/config). The only code I've added outside of this directory is for common use across services. Code is also split out by resource within the microservice for further isolation.
  • I think the main improvement over our current flows is that the previous store/sqlmanager/sql layers have been moved to service/model/sqlbuilder layers. I feel each layer here has a clearer responsibility. Before the sqlmanager was too tied to the business logic and cut across resources.
  • Added validation.Validate to make validation more consistent and simpler.
  • Stubs for unimplemented flows. This is something we should do going forward. It will make parallelising work on the backend much quicker if we can define the flows beforehand and implement each stub one by one.
  • Added migrations outside of gorm. This uses migrate with migrations written in raw sql.
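As an illustration of the convention migrate expects, each version pairs an up and a down file written in raw SQL (the file names and schema below are hypothetical examples, not the actual migrations in this PR):

```sql
-- migrations/1_create_entity.up.sql (hypothetical example)
BEGIN;
CREATE TABLE entity (
    id   uuid PRIMARY KEY,
    name text NOT NULL
);
COMMIT;

-- migrations/1_create_entity.down.sql (hypothetical example)
BEGIN;
DROP TABLE entity;
COMMIT;
```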

One thing I noticed is that creating a new microservice touches a lot of places. This should either be improved or documented.

I initially wanted to do just the stubs here, but I got carried away and started the create flow as an example. It is probably 70-80% finished.
TODO:

  • Finish the flow (e.g. creating the record table)
  • Validation + JSON marshalling/unmarshalling for each struct that needs it
  • Tests

@rational-terraforming-golem (Contributor):

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rational-terraforming-golem (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neb42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

package types

type PaginatedData[T any] struct {
Data []T
Member:

shouldn't this have info on the start and end index of the contained data? or page number? some metadata?

Member Author:

Yeah it's not complete. I was just messing around with generics. I'd probably flesh this out when doing the list endpoint

Contributor:

I would rather add metadata + interface to our current Lists objects rather than implementing another model collection

Contributor:

And returning a next bookmark in the metadata perhaps

Contributor:

Such as

```go
type ListMetadata map[string]string

type RecordList struct {
  Metadata ListMetadata
  Items    ?
}
```

```json
{
  "metadata": {"next": "9e44c0273dc3b838"},
  "items": [
    {...}
  ]
}
```

Member Author:

I think the generic is much cleaner

Contributor:

Yeah but in this case, we can't really use generics for each entity because we deal with runtime types.
We sure can define a generic ObjectList such as

```go
type ObjectList[T Object] struct {
  Metadata ListMetadata
  Items    []T
}
```

and reuse that across

Member Author:

Isn't that just what I had? (with the metadata)

Contributor:

I'm just seeing a []Data field

Contributor:

Oh, lol yeah I see. Yes

@@ -0,0 +1,43 @@
BEGIN;
@internetti (Member), May 9, 2022:

if we change things about the sql schemas later, we'd add another file for the updates in this folder, right?

Member Author:

Yeah the next change would be 2_<name>.up.sql

"constraint_enum",
"constraint_custom",
},
[]any{
Member:

could this be typed more specifically?

Member Author:

Not that I know of

}

m, err := migrate.NewWithDatabaseInstance(
"file:///Users/bmcalindin/workspace/core/pkg/server/core-db/migrations",
Member:

this would need to be configurable

Member Author:

Yeah I need to figure out how to get this to work with relative paths

Contributor:

We can embed the migrations with the (really good) https://pkg.go.dev/embed package
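A sketch of that approach, using fstest.MapFS as a stand-in for the embedded files so the example runs anywhere; in the real service a `//go:embed migrations/*.sql` directive plus golang-migrate's `source/iofs` driver would replace the stand-in, and the file names here are illustrative:

```go
package main

import (
	"io/fs"
	"testing/fstest"
)

// In the service this would be:
//
//	//go:embed migrations/*.sql
//	var migrations embed.FS
//
// embed.FS satisfies fs.FS, so a MapFS works as a stand-in here.
var migrations fs.FS = fstest.MapFS{
	"migrations/1_init.up.sql":   {Data: []byte("BEGIN; CREATE TABLE entity (id uuid PRIMARY KEY); COMMIT;")},
	"migrations/1_init.down.sql": {Data: []byte("BEGIN; DROP TABLE entity; COMMIT;")},
}

// upMigrations lists the up migrations in the embedded tree; golang-migrate's
// iofs source driver would consume the same fs.FS value, removing the need
// for the hardcoded file:// path above.
func upMigrations() ([]string, error) {
	return fs.Glob(migrations, "migrations/*.up.sql")
}
```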

ID string `json:"id"`
Cardinality Cardinality `json:"cardinality"`
SourceEntityID string `json:"sourceEntityId"`
SourceAttributeID string `json:"sourceAttributeId"`
Member:

didn't see this in the frontend, what's this needed for?

Member Author:

I didn't do any relationships in the frontend client as I was only creating a single entity. It's used for defining relationships between entities (see the confluence docs for more details).

Member:

yes, the relationship I was aware of, but why do the relationships need attributes? what's that for? and the relationship type in the frontend doesn't have these properties

Member Author:

Yeah I want to think through relationships a bit more. I've added a line in the confluence doc that the relationship stuff may be wrong. Previously I was thinking of relationships as being defined by the primary keys, but I was thinking here that they could also be defined on an attribute level. I'm actually leaning back towards doing it based on ids, but I want to spend more time thinking through this again.

As I was typing this, one other thought I had: if we were to have a "question bank", and entities pulled attributes from the question bank, we would actually have some implicit relationships. Maybe not useful functionally, but something interesting to see on the analytics side.

Contributor:

I think that's gonna bite us in the ass. I would rather add low-level FOREIGN KEY indexes in table objects rather than trying to find an abstraction for those relationships

Member Author:

I think that's gonna bite us in the ass.

Which bit? Also I wouldn't over index on the relationship stuff. I think I've got it wrong here and I want to go back and review it.

columns []string,
values []interface{},
) schema.DDL {
var placeholders = make([]string, len(columns))
@internetti (Member), May 9, 2022:

"placeholders" as in "defaults"?

Member Author:

This is a copy+paste of another function that was unfortunately made private

Member:

so, are they meant to be defaults or not?

Member Author:

I think what it does is build $1, $2, $3, etc., which correspond to Args: values. So when the statement is executed, values[0] is bound to $1
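For reference, the copied helper presumably does something like this (a stdlib-only reconstruction of the described behaviour, not the actual private function):

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders builds the "$1, $2, ..." list used in a Postgres statement;
// at execution time values[i] is bound to placeholder $(i+1).
func placeholders(n int) string {
	ps := make([]string, n)
	for i := range ps {
		ps[i] = fmt.Sprintf("$%d", i+1)
	}
	return strings.Join(ps, ", ")
}
```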

Member Author:

So not defaults

@internetti (Member) left a review comment:

what do you think about having a swagger documentation for the endpoints?

the new structure is a lot more intuitive to me, I like it

@neb42 (Member Author) commented May 9, 2022

what do you think about having a swagger documentation for the endpoints?

I think we do have OpenAPI JSON being generated. From my understanding Swagger just uses that.

@ludydoo (Contributor) commented May 11, 2022

Boy.. this is a massive PR

Rollback(tx *gorm.DB) *gorm.DB
}

type transactionManager struct {
Contributor:

Not sure about the use of TransactionManager and transactionManager... We already have these methods on the db object (either Gorm or SQLx or sql.DB)

Member Author:

The idea is that if we change our db tooling (like we discussed) then the transaction interface will remain the same

Contributor:

I see; I think it's a bit early to introduce this, since sql.DB is exclusively called from whatever store we have. We kind of don't need the handlers to have access to this interface yet. I think it's likely that it will remain like that: basically the handler will simply Unmarshal, Validate, Call Store.Something, where the Store will do pretty much everything related to DB calls and so on

Member Author:

I think we definitely do need it. When you get to the service layer you'll see how.

Controller
  Unmarshal
  Validate
  Call Service.Create
Service
  TransactionManager.Begin
  Call Store.InsertA
  Call Store.InsertB
  Iterate over x:
     Call Store.InsertC
  TransactionManager.Commit
Controller
  Validate
  Marshal
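The flow above can be sketched as follows. This is a hypothetical, in-memory illustration only: the real TransactionManager wraps *gorm.DB, and all other names here are made up for the example.

```go
package main

// Tx stands in for whatever transaction handle the db tooling exposes
// (e.g. *gorm.DB in the current implementation).
type Tx struct{ committed bool }

type TransactionManager interface {
	Begin() *Tx
	Commit(tx *Tx) error
	Rollback(tx *Tx) error
}

type Store interface {
	InsertA(tx *Tx) error
	InsertB(tx *Tx) error
}

// Service owns the transaction boundary: several store calls succeed or
// fail together, which is why the controller never sees the Tx.
type Service struct {
	tm    TransactionManager
	store Store
}

func (s Service) Create() error {
	tx := s.tm.Begin()
	if err := s.store.InsertA(tx); err != nil {
		_ = s.tm.Rollback(tx)
		return err
	}
	if err := s.store.InsertB(tx); err != nil {
		_ = s.tm.Rollback(tx)
		return err
	}
	return s.tm.Commit(tx)
}

// In-memory fakes so the shape is testable without a database.
type fakeTM struct{}

func (fakeTM) Begin() *Tx            { return &Tx{} }
func (fakeTM) Commit(tx *Tx) error   { tx.committed = true; return nil }
func (fakeTM) Rollback(tx *Tx) error { return nil }

type noopStore struct{}

func (noopStore) InsertA(*Tx) error { return nil }
func (noopStore) InsertB(*Tx) error { return nil }
```

Swapping the db tooling then only means providing a new TransactionManager implementation; the Service stays untouched.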

@ludydoo (Contributor), May 11, 2022:

But I don't see why we need a service layer? This microservice's "business task" is to talk to the database. So there is no need to encapsulate the database layer in an infrastructure layer (if you think DDD design). The database concepts should be first-class citizens of this whole microservice. What exactly would be the role of the service layer in that case? I think the core-db-api is a really fancy store over http (or eventually websockets/grpc). But we basically expose a database through an api. So the role of a service layer in this case eludes me.

I think the service layer IS the store layer

@neb42 (Member Author), May 11, 2022:

whatever we implement in this layer we basically have to fully and accurately replicate on the client side in Typescript

Build the api in typescript so we can reuse it?

Contributor:

I would say yes in Java, .Net, Python, Ruby, or other programming paradigms where the frameworks encourage you to use dependency injection containers like Spring, Windsor et al., automatic field injection, and so on. But the advantage and strength of Go is that it provides enough out of the standard library to create an application that is almost as fast as raw C++, supports extreme parallelization, etc., without needing any third-party library. It is MUCH faster than NodeJS, Python, and others. It is also much easier to reason about because there is no framework-specific, automagical, convention-over-configuration stuff. It's all there in the code. There's also no annotation-based logic, such as NestJS's @Controller and others.

Another advantage is portability, since you can compile a Go program into a portable executable that runs on all platforms: x86, Windows, iOS, Raspberry Pis, WASM, etc. You can end up with a container that contains a single file, the executable itself, which is a huge win for security, performance, and eventually cost. The recurring problem with NodeJS, Python et al. is that you end up with thousands of dependencies, direct or indirect, which makes it extremely difficult to bundle and very slow to build. By removing these dependencies, you also remove the need to keep up with vulns spread over thousands of libraries, and instead manage a very limited subset. NodeJS's founder even publicly said that he would rather use Go (or Deno) for distributed APIs rather than Node.

Unless there is a library that absolutely requires it, I don't see any reason not to use go.

And in the microservice architecture we are leaning towards, the service is the program. I think there would need to be a very specific usecase if we were to use services within microservices. For example, in a monolith app, we would probably have some kind of a CoreDbService or something like that. But with microservices, we already made that cut upstream, and divided code so that CoreDbService is its own microservice. So it seems to me that introducing a N-Layered architecture within a microservice architecture is overkill.

What we are building here is not a service per se. We're more like coming up with a protocol, and programming a node to be part of a network of peer-to-peer clients, so that the data is available at all times. I think that's the main shift in how we think about this project vs other regular web application projects. This is not a web application. Given a node in the network, it would probably see the address of this server like the address of any other:

# peers.ts
{"peers": ["10.22.104.203", "11.103.34.21", ...]}

The core-db resembles a storage node more than a web application. A node that only provides the Durability guarantee of a distributed, p2p storage solution. Which in a way, would perhaps make more sense if it was a Pull-based system rather than a Push-based system. That would probably make the reconciliation logic much easier, as each actor in the network would be responsible for keeping its data up to date, rather than expecting other actors to push data to it. If each actor exposes a read api, it is sufficient for each actor to keep things up to date within themselves. In a way, there is no need for a public write api. Each actor could poll other actors in the network in a pull-based fashion, rather than having to deal with potential concurrent pushes to itself.

We could use something like the gossip protocol to distribute the latest timestamp of the latest change etc. between all actors in the network. Those could choose to pull the data when they see fit. If there is a discrepancy between the latest local pulled state vs the gossiped latest state, then the actor knows it should pull stuff.

Contributor:

Advantage of gossip is that it will support fragmented networks, such as when internet goes down, and is extremely fast

Member Author:

I'm not really advocating for typescript as I agree with most of your points about go. The main benefit it would have is development speed and code sharing with the frontend. I think it's not the best choice, but probably the more pragmatic choice given a team of our size. Again, not suggesting we switch.

I don't agree with your point about not using dependency injection in go. Sure it may not explicitly encourage it, but it's still a good pattern to use. For example when we switch from writing to standard sql tables to writing to an event stream, we can easily swap out a single module.

I don't think a microservice architecture is a reason not to use a layered architecture within the service. If a service requires it then we should add it. Otherwise it can easily become very bloated and difficult to work with.

On the service layer, I think there are other use cases for it. In the new forms api it will need to communicate with the core-db api. I suspect this will warrant a service layer, although I haven't thought through this in detail yet.

On the distributed stuff. Yeah that all sounds good (without thinking too much on it), but I think we need to be pragmatic and consider this as a web service for now. If we start trying to design this as a p2p system without actually making a p2p system we'll end up down a rabbit hole and nothing will get done.

Member Author:

I've pushed up an example to models/foo that removes the service layer. I don't think it's right as the logic in Create is generic enough to be removed from the postgres implementation. But I thought it's worth having an example there to discuss.

server.go

### Controllers
Contributor:

Isn't that exactly what the current handlers do ?

Member Author:

handlers = controllers; sorry, maybe I should have used "handlers", but I was just using the language I'm used to. Happy with either.

USED FOR TESTING PURPOSES ONLY
*/

import { BaseRESTClient } from 'core-api-client';
Contributor:

Not sure where to put this comment, but just based on naming (core-db-client), we should probably be dealing with tables, rows, fields, constraints rather than entities

Contributor:

I think the advantage of having a low-level api such as a DB api would be to allow clients to interact with lower level concepts that will become necessary for reconciliation/offline use, like row versioning

Contributor:

I think the concept of Entity is too high-level for this kind of API, and we already have some kind of similar concept encapsulated in the forms api.

Member Author:

we should probably be dealing with tables, rows, fields, constraints rather than entities

Happy to talk about naming of resources. I'm not set on anything, but I thought an abstraction over the specific database technology used would be good.

we already have some kind of similar concept encapsulated in the forms api

What exactly are you referring to here?

Member Author:

Also I think these conversations are better for the confluence docs. This pr is more about the go api structure. I'm just using core-db as a reason to do it.

Contributor:

What I mean by that concerns the conversation we had about exposing low-level concepts around tables, records, etc., their upcoming usefulness for implementing offline, and that the forms api is basically syntactic sugar that makes it easy to create those complicated, low-level objects. The forms-api is a higher-level concept that depends on the db-api. Whereas the db-api is basically the lowest-level thing in core, and probably the most important one.

What I foresee is that a subset of this core-db-api will basically become our reconciliation endpoint for offline use.

There are 2 apis I see in this microservice

  1. Management api to create tables, constraints, stuff like that
  2. Data api, to push data to those tables

The write part of (1), the Management API, should be exclusively for online use.
The read part of (1), the Management API, closely resembles what we need in terms of forms discovery for offline.
(2), the Data API, closely resembles what we need in terms of APIs for data reconciliation for offline.

And whatever we come up with this Data Api will probably be what we need to implement on the client side as well, when we implement cross-device reconciliation. So this Data Api should be written in a way that it is agnostic to the fact that it is a server or a client or whatever else.

We don't want the mobile devices or website to have a different api whether its talking to a server or to another client in p2p mode. We probably want this api to be exactly the same. So imho the most sensible part of this whole PR is exactly in the design of this data api.
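As a sketch of what "exactly the same api" could mean, one transport-agnostic interface with interchangeable backends; everything here, including the Record shape and names, is hypothetical illustration rather than anything in this PR:

```go
package main

import "fmt"

// Record is a hypothetical low-level row shape; Version supports the kind of
// row versioning mentioned above for reconciliation/offline use.
type Record struct {
	ID      string
	Version int
	Fields  map[string]any
}

// DataAPI is the transport-agnostic interface: a server store, a local
// offline DB, or a peer client could each implement it.
type DataAPI interface {
	PutRecord(table string, r Record) error
	GetRecord(table, id string) (Record, error)
}

// inMemoryStore is one stand-in backend, showing the interface is agnostic
// to whether it is backed by a server, a local DB, or a peer.
type inMemoryStore struct {
	tables map[string]map[string]Record
}

func newInMemoryStore() *inMemoryStore {
	return &inMemoryStore{tables: map[string]map[string]Record{}}
}

var _ DataAPI = (*inMemoryStore)(nil)

func (s *inMemoryStore) PutRecord(table string, r Record) error {
	if s.tables[table] == nil {
		s.tables[table] = map[string]Record{}
	}
	s.tables[table][r.ID] = r
	return nil
}

func (s *inMemoryStore) GetRecord(table, id string) (Record, error) {
	r, ok := s.tables[table][id]
	if !ok {
		return Record{}, fmt.Errorf("record %s/%s not found", table, id)
	}
	return r, nil
}
```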

Member Author:

Yeah I think I pretty much agree with you. It's the interface of the CoreDBClient that should stay the same between offline and online. Seeing as the client basically just maps to api endpoints, the api interface should also be fairly consistent.

Member Author:

tables, rows, fields, constraints rather than entities

When you talk about the naming here, do you mean we keep the same data structures but rename them or are you talking about changing the data structures?

Contributor:

I think it makes sense to have a DB service, but the entities within this service should be from the DB domain, such as tables, fields, constraints, etc., rather than entities

Member Author:

Do you mean that for the api interface rather than GET /entity, you would have:

GET /table
GET /field
GET /constraint

or would the table entity still encompass the fields and constraints?


import "reflect"

func Validate(s any, isNew bool) ErrorList {
Contributor:

I don't really like that objects have a Validate method. In my opinion this violates the SOLID single-responsibility principle. Validation logic should be in its own encapsulated function/component/class. That will become increasingly evident once validation logic needs to make database calls, at which point we would need to inject a database handle into each Data Transfer Object (DTO).
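A sketch of that separation: validation as its own component rather than a method on the DTO, so a dependency like a db handle could later be injected into the validator alone. The Entity shape and the rules here are illustrative, not the actual validation in this PR.

```go
package main

import "fmt"

// Entity is an illustrative DTO; in this sketch it carries no behaviour.
type Entity struct {
	ID   string
	Name string
}

type ErrorList []error

// EntityValidator encapsulates validation separately from the DTO. A database
// handle (e.g. for uniqueness checks) could later be added as a field here
// without touching Entity itself.
type EntityValidator struct{}

func (EntityValidator) Validate(e Entity, isNew bool) ErrorList {
	var errs ErrorList
	if isNew && e.ID != "" {
		errs = append(errs, fmt.Errorf("id must be empty for new entities"))
	}
	if e.Name == "" {
		errs = append(errs, fmt.Errorf("name is required"))
	}
	return errs
}
```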

@neb42 (Member Author) commented May 11, 2022

Boy.. this is a massive PR

Haha sorry, got carried away. Happy to talk anyone through it in more detail if useful.

@sonarqubecloud

SonarCloud Quality Gate failed.

Bugs: 0 (rated A)
Vulnerabilities: 0 (rated A)
Security Hotspots: 0 (rated A)
Code Smells: 3 (rated A)

Coverage: no information
Duplication: 6.9%

4 participants