-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🤯🤯🤯 OMG! Restate + Xstate library #2
Comments
Ah, it is published here: https://www.npmjs.com/package/@restatedev/xstate But you did some updates in the meantime and they are not on NPM. Will this pipeline be placed so that this starts to grow. I am crusing very difficult tasks with Xstate on the frontend and basic backend, but to be able to do it in a "durable" way is really next level stuff. EDIT: I can just use "public_api.js"... ma bad :). All works, testing now.. EDIT2: I literally cant believe this exists. Biggest fan! |
hi! I am so so glad to see your excitement about this. youre right there are some improvements since 0.0.1, let me do a release and the pipeline will run :) |
Thank you for pushing the new version. In this early stage, it would be really valuable to see the full list of tasks that need to be done before the lib reaches version 1.0, just by putting "# Road to 1.0" in README.md and just listing all the tasks that remain to be done. This will bring clarity and get people involved. I might contribute too if capable to pinch in, but I need a list of clear targets. You envisioned the solution, it works on the version 0.0.1. When it comes to 1.0 it will move mountains on the shoulders of Restate. |
What sort of things would you like to see in 1.0? For us, we need to see real world usage before solidifying the featureset and offering a backwards compatibility promise, etc |
I will definitely try this stuff in production on non-critical stuff to check how it is working. I would need a bit of time to dig into this deeply and report back. Give me couple of days please and I will post my feedback. |
edit: i am also the milanbgd011 account :) I got a bit of time, not still digging deep enough unfortunately because of time. But all works out of the box zero problems:
Even remembers the "after" trigger, but when it exists the state where after was initiated then it does not work. Even if I try to get it again to the state where the "pending AFTER:{} needs to fire" it will reject this "AFTER:{}" because it seems that each state has unique "UUID" which guarantees that the "after" will be executed if that unique ID of the current state is the same, otherwise it will be discarded. Which means that this "pending" after will never accidentally trigger if we end up in the same state, it simply does not belong to that state unique ID. >>> [restate] [auth/send][inv_1h3VdnA6H65X5hvfJibdxkttFWETR00te9][2024-10-19T23:28:02.235Z] INFO: Received now cancelled event 447f3f3a-61e9-4f54-b8e6-9b1d6bea4994.xstate.after.10000.client.authorizing for target {
id: 'xyy',
sessionId: '447f3f3a-61e9-4f54-b8e6-9b1d6bea4994',
_parent: { id: 'fakeRoot', sessionId: 'fakeRoot' }
} I tried parallel states, works perfectly. Also tried versioning the machine, like I leave machine in some "invoke" state to wait for 30 seconds, I stop the service, change the machine to go to "stopped" instead of "idle" state and start service. Restate uses new onDone target and just moves to stopped. This means that versioning the machines is a piece of cake. You just say what you want to happend with users that are in state that is being removed or rewired and anyone in that state will get new path to go to from that point. In your example you demonstrated that sendTo along with delay also works. Nesting machines works.
==== OMG! What have you built here?! This thing can crush every single task no problem :) KUDOS! Not a single problem, glitch or weird behaviour. This thing goes to production instantly (for non-critical stuff until I am 100% sure no problems will occur). ==== I am just wandering is there anything not covered in here. I mean should we make a list of xstate keys available and then check on/off if it works or not in README.md. This is unclear to me currently. |
Thank you! What lovely feedback to receive and I am glad you are trying your hardest to break it - definitely keep trying and if you succeed, I will make sure it gets fixed :D I think we are missing callback actors, observable actors and possibly transition actors. In practice callback is very similar to promise, observable im not sure really makes sense in a serverless/backend context, but you could certainly make rpcs to the restate ingress from some long-running process with an observable streams. transition actors, i am not sure if these can ever be async - if they are purely sync, they should work as-is, but if async they need a similar approach to promise actors where we run them in a separate handler. definitely more help identifiying particular use cases that are not covered - this would be hugely valuable |
Breaking it will not be easy task I think :) But sure I will try! Thank you for the offer to fix what I find broken! Neither of three mentioned actors are imporant. Serverless is very important, so observable is just wrong/weird I think.
Do you want me to create PR listing all xstate actions, actors... as a bullet points and then we can check off if they work or not in current version. I can confirm many of them and I can test others one by one when I get time, so we have full 1:1 mapping this Restate/Xstate repo to Xstate V5 and know what is covered and what not. I think visibility of this is key for progress here. PS. Doing some cool examples and letting David K. Piano like the X post will make this thing visible to all Xstate developers. This is the "marketing channel" perfect to get this off the ground. This is how I found Restate and your repo. |
@milanbgd011 that would be fantastic! and any suggestions re examples that perhaps align with what you see as interesting use cases, let me know and I can build them |
=== deleted the post, nor relevant anymore... |
@jackkleeman I tested 90%+ of the xstate API I think, all seems to be working, zero issues. One thing you mentioned on video was not perfectly clear to me, here is video with timestamp where this segment starts https://youtu.be/vECQbpNOQYI?t=1703 QUOTE: Nice thing is that machine returns the next state machine is in on each CURL sent to it, so that you can use this information to know what is going on the backend immediately, then track the progress with One thing that could be usefull is to have "event does not exist" when fired, to signify that none of the states/parallel states was able to receive this event and do something with it. But I think this is more xstate territory and must be done manually. I guess making the video to explain how you can dig deep into problems would be very usefull, becaue it will show what restate logs and what is available. Running simple machine with a few invokes and events and after. Then assuming that machine is in bad state for some reason, what data is available in Restate to dig into the problem. Since happy path works 100%, no problems at all. Taking the worst paths possible and how to get out of them would be very very usefull, it will show what Restate has baked in. Only thing that remains unclear is how to scheadule some event or state change. Two cases are in there, specific timestamp or after some time dynamically (not just It could be solved by creating parallel state which executes each 1h and in context we store last timestamp (like periodically sending email) and then if the value is more than 10 days we send email, if not another execution will be triggered in 1 hour and check again. .sleep() is basically |
In Restate, some types of services are 'keyed' and they can store state. For example, virtual object services have this property, and the xstate library is built on top of a virtual object. For a given key, handlers of a virtual object will by default lock the object and can not run concurrently with each other. However, you can also set a handler to be 'shared' which means it can run concurrently with any other handler running, but it can't mutate state, and it only reads a snapshot of state. In the xstate integration, the invokePromise handler is one such handler, which doesn't block the main loop. This is necessary because it can run for arbitrarily long to complete a promise execution. However, all other handlers do not block (in line with xstate API) and so should run for very short periods of time. These are executed serially, so that we have strong ordering, and so that they can safely update state instead of just read it.
Agreed. I think it is possible to interpret the state machine at runtime to determine the set of valid events for the current state, and reject invalid events. If you file a separate issue I can take a look at this! Re recurring events, i think broadly the idea of repeatedly polling via self-transitions (after: 1h -> self) makes sense. I think you can use child machines to have multiple of these polls concurrently executing. I would hope the approach to this is the same with our integration as it would be in native xstate. |
@jackkleeman Thank you for the detailed response. I will try to keep it short and precise, not to overwhelm you with too many questions at once. Sorry for late response, private matters. I crated new issue for non-listened to events here: #3 I will express how I percieve the information you shared, then you correct me if I am wrong. BLOCKING (should be instant, very fast to execute because they block machine)
NON-BLOCKING (does not block machine, we can send more events to it no problem)
I think that I fired event against machine with 2 parallel states, while one state was inside "invoke: fromPromise() waiting 5 seconds to resolve". It did take the new event and started transition on the second parallel state without affecting current processing of first state. Is this expected? This is perfect scenario basically. Machine responds to events if possible to receive them on matter which parallel state is doing what. When we are doing the "blocking" stuff, then all incoming events are being scheduled while machine is "stuck" doing some assign(), but none of the events will drop. |
|
@jackkleeman then you for the clarification, it seems very simplistic to reason about, just like a simple xstate machine - but a durable execution one. I am not sure how you built this resilient thing in couple of commits really. Seems really weird :) Really exciting about this xstate + restate. Maybe it is not obvious for others how powerful this combinaiton, but sure enough they will know. Thank you once again for the effort, really appreciated. Keeping you posted. There is so much stuff to test, will report if any problems occur. EXAMPLE 1: HEAVY TIME-RELATED STUFF From the examples perspective, first one needed is with extreme heavy usage of scheaduled events (or anything related to time) could be a good candidate, because simple execution is ok, but doing this durably is another beast. Demoing how you can do newsletters, subscriptions and other time related stuff inside single machine with parallel states. If user subscribes to email, we save something to context and then move the parallel states "subscription" to "active" and each 7 days sends newsletter by circling thru "waitingNextDeliveryWindow" and "invokeSendingEmail" to actually send it. For me to cancel newsletter subscription I just move parallel state into "idle" to break the loop and done. SImple but very effective. EXAMPLE 2: MACHINE VERSIONING One big topic is machine versioning, for example if we have some users in "awaitingApproval" state and we create new improved "awaitingConsesus", and basically put "after" on old state to make it go into the next state as soon it boots the machine again, but I do not think this is possible. The question is how to initiate the transit to new awaitingConsensus using "after" or anything else to make machine switch to the next state. One possible solution is change "on:{}" to capture any event being sent to the macine and this initiates move to the new state then (effectively updating current users to use new workflow). EXAMPLE 3: HARD PROBLEMS IN OTHER SYSTEMS BEING HANDLED ELEGANTLY IN RESTATE+XSTATE Also, since you guys come from Apache Flink, you know the limitations of other systems (kafka blocking entire queue, stuff missing in Flink and others, then showing how elegantly this hard problems for this other systems is handled in xstate and restate, thus avoiding the pain with other solutions. PS. Do not feel obliged to do any of this stuff, I am just spitting out ideas, nothing else. |
I guess the xstate bit is quite simple, but the restate bit is a lot of commits ;) Awesome ideas! Re machine versioning, this is a hard problem in general in durable execution. In some ways its a bit more tractable using XState, because you only have to ensure that the machine state is still valid under the new version, ie the states, inputs etc for the current state (of all existing state machines) still make sense in the new version. Currently to solve the more general versioning problem users can inform Restate of new versions and it will send new invocations to the new versions; however in the case of this library, the new version is still going to read the state set by the old version, so there is still a need for state-compatibility. I think the safest way to do versioning when making changes that can't be state-compatible is to have multiple state machine definitions side by side inside a parent state machine. The first step of the parent state machine would be to select the latest version available in the running service, then delegate to that machine. This selected version is then persisted by xstate like any other state node, and if later new versions are added, we will keep using the old state machine, but new invocations will use the new one. I would hope this is only needed in exceptional circumstances however |
I agree, lets try to clarify a bit more thru examples to be sure we are talking about same stuff. Rule is simple, do not delete old state, just rewire it to forward to right place and keep it in code until you are sure 100% no machines running depend on it. It does not bother if you have these old states lingering around, it is ok. It is way better than doing any versioning which is very hard to maintain. We have 2 different types of "versioning":
Versioning the state exampleV1 {
idle: {},
awaitingConfirmation: {
on: {
CONFIRM: {...},
.....more events
},
},
} V2 It will react to any event sent to machine and forward this event to new state the machine should be in {
idle: {},
awaitingConfirmation2: {
on: {
CONFIRM: {...},
.....more events
}
},
// inactive states (handle old machine versions)
awaitingConfirmation: {
on: {
'*': {
target: 'awaitConfirmation2',
actions: [
({ enqueue, event }) => {
enqueue(event); // <<<<<<<<<< this will execute the event in new "awaitConfirmation2" state
}
]
}
}
},
} This basically means that the UI/API firing the event will use the same event, but the machine structure will be different, which means no need for API request updates or UI updates that fire the APIs. Perfect separation. This example forwards all events to new state, but we can even separate by event and send them to different states too, so the granularity of versioning in machine is very easy. Just take all defined events and decide what they will do (forward to which state) and done. To get traction on this project, we need blogpost of some example that handles impossible scenario, something that other systems are not capable of - or is too complex to do them. Kafka and Temporal have big gaps and problems when running in prod. Kafka can block and Temporal is very unintuitive to write. This example must expose weaknesess of other systems and this blog post will be than liked on X by others and that is the way to go. But needs to be something really impossible to do in other systems. If you know Kafka and Flink and a bit of Temporal, you are probably aware what they have problems with. One more issue that is critical to build ie. Uber like app is realtime tracking of vehicle movement. Now, this is done via Kafka usually and seems like Restate is not good fit for this job, because of the frequency of possible updates to the machine. Am I right here or not? Is there a way to handle this in restate efficiently and be able to run 10.000 of this stuff on single dedicated server? I think using Redis Streams here would do excellent job, because of the speed and transient nature of this tracking.. and redis is very fast and efficient with space. No blocking, no maintenance, easy to setup and run. Right? |
Your examples re versioning makes sense to me I agree it would be cool to write a post comparing kafka/temporal to resstate/xstate. Its definitely on our to do list! I think Restate can definitely handle a large volume of updates, Kafka style. Restate is internally a streaming system and we've benchmarked thousands of events per second on a single machine - and we can horizontally scale, too. I don't think you would need to bring in another tool like Redis streams. That said, you may not want to use the xstate integration for that kind of volume - for maximum throughput you might want a Go or Rust service, using Restates default RPC API, instead of the Xstate one. |
Thank you Jack! In the plethora of tools and approaches, it is easy to miss the stuff Restate brings to the table actually. Creating dummy "uber" app using restate/xstate (with RPC for high thruput GPS location and xstate for all other) and then comparing to Temporal/Kafka, and pinpointing the "weak spots" in their stack will pull in people to give it a try. Building good starting point for others to experiment seems legit. So basically, "let us wire up our product to simplistic but complete real world scenario and then show off where they shine". You know, even all cars seem alike but deeply they are not, but this is very hard to spot without spending a ton of time and investing yourself to learn new framework/tool before going to prod. So, many will not do it unfortunately. Restate goes straight to the backbone of the entire company, whoever decides to use it. Very specific situation. :) I have perfect project for restate/xstate duo, so I hope to get my hands dirty soon. |
Restate server (not services) uses "internal database" and RocksDB for some additional stuff. Only thing that comes to my mind is to deploy ie. 3 independent Restate servers and then send 33% of users to each based on ie. ID of the user, like users that have ID ending with 0,1,2,3 go to restate server 1, then ending with 4/5/6 go to server 2 and 7/8/9 go to server 3. But then splitting to more restate servers for running workflows is impossible. I do understand fully choice to use internal database but personally I think this could be a big no-no for a lot of companies because ie. postgres provides so much depth in terms of performance, replication and all. From business perspective it makes a lot of sense, because if you integrate database into the restate then restate can never become just a layer on top of postgres thus diminishing the value of restate a lot. Still, horizontal scaling of server is big issue for sure here. I could not find this information at all. Replication is very critical part of the horizontally scaled Restate server. I mean, when you look at the levels TigerBeetleDB reaches on this stuff then you appreciate the complexity of reliable replication more: https://sim.tigerbeetle.com/ OR I should think of the "Restate server" (oversimplification) just as database layer. And Restate services are actually the ones doing the heavy lifting. Then the question is how much the "Restate server" can handle then? Services are stateless and we can scale them horizontally indefinitely basically. |
Its not released yet (should have something by end of year), but Restate is implementing a replicated distributed log similar to delos (https://www.usenix.org/system/files/osdi20-balakrishnan.pdf). The log is indeed sharded on partition key which is derived from the virtual object or workflow key (or is random for nonkeyed services). This means that you can scale horizontally in the number of keys very easily by adding more servers. Restate IS a database and our goal for doing this is not one of business defensibility but one of performance. If you use a general purpose database for this, you get all their trade offs that are intended to be sensible for a broad range of use cases. By building our own storage layer, every little trade off decision in the DB is targeted to this quite unique use case. the internals of restate are much closer to something like kafka than they are to something like postgres, so really we are more of a broker than a DB |
@jackkleeman Restate just keeps on giving. Sorry for the hard questions, but this last comment clarifies the idea behind Restate much deeper for me personally. Will read the PDF soonish. Each performance squeezed out is huge influence to the cloud offering and growth of the project. Being able to delegate this responsibility to the creators of the project and just relax - no price on that. Since it is so critical system, stable cloud will deliver so much value to any business. Even allow me to create a bunch of projects quick and not think about infra at all. Looking forward having this "superpower" for building really complex apps that generate huge value for end users. Thank you! |
Dear Jack Kleeman, HUGE thank you! This is a major breakthru!!!
Number of SDKs and this Xstate mapping tells me Restate is the real deal.
Anything remotely similar to this, but very basic remapping is here, same thing you built here properly:
https://youtu.be/GpbOkDjpeYU?t=1631
I tried to somehow make the Temporal click with NextJS > therefore and Xstate:
temporalio/samples-typescript#97
I implemented xstate behind the API call that saves snapshot to Postgres, it works but very basic stuff.
With your mapping and power of Restate and Xstate now the possiblities are endless.
Restate Cloud offer seems attractive.
====
Question:
When will the library be published on NPM so I can try it following the examples you created and try to push it more.
The text was updated successfully, but these errors were encountered: