-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new snapshots functionality #465
base: main
Are you sure you want to change the base?
Conversation
For |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, Amory! But we do need more before merging it
- I am sure that this regular/dedup code duplication is a bad SW design decision. there is a ton of replicated code. This bloats the code and making code maintanence much harder. Please merge them and introduce requires runtime arguments etc. Same stands for firecracker-containerd, it has to be 1 version.
- make sure that you allow the user to override the interface name because a node may have several NICs and the user might want to drive the traffic thru a particular interface (e.g., the highest BW NIC).
- we need unit and integration tests for all the added functionality. This is a must.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM
Please see my minor comments, and I look forward to seeing the unit and integration tests ASAP.
29882fb
to
870cb25
Compare
45bd9b3
to
f6a953d
Compare
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
ed85025
to
f5e0aee
Compare
Signed-off-by: Amory Hoste <[email protected]>
Signed-off-by: Amory Hoste <[email protected]>
@@ -99,6 +99,9 @@ if [ "$SANDBOX" == "gvisor" ]; then | |||
fi | |||
|
|||
if [ "$SANDBOX" == "firecracker" ]; then | |||
echo Cleaning snapshots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent seems off
github.com/ease-lab/vhive/examples/protobuf/helloworld => ./examples/protobuf/helloworld | ||
github.com/firecracker-microvm/firecracker-containerd => github.com/ease-lab/firecracker-containerd v0.0.0-20210618165033-6af02db30bc4 | ||
github.com/firecracker-microvm/firecracker-containerd => github.com/amohoste/firecracker-containerd v1.0.0-enhanced-snap // TODO: change to vhive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please change
@amohoste could you please update the PR's description, particularly the terms. Much of this text can be moved to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Amory! Just need to split up the lines into smaller ones for more granular version control.
@cvetkovic could you please read the docs and see if there are any issues with the explanations? is everything clear to you?
docs/fulllocal_snapshots.md
Outdated
@@ -0,0 +1,25 @@ | |||
# vHive full local snapshots | |||
|
|||
When using Firecracker as the sandbox technology in vHive, two snapshotting modes are supported: a default mode and a full local mode. The default snapshot mode use an offloading based technique which leaves the shim and other resources running upon shutting down a microVM such that it can be re-used in the future. This technique has the advantage that the shim does not have to be recreated and the block and network devices of the previously stopped microVM can be reused, but limits the amount of microVMs that can be booted from a snapshot to the amount of microVMs that have been offloaded. The full local snapshot mode instead allows loading an arbitrary amount of microVMs from a single snapshot. This is done by creating a new shim and the required block and network devices upon loading a snapshot and creating an extra patch file containing the filesystem differences written by the microVM upon snapshot creation. To enable the full local snapshot functionality, vHive must be run with the `-snapshots` and `-fulllocal` flags. In addition, the full local snapshot mode can be further configured using the following flags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please split into short lines for more granular version control
@amohoste could you please update CHANGELOG.md? |
Signed-off-by: Amory Hoste <[email protected]>
# vHive full local snapshots | ||
|
||
When using Firecracker as the sandbox technology in vHive, two snapshotting modes are supported: a default mode and a | ||
full local mode. The default snapshot mode use an offloading based technique which leaves the shim and other resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain what is a shim here?
offloaded. The full local snapshot mode instead allows loading an arbitrary amount of microVMs from a single snapshot. | ||
This is done by creating a new shim and the required block and network devices upon loading a snapshot and creating an | ||
extra patch file containing the filesystem differences written by the microVM upon snapshot creation. To enable the | ||
full local snapshot functionality, vHive must be run with the `-snapshots` and `-fulllocal` flags. In addition, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fulllocal
looks ugly. Maybe consider full_local
.
reused, but limits the amount of microVMs that can be booted from a snapshot to the amount of microVMs that have been | ||
offloaded. The full local snapshot mode instead allows loading an arbitrary amount of microVMs from a single snapshot. | ||
This is done by creating a new shim and the required block and network devices upon loading a snapshot and creating an | ||
extra patch file containing the filesystem differences written by the microVM upon snapshot creation. To enable the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Motivate why patching is needed so a user can understand what were the incentives for creating these two modes.
full local snapshot mode can be further configured using the following flags: | ||
|
||
- `isSparseSnaps`: store the memory file as a sparse file to make its storage size closer to the actual size of the memory utilized by the microVM, rather than the memory allocated to the microVM | ||
- `snapsStorageSize [capacityGiB]`: specify the amount of capacity that can be used to store snapshots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amount of capacity - rewrite
## Remote snapshots | ||
|
||
Rather than only using the snapshots available locally on a node, snapshots can also be transferred between nodes to | ||
potentially accelerate cold start times and reduce memory utilization, given that proper mechanisms are in place to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve, not reduce memory utilization.
Summary
This PR adds the functionality to boot multiple uVMs from a single snapshot, giving more flexibility to using snapshots and also opening up the possibility to add remote snapshot restore functionality.
Implementation Notes ⚒️
The following external dependencies have been updated to support the new snapshot functionality. Once these have been merged, the go.mod file should be updated to the respective ease_lab repo's:
Since the new snapshotting without offloading is not yet compatible with the reap snapshots, an orchestrator interface has been added to support the old logic with offloading and the new snapshots. To also support REAP snapshots in the future with the new snapshotting logic, we would need to be able to obtain the UPF socket path to handle page faults before loading the snapshot which is currently not possible. The two snapshotting implementations have been named "deduplicated" for the new snapshots and "regular" for the old snapshots with offloading but these could probably use better names.
The following components have been added to support the new snapshots, but are also used in the old implementation with offloading:
The following components have been added exclusively for the new snapshots: