Defining proposal A for FreeBSD jail configuration #8

207 changes: 207 additions & 0 deletions docs/proposals/PROPOSAL_A.md
@@ -0,0 +1,207 @@
# Proposal A - FreeBSD Jails

This proposal describes the configuration required to map an OCI runtime
container to a corresponding FreeBSD jail.

## Modifications

This suggests adding an object to the [FreeBSD-specific section](https://github.com/opencontainers/runtime-spec/blob/main/config.md#platform-specific-configuration) of the [container configuration](https://github.com/opencontainers/runtime-spec/blob/main/config.md) to describe the required parameters for the jail.

## Jail Configuration

This section defines the jail parameters and devfs rules for the container's jail.

**`devices`** _(array of object, OPTIONAL)_ - devfs rules for this container.

Each element is an object with the following fields:

- **`path`** _(string, REQUIRED)_ - the device path relative to "/dev"
- **`mode`** _(integer, OPTIONAL)_ - device permissions as an integer which is interpreted as in chmod(1).
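
JSON has no octal literals, so a chmod(1)-style mode such as `0700` has to be stored as its plain integer value. The sketch below illustrates the conversion in both directions; the helper names `mode_to_json` and `mode_from_json` are hypothetical and not part of the proposal.

```python
def mode_to_json(octal_text: str) -> int:
    """Convert a chmod(1)-style octal string such as "0700" to a JSON integer."""
    value = int(octal_text, 8)
    # Permission bits plus setuid/setgid/sticky fit in 0o7777.
    if not 0 <= value <= 0o7777:
        raise ValueError(f"mode out of range: {octal_text}")
    return value


def mode_from_json(value: int) -> str:
    """Render the JSON integer back in the octal form used by chmod(1)."""
    return format(value, "04o")


print(mode_to_json("0700"))  # 448
print(mode_from_json(448))   # 0700
```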

**`jail`** _(object, OPTIONAL)_ - jail parameters for this container.

The following parameters can be specified for the container jail:

- **`parent`** _(string, OPTIONAL)_ - parent jail.
The value is the name of a jail which should be this container's parent (defaults to none).
Member

Name or JID?

Collaborator Author

In ocijail currently, only name but it wouldn't be hard to support both. On the other hand, this interface is not intended for humans and need not be 'user-friendly'. The caller should know the name of the parent.

- **`host`** _(string, OPTIONAL)_ - allow overriding hostname, domainname, hostuuid and hostid.
The value can be "new" which allows these values to be overridden in the container or "inherit" to use the host values (or parent jail values). If set to "new", the values for hostname and domainname are taken from the base config, if present.
- **`ip4`** _(string, OPTIONAL)_ - control the availability of IPv4 addresses.
The value can be "new" which allows the addresses listed in **`ip4Addr`** to be used, "inherit" which allows all addresses in the jail's vnet or "disable" to stop use of IPv4 entirely.
Member

In the case of a child jail of a parent with "new", I would assume "inherit" means to allow all addresses that are part of the parent's vnet rather than all addresses of the host. Is that correct?

Collaborator Author

That is a good question. This isn't well documented but the kernel implementation will return EINVAL if vnet is set to "new" and ip4 or ip6 is set at all. In my testing, the way to share host or parent addresses is to leave vnet unset and set both ip4 and ip6 to "inherit". To create a new network namespace for the container, set vnet to "new" and leave ip4 and ip6 unset.

- **`ip4Addr`** _(array of string, OPTIONAL)_ - list of IPv4 addresses usable by the jail
- **`ip6`** _(string, OPTIONAL)_ - control the availability of IPv6 addresses.
The value can be "new" which allows the addresses listed in **`ip6Addr`** to be used, "inherit" which allows all addresses in the jail's vnet or "disable" to stop use of IPv6 entirely.
- **`ip6Addr`** _(array of string, OPTIONAL)_ - list of IPv6 addresses usable by the jail
- **`vnet`** _(string, OPTIONAL)_ - control the vnet used for this jail.
The value can be "new" which causes a new vnet to be created for the jail or "inherit" which shares the vnet of the parent (or of the host if there is no parent).
- **`sysvmsg`** _(string, OPTIONAL)_ - allow access to SYSV IPC message primitives.
If set to "inherit", all IPC objects on the system are visible to this jail, whether they were created by the jail itself, the base system, or other jails. If set to "new", the jail will have its own key namespace, and can only see the objects that it has created; the system (or parent jail) has access to the jail's objects, but not to its keys. If set to "disable", the jail cannot perform any sysvmsg-related system calls.
- **`sysvsem`** _(string, OPTIONAL)_ - allow access to SYSV IPC semaphore primitives, in the same manner as sysvmsg.
- **`sysvshm`** _(string, OPTIONAL)_ - allow access to SYSV IPC shared memory primitives, in the same manner as sysvmsg.
- **`enforceStatfs`** _(integer, OPTIONAL)_ - control visibility of mounts in the jail.
A value of 0 allows visibility of all host mounts, 1 allows visibility of mounts nested under the container's root and 2 only allows the container root to be visible. If unset, the default value is 2.
- **`allow`** _(object, OPTIONAL)_ - Some restrictions of the jail environment may be set on a per-jail basis. With the exception of **`setHostname`**, **`reservedPorts`** and **`suser`**, these boolean parameters are off by default.
Member

Aren't all of the fields defined in this proposal on a per-jail basis? Is there something different about these fields specifically?

Collaborator Author

The language came from the FreeBSD manpage - but perhaps it would be better to re-word to make it clearer in this context.

- **`setHostname`** _(bool, OPTIONAL)_ - Allow the jail's hostname to be changed.
- **`rawSockets`** _(bool, OPTIONAL)_ - Allow the jail to use raw sockets to support network utilities such as ping and traceroute.
- **`chflags`** _(bool, OPTIONAL)_ - Allow the system file flags to be changed.
- **`mount`** _(array of strings, OPTIONAL)_ - Allow the listed filesystem types to be mounted and unmounted in the jail.
- **`quotas`** _(bool, OPTIONAL)_ - Allow the filesystem quotas to be changed in the jail.
- **`readMsgbuf`** _(bool, OPTIONAL)_ - Jailed users may read the kernel message buffer.
- **`socketAf`** _(bool, OPTIONAL)_ - Allow socket types other than IPv4, IPv6 and unix.
- **`mlock`** _(bool, OPTIONAL)_ - Allow locking and unlocking of physical pages.
- **`nfsd`** _(bool, OPTIONAL)_ - Allow the jail to act as an NFS server.
- **`reservedPorts`** _(bool, OPTIONAL)_ - Allow the jail to bind to ports lower than 1024.
- **`suser`** _(bool, OPTIONAL)_ - The value of the jail's security.bsd.suser_enabled sysctl. The super-user will be disabled automatically if its parent system has it disabled. The super-user is enabled by default.
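
The value constraints listed above can be checked mechanically before a jail is created. The following is a minimal sketch, not part of the proposal: the function name `validate_jail` is hypothetical, and the vnet/ip4/ip6 rule reflects the kernel behaviour reported in the review discussion rather than a documented requirement.

```python
ALLOWED_VALUES = {
    "host": {"new", "inherit"},
    "vnet": {"new", "inherit"},
    "ip4": {"new", "inherit", "disable"},
    "ip6": {"new", "inherit", "disable"},
    "sysvmsg": {"new", "inherit", "disable"},
    "sysvsem": {"new", "inherit", "disable"},
    "sysvshm": {"new", "inherit", "disable"},
}


def validate_jail(jail: dict) -> list:
    """Return a list of problems found in a freebsd.jail object (empty if none)."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in jail:
            value = jail[key]
            if not isinstance(value, str) or value not in allowed:
                problems.append(f"{key}: expected one of {sorted(allowed)}, got {value!r}")

    # The text gives 2 as the default when enforceStatfs is unset.
    statfs = jail.get("enforceStatfs", 2)
    if isinstance(statfs, bool) or statfs not in (0, 1, 2):
        problems.append(f"enforceStatfs: expected 0, 1 or 2, got {statfs!r}")

    # Assumption from the review thread: the kernel returns EINVAL when
    # vnet is "new" and ip4 or ip6 is set at all.
    if jail.get("vnet") == "new":
        for key in ("ip4", "ip6"):
            if key in jail:
                problems.append(f"{key} must be unset when vnet is 'new'")
    return problems


print(validate_jail({"host": "new", "vnet": "new", "enforceStatfs": 1}))
print(validate_jail({"vnet": "new", "ip4": "inherit"}))
```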

### Mapping from jail(8) config file

This table shows how to map settings from a typical jail(8) config file to the proposed JSON format.

| Jail parameter | JSON equivalent |
| -------------- | -------------------- |
| jid | - |
| name | see below |
| path | root.path |
| ip4.addr | freebsd.jail.ip4Addr |
| ip4.saddrsel | - |
| ip4 | freebsd.jail.ip4 |
| ip6.addr | freebsd.jail.ip6Addr |
| ip6.saddrsel | - |
| ip6 | freebsd.jail.ip6 |
| vnet | freebsd.jail.vnet |
| host.hostname | hostname |
| host | freebsd.jail.host |
| sysvmsg | freebsd.jail.sysvmsg |
| sysvsem | freebsd.jail.sysvsem |
| sysvshm | freebsd.jail.sysvshm |
| securelevel | - |
| devfs_ruleset | see below |
| children.max | see below |
| enforce_statfs | freebsd.jail.enforceStatfs |
| persist | - |
| parent | freebsd.jail.parent |
| osrelease | - |
| osreldate | - |
| allow.set_hostname | freebsd.jail.allow.setHostname |
| allow.sysvipc | freebsd.jail.allow.sysvipc |
| allow.raw_sockets | freebsd.jail.allow.rawSockets |
| allow.chflags | freebsd.jail.allow.chflags |
| allow.mount | freebsd.jail.allow.mount |
| allow.quotas | freebsd.jail.allow.quotas |
| allow.read_msgbuf | freebsd.jail.allow.readMsgbuf |
| allow.socket_af | freebsd.jail.allow.socketAf |
| allow.mlock | freebsd.jail.allow.mlock |
| allow.nfsd | freebsd.jail.allow.nfsd |
| allow.reserved_ports | freebsd.jail.allow.reservedPorts |
| allow.unprivileged_proc_debug | - |
| allow.suser | freebsd.jail.allow.suser |
| allow.mount.* | see below |
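
The table can be read in the reverse direction as well: given a config in the proposed JSON format, a runtime derives the jail(8) parameter names. The sketch below is illustrative only; `jail_params` and `camel_to_snake` are hypothetical helpers, and the `mount` array is deliberately left out because it does not map to a single parameter.

```python
import re

# JSON keys under freebsd.jail and their jail(8) parameter names, per the table.
TOP_LEVEL = {
    "parent": "parent",
    "host": "host",
    "ip4": "ip4",
    "ip4Addr": "ip4.addr",
    "ip6": "ip6",
    "ip6Addr": "ip6.addr",
    "vnet": "vnet",
    "sysvmsg": "sysvmsg",
    "sysvsem": "sysvsem",
    "sysvshm": "sysvshm",
    "enforceStatfs": "enforce_statfs",
}


def camel_to_snake(name: str) -> str:
    """Turn e.g. "rawSockets" into "raw_sockets"."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def jail_params(config: dict, container_id: str) -> dict:
    """Build a dict of jail(8) parameters from an OCI config (sketch only)."""
    params = {"name": container_id, "path": config["root"]["path"]}
    if "hostname" in config:
        params["host.hostname"] = config["hostname"]
    jail = config.get("freebsd", {}).get("jail", {})
    for key, param in TOP_LEVEL.items():
        if key in jail:
            params[param] = jail[key]
    for key, value in jail.get("allow", {}).items():
        if key == "mount":
            continue  # array of filesystem types, not covered in this sketch
        params["allow." + camel_to_snake(key)] = value
    return params


config = {
    "hostname": "mycontainer",
    "root": {"path": "/path/to/container/root"},
    "freebsd": {"jail": {"host": "new", "vnet": "new", "enforceStatfs": 1,
                         "allow": {"rawSockets": True, "chflags": True}}},
}
for name, value in jail_params(config, "mycontainer-id").items():
    print(name, "=", value)
```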

The jail name is set to the create command's `container-id` argument.
Member

Do we want to require this? I can imagine a runtime wanting to manipulate the name to some degree, such as a prefix or suffix.

Collaborator Author

At the moment, I need to be able to locate the jail outside the runtime - this is needed to read metrics from the container using netstat and rctl. The spec states that id MUST be unique across all containers on the host so I don't see any downside to use the id for consistently naming the jails.


The `devfs_ruleset` parameter is only required for jails which create new `devfs` mounts - typically OCI runtimes will mount `devfs` on the host. The value is a rule set number - these rule sets are defined on the host, typically via /etc/defaults/devfs.rules and /etc/devfs.rules or using the `devfs` command line utility.
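
When the runtime mounts `devfs` from the host, the rule set number can be carried as a mount option, as the `ruleset=4` option in the example configs suggests. The sketch below shows one way a runtime might read it back; the function name `devfs_ruleset` is hypothetical and the option handling is an assumption, not something this proposal specifies.

```python
from typing import Optional


def devfs_ruleset(mounts: list) -> Optional[int]:
    """Return the rule set number of the first devfs mount, or None if absent."""
    for mount in mounts:
        if mount.get("type") != "devfs":
            continue
        for option in mount.get("options", []):
            name, sep, value = option.partition("=")
            if name == "ruleset" and sep:
                return int(value)
    return None


mounts = [
    {"destination": "/dev", "options": ["ruleset=4"], "source": "devfs", "type": "devfs"},
    {"destination": "/dev/fd", "source": "fdesc", "type": "fdescfs"},
]
print(devfs_ruleset(mounts))  # 4
```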

The `children.max` parameter is managed by the OCI runtime, e.g. when a new container shares namespaces with an existing container.

The `allow.mount.*` parameter set is extensible - this proposal suggests representing allowed mount types as an array. As with `devfs`, typically the OCI runtime will manage mounts for the container by performing mount operations on the host.
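
To illustrate the array representation, the sketch below expands a `mount` list into individual jail(8) parameters. It assumes the runtime sets both the general `allow.mount` parameter and one `allow.mount.<fstype>` parameter per entry; the function name `expand_mount_types` is hypothetical.

```python
def expand_mount_types(types: list) -> dict:
    """Expand e.g. ["tmpfs"] into the matching allow.mount.* parameters."""
    params = {}
    if types:
        # Assumption: the per-type parameters are only useful alongside allow.mount.
        params["allow.mount"] = True
        for fstype in types:
            params[f"allow.mount.{fstype}"] = True
    return params


print(expand_mount_types(["tmpfs"]))
# {'allow.mount': True, 'allow.mount.tmpfs': True}
```

According to jail(8), these permissions only take effect when `enforce_statfs` is lower than 2, which is consistent with the tmpfs example setting `enforceStatfs` to 1.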

Jail parameters not supported by this runtime extension are marked with "-". These parameters will have their default values - see the jail(8) man page for details.

### Example

An example config for a container with its own host and network namespaces. This
container is allowed to see its own mounts and can use raw
sockets. In addition to the minimal set of devices in the container devfs,
`/dev/pf` is exposed, allowing the container to manage firewall rules etc. in its
network namespace.

```json
{
  "ociVersion": "1.1.0",
  "hostname": "mycontainer",
  "process": {
    "cwd": "/",
    "env": ["PATH=/bin:/sbin:/usr/bin:/usr/sbin"],
    "args": ["freebsd-version"]
  },
  "mounts": [
    {
      "destination": "/dev",
      "options": ["ruleset=4"],
      "source": "devfs",
      "type": "devfs"
    },
    {
      "destination": "/dev/fd",
      "source": "fdesc",
      "type": "fdescfs"
    }
  ],
  "root": {
    "path": "/path/to/container/root"
  },
  "freebsd": {
    "devices": [
      {
        "path": "pf",
        "mode": "0700",
        "unhide": true
      }
    ],
    "jail": {
      "host": "new",
      "vnet": "new",
      "enforceStatfs": 1,
      "allow": {
        "rawSockets": true,
        "chflags": true
      }
    }
  }
}
```

This example shows a config for a container which is allowed to mount new tmpfs
instances:

```json
{
  "ociVersion": "1.1.0",
  "hostname": "mycontainer",
  "process": {
    "cwd": "/",
    "env": ["PATH=/bin:/sbin:/usr/bin:/usr/sbin"],
    "args": ["freebsd-version"]
  },
  "mounts": [
    {
      "destination": "/dev",
      "options": ["ruleset=4"],
      "source": "devfs",
      "type": "devfs"
    },
    {
      "destination": "/dev/fd",
      "source": "fdesc",
      "type": "fdescfs"
    }
  ],
  "root": {
    "path": "/path/to/container/root"
  },
  "freebsd": {
    "jail": {
      "host": "new",
      "vnet": "new",
      "enforceStatfs": 1,
      "allow": {
        "rawSockets": true,
        "chflags": true,
        "mount": [
          "tmpfs"
        ]
      }
    }
  }
}
```