Incus OS: The Atomic‑Update, Ceph‑Backed Virtualization Platform That’s Quietly Becoming an Enterprise Contender

Introduction:

Incus OS is an immutable virtualization platform that treats OS updates and shared storage as first‑class, API‑driven features rather than afterthoughts. It combines a dual‑partition A/B update mechanism with CephFS distributed storage (enabled through an add‑on), both managed through its REST API. For infrastructure engineers, this means rolling cluster upgrades that keep the cluster API available and stateful workloads that can move between nodes, all without touching a shell – which is what moves Incus OS from a hobbyist tool toward a serious contender for production DevOps environments.

Learning Objectives:

  • Master the A/B atomic update workflow in Incus OS, including version interrogation, policy tuning, and manual failover triggering.
  • Deploy and attach a CephFS shared filesystem to Incus instances using only REST API calls, enabling live migration and node‑failure resilience.
  • Orchestrate a rolling cluster upgrade – node by node – while maintaining quorum and service availability.
  • Troubleshoot common pitfalls: UID mapping for unprivileged containers, Ceph client installation, and automatic rollback behavior.

1. A/B Atomic Updates: How Incus OS Eliminates Upgrade Anxiety

Incus OS doesn’t do incremental package updates. Instead, it keeps two side‑by‑side system partitions – call them A and B. At any given time, only one is active (the booted partition). When a new version arrives, it is written to the inactive slot without touching the running system. The next reboot flips the active flag to that freshly populated partition. This design gives you two ironclad guarantees:

  • Atomicity – The node runs on the old version until the reboot; there is no half‑upgraded, inconsistent state.
  • Instant rollback – The old version remains intact on the alternate partition. If the new kernel panics or fails to boot, the bootloader automatically falls back to the known‑good slot.

Step‑by‑step: reading versions, triggering an update, and performing a controlled reboot

All administration is done through Incus’ REST API under the `/os` prefix, using the `incus query` command. Each node exposes its own local OS API, unlike the classic Incus API, which has a cluster‑wide view, so every command below targets a specific node remote (`node1:` in the examples).

1.1 Check the current A/B state of a node

incus query node1:/os/1.0/

Look for three critical fields:

– `os_version` – the active version (the one the node booted on).
– `os_version_alternate` – what sits on the inactive partition.
– `os_version_next` – the version that will become active after the next reboot.

If `os_version_next` differs from `os_version`, a new image has already been downloaded and staged on the inactive slot, waiting for a reboot.
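The comparison above can be sketched in a few lines of Python. This is only an illustration: it assumes the three fields have already been extracted into a flat dictionary (the exact nesting of the real response is not shown in this article, so adapt the lookup), and `staged_update` is a hypothetical helper name.

```python
from typing import Optional


def staged_update(info: dict) -> Optional[str]:
    """Return the staged version if one is waiting for a reboot, else None.

    `info` is assumed to be a flat dict holding the fields listed above.
    """
    active = info["os_version"]
    # Treat a missing or empty os_version_next as "nothing staged".
    upcoming = info.get("os_version_next") or active
    return upcoming if upcoming != active else None


# Example values borrowed from the version table in section 1.4.
example = {
    "os_version": "202607010319",
    "os_version_alternate": "202607011621",
    "os_version_next": "202607011621",
}
print(staged_update(example))
```

Returning the version string rather than a boolean lets a caller log exactly which image the next reboot will activate.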

1.2 View and modify the update policy

The `system/update` endpoint shows the current configuration and state:

incus query node1:/os/1.0/system/update

Key fields:

– `state.needs_reboot` – `true` means an update is installed and pending.
– `config.auto_reboot` – `false` is the recommended default for clusters (you orchestrate reboots manually).
– `config.channel` – `stable` (default, weekly security updates) or `testing` (daily builds).
– `config.check_frequency` – defaults to every 6 hours.
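As a rough sketch of how these fields might drive a decision, the Python helper below classifies a node from a parsed `system/update` response. It assumes the response is a dictionary with `state` and `config` sub-objects, as the dotted field names above suggest; `reboot_action` and its return labels are hypothetical.

```python
def reboot_action(update: dict) -> str:
    """Classify what an operator should do next for one node.

    `update` is assumed to look like:
    {"state": {"needs_reboot": bool}, "config": {"auto_reboot": bool, ...}}
    """
    needs_reboot = update.get("state", {}).get("needs_reboot", False)
    auto_reboot = update.get("config", {}).get("auto_reboot", False)
    if not needs_reboot:
        return "nothing-pending"
    # An update is installed on the inactive slot and waits for a reboot.
    return "reboots-automatically" if auto_reboot else "schedule-manual-reboot"


print(reboot_action({"state": {"needs_reboot": True},
                     "config": {"auto_reboot": False}}))
```

In a cluster, only the `schedule-manual-reboot` case should feed the one-node-at-a-time procedure described later.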

To change the policy – e.g., switch a lab node to `testing` with a daily check:

incus query node1:/os/1.0/system/update -X PUT -d '{
  "config": {
    "channel": "testing",
    "check_frequency": "24h",
    "auto_reboot": false
  }
}'
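When the same policy has to be applied to many nodes, it can help to generate the body programmatically. The Python sketch below builds the JSON shown above and refuses incomplete input, since the troubleshooting table notes that an empty `check_frequency` is rejected. `update_policy_body` is a hypothetical helper, and the channel whitelist only reflects the two channels named in this article.

```python
import json

ALLOWED_CHANNELS = {"stable", "testing"}  # the two channels named in this article


def update_policy_body(channel: str, check_frequency: str, auto_reboot: bool) -> str:
    """Build a complete `config` object for a PUT on system/update."""
    if channel not in ALLOWED_CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    if not check_frequency:
        raise ValueError("check_frequency must not be empty")
    return json.dumps({
        "config": {
            "channel": channel,
            "check_frequency": check_frequency,
            "auto_reboot": auto_reboot,
        }
    })


body = update_policy_body("testing", "24h", False)
print(body)
# To apply it, pass the string to the command shown above, for example:
# subprocess.run(["incus", "query", "node1:/os/1.0/system/update",
#                 "-X", "PUT", "-d", body], check=True)
```

Always sending all three fields keeps the request independent of whatever the node currently has configured.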

1.3 Force a manual update check

Without waiting for the scheduled cycle, trigger an immediate check:

incus query node1:/os/1.0/system/update/:check -X POST -d '{}'

The response is empty on success. Query `system/update` again – if a newer image exists, `needs_reboot` will flip to `true` and `os_version_next` will reflect the new version.

1.4 Perform the atomic reboot

Once `needs_reboot` is true, initiate the reboot via API:

incus query node1:/os/1.0/system/:reboot -X POST

The node reboots onto the inactive partition. Once it is back online, re‑query the OS root and `system/update` – the version fields will have swapped and `needs_reboot` will be `false` again:

| Field | Before reboot | After reboot |
|---|---|---|
| `os_version` (active) | 202607010319 | 202607011621 |
| `os_version_alternate` (inactive) | 202607011621 | 202607010319 |
| `needs_reboot` | true | false |
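The swap in the table can be verified automatically after a reboot. The following Python sketch is one possible approach, not an official tool: `fetch` is a hypothetical callable that returns the node's version fields as a flat dictionary (for instance by wrapping `incus query`), and it is expected to raise while the node is still down.

```python
import time
from typing import Callable


def versions_swapped(before: dict, after: dict) -> bool:
    """True when the active and alternate versions have traded places."""
    return (after["os_version"] == before["os_version_alternate"]
            and after["os_version_alternate"] == before["os_version"])


def wait_for_swap(fetch: Callable[[], dict], before: dict,
                  attempts: int = 30, delay: float = 10.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the node answers with swapped versions, or give up."""
    for _ in range(attempts):
        try:
            if versions_swapped(before, fetch()):
                return True
        except Exception:
            # Broad on purpose in this sketch: the API is unreachable
            # while the node is rebooting.
            pass
        sleep(delay)
    return False


before = {"os_version": "202607010319", "os_version_alternate": "202607011621"}
after = {"os_version": "202607011621", "os_version_alternate": "202607010319"}
print(versions_swapped(before, after))
```

If the bootloader rolls back to the old image, the versions never swap and `wait_for_swap` returns `False`, which is a useful signal to stop an upgrade run.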

1.5 Rolling cluster upgrade – one node at a time

In a three‑node cluster, the database quorum requires at least two members online. The golden rule: reboot one node, wait for it to fully return, then move to the next.

  • Check that all nodes show `needs_reboot: true`.
  • Reboot a non‑leader node first. During its downtime, the other two maintain quorum.
  • Confirm the rebooted node’s version swap and cluster reintegration.
  • Repeat for the remaining nodes, saving the database leader for last. When the leader reboots, another node automatically takes over the `database-leader` role – the cluster API remains available through the other online members.

If a boot fails, the bootloader automatically reverts to the alternate partition – no manual intervention required. The node comes back on the old image, and `needs_reboot` stays `false` until a healthy version is proposed.
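The procedure above can be sketched as a small Python orchestration loop. It is a simplified illustration: it assumes every cluster member is a database voter (as in the three‑node example), and `reboot` and `is_back` are hypothetical callables you would implement on top of the API calls from sections 1.1 to 1.4.

```python
from typing import Callable, List


def quorum(voters: int) -> int:
    """Smallest majority of `voters` (2 out of 3, 3 out of 5, ...)."""
    return voters // 2 + 1


def rolling_order(nodes: List[str], leader: str) -> List[str]:
    """Non-leader nodes first, database leader last."""
    followers = [n for n in nodes if n != leader]
    return followers + ([leader] if leader in nodes else [])


def rolling_upgrade(nodes: List[str], leader: str,
                    reboot: Callable[[str], None],
                    is_back: Callable[[str], bool]) -> List[str]:
    """Reboot one node at a time and stop at the first node that fails to return."""
    if len(nodes) - 1 < quorum(len(nodes)):
        raise RuntimeError("taking one node down would break quorum")
    done: List[str] = []
    for node in rolling_order(nodes, leader):
        reboot(node)
        if not is_back(node):
            raise RuntimeError(f"{node} did not rejoin; not touching the next node")
        done.append(node)
    return done


print(rolling_order(["node1", "node2", "node3"], leader="node2"))
```

Stopping at the first failure matters more than speed here: continuing with a second reboot while one member is still missing is exactly how quorum gets lost.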

2. CephFS Distributed Storage: Making Instances Truly Portable

In a cluster, an instance can be restarted or moved to any node. But if its data lives on a node’s local disk, it doesn’t follow. The answer is shared storage – a volume accessible from all nodes simultaneously. CephFS provides exactly that: a distributed filesystem that multiple clients can mount in read‑write mode. Two instances on different nodes see and modify the same files – the equivalent of a Kubernetes `ReadWriteMany` (RWX) volume.

Step‑by‑step: attaching CephFS to Incus OS entirely via API

Prerequisites:

  • An operational Incus OS cluster (see the headless setup guide).
  • A Ceph cluster with CephFS already created (named `labfs` in this guide) and a dedicated user (e.g., `client.incus`).
  • A trusted Incus client with a remote configured for each node.

2.1 Install the Ceph client add‑on

Incus OS does not bundle the Ceph client by default. You must install the `incus-ceph` add‑on, which adds the Ceph binaries to the immutable system.

For each node:

incus query node1:/os/1.0/applications -X POST -d '{"name": "incus-ceph"}'

Verify the installation:

incus query node1:/os/1.0/applications

The list should contain `/os/1.0/applications/incus-ceph` alongside `/os/1.0/applications/incus`.
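A scripted version of this check might look like the Python sketch below. It assumes the endpoint returns a list of URL strings like the two shown above; `has_application` is a hypothetical helper name.

```python
from typing import Iterable


def has_application(app_urls: Iterable[str], name: str) -> bool:
    """True if an entry's last path segment is exactly `name`."""
    return any(url.rstrip("/").rsplit("/", 1)[-1] == name for url in app_urls)


installed = ["/os/1.0/applications/incus", "/os/1.0/applications/incus-ceph"]
print(has_application(installed, "incus-ceph"))
```

Comparing the last path segment exactly, rather than using a substring match, avoids mistaking `incus` for `incus-ceph` or the reverse.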

2.2 Declare the Ceph cluster via the `ceph` service

Provide three pieces of information:

  • The Ceph cluster FSID (from `ceph fsid`).
  • At least one monitor address (from `ceph mon dump`).
  • The keyring for the dedicated user (created with `ceph fs authorize labfs client.incus / rw`).

Apply the configuration with a `PUT` on `services/ceph` for each node:

incus query node1:/os/1.0/services/ceph -X PUT -d '{
  "clusters": {
    "ceph": {
      "fsid": "a1b2c3d4-...",
      "monitors": ["192.168.10.10:6789"],
      "keyrings": {
        "incus": "AQB...=="
      }
    }
  }
}'

Critical: The key in the `keyrings` object is the username alone (`incus`), not `client.incus`.

From this configuration, Incus OS automatically writes `/etc/ceph/ceph.conf` and the keyring file.
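Because the same body has to be sent to every node, generating it once reduces the chance of the `client.` naming mistake. The Python sketch below mirrors the JSON structure shown above; `ceph_service_body` is a hypothetical helper, and the FSID and key in the example are placeholders, not real values.

```python
import json
from typing import Sequence


def ceph_service_body(cluster_name: str, fsid: str,
                      monitors: Sequence[str], user: str, key: str) -> str:
    """Build the body for a PUT on services/ceph.

    Accepts either "incus" or "client.incus" and always emits the short form,
    which is what the `keyrings` object expects according to this article.
    """
    prefix = "client."
    short_user = user[len(prefix):] if user.startswith(prefix) else user
    if not monitors:
        raise ValueError("at least one monitor address is required")
    return json.dumps({
        "clusters": {
            cluster_name: {
                "fsid": fsid,
                "monitors": list(monitors),
                "keyrings": {short_user: key},
            }
        }
    })


# Placeholder values only: substitute your own FSID and key.
print(ceph_service_body("ceph", "<fsid>", ["192.168.10.10:6789"],
                        "client.incus", "<key>"))
```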

2.3 Attach the CephFS to an instance

Use a `disk` device whose `source` starts with `cephfs:`. Specify the filesystem, a sub‑path, the Ceph user, and the cluster name declared above:

incus config device add my-instance data disk \
source=cephfs:labfs/incusdata \
ceph.user_name=incus \
ceph.cluster_name=ceph \
path=/data

The instance now mounts the CephFS at `/data`. From inside the instance, `df -T` will show type `ceph` with the FSID in the mount name – confirming that the kernel CephFS client is active and `cephx` authentication succeeded.
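To attach the same device to several instances, the argument list can be generated rather than typed. The Python sketch below reproduces the command shown above as an argument vector; `cephfs_device_args` is a hypothetical helper, and nothing is executed until you pass the result to `subprocess.run`.

```python
from typing import List


def cephfs_device_args(instance: str, device: str, filesystem: str,
                       subpath: str, user: str, cluster: str,
                       mount_path: str) -> List[str]:
    """Build the `incus config device add` arguments for a CephFS disk device."""
    source = f"cephfs:{filesystem}/{subpath.strip('/')}"
    return [
        "incus", "config", "device", "add", instance, device, "disk",
        f"source={source}",
        f"ceph.user_name={user}",
        f"ceph.cluster_name={cluster}",
        f"path={mount_path}",
    ]


args = cephfs_device_args("my-instance", "data", "labfs", "incusdata",
                          "incus", "ceph", "/data")
print(" ".join(args))
# To run it: subprocess.run(args, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues if an instance name or path ever contains unusual characters.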

2.4 Fixing write permissions for unprivileged containers

An unprivileged container often hits a `Permission denied` on write. The root cause is Incus’ UID mapping: the container’s root (UID 0) is projected onto the host – and thus onto CephFS – as UID `1000000` by default. If the CephFS root belongs to `root` (UID 0), this shifted UID has no rights.

The clean solution: prepare a dedicated subdirectory owned by the mapped UID, from a client that already mounts the CephFS as admin:

# On a Ceph admin client that mounts labfs at /mnt/labfs
mkdir -p /mnt/labfs/incusdata
chown 1000000:1000000 /mnt/labfs/incusdata

Then point the disk device to that sub‑path (`source=cephfs:labfs/incusdata` as above). The container can now write without errors, and files appear on Ceph with owner 1000000.
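The shift is plain addition, which the Python sketch below makes explicit. It assumes the default base of 1000000 mentioned above and does not check an upper bound, because the size of the mapped range is not stated in this article; `host_id` is a hypothetical helper name.

```python
def host_id(container_id: int, base: int = 1_000_000) -> int:
    """Map a UID or GID inside an unprivileged container to its host-side value."""
    if container_id < 0:
        raise ValueError("IDs cannot be negative")
    return base + container_id


print(host_id(0))     # container root, as seen by CephFS
print(host_id(1000))  # a regular user with UID 1000 inside the container
```

The same arithmetic tells you which owner to `chown` a directory to when a non‑root user inside the container needs write access.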

2.5 Verify shared access across nodes

Drop a file from one instance – it is immediately visible from another instance on a different node, and also from the Ceph side. This coherence is what enables an instance restarted on another node to retrieve its data intact, provided you attach the same disk device in its profile or configuration.
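One way to script this check is sketched below in Python. It writes a token from one instance with `incus exec` and reads it back from another. The instance names, the file path, and the `shared_file_visible` helper are illustrative assumptions, and both instances are assumed to have the CephFS device mounted at `/data`.

```python
import shlex
import subprocess


def shared_file_visible(writer: str, reader: str,
                        path: str = "/data/.shared-check",
                        token: str = "incus-cephfs-check",
                        run=subprocess.run) -> bool:
    """Write `token` from one instance and confirm another instance reads it."""
    write_cmd = f"printf %s {shlex.quote(token)} > {shlex.quote(path)}"
    run(["incus", "exec", writer, "--", "sh", "-c", write_cmd], check=True)
    result = run(["incus", "exec", reader, "--", "cat", path],
                 check=True, capture_output=True, text=True)
    return result.stdout == token


# Example call (requires a configured incus client and two running instances):
# shared_file_visible("web1", "web2")
```

The `run` parameter exists only so the function can be exercised with a stand‑in during testing; in normal use the default `subprocess.run` is what executes the commands.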

3. Troubleshooting Common Pitfalls

| Symptom | Probable Cause | Solution |
|---|---|---|
| `errno 95 / no keyring found` on mount | Keyring named `client.incus` instead of `incus` | Correct the `PUT` on the `ceph` service – the key in `keyrings` is the username alone |
| `ceph` service returns `not found` | `incus-ceph` add‑on missing on the node | Install the application via `POST /os/1.0/applications` |
| Permission denied on write from container | Root UID mapped to 1000000 | Create a subdirectory owned by UID 1000000 (`chown 1000000:1000000`) and point the device to that sub‑path |
| `error connecting to the cluster` | Monitor unreachable from the node | Check network routing and the address in `monitors` |
| `needs_reboot` stays `false` after manual check | No newer version on the channel, or `check_frequency` set to `never` | Verify `config.channel` and force a manual `:check` |
| `PUT` on `system/update` returns `invalid update check frequency` | Incomplete body – `check_frequency` empty | Send a complete `config` object with all three fields |
| Cluster goes `OFFLINE` on multiple nodes | Too many nodes rebooted simultaneously – quorum lost | Reboot one member at a time; wait for each to recover before proceeding |

4. Why This Matters for DevOps and SRE Teams

Incus OS is not just another virtualization layer – it’s a platform brick that solves two of the most painful problems in infrastructure management:

  • Upgrades become boring – A/B partitions with automatic rollback remove the fear of “what if the update breaks something?” You can stage updates during business hours and reboot nodes one at a time, keeping quorum and the cluster API available throughout.
  • Storage follows workloads – With CephFS, your instances become truly portable. Node failures no longer mean data loss or tedious manual recovery. The shared volume model aligns perfectly with cloud‑native patterns (RWX persistent volumes) while keeping the operational simplicity of a hypervisor.

The fact that everything is exposed via a clean REST API (no shell access required) makes Incus OS inherently automatable – Terraform, Ansible, or custom operators can drive the entire lifecycle without ever needing SSH.

What Undercode Says:

  • Atomic updates are a game‑changer for cluster reliability. The dual‑partition design with automatic boot‑time rollback eliminates the single biggest risk in OS patching. Combined with the ability to stage updates without rebooting, it enables truly “boring” operations that SRE teams dream of.
  • CephFS integration via API is production‑ready. The ability to attach distributed storage without manual `mount` commands or shell scripts brings Incus OS closer to Kubernetes’ storage abstraction. The UID mapping quirk is well‑documented and easily mitigated – a small price for the portability gain.

Analysis: Incus OS is evolving beyond its LXC roots into a serious infrastructure platform. The architectural coherence – immutable OS, API‑first design, and native distributed storage – positions it as a credible alternative to Proxmox for small‑to‑medium clusters, and even as a lightweight competitor to OpenStack for edge or on‑premise deployments. The main barrier remains ecosystem maturity (monitoring, backup integrations, etc.), but the foundation is solid. For DevOps teams already comfortable with Ceph and REST APIs, Incus OS offers a refreshingly simple yet powerful model that reduces operational toil significantly.

Prediction:

  • +1 Incus OS will gain mainstream adoption within 18–24 months, particularly among SRE teams running on‑premise or edge infrastructure who need Kubernetes‑like storage semantics without the complexity of a full container orchestration platform.
  • +1 The API‑first, no‑shell architecture will drive a new wave of GitOps‑style operators for Incus OS, making cluster upgrades and storage provisioning fully declarative and auditable.
  • -1 Without a commercial backing or a large vendor champion, Incus OS risks remaining a “hidden gem” – adoption will grow but may plateau unless a managed service or enterprise support offering emerges.
  • +1 The A/B update mechanism will become the de facto standard for immutable OSes in the virtualization space, influencing other projects (e.g., Proxmox, Xen) to adopt similar dual‑partition strategies.
  • -1 The CephFS integration, while elegant, requires a separate Ceph cluster – a significant operational overhead for smaller teams. This may limit adoption to organizations already invested in Ceph, unless Incus OS later adds support for simpler shared storage backends (NFS, GlusterFS).

IT/Security Reporter URL:

Reported By: Stephanerobert1 Incus – Hackers Feeds