Quotas in an unprivileged LXC container

Getting filesystem quotas to work inside an unprivileged LXC container is a real problem, because the Linux kernel and LXC currently work against us (they do not support it yet). But a workaround exists!

The quotactl() syscall requires the CAP_SYS_ADMIN capability

  • The easiest option here is to give up, switch to a privileged container and lose all the security advantages of an unprivileged one 😉
  • Or we can grant this capability only to selected commands: ssh to the host, then run the required command inside the container with lxc-attach -s MOUNT, so that it enters only the container's mount namespace and runs partially privileged
  • This way we effectively drill a hole into our unprivileged container. It is up to you to keep it small and tightly controlled (mount checks, hash sums, post-run checks, etc.).
  • For this I use a set of binaries/scripts/keys/… that are mapped into the container from the host using lxc.mount.entry with the bind option.
  • Sorry, but for the sake of good old security by obscurity, I'm not providing a complete solution here, only hints. A good admin will surely figure it out 😉
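In the same spirit of hints rather than a full solution, here is a minimal sketch of what the host-side end of such a call might look like, assuming it is invoked (for example through an ssh forced command, which is an assumption) with a command name and its arguments. It accepts only whitelisted commands, checks the SHA-256 of the bind-mounted file on the host, and only then runs it through lxc-attach -s MOUNT. The ALLOWED table, its paths and all function names are hypothetical; this is not the author's unpublished setup.

```python
import hashlib
import subprocess

# Hypothetical whitelist: command name -> (host path of the bind-mounted file,
# path of the same file inside the container, expected SHA-256 hex digest).
# Fill in your own entries; none of these come from the original setup.
ALLOWED: dict[str, tuple[str, str, str]] = {}


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_attach_cmd(container: str, argv: list[str]) -> list[str]:
    """Build an lxc-attach call that joins only the container's mount namespace.

    With "-s MOUNT" the command does not enter the container's user namespace,
    so it is not confined to the container's mapped UIDs.
    """
    return ["lxc-attach", "-n", container, "-s", "MOUNT", "--", *argv]


def run_whitelisted(container: str, name: str, args: list[str]) -> int:
    """Run a whitelisted, hash-verified command; refuse anything else."""
    if name not in ALLOWED:
        raise PermissionError(f"command not whitelisted: {name}")
    host_path, container_path, expected = ALLOWED[name]
    if sha256_of(host_path) != expected:
        raise PermissionError(f"hash mismatch for {host_path}")
    cmd = build_attach_cmd(container, [container_path, *args])
    # Post-run checks (see the bullets above) would go after this call.
    return subprocess.run(cmd, check=False).returncode
```

Validating the arguments themselves (for example, allowing only numeric UIDs and limits for a quota tool) would narrow the hole further; that part depends entirely on which commands you expose.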

LXC usually maps UIDs/GIDs into the range starting at 100000

  • After working around quotactl(), we can set quotas on the filesystem, but we are setting them for the UIDs as seen inside the container, while the files on disk are owned by the shifted host UIDs: for example, UID=5000 in the container is stored on disk as UID=105000
  • So the quotas are set, but they are never enforced, because the UIDs don't match
  • To fix this, we have to change part of the UID/GID mapping like this (the empty lxc.id_map line clears any previously defined mappings; LXC 2.1 and newer spell the key lxc.idmap):
    lxc.id_map =
    lxc.id_map = u 0 100000 1000
    lxc.id_map = g 0 100000 1000
    lxc.id_map = u 1000 1000 64536
    lxc.id_map = g 1000 1000 64536
  • We map IDs 0-999 (usually system users) to the host range 100000-100999, thus avoiding the security problems of having a real UID 0 inside the container
  • All remaining IDs (1000-65535) are mapped to the same IDs on the host, so quotas for these users are tracked and enforced
  • After changing these mappings you have to re-chown all affected files (those with a UID or GID > 999 inside the container, which the old 100000-based mapping stored on disk as 101000 and up) before booting the container again.
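The ID arithmetic above can be checked with a few lines of Python. This sketch assumes the four-field `kind container-start host-start count` format of lxc.id_map lines; parse_idmap, to_host_id and unmapped are hypothetical helpers, not LXC APIs. It translates a container ID to the host ID stored on disk and lists container IDs that no mapping covers.

```python
from typing import NamedTuple, Optional


class IdMap(NamedTuple):
    kind: str        # "u" for UIDs, "g" for GIDs
    ns_start: int    # first ID inside the container
    host_start: int  # first ID on the host
    count: int       # number of consecutive IDs mapped


def parse_idmap(lines: list[str]) -> list[IdMap]:
    """Parse 'lxc.id_map = u 0 100000 1000' lines; an empty value clears the list."""
    maps: list[IdMap] = []
    for line in lines:
        key, _, value = line.partition("=")
        if key.strip() not in ("lxc.id_map", "lxc.idmap"):
            continue  # other keys and comments are ignored
        fields = value.split()
        if not fields:
            maps.clear()  # "lxc.id_map =" drops earlier entries
            continue
        kind, ns, host, count = fields
        maps.append(IdMap(kind, int(ns), int(host), int(count)))
    return maps


def to_host_id(maps: list[IdMap], kind: str, ns_id: int) -> Optional[int]:
    """Translate a container ID to the host ID on disk (None if unmapped)."""
    for m in maps:
        if m.kind == kind and m.ns_start <= ns_id < m.ns_start + m.count:
            return m.host_start + (ns_id - m.ns_start)
    return None


def unmapped(maps: list[IdMap], kind: str, upto: int = 65536) -> list[int]:
    """Container IDs in [0, upto) that no mapping of the given kind covers."""
    return [i for i in range(upto) if to_host_id(maps, kind, i) is None]
```

With a single default map `u 0 100000 65536`, to_host_id returns 105000 for container UID 5000; with the split map above it returns 5000, and unmapped() returns an empty list. A count of 999 in the first line would leave ID 999 uncovered.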
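The re-chown step can be sketched as follows, assuming the container previously used a single default map (container 0 → host 100000, 65536 IDs), is stopped, and the script runs as root on the host against its rootfs. Under that assumption container IDs 1000-65535 were stored on disk as 101000-165535 and must become 1000-65535, while 100000-100999 stay as they are. OLD_BASE, shifted() and rechown() are hypothetical names.

```python
import os
import stat

# Assumption: old map was "0 -> 100000, 65536 IDs"; new map is the split one above.
OLD_BASE = 100000
FIRST_SHIFTED = 1000  # first container ID that becomes identity-mapped
COUNT = 65536


def shifted(host_id: int) -> int:
    """Return the new on-disk ID for an old one, or the ID unchanged."""
    if OLD_BASE + FIRST_SHIFTED <= host_id < OLD_BASE + COUNT:
        return host_id - OLD_BASE
    return host_id


def _fix(path: str, dry_run: bool, changes: list[tuple[str, int, int]]) -> None:
    st = os.lstat(path)  # never follow symlinks
    uid, gid = shifted(st.st_uid), shifted(st.st_gid)
    if (uid, gid) == (st.st_uid, st.st_gid):
        return
    changes.append((path, uid, gid))
    if dry_run:
        return
    os.lchown(path, uid, gid)
    if not stat.S_ISLNK(st.st_mode):
        # chown() clears setuid/setgid bits on Linux; restore the original mode.
        os.chmod(path, stat.S_IMODE(st.st_mode))


def rechown(rootfs: str, dry_run: bool = True) -> list[tuple[str, int, int]]:
    """Shift owners under rootfs; return the (path, new_uid, new_gid) changes."""
    changes: list[tuple[str, int, int]] = []
    _fix(rootfs, dry_run, changes)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(rootfs):
        for name in dirnames + filenames:
            _fix(os.path.join(dirpath, name), dry_run, changes)
    return changes
```

Running it with dry_run=True first gives a list to review before anything is changed. Hard-linked files are safe to visit twice, because after the first shift their IDs fall outside the shifted range. Note that chown also drops file capabilities (the security.capability xattr), which this sketch does not restore.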

This workaround works for me on an XFS filesystem without any problems.