Using Go to set UID and GIDs

Prior to the release of Go 1.16 (2021), a Go program running under Linux could not call syscall.Setuid() and have it reliably drop the privilege of a setuid-root Go program. The missing functionality was reported in 2011, but it was an attempt in 2019 to create a native Go port of libcap that revealed this fact to libcap's author. (You might think you could use CGo to do it with an assist from glibc, but there was this bug, which meant that didn't work reliably either.)

Note: If you are compiling with Go 1.16+ under Linux, you can now write code that uses syscall.Setuid() etc. to drop setuid-root privilege. (This works in native Go and, via glibc's nptl:setxid mechanism, when using CGo.) If that is all you need, end of story.
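For readers on Go 1.16 or later, the following is a minimal sketch of such a drop. The helper name dropToRealIDs is made up for this illustration, and the sketch assumes the program only needs to fall back to the real UID and GID of whoever invoked it. It leaves supplementary groups alone.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// dropToRealIDs (a hypothetical helper) sets the GID and then the UID of
// the process to the real GID and UID of the invoking user. The group is
// changed first because, once the UID is no longer 0, the process may no
// longer have the privilege needed to change its GID.
func dropToRealIDs() error {
	if err := syscall.Setgid(syscall.Getgid()); err != nil {
		return fmt.Errorf("setgid: %w", err)
	}
	if err := syscall.Setuid(syscall.Getuid()); err != nil {
		return fmt.Errorf("setuid: %w", err)
	}
	return nil
}

func main() {
	if err := dropToRealIDs(); err != nil {
		fmt.Fprintln(os.Stderr, "unable to drop privilege:", err)
		os.Exit(1)
	}
	fmt.Printf("uid=%d euid=%d gid=%d egid=%d\n",
		syscall.Getuid(), syscall.Geteuid(), syscall.Getgid(), syscall.Getegid())
}
```

Run by an ordinary user this is a no-op. Installed setuid-root, it should leave the effective IDs equal to the real ones.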

However, if you are using an earlier version of Go and/or also want to fully manipulate process capabilities, the discussion and examples on this page are for you!

The examples on this page assume (from time of writing) a go1.15 or later Go tool-chain. These examples, however, should all work for earlier versions of Go -- go1.11.13 or later -- so long as you first do this:

Work around a Go module build filter

$ export CGO_LDFLAGS_ALLOW="-Wl,-?-wrap[=,][^-.@][^,]*"

Why you need this workaround is the subject of this bug and it is fixed in Go builds including and more recent than go1.15rc1.

Let's get started. First we need some code, so let's download it and build it (see above for the CGO_LDFLAGS_ALLOW=... workaround you might need):

Getting and building the setid source code

$ mkdir foo

$ cd foo

$ wget https://git.kernel.org/pub/scm/libs/libcap/libcap.git/plain/goapps/setid/setid.go

$ go mod init setid

$ go mod tidy

$ go build setid

$ ./setid --caps=false

before capability state: "="

before gid:

/proc/4165/status Gid: 1000 1000 1000 1000

/proc/4166/status Gid: 1000 1000 1000 1000

/proc/4167/status Gid: 1000 1000 1000 1000

/proc/4168/status Gid: 1000 1000 1000 1000

/proc/4169/status Gid: 1000 1000 1000 1000

before uid:

/proc/4165/status Uid: 1000 1000 1000 1000

/proc/4166/status Uid: 1000 1000 1000 1000

/proc/4167/status Uid: 1000 1000 1000 1000

/proc/4168/status Uid: 1000 1000 1000 1000

/proc/4169/status Uid: 1000 1000 1000 1000

after capability state: "="

after gid:

/proc/4165/status Gid: 1000 1000 1000 1000

/proc/4166/status Gid: 1000 1000 1000 1000

/proc/4167/status Gid: 1000 1000 1000 1000

/proc/4168/status Gid: 1000 1000 1000 1000

/proc/4169/status Gid: 1000 1000 1000 1000

after uid:

/proc/4165/status Uid: 1000 1000 1000 1000

/proc/4166/status Uid: 1000 1000 1000 1000

/proc/4167/status Uid: 1000 1000 1000 1000

/proc/4168/status Uid: 1000 1000 1000 1000

/proc/4169/status Uid: 1000 1000 1000 1000

$

This program, invoked this way, performs a rather elaborate no-op. When you invoke this program the numbers you see displayed will most likely be different from the above, but the pattern of which numbers are the same is expected to match. One thing to look out for is that the program may, at different times, run a different number of threads (as counted by each of the /proc/... lines above) - with Go the number of threads is not expected to be a constant.

At a high level, the program starts by displaying the "before" state for the full Go runtime at the start of execution of the program in terms of: capabilities, UIDs (there are 4 of them) and GIDs (there are 4 of them too). The program is able to set supplementary groups too, but for now this walk through ignores that functionality. The program then attempts to set the UID/GIDs and finally summarizes the new state of the Go runtime. This second "after" summary process also checks the output against what was being attempted and will cause the program to log an error if there is a mismatch.

A Go/CGo runtime is a single process with some number of threads (aka pthreads). The output shows all of the UID/GID values (the final "after" values are all equal to the requested ones if the ID changing attempt succeeds) for each of the threads. A Go program cleverly executes its code on all of the threads, somewhat at random. However, the Linux kernel associates UID/GIDs with individual threads. So, for a Go program to sanely change {G,U}IDs, it somehow needs to keep all the threads in sync. This synchronization currently (prior to go1.16) needs an assist (see the top of the page and pointers). Suffice it to say, the setid program uses this assist.
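One way to see these per-thread values from inside a Go program is to read the status file of every thread listed under /proc/self/task. The sketch below does that. It is an illustration only and is not claimed to be how setid produces its output; the helper name idLine is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// idLine returns the fields that follow key (for example "Uid:" or
// "Gid:") on the first matching line of a status file, or nil if no
// line starts with key. For Uid and Gid the four fields are the real,
// effective, saved and filesystem IDs.
func idLine(status, key string) []string {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		if f := strings.Fields(sc.Text()); len(f) > 0 && f[0] == key {
			return f[1:]
		}
	}
	return nil
}

func main() {
	// Each directory under /proc/self/task is one thread of this process.
	paths, err := filepath.Glob("/proc/self/task/*/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the thread may have exited since the glob
		}
		s := string(data)
		fmt.Printf("%s Uid: %s Gid: %s\n", p,
			strings.Join(idLine(s, "Uid:"), " "),
			strings.Join(idLine(s, "Gid:"), " "))
	}
}
```

If the threads are in sync, every line printed shows the same Uid and Gid fields.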

The --caps=false argument instructs the program to not attempt to raise capabilities, and instead it takes the psx.Syscall3() path, and skips any attempt to set the supplementary groups. As such, nothing it tries requires any privilege to succeed and this is how the program reaches the end with no errors.

In the examples below, the output will be different. If something goes wrong and the program reports an unexpected error, and you have double checked that you typed the correct things, you may have found a bug! Please report it (hint: see the three-line drop-down in the top left of this page).

The remainder of this page is devoted to the various different ways you can run this program to actually enable it to change {U,G}IDs. There are surprisingly many. We'll start with the fully capable way.

Note: The Fully Capable way to change UID/GIDs involves setting up the binary that performs these operations to have a file-capability. To do this we will use the setcap binary, which we assume is installed on your system in the directory /sbin/. If that command is not to be found in that location, try omitting the /sbin/ prefix in the examples that follow. If that also fails, consult the documentation for your Linux distribution for a way to install something like libcap, libcap2 or libcap*-bin first.


$ sudo /sbin/setcap cap_setuid,cap_setgid=p ./setid

$ ls -l ./setid

-rwxr-xr-x 1 morgan morgan 2465488 Jul 7 21:40 ./setid

$ /sbin/getcap ./setid

./setid cap_setgid,cap_setuid=p

$ ./setid --uid=1 --gid=2

before capability state: "cap_setgid,cap_setuid=p"

before gid:

/proc/4363/status Gid: 1000 1000 1000 1000

/proc/4364/status Gid: 1000 1000 1000 1000

/proc/4365/status Gid: 1000 1000 1000 1000

/proc/4366/status Gid: 1000 1000 1000 1000

/proc/4367/status Gid: 1000 1000 1000 1000

before uid:

/proc/4363/status Uid: 1000 1000 1000 1000

/proc/4364/status Uid: 1000 1000 1000 1000

/proc/4365/status Uid: 1000 1000 1000 1000

/proc/4366/status Uid: 1000 1000 1000 1000

/proc/4367/status Uid: 1000 1000 1000 1000

after capability state: "="

after gid:

/proc/4363/status Gid: 2 2 2 2

/proc/4364/status Gid: 2 2 2 2

/proc/4365/status Gid: 2 2 2 2

/proc/4366/status Gid: 2 2 2 2

/proc/4367/status Gid: 2 2 2 2

after uid:

/proc/4363/status Uid: 1 1 1 1

/proc/4364/status Uid: 1 1 1 1

/proc/4365/status Uid: 1 1 1 1

/proc/4366/status Uid: 1 1 1 1

/proc/4367/status Uid: 1 1 1 1

This program is launched with just the permitted (p) capabilities it needs to transition to uid=1 and gid=2 and drops those capabilities before summarizing the final state. The important thing to note is all of the threads share the new UID and GID values. If you observe something else, with this first example, you have found a bug and should report it. At worst, we'll be able to improve the text of this walk through if it isn't an actual bug in some Go/CGo code.

Next, we'll run the code in the old-fashioned setuid-root way.

Changing IDs via setuid-root

$ sudo chown root.root ./setid

$ sudo chmod +s ./setid

$ ls -l ./setid

-rwsr-sr-x 1 root root 2465488 Jul 7 21:40 ./setid

$ ./setid --uid=3 --gid=4 --caps=false

before capability state: "=ep"

before gid:

/proc/4419/status Gid: 1000 0 0 0

/proc/4420/status Gid: 1000 0 0 0

/proc/4421/status Gid: 1000 0 0 0

/proc/4422/status Gid: 1000 0 0 0

/proc/4423/status Gid: 1000 0 0 0

before uid:

/proc/4419/status Uid: 1000 0 0 0

/proc/4420/status Uid: 1000 0 0 0

/proc/4421/status Uid: 1000 0 0 0

/proc/4422/status Uid: 1000 0 0 0

/proc/4423/status Uid: 1000 0 0 0

after capability state: "="

after gid:

/proc/4419/status Gid: 4 4 4 4

/proc/4420/status Gid: 4 4 4 4

/proc/4421/status Gid: 4 4 4 4

/proc/4422/status Gid: 4 4 4 4

/proc/4423/status Gid: 4 4 4 4

after uid:

/proc/4419/status Uid: 3 3 3 3

/proc/4420/status Uid: 3 3 3 3

/proc/4421/status Uid: 3 3 3 3

/proc/4422/status Uid: 3 3 3 3

/proc/4423/status Uid: 3 3 3 3

$

Note, we've invoked the program this time with the argument --caps=false. This is to ensure that the code takes the psx.Syscall3() path, and does not use the capability mechanism to change UID. The code generally defaults to always dropping any capabilities that might be present before displaying the final summary, but in this setuid-root case, because we are relying on the legacy method to change UID, the kernel fixes up our capability state to be empty once all the UID values are non-zero. You should verify the equivalent result if you invoke this setuid-root binary with these arguments: "./setid --uid=3 --gid=4 --caps=false --drop=false". Something else that is different about this case is that the initial state has "all" the capabilities raised ("=ep"). In the fully capable example we did before, the program started off with less privilege.

Be warned that the program retains all of its privilege if you invoke this program as follows: "./setid --uid=3 --gid=4 --drop=false". Which is to say that .../libcap/cap knows how to change IDs, without losing privilege as a silent side-effect, and allows the program to do other privileged things afterwards: capabilities represent an explicit privilege handling model and not the subtle, hard to reason about, legacy privilege model based around the [e]uid=0 identity.

Next, we'll use capability inheritance (the Fully Capable way) to, well, inherit privilege. To get this right, we'll start over building the binary:

Instructing a program to inherit privilege

$ rm -f ./setid

$ go build setid

$ ls -l ./setid

-rwxrwxr-x. 1 morgan morgan 2502184 Jul 24 20:07 ./setid

$ sudo /sbin/setcap cap_setuid,cap_setgid=i ./setid

$ ./setid --uid=5 --gid=6

before capability state: "="

before gid:

/proc/26200/status Gid: 1000 1000 1000 1000

/proc/26201/status Gid: 1000 1000 1000 1000

/proc/26202/status Gid: 1000 1000 1000 1000

/proc/26203/status Gid: 1000 1000 1000 1000

/proc/26204/status Gid: 1000 1000 1000 1000

/proc/26205/status Gid: 1000 1000 1000 1000

before uid:

/proc/26200/status Uid: 1000 1000 1000 1000

/proc/26201/status Uid: 1000 1000 1000 1000

/proc/26202/status Uid: 1000 1000 1000 1000

/proc/26203/status Uid: 1000 1000 1000 1000

/proc/26204/status Uid: 1000 1000 1000 1000

/proc/26205/status Uid: 1000 1000 1000 1000

2020/07/24 20:10:39 group setting failed: operation not permitted

That didn't quite work. We can see what was wrong from the perspective of the program, in that it says its before capabilities were "=" (also known as no privilege). Given this, when it comes time to raise privilege to change the process' GID/UIDs, the program finds it can't and fails with an error. What is different about the way we set up the program from the previous time we gave it file capabilities is that here we gave it "=i" (inheritable, also referred to as optional, capabilities) whereas previously we had given it "=p" (permitted, also referred to as forced, capabilities). Clearly, inheritable capabilities work differently!

There are some tools besides /sbin/setcap installed on each Linux system that will be handy for us to explore and setup a method for the program to inherit capabilities. The simplest one is /sbin/getpcaps which can read the capabilities of any running process:

Exploring inheritable capabilities

$ /sbin/getpcaps 1

1: =ep

$ /sbin/getpcaps $$

16848: =

What this reveals is that the init process (the first, numbered 1, process started by the kernel upon boot, which might actually be named systemd on your system) runs with all of the named capabilities of the system in its permitted and effective flags. Your process ($$ is bash's shorthand for the process ID of the running shell) has a completely empty capability set. Another tool that we can use to display capabilities for the current process is /sbin/capsh, which also has some features that we can use to manipulate the inheritable dimension of the capabilities of the process away from being empty:

Raising inheritable capabilities

$ /sbin/capsh --print | grep Current:

Current: =

$ sudo /sbin/capsh --inh=cap_setuid,cap_setgid --user=$(whoami) == --print | grep Current:

Current: cap_setgid,cap_setuid=i

$ sudo /sbin/capsh --inh=cap_setuid,cap_setgid --user=$(whoami) -- -c "./setid --uid=5 --gid=6"

before capability state: "cap_setgid,cap_setuid=ip"

before gid:

/proc/28502/status Gid: 1000 1000 1000 1000

/proc/28503/status Gid: 1000 1000 1000 1000

/proc/28504/status Gid: 1000 1000 1000 1000

/proc/28505/status Gid: 1000 1000 1000 1000

/proc/28506/status Gid: 1000 1000 1000 1000

/proc/28507/status Gid: 1000 1000 1000 1000

before uid:

/proc/28502/status Uid: 1000 1000 1000 1000

/proc/28503/status Uid: 1000 1000 1000 1000

/proc/28504/status Uid: 1000 1000 1000 1000

/proc/28505/status Uid: 1000 1000 1000 1000

/proc/28506/status Uid: 1000 1000 1000 1000

/proc/28507/status Uid: 1000 1000 1000 1000

after capability state: "="

after gid:

/proc/28502/status Gid: 6 6 6 6

/proc/28503/status Gid: 6 6 6 6

/proc/28504/status Gid: 6 6 6 6

/proc/28505/status Gid: 6 6 6 6

/proc/28506/status Gid: 6 6 6 6

/proc/28507/status Gid: 6 6 6 6

after uid:

/proc/28502/status Uid: 5 5 5 5

/proc/28503/status Uid: 5 5 5 5

/proc/28504/status Uid: 5 5 5 5

/proc/28505/status Uid: 5 5 5 5

/proc/28506/status Uid: 5 5 5 5

/proc/28507/status Uid: 5 5 5 5

There are three ways of invoking /sbin/capsh shown above, the last of which chain-loads and successfully runs our setid program.

The first invocation is very similar to our use of /sbin/getpcaps, however, we use the --print argument and grep to pick out the line that shows the capabilities of the running process. (Versions of libcap-2.52+ include a capsh that supports the --current argument which provides a simplified mechanism for displaying just this info.)
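The kernel also exposes these values to any program as hexadecimal masks in /proc/<pid>/status (the CapInh, CapPrm, CapEff, CapBnd and CapAmb lines). As a rough sketch, and without using the libcap Go package, a Go program can decode its own masks as follows. The helper names capMask and has are invented for this example; the bit numbers 0, 6 and 7 are the kernel's numbers for cap_chown, cap_setgid and cap_setuid.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Bit positions of a few capabilities, as numbered by the kernel.
const (
	capChown  = 0
	capSetgid = 6
	capSetuid = 7
)

// capMask extracts the hexadecimal mask that follows key (for example
// "CapPrm:") in the text of a /proc/<pid>/status file.
func capMask(status, key string) (uint64, error) {
	for _, line := range strings.Split(status, "\n") {
		f := strings.Fields(line)
		if len(f) == 2 && f[0] == key {
			return strconv.ParseUint(f[1], 16, 64)
		}
	}
	return 0, fmt.Errorf("%s not found", key)
}

// has reports whether capability number bit is raised in mask.
func has(mask uint64, bit uint) bool {
	return mask&(1<<bit) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, key := range []string{"CapInh:", "CapPrm:", "CapEff:", "CapBnd:", "CapAmb:"} {
		mask, err := capMask(string(data), key)
		if err != nil {
			fmt.Println(err) // older kernels may lack some lines
			continue
		}
		fmt.Printf("%s %016x cap_setuid=%v cap_setgid=%v\n",
			key, mask, has(mask, capSetuid), has(mask, capSetgid))
	}
}
```

Unlike capsh --print, this prints raw masks rather than the libcap text representation.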

In the next line we use the --inh argument to raise the capabilities we want raised in the process' inheritable capability dimension. This is a privileged operation so we invoke the command with sudo. (More on this bit later.) The '==' command line argument then instructs capsh to re-execute itself with the sole argument --print. In this invocation we observe that the only capabilities the re-executed capsh has are process inheritable ones.

Finally, in the third command, we use the '--' argument to invoke bash in such a way that it launches our setid program which, recall, has an inheritable file capability on it. The net effect is that the process inheritable capability handshakes with the file inheritable capability and yields a permitted capability on the running setid program. In fact, the program is launched with "cap_setgid,cap_setuid=ip", and the permitted dimension of that is the needed privilege to successfully run the program.

This kind of inheritance is more sophisticated than what we might think of as simple, or naive, inheritance. It crucially requires both the process that launches a program (in our case capsh -> bash ->) to carry inheritable process capabilities and for the program being launched (in our case setid) to have matching file-inheritable capabilities. Any mismatch yields insufficient privilege to do the work of the program. This differs quite significantly from the get-root model of privilege inheritance - something that allows a user to exploit some setuid-root program and launch a shell to wield all superuser powers on the system. Capability based inheritance can't leak in this way because both the exploited program and the shell need to have matching file-inheritable capabilities to actually wield privilege. Note, in our example here, bash never actually wields any privilege and only passively carries Inheritable process capabilities.

Using sudo and /sbin/capsh to provide an inheritable capability is a contorted way to set things up for one off execution of privileged binaries through inheritance. A slightly more manageable way to do it is to use /sbin/capsh to launch a shell for interactive use as follows:

Running a shell with some inheritable capabilities

$ sudo /sbin/capsh --user=$(whoami) --inh=cap_setuid,cap_setgid --

$ whoami

morgan

$ /sbin/getpcaps $$

3128: cap_setgid,cap_setuid=i

$ exit

$ /sbin/getpcaps $$

2718: =

$

What this enables is launching an interactive shell that always provides inheritable capabilities. As discussed above, these are only useful when the user invokes a file-inheritable capable program. In spirit, this is why the /sbin/capsh program is so named. However, launching an environment with inheritable capabilities can also be done at login time, or by init (sic). In the case of login, or sshd or the window manager login, there is a Pluggable Authentication Module that the admin can configure to enable the raising of inheritable capabilities in just such a manner: pam_cap. The pam_cap module is distributed with the libcap library. You can read its source here (like Linux-PAM, it is written in C, but the capabilities it bestows are equally useful for Go programs that use the "..libcap/cap" package). Note, the IAB text representation is what is used by the pam_cap.so module for its configuration file syntax. More about that syntax below.

There are two other types of inheritable capability available under Linux. The first is anticipated by the POSIX.1e draft spec that libcap attempts to fully implement: the Bounding set. The Bounding set holds all the capabilities that an executable program can ever hope to have raised from file-Permitted capabilities, or by a program running as root. The second is the Ambient set, which is not compatible with file based capabilities. We'll show how to use the Bounding set first.

The basic property of the Bounding set is that a process can only ever drop Value bits from it. The process' fork()d and exec*()d descendants inherit the Bounding set from their parent so, once lost, process descendants can never raise these values again in their Bounding sets. That is, by doing this, a process denies any of its subsequently exec*()'d successors the ability to force raise these same capability values via a Permitted (or forced) file capability bit. The POSIX.1e draft rules, however, do not honor the Bounding set for file-Inheritable capabilities. This is because there is a parallel process Inheritable set that is also inherited, and it was reasoned that a process is free to drop those bits too if it truly does not want a descendant to have any way to obtain that capability Value. The following illustrates how this all works with our setid program.

The Bounding Set

$ /sbin/capsh --print|grep Bounding

Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,....[all the values]....

$ sudo /sbin/setcap cap_setuid,cap_setgid=pie ./setid

$ ./setid --uid=5 --gid=6

before capability state: "cap_setgid,cap_setuid=ep"

before gid:

[...all the stuff that says the program works...]

$ sudo /sbin/capsh --drop=cap_setuid --user=$(whoami) --

$ /sbin/capsh --print|grep Bounding

Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,....[all the values]....

$ ./setid

bash: ./setid: Operation not permitted

$ exit

$ sudo /sbin/capsh --inh=cap_setuid --drop=cap_setuid --user=$(whoami) --

$ ./setid --uid=5 --gid=6

before capability state: "cap_setgid,cap_setuid=eip"

before gid:

[...all the stuff that says the program works...]

$ exit

$

The above starts by using /sbin/capsh to list all of the content of the Bounding set. While the POSIX.1e draft anticipated the Bounding set (they called it X), it was not fully integrated with the text representation for capabilities so, for a long time, libcap has not known how to treat it efficiently. Instead, it simply lists all the capability Values it holds in one long comma separated list. In most cases, this is the full list, so it is really long - and gets longer every time the kernel names another capability. (More on this later.)

For this explanation we set Permitted, Inheritable and Effective file capabilities raised on the setid program. As discussed above, the program does not need the Effective bit raised, but with respect to the Bounding set, it has another attribute that we want to demonstrate.

As the first capsh --print output shows, the Bounding set initially includes cap_setuid, which is one of the two capability values the setid program requires to function. When we execute the capsh ... --drop=cap_setuid command, we are launching a bash shell without this capability Value raised in the Bounding set. A subsequent capsh --print demonstrates this fact.

Recall, the Bounding set denies the use of a Permitted file capability value, so it shouldn't be a surprise that the setid program fails when we attempt to invoke it in an environment where the Bounding set is denying this capability. What might be a surprise, however, is that the kernel is outright refusing to start the program, and not allowing it to run for a bit and then realize it can't do all the things it wants to. Indeed, it is bash that is explaining the error!

What is going on is that the kernel is concerned that the program, setid, is configured to be a legacy program (recall the Effective file capability is raised for legacy, capability unaware, programs). Its effect is to automatically raise all the process Effective bits the program has access to at program start. Since the program has been configured to force enable its Permitted file capabilities, but the Bounding set is preventing them all from being available, the kernel can't square the circle and fails fast. In case you might be thinking that this seems a little paranoid, you might like to read about the Sendmail vs Capabilities bug of 2000 concerning silently denying privilege to privileged programs. The tl;dr being that denying privilege should never be silent.

In short, it is the legacy file-Effective bit that is causing the program execution to be denied. If you reconfigure setid's capabilities to be simply "sudo /sbin/setcap cap_setuid,cap_setgid=pi ./setid", you will discover that the program is allowed to run and it fails when it realizes it can't raise cap_setuid. That is, the program is not a legacy program, it can recognize, itself, when it doesn't have the needed privilege and it doesn't need the kernel to protect it from this detail. Again, POSIX.1e capabilities are all about explicit privilege management.

Finally, this example demonstrates that even when the Bounding set blocks a file-Permitted capability, there is a file-Inheritable path for obtaining it. The successful example raises the process Inheritable capability Value to compensate for what the Bounding set denied, and when the setid program is run, the legacy bit is able to raise all of the Values included in the file-Permitted set. Said more formally, the kernel applies the exec*() rules: pP' = (fP&X) | (fI&pI), and confirms that pE' = fE&pP' contains all of the fP bits. So the kernel determines the program has obtained enough privilege to honor its legacy expectations. [Here, p refers to process and f refers to file. P,I,E refer to the capability flags Permitted, Inheritable and Effective, and X is the Bounding set. The apostrophe ' is used to indicate the post-exec*() state of the process.]
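The rule above can be modelled with plain bitmasks. The following Go sketch is such a model and nothing more: the types Set, File and Proc and the function execCaps are hypothetical names, and the Ambient set and the special treatment of root are deliberately left out. It replays the four situations discussed for the Bounding set.

```go
package main

import "fmt"

// Set is a bitmask of capability values; bit n is capability number n.
type Set uint64

const (
	capSetgid Set = 1 << 6
	capSetuid Set = 1 << 7
)

// File holds the capabilities attached to an executable file.
type File struct {
	P, I Set  // file-Permitted (fP) and file-Inheritable (fI)
	E    bool // legacy file-Effective bit (fE)
}

// Proc holds the pre-exec state of the process that matters here.
type Proc struct {
	I Set // process Inheritable (pI)
	X Set // Bounding set
}

// execCaps applies pP' = (fP&X) | (fI&pI) and raises pE' = pP' when the
// legacy bit is set. ok is false when the legacy bit is set and some fP
// bit failed to reach pP', which models exec*() being refused.
func execCaps(p Proc, f File) (pP, pE Set, ok bool) {
	pP = (f.P & p.X) | (f.I & p.I)
	if f.E {
		if f.P&^pP != 0 {
			return 0, 0, false
		}
		pE = pP
	}
	return pP, pE, true
}

func main() {
	both := capSetuid | capSetgid
	all := ^Set(0)
	legacy := File{P: both, I: both, E: true} // like "=pie" on ./setid

	cases := []struct {
		name string
		p    Proc
		f    File
	}{
		{"full Bounding set", Proc{X: all}, legacy},
		{"cap_setuid dropped from X", Proc{X: all &^ capSetuid}, legacy},
		{"dropped from X, raised in pI", Proc{I: capSetuid, X: all &^ capSetuid}, legacy},
		{"dropped from X, file is =pi", Proc{X: all &^ capSetuid}, File{P: both, I: both}},
	}
	for _, c := range cases {
		pP, pE, ok := execCaps(c.p, c.f)
		fmt.Printf("%-30s ok=%-5v pP'=%#x pE'=%#x\n", c.name, ok, uint64(pP), uint64(pE))
	}
}
```

In this model only the second case is refused, which corresponds to the "Operation not permitted" reported by bash.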

All of these carefully reasoned security semantics are not honored by the remaining inheritable vector of capability values - the Ambient set. This is a non-file capability based way of providing a mini-root mechanism of privilege inheritance. One that is somewhat easier to understand, but subject to the privilege leakage problem (discussed above, and periodically in CERT advisories), and privilege denial issues (see Sendmail discussion above). That is, classes of issues that motivated the POSIX.1e initiative in the first place.

Before we explain how the Ambient set works, we should first introduce the libcap concept of the IAB tuple of inheritable capabilities: a mostly briefer way of textually representing all three of the inheritable capability vectors.

The IAB abstraction:

$ /sbin/capsh --print|grep IAB

Current IAB:

$ sudo /sbin/capsh --inh=cap_setuid --drop=cap_setuid --print|grep IAB

Current IAB: !%cap_setuid

$

The semantics of the three inheritable capability vectors are unique. Let's take them in the order B, I and A.

  • B is a representation of the Bounding vector, but the bitwise complement (one's complement) of it. That is, the block-nothing B value is empty (or all zero). B is inherited unchanged by a program through the exec*() process. Every capability value raised in this vector is a Permitted file-capability denied to the permitted bits of the exec*()d program, and once raised cannot be dropped.

  • I is the same I that is inheritable in the regular cap.Set. It is fully inherited through exec*() and can interact with the exec*()d program's Inheritable file capabilities to activate P' capabilities in the resulting process.

  • A is the ambient vector, which is bounded by (I&P) of the running program, and used to force raise P bits in the exec*()'d program. No A capabilities survive the exec*() of a program with any file capability attribute. So called, Ambient inheritance, is ultimately a kernel supported path for a process to directly pass on P bits from a pre-exec*() process to the post-exec*() process.

The text representation of an IAB tuple is a list of comma separated capability values, each prefixed by zero or more of the vector component characters ! (for B), % (for I) and ^ (for A). Because A cannot be a super-set of I, ^ is actually equivalent to %^. Further, since the most common form of inheritance has B and A empty, the IAB text representation of just Inheritable capabilities is simply a comma separated list of "cap_foo,cap_..." values. Indeed, the only time you need to use the % prefix is when you want to block a capability value in the B vector but allow it to be inherited through the I vector: "!%cap_foo".
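To make the prefix rules concrete, here is a small Go sketch that sorts the items of an IAB string into the three vectors. It is a toy reading of the description above and not the libcap parser: the names IAB and parseIAB are assumptions of this example, and capability names are not checked against the kernel's list.

```go
package main

import (
	"fmt"
	"strings"
)

// IAB records which capability names are raised in each vector.
type IAB struct {
	I, A, B map[string]bool
}

// parseIAB splits text on commas and files each name under the vectors
// selected by its prefix characters: ! for B, % for I and ^ for A.
// Because A cannot exceed I, ^ also raises the name in I. A name with
// no prefix at all is raised in I only.
func parseIAB(text string) (IAB, error) {
	iab := IAB{I: map[string]bool{}, A: map[string]bool{}, B: map[string]bool{}}
	if strings.TrimSpace(text) == "" {
		return iab, nil // the empty tuple: nothing inherited, nothing blocked
	}
	for _, item := range strings.Split(text, ",") {
		item = strings.TrimSpace(item)
		name := strings.TrimLeft(item, "!%^")
		prefix := item[:len(item)-len(name)]
		if name == "" {
			return iab, fmt.Errorf("missing capability name in %q", item)
		}
		if prefix == "" || strings.ContainsAny(prefix, "%^") {
			iab.I[name] = true
		}
		if strings.Contains(prefix, "^") {
			iab.A[name] = true
		}
		if strings.Contains(prefix, "!") {
			iab.B[name] = true
		}
	}
	return iab, nil
}

func main() {
	for _, text := range []string{"cap_setuid,cap_setgid", "!%cap_setuid", "^cap_chown"} {
		iab, err := parseIAB(text)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%q: I=%v A=%v B=%v\n", text, iab.I, iab.A, iab.B)
	}
}
```

Note that "!cap_foo" on its own lands in B only, which is why the % prefix is needed to block a value in B while still inheriting it through I.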

Note: the IAB representation is used as the configuration syntax for the pam_cap.so Linux-PAM Capability module - a module that grants inheritable capabilities to complement the general authentication programs under Linux.

Given all this we can now show how the Ambient capability vector can be used to directly pass permitted capabilities through exec:

$ sudo /sbin/capsh --cap-uid=1 --iab='^cap_chown' --print | grep Current

Current: =p cap_chown+i

Current IAB: ^cap_chown

$ sudo /sbin/capsh --cap-uid=1 --iab='^cap_chown' == --print | grep Current

Current: cap_chown=eip

Current IAB: ^cap_chown

$ sudo /sbin/capsh --cap-uid=1 --iab='^cap_chown' == == --print | grep Current

Current: cap_chown=eip

Current IAB: ^cap_chown

The first command above shows the capability state of a process as we set it up directly using capsh. Setting the UID value to 1 (using the --uid argument) would drop all the capabilities of the process, but capsh has a command line option (--cap-uid) that knows how to use capabilities to avoid that happening. That is, it can change UID but keep the process' permitted capabilities.

Given this setup example, the next command line does the same setup and then, via ==, re-exec*()s the capsh binary. In this re-executed process we --print the surviving capabilities. These capabilities are a little different. The IAB values survive unchanged, however, in the re-exec*()d process the permitted capabilities are now limited to cap_chown (because this is the only configured Ambient-Inheritable one) and the effective capability is raised for it too. The effective capability is force raised with this kind of capability inheritance because it was imagined as a way for capability unaware programs to naively inherit a subset of root's full power. (As explained above, this is fragile, because these legacy programs can't tell when they have insufficient privilege...)

The final example, above, uses == == to re-exec*() capsh twice in a row. This example is just to demonstrate that the Ambient Inheritable capabilities are naively inherited no matter how many times a process exec*()s another file.

This is all true except if the file being exec*()d has a file capability - even an empty one:

$ cp /sbin/capsh ./tcapsh

$ /sbin/getcap -v ./tcapsh

./tcapsh

$ sudo /sbin/capsh --cap-uid=1 --iab='^cap_chown' -- -c "./tcapsh --print" | grep Current

Current: cap_chown=eip

Current IAB: ^cap_chown

$ sudo /sbin/setcap = ./tcapsh

$ /sbin/getcap -v ./tcapsh

./tcapsh =

$ ./tcapsh --print

Current: =

Current IAB:

$ sudo /sbin/capsh --cap-uid=1 --iab='^cap_chown' -- -c "./tcapsh --print" | grep Current

Current: cap_chown+i

Current IAB: cap_chown

$ rm ./tcapsh

In this case, before we add a file capability, capsh can invoke tcapsh as expected. However, in the presence of a file capability on ./tcapsh ("=" means force capabilities to empty), the ambient inheritance mechanism is disabled by the kernel and the Ambient vector is cleared through exec*().
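The two outcomes can be captured in a few lines of Go. This is only a model of the behaviour just described, for a non-root process and a file whose capability attribute, if present, is the empty "=". The names Proc and execAmbient are hypothetical, and the Bounding set is ignored.

```go
package main

import "fmt"

// Set is a bitmask of capability values; bit n is capability number n.
type Set uint64

const capChown Set = 1 << 0

// Proc is the capability state of a process: Permitted, Effective,
// Inheritable and Ambient.
type Proc struct {
	P, E, I, A Set
}

// execAmbient models exec*() of a file that has either no file
// capability at all (hasFileCap false) or the empty one, "=".
// I is always inherited. A survives only when the file has no file
// capability, and in that case it is also raised in P' and E'.
func execAmbient(p Proc, hasFileCap bool) Proc {
	next := Proc{I: p.I}
	if !hasFileCap {
		next.A = p.A
	}
	next.P = next.A
	next.E = next.A
	return next
}

func main() {
	before := Proc{P: capChown, I: capChown, A: capChown}
	fmt.Printf("no file capability:  %+v\n", execAmbient(before, false))
	fmt.Printf("file capability \"=\": %+v\n", execAmbient(before, true))
}
```

The first result has cap_chown in all four fields, as in the "cap_chown=eip" output above; the second keeps it only in I.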

So, given all that, we can now explain how to run our setid program using Ambiently inherited capabilities:

$ sudo /sbin/setcap -r ./setid

$ sudo /sbin/capsh --iab='^cap_setuid,^cap_setgid' --user=$(whoami) -- -c "./setid --gid=6 --uid=7"

before capability state: "cap_setgid,cap_setuid=i"

before gid:

/proc/254/status Gid: 1000 1000 1000 1000

/proc/255/status Gid: 1000 1000 1000 1000

/proc/256/status Gid: 1000 1000 1000 1000

/proc/257/status Gid: 1000 1000 1000 1000

/proc/258/status Gid: 1000 1000 1000 1000

/proc/259/status Gid: 1000 1000 1000 1000

before uid:

/proc/254/status Uid: 1000 1000 1000 1000

/proc/255/status Uid: 1000 1000 1000 1000

/proc/256/status Uid: 1000 1000 1000 1000

/proc/257/status Uid: 1000 1000 1000 1000

/proc/258/status Uid: 1000 1000 1000 1000

/proc/259/status Uid: 1000 1000 1000 1000

2021/09/03 16:00:26 group setting failed: operation not permitted

$ sudo /sbin/capsh --user=$(whoami) --iab='^cap_setuid,^cap_setgid' -- -c "./setid --gid=6 --uid=7"

before capability state: "cap_setgid,cap_setuid=eip"

before gid:

/proc/262/status Gid: 1000 1000 1000 1000

/proc/263/status Gid: 1000 1000 1000 1000

/proc/264/status Gid: 1000 1000 1000 1000

/proc/265/status Gid: 1000 1000 1000 1000

/proc/266/status Gid: 1000 1000 1000 1000

/proc/267/status Gid: 1000 1000 1000 1000

before uid:

/proc/262/status Uid: 1000 1000 1000 1000

/proc/263/status Uid: 1000 1000 1000 1000

/proc/264/status Uid: 1000 1000 1000 1000

/proc/265/status Uid: 1000 1000 1000 1000

/proc/266/status Uid: 1000 1000 1000 1000

/proc/267/status Uid: 1000 1000 1000 1000

after capability state: "="

after gid:

/proc/262/status Gid: 6 6 6 6

/proc/263/status Gid: 6 6 6 6

/proc/264/status Gid: 6 6 6 6

/proc/265/status Gid: 6 6 6 6

/proc/266/status Gid: 6 6 6 6

/proc/267/status Gid: 6 6 6 6

after uid:

/proc/262/status Uid: 7 7 7 7

/proc/263/status Uid: 7 7 7 7

/proc/264/status Uid: 7 7 7 7

/proc/265/status Uid: 7 7 7 7

/proc/266/status Uid: 7 7 7 7

/proc/267/status Uid: 7 7 7 7

The above three commands do the following. First, we use sudo setcap -r ... to remove any capability from the ./setid binary. We have to do this to ensure the binary isn't obtaining capabilities from file-capabilities.

Next, we perform a failed attempt to use Ambient capabilities. This fails because the Ambient set is always reset when the process changes UID. What the failed attempt does is set some Ambient capabilities and then change UID. It is this change of UID that clears the Ambient vector of capabilities and just leaves the Inheritable capabilities raised. This initial state for the execution of ./setid is clearly visible in the "before capability state: cap_setgid,cap_setuid=i". As such, without permitted bits on the running program, it fails as expected.

Finally, we perform a successful attempt to use Ambient capabilities. This one works because we change --user before setting the Ambient capabilities. Now, when ./setid starts, the Ambient vector translates into an "=eip" combination sufficient to run the program.