Capable shared objects

This article contains a walk-through example of creating a shared object (think: shared library) with its own file capabilities and shows how, via linking, it can be used to support privileged operations in otherwise unprivileged programs.

Introduction

Elsewhere, we've described how the POSIX.1e (draft) model of privilege focuses on encoding privilege in audited program binaries enhanced with limited file capabilities. This sets it apart from models that associate privilege unilaterally with a user identity, or that rely on naive inheritance, which can leak privilege into the execution of unaudited code.

Whole programs, however, can still be large, and sometimes the need for a privileged operation is quite contained. Granting privilege to a whole program enlarges the so-called attack surface that can be probed for a privilege escalation. There are many such subtleties, for example, in threaded applications.

In other contexts, code complexity can be managed efficiently by introducing the concept of a library (some computer languages refer to these as packages) with a well defined API. To eliminate bugs, programmers can test that the API is robust, and over time users and programmers gain confidence that the library works as intended. Better yet, should a problem be found, fixing the library fixes all programs that use it.

This article identifies a limited system level privileged operation and demonstrates how to embed its functionality into a privileged shared object, a mini library. This shared object can be linked against an unprivileged binary and used for exactly one thing.

The problem

Simple HTTP-serving webservers have to listen to the network and accept connections from remote client programs, like the web browser you are using to read this article. The address the client connects to includes a port number, and HTTP's default port is 80. On a Linux machine, ports numbered 0-1023 are considered system ports, and binding to one of them requires privilege.
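
To make the restriction concrete, here is a minimal C sketch of a direct bind attempt on the loopback address. The helper names try_bind_loopback() and report_bind() are hypothetical and are not taken from the capso sources; the sketch only shows the system calls involved.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to bind a TCP socket to 127.0.0.1:port. Returns the bound file
 * descriptor on success, or -1 with errno set on failure.
 * (Hypothetical helper for illustration, not the capso.so code.)
 */
int try_bind_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Print the outcome of one bind attempt and release the socket. */
void report_bind(unsigned short port)
{
    int fd = try_bind_loopback(port);
    if (fd < 0) {
        printf("bind to port %u: %s\n", (unsigned) port, strerror(errno));
        return;
    }
    printf("bind to port %u: ok (fd=%d)\n", (unsigned) port, fd);
    close(fd);
}
```

Run without privilege, report_bind(80) would normally be expected to fail with a "Permission denied" (EACCES) message, while a port of 1024 or above, or 0 to let the kernel choose one, should succeed.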

The problem we want to solve is how to make a shared object (a mini library) that has just enough privilege to bind to the HTTP port and (we hope) can't be exploited to bind to any other port. Further, we want this shared library to be linkable into an otherwise unprivileged program.

Our solution

Long story short, we have developed a working example of solving this problem here:

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/tree/contrib/capso

You can build and test it as follows:

$ git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git

$ cd libcap

$ make DYNAMIC=yes

$ cd contrib/capso

$ make

The last make command invocation will run sudo to add a file capability to the capso.so shared object. That capability is the one needed to bind to ports lower than 1024.
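
On disk, a file capability is stored in the security.capability extended attribute of the file. As an illustration, and assuming the layout of the kernel's vfs_cap_data structure (a 32-bit little-endian header word followed by the low 32 bits of the Permitted vector), the following sketch checks whether a file carries cap_net_bind_service in its Permitted set. The function names are hypothetical; in practice getcap, used further down, is the tool for this job.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/xattr.h>

#define NET_BIND_BIT 10   /* numeric value of CAP_NET_BIND_SERVICE */

/* Decode a 32-bit little-endian integer from raw bytes. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/*
 * Fetch the low 32 bits of the Permitted file capability vector of path.
 * Returns 0 and fills *permitted on success, or -1 if the file has no
 * readable security.capability attribute.
 * Assumption: bytes 0-3 hold the header word, bytes 4-7 the low
 * Permitted word, as in struct vfs_cap_data.
 */
int file_permitted_low(const char *path, uint32_t *permitted)
{
    unsigned char raw[64];
    ssize_t n = getxattr(path, "security.capability", raw, sizeof(raw));
    if (n < 12) {
        return -1;
    }
    *permitted = read_le32(raw + 4);
    return 0;
}

/* Returns 1 if path has cap_net_bind_service in its Permitted set. */
int file_can_bind_low_ports(const char *path)
{
    uint32_t permitted;
    if (file_permitted_low(path, &permitted) != 0) {
        return 0;
    }
    return (int) ((permitted >> NET_BIND_BIT) & 1);
}
```

For the capso.so built above, file_can_bind_low_ports("capso.so") would be expected to return 1, and 0 for the bind program.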

Note: the DYNAMIC=yes make argument builds capsh dynamically linked (when a .git directory is present, the build tree defaults to static linking because that helps with some of the testing). This is needed to avoid a segfault in glibc. Yes, I've reported that but, apparently by design, it remains broken years later.

The two executables generated are the program bind and the shared object capso.so. You can see that one (bind) has no capabilities on it, and the other (capso.so) has a single Permitted file capability, as follows:

$ getcap -v bind capso.so

bind

capso.so cap_net_bind_service=p

$ ldd ./bind

linux-vdso.so.1 (0x00007fff57fa9000)

capso.so => not found

libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007d67fa7e3000)

libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007d67fa622000)

/lib64/ld-linux-x86-64.so.2 (0x00007d67fa7ff000)

The unprivileged binary bind has been linked against capso.so but the loader can't find it by default. So, we can give it a little help with the use of an environment variable:

$ LD_LIBRARY_PATH=. ldd ./bind

linux-vdso.so.1 (0x00007ffc1c93f000)

capso.so => ./capso.so (0x00007a44d54ee000)

libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007a44d54d9000)

libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007a44d5318000)

libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007a44d5313000)

/lib64/ld-linux-x86-64.so.2 (0x00007a44d54fa000)

The shared library object capso.so uses some APIs not found in older versions of libcap, so you might also want to include the local build of libcap in the linker search path as follows: LD_LIBRARY_PATH=.:../../libcap.

Given all this, you can test that the unprivileged program bind can now bind to port 80 like this:

$ LD_LIBRARY_PATH=.:../../libcap ./bind

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

This is the expected output. It says that our bind was able to use capso.so to bind to port 80 on its behalf. The program sleeps for 60 seconds, which should be enough time for you to confirm it is working by running the suggested netstat command from another terminal as follows:

$ netstat -tlnp|grep :80

(Not all processes could be identified, non-owned process info

will not be shown, you would have to be root to see it all.)

tcp 0 0 127.0.0.1:80 0.0.0.0:* LISTEN 8907/./bind

How does this work?

The above is a walk-through that validates that the capable shared library, capso.so, can be used to grant an unprivileged binary the privilege to bind to port 80. In this section we explain how the shared library supports this functionality.

The setup of having an unprivileged program and a privileged library is the opposite arrangement from how programs normally have privilege. So, this is novel. In truth we are leveraging a few tricks to make this work. They are:

  • linking our .so as a stand alone executable as well as a shared library

    • I wrote up how to do this on Stack Overflow. In the libcap tree, I've created a slightly more elaborate version that can handle command line arguments too. The basic model is to include this header file (libcap/execable.h) and define your .so main() function with the macro SO_MAIN(). You also need to externally define the path to your build target's dynamic loader from the C compiler command line: -DSHARED_LOADER="\"ld-linux...\"". You can see how this is done in the Makefile for the capso sources.

  • re-executing the .so file from within the shared library using cap_launch() functionality. This is a feature in libcap for invoking a subprocess that can operate with a security context different from the launching program (even from multi-threaded applications that otherwise observe POSIX security semantics).

  • passing the bound file-descriptor from the privileged child .so invocation back to the parent via a Unix domain socket pair. With no small irony, leveraging a Linux Capability to conjure up such a file descriptor enables a rudimentary implementation of the more generic Capability-based security.
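
The last of these tricks, passing a file descriptor between processes, relies on SCM_RIGHTS ancillary data sent over a Unix domain socket. The following is a minimal sketch of that mechanism, not a copy of the capso.so code; send_fd() and recv_fd() are names made up for this illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send file descriptor fd over the Unix domain socket sock. 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                    /* carry at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                             /* union keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;         /* "this message carries fds" */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor from sock. Returns the new fd, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    int fd = -1;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;                          /* a new descriptor in this process */
}
```

In the arrangement described here, the privileged helper would call something like send_fd() on its end of the socket pair with the freshly bound socket, and the unprivileged parent would call recv_fd() on the other end. The received descriptor refers to the same open socket, so the parent needs no privilege to use it.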

Given the above, the bind program calls the capso.so:bind80() function, which first attempts to bind to port 80 directly. If that works (it won't here, since the bind program isn't privileged), the function returns immediately. If, however, that fails, it locates itself on the filesystem using the libdl.so:dladdr() function and re-invokes itself as a standalone binary. This standalone version of capso.so, when executed as a sub-process by cap_launch(), leverages its own file capability to run as privileged code. Run this way, the code can bind to port 80, and it passes the obtained file-descriptor back to the launching parent. Having done this, the standalone capso.so exits, leaving the running bind program with a file-descriptor of its own bound to the privileged port, 80.

We can use the captree program to observe what happens. To do this we enable three strategic sleep(30) function calls inside capso.so by rebuilding the example with a make variable on the command line and run the program as before (it produces more output pausing after each line):

$ make CAPSO_DEBUG=-DCAPSO_DEBUG clean all

$ LD_LIBRARY_PATH=../../libcap/:./ ./bind

application bind80(127.0.0.1) attempt failed

invoking bind80-helper standalone

exiting standalone bind80-helper

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

After each line displays, we have an opportunity to investigate what is happening with captree. We do that as follows (from a different terminal):

  1. application bind80(127.0.0.1) attempt failed:

$ captree bind

--bind(11270)

  2. invoking bind80-helper standalone:

$ captree bind

--bind(11270)

+-capso.so(11297) "cap_net_bind_service=p"

  3. exiting standalone bind80-helper:

$ captree bind

--bind(11270)

+-capso.so(11297) "cap_net_bind_service=ep"

  4. (Sleeping for 60s... Try 'netstat -tlnp|grep :80'):

$ captree bind

--bind(11270)

This clearly shows that the helper binary, the capso.so executable with PID=11297 here, obtains the Permitted capability CAP_NET_BIND_SERVICE when invoked. In order to bind to port 80, it has to raise that capability in its Effective set. Finally, the helper exits and leaves the unprivileged bind program with a bound file-descriptor.
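
The difference between holding a capability in the Permitted set and raising it in the Effective set can be sketched with the raw capget() and capset() system calls. capso.so relies on libcap for its capability handling; the sketch below avoids the libcap dependency only so that it is self-contained, and its struct and function names are hypothetical.

```c
#include <linux/capability.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

struct cap_snapshot {
    uint64_t effective, permitted, inheritable;
};

/* Read the calling thread's capability sets. Returns 0 on success. */
int read_own_caps(struct cap_snapshot *out)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;                        /* 0 means "the caller" */

    if (syscall(SYS_capget, &hdr, data) != 0) {
        return -1;
    }
    out->effective = data[0].effective | ((uint64_t) data[1].effective << 32);
    out->permitted = data[0].permitted | ((uint64_t) data[1].permitted << 32);
    out->inheritable =
        data[0].inheritable | ((uint64_t) data[1].inheritable << 32);
    return 0;
}

/* Returns 1 if capability number cap is present in vector. */
int cap_in(uint64_t vector, int cap)
{
    return (int) ((vector >> cap) & 1);
}

/*
 * Raise one capability in the Effective set of the calling thread.
 * The kernel refuses (returning an error) unless the capability is
 * already in the Permitted set. Returns 0 on success, -1 on failure.
 */
int raise_effective(int cap)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;

    if (syscall(SYS_capget, &hdr, data) != 0) {
        return -1;
    }
    data[cap / 32].effective |= (uint32_t) 1 << (cap % 32);
    return syscall(SYS_capset, &hdr, data) == 0 ? 0 : -1;
}
```

In terms of the captree output, the transition from "cap_net_bind_service=p" to "cap_net_bind_service=ep" corresponds to a successful call like raise_effective(CAP_NET_BIND_SERVICE). In an unprivileged program such as bind, the same call would be expected to fail because the capability is not Permitted.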

We note in passing that bind, courtesy of our LD_LIBRARY_PATH environment variable, links against the build-tree copy of libcap.so (known to include the cap_launch() abstraction). However, when capso.so is subsequently invoked as a standalone binary, being privileged, the loader does not honor the LD_LIBRARY_PATH variable. Instead, the standalone execution uses the system-installed default libcap.so. When invoked in this privileged helper mode, the needed libcap API is more limited and long supported by much older libcap versions, so this does not prevent this mode of operation from working fine on older systems.
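
The loader's refusal to honor LD_LIBRARY_PATH is part of what the ld.so documentation calls secure-execution mode, which the kernel signals to a process through the AT_SECURE entry of its auxiliary vector. A process can check this for itself; in_secure_execution() below is a hypothetical helper for illustration and is not part of libcap.

```c
#include <elf.h>        /* AT_SECURE */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval() */

/*
 * Returns 1 if this process was started in secure-execution mode
 * (for example, because the executed file conferred capabilities or
 * was setuid), and 0 otherwise.
 */
int in_secure_execution(void)
{
    return getauxval(AT_SECURE) != 0;
}

/* Print which mode applies to the current process. */
void describe_loader_mode(void)
{
    if (in_secure_execution()) {
        printf("secure-execution mode: LD_LIBRARY_PATH is ignored\n");
    } else {
        printf("normal mode: LD_LIBRARY_PATH is honored\n");
    }
}
```

Called from the unprivileged bind program, this would be expected to report normal mode; called from capso.so running as the file-capable helper, it would be expected to report secure-execution mode.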

As described in this article, we are providing a solution that allows any user to execute the file-capable capso.so shared object. Its use can, however, be restricted by changing its ownership or group ownership and its mode bits (chown and chmod) so that only an individual user or a group of users can execute it. Capable privilege is not tied to ownership. Explicit lists of users can be enabled with Access Control Lists (chacl).

But isn't this insecure?

A Stack Overflow answer referred to this article, and one comment on it asked why doing this isn't a breach of some sort. Here we'll dig in a little to explain why file capabilities and setuid-root create very different situations: file capabilities do not inherit.

To explore the security implications of this shared library, we first state explicitly that using a technique like this to make it possible for unprivileged programs to bind to port 80 is a decision. Much like allowing anyone to run ping is a decision. Enabling these choices is a decision that the local administrator owns. So, we're going to take that decision as a given and accept that this shared library provides port 80 access to every program that can execute it.

The insecurity we are interested in is how such a shared library might be persuaded to leak its privilege. Since we don't think the program as previously described can be made to leak privilege (please file a bug if you find this to be untrue), we have added some explicit code to act like an exploit. To enable it, you need to rebuild the code with a different option:

$ make CAPSO_DEBUG=-DALLOW_EXPLOIT clean all

$ LD_LIBRARY_PATH=../../libcap/:./ ./bind

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

<...60 seconds elapse...>

Done.

$

On the face of it, the program and shared library work as they did the first time we invoked it above. The exploit code we've added is hiding there waiting for us to trigger it. It is triggered by use of an environment variable, and to show it in its full glory, we'll do that first with a modified privilege configuration.

Without any code changes, we can change the way capso.so, when executed, gains privilege. That is, instead of a file capability, we'll make it setuid-root. This is a really insecure thing to do, since the library is compiled with an easy to exploit hole in it. We're in exploration mode, so try it in a VM, or on a single user machine and remember to clean up afterwards...

$ sudo chown root capso.so

$ sudo chmod +s capso.so

$ LD_LIBRARY_PATH=../../libcap/:./ ./bind

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

<...60 seconds elapse...>

Done.

$

Again, this looks to behave the same. However, if we exercise the shell-escape exploit code we've been hinting at, we'll see why this is such a problematic configuration:

$ TRIGGER_EXPLOIT="../../progs/capsh --current" LD_LIBRARY_PATH=../../libcap/:./ ./bind

execv argv[0] = "../../progs/capsh"

execv argv[1] = "--current"

Current: =ep

Current IAB:

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

<...60 seconds elapse...>

Done.

$

We see that the exploit causes execution of the capsh program from within the shared library, and that this user-controlled binary runs with full privilege (=ep)! Put another way, if you want to use the strategy outlined in this article with setuid-root based privilege, you need to be really confident that no potential exploit is present in the code, since any such exploit will lead to full root equivalence, as we have demonstrated here!

Of course, this whole article is not about setuid-root shared-libraries, but file-capable ones. What we show next is that this sort of vulnerability is not shared by file capabilities. To do that, we start over:

$ make CAPSO_DEBUG=-DALLOW_EXPLOIT clean all

$ LD_LIBRARY_PATH=../../libcap/:./ ./bind

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

<...60 seconds elapse...>

Done.

$ ../../progs/getcap capso.so

capso.so cap_net_bind_service=p

$

All working as before, with the file capability present on capso.so and not a set-up with setuid-root. So what happens this time when we try to perform the exploit?

$ TRIGGER_EXPLOIT="../../progs/capsh --current" LD_LIBRARY_PATH=../../libcap/:./ ./bind

execv argv[0] = "../../progs/capsh"

execv argv[1] = "--current"

Current: =

Current IAB:

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

<...60 seconds elapse...>

Done.

$

This time, when capsh runs, it runs with no privilege at all (=). This is not =ep. This is not even cap_net_bind_service=p. The exploit runs with the unprivileged state of the invoking user! This is 100% by design. Specifically, in the draft POSIX.1e privilege model, privilege does not naively inherit as one binary executes another.
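
This behavior follows from the rules the kernel applies to a process's capability sets on execve(), as documented in capabilities(7). The following is a simplified model of those rules as plain bit arithmetic. It is a sketch for reasoning about the examples in this article, not kernel code, and the struct, constant and function names are hypothetical. As a modeling assumption, a setuid-root file is represented as a file with every capability Permitted and Inheritable and with the Effective flag set.

```c
#include <stdint.h>

#define ALL_CAPS (~(uint64_t) 0)         /* stand-in for "every capability" */
#define NET_BIND ((uint64_t) 1 << 10)    /* CAP_NET_BIND_SERVICE is bit 10 */

struct proc_caps {
    uint64_t permitted, effective, inheritable, ambient, bounding;
};

struct file_caps {
    uint64_t permitted, inheritable;
    int effective;                       /* a single flag, not a set */
};

/* The three kinds of file that appear in the walk-through. */
static const struct file_caps NO_FILE_CAPS = { 0, 0, 0 };
static const struct file_caps CAPSO_FILE = { NET_BIND, 0, 0 };
static const struct file_caps SETUID_ROOT = { ALL_CAPS, ALL_CAPS, 1 };

/* Compute the capability sets a process holds after executing file f. */
struct proc_caps caps_after_exec(struct proc_caps p, struct file_caps f)
{
    struct proc_caps n;
    /* Simplification: a file counts as privileged if anything is set. */
    int privileged = f.permitted != 0 || f.inheritable != 0 || f.effective;

    n.ambient = privileged ? 0 : p.ambient;
    n.permitted = (p.inheritable & f.inheritable) |
                  (f.permitted & p.bounding) | n.ambient;
    n.effective = f.effective ? n.permitted : n.ambient;
    n.inheritable = p.inheritable;       /* unchanged by exec */
    n.bounding = p.bounding;             /* unchanged by exec */
    return n;
}
```

Note that the old Permitted and Effective sets appear nowhere on the right-hand side. Starting from an unprivileged process (all sets empty, full bounding set), executing CAPSO_FILE yields only cap_net_bind_service in Permitted, matching the captree output earlier. Executing NO_FILE_CAPS from that helper, even after it has raised its Effective capability, yields empty sets, which is the "Current: =" line above. The SETUID_ROOT stand-in yields everything Permitted and Effective, and the same applies to each later exec for as long as the effective user remains root.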

The best that an exploit from within the execution context of the capso.so binary could do is leak the single capability through the Ambient set: an IAB of ^cap_net_bind_service. This causes the one privilege capso.so had to inherit naively, because Ambient capabilities are naively inherited (they were not part of the original POSIX.1e design, but were added to the Linux kernel later). We can get our exploit to do this with a modified TRIGGER_EXPLOIT variable:

$ TRIGGER_EXPLOIT="^../../progs/capsh --current" LD_LIBRARY_PATH=../../libcap/:./ ./bind

execv argv[0] = "../../progs/capsh"

execv argv[1] = "--current"

Current: cap_net_bind_service=eip

Current IAB: ^cap_net_bind_service

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

<...60 seconds elapse...>

Done.

Here, capsh is invoked with cap_net_bind_service=eip and an IAB value of ^cap_net_bind_service. The fact that this was not what the POSIX.1e committee had in mind can be seen if we try this same operation in a PURE1E libcap mode. To be complete, we actually select PURE1E_INIT which ensures we have fully reset the Inheritable flag values:

$ sudo LD_LIBRARY_PATH=../../libcap/:./ \

../../progs/capsh --user=$(whoami) --mode=PURE1E_INIT --

$ TRIGGER_EXPLOIT="^../../progs/capsh --current" LD_LIBRARY_PATH=../../libcap/:./ ./bind

unable to raise ambient capability: Operation not permitted

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

<...60 seconds elapse...>

Done.

$

In the PURE1E* modes of operation, Ambient capability inheritance is completely disabled. Since we didn't exit in that last sequence, the shell we're executing commands in is still running in that PURE1E* state. We can persuade our exploit to leak its one capability value in the Inheritable set alone:

$ TRIGGER_EXPLOIT="%../../progs/capsh --current" LD_LIBRARY_PATH=../../libcap/:./ ./bind

execv argv[0] = "../../progs/capsh"

execv argv[1] = "--current"

Current: cap_net_bind_service=i

Current IAB: cap_net_bind_service

Webserver code to use filedes = 4 goes here.

(Sleeping for 60s... Try 'netstat -tlnp|grep :80')

Done.

$

What privilege does that give you? Not a great deal. (Disappointment with how little is actually what motivated folk to develop the Ambient capability vector in the first place.) Indeed, per the POSIX.1e draft, Inheritable process capabilities are only useful in limited situations in conjunction with Inheritable file capabilities. If you want to see how they work, you can review our separate article on inheriting privilege.