Cannot open /proc/self/oom_score_adj when I have the right capability


I'm trying to set the OOM killer score adjustment for a process, inspired by oom_adjust_setup in OpenSSH's port_linux.c. To do that, I open /proc/self/oom_score_adj, read the old value, and write a new value. Obviously, to lower the value my process needs to be root or have the capability CAP_SYS_RESOURCE.

I'm getting a result that I can't explain. When my process doesn't have the capability, I'm able to open that file and read and write values, though the value I write doesn't take effect (fair enough):

    $ ./a.out
    CAP_SYS_RESOURCE: not effective, not permitted, not inheritable
    oom_score_adj value: 0
    wrote 5 bytes
    oom_score_adj value: 0

But when my process does have the capability, I can't even open the file: it fails with EACCES:

    $ sudo setcap CAP_SYS_RESOURCE+eip a.out
    $ ./a.out
    CAP_SYS_RESOURCE: effective, permitted, not inheritable
    failed to open /proc/self/oom_score_adj: Permission denied

Why does it do that? What am I missing?


Some further googling led me to this lkml post by Azat Khuzhin on 20 Oct 2013. Apparently CAP_SYS_RESOURCE lets you change oom_score_adj for any process but yourself. To change your own score adjustment, you need to combine it with CAP_DAC_OVERRIDE - that is, disable access controls for all files. (If I wanted that, I would have made this program setuid root.)

So my question is, how can I achieve this without CAP_DAC_OVERRIDE?


I'm running Ubuntu xenial 16.04.4, kernel version 4.13.0-45-generic. My problem is similar to, but different from, this question: that one is about an error on write when the process does not have the capability.

My sample program:

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/capability.h>

    void read_value(FILE *fp)
    {
      int value;
      rewind(fp);
      if (fscanf(fp, "%d", &value) != 1) {
        fprintf(stderr, "read failed: %s\n", ferror(fp) ? strerror(errno) : "cannot parse");
      }
      else {
        fprintf(stderr, "oom_score_adj value: %d\n", value);
      }
    }

    void write_value(FILE *fp)
    {
      int result;
      rewind(fp);
      /* fprintf() only fills the stdio buffer. The real write(2) happens
         when the next rewind() flushes the stream, and an error from that
         flush is not reported here. */
      result = fprintf(fp, "-1000");
      if (result < 0) {
        fprintf(stderr, "write failed: %s\n", strerror(errno));
      }
      else {
        fprintf(stderr, "wrote %d bytes\n", result);
      }
    }

    int main(void)
    {
      FILE *fp;

      struct __user_cap_header_struct h;
      /* Version 3 capability sets are 64 bits wide, so capget() fills
         _LINUX_CAPABILITY_U32S_3 (= 2) data structs, not just one. */
      struct __user_cap_data_struct d[_LINUX_CAPABILITY_U32S_3];

      h.version = _LINUX_CAPABILITY_VERSION_3;
      h.pid = 0;
      if (0 != capget(&h, d)) {
        fprintf(stderr, "capget failed: %s\n", strerror(errno));
      }
      else {
        /* CAP_SYS_RESOURCE (24) lives in the first 32-bit word. */
        fprintf(stderr, "CAP_SYS_RESOURCE: %s, %s, %s\n",
            d[0].effective & (1 << CAP_SYS_RESOURCE) ? "effective" : "not effective",
            d[0].permitted & (1 << CAP_SYS_RESOURCE) ? "permitted" : "not permitted",
            d[0].inheritable & (1 << CAP_SYS_RESOURCE) ? "inheritable" : "not inheritable");
      }

      fp = fopen("/proc/self/oom_score_adj", "r+");
      if (!fp) {
        fprintf(stderr, "failed to open /proc/self/oom_score_adj: %s\n", strerror(errno));
        return 1;
      }
      else {
        read_value(fp);
        write_value(fp);
        read_value(fp);
        fclose(fp);
      }
      return 0;
    }

 


This one was very interesting to crack; it took me a while.

The first real hint was this answer to a different question: https://unix.stackexchange.com/questions/364568/how-to-read-the-proc-pid-fd-directory-of-a-process-which-has-a-linux-capabil (I just want to give it credit).

The reason it does not work as is

The real reason you get "permission denied" there is that the files under /proc/self/ become owned by root once the process gains capabilities through execve() (more precisely, once it is marked non-dumpable). It's not about CAP_SYS_RESOURCE or the oom_* files specifically. You can verify this by calling stat on those files while running the program with different capabilities. Quoting man 5 proc:

/proc/[pid]

There is a numerical subdirectory for each running process; the subdirectory is named by the process ID.

Each /proc/[pid] subdirectory contains the pseudo-files and directories described below. These files are normally owned by the effective user and effective group ID of the process. However, as a security measure, the ownership is made root:root if the process's "dumpable" attribute is set to a value other than 1. This attribute may change for the following reasons:

  • The attribute was explicitly set via the prctl(2) PR_SET_DUMPABLE operation.

  • The attribute was reset to the value in the file /proc/sys/fs/suid_dumpable (described below), for the reasons described in prctl(2).

Resetting the "dumpable" attribute to 1 reverts the ownership of the /proc/[pid]/* files to the process's real UID and real GID.

This already hints at the solution, but first let's dig a little deeper and look at man 2 prctl:

PR_SET_DUMPABLE (since Linux 2.3.20)

Set the state of the "dumpable" flag, which determines whether core dumps are produced for the calling process upon delivery of a signal whose default behavior is to produce a core dump.

In kernels up to and including 2.6.12, arg2 must be either 0 (SUID_DUMP_DISABLE, process is not dumpable) or 1 (SUID_DUMP_USER, process is dumpable). Between kernels 2.6.13 and 2.6.17, the value 2 was also permitted, which caused any binary which normally would not be dumped to be dumped readable by root only; for security reasons, this feature has been removed. (See also the description of /proc/sys/fs/suid_dumpable in proc(5).)

Normally, this flag is set to 1. However, it is reset to the current value contained in the file /proc/sys/fs/suid_dumpable (which by default has the value 0), in the following circumstances:

  • The process's effective user or group ID is changed.

  • The process's filesystem user or group ID is changed (see credentials(7)).

  • The process executes (execve(2)) a set-user-ID or set-group-ID program, resulting in a change of either the effective user ID or the effective group ID.

  • The process executes (execve(2)) a program that has file capabilities (see capabilities(7)), but only if the permitted capabilities gained exceed those already permitted for the process.

Processes that are not dumpable can not be attached via ptrace(2) PTRACE_ATTACH; see ptrace(2) for further details.

If a process is not dumpable, the ownership of files in the process's /proc/[pid] directory is affected as described in proc(5).

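Now it's clear: our process gained a capability (through the file capabilities on a.out) that the shell used to launch it did not have, so its dumpable attribute was reset to the value in /proc/sys/fs/suid_dumpable instead of 1, and the files under /proc/self/ are owned by root rather than the current user. oom_score_adj has mode 0644 (rw-r--r--), so once root owns it, an ordinary user can no longer open it read-write and gets EACCES unless the process also has CAP_DAC_OVERRIDE, which is why the LKML post mentions that capability.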

How to make it work

The fix is as simple as resetting the dumpable attribute to 1 before trying to open the file. Put the following (or something similar) before the fopen() call:

    #include <sys/prctl.h>

    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0)
        perror("prctl(PR_SET_DUMPABLE)");

Hope that helps ;)
