Why is my system CPU so high?

I recently had some performance issues on a client's server. The machine is an absolute beast, and should easily handle the single website hosted on it. Alas, pages took forever to load (the site runs Drupal 7). A simple top showed that load was at about 60, and CPU utilization was at 10% for user and ... 95% for system. Wait, what?

Something is odd here: even though the usual culprit for high system CPU usage (I/O) was low (checked using iostat), most of the CPU time went into kernel work.
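The user/system/iowait split that top reports comes from the kernel counters in /proc/stat. As a rough sketch (not what top does internally, just the same arithmetic), the split over an interval can be computed from two samples of the aggregate "cpu" line. The function names below are my own and purely illustrative.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (first eight counters).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def read_cpu_counters(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_split(before, after):
    """Percentage of CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_proc_stat(interval=1.0):
    """Sample /proc/stat twice (Linux only) and return the split in between."""
    with open("/proc/stat") as f:
        first = read_cpu_counters(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = read_cpu_counters(f.read())
    return cpu_split(first, second)
```

Calling sample_proc_stat() on the affected box should show a large "system" share and a small "iowait" share if the situation matches the one described above.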

I used strace to look into a process and find out where the time was going. For the curious, you can use

# strace -c -p [pid]

to get a nice summary list of the syscalls your process does. I applied that to a LiteSpeed process, and found the following (truncated):

[root@server:/root]> strace -c -p 97276
Process 97276 attached - interrupt to quit
Process 97276 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- --------------
 93.25    0.395625           0   6338926     38539 lstat
  2.72    0.011539           0    225123           getcwd
  1.51    0.006415           0    101411      5977 stat
  0.61    0.002593           0     25156           getdents
  0.48    0.002039           0     25621        21 open
  0.27    0.001136           0     25625           close
  0.26    0.001091           0     12307     12307 readlink
  0.22    0.000913           0      9239         1 read
  0.16    0.000698           0     12887           munmap
  0.15    0.000634           0     24547           fstat
  0.15    0.000628           0     12887           mmap
[.....]
------ ----------- ----------- --------- --------- --------------
100.00    0.424240               6835792     57199 total
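If you collect several of these summaries, it can help to tally them with a script instead of eyeballing the columns. The following is a minimal sketch that parses the text of a strace -c summary laid out like the one above and ranks syscalls by call count. The function names are hypothetical, and the parsing assumes this exact column layout (errors column optional).

```python
def parse_strace_summary(text):
    """Parse the per-syscall rows of `strace -c` output into dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields (no errors) or 6 fields (with errors).
        if len(parts) not in (5, 6):
            continue
        name = parts[-1]
        if name == "total":
            continue
        try:
            row = {
                "pct_time": float(parts[0]),
                "seconds": float(parts[1]),
                "usecs_per_call": int(parts[2]),
                "calls": int(parts[3]),
                "errors": int(parts[4]) if len(parts) == 6 else 0,
                "syscall": name,
            }
        except ValueError:
            # Header, separator, or any other non-data line.
            continue
        rows.append(row)
    return rows


def rank_by_calls(rows):
    """Return (syscall, calls, share_of_listed_calls), busiest first."""
    total = sum(r["calls"] for r in rows)
    ranked = sorted(rows, key=lambda r: r["calls"], reverse=True)
    return [(r["syscall"], r["calls"], r["calls"] / total) for r in ranked]
```

Note that the share is relative to the rows actually present in the text, so on a truncated listing it will differ slightly from the share of the real total.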

That's 6.3 MILLION calls to lstat. But why would this occur? After digging about a bit, I found the culprit to be PHP's open_basedir, which is utterly unnecessary in our case, since we run PHP via suExec in FastCGI mode (so filesystem permissions are more than enough). PHP disables its realpath cache when open_basedir is set, so it has to re-resolve every path it touches instead of reusing cached results. After turning open_basedir off, compare the result from a different process:

[root@server:/root]> strace -c -p 98227
Process 98227 attached - interrupt to quit
Process 98227 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- --------------
 89.30    0.019310          32       600           munmap
  4.82    0.001043           1      1343         1 writev
  3.58    0.000774         774         1           brk
  0.84    0.000182           0      5158         5 stat
  0.55    0.000118           0      6082           read
  0.38    0.000083           0      5656        10 lstat
  0.18    0.000038           0      1231           open
  0.07    0.000016           0      1236           close
  0.07    0.000016           0       533           mmap
  0.07    0.000015           0        78           poll
[....]
------ ----------- ----------- --------- --------- --------------
100.00    0.021623                 24734        28 total

lstat dropped from 6.3 million calls to 5,656, and stat from about 101,000 to 5,158. That's quite a huge improvement even for the seconds: from 0.42 to 0.02! Of course, these are two different processes that might have been loading anything, but the important thing to take away from here is that open_basedir has QUITE the effect on your PHP processes, and should NOT be used on a dedicated server!
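To get a feel for how the lstat count can grow so large, here is a rough model, assuming that resolving a path without a cache costs about one lstat per path component. That matches my understanding of how PHP resolves real paths, but treat it as an approximation and not a description of PHP's actual code. The helper names and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def prefixes_to_check(path):
    """List every path prefix that an uncached resolution would have to stat."""
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError("expected an absolute path")
    prefixes = []
    current = PurePosixPath(p.parts[0])  # start at the root, "/"
    for part in p.parts[1:]:
        current = current / part
        prefixes.append(str(current))
    return prefixes


def estimate_lstat_calls(paths):
    """Approximate lstat calls needed to resolve all paths with no cache."""
    return sum(len(prefixes_to_check(p)) for p in paths)


# Hypothetical example: a deep module path costs one check per component.
example = "/home/site/public_html/sites/all/modules/views/views.module"
checks = prefixes_to_check(example)
```

Under this model, a request that includes a few hundred files nested eight or so directories deep already needs a few thousand lstat calls, and that repeats on every request when nothing is cached.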

If you want to take it a step further, PHP also has a cache that maps file paths to their real paths. For a large Drupal installation, I've found that the default value of 16K is rarely enough. You might squeeze out a bit of extra performance by adjusting the two relevant php.ini parameters:

realpath_cache_size = 1M
realpath_cache_ttl = 3600

The TTL is in seconds, so 3600 keeps cached entries for an hour. According to some articles, this reduces the amount of system work for your server even further.