OS X Server: Evaluating performance

This article describes how you can use a script to begin evaluating performance of your server running Mac OS X Server v10.4 or later. The primary focus is identifying the utilization of key resources: CPU, disk storage, and networking.

This article has been archived and is no longer updated by Apple.

Resource utilization

You can use this perl script (named "top_guide.pl") to evaluate the utilization of key resources over time on your server. This script works with Mac OS X Server v10.4 and later.

The script runs top -d, extracting and summarizing key resource utilization over time. With no arguments, the output of top_guide.pl looks similar to this:

--------  -----------  ------------  -------------  -------------  -------------  -------------  ------
          CPU usage:          300-s  Reads/sec:     Writes/sec:    Net in/sec:    Net out/sec:   kernel
Time:     user  sys    Idle   avg.   number MB      number MB      pkts   Mb      pkts   Mb      CPU%
--------  ----- -----  -----  -----  ------ ------  ------ ------  ------ ------  ------ ------  ------
08:44:36    0.1   0.1   99.8   99.8       0    0.0       0    0.0      11    0.0       0    0.0     0.4
08:44:46    0.1   0.3   99.6   99.8       0    0.0       0    0.1      12    0.0       0    0.0     0.4
08:44:56    0.0   0.2   99.8   99.8       0    0.0       0    0.0      17    0.0       0    0.0     0.4
08:45:06    0.0   0.2   99.8   99.8       0    0.0       0    0.0       6    0.0       0    0.0     0.4
08:45:16    0.0   0.2   99.8   99.8       0    0.0       0    0.0       9    0.0       0    0.0     0.4
08:45:26    8.7  24.6   66.7   98.6      43    0.8      98    0.6     803    1.0    1399    1.1     5.0
08:45:37    5.7  83.6   10.7   95.7      48    0.4    1085  116.9   95366 1053.1   61830   35.4    72.4
08:45:47   11.5  88.5    0.0   92.4      23    0.2    3228  426.9  337955 3843.6  207872  114.0   284.8
08:45:58    4.6  95.4    0.0   89.0       4    0.0     594  596.7  447499 5215.5  266449  140.9   370.0

You can see the three key areas identified: CPU, disk I/O, and networking. The columns are:

  • Time: The end-time of the most recent summary line. By default it is 10 seconds between samples.
  • CPU usage: user: User-mode CPU utilization. The value is a percentage of the total CPU resources available on this system. For example, on an 8-processor system, 50% User CPU would mean that the total CPU utilization of all running applications is the equivalent of 4 processors.
  • CPU usage: sys: Kernel-mode CPU utilization as a percentage of the total available CPU.
  • Idle: Idle CPU, as a percentage of the total available resources.
  • 300-s avg: Smoothed average of the percentage of unused CPU over the last 300 seconds. (Note that this changes if you use the -r option to adjust the averaging interval; the column title changes accordingly.)
  • Reads/sec - number: The number of unique disk read operations issued during the sample period.
  • Reads/sec - MB: The total megabytes read from the disks during the sample period.
  • Writes/sec - number: The number of unique write operations issued during the sample period.
  • Writes/sec - MB: Total megabytes written during the sample period.
  • Net in/sec - pkts: Total packets received during the sample period.
  • Net in/sec - Mb: Total network input in megabits per second during the sample period.
  • Net out/sec - pkts: Total network packets sent during the sample period.
  • Net out/sec - Mb: Total network output in megabits per second during the sample period.
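As the User CPU description above notes, the percentages are relative to the total CPU capacity of the machine, not to a single processor. The conversion to processor-equivalents can be sketched as follows (the function name is illustrative and not part of top_guide.pl):

```python
def cpu_equivalents(percent_of_total: float, num_cpus: int) -> float:
    """Convert a whole-machine CPU percentage into processor-equivalents.

    For example, on an 8-processor system, 50% total utilization means
    the running applications consume the equivalent of 4 processors.
    """
    return percent_of_total / 100.0 * num_cpus
```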

Other options that the script provides include:

  • -h: Display full usage information.
  • -o filename: Place a tab-separated copy of the output into a file. This output form is suitable for import into a spreadsheet or other data analysis scripts. The first line includes the column headers to ensure the output is always self-describing.
  • -d: Display the date and time stamp. This is particularly useful for long-term logging on a server where the administrator would like a historic record.
  • -S processNames: Monitor single-instance processes. This is useful when a service that runs as a single process is of particular interest on a server. An example might be a server used primarily as an AFP server; in that case, use the arguments -S AppleFileServer to add a separate column for the user CPU usage of the AppleFileServer process. More than one process can be listed by separating the process names with commas.
  • -M processNames: Like -S, except for use when the application uses multiple processes. An example is smbd, which runs one process per connection. On a system heavily used for Samba, use -M smbd to get a column that summarizes the user CPU of all Samba activity.
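Because the -o output is a self-describing tab-separated file with column headers on the first line, it is straightforward to post-process. A minimal sketch (the column names are assumed to match whatever header row your version of the script emits):

```python
import csv

def load_samples(path):
    """Read a tab-separated top_guide.pl output file into a list of
    dicts, keyed by the header names on the first line."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```

From there the samples can be fed into a spreadsheet-style analysis or plotted over time.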

Although a complete analysis of server performance is beyond the scope of this article, the output of top_guide.pl provides a summary of CPU, disk, and network resources usage on the system. Comparing that with the hardware and load of the system allows an administrator to decide which area to investigate in more detail. Some additional tools that may help when investigating each area are discussed below.

CPU utilization

If CPU utilization is high, the primary avenue of investigation is to improve the application's efficiency, which is outside the scope of this article. Sometimes, however, the CPU is not fully utilized: there are sufficient network and disk resources for a higher load, but a substantial amount of the CPU remains idle. Several tools are available to help you investigate that situation.

There are two ways a CPU may be under-utilized:

  • Application thread locking
  • Kernel resource locking

Introduced in Mac OS X 10.5 Leopard, the plockstat(1) command can be run against a particular application to investigate pthread lock contention. You can identify processes that might have pthread lock contention by examining the top -d output for processes with a high number of context switches (CSW column).  Once you've identified a process that you want to investigate for pthread lock contention, use: # plockstat -C -e 10 -p <pid>.
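Scanning a captured top sample for high-CSW processes can be scripted. The sketch below parses one sample of top output text; it assumes PID, COMMAND, and CSW columns named as in the header line, and the threshold is illustrative:

```python
def high_csw_processes(top_output: str, threshold: int = 1000):
    """Return (pid, command, csw) tuples for processes whose CSW column
    exceeds the threshold, given one sample of top output text."""
    lines = top_output.splitlines()
    # Locate the header row that names the columns.
    header_idx = next(i for i, l in enumerate(lines) if "CSW" in l.split())
    cols = lines[header_idx].split()
    pid_i, cmd_i, csw_i = cols.index("PID"), cols.index("COMMAND"), cols.index("CSW")
    results = []
    for line in lines[header_idx + 1:]:
        fields = line.split()
        if len(fields) <= csw_i:
            continue
        try:
            csw = int(fields[csw_i])
        except ValueError:
            continue  # skip rows where the CSW field isn't a plain integer
        if csw > threshold:
            results.append((int(fields[pid_i]), fields[cmd_i], csw))
    return results
```

Real top output contains many more columns than this simplified example; adjust the column names to match your capture.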

 Mutex block
   
   Count     nsec Lock                       Caller
   -------------------------------------------------------------------------------
  8540    29622 0x805eb0                     a.out`process_request+0x3d1
  8361    28284 0x805eb0                     a.out`release_block+0xac
  
 R/W reader block
 
 Count     nsec Lock                         Caller
 -------------------------------------------------------------------------------
     1  3014987 0xa781e8                     a.out`key_find+0x121
     5    78831 0xa3d158                     a.out`key_find+0x121
     2   120532 0xa3d158                     a.out`key_delete+0x137
 
 R/W writer block
 
 Count     nsec Lock                         Caller
 -------------------------------------------------------------------------------
    10  5336318 0xa781e8                     a.out`key_replace+0x80
    13  3827632 0xa75fe8                     a.out`key_replace+0x80
     3  3644946 0xa5fde8                     a.out`key_create+0x37
     6   161055 0xa3d158                     a.out`key_replace+0x80
     1   785267 0xa37758                     a.out`key_create+0x37

You can examine this output and review the total nsec blocked behind various locks. If this value is high relative to the total execution time, lock contention is impacting performance. That information can help the application writer isolate and improve the application's performance. If symbols are available, the Caller field will contain the symbolic name of the function that blocked. Also, if the total count of blocks is high, over approximately 250 per second, then some efficiency is being lost to rescheduling between threads. Although not sufficient to conclude it is a limiting factor for performance, a high rate of blocks warrants investigation, because in most cases it indicates the application may not be as efficient as it could be.
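The roughly-250-blocks-per-second rule of thumb above can be applied mechanically to plockstat counts. A small sketch (the threshold and function names come from the discussion, not from plockstat itself):

```python
def blocks_per_second(total_block_count: int, sample_seconds: float) -> float:
    """Rate of lock blocks over a plockstat sampling window."""
    return total_block_count / sample_seconds

def worth_investigating(total_block_count: int, sample_seconds: float,
                        threshold: float = 250.0) -> bool:
    """True if the block rate exceeds the ~250/sec rule of thumb,
    suggesting efficiency lost to rescheduling between threads."""
    return blocks_per_second(total_block_count, sample_seconds) > threshold
```

For the mutex-block output above (counts 8540 and 8361 over the 10-second `-e 10` window), the rate is about 1690 blocks per second, well above the rule of thumb.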

Sometimes resource contention issues are for resources within the kernel. The Mac OS X kernel manages most multi-threaded resources with kernel mutexes and reader-writer locks.  Mutex contention can be evaluated via dtrace(1M) scripts such as:

lockstat:::adaptive-block
{
        @often[arg0, stack(5)] = count();
}

profile:::tick-5sec
{
        normalize(@often, 5);
        printa(@often);
        clear(@often);
}

Every five seconds, this prints a sorted list of blocked kernel mutexes and the call paths that blocked on them, with the most contended locks and paths at the bottom of each output block. A sample of output is:

        375517140

              mach_kernel`lck_mtx_lock_wait_x86+0x1c7
              mach_kernel`lck_mtx_lock+0x217
              mach_kernel`ip_output_list+0xd34
              mach_kernel`tcp_setpersist+0x18b
              mach_kernel`tcp_output+0x17a0
                0

        375571960

              mach_kernel`lck_mtx_lock_wait_x86+0x1c7
              mach_kernel`lck_mtx_lock+0x217
              mach_kernel`vnode_ref+0x32
              mach_kernel`fcntl_nocancel+0x4b3
              mach_kernel`unix_syscall64+0x26d
                0

In this case, the script is reporting that the kernel mutex at 375517140 is being blocked via a path where tcp_output() calls tcp_setpersist(), which calls ip_output_list().  The number of blocks per second is 0 on both of these locks, so lock contention is not an issue.  Note that this form of the script does not total all instances against a single lock.  If the script were changed to:

lockstat:::adaptive-block
{
        @often[arg0] = count();
}

profile:::tick-5sec
{
        normalize(@often, 5);
        printa(@often);
        clear(@often);
}

The output may change to:

       1301055072                9
       1417504384                9
        353318336               10
        359381776               10
       1301055088               11
          5387344              165

This indicates that only the lock at kernel address 5387344 has appreciable contention, and even then it's below a level of concern, because it's less than 250.
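Scanning the aggregated per-lock totals for contended locks can itself be scripted. A sketch that parses the two-column printa output shown above (lock address, then normalized per-second count):

```python
def contended_locks(printa_output: str, threshold: int = 250):
    """Parse 'address count' lines from aggregated dtrace output and
    return (address, rate) pairs whose per-second block rate exceeds
    the threshold."""
    results = []
    for line in printa_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        addr, rate = int(fields[0]), int(fields[1])
        if rate > threshold:
            results.append((addr, rate))
    return results
```

Applied to the sample output above, no lock exceeds the 250-per-second rule of thumb, matching the conclusion in the text.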

Finally, the script can focus on kernel reader-writer locks by changing the script slightly to:

lockstat:::rw-block
{
    @often[arg0, stack(4)] = count();
}

profile:::tick-5sec
{
    normalize(@often, 5);
    printa(@often);
    clear(@often);
}

Some things to remember about using dtrace(1M) lockstat scripts and plockstat(1):

  • Collecting locking statistics has a performance impact. These tools may slow the system. Run them with care, and remember that the measurement itself adds contention that isn't there in normal operation.
  • Some designs are not impacted by simply blocking on locks. Consider a lock protecting a very slowly changing resource: the lock may be held while waiting for a long-running operation to complete. Such instances have long hold times that may not actually reduce performance.

Disk utilization

iopending(1) gives an indication of the depth of the I/O queues, presented as a histogram of queue-depth samples.  For example:

2008 May 21 10:35:26,  load: 0.99,  disk_r:      0 KB,  disk_w:    392 KB
           value  ------------- Distribution ------------- count   
             < 0 |                                         0       
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 9298    
               1 |                                         12      
               2 |                                         5       
               3 |                                         4       
               4 |                                         2       
               5 |                                         3       
               6 |                                         2       
               7 |                                         2       
               8 |                                         3       
               9 |                                         2       
              10 |                                         3       
              11 |                                         2       
              12 |                                         2       
              13 |                                         2       
              14 |                                         2       
              15 |                                         0       

This indicates some disk activity, but is not a concern, since the I/O queue depth was 0 in almost all samples. In addition, you can use iopending to focus on a particular disk or file system volume. For example, to focus on /Volumes/MyRAID, use iopending -m MyRAID.
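To quantify "almost all samples at depth 0", sum the histogram counts and take the fraction at depth 0. A sketch using the counts from the example above:

```python
def idle_queue_fraction(histogram: dict) -> float:
    """Fraction of iopending samples that observed an empty I/O queue,
    given a {queue_depth: sample_count} histogram."""
    total = sum(histogram.values())
    return histogram.get(0, 0) / total if total else 1.0
```

For the histogram shown, about 99.5% of samples saw an empty queue, confirming the disk is not the bottleneck.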

Finally, you can attribute I/O to particular processes and evaluate latency using fs_usage(1).  You can learn more about using Shark and fs_usage(1) in this Developer connection document.

Network utilization

Many servers have multiple network interfaces and clients at various distances across the network, so performance analysis can be complex. However, in LAN situations where clients are directly connected across a fabric of known speed, the metrics reported by top_guide.pl, as well as netstat -i, provide insight into how fully the capacity is being utilized. When interpreting those numbers, keep these guidelines in mind:

Gigabit networks - full duplex, 1500-byte MTU
Maximum rate - small (64-byte) packets:   1488 Kpps   762 Mbps
Maximum rate - 1500-byte packets:           81 Kpps   987 Mbps
(Kpps = thousands of packets per second)
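The Gigabit figures above follow from Ethernet framing overhead: every frame carries an 8-byte preamble and a 12-byte inter-frame gap on the wire in addition to the frame itself. A sketch of the arithmetic:

```python
def gigabit_rates(frame_bytes: int, link_bps: float = 1e9):
    """Max packets/sec and frame throughput (Mbps) for one direction of
    full-duplex Gigabit Ethernet.

    frame_bytes is the frame size excluding preamble and inter-frame
    gap: 64 for minimum-size frames, 1518 for a 1500-byte MTU frame
    including headers and CRC.
    """
    wire_bytes = frame_bytes + 8 + 12          # preamble + inter-frame gap
    pps = link_bps / (wire_bytes * 8)          # packets per second
    mbps = pps * frame_bytes * 8 / 1e6         # frame bits delivered per second
    return pps, mbps
```

Evaluating this for 64-byte and 1518-byte frames reproduces the 1488 Kpps / 762 Mbps and 81 Kpps / 987 Mbps figures in the table.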

When analyzing network performance it's also useful to review netstat -s output for issues such as excessive packet retransmit rates (which is beyond the scope of this article).
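One simple way to gauge retransmission is to compare the retransmitted-packet counter against the data-packets-sent counter from netstat -s. The exact counter names vary between releases, so the helper below just performs the arithmetic on values you read from that output:

```python
def retransmit_ratio(sent_data_packets: int, retransmitted_packets: int) -> float:
    """Fraction of sent data packets that were retransmitted; a high
    ratio on a directly connected LAN suggests packet loss worth
    investigating."""
    if sent_data_packets == 0:
        return 0.0
    return retransmitted_packets / sent_data_packets
```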
