
Hpcbench Project Page in SF.NET


High Performance Networks Benchmarking



TCP Communication Experiments

[ Blocking unidirectional throughput test ]   [ Blocking bidirectional throughput test ]   [ Blocking exponential test ]

[ Non-blocking unidirectional throughput test ]   [ Non-blocking bidirectional throughput test ]

[ Non-blocking exponential test ]   [ Test with system log ]   [ Test TCP socket options ]   [ Latency test ]   [ Plot data ]

Our testbed is a cluster named "mako" (mako.sharcnet.ca) in SHARCNET. Mako consists of 8 nodes (mk1-mk8), each with 4 Intel Xeon 3GHz Hyperthreading processors and 2GB of RAM. There are two high-speed, low-latency interconnects between all nodes: Myrinet and Gigabit Ethernet. Our TCP tests use the Gigabit Ethernet interconnect.

In the following examples, we test TCP communication between mk1 (client) and mk4 (server). The first step of any TCP test is to start the server process on the remote or local machine:

[mk4 ~/hpcbench/tcp]$ tcpserver &
[1] 11504
TCP socket listening on port [5677]
[mk4 ~/hpcbench/tcp]$

You can also run the server in the foreground and enable verbose mode (-v option). If the default port number is not available, you can specify another port with the -p option, or let the system pick an unused port number for you:

[mk4 ~/hpcbench/tcp]$ tcpserver -p 0
TCP socket listening on port [19283]
 

 


[ Blocking unidirectional throughput test ]   [ TOP ]

The blocking stream (unidirectional) throughput test is the default mode. Here we repeat the test 5 times and write the results to an output file:

[mk1 ~/hpcbench/tcp]$ tcptest -h mk4 -r 5 -o output
 (1) : 938.306894 Mbps
 (2) : 936.791097 Mbps
 (3) : 936.252991 Mbps
 (4) : 939.005970 Mbps
 (5) : 938.736784 Mbps
Test done!
Test-result: "output"
[mk1 ~/hpcbench/tcp]$ cat output
# TCP communication test -- Tue Jul 13 17:52:09 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: stream(unidirectional) throughput test
# Socket Recv-buffer (Bytes) -- client: 87380  server: 87380
# Socket Send-buffer (Bytes) -- client: 16384  server: 16384
# Socket blocking option -- client: ON  server: ON
# TCP_NODELAY option -- client: OFF  server: OFF
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 1448  server: 1448
# IP TOS type -- client: Default server: Default
# Data size of each read/write (Bytes): 8192
# Total data size sent of each test (Bytes): 587792384
# Message size (Bytes): 65536
# Iteration: 8969
# Test Repetition: 5

#        Network      Client     C-process   C-process      Server     S-process   S-process
#      Throughput  Elapsed-time  User-mode  System-mode  Elapsed-time  User-mode  System-mode
#         (Mbps)    (Seconds)    (Seconds)   (Seconds)    (Seconds)    (Seconds)   (Seconds)
1        938.307        5.01         0.00        1.21         5.01         0.12        2.90
2        936.791        5.02         0.00        1.18         5.02         0.07        2.87
3        936.253        5.02         0.03        1.26         5.02         0.06        3.00
4        939.006        5.01         0.00        1.36         5.01         0.08        2.97
5        938.737        5.01         0.00        1.84         5.01         0.13        2.64

# Throughput statistics : Average 937.9449   Minimum 936.2530   Maximum 939.0060
[mk1 ~/hpcbench/tcp]$

 


[ Blocking bidirectional throughput test ]   [ TOP ]

The blocking bidirectional throughput test is a "ping-pong" test: the server receives a message and sends it back to the client (a minimal sketch of the server-side loop appears after this example):

[mk1 ~/hpcbench/tcp]$ tcptest -i -h mk4 -r 5 -o out.txt
 (1) : 835.843170 Mbps
 (2) : 845.507710 Mbps
 (3) : 846.742031 Mbps
 (4) : 843.675498 Mbps
 (5) : 842.522398 Mbps
Test done!
Test-result: "out.txt" 
[mk1 ~/hpcbench/tcp]$ cat out.txt
# TCP communication test -- Tue Jul 13 18:15:53 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: ping-pong(bidirectional) throughput test
# Socket Recv-buffer (Bytes) -- client: 87380  server: 87380
# Socket Send-buffer (Bytes) -- client: 16384  server: 16384
# Socket blocking option -- client: ON  server: ON
# TCP_NODELAY option -- client: OFF  server: OFF
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 1448  server: 1448
# IP TOS type -- client: Default server: Default
# Data size of each read/write (Bytes): 8192
# Total data size sent of each test (Bytes): 262602752
# Message size (Bytes): 65536
# Iteration: 4007
# Test Repetition: 5

#        Network      Client     C-process   C-process      Server     S-process   S-process
#      Throughput  Elapsed-time  User-mode  System-mode  Elapsed-time  User-mode  System-mode
#         (Mbps)    (Seconds)    (Seconds)   (Seconds)    (Seconds)    (Seconds)   (Seconds)
1        835.843        5.03         0.03        2.91         5.03         0.09        2.19
2        845.508        4.97         0.08        3.01         4.97         0.04        2.28
3        846.742        4.96         0.05        2.96         4.96         0.04        2.16
4        843.675        4.98         0.07        2.86         4.98         0.03        2.07
5        842.522        4.99         0.08        2.72         4.99         0.05        2.30

# Throughput statistics : Average 843.9019   Minimum 835.8432   Maximum 846.7420
[mk1 ~/hpcbench/tcp]$
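
The server side of such a ping-pong exchange is essentially a blocking echo loop: read a message, write it straight back. The following is only a minimal illustrative sketch of that pattern, not hpcbench's actual implementation (the 8192-byte chunk mirrors the "data size of each read/write" reported above):

/* Blocking ping-pong echo loop on the server side (illustrative sketch). */
#include <unistd.h>

static void echo_loop(int sock)                 /* sock: connected TCP socket */
{
    char buf[8192];                             /* read/write unit, as in the test output above */
    ssize_t n;
    while ((n = read(sock, buf, sizeof(buf))) > 0) {
        ssize_t sent = 0;
        while (sent < n) {                      /* send the whole chunk back to the client */
            ssize_t m = write(sock, buf + sent, n - sent);
            if (m <= 0)
                return;                         /* connection closed or error */
            sent += m;
        }
    }
}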

 


[ Blocking exponential throughput test ]   [ TOP ]

In exponential tests, the message size increases exponentially from 1 Byte to 2^n Bytes, where n is set by the -e option, giving n+1 message sizes in total. Here we run both the stream and ping-pong exponential tests with a maximum message size of 32 MBytes (2^25 Bytes), i.e. 26 message sizes (a short sketch of the size sweep follows these examples):

[mk1 ~/hpcbench/tcp]$ tcptest -h mk4 -e 25
 (1) : 0.966954 Mbps
 (2) : 2.028758 Mbps
 (3) : 4.018384 Mbps
 (4) : 8.962581 Mbps
 (5) : 114.021023 Mbps
 (6) : 193.792581 Mbps
 (7) : 348.583878 Mbps
 (8) : 516.233112 Mbps
 (9) : 754.716981 Mbps
 (10) : 903.635722 Mbps
 (11) : 939.406449 Mbps
 (12) : 936.153679 Mbps
 (13) : 936.881712 Mbps
 (14) : 938.658356 Mbps
 (15) : 940.367532 Mbps
 (16) : 938.735666 Mbps
 (17) : 937.824883 Mbps
 (18) : 938.344359 Mbps
 (19) : 941.224639 Mbps
 (20) : 941.180170 Mbps
 (21) : 940.614833 Mbps
 (22) : 939.158388 Mbps
 (23) : 933.832581 Mbps
 (24) : 932.930817 Mbps
 (25) : 940.043719 Mbps
 (26) : 941.001496 Mbps
Test done!
[mk1 ~/hpcbench/tcp]$ tcptest -h mk4 -e 25 -i -o output.txt
 (1) : 0.279742 Mbps
 (2) : 0.526432 Mbps
 (3) : 1.053214 Mbps
 (4) : 2.096010 Mbps
 (5) : 4.115054 Mbps
 (6) : 8.171917 Mbps
 (7) : 15.999450 Mbps
 (8) : 32.126498 Mbps
 (9) : 56.310765 Mbps
 (10) : 95.175468 Mbps
 (11) : 144.502655 Mbps
 (12) : 224.478024 Mbps
 (13) : 368.048684 Mbps
 (14) : 516.538273 Mbps
 (15) : 634.885268 Mbps
 (16) : 776.249393 Mbps
 (17) : 842.645579 Mbps
 (18) : 891.314645 Mbps
 (19) : 912.319905 Mbps
 (20) : 924.485150 Mbps
 (21) : 929.543704 Mbps
 (22) : 931.308711 Mbps
 (23) : 938.358981 Mbps
 (24) : 939.385785 Mbps
 (25) : 940.040040 Mbps
 (26) : 930.230512 Mbps
Test done!
Test-result: "output.txt" 
[mk1 ~/hpcbench/tcp]$ cat output.txt
# TCP communication test -- Tue Jul 13 18:30:36 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: ping-pong(bidirectional) exponential throughput test
# Socket Recv-buffer (Bytes) -- client: 87380  server: 87380
# Socket Send-buffer (Bytes) -- client: 16384  server: 16384
# Socket blocking option -- client: ON  server: ON
# TCP_NODELAY option -- client: OFF  server: OFF
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 1448  server: 1448
# IP TOS type -- client: Default server: Default

# Data-size  Network-throughput   Elapsed-time   Iteration
#  (Bytes)        (Mbps)           (Seconds)    
         1         0.2797           0.28598       5000
         2         0.5264           0.30393       5000
         4         1.0532           0.30383       5000
         8         2.0960           0.30534       5000
        16         4.1151           0.31105       5000
        32         8.1719           0.31327       5000
        64        15.9995           0.32001       5000
       128        32.1265           0.31874       5000
       256        56.3108           0.36370       5000
       512        95.1755           0.43036       5000
      1024       144.5027           0.56691       5000
      2048       224.4780           0.72987       5000
      4096       368.0487           0.89032       5000
      8192       516.5383           1.26875       5000
     16384       634.8853           2.06450       5000
     32768       776.2494           3.37706       5000
     65536       842.6456           4.60547       3701
    131072       891.3146           4.72693       2009
    262144       912.3199           4.88244       1062
    524288       924.4852           4.92708        543
   1048576       929.5437           4.96344        275
   2097152       931.3087           4.97205        138
   4194304       938.3590           4.93469         69
   8388608       939.3858           4.85786         34
  16777216       940.0400           4.85448         17
  33554432       930.2305           4.61710          8
[mk1 ~/hpcbench/tcp]$
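
The 26 results above correspond to message sizes 2^0 through 2^25 Bytes. A minimal sketch of the size sweep (illustrative only, not hpcbench's code):

/* Exponential message-size sweep: 1 Byte up to 2^n Bytes (n = 25 here). */
#include <stdio.h>

int main(void)
{
    int n = 25;                                  /* exponent given by the -e option */
    for (long size = 1; size <= (1L << n); size <<= 1)
        printf("run one test with %ld-Byte messages\n", size);  /* 26 sizes in total */
    return 0;
}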

 


[ Non-blocking unidirectional throughput test ]    [ TOP ]

We can also put the TCP socket into non-blocking mode, in which case the read/write system calls return to the application immediately rather than blocking until data is transferred.

[mk1 ~/hpcbench/tcp]$ tcptest -n -h mk4 -r 5 -o output.txt
 (1) : 930.135129 Mbps
 (2) : 936.424894 Mbps
 (3) : 933.355274 Mbps
 (4) : 939.637175 Mbps
 (5) : 941.422417 Mbps
Test done!
Test-result: "output.txt" 
[mk1 ~/hpcbench/tcp]$ cat output.txt
# TCP communication test -- Tue Jul 13 18:35:41 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: stream(unidirectional) throughput test
# Socket Recv-buffer (Bytes) -- client: 87380  server: 87380
# Socket Send-buffer (Bytes) -- client: 16384  server: 16384
# Socket blocking option -- client: OFF  server: OFF
# TCP_NODELAY option -- client: OFF  server: OFF
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 1448  server: 1448
# IP TOS type -- client: Default server: Default
# Data size of each read/write (Bytes): 8192
# Total data size sent of each test (Bytes): 587792384
# Message size (Bytes): 65536
# Iteration: 8969
# Test Repetition: 5

#        Network      Client     C-process   C-process      Server     S-process   S-process
#      Throughput  Elapsed-time  User-mode  System-mode  Elapsed-time  User-mode  System-mode
#         (Mbps)    (Seconds)    (Seconds)   (Seconds)    (Seconds)    (Seconds)   (Seconds)
1        930.135        5.06         0.37        4.67         5.06         0.17        3.76
2        936.425        5.02         0.16        4.82         5.02         0.16        3.81
3        933.355        5.04         0.26        4.76         5.04         0.21        3.69
4        939.637        5.00         0.20        4.73         5.00         0.20        3.81
5        941.422        4.99         0.28        4.71         4.99         0.18        3.55

# Throughput statistics : Average 936.4724   Minimum 930.1351   Maximum 941.4224
[mk1 ~/hpcbench/tcp]$ 

Compared with the blocking tests above, the results show little difference in unidirectional throughput between blocking and non-blocking communication, but the non-blocking test consumes noticeably more system resources (system-mode process time). This is because the application polls the socket with the select() system call before each read/write, so many more system calls are issued and repeated at the kernel level.
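
A non-blocking sender typically follows the pattern sketched below: the socket is switched to non-blocking mode with fcntl(), and select() is called before each write() to wait until the socket is writable. This is only an illustration of that pattern, not hpcbench's actual code:

/* Non-blocking send loop driven by select() (illustrative sketch). */
#include <fcntl.h>
#include <unistd.h>
#include <sys/select.h>

static int send_nonblocking(int sock, const char *buf, size_t len)
{
    size_t sent = 0;
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);   /* non-blocking mode */
    while (sent < len) {
        fd_set wset;
        FD_ZERO(&wset);
        FD_SET(sock, &wset);
        if (select(sock + 1, NULL, &wset, NULL, NULL) < 0)        /* wait until writable */
            return -1;
        ssize_t n = write(sock, buf + sent, len - sent);          /* may accept only part of the data */
        if (n < 0)
            return -1;                                            /* a robust version would retry on EAGAIN */
        sent += n;
    }
    return 0;
}

Each select()/write() pair costs an extra system call compared with a single blocking write(), which is consistent with the higher system-mode times in the table above.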


[ Non-blocking bidirectional throughput test ]    [ TOP ]

[mk1 ~/hpcbench/tcp]$ tcptest -in -h mk4 -r 5 -o output.txt 
 (1) : 1456.741918 Mbps
 (2) : 1353.053078 Mbps
 (3) : 1349.546522 Mbps
 (4) : 1346.721659 Mbps
 (5) : 1348.522626 Mbps
Test done!
Test-result: "output.txt" 
[mk1 ~/hpcbench/tcp]$ cat output.txt
# TCP communication test -- Tue Jul 13 18:44:21 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: ping-pong(bidirectional) throughput test
# Socket Recv-buffer (Bytes) -- client: 87380  server: 87380
# Socket Send-buffer (Bytes) -- client: 16384  server: 16384
# Socket blocking option -- client: OFF  server: OFF
# TCP_NODELAY option -- client: OFF  server: OFF
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 1448  server: 1448
# IP TOS type -- client: Default server: Default
# Data size of each read/write (Bytes): 8192
# Total data size sent of each test (Bytes): 514785280
# Message size (Bytes): 65536
# Iteration: 7855
# Test Repetition: 5

#        Network      Client     C-process   C-process      Server     S-process   S-process
#      Throughput  Elapsed-time  User-mode  System-mode  Elapsed-time  User-mode  System-mode
#         (Mbps)    (Seconds)    (Seconds)   (Seconds)    (Seconds)    (Seconds)   (Seconds)
1       1456.742        5.65         0.04        5.59         5.65         0.13        4.93
2       1353.053        6.09         0.06        6.02         6.09         0.13        4.32
3       1349.547        6.10         0.03        6.07         6.10         0.11        4.37
4       1346.722        6.12         0.09        6.02         6.12         0.08        4.63
5       1348.523        6.11         0.07        6.04         6.11         0.22        4.71

# Throughput statistics : Average 1350.3741   Minimum 1346.7217   Maximum 1456.7419
[mk1 ~/hpcbench/tcp]$ 

As we can see, in bidirectional tests the non-blocking throughput is much higher than that of blocking communication: with non-blocking sockets both directions of the full-duplex Gigabit link can carry data at the same time, instead of strictly alternating sends and receives as in the blocking ping-pong test.


[ Non-blocking exponential throughput test ]   [ TOP ]

As in the blocking case, the message size increases exponentially from 1 Byte to 2^n Bytes, where n is set by the -e option. Here we run both the unidirectional and bidirectional exponential tests in non-blocking mode, again with a maximum message size of 32 MBytes (2^25 Bytes):

[mk1 ~/hpcbench/tcp]$ tcptest -ne 25 -h mk4  
 (1) : 179.372197 Mbps
 (2) : 361.990950 Mbps
 (3) : 553.633218 Mbps
 (4) : 751.173709 Mbps
 (5) : 783.353733 Mbps
 (6) : 880.935994 Mbps
 (7) : 911.356355 Mbps
 (8) : 925.356949 Mbps
 (9) : 933.199672 Mbps
 (10) : 929.135287 Mbps
 (11) : 939.255658 Mbps
 (12) : 939.223352 Mbps
 (13) : 937.777244 Mbps
 (14) : 934.495749 Mbps
 (15) : 934.355855 Mbps
 (16) : 940.092351 Mbps
 (17) : 938.246807 Mbps
 (18) : 938.310372 Mbps
 (19) : 937.102111 Mbps
 (20) : 937.951656 Mbps
 (21) : 934.376846 Mbps
 (22) : 938.401579 Mbps
 (23) : 936.109626 Mbps
 (24) : 938.588935 Mbps
 (25) : 933.255092 Mbps
 (26) : 935.683583 Mbps
Test done!
[mk1 ~/hpcbench/tcp]$ tcptest -ine 25 -h mk4 -o output.txt  
 (1) : 615.384615 Mbps
 (2) : 1012.658228 Mbps
 (3) : 1314.168378 Mbps
 (4) : 1479.768786 Mbps
 (5) : 1680.892974 Mbps
 (6) : 1576.354680 Mbps
 (7) : 1630.832935 Mbps
 (8) : 1539.733855 Mbps
 (9) : 1643.527807 Mbps
 (10) : 1632.392795 Mbps
 (11) : 1635.995087 Mbps
 (12) : 1623.078142 Mbps
 (13) : 1635.954248 Mbps
 (14) : 1326.670560 Mbps
 (15) : 1359.569326 Mbps
 (16) : 1350.383811 Mbps
 (17) : 1344.656175 Mbps
 (18) : 1345.188891 Mbps
 (19) : 1343.651458 Mbps
 (20) : 1344.313125 Mbps
 (21) : 1351.314055 Mbps
 (22) : 1354.159095 Mbps
 (23) : 1356.183126 Mbps
 (24) : 1354.537707 Mbps
 (25) : 1351.288206 Mbps
 (26) : 1352.852697 Mbps
Test done!
Test-result: "output.txt" 
[mk1 ~/hpcbench/tcp]$ cat output.txt
# TCP communication test -- Tue Jul 13 19:38:47 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: ping-pong(bidirectional) exponential throughput test
# Socket Recv-buffer (Bytes) -- client: 87380  server: 87380
# Socket Send-buffer (Bytes) -- client: 16384  server: 16384
# Socket blocking option -- client: OFF  server: OFF
# TCP_NODELAY option -- client: OFF  server: OFF
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 1448  server: 1448
# IP TOS type -- client: Default server: Default

# Data-size  Network-throughput   Elapsed-time   Iteration
#  (Bytes)        (Mbps)           (Seconds)    
         1       615.3846           0.00026      10000
         2      1012.6582           0.00032      10000
         4      1314.1684           0.00049      10000
         8      1479.7688           0.00086      10000
        16      1680.8930           0.00152      10000
        32      1576.3547           0.00325      10000
        64      1630.8329           0.00628      10000
       128      1539.7339           0.01330      10000
       256      1643.5278           0.02492      10000
       512      1632.3928           0.05018      10000
      1024      1635.9951           0.10015      10000
      2048      1623.0781           0.20189      10000
      4096      1635.9542           0.40060      10000
      8192      1326.6706           0.98798      10000
     16384      1359.5693           1.92814      10000
     32768      1350.3838           3.88251      10000
     65536      1344.6562           5.02119       6439
    131072      1345.1889           4.99660       3205
    262144      1343.6515           5.00388       1603
    524288      1344.3131           4.99206        800
   1048576      1351.3141           4.96619        400
   2097152      1354.1591           4.98054        201
   4194304      1356.1831           4.94836        100
   8388608      1354.5377           4.95437         50
  16777216      1351.2882           4.96629         25
  33554432      1352.8527           4.76212         12
[mk1 ~/hpcbench/tcp]$  

 


[ Test with system log ]    [ TOP ]

Currently the system resource tracing functionality is only available on Linux. To enable system logging, specify both the write option (-o) and the CPU logging option (-c). In the following example, the file "output" records the test results, "output.c_log" records the client's system information, and "output.s_log" records the server's system information. The system logs have two more entries than the test repetition count: the first shows the pre-test system state and the last shows the post-test system state.

[mk1 ~/hpcbench/tcp]$ tcptest -ch mk4 -r 5 -o output
 (1) : 932.349667 Mbps
 (2) : 939.459044 Mbps
 (3) : 940.130663 Mbps
 (4) : 939.778465 Mbps
 (5) : 939.497761 Mbps
Test done!
Test-result: "output"  Local-syslog: "output.c_log"  server-syslog: "output.s_log"
[mk1 ~/hpcbench/tcp]$ cat output
# TCP communication test -- Tue Jul 13 22:05:30 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: stream(unidirectional) throughput test
# Socket Recv-buffer (Bytes) -- client: 87380  server: 87380
# Socket Send-buffer (Bytes) -- client: 16384  server: 16384
# Socket blocking option -- client: ON  server: ON
# TCP_NODELAY option -- client: OFF  server: OFF
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 1448  server: 1448
# IP TOS type -- client: Default server: Default
# Data size of each read/write (Bytes): 8192
# Total data size sent of each test (Bytes): 587005952
# Message size (Bytes): 65536
# Iteration: 8957
# Test Repetition: 5

#        Network      Client     C-process   C-process      Server     S-process   S-process
#      Throughput  Elapsed-time  User-mode  System-mode  Elapsed-time  User-mode  System-mode
#         (Mbps)    (Seconds)    (Seconds)   (Seconds)    (Seconds)    (Seconds)   (Seconds)
1        932.350        5.04         0.01        1.41         5.04         0.08        2.68
2        939.459        5.00         0.02        1.35         5.00         0.10        2.77
3        940.131        5.00         0.01        1.26         4.99         0.09        2.72
4        939.778        5.00         0.01        1.29         5.00         0.07        2.46
5        939.498        5.00         0.02        1.31         5.00         0.08        2.56

# Throughput statistics : Average 939.5784   Minimum 932.3497   Maximum 940.1307
[mk1 ~/hpcbench/tcp]$ cat output.c_log 
# mk1 syslog -- Tue Jul 13 22:05:30 2004
# Watch times: 7
# Network devices (interface): 3 ( loop eth0 eth1 )
# CPU number: 4

##### System info, statistics of network interface <loop> and its interrupts to each CPU #####
#       CPU(%)     Mem(%)  Interrupt  Page   Swap   Context           <loop> information
#   Load User  Sys  Usage   Overall  In/out In/out   Switch   RecvPkg    RecvByte   SentPkg    SentByte  Int-CPU0 Int-CPU1 Int-CPU2 Int-CPU3 
0      2    1    0     99       358       0      0      431         0           0         0           0         0        0        0        0
1     21    0   21     99    203056      32      0    34858        40        2699        40        2699         0        0        0        0
2     20    1   18     99    203742      28      0    38783        56        3531        56        3531         0        0        0        0
3     23    1   21     99    203038      32      0    40582        56        3531        56        3531         0        0        0        0
4     24    1   22     99    201045      28      0    42597        38        2595        38        2595         0        0        0        0
5     22    1   21     99    201381      28      0    42220        56        3531        56        3531         0        0        0        0
6      1    0    1     99       321       0      0      414         0           0         0           0         0        0        0        0

##### System info, statistics of network interface <eth0> and its interrupts to each CPU #####
#       CPU(%)     Mem(%)  Interrupt  Page   Swap   Context           <eth0> information
#   Load User  Sys  Usage   Overall  In/out In/out   Switch   RecvPkg    RecvByte   SentPkg    SentByte  Int-CPU0 Int-CPU1 Int-CPU2 Int-CPU3 
0      2    1    0     99       358       0      0      431        57        6120        60        8885       137        0        0        0
1     21    0   21     99    203056      32      0    34858    189870    13298831    379800   575704642    201957        0        0        0
2     20    1   18     99    203742      28      0    38783    196012    13727462    391821   593860215    202722        0        0        0
3     23    1   21     99    203038      32      0    40582    196000    13729962    391868   593827773    202036        0        0        0
4     24    1   22     99    201045      28      0    42597    235372    16481861    470419   713509848    200021        0        0        0
5     22    1   21     99    201381      28      0    42220    196032    13730331    391923   594124906    200319        0        0        0
6      1    0    1     99       321       0      0      414      7081      496603     14068    21326241       111        0        0        0

##### System info, statistics of network interface <eth1> and its interrupts to each CPU #####
#       CPU(%)     Mem(%)  Interrupt  Page   Swap   Context           <eth1> information
#   Load User  Sys  Usage   Overall  In/out In/out   Switch   RecvPkg    RecvByte   SentPkg    SentByte  Int-CPU0 Int-CPU1 Int-CPU2 Int-CPU3 
0      2    1    0     99       358       0      0      431        51        7965        50        4220       105        0        0        0
1     21    0   21     99    203056      32      0    34858       246       38150       242       20275       542        0        0        0
2     20    1   18     99    203742      28      0    38783       245       38287       232       19304       482        0        0        0
3     23    1   21     99    203038      32      0    40582       248       38758       238       19722       457        0        0        0
4     24    1   22     99    201045      28      0    42597       223       34312       213       17644       470        0        0        0
5     22    1   21     99    201381      28      0    42220       242       37351       232       19304       523        0        0        0
6      1    0    1     99       321       0      0      414        46        6908        43        3566        94        0        0        0

## CPU workload distribution: 
##
##         CPU0 workload (%)           Overall CPU workload (%)
#   < load   user  system   idle >  < load   user  system   idle >
0      1.0    0.0    1.0    99.0       3.0    2.0    1.0    97.0
1     70.2    0.2   70.0    29.8      21.5    0.2   21.3    78.5
2     70.6    0.2   70.4    29.4      20.1    1.7   18.4    79.9
3     72.0    0.2   71.8    28.0      23.2    2.0   21.2    76.8
4     67.8    4.6   63.2    32.2      24.2    1.8   22.4    75.8
5     73.2    3.8   69.4    26.8      22.9    1.1   21.8    77.1
6      0.0    0.0    0.0   100.0       1.5    0.2    1.2    98.5

##         CPU1 workload (%)           Overall CPU workload (%)
#   < load   user  system   idle >  < load   user  system   idle >
0      0.0    0.0    0.0   100.0       3.0    2.0    1.0    97.0
1      5.1    0.0    5.1    94.9      21.5    0.2   21.3    78.5
2      2.2    0.2    2.0    97.8      20.1    1.7   18.4    79.9
3     15.3    7.8    7.6    84.7      23.2    2.0   21.2    76.8
4     14.9    2.6   12.3    85.1      24.2    1.8   22.4    75.8
5      2.6    0.4    2.2    97.4      22.9    1.1   21.8    77.1
6      1.0    1.0    0.0    99.0       1.5    0.2    1.2    98.5

##         CPU2 workload (%)           Overall CPU workload (%)
#   < load   user  system   idle >  < load   user  system   idle >
0      2.0    0.0    2.0    98.0       3.0    2.0    1.0    97.0
1      6.9    0.2    6.7    93.1      21.5    0.2   21.3    78.5
2      7.6    6.4    1.2    92.4      20.1    1.7   18.4    79.9
3      5.4    0.0    5.4    94.6      23.2    2.0   21.2    76.8
4      1.2    0.0    1.2    98.8      24.2    1.8   22.4    75.8
5      0.8    0.0    0.8    99.2      22.9    1.1   21.8    77.1
6      2.0    0.0    2.0    98.0       1.5    0.2    1.2    98.5

##         CPU3 workload (%)           Overall CPU workload (%)
#   < load   user  system   idle >  < load   user  system   idle >
0      8.9    7.9    1.0    91.1       3.0    2.0    1.0    97.0
1      3.7    0.4    3.4    96.3      21.5    0.2   21.3    78.5
2      0.0    0.0    0.0   100.0      20.1    1.7   18.4    79.9
3      0.0    0.0    0.0   100.0      23.2    2.0   21.2    76.8
4     12.7    0.0   12.7    87.3      24.2    1.8   22.4    75.8
5     14.9    0.2   14.7    85.1      22.9    1.1   21.8    77.1
6      3.0    0.0    3.0    97.0       1.5    0.2    1.2    98.5
[mk1 ~/hpcbench/tcp]$ cat output.s_log 
# mk4 syslog -- Tue Jul 13 22:05:30 2004
# Watch times: 7
# Network devices (interface): 2 ( loop eth0 )
# CPU number: 4

##### System info, statistics of network interface <loop> and its interrupts to each CPU #####
#       CPU(%)     Mem(%)  Interrupt  Page   Swap   Context           <loop> information
#   Load User  Sys  Usage   Overall  In/out In/out   Switch   RecvPkg    RecvByte   SentPkg    SentByte  Int-CPU0 Int-CPU1 Int-CPU2 Int-CPU3 
0      0    0    0     10       187       0      0      118         0           0         0           0         0        0        0        0
1     25    0   25     10    405641      32      0   672780         0           0         0           0         0        0        0        0
2     28    0   28     10    405419      16      0   805291         0           0         0           0         0        0        0        0
3     28    0   28     10    405411      16      0   805986         0           0         0           0         0        0        0        0
4     26    0   26     10    405385      16      0   805753         0           0         0           0         0        0        0        0
5     27    0   27     10    405437      16      0   805660         0           0         0           0         0        0        0        0
6      0    0    0     10       192       0      0      130         0           0         0           0         0        0        0        0

##### System info, statistics of network interface <eth0> and its interrupts to each CPU #####
#       CPU(%)     Mem(%)  Interrupt  Page   Swap   Context           <eth0> information
#   Load User  Sys  Usage   Overall  In/out In/out   Switch   RecvPkg    RecvByte   SentPkg    SentByte  Int-CPU0 Int-CPU1 Int-CPU2 Int-CPU3 
0      0    0    0     10       187       0      0      118        21        2884        23        2586        48        0        0        0
1     25    0   25     10    405641      32      0   672780    407773   618402173    203812    14274437    404948        0        0        0
2     28    0   28     10    405419      16      0   805291    391589   593779142    195823    13712492    404752        0        0        0
3     28    0   28     10    405411      16      0   805986    391754   593997738    195905    13717329    404766        0        0        0
4     26    0   26     10    405385      16      0   805753    391537   593968445    195785    13710252    404713        0        0        0
5     27    0   27     10    405437      16      0   805660    391296   593460875    195675    13703206    404748        0        0        0
6      0    0    0     10       192       0      0      130     65930    99976153     32981     2309297        50        0        0        0

## CPU workload distribution: 
##
##         CPU0 workload (%)           Overall CPU workload (%)
#   < load   user  system   idle >  < load   user  system   idle >
0      0.0    0.0    0.0   100.0       0.0    0.0    0.0   100.0
1     68.8    0.8   68.0    31.2      25.4    0.4   25.0    74.6
2     58.8    0.0   58.8    41.2      29.0    0.5   28.5    71.0
3     60.0    0.0   60.0    40.0      29.0    0.4   28.5    71.0
4     57.0    0.0   57.0    43.0      26.9    0.3   26.5    73.1
5     57.8    0.0   57.8    42.2      27.6    0.4   27.2    72.4
6      0.0    0.0    0.0   100.0       0.0    0.0    0.0   100.0

##         CPU1 workload (%)           Overall CPU workload (%)
#   < load   user  system   idle >  < load   user  system   idle >
0      0.0    0.0    0.0   100.0       0.0    0.0    0.0   100.0
1     32.8    0.8   32.0    67.2      25.4    0.4   25.0    74.6
2     57.2    2.0   55.2    42.8      29.0    0.5   28.5    71.0
3     56.0    1.8   54.2    44.0      29.0    0.4   28.5    71.0
4     50.6    1.4   49.2    49.4      26.9    0.3   26.5    73.1
5     52.6    1.6   51.0    47.4      27.6    0.4   27.2    72.4
6      0.0    0.0    0.0   100.0       0.0    0.0    0.0   100.0

##         CPU2 workload (%)           Overall CPU workload (%)
#   < load   user  system   idle >  < load   user  system   idle >
0      0.0    0.0    0.0   100.0       0.0    0.0    0.0   100.0
1      0.0    0.0    0.0   100.0      25.4    0.4   25.0    74.6
2      0.0    0.0    0.0   100.0      29.0    0.5   28.5    71.0
3      0.0    0.0    0.0   100.0      29.0    0.4   28.5    71.0
4      0.0    0.0    0.0   100.0      26.9    0.3   26.5    73.1
5      0.0    0.0    0.0   100.0      27.6    0.4   27.2    72.4
6      0.0    0.0    0.0   100.0       0.0    0.0    0.0   100.0

##         CPU3 workload (%)           Overall CPU workload (%)
#   < load   user  system   idle >  < load   user  system   idle >
0      0.0    0.0    0.0   100.0       0.0    0.0    0.0   100.0
1      0.0    0.0    0.0   100.0      25.4    0.4   25.0    74.6
2      0.0    0.0    0.0   100.0      29.0    0.5   28.5    71.0
3      0.0    0.0    0.0   100.0      29.0    0.4   28.5    71.0
4      0.0    0.0    0.0   100.0      26.9    0.3   26.5    73.1
5      0.0    0.0    0.0   100.0      27.6    0.4   27.2    72.4
6      0.0    0.0    0.0   100.0       0.0    0.0    0.0   100.0
[mk1 ~/hpcbench/tcp]$ 

We find that the client machine uses about 20% of overall CPU, mostly CPU0 system time, while the server uses about 26% and distributes the workload across CPU0 and CPU1. We also observe that the client has one more network interface than the server, which carries only a little traffic.


[ Test TCP socket options ]    [ TOP ]

Several TCP socket options can be set. In the following example, we set the TCP socket buffer size to 500 KBytes (-b 500k), turn on the TCP_NODELAY option (-N), set the MSS to 8 KBytes (-M 8k), and set the packets' TOS bits to Maximize-Throughput mode (-q 2):

[mk1 ~/hpcbench/tcp]$ tcptest -h mk4 -b 500k -N -q 2 -M 8k -r 6 -o output.txt
 (1) : 935.895594 Mbps
 (2) : 937.464018 Mbps
 (3) : 937.376642 Mbps
 (4) : 933.568767 Mbps
 (5) : 934.880739 Mbps
 (6) : 938.093950 Mbps
Test done!
Test-result: "output.txt" 
[mk1 ~/hpcbench/tcp]$ cat output.txt
# TCP communication test -- Tue Jul 13 19:50:21 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: stream(unidirectional) throughput test
# Socket Recv-buffer (Bytes) -- client: 262142  server: 262142
# Socket Send-buffer (Bytes) -- client: 262142  server: 262142
# Socket blocking option -- client: ON  server: ON
# TCP_NODELAY option -- client: ON  server: ON
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 1448  server: 1448
# IP TOS type -- client: IPTOS_Maximize_Throughput server: IPTOS_Maximize_Throughput
# Data size of each read/write (Bytes): 8192
# Total data size sent of each test (Bytes): 585826304
# Message size (Bytes): 65536
# Iteration: 8939
# Test Repetition: 6

#        Network      Client     C-process   C-process      Server     S-process   S-process
#      Throughput  Elapsed-time  User-mode  System-mode  Elapsed-time  User-mode  System-mode
#         (Mbps)    (Seconds)    (Seconds)   (Seconds)    (Seconds)    (Seconds)   (Seconds)
1        935.896        5.01         0.02        2.85         5.01         0.11        2.18
2        937.464        5.00         0.01        3.11         5.00         0.09        2.16
3        937.377        5.00         0.02        2.91         5.00         0.05        2.16
4        933.569        5.02         0.02        2.44         5.02         0.07        2.17
5        934.881        5.01         0.02        2.88         5.01         0.05        2.04
6        938.094        5.00         0.02        2.92         5.00         0.09        2.18

# Throughput statistics : Average 936.4042   Minimum 933.5688   Maximum 938.0940
[mk1 ~/hpcbench/tcp]$ 

We are testing an otherwise idle cluster, and the throughput with these settings is not much different from the default. Notice that the socket buffer size is not what we requested: about 256 KBytes is the maximum the kernel grants here. The MSS setting is also ignored because it is larger than the system MTU.
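
These options map onto standard setsockopt() calls, and the kernel is free to adjust the values it actually grants; reading them back with getsockopt() shows the effective settings, which is how the 500 KByte request ends up as roughly 256 KBytes above. A minimal sketch of setting and verifying the options (illustrative only, not hpcbench's code; the values mirror the command-line flags used above):

/* Set the TCP socket options used above and read back what the kernel granted
   (illustrative sketch). */
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static void tune_socket(int sock)
{
    int buf = 500 * 1024, nodelay = 1, mss = 8 * 1024, tos = IPTOS_THROUGHPUT;
    socklen_t len = sizeof(buf);

    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &buf, sizeof(buf));              /* -b 500k */
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &buf, sizeof(buf));
    setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));   /* -N */
    setsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));            /* -M 8k; no effect if larger than the MTU allows */
    setsockopt(sock, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));                 /* -q 2: Maximize-Throughput TOS */

    /* The kernel may clamp or round the buffer size; check the value actually in effect. */
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &buf, &len);
    printf("effective send buffer: %d Bytes\n", buf);
}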

Let's try a small MSS and a small socket buffer:

[mk1 ~/hpcbench/tcp]$ tcptest -h mk4 -b 10k -M 500 -r 6 -o output.txt
 (1) : 244.712325 Mbps
 (2) : 243.906450 Mbps
 (3) : 244.264968 Mbps
 (4) : 244.139996 Mbps
 (5) : 243.599390 Mbps
 (6) : 242.246976 Mbps
Test done!
Test-result: "output.txt" 
[mk1 ~/hpcbench/tcp]$ cat output.txt
# TCP communication test -- Tue Jul 13 20:06:57 2004
# Hosts: mk1 (client) <----> mk4 (server)
# TCP test mode: stream(unidirectional) throughput test
# Socket Recv-buffer (Bytes) -- client: 10240  server: 10240
# Socket Send-buffer (Bytes) -- client: 10240  server: 10240
# Socket blocking option -- client: ON  server: ON
# TCP_NODELAY option -- client: OFF  server: OFF
# TCP_CORK option -- client: OFF  server: OFF
# TCP Maximum-segment-size(MSS) (Bytes) -- client: 500  server: 500
# IP TOS type -- client: Default server: Default
# Data size of each read/write (Bytes): 8192
# Total data size sent of each test (Bytes): 151388160
# Message size (Bytes): 65536
# Iteration: 2310
# Test Repetition: 6

#        Network      Client     C-process   C-process      Server     S-process   S-process
#      Throughput  Elapsed-time  User-mode  System-mode  Elapsed-time  User-mode  System-mode
#         (Mbps)    (Seconds)    (Seconds)   (Seconds)    (Seconds)    (Seconds)   (Seconds)
1        244.712        4.95         0.00        0.35         4.95         0.03        1.74
2        243.906        4.97         0.01        0.38         4.97         0.05        1.66
3        244.265        4.96         0.02        0.31         4.96         0.03        1.75
4        244.140        4.96         0.00        0.31         4.96         0.02        1.55
5        243.599        4.97         0.00        0.29         4.97         0.04        1.41
6        242.247        5.00         0.01        0.29         5.00         0.02        1.71

# Throughput statistics : Average 243.9777   Minimum 242.2470   Maximum 244.7123
[mk1 ~/hpcbench/tcp]$ 

 


[ Latency (Roundtrip time) test ]    [ TOP ]

This test is like a TCP version of "ping". We test the roundtrip time with the default message size (64 Bytes) and with 1 KByte messages:

[mk1 ~/hpcbench/tcp]$ tcptest -ah mk4
TCP Round Trip Time (1) : 61.551 usec
TCP Round Trip Time (2) : 60.696 usec
TCP Round Trip Time (3) : 60.233 usec
TCP Round Trip Time (4) : 60.439 usec
TCP Round Trip Time (5) : 60.254 usec
TCP Round Trip Time (6) : 60.371 usec
TCP Round Trip Time (7) : 60.774 usec
TCP Round Trip Time (8) : 60.414 usec
TCP Round Trip Time (9) : 60.583 usec
TCP Round Trip Time (10) : 60.359 usec
10 trials with message size 64 Bytes.
TCP RTT min/avg/max = 60.233/60.568/61.551 usec
[mk1 ~/hpcbench/tcp]$ tcptest -h mk4 -A 1k -r 5 -o output.txt 
TCP Round Trip Time (1) : 112.682 usec
TCP Round Trip Time (2) : 112.682 usec
TCP Round Trip Time (3) : 112.507 usec
TCP Round Trip Time (4) : 110.707 usec
TCP Round Trip Time (5) : 112.020 usec
5 trials with message size 1024 Bytes.
TCP RTT min/avg/max = 110.707/112.119/112.682 usec
[mk1 ~/hpcbench/tcp]$ cat output.txt
# TCP roundtrip time test Tue Jul 13 20:11:43 2004
# mk1 <--> mk4
# TCP-send-buffer: 16384 TCP-recv-buffer: 87380
# Message-size: 1024 Iteration: 1024
TCP Round Trip Time (1) : 112.682 usec
TCP Round Trip Time (2) : 112.682 usec
TCP Round Trip Time (3) : 112.507 usec
TCP Round Trip Time (4) : 110.707 usec
TCP Round Trip Time (5) : 112.020 usec
5 trials with message size 1024 Bytes.
TCP RTT min/avg/max = 110.707/112.119/112.682 usec
[mk1 ~/hpcbench/tcp]$ 
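
Each reported RTT is essentially the wall-clock time for a message to travel client -> server -> client (the "Iteration: 1024" line in the output suggests each value averages many such exchanges). A minimal sketch of timing a single round trip (illustrative only, not hpcbench's exact code):

/* Time one TCP round trip with gettimeofday() (illustrative sketch). */
#include <unistd.h>
#include <sys/time.h>

static double rtt_usec(int sock, char *msg, size_t size)    /* sock: connected TCP socket */
{
    struct timeval t1, t2;
    gettimeofday(&t1, NULL);
    write(sock, msg, size);                   /* send the message ...                         */
    size_t got = 0;
    while (got < size) {                      /* ... and wait until it is echoed back in full */
        ssize_t n = read(sock, msg + got, size - got);
        if (n <= 0)
            return -1.0;
        got += n;
    }
    gettimeofday(&t2, NULL);
    return (t2.tv_sec - t1.tv_sec) * 1e6 + (t2.tv_usec - t1.tv_usec);
}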

 


[ Plot data ]    [ TOP ]

If the write option (-o) and the plot option (-P) are both given, a plotting configuration file ("output.plot") is also created. Use gnuplot to plot the data or to create PostScript files of the plots.

 


Last updated: Sept. 2004 by Ben Huang