HOW ADDING PROCESSORS AND ENABLING HYPER-THREADING
AFFECT USER CAPACITY OF METAFRAME XP SERVERS
By John D’Agati and Gagan Singh
Citrix Systems Inc.
Notice
The information in this publication is subject to change without notice.
THIS PUBLICATION IS PROVIDED “AS IS” WITHOUT WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE OR NON-INFRINGEMENT. CITRIX SYSTEMS, INC. (“CITRIX”), SHALL NOT BE LIABLE FOR
TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR DIRECT,
INCIDENTAL, CONSEQUENTIAL, OR ANY OTHER DAMAGES RESULTING FROM THE FURNISHING,
PERFORMANCE, OR USE OF THIS PUBLICATION, EVEN IF CITRIX HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES IN ADVANCE.
This publication contains information protected by copyright. Except for internal distribution, no part
of this publication may be photocopied or reproduced in any form without prior written consent from
Citrix.
The exclusive warranty for any Citrix products discussed in this publication, if any, is stated in the
product documentation accompanying such products. Citrix does not warrant products other than its
own.
Product names mentioned herein may be trademarks and/or registered trademarks of their respective
companies.
© 2003 Citrix Systems, Inc.
All rights reserved. Printed in the U.S.A.
Version History
January 2003    John D’Agati    Revisions
March 2003      John D’Agati    Revisions
Overview
Citrix ICAMark
Number of CPUs’ Effect on User Capacity
Performance Monitor Statistics for Single processor system
Performance Monitor Statistics for Dual processor system
Performance Monitor Statistics for Quad processor system
Hyper-Threading’s Effect on User Capacity
Dell 2650 Results
Performance Monitor Statistics for Dell 2650 Hyper-Threading Enabled
Performance Monitor Statistics for Dell 2650 Hyper-Threading Disabled
Dell 6650 Results
Performance Monitor Statistics for Dell 6650 Hyper-Threading Enabled
Performance Monitor Statistics for Dell 6650 Hyper-Threading Disabled
Summary
Overview
The number of users that a server can support depends on several factors,
including:
• The MetaFrame server’s hardware specifications
• The applications that are being run (because of the applications’ CPU and
  memory requirements)
• The amount of user input being processed by the applications
• The maximum desired resource usage on the server, for example, 90% CPU
  usage or 80% memory usage
This section discusses the increase in user capacity when more CPUs are added,
as well as the effect of Hyper-Threading in the processor. First, the Citrix
benchmarking test for user capacity, known as ICAMark, is described.
Citrix ICAMark
Citrix ICAMark is an internal tool, based on the Citrix Server Test Kit (CSTK),
that Citrix Engineering uses for benchmarking. It quantifies the optimal number
of simulated client sessions that can be connected to a MetaFrame server with
acceptable performance. Increasing the number of concurrent simulated users
beyond this optimum degrades performance and may affect the end-user
experience.
The test simulates users constantly typing and performing actions in Microsoft
Excel 97, Microsoft Access 97, and Microsoft PowerPoint 97. Other applications
can utilize more or less memory and CPU than Microsoft Office 97 and therefore
could produce different results. Note that the simulated users in this test
type continuously into these applications at 40 words per minute, so they may
be considered more “rigorous” than normal users.
In this test, the step size is 5 users: each iteration adds 5 sessions. After
the first 5 users are logged in, ICAMark launches simulated user scripts on all
5 sessions. Each script opens Microsoft Excel and simulates the creation of a
spreadsheet, including calculations and charts. Once the Excel phase is
complete, Excel is closed and Microsoft Access is opened. The script then
simulates the creation of an Access database, including a table, query, and
form, with data manipulation. Once the Access phase is complete, the script
creates a six-slide Microsoft PowerPoint presentation, including spell
checking, font changes, and slide copies and deletions. When a script finishes,
it remains idle until the scripts on all sessions are complete. The next
iteration then begins: 5 more sessions are added and the process starts again.
Based on how long the scripts take to complete, an ICAMark score is calculated.
For this test, a score of 80 has been determined as the optimal load for a server.
The ICAMark score is calculated by comparing a calibration value of the script
time with the time measured during the iteration. The calibration value was
determined by running each script locally on a calibration machine and
recording the data. This machine is considered to perform at the level expected
of a stand-alone workstation.
An ICAMark score of 80 means that the server has enough additional CPU and
memory resources to handle spikes in resource usage. Once the test iteration
score drops below 80, each additional user added to the server consumes more
resources, producing lower test scores and slower performance.
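The stepped procedure and the score-of-80 threshold described above can be sketched as follows. This is only an illustration: the paper does not say how the reported capacities were derived from the per-iteration scores, so the linear interpolation between steps is an assumption, the function names are hypothetical, and the scores in the example are made up rather than measured.

```python
def session_steps(step=5, max_users=200):
    """Session counts reached at each iteration: 5, 10, 15, ..."""
    return list(range(step, max_users + 1, step))


def capacity_at_threshold(scores, threshold=80.0):
    """Estimate the user count at which the score falls to the threshold.

    scores: list of (users, icamark_score) pairs in increasing user order.
    Linear interpolation between two steps is an assumption of this sketch,
    not a documented part of ICAMark. Returns None if the score never
    drops below the threshold.
    """
    for (u0, s0), (u1, s1) in zip(scores, scores[1:]):
        if s0 >= threshold > s1:
            return u0 + (s0 - threshold) * (u1 - u0) / (s0 - s1)
    return None


# Made-up scores for illustration only (not data from this paper).
example_scores = [(5, 100.0), (10, 90.0), (15, 70.0)]
estimate = capacity_at_threshold(example_scores)
print(session_steps(max_users=20))  # [5, 10, 15, 20]
print(estimate)                     # 12.5
```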
Number of CPUs’ Effect on User Capacity
The benchmark test was performed with the following:
Server:
Dell PowerEdge 6650
Quad Processor - 1.6GHz Xeon with 256 KB L2 and 1 MB L3 Cache
400MHz Front Side Bus
Hyper-Threading is enabled
35 GB HDD with Dell PERC 3/DC Raid Controller
3.5 GB RAM
4 GB Page File
Citrix MetaFrame XP Feature Release 2/Service Pack 2
Microsoft Windows 2000 Advanced Server with Service Pack 2
Microsoft Office 97
Clients:
Dual Pentium III 667MHz with 256 KB Cache
256 MB RAM
9 GB HDD with Adaptec SCSI Controller
Citrix ICA Program Neighborhood Client version 6.30.1050
Microsoft Windows 2000 Service Pack 2
Tests were performed by keeping the hardware static and disabling processors
on the server.
Results were collected on the following configurations:
• Dell 6650 with 1 processor enabled
• Dell 6650 with 2 processors enabled
• Dell 6650 with 4 processors enabled
The following results were collected:

# of CPUs   # of Simulated Users   % Performance Increase   # of Users per Processor
1           70 ± 1                 N/A                      70 ± 1
2           126 ± 1                80%                      63 ± 1
4           160 ± 1                27%                      40 ± 1
[Graph: User Capacity Benchmark. ICAMark Score plotted against # of Simulated
Users for the 1 Processor, 2 Processors, and 4 Processors configurations.]
The results show that the Dell PowerEdge 6650 performs equivalently in three
configurations: 4 processors enabled with 160 concurrent simulated users, 2
processors enabled with 126 concurrent simulated users, and 1 processor enabled
with 70 concurrent simulated users.
In other words, the user experience on each system with the number of simulated
users shown above is equivalent. As CPUs are added to the server, the increase
in performance is not linear, so the number of users supported per processor
falls as processors are added to the system.
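The percentage increases and users-per-processor figures follow directly from the measured capacities. A minimal sketch of that arithmetic, using the capacities reported above (the function and variable names are illustrative only):

```python
# Measured capacity (simulated users at optimal load) per processor count.
capacity = {1: 70, 2: 126, 4: 160}


def scaling_table(capacity):
    """Return rows of (cpus, users, % increase over previous row, users per CPU)."""
    rows = []
    previous_users = None
    for cpus in sorted(capacity):
        users = capacity[cpus]
        if previous_users is None:
            gain = None  # no baseline for the first configuration
        else:
            gain = round(100 * (users - previous_users) / previous_users)
        rows.append((cpus, users, gain, users / cpus))
        previous_users = users
    return rows


rows = scaling_table(capacity)
for cpus, users, gain, per_cpu in rows:
    print(cpus, users, gain, per_cpu)
# 1 70 None 70.0
# 2 126 80 63.0
# 4 160 27 40.0
```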
At optimal load, the following counters were noted:

# of Processors                          1           2           4
% Processor Utilization                  65%         65%         60%
Average Processor Queue Length (PQL)     70          120         55
Available Memory in KBytes               2,332,816   1,531,280   1,174,824
Current Disk Queue Length (DQL)          1           3           1
When the benchmark test ended, the following PTE information was noted:

# of Processors     1        2        4
# of Users at End   100      150      200
Available PTEs      87,554   55,544   23,941
On the single and dual processor systems, the bottleneck is the processor. This
is evident from the sustained 65% processor utilization and the high processor
queue lengths.
On the quad processor system, the bottleneck is believed to be the system bus.
The graphs for the quad system show a processor utilization of 60% and a
processor queue length of 55 at the failure point, both lower than on the
single and dual systems. As more processors are added to the system, they
contend for the bandwidth of the system bus, resulting in the non-linear
scalability shown here. A bottleneck in the system bus would cause processors
and threads to contend for access to memory, slowing script execution.
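One way to compare the queue lengths across configurations is to divide the average processor queue length by the number of physical processors. The sketch below does this for the counters noted at optimal load. The per-processor normalization is an illustrative choice made here, not a calculation taken from the test procedure.

```python
# Counters at optimal load:
# physical processors -> (% processor utilization, average processor queue length)
counters = {1: (65, 70), 2: (65, 120), 4: (60, 55)}


def queue_per_processor(counters):
    """Average processor queue length divided by the physical processor count."""
    return {cpus: pql / cpus for cpus, (_util, pql) in counters.items()}


per_cpu = queue_per_processor(counters)
for cpus in sorted(per_cpu):
    print(f"{cpus} processor(s): {per_cpu[cpus]:.2f} queued threads per processor")
```

On these numbers the quad system has far fewer waiting threads per processor than the single and dual systems, which is consistent with the bottleneck lying somewhere other than the processors.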
Performance Monitor Statistics for Single processor system:
[Graphs: Single Processor System. Processor Queue Length, Processor
Utilization, Memory Available KBytes, and Current Disk Queue Length, each
plotted against # of Simulated Users.]
Performance Monitor Statistics for Dual processor system:
[Graphs: Dual Processor System. Processor Queue Length, % Processor Time,
Memory Available KBytes, and Current Disk Queue Length, each plotted against
# of Simulated Users.]
Performance Monitor Statistics for Quad processor system:
[Graphs: Quad Processor System. Processor Queue Length, % Processor Time,
Memory Available KBytes, and Current Disk Queue Length, each plotted against
# of Simulated Users.]
Hyper-Threading’s Effect on User Capacity
Hyper-Threading is a technology developed by Intel that enables a single
physical processor to appear as two logical processors. This technology was
introduced in the Pentium 4 line of processors. Hyper-Threading allows
multi-threaded programs to take advantage of extra execution units on the
processor, resulting in as much as a 30% performance increase for some
applications. Note that the benefit of Hyper-Threading is seen only with
multithreaded applications. MetaFrame and the applications it hosts can also
benefit from Hyper-Threading. The increase in performance from Hyper-Threading
is highly dependent on the type of application running on the server.
The benchmark test was performed on a dual processor Dell 2650 and a quad
processor Dell 6650, both with Hyper-Threading capable Xeon processors.
Dell 2650 Results
This benchmark was performed on the following:
Server Hardware Configurations
Dell PowerEdge 2650
Dual Processor - 2.2GHz Xeon with 512KB L2 Cache
400MHz Front Side Bus
16 GB HDD with Dell PERC 3/Di Raid Controller
4 GB RAM
4 GB Page File
Citrix MetaFrame XP Feature Release 2/Service Pack 2
Microsoft Windows 2000 Advanced Server with Service Pack 2
Microsoft Office 97
Clients:
Dual Pentium III 667MHz with 256 KB Cache
256 MB RAM
9 GB HDD with Adaptec SCSI Controller
Citrix ICA Program Neighborhood Client version 6.30.1050
Microsoft Windows 2000 Service Pack 2
Hyper-Threading   # of Simulated Users   % Performance Increase
Off               116 ± 1                N/A
On                131 ± 1                11.5%
[Graph: User Capacity Benchmark. ICAMark Score plotted against # of Simulated
Users with Hyper-Threading enabled and with Hyper-Threading disabled.]
The results show that the Dell PowerEdge 2650 with Hyper-Threading enabled,
servicing 131 concurrent simulated users, performs equivalently to the same
server with Hyper-Threading disabled servicing 116 concurrent simulated users.
The lower Processor Queue Length on the system with Hyper-Threading enabled
allows faster execution of the scripts and accounts for the 11.5% increase in
capacity.
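The capacity gain can be checked from the two measured user counts. The sketch below computes the 15-user difference against both possible baselines, since the result depends on which one is used: relative to the Hyper-Threading-disabled count of 116 the gain is about 12.9%, while the quoted 11.5% appears to correspond to dividing by the Hyper-Threading-enabled count of 131. Which baseline was actually used is not stated, so this is an observation about the arithmetic only.

```python
users_off = 116  # Dell 2650, Hyper-Threading disabled
users_on = 131   # Dell 2650, Hyper-Threading enabled

extra_users = users_on - users_off

# Gain relative to the Hyper-Threading-disabled baseline.
gain_vs_off = round(100 * extra_users / users_off, 1)

# The same difference expressed as a share of the Hyper-Threading-enabled count.
gain_vs_on = round(100 * extra_users / users_on, 1)

print(extra_users, gain_vs_off, gain_vs_on)  # 15 12.9 11.5
```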
At optimal load, the following counters were noted:

Hyper-Threading   % Processor Utilization   Average PQL   Available Memory in KBytes   Current DQL
On                70%                       70            1,876,108                    3
Off               65%                       115           1,680,124                    1
When the benchmark test ended, the following PTE information was noted:

Hyper-Threading   # of Users at End   Available PTEs
On                150                 52,201
Off               150                 52,608
Performance Monitor Statistics for Dell 2650 Hyper-Threading Enabled:
[Graphs: Dell 2650 Hyper-Threading On. Processor Queue Length, % Processor
Time, Available Memory KBytes, and Current Disk Queue Length, each plotted
against # of Simulated Users.]
Performance Monitor Statistics for Dell 2650 Hyper-Threading Disabled:
[Graphs: Dell 2650 Hyper-Threading Off. Processor Queue Length, % Processor
Time, Memory Available KBytes, and Current Disk Queue Length, each plotted
against # of Simulated Users.]
Dell 6650 Results
This benchmark was performed on the following:
Server Hardware Configurations
Dell PowerEdge 6650
Quad Processor - 1.6GHz Xeon with 256 KB L2 and 1 MB L3 Cache
400MHz Front Side Bus
35 GB HDD with Dell PERC 3/DC Raid Controller
3.5 GB RAM
4 GB Page File
Citrix MetaFrame XP Feature Release 2/Service Pack 2
Microsoft Windows 2000 Advanced Server with Service Pack 2
Microsoft Office 97
Clients:
Dual Pentium III 667MHz with 256 KB Cache
256 MB RAM
9 GB HDD with Adaptec SCSI Controller
Citrix ICA Program Neighborhood Client version 6.30.1050
Microsoft Windows 2000 Service Pack 2
Hyper-Threading   # of Simulated Users   % Performance Increase
Off               158 ± 1                N/A
On                160 ± 1                1%
[Graph: User Capacity Benchmark - Dell 6650. ICAMark Score plotted against
# of Simulated Users with Hyper-Threading enabled and with Hyper-Threading
disabled.]
The results show that the Dell PowerEdge 6650 with Hyper-Threading enabled,
servicing 160 concurrent simulated users, performs equivalently to the same
server with Hyper-Threading disabled servicing 158 concurrent simulated users.
At optimal load, the following counters were noted:

Hyper-Threading   % Processor Utilization   Average PQL   Available Memory in KBytes   Current DQL
On                60%                       55            1,174,824                    1
Off               65%                       115           1,122,072                    3
When the benchmark test ended, the following PTE information was noted:

Hyper-Threading   # of Users at End   Available PTEs
On                200                 23,941
Off               180                 37,168
With Hyper-Threading enabled on a quad processor system, there are eight
logical CPUs. In this instance, the benefit of Hyper-Threading is no longer
evident. As the processing capacity of the server increases, the bottleneck
shifts from the processor’s execution resources to the system bus speed,
resulting in only a 1% increase in performance with Hyper-Threading enabled.
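The contrast between the two servers can be made concrete by counting logical processors and the users gained with Hyper-Threading on each system. A minimal sketch using the measured capacities from both tests follows. The users-per-logical-processor figure is an illustrative metric added here, not one reported by the benchmark.

```python
# system -> (physical processors, users with Hyper-Threading off, users with it on)
systems = {"Dell 2650": (2, 116, 131), "Dell 6650": (4, 158, 160)}


def summarize(physical, users_off, users_on):
    """Summarize the effect of enabling Hyper-Threading on one system."""
    logical = physical * 2  # each physical processor appears as two logical ones
    return {
        "logical_cpus": logical,
        "extra_users": users_on - users_off,
        "users_per_logical_cpu": users_on / logical,
    }


summary = {name: summarize(*values) for name, values in systems.items()}
for name, result in summary.items():
    print(name, result)
```

The dual processor system gains 15 users across 4 logical processors, while the quad processor system gains only 2 users across 8.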
Performance Monitor Statistics for Dell 6650 Hyper-Threading Enabled:
[Graphs: Dell 6650 Hyper-Threading On. Processor Queue Length, % Processor
Time, Memory Available KBytes, and Current Disk Queue Length, each plotted
against # of Simulated Users.]
Performance Monitor Statistics for Dell 6650 Hyper-Threading Disabled:
[Graphs: Dell 6650 Hyper-Threading Off. Processor Queue Length, % Processor
Time, Memory Available KBytes, and Current Disk Queue Length, each plotted
against # of Simulated Users.]
Summary
In conclusion, user capacity does not scale linearly as processors are added to the
system. Moving from a single to a dual processor system increases performance by 80%.
Moving from a dual to a quad processor system increases performance by a further 27%.
The increase in performance from Hyper-Threading is highly dependent on the type of
applications running on the server. The benefit of Hyper-Threading is clearest on the
dual processor system, where the processor is the bottleneck: enabling it provides an
additional 11.5% increase in capacity. Note that when sizing MetaFrame XP servers, the
number of actual users per server varies with the type of applications deployed.