April Release (RR7): Performance Review

• High-level PPE system diagram
• Definitions
• Current PPE performance summary
• <app1> performance
• <app2> performance
• <app3> performance
• <app4> performance
• May RR expectation list
• Links to test detail
High-level PPE system diagram
Definitions
Performance is the property of a software system that indicates its ability to be as powerful, fast, stable, and scalable as required.
Ramp-up test (also known as capacity test):
Virtual users are steadily added until performance saturation occurs, i.e., until adding more virtual users only increases response time (or causes system failure) instead of increasing the number of transactions processed per second.
The goal of the test is to find the point of system saturation and describe it: transactions/sec, network traffic/sec, etc.
The ramp-up test reveals how powerful the system is.
Short low-, mid-, and high-load tests:
Each test runs with a fixed number of virtual users for a specific period of time to measure response times under different load conditions determined from the ramp-up test.
These tests depend on the ramp-up test results because “low”, “mid”, and “high” load levels are relative to the saturation level. High load is usually deemed to be 80% of the saturation load.
These tests show how fast the system is.
Longevity test (also known as soak test or stability test):
A very long high-load test that checks system stability.
The longevity test shows how stable the system is.
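One way to check a longevity run for a memory leak is to fit a trend line to memory samples collected during the run. The sketch below is an assumption, not the method used in these tests: it computes an ordinary least-squares slope in MB per hour, and the 10 MB/hour threshold is hypothetical.

```python
def memory_slope(samples):
    """Least-squares slope of memory usage over time.

    samples: list of (elapsed_hours, memory_mb) pairs from a longevity test.
    Returns the trend in MB per hour.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


def looks_like_leak(samples, max_mb_per_hour=10.0):
    """Flag a steady upward memory trend (hypothetical threshold)."""
    return memory_slope(samples) > max_mb_per_hour


steady_growth = [(0, 1000), (1, 1050), (2, 1100), (3, 1150)]
flat = [(0, 1000), (1, 1002), (2, 999), (3, 1001)]
print(memory_slope(steady_growth), looks_like_leak(steady_growth))  # 50.0 True
print(looks_like_leak(flat))  # False
```

A positive slope alone does not prove a leak (caches and pools warm up early in a run), so a check like this only flags runs for closer inspection.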
Rush-hour test:
A rapid increase in the number of users logging on to the system (imitating the start of a business day), followed by rapid user logout (imitating the end of business hours), on top of a low-load background. If the project schedule is tight, this test can be replaced with a login storm test.
This test shows how stable the system is.
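The rush-hour load shape can be expressed as the number of active virtual users at each minute of the test. This is a sketch under assumptions: the background level, peak size, ramp durations, and the hold period between logon and logout are hypothetical parameters, not values from these tests.

```python
def rush_hour_users(minute, background=20, peak=200,
                    start=10, ramp_up=5, hold=30, ramp_down=5):
    """Virtual users active at a given minute of a rush-hour test."""
    up_end = start + ramp_up
    hold_end = up_end + hold
    down_end = hold_end + ramp_down
    if start <= minute < up_end:          # rapid logon: start of business day
        rush = peak * (minute - start) / ramp_up
    elif up_end <= minute < hold_end:     # rush users stay logged on
        rush = peak
    elif hold_end <= minute < down_end:   # rapid logout: end of business hours
        rush = peak * (down_end - minute) / ramp_down
    else:
        rush = 0
    return background + round(rush)      # low-load background is always present


profile = [rush_hour_users(m) for m in (0, 12, 20, 48, 60)]
print(profile)  # [20, 100, 220, 100, 20]
```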
Scalability tests:
A series of ramp-up tests conducted against various system configurations (different numbers of application servers, CPU cores, etc.).
Scalability tests show how scalable the system is, i.e., how its capacity changes as more powerful resources are provided.
See details here: <link>
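Ramp-up results from several configurations can be reduced to one scaling-efficiency figure per configuration. The sketch below assumes efficiency is measured against linear scaling from the smallest configuration tested; that metric, the function name, and the sample capacities are illustrative assumptions, not the definition used in this review.

```python
def scaling_efficiency(capacity_by_servers):
    """Efficiency of each configuration relative to linear scaling.

    capacity_by_servers maps a server count to the saturation throughput
    (transactions/sec) found by a ramp-up test on that configuration.
    1.0 means capacity grew exactly in proportion to the servers added.
    """
    base = min(capacity_by_servers)
    per_server = capacity_by_servers[base] / base
    return {n: cap / (n * per_server)
            for n, cap in sorted(capacity_by_servers.items())}


# Hypothetical ramp-up results for 1, 2, and 4 application servers.
print(scaling_efficiency({1: 10.0, 2: 19.0, 4: 30.0}))
# {1: 1.0, 2: 0.95, 4: 0.75}
```

Values well below 1.0, as in the 4-server case here, are the kind of result a review would describe as poor scalability.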
April Release Summary
Overall: Amber
<app1>: Amber
<app2>: Amber
<app3>: Amber
<app4>: Amber
Positive factors:
• Increased <..> capacity in comparison to the previous cycle (from 16 to 21 transactions/sec)
• Good <..> scalability
• Average of average <..> response time decreased from 1.4 to 1.1 seconds
• <..> capacity per server is 3.3 times better than the requirement
• Stable <..> system behavior during rush-hour tests
• <..> system was stable under high load; the memory leak was not reproduced in the long high-load test
• <..> system capacity increased compared to March RR (it is now back to the February RR level); the issue with the <..>-side Search cache was not reproduced
• Overall <..> "Meet KPQP" share increased (<..> and News are the main contributors)
• Average of average <..> response time decreased from 0.6 to 0.4 seconds
• Slightly better overall <..> performance compared to the previous <..> tests in PPE (higher capacity)
• Average <..> response time for all VSM URLs is 11% lower than in the previous test
• The <..> Funds unavailability issue was not reproduced, which led to 100% KPQP pass status for the "<..> - Funds" domain
• Overall <..> KPQP pass status increased: Search, <..>, and Funds views and sub-components are the main contributors
• Good <..> scalability
RAG status demotion factors:
• "Authorization failed" errors can cause problems with <..> user logon during rush hours
• "<..> - FI (Debt)" domain <..> showed 15% KPQP fail status (vs. 0% KPQP failed in March RR)
• Average response time within the "<..> - FI (Debt)" domain in <..> increased from 1.9 to 3.3 seconds
• <..> system capacity is ~20% worse than in March RR
• <..> system scalability is poor
• <..> total open defects and issues increased 2.6 times (from 8 to 21)
• <..> memory leak has not been fixed yet
• Overall <..> "Meet KPQP" share decreased significantly (from 83% to 57%); every request group followed the overall trend
• Average of average <..> response time increased from 1.22 to 2.40 seconds
• Disproportionate <..> memory consumption growth leads to system instability when load is added beyond the saturation point
• 40% of all <..> KPQP transactions do not meet the target under medium load
• An impact of network traffic volume on <..> performance (caused by an environmental issue) was detected
• Number of open <..> defects and issues increased from 8 to 11
• <..> March RR was tested in the CP April RR environment
• Instability of critical <..> back-ends: <..> WS and DocumentStore_1
• Instability of <..> back-ends: DidYouKnowService, mstService, NewsSvc_1, ItemService_1, etc.
<..> General Status
RAG status: Amber
Positive factors:
• Increased capacity in comparison to the previous cycle (from 16 to 21 transactions/sec)
• Good scalability
RAG status demotion factors:
• ~2% of transactions do not meet the KPQP target under low load
• "Authorization failed" errors can cause problems with user logon during rush hours
April KPQP Status (chart slices: 94%, 4%, 2%)
Meet KPQP (average <= 3 sec)
Miss KPQP (3 sec < average <= 5 sec)
Fail KPQP (average > 5 sec)
Test Fail
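The legend above assigns a KPQP status from each transaction's average response time using fixed 3-second and 5-second limits. A minimal sketch of that classification and of the resulting status shares follows; representing a failed test run as a missing average (`None`) is an assumption, as are the transaction names and values.

```python
from collections import Counter

STATUSES = ("Meet", "Miss", "Fail", "Test Fail")


def kpqp_status(avg_seconds):
    """Classify one transaction's average response time (3 s / 5 s limits)."""
    if avg_seconds is None:      # assumption: no result means the test failed
        return "Test Fail"
    if avg_seconds <= 3:
        return "Meet"
    if avg_seconds <= 5:
        return "Miss"
    return "Fail"


def kpqp_shares(averages):
    """Percentage of transactions in each KPQP status."""
    counts = Counter(kpqp_status(avg) for avg in averages.values())
    return {s: 100 * counts[s] / len(averages) for s in STATUSES}


# Hypothetical per-transaction averages (seconds) from a medium-load test.
averages = {"Login": 1.2, "Search": 3.0, "Report": 4.5, "Export": 6.0}
print(kpqp_shares(averages))
# {'Meet': 50.0, 'Miss': 25.0, 'Fail': 25.0, 'Test Fail': 0.0}
```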
<..> Capacity
Charts: transactions/sec, network traffic (MB/sec), % CPU usage
KPQP Status
Meet KPQP (average <= 3 sec)
Miss KPQP (3 sec < average <= 5 sec)
Fail KPQP (average > 5 sec)
Test Fail
Data for Medium Load
<..> Response Times

• Average of average response time decreased from 1.4 to 1.1 seconds
• Average response time within the "FI (Debt)" domain increased from 1.9 to 3.3 seconds
Data for Medium Load
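Several figures in this review are quoted as an "average of average" response time. Assuming this means the mean of the per-transaction average response times, with every transaction type weighted equally regardless of how often it ran, the calculation can be sketched as follows; the pooled mean over all samples is shown for contrast, and the sample data is hypothetical.

```python
from statistics import mean


def average_of_averages(samples_by_transaction):
    """Mean of per-transaction average response times (equal weight per transaction)."""
    return mean(mean(samples) for samples in samples_by_transaction.values())


def pooled_average(samples_by_transaction):
    """Mean over every individual sample (frequent transactions weigh more)."""
    return mean(s for samples in samples_by_transaction.values() for s in samples)


# Hypothetical response times in seconds.
samples = {"Login": [1.0, 1.0, 1.0, 1.0], "Search": [3.0]}
print(average_of_averages(samples), pooled_average(samples))  # 2.0 1.4
```

The two figures diverge when transaction volumes are uneven, so comparisons between cycles are only meaningful if the same averaging is used each time.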
<..> General Status
RAG status: Amber
Positive factors:
• Capacity per server is 3.3 times better than the requirement
• Stable system behavior during rush-hour tests
RAG status demotion factors:
• System capacity is ~20% worse than in March RR
• System scalability is poor
• Total open defects increased 2.6 times (from 8 to 21)
• Memory leak has not been fixed yet
<..> April KPQP Status (chart slices: 57%, 23%, 19%, 1%)
Meet KPQP (average <= target)
Miss KPQP (target < average <= 2x target)
Fail KPQP (average > 2x target)
Test Fail
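For this application the KPQP legend is relative to each transaction's own target rather than to fixed 3-second and 5-second limits. A sketch of that rule follows; the transaction names and targets are hypothetical, and runs that produced no result ("Test Fail") are left out.

```python
def kpqp_status_vs_target(avg_seconds, target_seconds):
    """Classify an average response time against a per-transaction target."""
    if avg_seconds <= target_seconds:
        return "Meet"
    if avg_seconds <= 2 * target_seconds:
        return "Miss"
    return "Fail"


# Hypothetical (average, target) pairs in seconds.
results = {
    "GetQuote": (0.4, 0.5),
    "RunScreen": (1.5, 1.0),
    "LoadChart": (2.5, 1.0),
}
statuses = {name: kpqp_status_vs_target(avg, target)
            for name, (avg, target) in results.items()}
print(statuses)  # {'GetQuote': 'Meet', 'RunScreen': 'Miss', 'LoadChart': 'Fail'}
```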
<..> Capacity
Charts: transactions/sec, network traffic (KB/sec), % CPU usage
<..> KPQPs

• Overall "Meet KPQP" share decreased significantly; every request group followed the overall trend
Meet KPQP (average <= target)
Miss KPQP (target < average <= 2x target)
Fail KPQP (average > 2x target)
Test Fail
Data for Medium Load
<..> Response Times

• Average of average response time increased from 1.22 to 2.40 seconds
• Average response times of every request group followed the overall trend
Data for Medium Load
Open <..> Defects & Issues
Issues: 698, 1441, 1464
Defects: 401248, 401253
Defects: 337022, 337235, 354507, 354529, 354907, 337243, 336856, 337039, 337034, 337038, 337518
• Total number of open defects and environmental issues increased from 13 to 28
• 11 re-opened defects: missing KPQP target, mostly in <..> and <..>_Select handlers
• 2 new defects: missing KPQP target for "5PerCentMetadataUpdatedDownload" and "CC.43Fields.5Instruments"
• 3 new env. issues: 2 related to Performance Center and 1 related to <..> server instability
Chart categories: Strategic defects, Re-opened defects, New defects, Strategic env. issues, New env. issues
<..> General Status
RAG status: Amber
Positive factors:
• System was stable under high load; the memory leak was not reproduced in the long high-load test
• Maximum system capacity increased compared to March RR (it is now back to the February RR level)
RAG status demotion factors:
• Disproportionate memory consumption growth leads to system instability when load is added beyond the saturation point
• 40% of all KPQP transactions do not meet the target under medium load
• An impact of network traffic volume on <..> performance (caused by an environmental issue) was detected
• Number of open defects and issues increased from 8 to 11
<..> April KPQP Status (chart slices: 60%, 20%, 20%, 0%)
Meet KPQP (average <= target)
Miss KPQP (target < average <= 2x target)
Fail KPQP (average > 2x target)
Test Fail
<..> Capacity

• Transactions/sec returned to the February RR level; the issue with the <..>-side Search cache was not reproduced
Charts: transactions/sec, network traffic (MB/sec), % CPU usage
<..> KPQPs

• Overall "Meet KPQP" share increased (<..> and News are the contributors)
Meet KPQP (average <= target)
Miss KPQP (target < average <= 2x target)
Fail KPQP (average > 2x target)
Test Fail
Data for Medium Load
<..> Response Times

• Average of average response time decreased from 0.6 to 0.4 seconds
• This is primarily because the <..>-side Search cache issue encountered in March RR testing was not reproduced
• <..> and News response times are also better now
Data for Medium Load
Open <..> Defects & Issues
Defect #401868, Issue #1567, Defect #381090
• New env. issue (#1567): if network traffic volume increases from 40 Mbit/sec to 100 Mbit/sec, average response times become two times worse; the bigger the response, the worse the impact
• New defect (#401868): disproportionate memory consumption once the saturation level of load is exceeded leads to instability
• Re-opened defect (#381090): SaveBinaries.512K transaction response time does not meet KPQP
Chart categories: Strategic defects, Re-opened defects, New defects, Strategic env. issues, New env. issues
<..> General Status
RAG status: Amber
Positive factors:
• Slightly better overall performance compared to the previous <..> tests in PPE (higher capacity)
• Average response time for all VSM URLs is 11% lower than in the previous test
• The Funds unavailability issue was not reproduced, which led to 100% KPQP pass status for the "Funds" domain
• Good scalability
RAG status demotion factors:
• Environment is unstable
• <..> March RR was tested in the CP April RR environment
• Instability of critical back-ends: <..>WS and DocumentStore_1
• Instability of back-ends: DidYouKnowService, mstService, NewsSvc_1, ItemService_1, etc.
<..> KPQP Status (chart slices: 89%, 9%, 2%)
Meet KPQP (average <= 3 sec)
Miss KPQP (3 sec < average <= 5 sec)
Fail KPQP (average > 5 sec)
Test Fail
<..> Capacity
Charts: transactions/sec, network traffic (MB/sec), % CPU usage
<..> KPQP Status

• Overall KPQP pass status increased: Search, <..>, and Funds views and sub-components are the main contributors
Meet KPQP (average <= 3 sec)
Miss KPQP (3 sec < average <= 5 sec)
Fail KPQP (average > 5 sec)
Test Fail
Data for Medium Load
<..> Response Times
Data for Medium Load
May RR Expectation List
Application: what we expect
• <..>: <..> implements new caching for equity views, so we expect <..> performance to improve, especially for these views.
• <..>: We expect the "HTTP Status-Code=500 (Authorization failed: ..." issue (836) to be fixed, preventing user login failures, especially during rush hour.
• <..>: <..> servers are moving to Win2008. We expect performance to be at least the same as before.
• <..>: <..> implements new caching for Equity <..> views, so we expect the performance of these requests to improve.
• <..>: The memory leak is expected to be fixed.
• <..>: Traffic volume should no longer affect <..> performance, since the network issue (1567) is claimed to be fixed.
• <..>: If <..> is upgraded to <..>, we expect performance to improve. Also, the <..> >> <..> migration should make it possible to test with a wider universe of instruments (<..>), which currently causes <..> to crash.
Links to Test Detail
<application 1>
• Test results reports: <link>
• End of cycle summary: <link>
<application 2>
• Test results reports: <link>
• End of cycle summary: <link>
<application 3>
• Test results reports: <link>
• End of cycle summary: <link>
<application 4>
• Test results reports: <link>
• General status: <link>