
IMPROVEMENT OF NIST STATISTICAL TESTS
EMIL SIMION1
1 Associate professor, University Politehnica of Bucharest, Faculty of Applied Sciences, Romania, e-mail: esimion@fmi.unibuc.ro
In this paper we present a series of improvements to the randomness decision process proposed by the U.S. National Institute of Standards in the guideline SP 800-22, by computing the probability of accepting a false hypothesis.
In this paper we propose an improvement of the decision regarding randomness proposed by the National Institute of Standards and Technology (NIST) in the guideline Statistical Test Suite (STS) Special Publication (SP) 800-22, by computing the second order error (the probability of accepting a false hypothesis).
Key words: cryptographic statistical testing, NIST SP 800-22.
1. Introduction
We need to develop statistical tools for testing the degree of randomness of binary sequences used in cryptographic applications. Such tools already exist in the public cryptography field:
i) Donald Knuth's book [1], The Art of Computer Programming, Seminumerical Algorithms, describes several empirical tests, which include the frequency, serial, gap, poker, coupon collector's, permutation, run, maximum-of-t, collision, birthday spacings, and serial correlation tests;
ii) The Crypt-XS suite of statistical tests was developed by researchers at
the Information Security Research Centre at Queensland University of
Technology in Australia. Crypt-XS tests include the frequency, binary
derivative, change point, runs, sequence complexity and linear
complexity;
iii) The DIEHARD suite of statistical tests developed by George Marsaglia consists of fifteen tests, namely: birthday spacings, overlapping permutations, ranks of 31 x 31 and 32 x 32 matrices, ranks of 6 x 8 matrices, monkey tests on 20-bit words, monkey tests OPSO (Overlapping-Pairs-Sparse-Occupancy), OQSO (Overlapping-Quadruples-Sparse-Occupancy), DNA, count the 1's in a stream of bytes, count the 1's in specific bytes, parking lot, minimum distance, random spheres, squeeze, overlapping sums, runs, and craps. Additional information may be found in G. Marsaglia [2];
iv) NIST 800-22 [3] is a publication of sixteen statistical tests, which can be found on the web page of the Computer Security Resource Center [4], along with an implementation of this tool. We remark that NIST 800-22 was one of the cryptographic tools involved in the evaluation of the candidates for the Advanced Encryption Standard (FIPS PUB 197).
In this paper we propose an extension of the National Institute of Standards and Technology (NIST) Statistical Test Suite (STS) Special Publication (SP) 800-22 by computing the second order error (the probability of accepting a false hypothesis). The paper is organized as follows. In Section 2 we present the concept of statistical testing and the rules for deciding on the randomness of a binary sequence. Section 3 is focused on the NIST decision procedures about randomness and on some extensions of the test interpretation. These extensions concern the computation of the probability of accepting a false hypothesis, which is exemplified in Section 4, in conjunction with the presentation, in Section 5, of a comparative graphical interpretation of the probability of rejecting a true hypothesis. Finally, in Section 6 we conclude.
2. Statistical testing
A statistical test provides a mechanism for making decisions, using data, about a binary sequence x ∈ {0,1}^n, which usually represents the output of a source. The aim is to decide whether there is enough evidence to "reject" a conjecture or hypothesis about the sequence. The hypothesis to be tested represents an assumption that may or may not be true and is called a statistical hypothesis.
A statistical test requires a pair of hypotheses regarding the sequence to be tested:
i) the null hypothesis H0 - the sequence x is produced by a binary memoryless source with Pr(X = 1) = p0 and thus Pr(X = 0) = 1 - p0 (in this case we say that the sequence does not present any predictable component);
ii) the alternative hypothesis H1 - the sequence x is produced by a binary memoryless source with Pr(X = 1) = p1 and Pr(X = 0) = 1 - p1, where p0 ≠ p1 (in this case we say that the sequence presents a predictable component regarding the probability p).
Two types of errors can result from a statistical test: the first order error, the risk of rejecting the null hypothesis when it is in fact true, also called the level of significance:
α = Pr(reject H0 | H0 is true) = 1 - Pr(accept H0 | H0 is true),
and the second order error, the risk of failing to reject (accepting) the null hypothesis when it is in fact false:
β = Pr(accept H0 | H0 is false) = 1 - Pr(reject H0 | H0 is false).
The values 1-α and 1-β are called the specificity (the proportion of negatives which are correctly identified) and the power or sensitivity (the proportion of actual positives which are correctly identified) of the test, respectively.
These two errors α and β cannot be minimized simultaneously, since the risk α increases as the risk β decreases and vice versa (Neyman-Pearson tests minimize the value of β for a given α).
The following table gives the relationship between the truth of the null hypothesis and the outcomes of the test.

                 Real situation
Conclusion       H0 is true                    H0 is false
Reject H0        α (false positive result)     1-β (true positive result)
Accept H0        1-α (true negative result)    β (false negative result)
There is another error that the evaluator can make during the testing process: the type III error (an error first defined in systems theory), which is to ask the wrong question and use the wrong null hypothesis.
The analysis plan of the statistical test includes decision rules for rejecting the null hypothesis. These rules are described in two ways:
i) Decision based on confidence intervals: an example of such a decision rule is the following: for a fixed value of α we find the confidence region for the test statistic and check whether the value of the test statistic lies in the confidence region. The confidence region is computed using the quantiles of order α/2 and 1-α/2 (for example, the quantile u_α of order α is defined by Pr(X < u_α) = α);
ii) Decision based on P-value: let us denote by f_test the value of the test function. An equivalent method is to compare the P-value = Pr(X < f_test) with α and decide randomness if P-value ≥ α.
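To make the two rules above concrete, the following sketch applies them to a test statistic that is N(0,1) under H0 (as is the case for the frequency test of Section 4), using the symmetric two-sided P-value; the function names and the example value f_test = 1.7 are illustrative assumptions, not part of NIST SP 800-22.

from scipy.stats import norm

def decide_by_confidence_interval(f_test, alpha=0.01):
    # Confidence region bounded by the quantiles of order alpha/2 and 1-alpha/2 of N(0,1).
    return norm.ppf(alpha / 2) <= f_test <= norm.ppf(1 - alpha / 2)   # True = accept H0

def decide_by_p_value(f_test, alpha=0.01):
    # Two-sided P-value for a N(0,1) statistic; accept randomness when P-value >= alpha.
    return 2 * (1 - norm.cdf(abs(f_test))) >= alpha

f_test = 1.7   # hypothetical value of the test function
print(decide_by_confidence_interval(f_test), decide_by_p_value(f_test))

For a symmetric statistic the two decisions coincide: |f| ≤ u_{1-α/2} holds exactly when the two-sided P-value is at least α.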
3. State of the art of NIST testing decision functions
We need to dynamically develop statistical tools for testing the degree of randomness of binary sequences. Such tools already exist in the public cryptography field. NIST 800-22 [3] is a publication of sixteen statistical tests, which can be found on the web page of the Computer Security Resource Center [4], along with an implementation of this tool. We must remark that NIST 800-22 was one of the cryptographic tools involved in the evaluation of the candidates for the Advanced Encryption Standard (FIPS PUB 197).
NIST SP 800-22 is a tutorial including 16 statistical randomness tests, used for evaluating pseudorandom generators as well as random number generators. The tests stipulated in NIST 800-22 are performed at a fixed first order risk (1% probability to reject a true hypothesis). These tests are not independent, which makes it difficult to calculate a global rejection rate. Each of these statistical tests highlights a certain type of fault, that is, a certain deviation from regularity. Moreover, none of these tests calculates the second order risk (the probability to accept a false hypothesis), therefore we considered it useful to modify them in order to calculate this probability.
NIST 800-22 provides two methods for integrating the results of the 16 tests, namely:
i) the percentage of passed tests;
ii) the uniformity of the P-values.
The experiments revealed that these decision rules are insufficient, and therefore we considered that their improvement would be useful. Thus, in Oprina et al. [5], new integration methods were introduced for these tests:
i) majority decision;
ii) maximum value decision, based on the maximum value of the test statistics; in this case we compute the maximum value of the results, the distribution of the new statistic being normal;
iii) sum of squares decision, based on the sum of squares of the test results; the distribution in this case is χ², with the number of degrees of freedom given by the number of partial results being integrated.
It is important to remark that the level of significance of the test suite is
hard to estimate because the tests are not independent.
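As a rough sketch of the majority and sum-of-squares integration rules of [5], assume that each partial result has already been standardized to a statistic that is N(0,1) under H0; the function names, the example values and the independence assumption behind the χ² reference distribution are simplifications made for this illustration, not the reference implementation.

import numpy as np
from scipy.stats import chi2

def majority_decision(passed_flags):
    # Accept H0 when more than half of the individual tests accept it.
    return sum(passed_flags) > len(passed_flags) / 2

def sum_of_squares_decision(z_values, alpha=0.01):
    # The sum of squares of k independent N(0,1) statistics is chi-square
    # with k degrees of freedom; reject H0 when the sum is too large.
    q = float(np.sum(np.square(z_values)))
    return q <= chi2.ppf(1 - alpha, df=len(z_values))   # True = accept H0

print(majority_decision([True, True, False, True, True]),
      sum_of_squares_decision(np.array([0.4, -1.2, 0.8, 1.9, -0.3])))

Since, as remarked above, the NIST tests are not independent, the χ² reference distribution is only an approximation for the combined statistic.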
Thus, the NIST tests can be generalized in the following ways:
i) generalize NIST STS 800-22 for an arbitrary α;
ii) compute the second order risk for each statistical test proposed in NIST STS 800-22;
iii) derive and compute the minimum sample size needed to achieve the desired error probabilities;
iv) introduce and implement new decision rules.
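As an illustration of point iii), for the frequency test of Section 4 the usual normal-approximation bound on the sample size can be computed directly; the formula and the numerical example below are a sketch under this approximation, not a derivation taken from NIST SP 800-22.

from math import ceil, sqrt
from scipy.stats import norm

def min_sample_size(p0, p1, alpha=0.01, beta=0.01):
    # Smallest n for which the two-sided frequency test at level alpha has a
    # second order error of at most beta against the alternative p1.
    q0, q1 = 1 - p0, 1 - p1
    u_a = norm.ppf(1 - alpha / 2)   # u_{1-alpha/2}
    u_b = norm.ppf(1 - beta)        # u_{1-beta}
    return ceil(((u_a * sqrt(p0 * q0) + u_b * sqrt(p1 * q1)) / abs(p1 - p0)) ** 2)

# e.g. distinguishing p1 = 0.51 from p0 = 0.5 with alpha = beta = 0.01
print(min_sample_size(0.5, 0.51))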
4. Proof of concept: computing the second order risk for the frequency test
Let us show an example of a well-known simple statistical test, called the frequency test. It is used to test the randomness of a sequence of zeroes and ones; in fact, it tests the closeness of the proportion of ones to 0.5.
Input: binary sequence s = s1,…,sn. Denote by p0 the probability of occurrence of the symbol 1, and by q0 = 1 - p0 the probability of occurrence of the symbol 0.
Output: the decision of acceptance or rejection of randomness, that is, whether the sequence s is the output of a symmetric binary source with Pr(S = 1) = p0, the alternative hypothesis being Pr(S = 1) = p1 ≠ p0.
STEP 0. Read the sequence s and the rejection rate α.
STEP 1. Compute the test function (it can easily be proven that, under H0, the random variable f is distributed N(0,1)):
f = \frac{1}{\sqrt{n p_0 q_0}} \left( \sum_{i=1}^{n} s_i - n p_0 \right).
STEP 2. If f ∈ [u_{α/2}, u_{1-α/2}] we accept the hypothesis of randomness, else we reject it.
STEP 3. Compute the second order error probability:
\beta = \Phi\left( \sqrt{\frac{p_0 q_0}{p_1 q_1}} \left( u_{1-\alpha/2} - \frac{n (p_1 - p_0)}{\sqrt{n p_0 q_0}} \right) \right) - \Phi\left( \sqrt{\frac{p_0 q_0}{p_1 q_1}} \left( u_{\alpha/2} - \frac{n (p_1 - p_0)}{\sqrt{n p_0 q_0}} \right) \right),
where \Phi is the distribution function of the standard normal law.
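The steps above translate directly into the following sketch, assuming the input sequence is given as a NumPy array of 0/1 values; Φ and the quantiles u are taken from scipy.stats.norm, and the helper names are illustrative.

import numpy as np
from scipy.stats import norm

def frequency_test(s, p0=0.5, alpha=0.01):
    # STEP 1: test function f, approximately N(0,1) under H0.
    n, q0 = len(s), 1 - p0
    f = (np.sum(s) - n * p0) / np.sqrt(n * p0 * q0)
    # STEP 2: accept randomness if f lies in [u_{alpha/2}, u_{1-alpha/2}].
    return f, norm.ppf(alpha / 2) <= f <= norm.ppf(1 - alpha / 2)

def second_order_error(n, p0, p1, alpha=0.01):
    # STEP 3: probability of accepting H0 (p = p0) when the source has p = p1.
    q0, q1 = 1 - p0, 1 - p1
    shift = n * (p1 - p0) / np.sqrt(n * p0 * q0)
    scale = np.sqrt((p0 * q0) / (p1 * q1))
    return (norm.cdf(scale * (norm.ppf(1 - alpha / 2) - shift))
            - norm.cdf(scale * (norm.ppf(alpha / 2) - shift)))

s = np.random.randint(0, 2, 1000)           # hypothetical input sequence
print(frequency_test(s), second_order_error(1000, 0.5, 0.55))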
5. Graphical interpretation
In Fig. 1 we show the graphical interpretation of a statistical test, in the case of testing the null hypothesis H0: m = m0 = 0 against the alternative H1: m ≠ m0. The reference distribution in this case is the normal one.
Fig. 1. Critical region of a statistical test at the 0.01 level of significance.
In real situations the value of the variance σ is unknown and needs to be estimated. An unbiased estimation6 of σ/\sqrt{n} (the standard deviation of \bar{x}) is s/\sqrt{n}, where
s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 }
and n is the size of the sample.
The test outcome will be
f = \frac{\sqrt{n} (\bar{x} - m_0)}{s},
the distribution of f being the Student t-distribution7 with n-1 degrees of freedom.
The level of significance of the test was set at α = 0.01. For a sample of size n = 1000 the critical region is (-∞, -2.33) ∪ (2.33, ∞).
Thus the probability to reject the null hypothesis in the situation that it is true is
Pr(f ∈ (-∞, -2.33) ∪ (2.33, ∞) | m = 0) = 0.01,
and the probability to accept the null hypothesis in the situation that it is true is 0.99.
Direct computations lead to the probability of acceptance:
\beta(m; n) = G_{n-1}\left( t_{n-1;\,1-\alpha/2} - \frac{\sqrt{n} (m - m_0)}{s} \right) - G_{n-1}\left( t_{n-1;\,\alpha/2} - \frac{\sqrt{n} (m - m_0)}{s} \right),
where G_{n-1}(x) is the distribution function of t(n-1).
The value of the second order risk depends on the alternative: simulating a N(0,1) source we obtain a sample of n = 1000 values, for which the computed value of β is equal to 0.0565. Values of the alternative m closer to m0, that is |m - m0| → 0, imply, for the same type of test and the same sample size, larger values of β.
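A sketch of how β(m; n) can be evaluated and checked by simulation is given below; the alternative mean m = 0.1, the number of repetitions and the helper names are illustrative assumptions, so the printed values are not the 0.0565 reported above.

import numpy as np
from scipy.stats import t

def beta_t_test(m, m0=0.0, s=1.0, n=1000, alpha=0.01):
    # beta(m; n) = G_{n-1}(t_{n-1;1-alpha/2} - sqrt(n)(m - m0)/s)
    #            - G_{n-1}(t_{n-1;alpha/2}   - sqrt(n)(m - m0)/s)
    shift = np.sqrt(n) * (m - m0) / s
    return (t.cdf(t.ppf(1 - alpha / 2, n - 1) - shift, n - 1)
            - t.cdf(t.ppf(alpha / 2, n - 1) - shift, n - 1))

def beta_monte_carlo(m, m0=0.0, n=1000, alpha=0.01, reps=2000, seed=1):
    # Fraction of N(m, 1) samples for which H0: mean = m0 is accepted.
    rng = np.random.default_rng(seed)
    lo, hi = t.ppf(alpha / 2, n - 1), t.ppf(1 - alpha / 2, n - 1)
    accepted = 0
    for _ in range(reps):
        x = rng.normal(m, 1.0, n)
        accepted += lo <= np.sqrt(n) * (x.mean() - m0) / x.std(ddof=1) <= hi
    return accepted / reps

print(beta_t_test(0.1), beta_monte_carlo(0.1))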
Figure 2 reflects the above discussion, for the values σ = 1, m0 = 0 and \bar{x} = 1 (obtained from the sample values).
6 That is, E(s/\sqrt{n}) = \sigma/\sqrt{n}.
7 The Student distribution (or t-distribution) has the probability density function
f(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n \pi}\, \Gamma\left(\frac{n}{2}\right)} \left( 1 + \frac{t^2}{n} \right)^{-\frac{n+1}{2}},
where n is the number of degrees of freedom and \Gamma is the Gamma function.
6. Conclusion
We have presented the principles of the statistical techniques used in cryptographic evaluation. These principles are based on NIST SP 800-22 and are improved by computing the second order error. In fact, this error is the probability of accepting a false hypothesis, that is, of accepting as random a sequence which is in fact not produced by a random source. We have also discussed some improvements for integrating the results of statistical tests applied on different samples. One open problem is to derive formulas for the minimum sample size that guarantees a good estimation of the confidence interval for each statistic. Another question is to derive estimates of the power of the NIST SP 800-22 test suite, which is difficult to compute because the statistical tests are not independent.
BIBLIOGRAPHY
[1] Donald Knuth, The Art of Computer Programming, Seminumerical Algorithms, Volume 2, 3rd
edition, Addison Wesley, Reading, Massachusetts, 1998.
[2] George Marsaglia, DIEHARD Statistical Tests: http://stat.fsu.edu/~geo/diehard.html.
[3] *** NIST Special Publication 800-22, A Statistical Test Suite for Random and Pseudorandom
Number Generators for Cryptographic Applications, 2001.
[4] *** NIST standards: http://www.nist.gov/, http://www.csrc.nist.gov/.
[5] A. Oprina, A. Popescu, E. Simion and Gh. Simion, Walsh-Hadamard Randomness Test and New Methods of Test Results Integration, Bulletin of Transilvania University of Braşov, vol. 2(51), Series III, 2009, pp. 93-106.