Sequential probability ratio test

The sequential probability ratio test (SPRT) is a specific sequential hypothesis test, developed by Abraham Wald (Wald, 1945). Neyman and Pearson's 1933 result inspired Wald to reformulate hypothesis testing as a sequential analysis problem. The Neyman–Pearson lemma, by contrast, offers a rule of thumb for making a decision once all the data are collected (and their likelihood ratio is known).

While originally developed for use in quality control studies in the realm of manufacturing, the SPRT has also been formulated for use in the computerized testing of human examinees as a termination criterion (Ferguson, 1969; Reckase, 1983; Eggen, 1999).

Theory

As in classical hypothesis testing, SPRT starts with a pair of hypotheses, say H_0 and H_1 for the null hypothesis and alternative hypothesis respectively. They must be specified as follows:

:H_0 : p = p_0
:H_1 : p = p_1

The next step is to calculate the cumulative sum of the log-likelihood ratio, \log \Lambda_i, as new data arrive (starting from S_0 = 0):

:S_i = S_{i-1} + \log \Lambda_i

The stopping rule is a simple thresholding scheme:

* a < S_i < b: continue monitoring ("critical inequality")
* S_i \geq b: accept H_1
* S_i \leq a: accept H_0

where a and b (a < 0 < b) depend on the desired type I and type II error rates, α and β. They may be chosen as follows:

:a \approx \log \frac{\beta}{1-\alpha} \quad \text{and} \quad b \approx \log \frac{1-\beta}{\alpha}

In other words, α and β must be decided beforehand in order to set the thresholds appropriately. The numerical values depend on the application. The reason for using approximation signs is that, in the discrete case, the cumulative sum may cross a threshold between samples; thus, depending on the penalty for making an error and the sampling frequency, one might set the thresholds more aggressively. The exact bounds may of course be used in the continuous case.
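To make the thresholding scheme concrete, here is a minimal Python sketch for the Bernoulli hypotheses stated above (H_0: p = p_0 versus H_1: p = p_1). The function names and the error-rate defaults are illustrative choices, not part of any standard library.

```python
import math

def sprt_thresholds(alpha, beta):
    """Wald's approximate decision thresholds for desired type I (alpha)
    and type II (beta) error rates; note a < 0 < b."""
    a = math.log(beta / (1.0 - alpha))
    b = math.log((1.0 - beta) / alpha)
    return a, b

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Run the SPRT for H0: p = p0 vs. H1: p = p1 on a stream of 0/1
    observations. Returns ('H0' or 'H1', samples used), or ('continue', n)
    if the stream ends while still inside the critical inequality."""
    a, b = sprt_thresholds(alpha, beta)
    s = 0.0  # cumulative log-likelihood ratio S_i, with S_0 = 0
    n = 0
    for x in observations:
        n += 1
        # log Lambda_i for a single Bernoulli observation x in {0, 1}
        s += math.log(p1 / p0) if x else math.log((1.0 - p1) / (1.0 - p0))
        if s >= b:
            return "H1", n  # accept H1
        if s <= a:
            return "H0", n  # accept H0
    return "continue", n  # still inside a < S_i < b
```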

Example

A textbook example is parameter estimation of a probability distribution function. Let us consider the exponential distribution:

:f_\theta(x) = \theta^{-1} \exp\left(-x/\theta\right), \qquad x, \theta > 0

The hypotheses are simply H_0: θ = θ_0 and H_1: θ = θ_1, with θ_1 > θ_0. Then the log-likelihood ratio for one sample is

:\begin{align}
\log \Lambda(x) &= \log \left[ \frac{\theta_1^{-1} \exp\left(-x/\theta_1\right)}{\theta_0^{-1} \exp\left(-x/\theta_0\right)} \right] \\
&= \log \left[ \frac{\theta_0}{\theta_1} \exp\left(x/\theta_0 - x/\theta_1\right) \right] \\
&= \frac{\theta_1-\theta_0}{\theta_0\,\theta_1}\, x - \log \frac{\theta_1}{\theta_0}
\end{align}

The cumulative sum of these log-likelihood ratios over the samples x_i is

:S_n = \sum_{i=1}^n \log \Lambda(x_i) = \frac{\theta_1-\theta_0}{\theta_0\,\theta_1} \sum_{i=1}^n x_i - n \log \frac{\theta_1}{\theta_0}

Accordingly, the stopping rule is:

:a < \frac{\theta_1-\theta_0}{\theta_0\,\theta_1} \sum_{i=1}^n x_i - n \log \frac{\theta_1}{\theta_0} < b

After re-arranging we finally find:

:a + n \log \frac{\theta_1}{\theta_0} < \frac{\theta_1-\theta_0}{\theta_0\,\theta_1} \sum_{i=1}^n x_i < b + n \log \frac{\theta_1}{\theta_0}

The thresholds are simply two parallel lines with slope log(θ_1/θ_0) when the scaled cumulative sum is plotted against the sample number n. Sampling should stop when the cumulative sum makes an excursion outside the "continue-sampling region".
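A short Python sketch of this exponential example, written directly from the stopping rule derived above; the parameter names follow the derivation, and the error rates are illustrative defaults.

```python
import math

def sprt_exponential(observations, theta0, theta1, alpha=0.05, beta=0.05):
    """SPRT for H0: theta = theta0 vs. H1: theta = theta1 (theta1 > theta0),
    where the data are exponential with density f(x) = exp(-x/theta)/theta."""
    a = math.log(beta / (1.0 - alpha))       # lower threshold, a < 0
    b = math.log((1.0 - beta) / alpha)       # upper threshold, b > 0
    slope = math.log(theta1 / theta0)        # slope of the two parallel lines
    coeff = (theta1 - theta0) / (theta0 * theta1)
    total, n = 0.0, 0
    for x in observations:
        n += 1
        total += x
        s_n = coeff * total - n * slope      # cumulative log-likelihood ratio S_n
        # equivalently: continue while a + n*slope < coeff*total < b + n*slope
        if s_n >= b:
            return "H1", n
        if s_n <= a:
            return "H0", n
    return "continue", n
```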

Applications

Manufacturing

The test is done on the proportion metric, and tests whether a variable "p" is equal to one of two desired points, "p1" or "p2". The region between these two points is known as the "indifference region" (IR). For example, suppose you are performing a quality control study on a factory lot of widgets. Management would like the lot to have 3% or fewer defective widgets, but 1% or less is the ideal lot that would pass with flying colors. In this example, "p1 = 0.01" and "p2 = 0.03", and the region between them is the IR because management considers such lots to be marginal and is OK with them being classified either way. Widgets would be sampled one at a time from the lot (sequential analysis) until the test determines, within an acceptable error level, that the lot is ideal or should be rejected.
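Reusing the sprt_bernoulli sketch from the Theory section, the widget example might look as follows; the true 2% defect rate is a made-up value inside the indifference region, chosen only to exercise the test.

```python
import random

random.seed(1)
# Hypothetical lot with a true 2% defect rate (inside the indifference region).
lot = (1 if random.random() < 0.02 else 0 for _ in range(100_000))

# H0: p = 0.01 (ideal lot) vs. H1: p = 0.03 (lot should be rejected),
# with 5% type I and type II error rates.
decision, n_sampled = sprt_bernoulli(lot, p0=0.01, p1=0.03, alpha=0.05, beta=0.05)
print(decision, "after sampling", n_sampled, "widgets")
```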

Testing of human examinees

The SPRT is currently the predominant method of classifying examinees in a variable-length computerized classification test (CCT). The two parameters "p1" and "p2" are specified by determining a cutscore (threshold) for examinees on the proportion-correct metric, and selecting a point above and below that cutscore. For instance, suppose the cutscore is set at 70% for a test. We could select "p1 = 0.65" and "p2 = 0.75". The test then evaluates the likelihood that an examinee's true score on that metric is equal to one of those two points. If the examinee is determined to be at 75%, they pass; if they are determined to be at 65%, they fail.

These points are not specified completely arbitrarily. A cutscore should always be set with a legally defensible method, such as a modified Angoff procedure. Again, the indifference region represents the range of scores that the test designer is OK with going either way (pass or fail). The upper parameter "p2" is conceptually the highest level that the test designer is willing to accept for a fail (because everyone below it has a good chance of failing), and the lower parameter "p1" is the lowest level that the test designer is willing to accept for a pass (because everyone above it has a decent chance of passing). While this definition may seem a relatively small burden, consider the high-stakes case of a licensing test for medical doctors: at just what point should we consider somebody to be at one of these two levels?

While the SPRT was first applied to testing in the days of classical test theory, as described in the previous paragraph, Reckase (1983) suggested that item response theory (IRT) be used to determine the "p1" and "p2" parameters. The cutscore and indifference region are defined on the latent ability (theta) metric, and translated onto the proportion metric for computation. Research on CCT since then has applied this methodology for several reasons:

# Large item banks tend to be calibrated with IRT.
# This allows more accurate specification of the parameters.
# By using the item response function for each item, the parameters are easily allowed to vary between items, as the sketch below illustrates.
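As a sketch of the IRT-based approach, assume a three-parameter logistic (3PL) item response function and an indifference region of half-width delta around the latent cutscore; the function names, the 3PL form, and delta are illustrative assumptions rather than a fixed prescription of Reckase (1983).

```python
import math

def irf_3pl(theta, disc, diff, guess):
    """Three-parameter logistic item response function: the probability of a
    correct response at latent ability theta, given the item's discrimination,
    difficulty, and guessing parameters."""
    return guess + (1.0 - guess) / (1.0 + math.exp(-disc * (theta - diff)))

def item_llr(correct, item, theta_cut, delta):
    """Per-item log-likelihood ratio term: p1 and p2 are the item's
    correct-response probabilities at the two edges of the indifference
    region on the latent metric, so they vary from item to item."""
    p1 = irf_3pl(theta_cut - delta, *item)   # lowest level acceptable for a pass
    p2 = irf_3pl(theta_cut + delta, *item)   # highest level acceptable for a fail
    return math.log(p2 / p1) if correct else math.log((1.0 - p2) / (1.0 - p1))

# These increments accumulate into S_i exactly as in the Theory section,
# and the same thresholds a and b terminate the test.
```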

See also

*CUSUM
*Computerized classification test
*Wald test
*Likelihood-ratio test

References

* Eggen, T. J. H. M. (1999). "Item Selection in Adaptive Testing with the Sequential Probability Ratio Test". Applied Psychological Measurement, 23(3), 249–261. doi:10.1177/01466219922031365
* Ferguson, R. L. (1969). The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction. Unpublished doctoral dissertation, University of Pittsburgh. http://eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED034406&ERICExtSearch_SearchType_0=no&accno=ED034406
* Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237–254). New York: Academic Press.
* Wald, A. (1945). "Sequential Tests of Statistical Hypotheses". Annals of Mathematical Statistics, 16(2), 117–186. doi:10.1214/aoms/1177731118. http://links.jstor.org/sici?sici=0003-4851%28194506%2916%3A2%3C117%3ASTOSH%3E2.0.CO%3B2-7
