What Is Test Reliability/Precision?

Chapter 5

What Is Reliability/Precision?

Measurement error: variations in measurement that occur even when a reliable instrument is used.

Reliable test: one we can trust to measure each person in approximately the same way every time it is used.

Classical Test Theory

True score (T): a measure of the amount of the attribute that the test is designed to measure.

Random error (E): the second part of an observed test score, consisting of random errors that occur anytime a person takes a test.
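
The two parts of an observed score described above are conventionally written as a sum. This is the standard classical test theory notation; X for the observed score is the usual symbol and matches the one used in the SEM definition later in the chapter:

```latex
X = T + E
```

Here X is the observed score, T is the true score, and E is random error.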

Classical Test Theory

True Score

Random Error

Systematic Error

Classical Test Theory

The Formal Relationship Between Reliability/Precision and Random Measurement Error

Parallel

Reliability coefficient: the correlation between the two sets of test scores.
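
Under the usual classical test theory assumptions (random error averages to zero and is uncorrelated with true scores), the reliability coefficient can also be read as the share of observed-score variance that is true-score variance. The variance symbols below are standard notation, not taken from the slides:

```latex
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

The more random error variance in the scores, the lower the coefficient.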

Three Categories of Reliability Coefficients

Test–retest method: a test developer gives the same test to the same group of test takers on two different occasions.

Correlation: the scores from the first and second administrations are then compared by correlating them.
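
A minimal sketch of the comparison step, assuming the Pearson product-moment correlation is the statistic used. The scores and the function name `pearson_r` are made up for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each administration.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # Note: raises ZeroDivisionError if either list has no variation.
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for five test takers on two occasions.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]

test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 2))
```

A value near 1.0 means test takers kept roughly the same rank order across the two administrations.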

Practice effects: occur when test takers benefit from taking the test the first time (practice), which enables them to solve problems more quickly and correctly the second time.

Three Categories of Reliability Coefficients

Alternate-Forms Method

Alternate forms: the test developer creates two different forms of the test.

Order effects: changes in test scores resulting from the order in which the tests were taken.

Parallel forms: different forms of the same test.

Three Categories of Reliability Coefficients

Internal consistency method: a measure of how related the items (or groups of items) on the test are to one another.

Split-half method: divides the test into halves and then compares the set of individual test scores on the first half with the set of individual test scores on the second half.

Three Categories of Reliability Coefficients

Homogeneous tests: tests that measure only one trait or characteristic.

Heterogeneous tests: tests that measure more than one trait or characteristic.

Three Categories of Reliability Coefficients

Scorer Reliability

Scorer reliability or interscorer agreement: the amount of consistency among scorers’ judgments.

Intrascorer reliability: whether each clinician was consistent in the way he or she assigned scores from test to test.

The Reliability Coefficient

Adjusting Split-Half Reliability Estimates
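
The slide does not name the adjustment. The sketch below assumes it is the Spearman-Brown correction, which is commonly used because each half is only half as long as the full test, so the half-to-half correlation tends to understate full-test reliability. The function name is illustrative:

```python
def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return (2 * r_half) / (1 + r_half)

# Hypothetical correlation between the two halves of a test.
half_correlation = 0.60
adjusted = spearman_brown(half_correlation)
print(round(adjusted, 2))
```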

Other Methods of Calculating Internal Consistency
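
The slide does not list the other methods. One widely used index is coefficient alpha (Cronbach's alpha), sketched here as an assumption about what is meant. The response data and the function name are hypothetical:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a table with one row per test taker, one column per item."""
    k = len(item_scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across test takers.
    item_variances = [pvariance(column) for column in zip(*item_scores)]
    # Variance of the total scores (row sums).
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings from four test takers on three items.
responses = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Population variance is used for both the items and the totals; using sample variance for both gives the same ratio and therefore the same alpha.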

The Reliability Coefficient

Calculating Scorer Reliability/Precision and Agreement

Interrater agreement: an index of how consistently the scorers rate or make decisions.

Intrarater agreement: an index of how consistently a single scorer makes judgments across all tests.
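
Agreement between two scorers can be illustrated with simple percent agreement and with Cohen's kappa, which corrects for agreement expected by chance. The slides do not say which index is intended, so both are shown as assumptions. The ratings and function names are made up:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two scorers made the same decision."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two scorers, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each scorer's marginal proportions, summed over categories.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Note: undefined (division by zero) when chance agreement is exactly 1.
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical pass/fail decisions by two scorers on eight tests.
scorer_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
scorer_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

agreement = percent_agreement(scorer_1, scorer_2)
kappa = cohens_kappa(scorer_1, scorer_2)
print(agreement, round(kappa, 2))
```

The same functions apply to intrarater agreement if the two lists hold one scorer's decisions on the same tests at two different times.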

Interpreting Reliability Coefficients

Calculating the Standard Error of Measurement

Standard error of measurement (SEM): an estimate of how much the individual’s observed test score (X) might differ from the individual’s true test score (T).
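
The SEM is usually computed from the standard deviation of the test scores and the test's reliability coefficient. The symbols below are standard notation rather than the slides' own:

```latex
\mathrm{SEM} = \sigma_X \sqrt{1 - r_{xx}}
```

For example, with a standard deviation of 15 and a reliability coefficient of .91, SEM = 15 × √.09 = 4.5 score points.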

Interpreting the Standard Error of Measurement

Interpreting Reliability Coefficients

Confidence Intervals

Confidence interval: a range of scores that we feel confident will include the test taker’s true score.
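
A minimal sketch of building such an interval, assuming the common approach of taking the observed score plus or minus a z value times the SEM (z = 1.96 for roughly 95% confidence). The numbers and function names are illustrative:

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and the reliability coefficient."""
    return sd * sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Range around an observed score expected to contain the true score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD of 15, reliability of .91, observed score of 100.
lower, upper = confidence_interval(observed=100, sd=15, reliability=0.91)
print(round(lower, 2), round(upper, 2))
```

A lower reliability coefficient produces a larger SEM and therefore a wider interval.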

Factors That Influence Reliability

Test Length
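
The effect of test length on reliability is often estimated with the general Spearman-Brown formula. The slide does not state it, so it is given here as an assumption. In it, n is the factor by which the test is lengthened with comparable items and r is the current reliability:

```latex
r_{\text{new}} = \frac{n\,r}{1 + (n - 1)\,r}
```

For example, tripling a test with r = .60 gives (3 × .60) / (1 + 2 × .60) = 1.80 / 2.20 ≈ .82.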

Homogeneity

Test–Retest Interval

Factors That Influence Reliability

Test Administration

Scoring

Cooperation of Test Takers

Generalizability Theory

Generalizability theory: an approach to estimating reliability/precision that examines how multiple sources of error contribute to variation in test scores.

