Not all tests are created equal
Tests should be selected to draw on the child's strengths. Of special concern is the amount of verbal content on a test: tests with significant verbal content should be used with caution when assessing linguistically and culturally diverse students. Ideally, a variety of information related to achievement, aptitude, and intelligence should be collected.
Achievement Tests: measure what a child knows or understands about a content area (e.g., math). Commonly administered achievement tests include the Iowa Tests of Basic Skills (ITBS), the California Achievement Test (CAT), and the Stanford Achievement Test. The ACT Assessment, used for college entrance, also falls into the achievement test category.
Aptitude Tests: predict future performance in a particular domain. Examples include the SAT Reasoning Test (SAT) and the Differential Aptitude Tests (DAT).
Intelligence Tests: sample behavior already learned in an attempt to predict future learning. The Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV), the Stanford-Binet Intelligence Scales, Fifth Edition (SB-V), and the Naglieri Nonverbal Ability Test (NNAT) are examples of IQ measures.
In some cases, diagnostic tests may also be used to assess learning disabilities, cognitive difficulties, or emotional disorders (e.g., ADHD, dysgraphia, visual-motor deficits).
Understanding Testing Lingo
Validity and Reliability
The instruments selected should be valid and reliable. Information on validity and reliability studies can be found in the instrument's technical manual, on the publisher's website, or in test review publications.
Validity refers to the degree to which an instrument measures what it purports to measure. Three types of validity are typically reported.
Content Validity: the degree to which the questions on the test adequately cover, or are representative of, the domain under consideration (intelligence, creativity, leadership, etc.).
Construct Validity: the degree to which an instrument measures the underlying domain or construct (for example, intelligence) that it purports to measure.
Criterion-Predictive Validity: the degree to which scores on the test predict performance on another measure, or criterion, that assesses the same area in a different way.
Reliability refers to the degree to which a test measures consistently: a reliable test produces similar, stable scores over time when the same student is tested again under similar conditions.
Norming Samples
The norming sample should be representative of the most recent census data; for this reason, tests that have not been renormed in more than 10 years should be avoided. To the greatest extent possible, the demographics of the students being tested should match those of the norming sample.
Scores
Various types of scores are provided on assessments. These might include raw scores, standard scores, grade- and age-equivalent scores, percentile ranks, and stanines.
Raw Scores: the number of items answered correctly on the test. Raw scores are not comparable across tests and on their own provide little information, since they are not placed in any context.
Standard Scores: raw scores that have been converted, using a conversion table provided with the test, so that a student's performance can be compared with that of others of the same age or grade level. Unlike percentile ranks, standard scores are expressed in standard deviation units on a normal curve, so they form an equal-interval scale and can be compared across tests that report on the same scale.
Grade- and Age-Equivalent Scores: estimates that describe a student's performance in terms of the grade or age level at which the student is functioning. These scores are often misinterpreted. For example, if a fourth-grade student receives a grade-equivalent score of 8.1 on the reading portion of a grade-level achievement test, this does not mean the student is reading at the eighth-grade level. It means that the student reads fourth-grade material as well as the average eighth grader would read it.
Percentile Ranks: indicate the percentage of students in the norming sample who scored lower than the student. For example, a student scoring at the 88th percentile did better than 88 percent of those in the norming sample.
Stanines: short for "standard nine," these scores range from 1 to 9. Stanines of 1, 2, or 3 are considered below average, 4, 5, or 6 average, and 7, 8, or 9 above average.
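Publishers derive these scores from their own norm tables, which may smooth or normalize the data, so the following is only a simplified Python sketch of how the score types relate to one another. It assumes normally distributed scores, a standard-score scale with mean 100 and standard deviation 15 (a common choice for intelligence tests), and the usual stanine definition (mean 5, standard deviation 2, each band half a standard deviation wide). The helper names, the norm-group mean and standard deviation, and the sample scores are all hypothetical.

```python
import math
from bisect import bisect_left
from statistics import NormalDist


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Distance of a raw score from the norm-group mean, in standard deviations."""
    return (raw - norm_mean) / norm_sd


def standard_score(z: float, scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Re-express a z-score on a standard-score scale (assumed mean 100, SD 15)."""
    return scale_mean + scale_sd * z


def normal_percentile(z: float) -> float:
    """Percent of a normal curve lying below z (assumes normally distributed scores)."""
    return 100 * NormalDist().cdf(z)


def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percent of an actual norming sample scoring strictly below `score`."""
    if not norm_scores:
        raise ValueError("norming sample is empty")
    ordered = sorted(norm_scores)
    return 100 * bisect_left(ordered, score) / len(ordered)


def stanine(z: float) -> int:
    """Stanine (mean 5, SD 2): 2z + 5 rounded half up, clipped to 1..9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))


# Hypothetical norm group for the child's age: raw-score mean 40, SD 8.
z = z_score(52, norm_mean=40, norm_sd=8)
print(z)                            # 1.5
print(standard_score(z))            # 122.5
print(round(normal_percentile(z)))  # 93
print(stanine(z))                   # 8

# Hypothetical norming sample of 25 raw scores (1 through 25).
print(percentile_rank(23, list(range(1, 26))))  # 88.0
```

The same performance, 1.5 standard deviations above the mean, appears as a standard score of 122.5, roughly the 93rd percentile, and a stanine of 8: one result expressed in different units. The `percentile_rank` helper counts only scores strictly below the student's, matching the "did better than" definition; some publishers also credit half of any tied scores.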
Parents decide to have their child tested for a variety of reasons: to qualify for specialized programming, to gather data for advocacy efforts, or simply to confirm suspicions about advanced ability. Whatever the reason, parents should approach testing in a careful and informed manner. (Information extrapolated and adapted from Duke University.)