Why Virginia Gov. Youngkin is wrong about student NAEP scores
What Youngkin’s team got wrong is that the two tests define “proficient” differently, so comparisons between them aren’t valid. NAEP has long been criticized over its achievement levels, and for too many years those levels have been equated to grade-level skills. They aren’t.
Youngkin isn’t the first politician to misinterpret NAEP scores and then use that bad interpretation to bash public schools.
As Post reporters Hannah Natanson and Laura Vozzella wrote here, Virginia’s students largely perform remarkably well on standardized tests. (The argument can be made that standardized tests alone shouldn’t be used to determine how much kids have learned and can do and therefore schools shouldn’t be judged in any high-stakes way for test results — but this post is about issues with NAEP.)
This post is all about the real problems with NAEP, which is sometimes referred to as “the nation’s report card” or the “gold standard” in student assessment. It has been seen since the 1990s as the most consistent, nationally representative measure of U.S. student achievement, in part because it is supposed to assess what students “know and can do.”
It is administered to groups of U.S. students in the fourth and eighth grades (with test takers said to be randomly chosen within selected schools) and less frequently to high school students. Math and reading tests are given every two years; tests in science, writing, the arts, civics, economics, geography, technology and engineering literacy, and U.S. history are given less often.
Here is a column on the NAEP by James Harvey, recently retired as executive director of the National Superintendents Roundtable, a group of about 100 superintendents from around the country.
Every couple of years, public alarm spikes over reports that only one-third of American students are performing at grade level in reading and math. No matter the grade — fourth, eighth or 12th — these reports claim that tests designed by the federal government, the National Assessment of Educational Progress (NAEP), demonstrate that our kids can’t walk and chew gum at the same time. It’s nonsense.
In fact, digging into the data on NAEP’s website reveals, for example, that 81 percent of American fourth-graders are performing at grade level in mathematics. Reading? Sixty-six percent. How could this one-third distortion come to be so widely accepted? Through a phenomenon that Humpty Dumpty described best to Alice in “Through the Looking Glass”: “When I use a word it means just what I choose it to mean.”
Here, the part of Humpty Dumpty was played by Reagan-era political appointees to a policy board overseeing NAEP. The members of the National Assessment Governing Board, most with almost no grounding in statistics, chose to define the term “proficient” as a desirable goal in the face of expert opinion that such a goal was “indefensible.”
Here’s a typical account from the New York Times in 2019 reporting on something that is accurate as far as it goes: results from NAEP indicate that only about one-third of fourth- and eighth-graders are “proficient” in reading.
But that statement quickly turns into the misleading claim that only one-third of American students are on grade level. The 74, for example, obtained $4 million from the Walton and DeVos foundations in 2015 by insisting that “less than half of our students can read or do math at grade-level.”
The claim rests on a careless conflation of NAEP’s “proficient” benchmark with grade-level performance. The NAEP assessment sorts student scores into three achievement levels — basic, proficient, and advanced. The terms are mushy and imprecise. Still, there’s no doubt that the federal test makers who designed NAEP see “proficient” as the desirable standard, what they like to describe as “aspirational.”
However, as Peggy Carr from the National Center for Education Statistics, which funds NAEP, has said repeatedly, if people want to know how many students are performing at grade level, they should be looking at the “basic” benchmark. By that logic, students at grade level would be all those at the basic level or above, which is to say that grade-level performance in reading and mathematics in grades 4, 8 and 12 is almost never below 60 percent and reaches as high as 81 percent.
And the damage doesn’t stop with NAEP. State assessments linked to NAEP’s benchmarks amplify this absurd claim annually, state by state.
While there’s plenty to be concerned about in the NAEP results, anxiety about the findings should focus on the inequities they reveal, not the proportion of students who are “proficient.”
Considering the expenditure of more than a billion dollars on NAEP over 50-odd years, one would expect that NAEP could defend its benchmarks by pointing to rock-solid studies of their validity and the science behind them. It cannot.
Instead, the Department of Education has spent the better part of 30 years fending off a scientific consensus that the benchmarks are absurd. Indeed, the science behind these benchmarks is so weak that Congress insists that every NAEP report include the following disclaimer: “[The Department of Education] has determined that NAEP achievement levels should continue to be used on a trial basis and should be interpreted with caution.”
Criticisms of the NAEP achievement levels
What is striking in reviewing the history of NAEP is how easily its policy board has shrugged off criticisms about the standards-setting process. The critics constitute a roll call of the statistical establishment’s heavyweights. The likes of the National Academy of Education, the General Accounting Office (now the Government Accountability Office), the National Academy of Sciences, and the Brookings Institution have issued scorching complaints that the benchmark-setting processes were “fundamentally flawed,” “indefensible,” and “of doubtful validity,” while producing “results that are not believable.”
How unbelievable? Fully half the 17-year-olds maligned as being just basic by NAEP obtained four-year college degrees. About one-third of Advanced Placement Calculus students, the crème de la crème of American high school students, failed to meet the NAEP proficiency benchmark. While only one-third of American fourth-graders are said to be proficient in reading by NAEP, international assessments of fourth-grade reading judged American students to rank as high as No. 2 in the world.
For the most part, such pointed criticism from assessment experts has been greeted with silence from NAEP’s policy board.
Proficient doesn’t mean proficient
Oddly, NAEP’s definition of proficiency has little or nothing to do with proficiency as most people understand the term. NAEP experts think of NAEP’s standard as “aspirational.” In 2001, two experts associated with NAGB made it clear that:
“[T]he proficient achievement level does not refer to ‘at grade’ performance. Nor is performance at the Proficient level synonymous with ‘proficiency’ in the subject. That is, students who may be considered proficient in a subject, given the common usage of the term, might not satisfy the requirements for performance at the NAEP achievement level.”
Lewis Carroll’s insight into Humpty Dumpty’s hubris leads ineluctably to George Orwell’s observation that “[T]he slovenliness of our language makes it easier for us to have foolish thoughts.”
NAEP and international assessments
NAEP’s proficiency benchmark might be more convincing if most students abroad could handily meet it. That case cannot be made. Sophisticated analyses between 2007 and 2019 show that not a single nation can demonstrate that even 50 percent of its students can clear the proficiency benchmark in fourth-grade reading, while only three could do so in eighth-grade math and one in eighth-grade science. NAEP’s “aspirational” benchmark is pie-in-the-sky on a truly global scale.
NAEP is widely understood to be the “gold standard” in large-scale assessments. That appellation applies to the technical qualities of the assessment (sampling, questionnaire development, quality control and the like), not to the benchmarks. It is important to say that the problem with NAEP doesn’t lie in the assessments themselves, the students, or the schools. The fault lies in the peculiar definition of proficiency applied after the fact to the results.
Here are three simple things that could help fix the problem:
- The Department of Education should simply rename the NAEP benchmarks as low, intermediate, high, and advanced.
- The department should ensure that the congressional requirement to use these benchmarks on a trial basis and interpret them with caution figures prominently, not obscurely, in NAEP publications and on its website.
- States should revisit the decision to tie their “college readiness” standards to NAEP’s proficiency or advanced benchmarks. (They should also stop pretending they can identify whether fourth-graders are “on track” to be “college ready.”)
The truth is that NAEP’s governing board lets down the American people by laying the foundation for this confusion. In doing so, board members help undermine faith in our government, already under attack for promoting “fake news.” The “fake news” here is that only one-third of American kids are performing at grade level.
It’s time the Department of Education made a serious effort to stamp out that falsehood.