Wednesday, December 13, 2006

Liberal democracy, folk positivism, and test scores

This afternoon I had lunch with Stephen Turner, author of Liberal Democracy 3.0 (2003), which is about the relationship between expertise and liberal democracy. His take is a somewhat cynical but important one to consider, that we're inevitably moving towards (and have already shifted towards) a political system that liberally uses expertise as a way to displace political decisions out of political arenas. His book has been important to me as I try to revise the part of my book manuscript that focuses on the tension between technocracy and transparent political dialogue. Stephen told me it just wasn't going to happen, and while I disagree with his claim of inevitability, it's important to understand his argument.

There is a long tradition of debate over the nature of expertise and what we might call folk positivism (and philosophers call naive realism) attached to tests, the belief that a test score must mean something. The standard historiography of testing focuses on intelligence testing and points out the early 20th century confluence of scientific racism, the worshipping of numbers, and the creation of two new professions whose fates were intertwined (psychometrics and school administration). There are a number of arguments about what is putatively labeled governance by expertise, as Frank Fischer has argued for the last decade and a half, or a socially-driven statistics consciousness, as Theodore Porter has argued, and the early history of intelligence testing would fit in fairly well with either argument (and they overlap).

There are two flaws with the common description that wraps up test scores in a host of problems (such as the focus on intelligence tests). One is the implication that just because a social indicator is socially produced and maintains its legitimacy through social processes, it is therefore bankrupt. We can posit a not-quite-naive-realism argument something like this: "Sure, tests are flawed, and they embody a selective and arbitrary sampling of performance, but they're better than nothing even in the worst case, and they're quite reasonable indicators if we accept that arbitrary can mean 'arrived at through a process of arbitration' rather than capricious." Hold your grumbling for a moment, because this is close to the arguments in favor of using test scores that you'll hear from politicians. The argument is a bit slippery, because there is an explicit side to it and then a nasty, twist-your-gut side. The explicit argument is a "this is better than nothing" statement, one that takes a utilitarian perspective on information: we'd like perfect information, but we'll take imperfect over nothing. It falls close to casuistry in its willingness to make decisions based on categorically incomplete information. Arguing against this is essentially claiming, "Look, you don't know what you don't know." Try to win an argument with that claim in any context.

So if that's the easier side to deal with, what's the twist-your-gut, hidden argument? It's this: "Look, I'd like to sit down with you and talk in depth about what we should be expecting students to learn, but I'm a busy guy/gal, and I'm willing to displace the hard decisions into some forum that has the patina of neutral interest. There are these folks called test publishers and psychometricians who can produce a set of statistics, and we can then use those statistics to argue about what we should do with schools. But you're not going to get me to spend my time looking at DIFs and IRT charts and whatnot. I'd rather take the numbers and reports generated by experts and then decide what to do."

This argument matches the liberal-democratic purpose of expertise -- providing an interest-neutral forum for hashing things out without tearing the polity apart -- to the boundary-creating purpose of generating statistics, reports, etc. -- the claim that the public is welcome to talk about the meaning of what we produce outside the boundaries of our expertise, but the public has no right to invade the boundary within which the expertise works. Turner's argument is that we have crossed the line into an expertise-managed society not by the usurpation of democratic principles but by the workings of liberal democracy itself. In the same way that we defer some nasty decisions to courts with the imperfect but workable assumption that courts are neutral forums insulated from interests that can create "facts" we call decisions -- NOT that they're objective -- we're deferring important decisions about education with the assumption that tests are neutral opportunities to generate "facts" with which we can then make public policy decisions.

From Turner's perspective, we're already into this, and it's inevitable. We get test scores, and while we'll argue about which ones are important, the genie is out of the bottle. We had a pleasant debate when I asked him to consider the Bowl Championship Series as a counterexample, since people who obsess about (or love) Division I-A football debate the BCS formula all the time. He countered that people accept ratings and just argue over which ones are best, so the boundary shrinks a tiny bit to protect the machinery of rankings while allowing debate over which ranking is best. I still think the BCS is a counterexample, because there are plenty who talk about a playoff system. I don't think I budged Turner on this. Part of my disagreement with Turner is with his assumption of inevitability and the philosopher's tendency to reduce the complexities of history and politics into archetypes and core arguments.

But a second argument I have with Turner is similar to the second flaw in the expertise critics'/historiographical argument, which is to conflate several different purposes of and influences upon expertise and test scores. I can think of at least six uses of test scores, and each also shapes the politics of test scores and the naive or not-so-naive realism with which they're treated.
  • Test as materialist tool I. This is the "NCLB is there to make money for McGraw-Hill, the Bush buddies" argument. Strength: it's appealing. Weakness: it oversimplifies the political roots of accountability, and there are too many counterexamples: were there really no tests before George W. Bush was president, or in states where he wasn't governor?
  • Test as materialist tool II. This is the "tests are culturally biased" argument, and it has a tautological truth in that tests will reflect something about the prevailing norms in society. Given the complex uses of tests, though, this is an incomplete explanation, and it can cut both ways. Right now, the NAACP in St. Petersburg, Florida, is suing the Pinellas County school system for providing an unequal education. Their evidence? Test scores.
  • Test as status tool. This is the argument tied into the professionalization of expertise around the turn of the 20th century, and if we froze time in 1930, you'd have a lot more persuasive evidence than now about the power of psychometrics as an occupation. Plenty of professional assessment experts warn about the fragility of tests as measures, and such professional skeptics are often ignored.
  • Test as administrative tool. This is the other argument tied into the professionalization of expertise, except this time from the standpoint of school administrators. Of course, there's the small irony that administrators built up the tools of testing to gain autonomy, and the successors of those tests have now become a tool for bashing administrators (and other educators).
  • Test as liberal-democratic political tool. This is the claim that Turner makes, that test scores (and other social instruments) thrive less because of technical considerations than because they serve as what a colleague calls seemingly-neutral numerical designations. Sir Humphrey from the British sitcom Yes, Prime Minister might put it something like this: "We solicit smart people, bring them to a nice resort, ply them with food, and ask them to produce some numbers for us or a way to produce numbers that they then tell us the meaning of. If we didn't do that, every government would fall within about a week from the failure of Parliament to understand what a standard error is, and for that matter, I don't want to be quizzed on it, either. It's much better to get experts to do all the dirty work, and then we can decide what to do about the results."
  • Test as an assertion of technical accuracy. It is this elision, from not-quite-naive realism to naive realism, that drives many of us bonkers. But this implication helps grease the skids for the other uses of statistics, even though it's not really necessary. Point out the errors in test scores, and the response will acknowledge the flaws and then return to the scores' importance.
Each of these uses is contested, and it is not inevitable that we're stuck with test-score accountability. Nonetheless, I find Turner's arguments incredibly important, in part for helping me to see the distinctions among these uses of tests.
