Friday, January 8, 2010

Rigor AND Relevance

One of the conversations at the Institute of Education Sciences (the federal research agency) in 2010 is about rigor. How do we adhere to strict rules about what is accepted as scientific evidence while making the work sponsored by the agency more relevant to educators, as the director, John Easton, wants to do?

The conflict between rigor and relevance arises for a number of reasons that we will illustrate in this entry. The basic problem occurs when rigor is defined in terms of specific methodologies, such as randomized experiments, or a specific criterion, such as the conventional 95% confidence level. Defining rigor by such procedural rules restricts the body of evidence to a small number of studies and to a narrow range of questions that can be answered with the methods deemed acceptable. Our position is not that the education sciences have to become less rigorous in order to become more relevant. Instead, our position is that the concept of scientific rigor is being misunderstood.

In ordinary English, rigor suggests rigidly following rules and procedures. But because blind adherence to procedure is inappropriate in any area of science, the term's usage within the education sciences needs clarification and realignment. Our suggestion to IES is to focus on the underlying scientific principles rather than on the procedures and criteria derived from them. Here are some examples.

The standard rules of research require that a positive outcome identified in a study be very unlikely to be an artifact of the particular sample. There is a very important principle behind this that researchers must rigorously understand, and the appropriate statistical calculations must be applied. The rigor, however, lies in understanding the trade-off between two kinds of error: accepting a chance result as real (a false positive) and rejecting a new program as statistically insignificant when it in fact has a real effect (a false negative). Scientific practice favors protecting against the first kind of mistake and conventionally sets that bar high. But shifting the trade-off to guard instead against dismissing a program that really works would not constitute less rigor. Faced with a very serious problem, a policy maker may prefer the risk of spending money on something that might not work over rejecting a promising program that narrowly missed the conventional threshold for statistical significance.
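To make the trade-off concrete, here is a minimal simulation sketch in Python. The effect size, sample sizes, and alpha levels are our own illustrative assumptions, not values from any IES-sponsored study; the point is only that tightening the significance threshold buys fewer false positives at the price of more missed real effects.

```python
# Minimal sketch of the false-positive / false-negative trade-off.
# All numbers (effect size, sample size, alpha levels) are illustrative
# assumptions, not values from any actual study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, true_effect, trials = 50, 0.3, 10_000  # 50 students per group, 0.3 SD effect

for alpha in (0.05, 0.10):
    false_pos = false_neg = 0
    for _ in range(trials):
        control = rng.normal(0.0, 1.0, n)
        # Null case: the program has no effect, so any "significant"
        # result is a false positive.
        null_treat = rng.normal(0.0, 1.0, n)
        if stats.ttest_ind(null_treat, control).pvalue < alpha:
            false_pos += 1
        # Real case: the program works, so a non-significant result
        # is a false negative (a missed real effect).
        real_treat = rng.normal(true_effect, 1.0, n)
        if stats.ttest_ind(real_treat, control).pvalue >= alpha:
            false_neg += 1
    print(f"alpha={alpha:.2f}: "
          f"false positives {false_pos / trials:.3f}, "
          f"false negatives {false_neg / trials:.3f}")
```

With these made-up numbers, relaxing alpha from .05 to .10 roughly doubles the false positive rate while appreciably lowering the rate of missed effects. Where to set that dial is a judgment about the costs of each mistake, not a fixed rule of rigor.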

Randomization provides another example. The fundamental principle to understand is how the results of quantitative studies can be biased by confounding and how controlling for the effects of confounders produces a more accurate estimate of the treatment effect. Randomizing units (e.g., teachers, grade-level teams, schools) into treatment and control groups is rightly recognized as the gold standard for neutralizing potential confounders and isolating the impact of the treatment, but rigor is not accomplished by restricting education science to randomized experiments. A relevant study can often benefit from observational data already stored in school district information systems. Rigor then consists of understanding how other designs and statistical controls can be appropriately applied to reduce potential bias (and when statistics can't fix a bad design). There is nothing rigorous in discarding a dataset outright because it was not created in a fully controlled experimental setting or because it is not free of measurement error.
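As a concrete (and deliberately simplified) sketch of the confounding principle, the following Python snippet simulates observational data in which prior achievement drives both program enrollment and later scores; all numbers are invented for illustration. The naive comparison of group means badly overstates the program's effect, while a regression that controls for the confounder recovers something close to the truth.

```python
# Minimal sketch of confounding and regression adjustment, with made-up
# numbers: prior achievement drives both enrollment and outcomes.
import numpy as np

rng = np.random.default_rng(1)
n, true_effect = 5_000, 2.0

prior = rng.normal(50, 10, n)                  # confounder: prior achievement
enrolled = (prior + rng.normal(0, 5, n)) > 50  # higher achievers enroll more often
score = 0.8 * prior + true_effect * enrolled + rng.normal(0, 5, n)

# Naive estimate: difference in mean scores, biased upward by selection.
naive = score[enrolled].mean() - score[~enrolled].mean()

# Adjusted estimate: regress the score on enrollment plus the confounder.
X = np.column_stack([np.ones(n), enrolled, prior])
beta = np.linalg.lstsq(X, score, rcond=None)[0]

print(f"true effect {true_effect:.1f}, naive {naive:.1f}, adjusted {beta[1]:.1f}")
```

The same logic, of course, cannot rescue a design in which the confounder was never measured; that is the sense in which statistics can't fix a bad design.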

While education scientists must understand how to control selection bias through experimental designs and statistical adjustments, it is also essential to attend to the context of the study and the range of its generalizability, that is, to what we can usefully conclude from the research. The experiment itself may interfere with usual processes (a situation called ecological invalidity), as when teacher-level randomization breaks up the existing team teaching within a grade-level team. We also need a record of differences in program implementation, one that shows the relationship between quality of implementation and student performance and that keeps us from mistaking attributes of the better-implementing teachers for attributes of the program itself. The world of schools can be a messy place to conduct research, but taking implementation issues seriously in the study design does not equal less rigor.
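Here, too, a small simulated sketch (again with invented numbers) may help. If more effective teachers both implement the program more faithfully and get better outcomes regardless of the program, the raw slope of student gains on implementation fidelity conflates the two; controlling for some measure of teacher effectiveness, such as prior-year performance, separates them.

```python
# Minimal sketch, with made-up numbers: teacher skill raises both
# implementation fidelity and student gains, inflating the raw
# fidelity-to-gains slope.
import numpy as np

rng = np.random.default_rng(2)
n = 400
skill = rng.normal(0, 1, n)                   # teacher attribute (confounder)
fidelity = 0.6 * skill + rng.normal(0, 1, n)  # better teachers implement better
gain = 1.0 * fidelity + 2.0 * skill + rng.normal(0, 1, n)  # true fidelity effect: 1.0

# Raw slope of gains on fidelity conflates teacher skill with the program.
raw_slope = np.polyfit(fidelity, gain, 1)[0]

# Controlling for a measure of teacher skill separates the two.
X = np.column_stack([np.ones(n), fidelity, skill])
adj_slope = np.linalg.lstsq(X, gain, rcond=None)[0][1]

print(f"raw slope {raw_slope:.2f} vs. adjusted slope {adj_slope:.2f} (true 1.0)")
```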

Ultimately, it comes down to knowing what we can say to stakeholders, whether they are educators, publishers, or government agencies. What can be said derives from the rigorous application of research principles and, to some extent, from the art of careful, audience-sensitive communication. It is not more rigorous to leave out the results of post-hoc explorations. Rigor in education science includes framing results with appropriate cautions: which findings are preliminary, where generalization is limited, and which results are interesting enough to warrant continued tracking or more targeted investigation. Making progress in education science calls for rigor, and rigor includes clear communication and the participation of stakeholders in interpreting results.