
Monday, December 5, 2011

Need for Product Evaluations Continues to Grow

There is a growing need for evidence of the effectiveness of products and services being sold to schools. A new release of SIIA’s product evaluation guidelines is now available at the Selling to Schools website (with continued free access for SIIA members) to help guide publishers in measuring the effectiveness of the tools they sell to schools.

It’s been almost a decade since NCLB made its call for “scientifically-based research,” but the calls for research haven’t faded away. This is because the resources available to schools have diminished over that time, heightening the importance of cost-benefit trade-offs in spending.
NCLB has focused attention on test score achievement, and this metric is becoming more pervasive, e.g., through ties to teacher evaluation and linkages to dropout risk. While NCLB fostered a compliance mentality—product specs had to have a check mark next to SBR—the need to assure that funds are not wasted is now leading to a greater interest in research results. Decision-makers are now very interested in whether specific products will be effective, or how well they have been working, in their districts.

Fortunately, the data available for evaluations of all kinds is getting better and easier to access. The US Department of Education has poured hundreds of millions of dollars into state data systems. These investments make data available to states and drive the cleaning and standardizing of data from districts. At the same time, districts continue to invest in data systems and warehouses. While still not a trivial task, getting the data needed to determine whether an investment paid off—in terms of increased student achievement or attendance—has become much easier for school district researchers over the last decade.

The reauthorization of ESEA (i.e., NCLB) is maintaining the pressure to evaluate education products. We are still a long way from the draft reauthorization introduced in Congress becoming law, but the initial indications are quite favorable to the continued production of product effectiveness evidence. The language has changed somewhat: look for the phrase “evidence-based.” Along with the term “scientifically valid,” this new language is actually more sophisticated and potentially more effective than the old SBR neologism. Bob Slavin, one of the reviewers of the SIIA guidelines, says in his Ed Week blog that “This is not the squishy ‘based on scientifically-based evidence’ of NCLB. This is the real McCoy.” It is notable that the definition of “evidence-based” goes beyond just setting rules for the design of research, such as the SBR focus on the single dimension of “internal validity,” for which randomization gets the top rating. It now also asks how generalizable the research is (its “external validity”); i.e., does it have any relevance for decision-makers?

One of the important goals of the SIIA guidelines for product effectiveness research is to improve the credibility of publisher-sponsored research. It is important that educators see it as more than just “market research” producing biased results. In this era of reduced budgets, schools need to have tangible evidence of the value of products they buy. By following the SIIA’s guidelines, publishers will find it easier to achieve that credibility.


Monday, March 29, 2010

Research: From NCLB to Obama’s Blueprint for ESEA

We can finally put “Scientifically Based Research” to rest. The term that appeared more than 100 times in NCLB appears zero times in the Obama administration’s Blueprint for Reform, which is the document outlining its approach to the reauthorization of ESEA. The term was always an awkward neologism, coined presumably to avoid simply saying “scientific research.” It also allowed NCLB to contain an explicit definition to be enforced—a definition stipulating not just any scientific activities, but research aimed at coming to causal conclusions about the effectiveness of some product, policy, or laboratory procedure.

A side effect of the SBR focus has been the growth of a compliance mentality among both school systems and publishers. Schools needed some assurance that a product was backed by SBR before they would spend money, while textbooks were ranked in terms of the number of SBR-proven elements they contained.

Some have wondered if the scarcity of the word “research” in the new Blueprint might signal a retreat from scientific rigor and the use of research in educational decisions (see, for example, Debra Viadero’s blog). Although the approach is indeed different, the new focus makes a stronger case for research and extends its scope into decisions at all levels.

The Blueprint shifts the focus to effectiveness. The terms “effective” or “effectiveness” appear about 95 times in the document. “Evidence” appears 18 times. And the compliance mentality is specifically called out as something to eliminate.

“We will ask policymakers and educators at all levels to carefully analyze the impact of their policies, practices, and systems on student outcomes. ... And across programs, we will focus less on compliance and more on enabling effective local strategies to flourish.” (p. 35)

Instead of the stiff definition of SBR, we now have a call to “policymakers and educators at all levels to carefully analyze the impact of their policies, practices, and systems on student outcomes.” Thus we have a new definition for what’s expected: carefully analyzing impact. The call does not go out to researchers per se, but to policymakers and educators at all levels. This is not a directive from the federal government to comply with the conclusions of scientists funded to conduct SBR. Instead, scientific research is everybody’s business now.

Carefully analyzing the impact of practices on student outcomes is scientific research. For example, conducting research carefully requires making sure the right comparisons are made. A study that is biased by comparing two groups with very different motivations or resources is not a careful analysis of impact. A study that simply compares the averages of two groups without any statistical calculations can mistakenly identify a difference when there is none, or vice versa. A study that takes no measure of how schools or teachers used a new practice—or that uses tests of student outcomes that don’t measure what is important—can’t be considered a careful analysis of impact. Building the capacity to use adequate study design and statistical analysis will have to be on the agenda of the ESEA if the Blueprint is followed.
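
To make the point about naive comparisons concrete, here is a minimal sketch in Python (our own illustration, not anything from the Blueprint); the group sizes and score scale are assumptions chosen only for the example.

```python
# Minimal sketch: two groups drawn from the SAME score distribution (no true
# effect) will still show some raw difference in means. A significance test
# indicates whether that difference is distinguishable from sampling noise.
# Sample sizes and the score scale are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

program = rng.normal(loc=500, scale=100, size=30)      # "program" students
comparison = rng.normal(loc=500, scale=100, size=30)   # comparison students

raw_difference = program.mean() - comparison.mean()
t_stat, p_value = stats.ttest_ind(program, comparison)

print(f"Raw difference in means: {raw_difference:.1f} points")
print(f"Two-sample t-test p-value: {p_value:.3f}")
```

A naive reading of the raw difference alone might call this an “impact”; the statistical test is what tells us whether the difference could easily have arisen by chance.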

Far from reducing the role of research in the U.S. education system, the Blueprint for ESEA actually advocates a radical expansion. The word “research” is used only a few times, and “science” is used only in the context of STEM education. Nonetheless, the call for widespread careful analysis of the evidence of effective practices that impact student achievement broadens the scope of research, turning all policymakers and educators into practitioners of science. — DN

Monday, October 15, 2007

Congress Grapples with the Meaning of “Scientific Research”

Good news and bad news. As reported recently in Education Week (Viadero, 2007, October 17), pieces of legislation currently being put forward contain competing definitions of scientific research. The good news is that we may finally be getting rid of the obtuse and cumbersome term “Scientifically Based Research.” Instead we find some of the legislation using the ordinary English phrase “scientific research” (without the legalese capitalization). So far, the various proposals for NCLB reauthorization are sticking with the idea that school districts will find scientific evidence useful in selecting effective instructional programs and are mostly just tweaking the definition.

So why is the definition of scientific research important? This gets to the bad news. It is important because the definition—whatever it turns out to be—will determine which programs are, in effect, on an approved list for purchase with NCLB funds.

Let’s take a look at two candidate definitions, just focusing on the more controversial provisions.

* The Education Sciences Reform Act of 2002 says that research meeting its “scientifically based research standards” makes “claims of causal relationships only in random assignment experiments or other designs (to the extent such designs substantially eliminate plausible competing explanations for the obtained results).”
* However, the current House proposal (the Miller-McKeon Draft) defines “principles of scientific research” as guiding research that (among other things) makes “strong claims of causal relationships only in research designs that eliminate plausible competing explanations for observed results, which may include but shall not be limited to random assignment experiments.”

Both say essentially the same thing, but the new wording takes the primacy off random assignment and puts it on eliminating plausible competing explanations. We see the change as a concession to researchers who find random assignment too difficult to pull off. These researchers are not, however, relieved of the requirement to eliminate competing explanations (for which randomized control remains the most effective method). Meanwhile, another bill, introduced recently by Senators Lugar and Bingaman, takes a radically different approach to a definition.

* This bill defines what it means for a reading program to be “research-proven” and proposes the requirements for the actual studies that would “prove” that the program is effective. Among the minimum criteria described in the proposal are:

* The program must be evaluated in not less than two studies in which:
  * The study duration was not less than 12 weeks.
  * The sample size of each study is not less than five classes or 125 students per treatment (10 classes or 250 students overall). Multiple smaller studies may be combined to reach this sample size collectively.
  * The median difference between program and control group students across all qualifying studies is not less than 20 percent of a student-level standard deviation, in favor of the program students.

As soon as legislation tries to be this specific, counter examples immediately leap to mind. For example, we are currently conducting a study of a reading program that fits the last two points but, because the program is designed as a 10-week intervention, it can never become research-proven under this definition. Another oddity is that the size of the impact and the size of the sample are specified, but not the level of confidence required—it is unlikely we would have any confidence in a finding of a 0.2 effect size with only 10 classrooms in the study. Perhaps the most unacceptable part of this definition is the term “research-proven.” This is far too strong and absolute. It suggests that as soon as two small studies are completed, the program gets a perpetual green light for district purchases under NCLB.
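
To put rough numbers on that confidence point, here is a back-of-the-envelope power calculation (our own sketch, not part of the bill); the classroom size and intraclass correlation are assumptions made only for illustration.

```python
# Rough power calculation for detecting a 0.2 effect size with 5 classes
# (~125 students) per treatment arm. Classroom size and the intraclass
# correlation (ICC) below are assumed values, chosen only to illustrate.
from statsmodels.stats.power import TTestIndPower

effect_size = 0.2          # 20 percent of a student-level standard deviation
students_per_class = 25    # assumed
classes_per_arm = 5        # 10 classes total across treatment and control
icc = 0.20                 # assumed intraclass correlation for classrooms

# Students within a classroom are not independent observations; the design
# effect shrinks the nominal sample size to an "effective" one.
design_effect = 1 + (students_per_class - 1) * icc    # = 5.8
nominal_n = classes_per_arm * students_per_class      # 125 per arm
effective_n = nominal_n / design_effect               # about 22 per arm

power = TTestIndPower().solve_power(
    effect_size=effect_size, nobs1=effective_n, alpha=0.05
)
print(f"Effective n per arm: {effective_n:.0f}")
print(f"Approximate power to detect d = 0.2: {power:.2f}")
```

Under these assumptions the power comes out around 0.10, far below the conventional 0.80, which is exactly why a 0.2 difference observed in a 10-classroom study should inspire little confidence.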

As odd as this definition may be, we can understand why it was introduced. The most prevalent interpretation of the requirement for “Scientifically Based Research” in NCLB has been that the program under consideration should have been written and developed based on findings derived from scientific research. It was not required that the program itself have any scientific evidence of effectiveness. The Lugar-Bingaman proposal calls for scientific tests of the program itself. In Reading First, programs that had actual evidence of effectiveness were famously left off the approved list, while programs that simply claimed to be designed based on prior scientific research were put on. This proposal will help to level the playing field. To avoid the traps that open up when specific designs are legislated, perhaps the law could call for the convening of a broadly representative panel to hash out the differences between competing sets of criteria rather than enshrine one abbreviated set in federal law.

But even with consensus on the review criteria for acceptable research (and for explaining the trade-offs to the consumers of the research reviews at the state and local level), we are still left with an approved list—a set of programs with sufficient scientific evidence of effectiveness to be purchased. Meanwhile, new programs (books, software, professional development, interventions, etc.) are becoming available every day that have not yet been “proven.”

There is a relatively simple fix that would help democratize the process for states and districts that want to try something because it looks promising but has not yet been “proven” in a sufficient number of other districts. Wherever the law says that a program must have scientific research behind it, also allow the state or district to conduct the necessary scientific research as part of the federal funding. So for example, where the Miller-McKeon Draft calls for

“a description of how the activities to be carried out by the eligible partnership will be based on a review of scientifically valid research,”

simply change that to

“a description of how the activities to be carried out by the eligible partnership will be based on a review of, or evaluation using, scientifically valid research.”

Similarly, a call for

“including integrating reliable teaching methods based on scientifically valid research”

can instead be a call for

“including integrating reliable teaching methods based on, or evaluated by, scientifically valid research.”

This opens the way for districts to try things they think should work for them while helping to increase the total amount of research available for evaluating the effectiveness of new promising programs. Most importantly, it turns the static approved list into a process for continuous research and improvement. —DN