Monday, March 29, 2010

Research: From NCLB to Obama’s Blueprint for ESEA

We can finally put “Scientifically Based Research” to rest. The term that appeared more than 100 times in NCLB appears zero times in the Obama administration’s Blueprint for Reform, which is the document outlining its approach to the reauthorization of ESEA. The term was always an awkward neologism, coined presumably to avoid simply saying “scientific research.” It also allowed NCLB to contain an explicit definition to be enforced—a definition stipulating not just any scientific activities, but research aimed at coming to causal conclusions about the effectiveness of some product, policy, or practice.

A side effect of the SBR focus has been the growth of a compliance mentality among both school systems and publishers. Schools needed some assurance that a product was backed by SBR before they would spend money, while textbooks were ranked in terms of the number of SBR-proven elements they contained.

Some have wondered if the scarcity of the word “research” in the new Blueprint might signal a retreat from scientific rigor and the use of research in educational decisions (see, for example, Debra Viadero’s blog). Although the approach is indeed different, the new focus makes a stronger case for research and extends its scope into decisions at all levels.

The Blueprint shifts the focus to effectiveness. The terms “effective” or “effectiveness” appear about 95 times in the document. “Evidence” appears 18 times. And the compliance mentality is specifically called out as something to eliminate.

“We will ask policymakers and educators at all levels to carefully analyze the impact of their policies, practices, and systems on student outcomes. ... And across programs, we will focus less on compliance and more on enabling effective local strategies to flourish.” (p. 35)

Instead of the stiff definition of SBR, we now have a call to “policymakers and educators at all levels to carefully analyze the impact of their policies, practices, and systems on student outcomes.” Thus we have a new definition for what’s expected: carefully analyzing impact. The call does not go out to researchers per se, but to policymakers and educators at all levels. This is not a directive from the federal government to comply with the conclusions of scientists funded to conduct SBR. Instead, scientific research is everybody’s business now.

Carefully analyzing the impact of practices on student outcomes is scientific research. For example, conducting research carefully requires making sure the right comparisons are made. A study that is biased by comparing two groups with very different motivations or resources is not a careful analysis of impact. A study that simply compares the averages of two groups without any statistical calculations can mistakenly identify a difference when there is none, or vice versa. A study that takes no measure of how schools or teachers used a new practice—or that uses tests of student outcomes that don’t measure what is important—can’t be considered a careful analysis of impact. Building the capacity to use adequate study design and statistical analysis will have to be on the agenda of the ESEA if the Blueprint is followed.
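
As a concrete illustration of the difference between eyeballing averages and analyzing them, here is a minimal sketch in Python with entirely made-up numbers. The group sizes, score scale, and libraries used (NumPy and SciPy) are our own assumptions for the example, not anything specified in the Blueprint.

# A minimal sketch (hypothetical data) of why a raw difference in group
# averages is not, by itself, a careful analysis of impact. A simple
# two-sample t-test asks whether the observed gap could plausibly be due
# to chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical post-test scores: both groups are drawn from the SAME
# distribution, so any gap between the averages here is pure noise.
treatment = rng.normal(loc=500, scale=50, size=30)
control = rng.normal(loc=500, scale=50, size=30)

gap = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control)

print(f"Raw difference in means: {gap:.1f} points")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# If the gap looks meaningful but the p-value is large, simply "comparing
# the averages" would have mistakenly identified a difference.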

Far from reducing the role of research in the U.S. education system, the Blueprint for ESEA actually advocates a radical expansion. The word “research” is used only a few times, and “science” is used only in the context of STEM education. Nonetheless, the call for widespread careful analysis of the evidence of effective practices that impact student achievement broadens the scope of research, turning all policymakers and educators into practitioners of science. — DN

Wednesday, November 5, 2008

Climate Change: Innovation

Congratulations to Barack Obama on his sweeping victory. We can expect a change of policy climate with a new administration bringing new players and new policy ideas to the table. The appointment of a new director of the Institute of Education Sciences will provide an early opportunity to set direction for research and development. Reauthorization of NCLB and related legislation — including negotiating the definition and usage of “scientific research” — will be another, although the pundit consensus was that this change would take two more years, given the urgency of fixing the economy and resolving the war in Iraq. But already change is in the air with proposals for dramatic shifts in priorities. Here we raise a question about the big new idea that is getting a lot of play: innovation.

The educational innovation being called for includes funding for research and development (R&D, with a capital D for a focus on new ideas), acquisition of school technology, and funding for dissemination of new charter school models. The Brookings Institution recently published a policy paper, Changing the Game: The Federal Role in Supporting 21st Century Educational Innovation, by Sara Mead and Andy Rotherham. The paper imagines a new part of the US Department of Education called the Office of Educational Entrepreneurship and Innovation (OEEI) that would be charged with the job of implementing “a game-changing strategy [that] requires the federal government to make new types of investments, form new partnerships with philanthropy and the nonprofit sector, and act in new ways to support the growth of entrepreneurship and innovation within the public education system” (p. 34). The authors see this as complementary to standards-based reform, which is yielding diminishing returns. “To reach the lofty goals that standards-based reform has set, we need more than just pressure. We need new models of organizing schooling and new tools to support student learning that are dramatically more effective or efficient than what schools are doing today” (p. 35).

As an entrepreneurial education business, we applaud the idea behind the envisioned OEEI. The question for us arises when we think about how OEEI would know whether a game-changing model is “dramatically more effective or efficient.” How will the OEEI decide which entrepreneurs should receive, or continue to receive, funds? Although the authors call for a “relentless focus on results,” they do not say how results would be measured. The venture capital (VC) model bases success on return on investment. Many VC investments fail, but if a good percentage succeeds, the overall monetary return to the VC is positive. Venture philanthropies often work the same way, except that the profits go back into supporting more entrepreneurs instead of back to the investors. Scaling up profitably is a sufficient sign of success. Perhaps we can assume that parents, communities, and school systems would not choose to adopt new products if they were ineffective or inefficient. If this were true, then scaling up would be an indirect indication of educational effectiveness. Will positive results for innovations in the marketplace be sufficient, or should there also be a role for research in determining their effectiveness?

The authors suggest a $300 million per year “Grow What Works” fund, of which less than 5% would be set aside for “rigorous independent evaluations of the results achieved by the entrepreneurs” (p. 48). Similarly, their suggestion for a program like the Defense Advanced Research Projects Agency (DARPA) would allow a set-aside of only up to 10%. Research budgeted at this level is unlikely to have much influence against what will be an overwhelming imperative for market success. Moreover, what will be the role of independent evaluations if they fail to show the innovation to be dramatically more effective or efficient? Funding research as a set-aside from a funded program is always an uphill battle because it appears to take money away from the core activity. So let’s be innovative and call this R&D, with the intention of empowering both the R and the D. Rather than offer a token concession to the research community, build ongoing formative research and impact evaluations into the development and scale-up processes themselves. This may more closely resemble the “design-engineering-development” activities that Tony Bryk describes.

Integrating the R with the D will have two benefits. First, it will provide information to federal and private funding agencies on the progress toward whatever measurable goal is set for an innovation. Second, it will help parents, communities, and school systems make informed decisions about whether the innovation will work locally. The important partner here is the school district, which can take an active role in evaluation as well as development. Districts are the entities that ultimately have to decide whether the innovations are more effective and efficient than what they already do. They are also the ones with all the student, teacher, financial, and other data needed to conduct quasi-experiments or interrupted time series studies. If an agency like OEEI is created, it should insist that school districts become partners in the R&D for innovations they consider introducing.
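
To make that last suggestion concrete, here is a minimal sketch of the kind of interrupted time series analysis a district could run on its own assessment data. Everything in it is hypothetical: the monthly scores, the intervention date, and the variable names are invented for illustration, and the model is an ordinary segmented regression fit with pandas and statsmodels.

# A minimal sketch of an interrupted time series (segmented regression)
# analysis using hypothetical district data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical monthly mean scores: 24 months before and 12 months after
# an innovation is introduced.
months = np.arange(36)
post = (months >= 24).astype(int)                  # 1 once the innovation starts
time_since = np.where(post == 1, months - 24, 0)   # months elapsed since the start
scores = 450 + 0.5 * months + 4 * post + 1.0 * time_since + rng.normal(0, 3, 36)

df = pd.DataFrame({"score": scores, "month": months,
                   "post": post, "time_since": time_since})

# "post" estimates the immediate level change at the intervention;
# "time_since" estimates the change in trend after it.
model = smf.ols("score ~ month + post + time_since", data=df).fit()
print(model.summary())

Estimates of the level and trend changes from a model like this are exactly the kind of evidence a district partner could bring to the table. —DN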

Monday, October 15, 2007

Congress Grapples with the Meaning of “Scientific Research”

Good news and bad news. As reported recently in Education Week (Viadero, 2007, October 17), pieces of legislation currently being put forward contain competing definitions of scientific research. The good news is that we may finally be getting rid of the obtuse and cumbersome term “Scientifically Based Research.” Instead we find some of the legislation using the ordinary English phrase “scientific research” (without the legalese capitalization). So far, the various proposals for NCLB reauthorization are sticking with the idea that school districts will find scientific evidence useful in selecting effective instructional programs and are mostly just tweaking the definition.

So why is the definition of scientific research important? This gets to the bad news. It is important because the definition—whatever it turns out to be—will determine which programs are, in effect, on an approved list for purchase with NCLB funds.

Let’s take a look at two candidate definitions, just focusing on the more controversial provisions.

* The Education Sciences Reform Act of 2002 says that research meeting its “scientifically based research standards” makes “claims of causal relationships only in random assignment experiments or other designs (to the extent such designs substantially eliminate plausible competing explanations for the obtained results).”
* However, the current House proposal (the Miller-McKeon Draft) defines “principles of scientific research” as guiding research that (among other things) makes “strong claims of causal relationships only in research designs that eliminate plausible competing explanations for observed results, which may include but shall not be limited to random assignment experiments.”

Both say essentially the same thing, but the new wording takes the primacy off random assignment and puts it on eliminating plausible competing explanations. We see the change as a concession to researchers who find random assignment too difficult to pull off. These researchers are not, however, relieved of the requirement to eliminate competing explanations (for which randomized control remains the most effective method). Meanwhile, another bill, introduced recently by Senators Lugar and Bingaman, takes a radically different approach to a definition.

* This bill defines what it means for a reading program to be “research-proven” and proposes the requirements for the actual studies that would “prove” that the program is effective. Among the minimum criteria described in the proposal are:

  * The program must be evaluated in not less than two studies in which:
    * The study duration was not less than 12 weeks.
    * The sample size of each study is not less than five classes or 125 students per treatment (10 classes or 250 students overall). Multiple smaller studies may be combined to reach this sample size collectively.
    * The median difference between program and control group students across all qualifying studies is not less than 20 percent of a student-level standard deviation, in favor of the program students.

As soon as legislation tries to be this specific, counterexamples immediately leap to mind. For example, we are currently conducting a study of a reading program that fits the last two points but, because it is designed as a 10-week intervention, can never become research-proven under this definition. Another oddity is that the size of the impact and the size of the sample are specified, but not the level of confidence required—it is unlikely we would have any confidence in a finding of a 0.2 effect size with only 10 classrooms in the study. Perhaps the most unacceptable part of this definition is the term “research-proven.” It is far too strong and absolute, suggesting that as soon as two small studies are completed, the program gets a perpetual green light for district purchases under NCLB.
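
For readers who want to check the arithmetic behind that confidence worry, here is a rough power calculation. The scenario numbers (a 0.2 standard-deviation effect, 5 classes or 125 students per treatment) come from the proposal, but treating each classroom as a single observation, using a plain two-sample t-test, and doing the computation with statsmodels are simplifying assumptions of ours.

# A rough check on how much confidence a study meeting the minimum
# criteria could give: power for detecting a 0.2 standard-deviation
# difference with a two-sample t-test at alpha = 0.05.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# 5 classrooms per condition (10 total), treating each class as one unit
power_classes = analysis.power(effect_size=0.2, nobs1=5, ratio=1.0, alpha=0.05)

# 125 students per condition (250 total), ignoring clustering within classes
power_students = analysis.power(effect_size=0.2, nobs1=125, ratio=1.0, alpha=0.05)

print(f"Power with 10 classrooms as the units: {power_classes:.2f}")
print(f"Power with 250 students, no clustering: {power_students:.2f}")
# Both figures fall well short of the conventional 0.8 target, which is
# why specifying sample size and effect size without a required confidence
# level leaves the criteria incomplete.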

As odd as this definition may be, we can understand why it was introduced. The most prevalent interpretation of the requirement for “Scientifically Based Research” in NCLB has been that the program under consideration should have been written and developed based on findings derived from scientific research. It was not required that the program itself have any scientific evidence of effectiveness. The Lugar-Bingaman proposal calls for scientific tests of the program itself. In Reading First, programs that had actual evidence of effectiveness were famously left off the approved list, while programs that simply claimed to be designed based on prior scientific research were put on. This proposal will help to level the playing field. To avoid the traps that open up when specific designs are legislated, perhaps the law could call for the convening of a broadly representative panel to hash out the differences between competing sets of criteria rather than enshrine one abbreviated set in federal law.

But even with consensus on the review criteria for acceptable research (and on how to explain the trade-offs to the consumers of the research reviews at the state and local level), we are still left with an approved list—a set of programs with sufficient scientific evidence of effectiveness to be purchased. Meanwhile, new programs (books, software, professional development, interventions, etc.) become available every day that have not yet been “proven.”

There is a relatively simple fix that would help democratize the process for states and districts that want to try something because it looks promising but has not yet been “proven” in a sufficient number of other districts. Wherever the law says that a program must have scientific research behind it, also allow the state or district to conduct the necessary scientific research as part of the federal funding. So, for example, where the Miller-McKeon Draft calls for

“a description of how the activities to be carried out by the eligible partnership will be based on a review of scientifically valid research,”

simply change that to

“a description of how the activities to be carried out by the eligible partnership will be based on a review of, or evaluation using, scientifically valid research.”

Similarly, a call for

“including integrating reliable teaching methods based on scientifically valid research”

can instead be a call for

“including integrating reliable teaching methods based on, or evaluated by, scientifically valid research.”

This opens the way for districts to try things they think should work for them while helping to increase the total amount of research available for evaluating the effectiveness of new promising programs. Most importantly, it turns the static approved list into a process for continuous research and improvement. —DN