Teacher evaluation has become the hot topic in education. State and local agencies are quickly implementing new programs, spurred by federal initiatives and by evidence that teacher effectiveness is a major contributor to student growth. The Chicago teachers’ strike brought out the deep divisions over the issue of evaluations. There, the focus was on the use of student achievement gains, or value-added measures. But the other side of evaluation, systematic classroom observation by administrators, is also attracting attention. Teaching is a very complex skill, and the development of frameworks for describing and measuring its interlocking elements is an area of active and pressing research. The movement toward using observations as part of teacher evaluation is not without controversy. A recent op-ed in Education Week by Mike Schmoker criticizes the rapid implementation of what he considers overly complex evaluation templates “without any solid evidence that it promotes better teaching.”
There are researchers engaged in the careful study of evaluation systems, including the combination of value-added and observations. The Bill & Melinda Gates Foundation has funded a large team of researchers through its Measures of Effective Teaching (MET) project, which has already produced an array of reports for both academic and practitioner audiences (with more to come). But research can be ponderous, especially when the question is whether such systems can improve teacher effectiveness. A year ago, the Institute of Education Sciences (IES) awarded an $18 million contract to AIR to conduct a randomized experiment measuring the impact of a teacher and leader evaluation system on student achievement, classroom practices, and teacher and principal mobility. The experiment is scheduled to start this school year, and results will likely begin appearing by 2015. However, at the current rate of implementation by education agencies, most programs will be in full swing by then.
Empirical Education is currently involved in teacher evaluation through our Observation Engine—a web-based tool that helps administrators make more reliable observations (see story about our work with Tulsa Public Schools). This tool, along with our R&D on protocol validation, was initiated as part of the MET project. In our view, the complexity and time demands of many of the observation systems Schmoker criticizes arise from their intended use as supports for professional development. The initial motivation for developing observation frameworks was to provide better feedback and professional development for teachers. Their complexity is driven by the goal of providing detailed, specific feedback. Such systems become cumbersome when repurposed to produce a single score of teaching quality for every teacher that can be used administratively, for example, in personnel decisions. We suspect that a more streamlined and less labor-intensive evaluation approach could be used to identify the teachers in need of coaching and professional development. That subset of teachers would then receive the more resource-intensive evaluation and training services such as complex, detailed scales, interviews, and coaching sessions.
The other question Schmoker raises is: do these evaluation systems promote better teaching? While waiting for the IES study to be reported, some things can be done. First, look at correlations of the components of the observation rubrics with other measures of teaching, such as value-added measures of student achievement (VAM scores) or student surveys. The idea is to see whether the behaviors valued and promoted by the rubrics are associated with improved achievement. The videos and data collected by the MET project are the basis for tools to do this (see earlier story on our Validation Engine). But school systems can conduct the same analysis using their own student and teacher data. Second, use quasi-experimental methods to look at changes in achievement related to the local implementation of the evaluation system. In both cases, many school systems are already collecting very detailed data that can be used to test the validity and effectiveness of their locally adopted approaches.
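The first kind of check can be sketched in a few lines. Everything below is illustrative: the rubric component names, the simulated scores, and the strength of the relationships are all invented for the sketch, not actual MET, district, or Observation Engine data.

```python
import numpy as np

# Illustrative only: rubric component names and all data are simulated.
rng = np.random.default_rng(0)
n_teachers = 200

# Hypothetical observation-rubric component scores, one row per teacher.
rubric = {
    "questioning": rng.normal(3.0, 0.5, n_teachers),
    "classroom_management": rng.normal(3.2, 0.4, n_teachers),
}
# Simulated VAM scores, loosely tied to one rubric component by construction.
vam = 0.4 * rubric["questioning"] + rng.normal(0.0, 0.5, n_teachers)

# Correlate each rubric component with the value-added scores.
correlations = {
    name: np.corrcoef(scores, vam)[0, 1] for name, scores in rubric.items()
}
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

A district running this against its own data would substitute real component scores and VAM estimates; components that show no association with achievement gains are candidates for streamlining out of the administrative rubric.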
Tuesday, October 9, 2012
Tuesday, February 21, 2012
Exploration in the World of Experimental Evaluation
Our 300+ page report marks a good start on this exploration. But IES, faced with limited time and resources to complete the many experiments being conducted within the Regional Education Lab system, put strict limits on the number of exploratory analyses researchers could conduct. We usually think of exploratory work as following up on puzzling or unanticipated results. However, in the case of the REL experiments, IES asked researchers to focus on a narrow set of “confirmatory” results, and anything else was considered “exploratory,” even if the question was included in the original research design.
The strict IES criteria were based on the principle that when a researcher is using tests of statistical significance, the probability of erroneously concluding that there is an impact when there isn’t one increases with the frequency of the tests. In our evaluation of AMSTI, we limited ourselves to only four such “confirmatory” (i.e., not exploratory) tests of statistical significance. These were used to assess whether there was an effect on student outcomes for math problem-solving and for science, and the amount of time teachers spent on “active learning” practices in math and in science. (Technically, IES considered this two sets of two, since two were the primary student outcomes and two were the intermediate teacher outcomes.) The threshold for significance was made more stringent to keep the probability of falsely concluding that there was a difference for any of the outcomes at 5% (often expressed as p < .05).
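The arithmetic behind the stricter threshold is simple to sketch. Here is how a Bonferroni-style correction, one standard option (the exact adjustment applied in the study is not spelled out here), keeps the chance of any false positive across four tests at about 5%:

```python
# With k independent significance tests each at level alpha, the chance of
# at least one false positive (the family-wise error rate) is
#   FWER = 1 - (1 - alpha)**k
# A Bonferroni-style correction tests each outcome at alpha/k instead.
alpha, k = 0.05, 4

fwer_naive = 1 - (1 - alpha) ** k        # about 0.185 if we ignore the problem
alpha_per_test = alpha / k               # 0.0125 threshold per outcome
fwer_adjusted = 1 - (1 - alpha_per_test) ** k

print(f"unadjusted FWER for {k} tests: {fwer_naive:.3f}")
print(f"per-test threshold: {alpha_per_test}")
print(f"adjusted FWER: {fwer_adjusted:.3f}")
```

In other words, four tests at the usual p < .05 would carry nearly a 1-in-5 chance of at least one spurious "finding"; tightening each test to p < .0125 brings the overall error rate back under 5%.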
While the logic for limiting the number of confirmatory outcomes is based on technical arguments about adjustments for multiple comparisons, the limit on the amount of exploratory work was based more on resource constraints. Researchers are notorious (and we don’t exempt ourselves) for finding more questions in any study than were originally asked. Curiosity-based exploration can indeed go on forever. In the case of our evaluation of AMSTI, however, there were a number of fundamental policy questions that were not answered either by the confirmatory or by the exploratory questions in our report. More research is needed.
Take the confirmatory finding that the program resulted in the equivalent of 28 days of additional math instruction (or technically an impact of 5% of a standard deviation). This is a testament to the hard work and ingenuity of the AMSTI team and the commitment of the school systems. From a state policy perspective, it gives a green light to continuing the initiative’s organic growth. But since all the schools in the experiment applied to join AMSTI, we don’t know what would happen if AMSTI were adopted as the state curriculum requiring schools with less interest to implement it. Our results do not generalize to that situation. Likewise, if another state with different levels of achievement or resources were to consider adopting it, we would say that our study gives good reason to try it but, to quote Lee Cronbach, a methodologist whose ideas increasingly resonate as we translate research into practice: “…positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (Cronbach, 1975, p. 125).
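The "days of instruction" framing is a conversion from the effect size using an assumed rate of typical annual growth. The growth benchmark below (0.32 SD over a 180-day school year) is our illustrative assumption for the sketch, not the figure used in the report:

```python
# Convert an effect size in SD units into equivalent days of instruction.
# ASSUMPTION: typical students gain about 0.32 SD over a 180-day school
# year; this benchmark is illustrative, not taken from the AMSTI report.
effect_size = 0.05        # AMSTI impact on math problem-solving, in SDs
annual_growth_sd = 0.32   # assumed typical one-year gain, in SDs
school_year_days = 180

extra_days = effect_size / annual_growth_sd * school_year_days
print(f"equivalent extra instruction: {extra_days:.0f} days")  # 28 days
```

The conversion makes plain why the benchmark matters: a grade level with faster typical growth would translate the same 0.05 SD into fewer equivalent days.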
The explorations we conducted as part of the AMSTI evaluation did not take the usual form of deeper examinations of interesting or unexpected findings uncovered during the planned evaluation. All the reported explorations were questions posed in the original study plan. They were defined as exploratory either because they were considered of secondary interest, such as the outcome for reading, or because they were not a direct causal result of the randomization, such as the results for subgroups of students defined by different demographic categories. Nevertheless, exploration of such differences is important for understanding how and for whom AMSTI works. The overall effect, averaging across subgroups, may mask differences that are of critical importance for policy.
Readers interested in the issue of subgroup differences can refer to Table 6.11. Once differences are found in groups defined in terms of individual student characteristics, our real exploration is just beginning. For example, can the difference be accounted for by other characteristics or combinations of characteristics? Is there something that differentiates the classes or schools that different students attend? Such questions begin to probe additional factors that can potentially be addressed in the program or its implementation. In any case, the report just released is not the “final report.” There is still a lot of work necessary to understand how any program of this sort can continue to be improved.
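One way to probe such questions is to estimate the subgroup difference in the treatment effect directly, via an interaction term, rather than eyeballing separate subgroup estimates. A minimal sketch with simulated data (the effect sizes and the subgroup itself are invented, not taken from Table 6.11):

```python
import numpy as np

# Sketch of a treatment-by-subgroup interaction estimate. All data are
# simulated; the effect sizes and subgroup labels are invented.
rng = np.random.default_rng(1)
n = 2000
treat = rng.integers(0, 2, n)   # 1 = randomized to the program
group = rng.integers(0, 2, n)   # 1 = member of the subgroup of interest
# True model: 0.05 SD overall effect plus an extra 0.10 SD for the subgroup.
y = 0.05 * treat + 0.10 * treat * group + rng.normal(0.0, 1.0, n)

# Regression y = b0 + b1*treat + b2*group + b3*(treat*group):
# b3 estimates how much the treatment effect differs for the subgroup.
X = np.column_stack([np.ones(n), treat, group, treat * group])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated extra subgroup effect: {coef[3]:.2f} SD")
```

A real analysis would add the covariates and clustering of the original design; the point of the sketch is only that the interaction coefficient, not the overall average, is the quantity that answers "for whom does it work?"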
–DN & AJ
Friday, September 11, 2009
Easton Sets a New Agenda for IES
John Easton, now officially confirmed as the director of the Institute of Education Sciences, gave a brief talk July 24th to explain his agenda to the directors and staff of the Regional Education Labs. This is a particularly pivotal time, not only because the Obama administration is setting an aggressive direction for changes in K-12 schools, but also because Easton is starting his six-year term just as IES is preparing the RFP for the re-competition for the 10 RELs. (The budget for the RELs accounts for about 11% of the approximately $600 million IES budget.)
Easton made five points.
First, he is not retreating from the methodological rigor that was the hallmark of his predecessor, Russ Whitehurst. This simply means that IES will not fund poorly designed research that lacks the controls needed to support the conclusions the researcher wants to assert. Randomized control is still the strongest design for effectiveness studies, although weaker designs are recognized as having value.
Second, there has to be more emphasis on relevance and usability for practitioners. IES can’t ignore how decisions are made and what kind of evidence can usefully inform them. He sees this as requiring a new focus on school systems as learning organizations. This becomes a topic for research and development.
Third, although randomized experiments will still be conducted, there needs to be a stronger tie to what is then done with the findings. In a research and development process, rigorous evaluation should be built in from the start and should relate directly to the needs of the practitioners who are part of the R&D process.
Fourth, IES will move away from the top-down dissemination model in which researchers seem to complete a study and then throw the findings over the wall to practitioners. Instead, researchers should engage practitioners in the use of evidence, understanding that the value of research findings comes in its application, not simply in being released or published. IES will take on the role of facilitating the use of evidence.
Fifth, IES will take on a stronger role in building capacity to conduct research at the local level and within state education agencies. There’s a huge opportunity presented by the investment (also through IES) in state longitudinal data systems. The combination of state systems and the local district systems makes gathering the data to answer policy questions and questions about program effectiveness much easier. The education agencies, however, often need help in framing their questions, applying an appropriate design, and deploying the necessary and appropriate statistics to turn the data into evidence.
These five points form a coherent picture of a research agency that will work more closely through all phases of the research process with practitioners, who will be engaged in developing the findings and putting them into practice. This suggests new roles for the Regional Education Labs in helping their constituencies to answer questions pertinent to their local needs, in engaging them more deeply in using the evidence found, and in building their local capacity to answer their own questions. The quality of work will be maintained, and the usability and local relevance will be greatly increased.
— DN