Wednesday, September 8, 2010
2010-2011: The Year of the VAM
VAM is a family of statistical techniques for estimating the contribution of a teacher or of a school to the academic growth of students. Recently, the LA Times obtained the longitudinal test score records for all the elementary school teachers and students in LA Unified and had a RAND economist (working as an independent consultant) run the calculations. The result was a “score” for every LAUSD elementary school teacher. Note that the economist who ran the calculations also wrote a technical report describing how it was done and the specific questions his research was aimed at answering.
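To make the idea concrete, here is a minimal sketch, in Python with simulated data, of the simplest kind of value-added regression: current scores regressed on prior scores plus teacher indicators. This is our own illustration, not the specification used in the LA Times analysis (for that, see the economist’s technical report).

```python
import numpy as np

# Minimal sketch of a simple value-added model (NOT the LA Times
# specification): regress each student's current test score on the
# prior-year score plus a dummy for the student's teacher. The teacher
# coefficients are the "value-added" estimates.

rng = np.random.default_rng(0)

n_teachers = 5
n_students = 200
teacher = rng.integers(0, n_teachers, size=n_students)   # teacher assignment
prior = rng.normal(300, 40, size=n_students)             # prior-year scale score
true_effect = np.array([-8.0, -3.0, 0.0, 4.0, 9.0])      # simulated teacher effects
current = 50 + 0.85 * prior + true_effect[teacher] + rng.normal(0, 15, n_students)

# Design matrix: intercept, prior score, and teacher dummies (teacher 0 is
# the reference category, so estimates are relative to that teacher).
dummies = np.eye(n_teachers)[teacher][:, 1:]
X = np.column_stack([np.ones(n_students), prior, dummies])

beta, *_ = np.linalg.lstsq(X, current, rcond=None)
print("estimated value-added (relative to teacher 0):", beta[2:].round(2))
```

Real implementations add student and classroom covariates, multiple prior years, and shrinkage of noisy estimates, but the structure is the same: whatever the model cannot explain is attributed to the teacher.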
Reactions to the idea that a teacher could be evaluated using a set of test scores—in this case from the California Standards Test—were swift and divisive. The concept was denounced by the teachers’ union, with the local leader calling for a boycott. Meanwhile, the US Secretary of Education, Arne Duncan, made headlines by commenting favorably on the idea. The LA Times quotes him as saying “What’s there to hide? In education, we’ve been scared to talk about success.”
There is a tangle of issues here, along with exaggerations, misunderstandings, and confusion between research techniques and policy decisions. This column will address some of the issues over the coming year. We also plan to announce some of our own contributions to the VAM field in the form of project news.
The major hot-button issues include appropriate usage (e.g., as part or all of the input to merit pay decisions) and technical failings (e.g., biases in the calculations). The two issues are often linked; for example, many argue that biases make VAM unfair as a basis for individual merit pay. A recent brief from the Economic Policy Institute, authored by an impressive team of researchers (several of them our friends and mentors from neighboring Stanford), makes a well-reasoned case against using VAM as the only input to high-stakes decisions. While their arguments are persuasive with respect to VAM as the lone criterion for awarding merit pay or firing individual teachers, we still see a broad range of uses for the technique, along with considerable challenges.
For today, let’s look at one issue that we find particularly interesting: How to handle teacher collaboration in a VAM framework. In a recent Education Week commentary, Kim Marshall argues that any use of test scores for merit pay is a losing proposition. One of the many reasons he cites is its potentially negative impact on collaboration.
A problem with an exercise like the one conducted by the LA Times is that some organizational arrangements never enter the calculations. For example, we find that team teaching within a grade at a school is very common. A teacher with an aptitude for teaching math may take another teacher’s students for a math period while sending her own kids to the other teacher for reading. These informal arrangements are not part of the official school district roster. They can be recorded (with some effort) for the current year but are lost for prior years. Mentoring is a similar situation, in which the value provided to the kids is distributed among the members of their team of teachers. We don’t know how much difference collaborative or mentoring arrangements make to individual VAM scores, but one fear is that using VAM to set teacher salaries will militate against productive collaborations and reduce overall achievement.
Some argue that, because VAM calculations do not properly measure or include important elements, VAM should be disqualified from playing any role in evaluation. We would argue that, although imperfect, VAM calculations can still serve as one component of an evaluation process. Moreover, continued improvements can be made in testing, in professional development, and in the VAM calculations themselves. In the case of collaboration, what is needed is a way for a principal to record and evaluate collaborations and mentoring so that the information can be worked into the overall evaluation, and even into the VAM calculation itself. In that case, it would be the principal at the school, not an administrator at the district central office, who could make the most productive use of the VAM calculations. With knowledge of local conditions and of the potential for bias, the building leader may be in the best position to make personnel decisions.
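To illustrate one way this might work (our sketch, not an existing district practice), suppose a principal logged the fraction of instructional time each teacher spent with each student. Those fractional “dosage” weights could replace the all-or-nothing roster link in the regression, so that shared students credit each collaborating teacher in proportion:

```python
import numpy as np

# Hypothetical sketch: if a principal records the fraction of instructional
# time each teacher spent with each student (a "dosage" weight), the
# teacher dummies of a standard VAM generalize to fractional columns.
# All names and numbers here are invented for illustration.

rng = np.random.default_rng(1)

n_teachers, n_students = 3, 150
# Dosage matrix: row i sums to 1.0 across the teachers who shared student i
# (e.g., one teacher takes the class for math, another for reading).
dosage = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=n_students)

prior = rng.normal(300, 40, n_students)
true_effect = np.array([-5.0, 2.0, 6.0])
current = 40 + 0.9 * prior + dosage @ true_effect + rng.normal(0, 12, n_students)

# Regress on prior score plus the dosage shares. No separate intercept is
# needed: the shares sum to one, so they absorb the constant term.
X = np.column_stack([prior, dosage])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
print("estimated per-teacher effects (constant folded in):", beta[1:].round(2))
```

The hard part, of course, is not the arithmetic but collecting the dosage records reliably, and for prior years that information is simply gone.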
VAM can also be an important research tool: consistently high or consistently low scores can guide observation of classroom practices that are likely to be worth promoting through professional development or program implementations. We’ve seen VAM used this way, for example, by the research team at Wake County Public Schools in North Carolina in identifying strong and weak practices in several content areas. This is clearly a rich area for continued research.
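As a rough illustration of that screening step (simulated numbers, not Wake County’s actual procedure), the key word is “consistently”: a teacher flagged in a single year is often just noise, while a teacher in the same tail of the distribution year after year marks a classroom worth visiting.

```python
import numpy as np

# Illustrative sketch: flag teachers whose (simulated) value-added scores
# fall in the top or bottom quintile in EVERY one of several years.
# Stable outliers are candidates for classroom observation.

rng = np.random.default_rng(2)

n_teachers, n_years = 100, 3
scores = rng.normal(0, 1, size=(n_teachers, n_years))  # stand-in VAM scores

hi = scores > np.quantile(scores, 0.8, axis=0)  # top quintile, each year
lo = scores < np.quantile(scores, 0.2, axis=0)  # bottom quintile, each year

consistently_high = np.flatnonzero(hi.all(axis=1))
consistently_low = np.flatnonzero(lo.all(axis=1))
print("observe for promising practices:", consistently_high)
print("observe for struggling practices:", consistently_low)
```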
The LA Times has helped to catapult the issue of VAM onto the national radar. It has also sparked a discussion of how school data can be used to support local decisions—which can’t be a bad thing.
— DN
Friday, February 13, 2009
Education Week Reports that 'Scientifically Based' is Giving Way to 'Development' and 'Innovation'
The headline in the January 28 issue of Education Week suggests that the pendulum is swinging from obtaining rigorous scientific evidence to providing greater freedom for development and innovation.
Is there reason to believe this is more than a war of catch phrases? Does supporting innovative approaches take resources away from “scientifically based research”? A January 30 interview with Arne Duncan, our new Secretary of Education, by CNN’s Campbell Brown is revealing. She asked him about the innovative program in Chicago that pays students for better grades. Here is how the conversation went:
Duncan: ...in every other profession we recognize, reward and incent excellence. I think we need to do more of that in education.
CNN: For the students specifically, you think money is the way to do that? It’s the best incentive?
Duncan: I don’t think it is the best incentive; I think it’s one incentive. This is a pilot program we started this fall, so it’s very early on. But so far the data is very encouraging—so far the students’ attendance rates have gone up, students’ grades have gone up, and these are communities where the dropout rate has been unacceptably high, and whatever we can do to challenge that status quo. When children drop out today, Campbell, as you know, they are basically condemned to social failure. There are no good jobs out there, so we need to be creative; we need to push the envelope. I don’t know if this is the right answer. We’ve got a control group.
CNN: But is it something that you would like to try across the country, to have other schools systems adopt?
Duncan: Again, Campbell, this is...we are about four months into it in Chicago. We have a control group where this is not going on, so we’re going to follow what the data tells us. And if it’s successful, we’ll look to expand it. If it’s not successful, we’ll stop doing it. We want to be thoughtful but I think philosophically I am pro pushing the envelope, challenging the status quo, and thinking outside the box...
Read more from the interview here.
Notice that he is calling for innovation: “pushing the envelope, challenging the status quo, thinking outside the box.” But he is not divorcing innovation from rigorously controlled effectiveness research. He is also looking at preliminary findings, such as changes in attendance rates, that can be detected early. The “scientifically based” research is built into the innovation’s implementation at the earliest stage. And it is used as a basis for decisions about expansion.
While there may be reason to increase funding for development, we can’t divorce rigorous research from development; nor should we consider experimental evaluations as activities that kick in only after an innovation has been fielded. We are skeptical that there is a real conflict between scientific research and innovation. The basic problem for research and development in education is not too much attention to rigorous research but too few resources going into education overall. The graphic included in the Ed Week story makes clear that the R&D investment in education is minuscule compared with R&D for the military, energy, and health sectors (health gets 100 times as much as education, and the category “other” gets 16 times as much).
The Department of Defense, of course, gets a large piece of the pie, and the Defense Advanced Research Projects Agency (DARPA) is often held up as an example of a federal agency devoted to innovative, often futuristic requirements. The Internet started as one such engineering project, the ARPANET. Researchers needed flexible access to information on computers situated around the country, and the innovative distributed approach turned out to be massively scalable and robust. Although research is not always visible as a continuous part of engineering development, every step was in fact an experiment. While testing an engineering concept may take a few minutes of data collection, testing in education is more cumbersome. Cognitive development and learning take months or years to generate measurable gains, and experiments need careful ways to eliminate confounders that rarely trouble engineering projects. Education studies are also less amenable to clever technical solutions (although the vision of something as big as the Internet coming in to disrupt and reconfigure education is tantalizing).
It is always appealing to see the pendulum swing with a changing of the guard. In the current transition, what is emerging looks more like a synthesis in which research and development are no longer treated as separate, and certainly not seen as competitors.
— DN