The question came up at the recent workshop held in Washington, DC, for school district researchers to learn more about rigorous program evaluation: “Why is the strongest research design often the hardest to make happen?” There are very good theoretical reasons to use randomized control when trying to evaluate whether a school district’s instructional or professional development program works. What we want to know is whether involving students and teachers in a program will produce better outcomes than if those same students and teachers were not involved in it. The workshop presenter, Mark Lipsey of Vanderbilt University, pointed out that if we had a time machine we could observe how well the students and teachers achieved with the program, then go back in time, withhold the program (creating the science-fiction alternate universe), and watch how they did without it. We can’t do that, so the next best thing is to find a group that is just like the one with the program and see how they do. By choosing who gets the program and who doesn’t from a pool of volunteer teachers (or schools) using a coin toss (or another random method), we can be sure that self-selection had nothing to do with group assignment and that, at least on average, the only difference between the two groups is that one won the coin toss and the other didn’t. Most other methods introduce potential bias that can change the results.
Randomized control can work where the district is doing a small pilot and has only enough materials for some of the teachers, where resources call for a phased implementation starting with a small number of schools, or where slots in a program are going to be allocated by lottery anyway. To many people, the coin toss (or other lottery method) just doesn’t seem right. Any number of other criteria could be suggested as a better rationale for assigning the program: some students are needier, some teachers may be better able to take advantage of it, and so on. But the whole point is to avoid exactly those kinds of criteria and make the choice entirely random. The coin toss itself highlights the decision process, creating a concern that it will be hard to justify, for example, to a parent who wants to know why his kid’s school didn’t get the program.
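For readers who want to see what the lottery amounts to in practice, here is a minimal sketch in Python. The teacher IDs and function names are hypothetical, and the second function illustrates the within-pair coin toss used when similar teachers are matched up first (a pairing method discussed below); this is an illustration of the general technique, not any district's actual procedure.

```python
import random

def randomize(volunteers, seed=None):
    """Split a pool of volunteer teachers (or schools) into a
    treatment group and a control group purely at random.

    Shuffling the pool and taking alternating halves is equivalent
    to a fair lottery: self-selection plays no role in assignment,
    and group sizes stay balanced.
    """
    rng = random.Random(seed)
    pool = list(volunteers)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (treatment, control)

def pair_randomize(pairs, seed=None):
    """For teachers matched into similar pairs, toss a coin within
    each pair: one member gets the program, the other serves as the
    comparison. Pairing tends to sharpen the impact estimate."""
    rng = random.Random(seed)
    treatment, control = [], []
    for a, b in pairs:
        if rng.random() < 0.5:
            treatment.append(a)
            control.append(b)
        else:
            treatment.append(b)
            control.append(a)
    return treatment, control

# Hypothetical pool of eight volunteer teachers.
teachers = ["T01", "T02", "T03", "T04", "T05", "T06", "T07", "T08"]
treatment, control = randomize(teachers, seed=42)

# The same pool matched into similar pairs, then coin-tossed.
pairs = [("T01", "T05"), ("T02", "T07"), ("T03", "T04"), ("T06", "T08")]
paired_treatment, paired_control = pair_randomize(pairs, seed=7)
```

Either way, every teacher had the same chance of assignment, which is exactly the property that criteria like "neediest students first" would destroy.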
Our own experience with random assignment has not been so negative. Most districts will agree to it, although some do refuse on principle. When we begin working with the teachers face-to-face, there is usually camaraderie about tossing the coin, especially when it is between two teachers paired up because of their similarity on characteristics they themselves identify as important (we’ve also found this pairing method gives us more precise estimates of the impact). The main problem we find with randomization, when it is used as part of a district’s own local program evaluation, is the pre-planning it requires. Typically, decisions about which schools get the program first or which teachers will pilot it are made before any consideration is given to a rigorous evaluation. In most cases, the program is already in motion, or the pilot is coming to a conclusion, before the evaluation is designed. At that point, the best available method is to find a comparison group from among the teachers or schools that were not chosen or did not volunteer for the program (or to look outside the district for comparison cases). The prior choices introduce selection bias that we can attempt to compensate for statistically; still, we can never be sure our adjustments eliminate it. In other words, in our experience the primary reason randomization is harder than weaker methods is that it requires the evaluation design and the program implementation plan to be coordinated from the start. —DN