The claims in the current research literature provide existence proofs that interventions can affect a particular outcome, such as deepening teachers' content knowledge. The efficacy of particular interventions, or sets of interventions, was not a subject of most studies, and the broad generalizability of the effects and the replicability of the interventions were not generally examined.
For example, of 32 studies of interventions intended to deepen teachers' content knowledge in mathematics or science, all but one found positive effects. Although most of these studies used a pre-post design to measure changes in teachers' content knowledge or traced changes in that knowledge over time, studies rarely used comparison groups of teachers who did not participate in the professional development programs, lessening the confidence that can be placed in their results as evidence of efficacy.
In many of these studies the interventions were delivered by their developers to teacher participants who were volunteers committed to fairly extensive interventions. The generalizability of the findings must be considered in this light: the populations that the participating teachers represent are limited to those willing and able to commit to such extensive interventions. Replicability of the interventions by other providers or in other contexts was generally not investigated. Some studies documented the interventions well enough that other providers could design and deliver similar programs, while others did not. Even in those studies that thoroughly documented the interventions, important aspects of the programs' design or implementation may have gone undescribed, whether because of space limitations or because the researchers assumed certain aspects did not need to be documented.
The claims in the current research literature document positive effects resulting from whole programs of intervention. The effects or contributions of specific strategies within interventions were not typically examined empirically.
For example, nearly all of the studies of interventions aimed at deepening teachers' content knowledge in mathematics or science were designed like program evaluations, studying the effects of teachers' participation in whole programs. Neither systematic variations in treatment nor naturalistic variations in participation were examined to tease out the contributions or effects of particular strategies for deepening teachers' content knowledge. Moreover, in most studies the instruments for assessing changes in teachers' content knowledge were developed by the investigators specifically for the intervention and the study. Such instruments were appropriately intended to align with the goals of each intervention, but the diversity of instruments makes it difficult to combine and compare results. These instruments likely differ in the aspects of teachers' content knowledge that they assess, as well as in their measurement properties. As a consequence, it is difficult to accumulate evidence across studies to assess the effects of programs that share, or do not share, particular strategies, and difficult to compare the effectiveness of different programs in addressing specific aspects of teachers' content knowledge.