Research

The Danielson Group seeks to share research studies involving the Framework. We maintain a strong interest in encouraging independent research in support of quality professional development, process improvements, and significant teaching outcomes.

We are interested in hearing about unpublished research studies involving the FfT, such as dissertations or local impact studies. If you know of such studies, please contact us at contact@danielsongroup.org.


2016: Teachers’ responses to feedback from evaluators: What feedback characteristics matter?

Study title/author(s)

Teachers’ responses to feedback from evaluators: What feedback characteristics matter? conducted by Trudy L. Cherasaro, R. Marc Brodersen, Marianne L. Reale, and David C. Yanoski. (2016)

Purpose of study

To support efforts to increase teacher effectiveness by examining how teachers value and use different aspects of the feedback they receive via teacher evaluation systems.

Research questions

  • What are teachers’ perceptions of the usefulness and accuracy of the feedback they received as part of their evaluation system, and what are their perceptions of their evaluator’s credibility and their access to resources related to the feedback?
  • How are the perceptions of the usefulness of feedback, the accuracy of feedback, evaluator credibility, and access to resources interrelated?
  • How are the usefulness of feedback, the accuracy of feedback, evaluator credibility, and access to resources related to the response to feedback?

Population/ sample

The researchers used a sample of teachers in seven districts across two states who volunteered to participate. It consisted of 317 pre-K through 12th grade teachers being evaluated with the district’s new teacher evaluation system. These teachers taught both core (e.g., English, math) and non-core (art, physical education) subjects in urban locales, rural locales, and small towns. Over three-quarters (76.7%) of these teachers responded to the Examining Evaluator Feedback Survey (Cherasaro et al., 2015), which asked teachers to reflect on the feedback they had received from evaluators during the 2014-15 school year.

Major results

  • Teachers received written and verbal feedback one to four times during the 2014-15 school year.
  • Most teachers agreed that their evaluator was credible (74%) and that the feedback they received was accurate (70%) and useful (55%). Two-thirds (66%) said that their feedback included specific suggestions for improvement. Similar proportions agreed or strongly agreed that they had access to resources related to the feedback they received (54%) and that they had responded to this feedback (60%).
  • Evaluator credibility was the most important characteristic affecting teachers’ responses to feedback. Feedback that was timely, accurate, based on observational data from a typical day in the classroom, and accompanied by sufficient time to plan for implementation was also rated as very important or critical to teachers’ responses.
  • Teachers’ responses to feedback were strongly correlated with the extent to which teachers found the feedback to be useful, which in turn was strongly correlated with the credibility of their evaluators, which was strongly correlated with the perceived accuracy of the feedback.
  • After the relationship between evaluator credibility and usefulness was accounted for statistically, accuracy had little additional influence on usefulness. In addition, the relationship between access to resources and teachers’ responses to feedback was not significant after the perceived usefulness of the feedback was statistically controlled.

Conclusions/recommendations

Findings suggest that state and district education leaders take several steps to improve the effectiveness of teacher evaluation systems:

  • Review evaluator training on how to provide feedback to teachers with an emphasis on strengthening the usefulness of the feedback.
  • Examine policies related to the usefulness of teacher feedback and/or collect data on potential barriers to providing useful feedback.
  • Identify ways to ensure that feedback is frequent, timely, and includes specific suggestions to improve content knowledge, instructional strategies, and classroom management.
  • Direct teachers to targeted resources or professional development opportunities.
  • Highlight suggestions for improving content knowledge and classroom management strategies: more than half of teachers said such suggestions were important to their response to feedback, but fewer than half received them.
  • Focus on ways to build evaluator credibility—although most teachers agreed that their evaluator was credible, their perceptions of the evaluator’s credibility were significantly related to the usefulness of the feedback and to their response to it.
  • Improve evaluator credibility by building evaluators’ knowledge of the content being evaluated, knowledge of how students learn, knowledge of teaching practices, understanding of the curriculum, and understanding of the teacher evaluation system.

FFT focus

The teacher evaluation systems examined in this study used observational protocols based on the Framework for Teaching, so the findings regarding the selection and training of evaluators apply to others using this instrument in their teacher evaluation systems.

VIEW REPORT

2014: Teaching to the Core: Practitioner perspectives about the intersection of teacher evaluation using the Danielson framework for teaching and Common Core State Standards

Study title/author(s)

Teaching to the Core: Practitioner perspectives about the intersection of teacher evaluation using the Danielson framework for teaching and Common Core State Standards, conducted by Caitlin K. Martin and Véronique Mertl. (2014)

Purpose of study

To highlight findings from a study that paired use of the Framework for Teaching (FfT) with Common Core-aligned Instructional Practice Guides (IPG) and to inform decisions about modifications to the FfT to incorporate the Common Core State Standards (CCSS).

Research questions

  • How are the goals of the two instruments (FfT and IPG) aligned with each other, and how do these goals correspond to practitioners’ perspectives on what is important about the CCSS?
  • What are practitioners struggling with in terms of adoption of and alignment with CCSS teaching and understanding?
  • To what extent are the two instruments (FfT and IPG) capturing critical information around the CCSS, and what more is needed to capture all critical information?
  • What are the general patterns of use of the instruments and observation practices? To what extent can the process for conducting teacher observations using the FfT be streamlined to allow for a more focused effort at measuring the core elements of instruction related to the CCSS?
  • Are there differences in use, attitude, or suggestions by district, school level, or evaluation role (observer compared to teacher)?

Population/ sample

413 teachers, school administrators, and district administrators from four US districts in four states (Connecticut, Illinois, New York, and Nevada) participated in teaching observations, surveys, and focus groups to provide feedback on the observation processes and tools. Sixteen teacher and observer pairs (4 per district) participated in case studies that included shadowing and interviews.

Major results

  • The data showed that almost all participants (91%) believed that the FfT effectively evaluated teaching practice. A similar majority (87%), however, felt that the Framework should change to better reflect the CCSS by making explicit connections between the FfT elements and Common Core practices to support evaluation, coaching, and practice. Teachers and administrators found the IPG to be an “enhanced lens” to see Common Core practices in Domain 3 of the FfT.
  • Qualitatively, practitioners did not want multiple versions of the FfT but rather digital guides that could be customized. The most commonly requested customization was examples: video examples, examples specific to subjects and grade levels, examples of annotated lesson plans, and examples of student moves at each level of proficiency. Incorporating language from the CCSS into the FfT and highlighting connections between them were also common suggestions from study participants. Teachers and administrators connected the CCSS with Domains 1 and 3 of the FfT and desired examples and “critical attributes” of CCSS-informed practice with student descriptors for these domains. Participants often incorporated the CCSS practices that they found most challenging into the FfT. These practices included:

1. students develop a deep understanding
2. students persist through cycles of work and revision
3. teacher selection of complex texts and problems
4. student and teacher questioning leads to deeper understanding
5. students use evidence to construct arguments
6. students develop academic language
7. student independence and research

  • While most teachers and administrators believed that the FfT captured effective teaching practice, only 60% believed that it was able to assess content knowledge. Despite this, only one fifth (21%) thought that the FfT should be enhanced with a greater focus on subject-matter knowledge, preferring to assess content knowledge independently of observations by a generalist observer.
  • Finally, many schools and districts reduced the number of FfT components of focus for observations due to time constraints and other challenges, suggesting that changes to expand the FfT or make it more complex may be more challenging than supportive.

Conclusions/recommendations

The authors did not make recommendations separate from the research highlights and participant suggestions noted throughout the report. The primary recommendation in regard to modifications to the FfT was to develop digital guides with customized examples by subject area and grade level to support teachers and administrators in specifying levels of proficiency in various teaching contexts.

FfT focus

The study collected detailed practice information and suggestions from practitioners in both teacher and observer roles about connections between the FfT and CCSS practices that can be further developed within the Framework to better support teachers in improving their instructional practice.

SUMMARY REPORT

2014: Teacher Evaluations in an Era of Rapid Change: from “Unsatisfactory” to “Needs Improvement”

Study title/author(s)

Teacher Evaluations in an Era of Rapid Change: from “Unsatisfactory” to “Needs Improvement,” conducted by Chad Aldeman and Carolyn Chuong. (2014)

Purpose of study

To examine what can be learned from efforts to revise teacher evaluation systems between 2010 and 2014 by synthesizing data from 17 states and the District of Columbia.

Research questions

Rather than specifying research questions, the authors reviewed the teacher evaluation data to identify five major trends.

Population/ sample

An appendix table (pp. 31-32) specifies the data available for each included state and Washington, DC. Available data varied by the school personnel included, years, level of data (e.g., district- versus school-level data), source, inclusion of student growth or learning outcomes, and participation (e.g., pilot versus full implementation). All sites collected data on teachers, and almost 75% involved principals.

Major results

  • Districts have made progress in differentiating between multiple levels of teaching performance, rather than painting all teachers as “satisfactory” or “unsatisfactory.” This greater differentiation can identify educators who would benefit from targeted support.
  • The use of high-quality observational rubrics, such as the Framework for Teaching, provides teachers with more specific, constructive, and timely feedback on their classroom practice. Both principals and teachers have had positive reactions to the observations, reporting more time spent on teacher observation and reflection and more useful feedback on practice.
  • State policy changes have not convinced districts to factor student learning growth into teacher evaluation ratings. Three reactions to policies have included refusing to factor student growth into teacher evaluations, delaying incorporation of student learning into teacher evaluation scores, and obscuring student growth through idiosyncratic district implementation and post-hoc “upgrades” to underperforming teachers.
  • Districts have broad discretion to implement teacher evaluation policies under statewide guidelines, leading to substantial variation in practices between districts in the same state. Districts can choose how components of teacher evaluations are scored, compiled, and weighted in final teacher evaluation ratings. One result of this flexibility is broad differences in teacher evaluation scores between districts in the same state.
  • Districts rarely use teacher evaluation data to make consequential decisions about teacher promotion, compensation, or dismissal. Instead, credentials and seniority, rather than classroom performance, often determine who receives tenure, promotions, and pay raises. Dismissals continue to be very rare.

Conclusions/recommendations

The authors make four recommendations:

  • States should collect and publicly report teacher evaluation ratings, including the component elements that make up those ratings, and how ratings are used to drive personnel decisions, to promote transparency and accountability.
  • States should work closely with districts to understand the causes of outcomes and variation between districts and ensure that evaluations are consistently rigorous across schools and classrooms.
  • States should not stop or slow reforms in teacher evaluations before these new policies have a chance to take effect.
  • States should expect teacher evaluation reforms to co-exist with other educational reforms, such as the Common Core State Standards.

FFT focus

The review notes that the observational rubrics used for teacher evaluations in Arkansas, Delaware, Florida, Idaho, Illinois, New Jersey, New York, South Dakota, Washington, Cincinnati, Los Angeles, and Pittsburgh are based on Danielson’s Framework for Teaching, a “research-backed” protocol.

VIEW REPORT

2013: The reliability of classroom observations by school personnel

Study title/author(s)

The reliability of classroom observations by school personnel, conducted by Andrew Ho and Thomas Kane. (2013)

Purpose of study

To evaluate the accuracy and reliability of school personnel in performing classroom observations.

Research questions

  • What characteristics of lessons and observers are associated with accurate and reliable observation scores?
  • Do lessons chosen by teachers (a proxy for notification prior to an observation) earn higher observation scores than those chosen at random?
  • Are observation scores based on short segments of instruction (15 minutes) comparable to observation scores based on an entire lesson?
  • Does the order in which various lessons and segments are observed influence scores?

Population/ sample

Video recordings of eight lessons (four chosen by the teachers and four chosen at random) by 67 teachers in Hillsborough County, Florida, were observed and scored by 53 administrators and 76 teacher peers. The teachers and administrators represented 32 schools at the elementary, middle, and high school levels. Each of 129 observers rated four video-recorded lessons by six teachers (24 lessons) on 10 items from Domains 2 (Classroom Environment) and 3 (Instruction) of Danielson’s Framework for Teaching (FFT), yielding more than 3,000 video scores. Each teacher’s instruction was scored an average of 46 times by different types of observers.

Major results

  • Observers rated five percent of lessons as “unsatisfactory” and only two percent “advanced.” Most scores were in the middle two categories, “basic” and “proficient.” Due to the compressed scale, a .1 point score difference could move a teacher up or down 10 points in percentile rank.
  • Administrator scores differentiated more among teachers than scores assigned by peer teachers. The standard deviation among teacher scores was 50 percent larger when scored by administrators than when scored by peers.
  • Administrators rated their own teachers .1 points higher than administrators from other schools and .2 points higher than peers. While this difference seems small in value, it was magnified by the compressed scale of scores noted above.
  • Although administrators scored their own teachers higher, they ranked their teachers similarly to administrators from other schools. This implies that administrators’ rankings were not driven by prior impressions, favoritism, or personal bias. The correlation between same-school and different-school administrators’ scores for a given teacher was .87.
  • Allowing teachers to choose their own videos generated higher average scores, but their ranking remained the same as when videos were not chosen. That is, choosing their own videos did not mask the differences in their practice. In fact, variance in observation scores was greater among self-selected lessons.
  • A positive (or negative) impression of a teacher in the first several videos tended to linger—especially when one observation immediately followed the other.
  • Using multiple observers improves reliability. The reliability of a single observation by a single observer ranged from .27 to .45. One observation by an administrator and another by an external peer increased reliability to .59, and additional observations and observers increased it further, up to .72 (a rough illustration of this averaging effect follows this list).
  • The cost of involving multiple observers can be mitigated by supplementing observations of full lessons with shorter observations. The reliability of a 15-minute observation was 60 percent that of a full lesson observation and required less than one-third of the time.
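
The reliability gains from adding observations and observers, noted in the last two results, follow the familiar pattern of averaging noisy measurements. As a rough illustration only (the report does not state that this formula was used, and the 0.35 starting value is an assumed midpoint of the reported .27 to .45 range), the Spearman-Brown prophecy formula from classical test theory projects how reliability grows as parallel observations are averaged:

```python
# Illustrative sketch only: the study does not report using Spearman-Brown, and
# the 0.35 single-observation reliability is an assumed mid-range value.

def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the average of k parallel observations,
    each with single-score reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)

r1 = 0.35  # assumed midpoint of the reported .27 to .45 range for one observation
for k in (1, 2, 4, 6):
    print(f"{k} observation(s): projected reliability ~ {spearman_brown(r1, k):.2f}")
# Prints roughly 0.35, 0.52, 0.68, 0.76 -- the same diminishing-returns pattern
# as the reported gains to .59 (two observers) and up to .72 (more observations).
```

The point of the sketch is the shape of the curve rather than the exact values, since different observer types in the study contributed different amounts of error.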

Conclusions/recommendations

Findings highlight the importance of multiple observations by multiple observers to achieve an acceptable level of reliability, especially upon which to base consequential decisions.

Variance was also found based on inconsistencies between observers, which highlights the importance of training, certification tests, and regular calibration. To ensure a fair and reliable system for teachers, districts should employ multiple observers and set up a system to check and compare feedback given to teachers by different observers.

Findings also suggest that the element of surprise in teacher evaluations does not increase reliability or accuracy, but it does heighten anxiety and the impression that evaluations are primarily concerned with teacher accountability. In fact, self-selected lessons showed greater variation among teachers.

Finally, the observation instrument (the Framework for Teaching) did not discern large absolute differences in practice. Most teachers fell in the middle of the scale, so small differences in scores translated into large differences in percentile rankings. This may reflect observers’ reluctance to make sharp distinctions among teachers, a need for finer distinctions in performance level standards, or a lack of variance in teaching practice on the scales used in this study. The authors assert that the field needs instruments that allow for clearer distinctions, such as content-based observational protocols, those that assess subject-specific pedagogical best practices, or instruments for assessing instruction on specific standards in the Common Core State Standards.

FFT focus

The teacher observations in this study scored lessons on 10 items from Domains 2 and 3 of the Framework for Teaching. Analyses support the reliability and validity of the FFT to differentiate teaching practice, especially under the conditions recommended.

VIEW REPORT

2012: Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains

Study title/author(s)

Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains, conducted by the Bill & Melinda Gates Foundation. (2012)

Purpose of study

To test the value of five different instruments for conducting teacher observations by comparing them on reliability and a range of student outcomes, such as gains on achievement tests and self-reported effort and enjoyment in class. The five instruments were:

  • Framework for Teaching (or FFT, developed by Charlotte Danielson of the Danielson Group),
  • Classroom Assessment Scoring System (or CLASS, developed by Robert Pianta, Karen La Paro, and Bridget Hamre at the University of Virginia),
  • Protocol for Language Arts Teaching Observations (or PLATO, developed by Pam Grossman at Stanford University),
  • Mathematical Quality of Instruction (or MQI, developed by Heather Hill of Harvard University), and
  • UTeach Teacher Observation Protocol (or UTOP, developed by Michael Marder and Candace Walkington at the University of Texas-Austin).

Research questions

  • Is it possible to describe high- and low-quality questioning techniques sufficiently clearly so that observers can be trained to recognize strong questioning skills?
  • Would different observers come to similar judgments?
  • Are those judgments related to student outcomes measured in different ways?

Population/ sample

Almost 7,500 videos of the classroom practice of 1,333 teachers from Charlotte-Mecklenburg, Dallas, Denver, Hillsborough County, New York City, and Memphis were observed and rated at least 3 times by trained observers. This group of teachers represents the subset of MET project volunteers who taught math or English language arts (ELA) in grades 4 through 8 and who agreed to participate in random assignment during year 2 of the project. Each video-recorded lesson was scored using each of the cross-subject instruments, CLASS and the FFT, and a third time using one of the subject-specific instruments, either MQI or PLATO. A subset of 1,000 videos of math lessons was scored a fourth time with the UTOP instrument. Data on state test scores, supplemental tests, and student surveys from more than 44,500 students were included in analyses.

Major results

  • All five observation instruments were positively associated with student achievement gains.
  • Reliably characterizing a teacher’s practice required averaging scores over multiple observations.
  • Combining observation scores with student achievement gains and survey data improved reliability and predictive power. When teachers were ranked on the combined measure, the difference between having a top- and bottom-quartile teacher was nearly 8 months in math and 2.5 months in ELA.
  • Teaching experience and graduate degrees were associated with much smaller gains in state test scores than the combined measure.
  • Teachers with strong performance on the combined measure also performed well on other student outcomes—the tests of conceptual understanding and student self-reported levels of effort and enjoyment in class.

Conclusions/recommendations

The authors highlight three implications of their findings:

  • Achieving high levels of reliability of classroom observations will require quality assurances such as observer training and certification, system-level “audits” by a second set of observers, and multiple observations to improve reliability, especially when stakes are high.
  • Evaluation systems should include multiple measures, not just value-added scores or classroom observations.
  • Classroom observations have the potential to identify specific strengths and weaknesses in teachers’ practice. Schools should look for ways to use observations for teacher development.

FFT focus

The Framework for Teaching was one of the five observational instruments assessed for reliability and association with student outcomes. The researchers found that the FFT was related to student achievement growth on state test scores, tests of conceptual understanding, and student surveys of effort and enjoyment in the classroom. Several subsequent analyses focused on the FFT as a component of the combined measure.

VIEW REPORT

2011: Rethinking teacher evaluation in Chicago: Lessons learned from classroom observations, principal-teacher conferences, and district implementation

Study title/author(s)

Rethinking teacher evaluation in Chicago: Lessons learned from classroom observations, principal-teacher conferences, and district implementation, conducted by Lauren Sartain, Sara Ray Stoelinga, and Eric R. Brown with the assistance of Stuart Luppescu, Kavita Kapadia Matsko, Frances K. Miller, Claire E. Durwood, Jennie Y. Jiang, and Danielle Glazer. (2011)

Purpose of study

To summarize findings from Chicago’s Excellence in Teaching Pilot, which aimed to improve instruction by providing teachers with feedback on their strengths and weaknesses, and highlight broad implications for districts and states working to design and develop more effective teacher evaluation systems.

Research questions

  • What are the characteristics of principal ratings of teaching practice?
    • Do evaluators rate the same lesson in the same way? Do principals rate teaching practice consistently across schools?
    • Are the classroom observation ratings valid measures of teaching practice? Is there a relationship between ratings and student learning outcomes?
  • What are principal and teacher perceptions of the evaluation tool and conferences?
    • Do participants find the system to be useful? To be fair?
    • What is the perceived impact on teacher practice?
  • What factors facilitated or impeded implementation of the teacher evaluation system?

Population/ sample

Randomly selected teachers and administrators at half of the elementary schools in 4 areas of Chicago participated in the first year of the study (2008-09), while the others joined the following year. Sample sizes and participants varied by aspect of the study: 499 observations of 257 teachers were made by principals and highly trained external observers to assess the reliability of FfT ratings, while principals made 955 observations of 501 teachers to assess the validity of observations. Teacher value-added scores were calculated for 417 reading teachers and 340 math teachers to determine the relationship between observational assessments and student learning. Thirty-seven pilot and control principals completed surveys to determine their engagement. All pilot schools also completed principal interviews (39), teacher interviews (26), and principal focus groups (23). A subset of 8 case study schools provided additional in-depth administrator interviews, teacher focus groups, and observations.

Major results

  • The data showed a strong relationship between classroom observation ratings on the FfT and value-added measures of student learning growth in both reading and math. The students of highly rated teachers showed the most growth in their test scores, while students of teachers with low FfT ratings showed the least growth. These results support the validity of observational ratings of teaching practice using the FfT.
  • In terms of score reliability, most principals assigned the same ratings to observed teachers as highly trained external observers, although small percentages consistently rated teachers lower (11%) or higher (17%) than the external observers. Administrators tended to rate teaching practice reliably at the low end of the scale (Unsatisfactory and Basic) but rated teachers’ practice as Distinguished more often than observers.
  • Some principals struggled to learn and engage with the process of using the FfT to observe and rate their teachers’ practice.
  • Qualitatively, administrators and teachers thought the FfT helped lead to more reflective, evidence-based discussions about teaching practice during post-observation conferences. The FfT provided a shared language about instructional practice and improvement that guided conversations.
  • The effectiveness of conferences varied, however, depending on the principal’s skills and buy-in to the evaluation process. Many principals struggled to engage in deep coaching conversations, instead dominating conversations and/or using low-level questions that required minimal responses and didn’t push teachers’ understanding.

Conclusions/recommendations

Classroom observations using the Danielson FfT can provide valid and reliable assessments of teaching practice while also assessing specific strengths and weaknesses. Use of the FfT helped to guide more meaningful conversations about instruction. While most principals were engaged in the evaluation process, there is a need for more training and support for deep coaching conversations that translate ratings on an instructional rubric into improved performance in the classroom. Another challenge is the time and energy required to conduct evidence-based evaluation well, from analyzing observations to preparing for thoughtful conferences. Both teachers and administrators must make a long-term commitment to using evaluation to promote teacher development and improve practice in the classroom.

FfT focus

The study validated the FfT as being associated with teacher effectiveness as measured by student growth on test scores in reading and math and established that principals can rate teachers reliably as compared with highly trained external observers. Use of the evidence-based rubric provided a common language for discussions about instructional improvement and fostered more objective and meaningful conversations. However, the study also raised some concerns about use of the FfT as a tool for teacher development and instructional improvement, including principal buy-in and engagement, political concerns about teacher ratings, time constraints, and the need for support for administrators to translate FfT ratings into deep coaching for their teachers.

VIEW REPORT

 

2011: Identifying effective classroom practices using student achievement data

Study title/author(s)

Identifying effective classroom practices using student achievement data, conducted by Thomas J. Kane, Eric S. Taylor, John H. Tyler, & Amy L. Wooten. (2011)

Purpose of study

To estimate the extent to which observational measures of teaching effectiveness are related to student achievement growth and which observable practices best predict achievement gains.

Research questions

  • Do evaluations based on classroom observations identify teachers who are effective based on the test score gains of their students?
  • Which teaching practices best predict student achievement growth in mathematics and reading?

Population/ sample

The researchers used teacher observation and student achievement data from the Teacher Evaluation System (TES) in Cincinnati, Ohio during the 2003-04 and 2008-09 school years. Teacher observations based on Charlotte Danielson’s Framework for Teaching (FFT) were conducted four times during each teacher’s observation year, focusing on Domain 2 “Creating an Environment for Learning” and Domain 3 “Teaching for Learning,” which include over two dozen specific teaching practices grouped into 8 standards of high-quality teaching. Overall, data from 207 math teachers and 16,196 of their students were included in the estimation sample, along with data from 365 reading teachers and their 20,125 students. A robust set of covariates was included to control for teacher and student characteristics.

Major results

  • Observable teaching practices predict differences in student achievement growth in both reading and mathematics. An increase of one point in overall classroom practices, a composite score based on all eight teaching standards in Domains 2 & 3, equally weighted, corresponded with an increase of 1/7 of a standard deviation (SD) in reading achievement and 1/10 of an SD in math achievement (a back-of-the-envelope illustration follows this list).
  • Among students whose teachers have similar overall classroom practices scores, math achievement will grow more for students whose teachers are relatively better at classroom management (Domain 2) while reading achievement will increase more among students whose teachers are relatively better at engaging students in questioning and discussion (Domain 3, Standard 4).
  • Models focusing on the effectiveness of teachers in the years before and after their TES observations suggested that improvements to instructional practice as a result of participation were strongest for beginning teachers and promoted skills in classroom management especially.
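
As a back-of-the-envelope reading of the first result above, the reported effect sizes can be treated as simple linear slopes. The sketch below paraphrases the summary figures only; it is not the study’s covariate-adjusted regression, and the note about a one-point rubric step spanning more than 2 SD comes from the authors’ conclusions.

```python
# Back-of-the-envelope sketch: the betas paraphrase the reported summary effect
# sizes, not the study's covariate-adjusted regression coefficients.
BETA_READING = 1 / 7   # student-level SD gain per 1-point rise in the composite
BETA_MATH = 1 / 10

def predicted_gain_sd(delta_composite: float, beta: float) -> float:
    """Predicted change in student achievement (in student-level SD) for a
    change in the eight-standard overall classroom practices composite."""
    return beta * delta_composite

# A full rubric step (e.g., "Proficient" to "Distinguished") is one point, but
# the authors note it spans more than 2 SD of the teacher-score distribution,
# so a one-point jump is rare in practice.
print(f"reading: +{predicted_gain_sd(1.0, BETA_READING):.2f} SD per point")  # ~0.14
print(f"math:    +{predicted_gain_sd(1.0, BETA_MATH):.2f} SD per point")     # ~0.10
```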

Conclusions/recommendations

Differences in student achievement based on teachers’ scores in overall classroom practices indicate that efforts to improve skills on all standards measured by the TES will benefit students. Concrete suggestions for improvement can be drawn from the descriptions of “Basic,” “Proficient,” and “Distinguished” classroom practice on all 8 standards of the TES. However, a one-point increase in teacher performance, from “Proficient” to “Distinguished,” for example, represents an increase of more than 2 SD, suggesting that a one-point improvement is not as easy in practice as the rubric’s single step might imply.

Teachers who must focus on a smaller number of practices for improvement due to limited time or professional development opportunities may want to focus on classroom management if they teach mathematics and on asking thought-provoking questions and engaging students in discussion if they teach reading, as results suggest those practices have the greatest impact in their respective subject areas. The TES describes specific practices that contribute to teachers’ scores, thus providing details that teachers can use to improve their effectiveness at raising student achievement.

FFT focus

The TES observational rubric is closely based on the FFT, suggesting that similar results (and benefits) would follow from using the FfT to observe teachers’ classroom practice. This study validates that the 8 instructional standards of Domains 2 & 3 of the FFT predict improvements in student achievement and provides preliminary evidence that particular practices are more or less important for promoting student achievement in particular subject areas.

This article is only available through the journal in which it was published: Journal of Human Resources, 46, 587-613 (July 1, 2011).

 

2009: Examining teacher evaluation validity and leadership decision-making within a standards-based evaluation system

Study title/author(s)

Examining teacher evaluation validity and leadership decision-making within a standards-based evaluation system, conducted by Steven Kimball and Anthony Milanowski. (2009)

Purpose of study

To examine the validity of principals’ teaching evaluation ratings and determine whether differences in decision making contribute to the differential validity observed in these ratings. [Validity was defined by the relationship between evaluation ratings and value-added measures of student achievement.]

Research questions

  • How much does the validity of the performance rating relationship vary across evaluators?
  • Are differences in evaluator decision making in a standards-based teacher evaluation system related to differences in the strength of the student achievement-performance rating relationship?

Population/ sample

The study was carried out in a large school district in the western United States comprising 88 schools, in which about 3,300 teachers educate more than 60,000 students. The district had implemented a teacher evaluation system based on Danielson’s (1996) Framework for Teaching several years earlier and thus had performance evaluation results for many teachers over several consecutive years. After exclusions for missing evaluation scores and student achievement data, a total of 5,683 students and 328 teachers were included in the analysis for the 2001-02 school year, and 9,873 students and 569 teachers were included in 2002-03. In 2001-02, 39 principals evaluated 5 or more teachers, and in 2002-03, 57 administrators evaluated 5 or more teachers. For the qualitative component of the study, 23 evaluators with more valid ratings (high correlations with teachers’ classroom average student achievement) or less valid ratings (very low and negative correlations with teachers’ classroom average student achievement) were selected for interviews. Additional analysis of interview transcripts and written feedback to teachers was conducted for eight evaluators with two consecutive years of high (average r=.55) or low (average r=-.28) validity ratings.
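
The “validity” grouping described above rests on a simple statistic: for each evaluator, the correlation between the ratings that evaluator gave their teachers and those teachers’ classroom-average student achievement. A minimal sketch of that calculation follows; the data and variable names are hypothetical, and the study’s value-added adjustments for student and teacher characteristics are not reproduced.

```python
# Minimal sketch of the per-evaluator "validity" correlation described above.
# Data and variable names are hypothetical; the study's value-added adjustments
# are not reproduced here.
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical example: one evaluator's ratings of five teachers and those
# teachers' classroom-average achievement (in SD units).
ratings = [2.1, 2.8, 3.0, 3.4, 3.9]
classroom_avg_achievement = [-0.3, 0.0, 0.1, 0.2, 0.5]

validity = pearson_r(ratings, classroom_avg_achievement)
print(f"evaluator validity r = {validity:.2f}")
# In the study, evaluators with r of about .31 or higher were treated as high
# validity and those with r below about -.10 as low validity.
```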

Major results

  • Evaluators varied substantially in the strength and direction of the relationship between their teacher ratings and the achievement of those teachers’ students. Approximately 30% of administrators’ ratings of teachers had correlations with average student achievement below –.10 (low validity), while over 40% of administrators gave teachers ratings that correlated with their students’ achievement at .31 or higher (high validity).
  • Differences in evaluators’ motivations, or will, to conduct teacher performance evaluations were not related to the validity of their ratings. All administrators reported positive attitudes about the evaluation system and emphasized its value for teacher development over accountability. Attitudes towards and perceived level of accuracy in teacher evaluations were also similar between high- and low-validity groups, despite comments that there was little oversight or consequences from the district. Reported compliance with the evaluation process was also high for both groups of evaluators, except for the supplemental evaluation form.
  • Administrators in the higher validity group noted the value of district trainings in helping to build their evaluation skills, while only one of the lower validity evaluators talked about training from the district. Lower validity evaluators said that their experience teaching, in administration, or in business, was most helpful in conducting evaluations.
  • Evaluators in the more valid group tended to talk about using rubrics in a more analytical way. However, administrators in each group mentioned factors that led them to adjust ratings toward more lenient assessments of teachers.
  • There were few differences in evaluator preparation between high- and low-validity groups, with most preparation focusing on preparing teachers for the process. Administrators in both groups tapped multiple, similar sources of evidence to conduct evaluations, most commonly classroom observations. Several evaluators in both the high- and low-validity groups used a very structured approach to teacher evaluation.
  • In terms of school environment, the socioeconomic status and prior achievement of students did not differentiate the high- from low-validity groups of evaluators. Neither did the teaching or administrative experience of the evaluators nor their perceived credibility to the teachers they evaluated. In all but one case, relationships between teachers and evaluators were positive.
  • Evaluators overwhelmingly emphasized the formative purposes of teacher evaluation and tended to be lenient, focusing on praise in written evaluations rather than constructive feedback or suggestions for improvement. This leniency may reduce the validity of the evaluations.

Conclusions/recommendations

The authors conclude that evaluator will, skill and evaluation context did not have a clear or consistent influence on the validity of their teacher evaluations, that is, on the extent to which the evaluation scores were related to student achievement. Possible interpretations of this finding included complex, idiosyncratic interactions between evaluator characteristics, evaluators’ reliance on gut-level feelings about teachers, a lack of expectations or support to conduct evaluations accurately (e.g., low levels of accountability, no follow-up training, no consequences for most teachers), and a lack of sufficient detail in the data to reveal differences in evaluator will, skill and context.

Findings suggest that generating evaluation scores that are highly related to student achievement will take more than specific rubrics and basic training of evaluators. The authors advocate for the development of a “strong situation” that provides incentives for accurate evaluation, oversight that emphasizes that ratings should differentiate among teachers, and ongoing training and practice with feedback on accurate evaluations. They also caution against the use of teacher evaluation scores in making high-stakes decisions about tenure, promotion, and compensation.

FFT focus

The teacher evaluation system in this district was based on the Framework, and the authors include detailed information about the instrument and its use in the article. The fact that some teachers’ scores on the Framework-based rubric and their students’ average achievement were highly correlated provides validation that the FFT describes observable practices that are associated with teaching that raises student achievement.

VIEW REPORT

2009: Excellence in Teaching Project

Study title/author(s)

Evaluation of the Excellence in Teaching pilot, year 1 report to the Joyce Foundation, conducted by Lauren Sartain, Sara Ray Stoelinga, and Eric Brown. (2009)

Purpose of study

To describe the initial results of a pilot teacher evaluation and improvement program aiming to provide a common definition of effective teaching, guide meaningful discussion and collaboration around teaching practice, and direct teacher development along a continuum that will help teachers have a greater impact on student learning.

Research questions

  • What are the technical properties—including reliability and validity—of the research tool itself?
  • How do principals perceive the utility of the new teacher evaluation system? Does it help achieve the stated goals?
  • How do teachers perceive the utility, fairness, and helpfulness of the new evaluation system? What components of the evaluation system are the most or least helpful? Are the pre- and post-conferences useful?
  • What supports are in place in the pilot year, are they effective, and how well are the goals and procedures communicated across the evaluation system?
  • Does the new system have the desired effect at the school level, including shaping the professional development, professional culture, teacher hiring, the quality of teaching, and student learning?

Population/ sample

Forty-four schools within 4 areas of the Chicago Public Schools district (approximately half of the schools within these areas) were randomly selected to participate in the 2008-2009 pilot of the new teacher evaluation system using the Danielson Framework for Teaching (FfT). Observations focused on beginning teachers and those who received low ratings under the prior checklist system. Administrators and external observers completed 277 matched observations of elementary and middle school teachers of English language arts, mathematics, science, social studies, and other subjects.

Major results

  • The FfT has the potential to identify strong and weak teachers reliably. Administrators would benefit from additional support and training on certain levels (Basic and Proficient) and components of the Framework, especially in Domain 3 (Instruction):
    • 3a Communicating with Students
    • 3c Engaging Students in Learning, and
    • 3d Using Assessment in Instruction.
  • Teachers tended to have more difficulty implementing the instructional aspects of the Framework than those related to classroom management. Teachers struggled most with components 3b (Using Questioning and Discussion Techniques) and 3c (Engaging Students in Learning). They received the highest ratings on components 2e (Organizing Physical Space) and 2a (Creating an Environment of Respect and Rapport).
  • Framework ratings were not used for summative teacher evaluation during the pilot year of the initiative, but district officials did consider setting the performance benchmark at Basic proficiency. This would increase the proportion of teachers classified as low-performing from 0.3% to 8%.
  • Principals and teachers expressed positive feedback regarding the quality of the Framework and its ability to measure teaching performance accurately.
  • Teachers were less positive about pre- and post-conferences associated with their classroom observations, citing concerns about the time commitment and implementation of conferences.
  • The majority of principals and teachers expressed positive opinions about the trainings they received on the FfT.
  • Principal buy-in was categorized along 4 themes:
    • Those who felt a “paradigm shift” in their ideas about evaluation and appreciated the Framework’s increased objectivity and attention to specific teaching skills;
    • Those with high enthusiasm who perceived strong teacher buy-in and substantial changes to classroom practice as a result of the new evaluation system;
    • Those who expressed mixed emotions about numerous initiatives underway and the labor-intensive nature of FfT observations; and
    • Those with low enthusiasm who felt they were already doing the right type of evaluation or that they “just knew” their teachers’ abilities, felt evaluations had little influence on instructional practice, and worked with low buy-in teachers.
  • Over half of principals reported high buy-in and positive perceptions of the new evaluation system despite the rigorous evaluation process and time commitment.

Conclusions/recommendations

The pilot evaluation identified opportunities for additional training, including

  • taking notes and translating them into evidence for FfT ratings,
  • learning more about the content of Framework components,
  • facilitating more reflective pre- and post-conferences,
  • deepening understanding of Framework Domains 1 & 4,
  • increasing knowledge about the Framework and the Excellence in Teaching pilot,
  • expanding training about the entire evaluation process,
  • digging deeper into challenging components (e.g., 2e, 3a, 3c, and 3d),
  • providing video exemplars of various levels of performance for each component, and
  • managing the time and implementation challenges of the evaluation system.

Most principals stuck to the time limits described for the evaluation process and were highly engaged. However, challenges included higher expectations of teaching practice under the new system, constrained timeframes that limited teachers’ opportunities to reflect and improve their practice, and the scheduling demand of evaluating all teachers within a single school year. Due to challenges with inter-rater reliability and rating certain problematic components, several approaches may be useful in setting benchmarks for summative teacher evaluations:

  • use of a “meets standards”/”does not meet standards” scale,
  • differential weighting of Framework components, including down-weighting challenging components until reliability is improved, and
  • use of Framework ratings to identify appropriate supports and professional development for teachers.

FfT focus

The Excellence in Teaching pilot in Chicago used the Danielson Framework for Teaching (FfT) to observe and rate the quality of classroom practice of teachers in the district. Despite some challenges with differentiating levels of proficiency and rating more complex, challenging instructional components, the FfT showed potential for high reliability with additional training and rating supports. Most principals and teachers felt positively about the new evaluation process in terms of its quality and ability to assess teaching practice accurately. Additional training needs and supports for using the Framework ratings for summative as well as formative evaluations are identified.

VIEW REPORT

2006: Standards-based teacher evaluation as a foundation for knowledge and skill-based pay

Study title/author(s)

Standards-based teacher evaluation as a foundation for knowledge and skill-based pay, conducted by Herbert G. Heneman III, Anthony Milanowski, Steven Kimball, and Allan Odden. (2006)

Purpose of study

To review findings from research on standards-based teacher evaluation systems, many of which use the Danielson Framework for Teaching (FFT) to define a competency model of effective teaching against which individual teachers’ classroom practice is compared.

Research questions

  • What is the relationship between teachers’ standards-based teacher evaluation scores or ratings and the achievement of their students?
  • How do teachers and administrators react to standards-based teacher evaluation as a measure of instructional expertise?
  • Is there evidence that standards-based teacher evaluation systems influence teacher practice?
  • Do design and implementation processes make a difference?

Population/ sample

The authors review research from four sites that implemented standards-based teacher evaluation systems based on the FFT: Cincinnati, Washoe, Coventry, and Vaughn. The number of teachers involved ranged from 40 at one charter school in Los Angeles (Vaughn) to 3,300 teachers at 88 schools in Washoe County, Nevada. A table on page 3 of the report summarizes the characteristics of the four evaluation sites.

Major results

  • There were positive relationships between standards-based teacher evaluation scores and student achievement, but the strength of the relationship varied between sites. At Vaughn and in Cincinnati, the relationships between teacher ratings and student achievement were substantial (0.26 to 0.37), while in Washoe County and Coventry the relationships were smaller (0.11 to 0.23). The correlations in Cincinnati and at Vaughn may have been higher because these sites used multiple evaluators who received intensive, high-quality training.
  • Interviews and surveys at three sites revealed positive reactions to the standards-based teacher evaluation system, especially to the competency model (the FFT) underpinning it. Teachers agreed that the performance described at the higher levels of the rubric represented high-quality teaching. They appreciated that this provided a clear and concise understanding of performance expectations for their instructional practice. Teachers also believed that the standards improved dialogue with principals about teaching. Other aspects of the new evaluation system inspired mixed reactions.
  • Administrators also reported positive perceptions of the competency model and evaluation system overall. They also believed that the evaluation dialogue improved under the new system. They felt that the evidence collected and explicit rubrics supported them as evaluators. The increased amount of time required to conduct the new evaluation procedures was viewed less favorably, and some administrators sought to decrease late nights and work on weekends by reducing the number and length of observations, providing more general feedback to teachers, or focusing their time on new or struggling teachers.
  • Teachers reported positive impacts on their instructional practice due to the new evaluation system, but evidence collected by evaluators suggested broad, but shallow, effects on teaching practice. Common impacts on teaching included engaging in more reflection, improved lesson planning, and better classroom management rather than more ambitious instructional practices. Feedback and assistance received from administrators also focused on classroom management and general pedagogy. Many teachers also increased their attention to student standards because of their own evaluations.
  • The three sites that conducted pilot tests of the new system took advantage of the opportunity to build support and address issues that may have caused implementation problems. However, challenges such as resistance from the teacher association and difficulties with the complexity of the performance evaluation process persisted. Implementation problems undermined the validity and fairness of the new system in the eyes of teachers.
  • Teacher training by principals tended to be inconsistent and process oriented, rather than developing an understanding of the performance standards. Evaluator training also varied in quality. Administrator training offered insufficient guidance on providing useable feedback to teachers, setting performance goals, and coaching.
  • Other implementation challenges involved lack of alignment between human resource systems, key district personnel who remained resistant or disengaged, and low accountability for administrators to conduct timely evaluations and provide constructive feedback to teachers.

Conclusions/recommendations

The instructional practices measured by the standards-based teacher evaluations contribute to student learning. Mixed reactions to certain aspects of the new evaluation system, however, suggested that its design and implementation affect its acceptability to teachers and administrators. The authors offer the following guidelines for designing and implementing a standards-based teacher evaluation system to maximize its success:

  • Emphasize that teacher performance is a factor of strategic importance to closing student achievement gaps;
  • Develop a set of teaching standards and scoring rubrics that reflects the knowledge and skills that teachers need to provide effective instruction;
  • Prepare for the additional work required of teachers being evaluated and those conducting evaluations by offering release time, using peer evaluators, and giving incentives for administrators to prioritize teacher evaluation and feedback;
  • Provide early and on-going training for teachers on the performance competencies, the purpose and process of the evaluation system, and the knowledge and skills needed within it;
  • Train administrators to conduct accurate observations and provide useful feedback and coaching to teachers, as well as on the performance competency model and the purpose and process of the new evaluation system;
  • Consider using multiple evaluators to decrease the burden of conducting observations;
  • Provide evaluators with high-quality training using a structured scoring process;
  • Support teachers in acquiring the knowledge and skills to excel through concrete and specific feedback, professional development, and modeling of high performance;
  • Align the human resources system with the performance competency model underlying the new standards-based teacher evaluation system to reinforce the importance of the model and promote a shared sense of high-quality instruction;
  • Pilot the system to work out details and address potential problems; and
  • Conduct interrater agreement and validity analyses.

FFT focus

The Framework for Teaching served as the basis for the competency model underlying the standards-based teacher evaluation system at all four sites reviewed in this article. Teacher evaluation scores were associated with student achievement growth, especially in districts that offered high-quality training to classroom observers. Teachers and administrators agreed that the competency models adapted from the FFT reflected good teaching and promoted more constructive dialogue about teaching. The authors suggest that the FFT may be supplemented by evaluating teachers’ skill in implementing specific instructional programs related to district strategies to improve student achievement (e.g., Success for All or a specific curriculum) and by assessing pedagogical content knowledge.

VIEW REPORT

2005: Teacher quality and educational equality: Do teachers with higher standards‐based evaluation ratings close student achievement gaps?

Study title/author(s)

Teacher quality and educational equality: Do teachers with higher standards‐based evaluation ratings close student achievement gaps? conducted by Geoffrey Borman and Steven Kimball. (2005)

Purpose of study

To determine whether teacher evaluation scores explain differences in achievement for students with different prior achievement scores and social backgrounds and to explore the distribution of teacher quality (as measured by the teacher evaluation system) among classrooms.

Research questions

  • Is teacher quality distributed equally across classrooms of varying compositions?
  • Is teacher quality associated with both excellence and equality in terms of student achievement?

Population/ sample

The researchers used teacher evaluation scores and student demographic and achievement data from the Washoe County School District in Nevada, which implemented a teacher evaluation system based on Charlotte Danielson’s Framework for Teaching (FFT) in 2000. Teachers with sufficient observational data to create a composite score for the four domains of the FFT (N = 397) and their 4th, 5th, and 6th grade students (N = 7,335) were included in analyses.

Major results

  • Teachers in classrooms with a high concentration of students receiving free or reduced-price lunches tended to have lower evaluation scores. On average, these teachers scored half of a standard deviation lower on the FFT composite score than teachers whose classrooms contained few or no students who received free or reduced-price lunches.
  • Teachers in classrooms with a high concentration of minority students also tended to have lower evaluation scores. These teachers also scored half a standard deviation lower on the composite evaluation score, on average, than their peers teaching primarily white and/or Asian American students.
  • Classrooms composed of students with lower pretest scores in reading and math were also largely taught by teachers who received lower evaluation scores than the teachers of high-achieving students, by a similar magnitude.
  • Hierarchical linear models found that the effect of teacher quality on student achievement varied by subject (math or reading) and grade level. On average, a “good” teacher (with an evaluation score 1 standard deviation above the mean, or at the eighty-fourth percentile of the distribution) raised student achievement by one-fifth of a standard deviation more than a “bad” teacher (with an evaluation score 1 standard deviation below the mean, or at the sixteenth percentile in the district). The percentile arithmetic is illustrated in the brief sketch after this list.
  • Overall, “equalizing” effects, or the ability of teachers with higher evaluation scores to close achievement gaps between more and less privileged students, were close to zero. However, results of the model for 4th grade reading classes indicated that higher-quality teachers made some progress in closing the achievement gap separating poor and nonpoor students.
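
The percentile figures in the bullet above follow from the shape of the normal distribution rather than from any additional analysis reported in the study. The short Python sketch below is not drawn from the report; it is only a quick, illustrative check, assuming evaluation scores are approximately normally distributed, of why scores one standard deviation above or below the mean sit near the eighty-fourth and sixteenth percentiles.

    # Illustrative only (not part of the study): percentile rank of scores
    # one standard deviation above/below the mean, assuming a normal distribution.
    from statistics import NormalDist

    standard_normal = NormalDist(mu=0, sigma=1)
    print(round(standard_normal.cdf(1.0), 4))   # 0.8413 -> about the eighty-fourth percentile
    print(round(standard_normal.cdf(-1.0), 4))  # 0.1587 -> about the sixteenth percentile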

Conclusions/recommendations

While the results suggest that disadvantaged students are systematically assigned to lower-quality teachers, it is also possible that teachers of disadvantaged students are rated lower because of the characteristics of their students. The authors also propose that school context, such as limited capacity or a weak professional culture, may constrain the performance of otherwise good teachers, and they point out that each interpretation has different implications for equality of educational opportunity, the evaluation system, and school organizational processes.

Approximately 75% of the variation in student achievement occurred within classrooms. Thus, while “good” teachers raised the overall achievement of their students more than “bad” teachers did, the “good” teachers did not reduce gaps between their highest- and lowest-achieving students’ test scores. If teacher quality is expected to include the ability to close achievement gaps between more and less privileged students, this aspect of teacher quality may not be reflected by the composite score used for this study. The authors recommend placing greater emphasis on the 22 rubrics across the four domains of the evaluation system that relate to closing achievement gaps, and promoting training and professional development opportunities to enhance multicultural awareness and educational equity.

FFT focus

The teacher evaluation system in Washoe County, Nevada, is based on the Framework for Teaching, so results associating those teachers’ evaluation scores with student achievement, and with the potential to close achievement gaps, support the Framework as a valid measure of teacher quality.


VIEW REPORT

"Danielson’s Framework for Teaching has been a revelation to me; the best analogy I can offer is that the Framework is like having voice-guided GPS to direct you to a destination, when before you might have only had a destination name and an outdated road map."

Pre-Service Teacher, May 2016

“[The consultant] gave the best PD I have seen in 15 years of teaching, and was the first to explain [the] Danielson [Framework] in a human way. Bravo.”

A teacher, June 2015

“I am so impressed with the Danielson Group consultants. They are all so real. Your trainers helped make [proficient] teaching stronger and steered [basic teaching] toward increasing effectiveness.” 

A principal, June 2015

"Due to your consultant's seamless and meaningful transitions, knowledge of content, and rapport with the audience, the room was alive with energy and it made us all feel ready to begin the year off with success."

"Never before have I seen a group of seasoned educators like your consultants master the art of communicating with an audience with varied levels of expertise and interests. The two days that I spent with your team, I walked away with a desire to use the rubric to truly enhance my own practice."

"I left with a renewed look at the rubric, thinking that the rubric is the Great Equalizer! We can ALL enhance our practice by using it as a tool and a roadmap to produce students who think and are ready for college and careers. THANK YOU!"

"Your consultants' presence and organization of the day will not only impact the new teachers that attended, but will make the year alive for a vast number of students this year."

"Our workshop focused on calibration and inter-rater agreement training, so it was directly aligned to our individual and collective work with teacher performance evaluation.  With new administrators on the team, this type of training is critical."

"We were highly impressed with our Danielson Group consultant and the workshop. We have nothing but positive things to share. Staff have been emailing us, thanking us. This is the most worthwhile presentation we've been to in a while."

"The workshop you provided was hands-on, interesting, practical, and respectful of time limits. I heard more positive feedback about this workshop from staff than I have about any other."

"We wanted to let you know how much we appreciated the flexibility and professionalism that your consultant provided in our unique context. It helped us to keep on track with our schedule at a critical time. For that we are truly grateful."

"Your consultant presented a perfectly differentiated learning experience for all our principals. They were highly engaged, as demonstrated by on-topic conversations using academic language, completion of tasks requiring evidence identification, and note taking and 'grading' during classroom videos of teaching."

"Our school principals said the Framework observation training was the best training they had ever had, including the training provided when earning their Master’s degrees."

"I have a principal who was so excited about the breakthrough work with her staff in special education. I am already getting my money back!"

"My concern about the extra time it would take to implement the Framework successfully was not accurate. It took about the same amount of time as our prior evaluation system, and the benefits in professional growth and increased student achievement were more than worth it."

"I want to truly thank you for the brilliant job that you did with our training. I got such positive feedback from the team. They feel re-energized and like they have a direction and new tools to do the job."