How can assessments from multiple observations and artifacts be combined in a single rating for each of the components in the Framework for Teaching?

For the observable domains, that is, Domain 2 and Domain 3, the answer to this question depends on how many observations have been conducted, whether they were announced or unannounced, and whether the information from each observation is captured electronically or only on paper. Ideally, an observer can look at the assessments from each observation and examine them as a whole; this is done most readily if the information has been stored electronically. Regardless of the technology used, however, the observer must consider the “preponderance of evidence” to determine the level of performance for each component. Alternatively, the process can specify that the ratings from the individual observations are combined numerically, using either the mean or the median as the score for the component. The same process can be followed for Domains 1 and 4: following an examination of the artifacts for each component, a judgment is made linking the evidence to the statements in the levels of performance.
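The numerical alternative described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1 to 4 coding of the levels of performance, the function name combine_ratings, and the sample observations are all hypothetical and are not part of the Framework for Teaching itself.

```python
from statistics import mean, median

# Assumed numeric coding of the levels of performance (illustrative only).
LEVELS = {"unsatisfactory": 1, "basic": 2, "proficient": 3, "distinguished": 4}


def combine_ratings(observations):
    """Return the mean and median rating for each component.

    `observations` is a list of dicts, one per observation, each mapping
    a component label to the level of performance recorded for it.
    """
    by_component = {}
    for observation in observations:
        for component, level in observation.items():
            by_component.setdefault(component, []).append(LEVELS[level])
    return {
        component: {"mean": mean(scores), "median": median(scores)}
        for component, scores in by_component.items()
    }


# Hypothetical ratings from three observations of two components.
observations = [
    {"2a": "basic", "3c": "proficient"},
    {"2a": "proficient", "3c": "proficient"},
    {"2a": "proficient", "3c": "distinguished"},
]
summary = combine_ratings(observations)
```

In this made-up example the mean and the median for the same component differ, so a process that averages ratings should state which of the two it uses.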

It should be remembered, however, that the score resulting from an average of scores on individual components of the FFT is just that: an average of performance. By itself, it does not constitute an evaluative judgment about the teacher. For example, if a teacher’s performance improves over the course of a school year in some aspect of teaching, the evaluator might want to consider that improvement when making the final evaluative judgment. (See the next question and answer.)

How can the evidence from observations and artifacts be used to evaluate a teacher? That is, how can the assessments of the teacher’s practice be converted to an evaluation of the teacher?

The headings in the levels of performance in the Framework for Teaching (unsatisfactory, basic, proficient, and distinguished) are descriptive words: on their own, they don’t make a judgment; they merely describe the practice. On the other hand, the words used to evaluate teachers (words like ineffective, needs improvement, effective, and highly effective) are judgmental words; they are used to evaluate. Many educators are inclined to simply equate the descriptive words with the evaluative words, and to mandate, for example, that in order for a teacher to receive an “effective” rating, all the components must be rated at the “proficient” level. Some systems even replace the FFT descriptive words (basic, etc.) with the judgment words (effective, etc.).

I recommend that school districts (or states, if the decisions are made at that level) use words for the evaluative judgments made about teachers that differ from the words used for the levels of performance of practice (such as unsatisfactory, etc., in the Danielson Framework). In that case, evaluators must be able to translate from one set of words to the other.

How can evaluations about teacher practice, based on the Framework for Teaching, be combined with other measures of teacher effectiveness?

This requires the application of an algorithm, typically specified by the state or district administration.
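The actual algorithm is whatever the state or district specifies, so the following is only a sketch of one common shape such an algorithm could take: a weighted sum of measures. The measure names, the scores, the equal weights, and the function name composite_score are all hypothetical.

```python
def composite_score(measures, weights):
    """Combine several measures into one score using a weighted sum.

    Both arguments are dicts keyed by measure name. The weights are
    assumed to be set by policy and must sum to 1.
    """
    if set(measures) != set(weights):
        raise ValueError("measures and weights must use the same names")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(measures[name] * weights[name] for name in measures)


# Hypothetical inputs: a practice score and one other measure,
# both on the same scale, weighted equally.
measures = {"practice": 3.0, "other_measure": 2.0}
weights = {"practice": 0.5, "other_measure": 0.5}
composite = composite_score(measures, weights)
```

A weighted sum is only one possibility; a state or district could equally specify a lookup matrix or a set of decision rules.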

