Reviewing Social and Emotional Learning for ESSA: MOOSES, not Parrots

05/25/2017 11:53 am ET

This blog was co-authored by Elizabeth Kim

I’m delighted to see all the interest lately in social and emotional skills. These range widely: from kindness and empathy, to the ability to delay gratification, to grit, to the belief that effort matters more than intelligence, to avoidance of bullying, violence, and absenteeism. Social and emotional learning (SEL) has taken on even more importance as the Every Student Succeeds Act (ESSA) allows states to add to their usual reading and math accountability measures, and some are adding measures of SEL. This makes it particularly important to have rigorous research on this topic.

I’ve long been interested in social-emotional development, but I have just started working with a student, Liz Kim, on a systematic review of SEL research. Actually, Liz is doing all the work. Part of the purpose of the SEL review is to add a section on this topic to Evidence for ESSA. In conceptualizing our review, we immediately ran into a problem. While researchers studying achievement mostly use tests, essays, products, and other fairly objective indicators, those studying social-emotional skills and behaviors use a wide variety of measures, many of which are far less objective. For example, studies of social-emotional skills make much use of student self-report, or ratings of students’ behaviors by the teachers who administered the treatment. Researchers in this field are well aware of the importance of objectivity, but they report more and less objective measures within the same studies depending on their research purposes. For academic purposes this is perfectly fine. SEL researchers and the readers of their reports are of course free to emphasize whichever measures they find most meaningful.

The problem arises when SEL measures are used in reviews of research to determine which programs and practices meet the ESSA standards for strong, moderate, or promising levels of evidence. Under ESSA, selecting programs meeting strong, moderate, or promising criteria can have consequences for schools in terms of grant funding, so it could be argued that more objective measures should be required.

In our reviews of K-12 reading and math programs for Evidence for ESSA, we took a hard line on objectivity. For example, we do not accept outcome measures made by the researchers or developers, or those that assess skills taught in the experimental group but not the control group. The reason for this is that effect sizes for such studies are substantially inflated in comparison to independent measures. We also do not accept achievement measures administered individually to students by the students’ own teachers, who implemented the experimental treatment, for the same reason. In the case of achievement studies that use independent measures, at least as one of several measures, we can usually exclude non-independent measures without excluding whole studies.

Now consider measures in studies of social-emotional skills. They are often dependent on behavior ratings by teachers or self-reports by students. For example, in some studies students are taught to recognize emotions in drawings or photos of people. Recognizing emotions accurately may correlate with valuable social-emotional skills, but an experiment whose only outcome is the ability to recognize emotions could just be teaching students to parrot back answers on a task of unknown practical value in life. Many SEL measures used in studies with children are behavior ratings by the very teachers who delivered the treatment. Teacher ratings are sure to be biased (on average) by the normal human desire to look good (called social desirability bias). This is particularly problematic when teachers are trained to use a strategy to improve a particular outcome. For example, some programs are designed to improve students’ empathy. That’s a worthy goal, but empathy is hard to identify in practice. So teachers taught to identify behaviors thought to represent empathy are sure to see those behaviors in their children a lot more than teachers in the control group do, not necessarily because those children are in fact more empathetic, but because teachers and the children themselves may have learned a new vocabulary to recognize, describe, and exhibit empathy. This could be seen as another example of “parroting,” which means that subjects or involved raters (such as teachers or parents) have learned what to say or how to act under observation at the time of rating, instead of truly changing behaviors or attitudes.

For consequential purposes, such as reviews for ESSA evidence standards, it makes sense to ask for independently verified indicators demonstrating that students in an experimental group can and do engage in behaviors that are likely to help them in life. Having independent observers blind to treatments observe students in class or carry out structured tasks indicating empathetic or prosocial or cooperative behavior, for example, is very different from asking them on a questionnaire whether they engage in those behaviors or have beliefs in line with those skills. The problem is not only that attitudes and behaviors are not the same thing, but worse, that participants in the experimental group are likely to respond on a questionnaire in a way influenced by what they have just been taught. Students taught that bullying is bad will probably respond as the experimenters hope on a questionnaire. But will they actually behave differently with regard to bullying? Perhaps, but it is also quite possible that they are only parroting what they were just taught.

To determine ESSA ratings, we’d emphasize indicators we call MOOSES: Measurable, Observable, Objective Social Emotional Skills. MOOSES are quantifiable measures that can be observed in the wild (i.e., the school) objectively, ideally drawing on routinely collected data unlikely to change just because staff or students know there is an experiment going on. For example, reports of disciplinary referrals, suspensions, and expulsions would be indicators of one type of social-emotional learning. Reports of fighting or bullying incidents could be MOOSES indicators.

Another category of MOOSES indicators would include behavioral observations by observers who are blind to experimental/control conditions, or observations of students in structured situations. Intergroup relations could be measured by watching whom students play with during recess, for example. Or, if an SEL program focuses on building cooperative behavior, students could be placed in a cooperative activity and observed as they interact and solve problems together.

Self-report measures might serve as MOOSES indicators if they ask about behaviors or attitudes independent of the treatment students received. For instance, if students received a mindfulness intervention in which they were taught to focus on and regulate their own thoughts and feelings, then measures of self-reported or peer-reported prosocial behaviors or attitudes may not be an instance of parroting, because prosocial behavior was not the content of the intervention.

Social-emotional learning is clearly taking on an increasingly important role in school practice, and it is becoming more important in evidence-based reform as well. But reviewers will have to use conservative and rigorous approaches to evaluating SEL outcomes, as we do in evaluating achievement outcomes, if we want to ensure that SEL can be meaningfully incorporated in the ESSA evidence framework. We admit that this will be difficult and that we don’t have all the answers, but we also maintain that there should be some effort to focus on objective measures in reviewing SEL outcomes for ESSA.

This blog is sponsored by the Laura and John Arnold Foundation
