Best class ever, except for gathering statistics

I just turned in grades for the intermediate skills class I taught over the summer, and set out to compile some useful statistics just as in previous semesters. Unfortunately, the data doesn’t* say much… because the scores were too high. Every assignment category, from attendance to final exams, came in higher than in the same class in spring semester, sometimes by ridiculous amounts. For example:

Attendance

Spring 2018: 90.09%, standard deviation 12.9

Summer 2018: 95.85%, stdev 7.5
(including one student who was out of the country for 2 weeks in a row – otherwise it’d be 97.17% and stdev 3.97)

Homework

Spring 2018: 84.36%, stdev 14.0

Summer 2018: 96.19%, stdev 7.9

Grammar quizzes

Spring 2018: 83.9%, stdev 15.5

Summer 2018: 87.95%, stdev 9.7

As the standard deviations imply, there wasn’t much spread between the highest- and lowest-performing students, and even less among the many varieties of average-performing students. This was basically a good thing – there is no upside to a large spread of homework scores for pedagogy or validity. It’s not as if my homework scores failed to validly** track some educational construct because everyone was doing uniformly well.
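For anyone who wants to compile the same kind of summary, here is a minimal sketch of how per-category means and standard deviations might be pulled out of a gradebook export. The file name and column names (gradebook.csv, semester, category, score) are invented for illustration – this is not my actual spreadsheet.

```python
# A minimal sketch: per-semester, per-category means and standard deviations
# from a hypothetical long-format gradebook export (invented column names).
import pandas as pd

df = pd.read_csv("gradebook.csv")  # columns: semester, category, score

summary = (
    df.groupby(["semester", "category"])["score"]
      .agg(mean="mean", stdev="std")  # .std() is the sample standard deviation
      .round(2)
)
print(summary)
```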

Summer classes have a lot of perks. They meet twice as often, 4 days a week instead of 2, letting you take 2 days out of the week for something like student presentations without creating a yawning 2-week gap between instructional days. The students are more dedicated – only 20% of my students in summer were taking any other classes. The class meetings are shorter too, which probably helped my students, about half of whom worked. Of those who worked, 19% had morning shifts, 73% had afternoon shifts, 64% evening, and 30% night (between 10 PM and 5 AM). Despite these fairly high numbers, almost everyone did almost all the homework and did about equally well on projects, quizzes, and tests.

It’s a bit of a shame for data collection, because although I haven’t cracked the statistics textbook I was convinced to buy, I did start the term with a much more complete questionnaire on my students’ jobs, as you can see. In the end, presumably because of the narrow spread in grades overall, this yielded some correlations (evening shifts were the most negatively correlated with final grades) but no significant differences between working and non-working students, even at the lenient p < 0.05 threshold. Scores were simply too similar to distinguish one type of student from another.
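For illustration, here is roughly how those checks could be run. This is a sketch rather than my actual analysis: the file and column names (summer2018_grades.csv, final_grade, weekly_work_hours, evening_shift, works) are made up, and it assumes working status is coded as 0/1.

```python
# A sketch of the checks described above: Pearson correlations between
# work-related variables and final grades, plus a Welch's t-test comparing
# working and non-working students. File and column names are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("summer2018_grades.csv")

# Correlation of each work-related variable with final grades
for col in ["weekly_work_hours", "evening_shift"]:
    r, p = stats.pearsonr(df[col], df["final_grade"])
    print(f"{col}: r = {r:.2f}, p = {p:.3f}")

# Working vs. non-working students; Welch's t-test does not assume
# equal variances between the two groups.
working = df.loc[df["works"] == 1, "final_grade"]
non_working = df.loc[df["works"] == 0, "final_grade"]
t, p = stats.ttest_ind(working, non_working, equal_var=False)
print(f"working vs. non-working: t = {t:.2f}, p = {p:.3f}")
```

With scores clustered this tightly, it’s no surprise that nothing clears p < 0.05.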

This didn’t confirm my big hypothesis: that working students are at a disadvantage that is especially unfair given that community colleges exist specifically to serve non-traditional college students. I have, however, narrowed my hypotheses for future work surveys a bit, because “hours spent using English at work” was about as negatively correlated with final grades as “total weekly working hours” (-0.46 vs. -0.39). Next semester, I will have to compare hours of English use at work to overall hours of English use to see whether working students have more opportunity for input and output, and if so, ask why this doesn’t yield significantly higher performance on at least some types of assignments. I can see anecdotally that students who use English at work benefit from doing so. I need to plan my classes so that this is reflected in their grades, or at least not reflected negatively.

If future classes continue to show a difference between working and non-working students irrespective of whether they use English at work, it may be that the type of competence fostered by a service-industry job where you use your L2 simply isn’t captured by the somewhat narrow means of assessment an academic ESL class requires. For example, it’s inevitably my working students who have the most natural grasp of which modals can be used for formal and casual requests, offers, or requests for permission, but unless they can carve out time between the end of their shift and taking care of an elderly parent to show that grasp in an assignment, their homework scores won’t be commensurate with their abilities. The lens of assessment is only focused on students when they do assignments, not when they practice modals for hours at a time every day at work.

It may help make my classes more equitable in this regard if I minimize the amount of “assignment” students have to do to prove they’ve been getting input, while keeping it hard enough to fake that cheating is deterred. I have a type of assignment that is aimed at dragging along as much real-world practice as possible for a minimum of “assignment”, which is sometimes very close to “go get some input, then check a box when you’re done”. An example is a book report where the students choose any graded reader from our library and then turn in a pretty perfunctory worksheet that they could probably do in 5 minutes. To me, this type of assignment is justified by 1) the high ratio of interlanguage-developing work to product, 2) the promotion of available outside resources, and 3) the high motivation levels of my intermediate students, which reduce the odds of cheating (also, the low grading time). If a similar assignment said “start 3 conversations and fill out a perfunctory report afterwards”, it could reward the time my working students spend talking without pandering specifically to them.

Maybe the future of all ESL homework is “get input, and prove you got it”. At least at the intermediate (i.e., not academic writing) level, this probably maximizes opportunities for interlanguage development while minimizing what are in my view the less valid aspects of the grading process.


*”Data” is an uncountable noun, unless you are writing for an academic journal or have a mobile datum plan like Titus Andromedon’s that just comes with the one.

**Now that we’ve split the infinitive, the only question is whether we’ll be able to fuse it in a stable way and provide unlimited, grammatical energy for the entire world.

Stereotype threat and ELT

When they speak their L2, our language students are undertaking something mentally taxing, monitoring themselves for mistakes, and doing it all in the presence of people who expect them to struggle. This is almost a perfect recipe for stereotype threat.

What is stereotype threat?

In case you’re behind on your liberal intelligentsia required reading, stereotype threat (ST) is the effect of “subtle reminders of stereotypes that presume the incompetence of certain groups. This ‘threat in the air’ can cue a concern with confirming these stereotypes that can impair the ability to perform up to one’s potential” (Schmader, 2010, p. 14). In short, fear of confirming negative stereotypes about one’s group takes up mental overhead and reliably and demonstrably hurts performance, and triggering this effect is as simple as reminding people of the stereotype before giving the test. This effect is real and has been replicated many times with many different groups – men and women most often (Johns et al., 2005), but also White men and Asian men (Aronson et al., 1999) – even tracking implicit bias scores on a national scale in a study with hundreds of thousands of subjects (citation too long – click here).

The precise psychological mechanism behind this is apparently under dispute, but general anxiety along the lines of an affective filter (I don’t think I need a citation for this) seems not to be it. Rather, mental resources seem to be taken up by imagining ways to fail. Working memory available for the task is reduced in favor of monitoring oneself for mistakes and spontaneous, intrusive negative thoughts (Cadinu et al., 2005; Schmader, 2010).

What’s it got to do with ELT?

I think it should be clear that our students, to varying degrees, are under ST almost all the time. Less obvious is the fact that many teachers are, too. Learners and teachers alike may be facing a penalty to their language use that has a cause besides incomplete knowledge or acquisition.

If placed in a context where stereotypes are known, and especially when they are implicitly being compared to NSs, ELLs can be expected to perform worse than they otherwise would at language-mediated tasks (I’m reminded of this article in which the author recounts having found solace in the relatively language-free world of math in her teenage ESL years – the Asian math stereotype probably didn’t hurt either). We can expect NNS teachers also to make more errors when they know they are being evaluated by NS teachers. Performance is likely to be worse for both input and output in both cases. As Rydell et al. (2010) write, “At least in the present task setting, we see that overt emphasis on the existence of the stereotype both prevents learning … and, to a significant degree, prevents expression of learning that has already occurred” (p. 14046) (Yes, that is the real page number). ST is likely to affect students in the ELT classroom as well – an ESL class in the USA where everyone thinks “Asian students don’t talk” is probably worse for Asian students, all other things held equal, than an EFL class in Asia taught by an NNS.

These conditions follow NNSs outside the classroom, too. Even well-known ELLs like Arnold Schwarzenegger and Melania Trump have jokes made at the expense of their intelligence – mostly based on accent, the hardest part of NS speech to adopt. It doesn’t seem to have discouraged Arnold, but whenever he speaks in public he is one error away from confirming everyone’s perception of him. I have certainly experienced this feeling myself, and I didn’t have Arnold’s fortitude. Our students’ lives are replete with situations in which they will be judged on their language use and in which stereotypes about their national group, or about ESL students in general, are well known.

The mechanisms of ST appear especially designed to vitiate SLA. Reduced working memory is probably as much a danger to language acquisition as it is to math, but hyperconsciousness of mistakes is clearly more relevant to language use than to many other subjects. Teachers may be instructing students to do exactly that in an effort to encourage noticing (Schmidt, 1993), usually thought of as a good thing, while ST research holds self-monitoring to be an inhibitor of performance (Schmader, 2010). It is possible that while noticing facilitates acquisition in the long run, it distracts from other essential processes (e.g. understanding the intentions of one’s conversation partner) in the short run.

In fact, one effect of ST has been described as a reduced ability to sort relevant information from noise, which would clearly hurt students’ ability to notice and turn input into intake. One such experiment used Chinese characters to test women’s “visual processing” and found an ST effect of clear relevance to language teachers (Rydell et al., 2010).

Questions for study

In case you haven’t noticed, I haven’t done any research to back up my suspicion that ST is an extremely important future topic for SLA. I do have a few ideas for research questions:

  • Assuming ST for SLA is real, how will we know? Grammaticality judgment tests seem the most analogous to the mathematics-based research on ST that has been the most common so far, but wouldn’t real-time processing skills (like participating in a conversation) show a larger effect?
  • What constitutes a “trigger” for ST? Is the presence of NSs enough, or the possibility that NSs will read/see the students’ output, or just a box for “nationality” at the top of the test?
  • For that matter, how would you avoid triggering ST in order to create a control group? ST-inducing instructions often look something like Cadinu et al.’s: “recent research has shown that there are clear differences in the scores obtained by men and women in logical-mathematical tasks” (2005, p. 574). (Interestingly, they left it to the test-takers to infer that women did worse, not just differently, on these tests.) Non-ST instructions either simply leave that part out or explicitly negate it, along the lines of “… that there are no differences in the scores…”. How would this condition be accomplished plausibly on a language test of NNSs? Would it be believable to preface a test with, “This grammar topic shows no measurable differences between American and Chinese test-takers”?
  • What groups have relevant stereotypes that could trigger ST? Is “ESL student” enough of a stigma? (Many students act as if it were.)
  • Are some ELT classes more threatening than others? Can interventions by the teacher mitigate ST, for example by making explicit the fact that students will not be judged by NS norms?

References

Aronson, J., Lustina, M. J., Good, C., Keough, K., Steele, C. M., & Brown, J. (1999). When white men can’t do math: Necessary and sufficient factors in stereotype threat. Journal of Experimental Social Psychology 35/1, pp. 29-46.

Cadinu, M., Maass, A., Rosabianca, A., & Kiesner, J. (2005). Why do women underperform under stereotype threat? Evidence for the role of negative thinking. Psychological Science 16/7, pp. 572-578. Available at: http://www.jstor.org/stable/40064271

Johns, M., Schmader, T., & Martens, A. (2005). Knowing is half the battle: Teaching stereotype threat as a means of improving women’s math performance. Psychological Science 16/3, pp. 175-179.

Rydell, R. J., Shiffrin, R. M., Boucher, K. L., Van Loo, K., Rydell, M. T., & Steele, C. M. (2010). Stereotype threat prevents perceptual learning. Proceedings of the National Academy of Sciences of the United States of America 107/32, pp. 14042-14047. Available at: http://www.jstor.org/stable/25708852

Schmader, T. (2010). Stereotype threat deconstructed. Current Directions in Psychological Science 19/1, pp. 14-18. Available at: http://www.jstor.org/stable/41038531

Schmidt, R. (1993). Awareness and second language acquisition. Annual Review of Applied Linguistics 13, pp. 206-226.

Tokyo Medical University and anticipatory childcare penalties

As you may have heard, in a scandal that incorporates almost everything toxic about Japan’s educational, workplace, and oyaji cultures, Tokyo Medical University, a top medical school in Japan, was discovered to have had a secret policy of discriminating against female applicants to its medical program for almost a decade. Specifically, it reduced female applicants’ entrance exam scores to 0.9 or 0.8 of their actual levels so as to keep the female share of incoming classes down to 30%. College entrance exams being pretty much the single most important determinant of a young person’s career prospects, lots of people in Japan are livid, and the international press has picked up the story. It’s quite a blood-boiler.


Here are a few random thoughts that squeak out between the anger:
