Do the timing and number of edits on a draft predict improvement?

Since I started using Google Classroom for writing classes a few years back, I’ve noticed a pattern in the emails Google sends you whenever a student clears a comment you left. A few times, I’ve been able to tell when a student was still working on a paper past the deadline or if they got enough sleep the night before (emails at 3:20 AM are a bad sign). Most often though, you just find that a lot of students are making edits the morning that a paper is due, as your first email check of the morning features 30+ emails all saying “Bob resolved a comment in Final Essay”.

There exists a tool called Draftback (introduced to me, as with many edtech tools, by Brent Warner), a browser extension for Chrome, that lets you replay the history, letter by letter, of any Google Doc that you have edit access on. Its most obvious utility is as a tool for detecting academic dishonesty that plagiarism checkers like Turnitin miss (like copy/pasted translations, which show up in the editing history as whole sentences or paragraphs appearing all at once as opposed to letter by letter). It also has the benefit of showing you the exact times that edits were made in a document, which you can use to track how quickly students started responding to feedback, how many revisions they made (grouped helpfully into sessions of edits made less than 5 minutes apart), and whether these revisions were all made in the 10 minutes the student said he was just running to the library to print it. Draftback is the kind of tool that you hope not to need most of the time, but is hard to imagine life without when you need it.

This video gives a good introduction to Draftback.

With the pattern in my email inbox fresh in my mind (a term just having ended here), I thought I’d use Draftback to see whether this flurry of last-minute editing had some bearing on grades. To be specific, I used Draftback to help me answer these questions:

  • Do numbers of edits correlate with scores on final drafts (FD) on papers?
  • Does the timing of edits correlate with FD scores?
  • Do either of these correlate with any other numbers of interest?

This required quite a bit of work. First, I copied and pasted rough draft (RD) and FD scores for each one of my students' essays for the past 3 terms, totalling 6 essays, into a big Google Sheet, adding one more column for change in grade from the RD to the FD, computed as a relative change (for example, 56% on the RD and 92.5% on the FD yields a change of (92.5 − 56) / 56 ≈ 65.18%). Then, I generated a replay of the history of each essay separately. Because every stage of an essay is typed into the same Google Doc, this gives me the entire history of the essay, from outline to final product. After each replay was generated (they take a few minutes each), I hit the "document graphs and statistics" button in the top right to see the times and numbers of edits in easier-to-read form. I manually added up the edits and typed their timing and number into the Google Sheet above. Last, I thought of some values culled from that data that I might like to see correlated with other values. Extra last, I performed a few t-tests to see if the patterns I was seeing were meaningful.

(The luxury of a paragraph about how annoying the data was to compile is part of the reason I put these on my blog instead of writing them up for journals.)
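For anyone who wants to try something similar with less copy/paste, here is a minimal sketch of the number-crunching, assuming the per-essay rows from the Google Sheet have been exported to a CSV. The file name and column names (draftback_stats.csv, rd_score, fd_score, total_edits) are invented for illustration; everything in this post was actually done by hand in Sheets.

```python
# Minimal sketch only: assumes a CSV export of the per-essay sheet with
# hypothetical column names (rd_score, fd_score, total_edits, ...).
import pandas as pd
from scipy import stats

df = pd.read_csv("draftback_stats.csv")  # one row per essay

# Change in grade as a relative change from RD to FD,
# e.g. 56% -> 92.5% is (92.5 - 56) / 56 ≈ 65.18%.
df["grade_change_pct"] = (df["fd_score"] - df["rd_score"]) / df["rd_score"] * 100

# The "giant correlation table" across all numeric columns.
print(df.corr(numeric_only=True).round(2))

# A t-test like the ones reported below: FD scores for heavy vs. light
# editors, split at 2000 total edits.
heavy = df.loc[df["total_edits"] > 2000, "fd_score"]
light = df.loc[df["total_edits"] <= 2000, "fd_score"]
t, p = stats.ttest_ind(heavy, light)
print(f"t = {t:.2f}, p = {p:.3f}")
```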

Example “document graphs and statistics” page. From this, I would have copied 1468 edits for the due date (assuming the due date was Monday the 30th), 79 edits 4 days before the due date, and 1911 edits for 5 days before the due date, with 0 edits for every other day.

The values that I thought might say something interesting were as follows (a rough sketch of how to compute them appears after the list):

  • % of edits (out of all edits) that occurred on a class day
    • I’m curious whether students who edit on days when they don’t actually see my face do better – i.e., if students who edit on the weekends write better. Eliminating class days also helpfully eliminates lab days, the two class days a week when all students are basically forced to make edits. Incidentally, our classes meet Mon-Thu and final drafts are always due on the first day of the week. The average across all the essays was 63%, with a standard deviation of 38%.
  • % of edits that occurred on the due date
    • Specifically, before 1 PM – all my final drafts are due at the beginning of class, and all my classes have started at 1 PM this year. My assumption is that a high % of edits on the due date is a sign of poor work habits. The average was 21% with a standard deviation of 31%.
  • total # of edits
    • One would hope that the essay gets better with each edit. This number ranged from near 0 to more than 6000, with both an average and standard deviation of about 1700. Obviously, if you calculate this number yourself, it will depend on the length of the essay – mine were all between 3 and 5 pages.
  • maximum # of edits per day
    • I'm interested in whether a high number of edits per day predicts final grades better than a high number of edits total does. That is, I want to know whether cram-editing pays off more than slow-and-steady editing. The average and standard deviation for this were both about 1200.
  • # of days with at least 1 edit
    • Same as the above – I want to know if students who edit more often do better than ones who edit in marathon sessions on 1 or 2 days. The average was 3.25 days with a standard deviation of about 1 day.
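As promised above, here is a minimal sketch of how those five values could be computed, assuming the daily edit counts read off Draftback's stats page have been typed up as a simple date-to-count table. The dates and counts below are invented to match the example stats page earlier (taking "Monday the 30th" to be March 30, 2020); in reality I did this arithmetic by hand, and the due-date count should only include edits made before the 1 PM deadline, which the replay timestamps let you check.

```python
# Illustrative only: per-day edit counts read off Draftback's
# "document graphs and statistics" page, entered by hand.
from datetime import date

edits_by_day = {
    date(2020, 3, 25): 1911,   # 5 days before the due date
    date(2020, 3, 26): 79,     # 4 days before the due date
    date(2020, 3, 30): 1468,   # the due date (a Monday)
}
due_date = date(2020, 3, 30)
class_days = {0, 1, 2, 3}      # Mon-Thu, since our classes meet Mon-Thu

total_edits = sum(edits_by_day.values())
pct_on_class_days = 100 * sum(
    n for d, n in edits_by_day.items() if d.weekday() in class_days
) / total_edits
pct_on_due_date = 100 * edits_by_day.get(due_date, 0) / total_edits
max_edits_per_day = max(edits_by_day.values())
days_with_edits = sum(1 for n in edits_by_day.values() if n > 0)

print(total_edits, pct_on_class_days, pct_on_due_date,
      max_edits_per_day, days_with_edits)
```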

All of these values were computed from the due date of the last RD to the due date of the FD, up to a maximum of 1 week (my classes last for 6 weeks, and there is very little time between drafts – read more about the writing process in my classes here). When I was done, after several hours of just copying numbers and then making giant correlation tables, I had hints of what to look into more deeply:

2 essays from each student, each taken separately.

As you can see in cells C9-H14 (or duplicated in I3-N8), students didn't necessarily use the same revision strategies from essay to essay. A student who had a ton of edits on one day for essay 1 might have fewer edits spread out over more days for essay 2, as evidenced by the not-terribly-strong correlations in the statistics between essay 1 and essay 2. To take one example, "days with > 0 edits" on essay 1 was correlated with "days with > 0 edits" on essay 2 at just 0.21 (cell M7). Still, grouping students by their essay 1 behavior did produce some statistically significant differences in their essay 2 behavior at p=0.05 (a good enough p for a blog, imo):

  • Students who did > 2000 total edits on essay 1 had an average of 3428 total edits on essay 2. Students who did <= 2000 total edits on essay 1 had an average of 1650 total edits on essay 2.
  • Students who did > 50% of their edits for essay 1 on the due date did an average of 45% of their edits for essay 2 on the due date. Students who did <= 50% of edits on essay 1 on the due date did an average of 17% of their edits for essay 2 on the due date.

Anyway, because it seemed prudent to consider the strategies used on each essay rather than the strategies used by each student, I made a second spreadsheet where the individual essays rather than the students (who each wrote 2 essays) are the subject of comparison, resulting in this much-easier-to-read correlations table:

Here I treat each essay as a unique data point rather than 2 products of the same student.

Columns I and J (or rows 9 and 10) are probably the most interesting to other writing teachers: those hold the correlations between statistics derived from Draftback data and I) final draft scores and J) change in score between the rough draft and final draft. In plain English, the correlations here suggest:

  • As expected, % of edits on class days and % of edits on the due date are negatively correlated with the final grade for the essay. That is, people who did a lot of their edits in class or right before turning in the essay seemed to do worse (but not by much – neither produces statistically significant differences in FD grades or in improvement between RD and FD).
  • Total # of edits and max edits per day are both positively correlated with final grades (and with each other). Editing more tends to produce better essays.
  • Everything that is true for the final scores is also true for the change in scores between RD and FD. The fact that RD scores were even more negatively correlated with % edits on class days and % edits on the due date than FD scores were means that the changes appear to be positively correlated with those values, but I take that as meaning that those strategies are associated with an improvement from very bad RD scores to mildly bad FD scores.

To give a bit more detail, these were some statistically significant differences (p=0.05):

  • Students who did > 2000 total edits had an average grade of 86.8% on the FD. Students who did <= 2000 total edits had an average grade of 78.7% on the FD.
  • Students who did > 3000 total edits had an average grade improvement of 17.8% between the two drafts. Students who did <= 3000 total edits had an average grade improvement of 4.9%.
  • Students who did edits on > 3 days had an average grade of 84.8% on the FD. Students who did edits on <= 3 days had an average grade of 78.9%.
  • Students who did edits on > 5 days (that is, almost every day) had an average grade improvement of 33.6% between the two drafts. Students who did edits on <= 5 days had an average grade improvement of 5.8%.

The data suggests a fairly uncontroversial model of a good writing student – one who edits often, both in terms of sheer numbers of changes and in terms of frequency of editing sessions. In fact, “model student” rather than “model essay” may be what the data is really pointing at – the amount and timing of the work that went into a particular essay seems sometimes to show more about the student’s other work than it does about the quality of that essay.

For example, it's not clear why data derived from the time period between the RD and the FD would be correlated with RD scores (in fact, you would expect some of the correlations to be negative, as high RD scores might tell a student that there is less need for editing). But perhaps the fact that the same statistics that are correlated with FD scores are correlated in the same ways with RD scores and final course grades indicates that the data shows something durable about the students who exhibit those behaviors (my caveat earlier notwithstanding). It is feasible that the poor work habits evidenced by editing a major paper a few hours before turning it in might affect students' other grades more than that paper itself.

In fact, this seems to be the major lesson of this little research project. One t-test on % edits on due date was statistically significant – one that compared students’ final course grades. To be precise, students who did > 20% of their total edits on the due date had average course grades of 84.5%. Those who did <= 20% of their total edits on the due date had average course grades of 88.8%.

Just to pursue a hint where it appeared, I went back into my stat sheets for each class for the last year and copied the # of assignments with a grade of 0 (found on the "other stats" sheet) for each student into my big Google Sheet. Indeed, there was a statistically significant difference: students who made > 20% of their edits on the day an essay was due got a score of 0 on 5% of assignments across the term, while students who made <= 20% of their edits on the day an essay was due got a score of 0 on 3.2% of assignments across the term.

As with many characteristics of "good students", from growth mindset to integrative motivation, whether a pattern of behavior correlates with success and whether it is teachable are two almost unrelated questions. It doesn't necessarily follow from this research that I should require evidence of editing every day or that I should move due dates forward or back. It does suggest that successful students are successful in many ways, and that editing essays often is one of those ways.

I might just want to tell my students that I really love the Google Docs “cleared comment” emails that I get on Monday morning and I wish I got them all weekend, too.

The Academic Support Catch-22

There is a pattern I've noticed among "academic support" classes (formerly known as remedial) that may work against their intended purpose.

The pattern is a result of the assumption that the subtext of planning and preparation in most college assignments needs to be made text. That is, the assumptions about what needs to happen for a college student to be successful need to be made explicit and accounted for. For example, here is a depiction of creative writing that I think gives a pretty accurate picture of the work that has to be done vs. what ends up on the page:

writing iceberg
Source.

Academic support often seems to work by taking all of those hidden parts of the writing process out in the open and making them graded assignments themselves. An assignment that in another class might look like this:

Write a research paper on a topic covered in this class. (100 pts)

might turn into a weeks-long writing unit like this:

  • Brainstorming discussion notes (classwork)
  • Research goal discussion: 5 pts
  • Mind map: 2 pts
  • Library scavenger hunt (classwork)
  • Works Cited and Plagiarism worksheet: 5 pts
  • Outline w/ annotated Works Cited page: 10 pts
  • Outline pair feedback (classwork)
  • Introduction in-class writing (not graded)
  • Rough draft 1: 10 pts
  • RD1 peer feedback (classwork)
  • RD1 tutoring visit reflection discussion: 5 pts
  • RD2: 20 pts
  • RD2 professor feedback reflection Flipgrid: 5 pts
  • RD2 office hours appointment: 2 pts
  • FD: 70 pts
  • FD writing process reflection discussion: 5 pts
  • Optional FD re-submission for makeup points
  • Optional FD re-submission for makeup points reflection
  • Optional FD re-submission for makeup points reflection2

Ok, the last two are jokes, but otherwise this writing process, where every step is explained, given its own rubric, shared, and reflected upon, is quite normal for a writing class that is coded “for English learners”, “academic support”, or just has a professor trying a more workshoppy approach.

This can be invaluable, unless it sets too strong a precedent in students' minds that the requirements of the writing process will always be made explicit. Some students, particularly in ESL, may have no idea at all what the writing process is supposed to entail or how to use resources like the library, tutoring, etc. It's better that at least one class during a college student's first year puts this all on the record, but it might be counterproductive if too many do. It shouldn't be lost on us that each step made explicit in the "academic support" writing process makes it resemble a typical college writing assignment less and less. If students expect these steps always to be explicitly outlined, they may neglect or delay them on assignments where they are not.

The contrast between two types of assignments in my classes crystallizes these concerns for me. The first type resembles the detailed, all-steps-accounted-for workflow above. I have 2 papers in a term whose writing processes basically fill all of the 2 or 3 weeks ahead of their final due dates with discussions, peer review, presentations, and pre-writing. The second type is an "all-term" assignment given the first week of class and due the last week, usually worth a significant number of points but doable in a few hours with the right preparation. Examples of this type of assignment are "go to an on-campus event and take detailed notes" or "email a professor in the department you plan to major in and ask 3 questions". Students tend to do the first type of assignment with the appropriate level of dedication, preparing them well for the big essays that come at the end of the two- or three-week unit. At the same time, they tend to leave the second type of assignment until the weekend before the last week of class, days before it is due, and often run into problems like not having campus events to go to on Presidents' Day weekend (this post is a topical one). This tells me that, in my classes at least, the precedent of having all the "underwater part of the iceberg" work outlined in detail for some assignments results in the underwater part being ignored for others.

Another factor may be that, for the first type of assignment, students are all doing the same thing at the same time and know that avoiding embarrassment during a week’s worth of discussions and presentations depends on their doing their work. For the second, on the other hand, students may all go to different events, email different professors, etc. all at different times and never have to show their work to their classmates. Again though, it is not unusual for major assignments in other classes to be solitary affairs. The many reasons that students seem to neglect solitary assignments with implicit requirements on time and preparation only highlight the problems that that neglect causes.

I don’t really have a solution for the skewing of expectations that academic support seems to produce – I just verbally warn students that most of the steps in our writing process will need to be taken of their own volition in their History, Psychology or Accounting classes. Maybe I need to give points for reflecting on that warning.

COCA for translationists

(Corpus of Contemporary American English, alongside the other BYU corpora from Mark Davies)

For basically all my career, from my eikaiwa days to Japanese university to community college to the IEP I teach at now, I've been trying to get my students to see vocabulary as more than lists of words with accompanying translations.

[Image: a typical 英単語 (English vocabulary) list]
Source

Sure, knowing one translation of “marry” is probably better than not knowing anything about “marry”, but it really just gets your foot in the door of knowing that word (and leaves you less able to enjoy semantically ambiguous sentences like “The judge married his son”). You still don’t have much of an idea of what kind of person uses that word, in what kind of situation, and (of special concern for fluency) what other words usually surround that word.

Part of what cramming for tests does to language learners (and really learners of anything) is convince them that the minimum amount of knowledge to be able to fill in the right bubble is efficient and expedient. One of the longest-running efforts of my career is trying to disabuse my students of the notion that where vocabulary is concerned, this kind of efficiency leads to anything worthwhile. To the contrary, the more seemingly extraneous information you have about any given word, the better you will remember it and the more fluently and accurately you will be able to use it.

(Naturally, the site where I first encountered this phenomenon was in Japan, where the question "What does that mean?" is almost incomprehensible except as a synonym for "Translate this into Japanese according to the translation list provided by your instructor". But knowing a word and being able to use it (a dichotomy which collapses with any scrutiny) demands (again, a collapsed dichotomy being treated as a single subject) quite a lot more than an abstract token in a foreign language being linked to a more familiar token in one's first language in memory. One can know that "regardless" "means" とにかく or 関係なく in Japanese without knowing what preposition usually follows it, which noun from "outcome", "result", or "upshot" most commonly follows that preposition, or that it has an even more academic ring than near-synonym "nonetheless" (which doesn't have an accompanying preposition at all). Interestingly, overreliance on translation seems to be something of a vestigial trait of language education in Japan – people justify it for its utility on tests, but the tests themselves haven't required translation in many years.)

Even when my students understand this, however, they still aren't sure how to implement it. I get a lot of positive reactions to comparisons between chunks in English and in their first language (asking how many words a child hears in phrases like "Idowanna", やだ, 我不想 or je veux pas) or between words and animals (a lion can technically eat roast turkey, but what do lions usually eat?). Students readily identify chunks and idiomatic expressions that they hear outside of class ("Would you like to" and "got it" are some of the most-noticed). In the run-up to a vocabulary quiz though, where I want students to show all that they know about vocabulary, what I see most often on students' desks are the familiar lists of translated pairs:

regardless 而不管 / however 然而 / nonetheless 尽管如此 / nevertheless 但是 / notwithstanding 虽然

It seems that students, when they "study", tend to default to the strategies that they think got them through high school. Usually, students who have this tendency also have familiar patterns of scoring on quizzes: fine-to-high scores on the cloze (fill-in-the-blank) questions and low scores on anything outside of the narrow range where translation is applicable. I see this as a result of students not knowing how to fold knowledge of these other features of vocabulary into their customary mode of studying.

I started using COCA in class as a way to plug the fuzzy, often-neglected dimensions of vocabulary learning – in particular register, genre, colligation and collocation – into a behavioral pattern that students have completely mastered. That is, COCA is a way to make a more complete picture of vocabulary compatible with my students' most familiar way of studying – sitting at a desk and looking up discrete words.

With that long preamble over, let’s have a look at the specific activities I use over the course of a term.

First glance at COCA

Starting on the first day, words of particular interest are added to a class web site – either my own, Vocabulary.com, or Quizlet (I’ve tried quite a few) – and drawn on for review, activities, and quizzes. Starting in week two, I introduce the idea of chunks (which they need in order to complete the reading circles sheets from that week on), either with a presentation or less formally, for example with a quiz game.

In a shorter term, I’ll introduce COCA the same week, or in a longer semester, around week 4 (my IEP has lightning-quick 6-week terms). The introduction usually has to be done in the lab – it’s much better if each student can do his or her own searches. I alternate between a worksheet and a presentation for the first introduction. This takes about an hour.

From experience, students never fail to see the utility of COCA at this stage and never seem to have trouble with the idea of another online resource. The issues that typically arise on the first day are:

  1. COCA locks out searches from IP addresses if there are too many in one day (as in a class of 20 or so all using COCA for the first time in a lab). This usually starts to afflict my classes after the first 20 minutes or so of searches.
  2. At minimum, students have to create accounts after the first few searches, which used to require a .edu email address, but doesn’t seem to now.
  3. The use of spaces on COCA is idiosyncratic. A search for ban_nn* (without a space) will find instances of "ban" used as a noun, while ban _nn* (with a space) will find "ban" plus any noun, for example "ban treaty", or hilariously, "ban ki-moon". ban* (without a space) will find any word starting with "ban", and ban * (with a space) will find "ban" plus any word or punctuation mark. Punctuation needs to be separated with spaces as well. These rules trip up students fairly early on, as they search for, for example, due to the fact that* and don't find what they expect.

Weekly activities

After the first introduction, COCA will be in at least one homework or classwork assignment every week.

Classwork

From time to time, but especially before quizzes, students do a jigsaw-style group activity I call vocabulary circles. As you can see, a good half of it is COCA-derived. If you don't know how these usually work: students with different jobs are assigned one word per group, share their work with "experts" who had the same job from other groups, reconvene and share it with their own group, and then take turns presenting all of their group's work to their classmates.

Reading

COCA searches are a part of many of the reading circles sheets I use (reading circles are the only way I do any intensive reading in class). Vocabulary specialists (or whatever you call them) are always responsible for chunks as a category of vocabulary as well as collocations for other words.

Discussions

Starting the week that COCA is introduced, weekly “Vocabulary Logs” on Canvas include COCA work like that reproduced below:

This week, you must use COCA to find something interesting about a word from our class vocabulary list. You must find these 3 things:

What other words usually come before and after that word?
Who usually uses that word? (For example, lawyers, academic writers, news anchors, etc.)
Which forms of the word are the most common? (For example, “present simple”, “plural”, “adverb”, etc.)

You get 6 points for answering all of these questions.
Then, in a reply, use a classmate’s word in a new example sentence that you make. This section will be graded on correctness, so read your classmate’s post carefully. (2 pts)

Or this option to take translationism head-on:

This week, you will compare a word from another language (for example, your first language) to a word in English. The words should be translations of each other.
You will point out how the two words are similar or different in these areas:

Collocation: Do the same or similar kinds of words come before or after the words?
Grammar: Are the words the same part of speech? Are the rules for the parts of speech different in the two languages?
Register: Do the words appear in the same kinds of situations? Are they similar in formality?
Meaning: Do the words have second or third meanings that are different?

This post is worth 6 points. Reply to a classmate for 1 more point.

Quizzes

The quizzes in my classes after COCA has been introduced all have some explicitly COCA-derived questions and some questions that are graded on COCA-relevant considerations.

In questions like the one below, “grammar” includes part of speech and colligation.

Use the word in a sentence that makes the meaning clear. (1 pt for grammar and 1 pt for clear meaning)
(sustainable) _____________________________________________________________

Some questions target collocations specifically (ones that have been discussed specifically in class):

Circle the most common collocation. (1 pt each)
A difficult environment can precipitate ( fights / conflict / argument ).
Adaptation ( onto / to / with )  a new culture takes time.

Other questions target the colligations of vocabulary that should be familiar for other reasons:

Fill in the blank with one of the following. (1 pt each)
Regardless of / Owing to / Because / Also
_______________________ the waiter made a mistake with our order, our meal was free. _______________________, the chef sent us a free dessert. Lucky us!

Students cannot have COCA open during the quiz, but they can (and are advised to) get to know the words inside and out beforehand. As you may have seen, our vocabulary lists can grow fairly long by the end of the term, but words often appear on more than one quiz.

Essays

See my last post on the subject.

I am getting on board the “reflection as revision” train – grading reflection on grammar instead of grammatical accuracy on all drafts besides the first. COCA is the vehicle I use for this.

Conclusions

I presented this to you as a way to get students with an unhealthy focus on one-to-one translation to think about vocabulary in a way that better facilitates real-world use. Actually, it works even better with students predisposed to think of vocabulary in more holistic terms – but those students would often be fairly good learners just with enough input. The advantage of using COCA is that it can easily piggyback on habits that certain students may overuse – many of my students have browser extensions on their computers that translate any word the mouse hovers over. Adding one more dictionary-like tool that includes what dictionaries miss is a way to swim with that tendency rather than against it.