Kenna Barrett Explores Turing’s “Test” and Automated Essay Evaluation

[Image: Kenna’s Duck]
What does Alan Turing’s famous thought experiment have to do with writing essays?

On Wednesday, November 13th, 2013, Kenna Barrett (PhD candidate, English) delivered a talk exploring the possible relationship between Alan Turing’s well-known “Turing Test” and Automated Essay Evaluation (AEE). In what Barrett described as an “interdisciplinary work-in-progress,” she explored parallels between a machine’s contested ability to produce human-like responses in the Turing Test and AEE’s controversial ability to “score” human writing.

According to Barrett, the computer as a metaphor for the mind has long been a “pervasive trope in cognitive science,” and there seems to be a growing desire to compare the inner workings of the two. With both the Turing Test and AEE, the aim is to design a computer program that can mimic the mind. Typically, in AEE a set of “trins and proxes” (intrinsic traits of writing and their countable proxies) is first established, and human raters score a training set of writing samples. A computer program is then built to model that human scoring, and the model is applied to new essays. Any score produced by an AEE program is therefore a combination of human scores and computer-predicted scores; however, Barrett reminds us that the outcome is connected to a score statistically, but not “meaningfully.” AEE programs cannot read or score for meaning. Barrett explained that the Turing Test, initially designed to ask “can a machine think?”, works in a similar fashion: by mimicking human intelligence, the program is designed to “fool” other humans into thinking its responses to questions are in fact human.
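To make that workflow concrete, here is a minimal sketch of how such a scoring model might be trained and applied, assuming a handful of countable “proxes” (essay length, vocabulary variety, a sentence-count proxy) and an off-the-shelf linear regression from scikit-learn. The feature set, the tiny training data, and the model choice are illustrative assumptions only, not Barrett’s example or any vendor’s actual system.

```python
# A minimal sketch of the AEE workflow described above: human raters score a
# training set, a statistical model is fit to countable surface features
# ("proxes"), and the model then predicts scores for new essays.
# The features, training data, and linear model are illustrative assumptions.
from sklearn.linear_model import LinearRegression


def proxes(essay: str) -> list[float]:
    """Extract simple countable features -- the program counts; it does not read."""
    words = essay.split()
    return [
        float(len(words)),                       # essay length
        float(len({w.lower() for w in words})),  # vocabulary variety
        float(essay.count(".") + essay.count("!") + essay.count("?")),  # sentence-count proxy
    ]


# Hypothetical training set: essays already scored by human raters.
training_essays = [
    "Short answer.",
    "A somewhat longer response. It offers two sentences.",
    "A developed response with several sentences. It elaborates. It varies its wording. It concludes.",
]
human_scores = [2.0, 3.0, 5.0]

# Fit the model to the human scoring of the training set.
model = LinearRegression()
model.fit([proxes(e) for e in training_essays], human_scores)

# The resulting score is statistically tied to the human scores, but the model
# has no access to the essay's meaning.
new_essay = "An unseen essay submitted for automated evaluation."
print(f"Predicted score: {model.predict([proxes(new_essay)])[0]:.2f}")
```

The point of the sketch is the limitation Barrett emphasizes: every input to the model is a count, so the “evaluation” is a statistical echo of earlier human judgments rather than a reading of the essay.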

After explaining in detail how the Turing Test has informed the evolution of automated essay scoring over time, Barrett went on to pose another question: does AEE really pass a Turing Test? In other words, when a computer program is designed to mimic human reading and essay scoring, will the resulting evaluation look like it came from a human? Will we be able to tell the difference? Barrett noted that only further analysis of every essay and every score would reveal whether an AEE system could pass the Turing Test, and this led her to yet another question: even if machines are Turing-intelligent, are they acceptable?

Barrett again reminded us that AEE systems, regardless of their intelligence or acceptability, do not understand meaning and do not react to language; as she put it, “they merely count it.”

Despite AEE’s inability to make meaning from texts the way humans can, Barrett informed her audience that, regardless of how we (literature scholars, writers, and writing teachers) feel about automated essay scoring or evaluation, it already plays a large role in the world of academe. AEE systems are regularly used to score both the GMAT and the GRE, and other uses are sure to develop. Additionally, other AEE programs are growing in popularity because of the customization options they offer.

So, if AEE is the reality, Barrett asks, what will educators do when it does work (because it will), and what further issues or complications will that present?

Thanks to Barrett and her framing of possible Turing Test and AEE intersections, those of us fascinated with the world of writing assessment will be more informed as we explore those issues and discover new ways to manage them.
