LLMs as Samizdat Jackson
And do we want to be in the teaching business or the grading business?
Some years back, early in my teaching career, I heard a story about Eric Cornell at Colorado giving an exam in a 400-student intro class, where he warned them before the test “Don’t even think about copying answers, because there are four different versions of the exam.” On the day of the test, they were printed on four different colors of paper, so dishonest students gleefully copied answers from other students with the same color test.
Of course, the color of the paper was not in any way correlated with the version of the test, so after the exam he brought a huge number of students up on charges of cheating. Which led to a bunch of failures and expulsions and all that fun stuff.
Several years later I found myself at the same table with Eric for dinner at a conference, and asked him about it. He confirmed that he had, in fact, done the four-colors-of-paper thing (the test version was indicated by an easy-to-miss symbol in one corner of the page), and that it had been pretty amusing, but also a huge mess. He also said something that struck me as odd at the time, but that I’ve since come into alignment with: “I used to get really worked up about that kind of thing, but I’ve mellowed since then. I wouldn’t do that stunt today.”
I was thinking about that because the last several days have seen a lot of wailing and gnashing of teeth on academic social media, presumably because semester schools are in finals now, about Large Language Model “AI” systems being the End Of Academia As We Know It. Faculty are reporting enormous piles of LLM-generated papers and code, and lamenting that nothing works right any more.
And I find myself curiously unperturbed by a lot of this. A small bit of this is a kind of calendar-related dissociation (we’re on a trimester system and so have four more weeks of class before finals, and I am Not Thinking About It), but I think my lack of reaction is a combination of a form of the mellowing Cornell mentioned and a kind of disciplinary “Been there, done that…”
The core problem posed by LLMs is, after all, essentially the same thing we’ve had to deal with in STEM courses for decades. Exams and homework problems in physics and other science fields have a single correct answer, which means that the high-scoring end of the distribution of papers will all look more or less the same. That’s true even on a smaller scale than a lecture-hall course with Scantron tests— there are only so many ways to apply Newton’s Laws, and math is pretty universal. All the correct solutions are the same; the incorrect ones come in infinite variety.
Moreover, the solutions are well known to a huge number of people, which makes them really hard to contain. You can dress this up to some degree with weird framing stories and odd notation so that solutions copied from other sources stand out, but too much of that risks tipping your physics exam into a reading comprehension test. And as you move up the curriculum, the number of problems with analytical solutions dwindles rapidly— for quantum physics at the junior level there are maybe a dozen problems you can solve with pencil and paper, full stop. The set of those that are good as homework or exam problems is even smaller.
As a result, inauthentic solutions from shady sources have been a bugaboo for physics courses basically forever, at every level. Thirty-odd years ago when I was in grad school, there were samizdat copies of a solution manual for all the problems in J.D. Jackson’s E&M book circulating, and textbook publishers have always been churning out new editions that shuffle the order and wording of the end-of-chapter problems. The desk copies I used to get every few years would come with a concordance— a document saying which problems in the new book corresponded to which ones in the old book, for faculty who wanted to reproduce past homework assignments.
When these were all on paper it was bad enough, but everything is computer now, and those solution manuals that were the stuff of legends when I was a student are a quick Google search away these days. I’m not above using them myself—when I teach a new course out of a trade textbook, I do a search for “[Author] [Title] problem solutions” and generally get something in the first page of results that I can use as a sanity check for my own solutions.
In a sense, the LLM situation in non-STEM disciplines is just the tech catching up to where Google got us to twenty-mumble years ago. The dead-tree samizdat solutions to Jackson were always there, but not a huge problem because they were kind of cumbersome. In the same way, paying somebody else to write a History paper has always been an option, but until very recently it was enough of a hassle that it was never too prevalent. LLM slop papers, though, are more like Google-able PDFs of the Jackson solutions, in that they’re now available with minimal effort to basically anybody dishonest enough to use them.
The problem of how to deal with students whose goal is to pass with minimal effort is universal; it’s also, as they say in Math, previously solved. The solution is to do the same set of things we’ve been doing in intro STEM courses forever: in-person exams with tightly limited resources allowed to students. Some of my colleagues will go as far as specifying what model of calculator students can use, and making them stack coats and bags and phones at the front of the room. A few flatly forbid bathroom breaks during tests, though for me that’s a couple of bridges too far.
(A colleague once had a student come up to him before an exam in the pre-med course and tell him they had seen another student stash a copy of the textbook in the bathroom. He went there, found the book and removed it, then waited to see who would ask to go halfway through the test, but nobody did…)
But at some point, for me anyway, the mellowing Cornell mentioned starts to kick in. The turning point for me was a good number of years ago now, when we were still putting together common exams for intro courses with multiple sections, when a colleague absolutely insisted that none of the questions could be recycled from past years. “The frats have banks of exam questions from old tests, you know…” was the reason they gave.
Somewhere around there, we cross a line where I start to find the whole business silly and paranoid. I mean, I’m not sure that the effort involved in memorizing, recognizing, and recalling the solutions to specific exam questions from previous years is meaningfully different from, you know, studying for the exam. To the extent that it’s an actual problem, it’s easily defeated by relatively minor changes: increase or decrease parameters by 25%, permute the order of the multiple-choice options, etc. There’s no reason not to re-use a good question from a past year, especially given how hard it is to come up with good questions in the first place.
And having tapped that keg of worms, over time I found myself wondering “What are we really doing here, anyway?” At a kind of 30,000-foot level, there are two different processes in play when running a course: providing resources and opportunities for students who want to learn, and hunting down and punishing those who don’t, pour encourager les autres, as it were. Given that faculty energy and attention are finite resources, there’s a bit of a trade-off between these: time spent inventing novel ways to catch cheaters is time that’s not spent doing things to help the students who aren’t cheating to learn more.
Which is kind of a roundabout way of saying “I’ve mellowed since then.” On reflection, I’ve come to think it’s more important to direct my limited resources toward providing opportunities to those who will take advantage of them than toward punishing those who choose to squander the opportunities they’re offered. That doesn’t mean it’s a complete free-fire zone when it comes to assignments in my classes— I’ll still make students clear their desks before taking an exam, and look skeptically at papers that use terms or methods I’m pretty sure the student doesn’t actually understand— but there are diminishing returns to precautions beyond weeding out the really crude and obvious forms of cheating. I’m giving the same exam in two shifts this term, for example, because I think the extra time afforded by giving the exam in the lab period is more important than avoiding the risk of somebody in the early lab tipping somebody in the late lab off about what’s on the test.
This LLM moment, then, may be a time to reflect on what, exactly, we regard as the top priority of higher education. Is the primary goal providing education to those who are interested in becoming educated, or are we primarily concerned with the credentialing and gate-keeping functions of academia? If the latter concern dominates, then it’s going to require some extreme efforts to LLM-proof syllabi: in-person oral or hand-written exams rather than papers written outside of class, tight requirements on and careful scrutiny of bibliographies, etc. It’s probably impossible to force uninterested students into putting in the effort to actually learn (but then, it never was), but with effort most of them can be steered off the path of least resistance.
The other approach would be to focus more on the good students, and worry less about the bad. Put more time and effort into helping the ones who want to learn to do awesome things, and less into chasing after the ones who aren’t interested. There are some moderate-effort things you can do to limit the worst uses of cheating technology, and those are absolutely worth it, but trying to catch everyone looking for a shortcut can easily suck up energy that ought to go instead toward nurturing the students who are interested and willing to put in genuine effort, to learn the subject for real rather than merely learning to fake it.
So, I think the real question presented to faculty is “Why are we assigning the work we’re assigning?” If we’re asking students to do particular tasks because doing them is essential to the process of learning, then we should continue to do them even in a world with readily available “AI,” but also make clear to the students that that’s what they’re for. The ones who are interested in the opportunity to learn will still do the work, if they know that’s what it takes.
On the other hand, if we’re assigning particular tasks primarily to have a basis for assigning grades, that’s a different issue, and it’s probably worth examining how much we want to be in the grading business rather than the teaching business. A good deal of the fretting over LLMs as the death knell for higher education is fundamentally rooted in the same kind of credentialism that many of the fretters denounce in our students. I’m not sure that relationship has gotten as much thought as it deserves, and if there’s any silver lining to this cloudy moment, it might be as a spur to thinking about that issue.
There’s some of the inconclusive noodling you’ve come to expect from me regarding crises in higher education. I’d keep poking at this, but I have a couple dozen student project videos to review, so can’t spare the time. If you want to see whether I come back to it later, here’s a button:
And if you want to argue the other side or share funny/appalling stories about cheating, the comments will be open:
The question is interesting. Students are going to be a mix. Sure, they want to learn. But they also want to party. Or even, just have good grades b/c good grades > bad grades.
Therefore, even in a benign environment, incentives and trade-offs guarantee some interest in shortcuts.
But, in a cutthroat environment, where better grades from a better school can alter your entire career arc... then yes of course cheating is absolutely a strategy some players (sorry, students) will consider.
And if said careers carry risks to the outside world (medicine, avionics engineers), it seems fair to me that companies relying on the credentials provided by the academic system not be deceived when a student says he's in the top 5% of his Harvard class...
LLMs are really good tools. We should be teaching students how to make best use of them, not trying to prevent their use. We should also teach how to get things done when the internet is not available, rather than pretending that never happens.