Can AI be used to cheat on programming tests?
Plagiarism is not limited to essays. Programming plagiarism, where a developer intentionally copies code without attribution, is a growing trend. According to a New York Times article, more than half of the 49 allegations of academic code violations at Brown University in 2016 involved cheating in computer science. At Stanford, as many as 20% of the students in a single 2015 computer science course were flagged for possible cheating, the same piece reports.
Measure of Software Similarity, or MOSS, has been one of the most popular systems for detecting plagiarism in software since its development in 1994. MOSS can analyze code in a range of languages, including C, C++, and Java, automatically identifying pairs of programs with similar code and highlighting the individual passages in each program that appear to be the same.
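The technique behind MOSS was described in a 2003 paper on "winnowing" by Schleimer, Wilkerson, and Aiken. The Python sketch below is a simplified illustration of that idea, not MOSS itself: it normalizes the source text, hashes every overlapping k-gram, keeps the minimum hash in each sliding window as a fingerprint, and compares the two fingerprint sets. The whitespace-only normalization, the values k=5 and w=4, and the Jaccard score are assumptions chosen for illustration; real tools normalize at the token level (for example, ignoring identifier names).

```python
import hashlib
import re


def normalize(source: str) -> str:
    # Simplified normalization (an assumption): lowercase and drop all
    # whitespace so that reformatting alone does not change the fingerprints.
    return re.sub(r"\s+", "", source.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    # Hash every overlapping substring of length k (32 bits of an MD5 digest).
    return [
        int(hashlib.md5(text[i:i + k].encode()).hexdigest()[:8], 16)
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes: list[int], w: int) -> set[int]:
    # Keep the minimum hash from each window of w consecutive k-gram hashes.
    # The published algorithm also breaks ties and records positions so that
    # matching passages can be highlighted; this sketch keeps only the values.
    if len(hashes) < w:
        return set(hashes)
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprints(source: str, k: int = 5, w: int = 4) -> set[int]:
    return winnow(kgram_hashes(normalize(source), k), w)


def similarity(a: str, b: str) -> float:
    # Jaccard similarity of the two fingerprint sets, from 0.0 to 1.0.
    fa, fb = fingerprints(a), fingerprints(b)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    original = "def area(r):\n    return 3.14159 * r * r\n"
    reformatted = "def area(r):\n  return 3.14159*r*r\n"
    print(similarity(original, reformatted))  # 1.0, since only whitespace differs
```

Because fingerprints are chosen from local windows, a copied passage still produces matching fingerprints even when it is embedded in otherwise different code, which is what lets a detector flag individual passages rather than whole files.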
But a new study claims that freely available AI systems can be used to complete introductory-level programming assignments without triggering MOSS. In a paper coauthored by researchers at Booz Allen Hamilton and EleutherAI, a language model called GPT-J was used to generate code "without specific clues that future plagiarism detection techniques could use to try to identify algorithmically generated code."
"The main purpose of the paper was to contextualize the fact that GPT-J can solve introductory computer science exercises in a realistic threat model for plagiarism in an educational setting," Stella Biderman, an AI researcher at Booz Allen Hamilton and coauthor of the study, told VentureBeat via email. "[Our] findings showed that a student with access to GPT-J and very minimal knowledge of computer science can turn in introductory-level assignments without triggering MOSS."
Biderman and Edward Raff, the other coauthor, had GPT-J answer questions that required it to write programs that could create conversion tables from miles to kilometers, calculate a person's BMI given their weight and height, and more. GPT-J made minor mistakes that in most cases needed to be corrected, but fixing them often required no programming skill beyond the ability to run code and search the web for error codes.
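For a sense of how small these exercises are, here is a hand-written Python illustration of a correct solution to the two tasks named above. It is not GPT-J output from the study, and the function names, the table range, and the output format are assumptions made for the example.

```python
MILES_TO_KM = 1.609344  # kilometers per international mile (exact by definition)


def conversion_table(max_miles: int) -> list[tuple[int, float]]:
    # One row per whole mile from 1 to max_miles, kilometers rounded to 2 places.
    return [(m, round(m * MILES_TO_KM, 2)) for m in range(1, max_miles + 1)]


def bmi(weight_kg: float, height_m: float) -> float:
    # Body mass index: weight in kilograms divided by height in meters squared.
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    for miles, km in conversion_table(5):
        print(f"{miles:>3} mi = {km:>6.2f} km")
    print(f"BMI: {bmi(70, 1.75):.1f}")  # prints "BMI: 22.9"
```

The kind of minor error the researchers describe, such as a missing import or a misnamed variable, surfaces as a Python traceback when a program like this is run, which a student can paste into a search engine without understanding the code.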
While Biderman has found no evidence that GPT-J is actually being used to cheat on assignments, the work raises questions about whether it (or similar tools) could be abused in professional coding tests. Many technology companies rely on internal or external exams to assess the knowledge of software engineering candidates. Depending on their design, these could, at least in theory, be susceptible to AI-generated code.
“MOSS was developed long before things like GPT were possible, but this illustrates the importance of understanding how digital tools evolve over time to introduce new risks and constraints,” Biderman added.
Rick Brownlow, the CEO and cofounder of Geektastic, a technical assessment platform, says he hasn't seen any evidence of plagiarism by a test taker using AI. He notes that for most companies, a coding test is just one part of the hiring process. Candidates are generally expected to be able to explain their solutions in a way that reveals whether they were dishonest about their programming skills.
"[O]ur plagiarism tools will pick up when someone has copied some or all of another solution, [even spotting] when someone has obfuscated some of the copied code to avoid detection. If, and this is a big if, AI could write a 'good' solution to one of our take-home challenges and this was original (i.e., not pulled off the web and copied), then this is just as hard to spot as someone using their developer friend from Google to help," Brownlow told VentureBeat. "I think when we get to a point where AI is solving take-home coding challenges, we'll be at the point where you don't hire software engineers anymore."
Jake Hoffner, CEO of Qualified.io, says his company also detects cheating based on signals such as "lack of coding effort (e.g., copy and paste, minimal editing)" and recommends that clients have candidates walk through their code. But he foresees a future in which AI changes the nature of programming assessments, shifting the focus from writing code by hand to managing code.
Indeed, emerging AI-powered code suggestion and review tools promise to lower development costs while freeing programmers to focus on less repetitive tasks. At its Build developer conference in May 2021, Microsoft outlined a feature in Power Apps that uses OpenAI's GPT-3 language model to help people choose formulas. OpenAI's Codex system, which powers GitHub's Copilot service, can suggest entire lines of code. Intel's ControlFlag can automatically detect coding errors. And Facebook's TransCoder converts code from one programming language to another.
"[At] the point that AI is starting to write more quality code, the industry as a whole is starting to move toward developers… directing machines to write code, with less involvement in the actual coding," Hoffner said. "[T]he need for code to be involved is starting to take a back seat for many of the 'reinventing the wheel' tasks developers still perform today, such as building a mobile app that retrieves and writes data. Coders move on from these common tasks to things that are less defined and novel. These are areas where there won't be enough existing code for AI systems to learn from, so coders will still have to do it, and these are the tasks we'll be testing in the assessment space."
Nis Frome, GM at coding challenge and tutorial platform Coderbyte, says he sees less risk in AI being used to cheat on coding exams than in employers "[sacrificing] great candidate experiences for honest candidates." Too much focus on cheating prevention usually comes at the expense of recruiting and hiring, he says, resulting in good candidates being turned away.
A 2022 survey from CoderPad and CodinGame puts the problem in focus. Nearly half of recruiters cite finding qualified developers as their biggest challenge, with 39% saying they have now expanded their candidate pool to include developers from non-academic backgrounds, up from 23% in 2021.
"We see countless techniques for cheating, from sending the assessment to someone else to copying answers from online. We have no doubt that candidates have tried to use GPT-J or Copilot when taking code assessments on Coderbyte," Frome told VentureBeat via email. "[But] cheating will always be a cat-and-mouse game… Chances are if most of your candidates are cheating, you have a sourcing problem! Maybe you need more senior candidates and shouldn't be posting vacancies on university job boards. The solution is not to create an authoritarian and annoying experience for all candidates."
Biderman points out that policing academic integrity, whether AI is involved or not, is not a new endeavor. In the same vein as Hoffner's prediction, she says, the advent of easy-to-use code-generating AI could simply require new assessments in which students debug AI-generated solutions.
"We can still teach students the important computer science skills they need and find new uses for [AI]. These structural changes could deliver better outcomes for reducing plagiarism and shortcuts, while paving the way for a future where more AI-powered development tools are in the hands of a wider group of users," Biderman added. "This also helps us prepare for a possible future where AI and machine learning may be able to do more than just introductory-level assignments, and we should be preparing for that."
VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.