
Has ChatGPT signalled the end of assessment as we know it?

27th March 2023

Author

Dr Seun Kolade

Associate Professor, Leicester Castle Business School

You have heard about ChatGPT. Of course you have. Since its launch in November 2022 by the San Francisco-based OpenAI, the state-of-the-art chatbot has captured the public’s imagination and caused quite a stir. It reached 1 million users within five days of its launch: an unprecedented feat that took the likes of Instagram and Spotify months to achieve. With approximately 175 billion parameters at its command, ChatGPT is one of the largest and most powerful natural language processing AI models available, with vast and versatile capabilities surpassing previous chatbot models. Basically, you can ask ChatGPT to explain things, and you can ask it to explain them in whatever style you choose. For example, I have just asked ChatGPT to explain the theory of relativity to a five-year-old, and it took less than 10 seconds to produce a really good response. How cool is that?

You can also ask ChatGPT to generate an academic essay in response to a typical assessment brief. Now, that’s a different level of “cool”. It is no wonder that, for stakeholders in the higher education sector, reactions to ChatGPT’s capabilities have quickly shifted from excitement to sheer consternation. For several decades, higher education providers have grappled with the intractable problem of essay mills, exacerbated by the ubiquity of the internet. For a fee, essay contractors produce “original” essays that escape detection by plagiarism tools like Turnitin. This is problematic enough, but ChatGPT has effectively exacerbated the problem a million-fold. Now, with a simple click, students can generate original essays in response to assessment briefs at practically no cost. ChatGPT is cheating made easy. It has sounded the death knell of assessment as we know it. Or has it?

In a recent study, my colleagues and I conducted a quasi-experiment to probe further. We recruited two other participants and fed a typical assessment brief into ChatGPT from five unique user accounts. Account 1 generated an essay that returned a 4% similarity score: an excellent outcome for originality. We repeated the instruction on Account 1 a further five times to produce essays 2 to 6. Those five essays returned very high similarity scores, between 86% and 99%, because Turnitin picked up the similarities with Essay 1, which was already in its records. So we repeated the instruction from four other unique accounts and devices. Essays 7 to 10 returned similarity scores of 18%, 19%, 24% and 17% respectively, again good outcomes for originality. In fact, much of the similarity detected came from the text of the assessment brief itself! It is a lecturer’s nightmare. But does it have to be?

Things have been moving quite rapidly in the world of artificial intelligence, and HE providers cannot afford to play catch-up. In the wake of the reactions that greeted the launch of ChatGPT, OpenAI released a classifier to distinguish between AI-written and human-written text. However, its capabilities are limited: it offers only probabilistic indicators such as “likely” or “unlikely” AI-written, and at the time of writing it has correctly identified only 26% of AI-written text as “likely AI-written”. Another detection tool, GPTZero, developed by a Princeton University student, has shown very good promise.

What is clear is that artificial intelligence will not go away in a hurry. In a way, it is refocusing minds on the limitations of current assessment models, which appear to prioritise knowledge testing over competence assessment and performance evaluation. It is necessary to test students’ ability to aggregate and synthesise existing ideas in the process of generating new knowledge. However, in a dynamic, fast-paced knowledge economy, it is important to go beyond the “know what” (knowledge testing) to the “know how” (competence assessment) and the “show how” (performance evaluation). This is where AI tools can be co-opted as allies, rather than resisted as the enemy of learning and assessment.

Imagine a world in which AI tools can be harnessed for otherwise time- and resource-intensive summative assessments, freeing up valuable time for the more bespoke, one-to-one formative feedback that human tutors are best suited to provide. Imagine the adoption of computerised adaptive tests for competence assessment, and the deployment of serious computer games for immersive learning that simulate real-life situations and offer endless, iterative opportunities for feedback. Yes, AI is the “frenemy” that wields a double-edged sword. With one edge it offers so much value; with the other it portends known risks that should be controlled and kept at bay.