Like it or not, ChatGPT and other forms of generative conversational AI are here to stay. Last weekend, John Naughton, writing in the Guardian, compared ChatGPT to Excel*, noting that “[Excel] went from being an intriguing but useful augmentation of human capabilities to being a mundane accessory”.
It would never occur to current educators to forbid the use of Excel in universities. Indeed, its use is not only expected and tolerated but even, in certain circumstances, actively encouraged. Using it allows the development of other skills: spending less time on calculations frees up time to focus on data interpretation, reflecting on limitations (e.g., mean vs median), exploring how best to communicate the findings, and so on. Plus, knowing how to use Excel helps with student employability, in those cases where familiarity with this tool is valued by employers.
Likewise, while I am worried about the consequences of this technology for several aspects of public life, in my view, rather than fighting the use of generative AI, we (educators) need to incorporate it into the classroom, in order to:
- develop students’ literacy about these tools, so that they don’t make bad decisions based on answers produced by them – much like we did with Wikipedia; and
- improve students’ employability skills because, let’s face it, students these days face a very challenging job market, and AI poses a very significant threat to their future careers.
With these goals in mind, I have been reflecting on how ChatGPT might impact how I teach and assess students in the module that I am teaching this semester: Research Methods. Here are my thoughts, so far. They are a bit crude and they will evolve as I learn more about the tool, see more applications, and discuss experiences and concerns with others. Consider this my starting point and, please, do share your thoughts, experiences and suggestions with me.
What are ChatGPT and generative conversational AI?
ChatGPT is a chatbot powered by AI, and, for the time being, it is free. It generates very convincing text in response to prompts (for instance, a question or an essay title). Check, for instance, this experiment by Dave Chaffey, who used ChatGPT to define a digital marketing strategy, and to develop and operationalise a marketing plan. However, while the answers may be useful and very convincing, they also contain various inaccuracies.
To understand the type of answers that ChatGPT is likely to produce, it is important to understand how it works. ChatGPT uses a type of AI called a “Large Language Model” (LLM). The paper “Talking About Large Language Models” authored by Murray Shanahan, offers a very readable and accessible introduction to LLMs.
Shanahan starts by emphasising that, while LLMs’ capabilities increasingly “resemble those of humans… those systems work in ways that are fundamentally different from the way humans work… But it is a serious mistake to unreflectingly apply to AI systems the same intuitions that we deploy in our dealings with (other humans), especially when those systems are so profoundly different from humans in their underlying operation.” (Page 1).
Shanahan goes on to explain that:
LLMs are generative mathematical models of the statistical distribution of tokens in the vast public corpus of human-generated text, where the tokens in question include words, parts of words, or individual characters including punctuation marks. They are generative because we can sample from them, which means we can ask them questions. But the questions are of the following very specific kind. “Here’s a fragment of text. Tell me how this fragment might go on. According to your model of the statistics of human language, what words are likely to come next?”
It is very important to bear in mind that this is what large language models really do. Suppose we give an LLM the prompt “The first person to walk on the Moon was ”, and suppose it responds with “Neil Armstrong”. What are we really asking here? In an important sense, we are not really asking who was the first person to walk on the Moon. What we are really asking the model is the following question: Given the statistical distribution of words in the vast public corpus of (English) text, what words are most likely to follow the sequence “The first person to walk on the Moon was ”? A good reply to this question is “Neil Armstrong”. (Page 2)
In addition to answering questions / completing statements, LLMs can also be asked to “summarise news articles, to generate screenplays, to solve logic puzzles, and to translate between languages, among other things”. (Page 4)
However, in every instance the LLM is using the same mechanism: producing sentences that are the statistically most likely continuation of the previous one (including the prompt).
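To make this mechanism concrete, here is a toy sketch of my own (purely illustrative; ChatGPT is vastly more sophisticated, but the principle is the same): a bigram model that counts, in a tiny made-up corpus, which word most often follows each word, and then “answers” a prompt by emitting the statistically most likely continuation.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus: str) -> dict:
    """Count, for each word, how often each other word follows it."""
    tokens = corpus.split()
    model = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        model[current][nxt] += 1
    return model

def most_likely_next(model: dict, token: str) -> str:
    """Return the statistically most likely continuation of `token`."""
    return model[token].most_common(1)[0][0]

# A tiny invented corpus, standing in for "the vast public corpus of text"
corpus = (
    "the first person to walk on the moon was neil armstrong "
    "the first person to walk on the moon was neil armstrong "
    "the moon orbits the earth"
)
model = train_bigram_model(corpus)
print(most_likely_next(model, "was"))  # prints "neil": the most frequent follower
```

The model “answers” correctly not because it knows anything about the Moon, but because “neil” is the most frequent word after “was” in its training text. Real LLMs do this at an incomparably larger scale, which is why they produce fluent text without any notion of whether it is true.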
This is very different from “knowing” the answer. As Shanahan says:
“knowing that the word “Burundi” is likely to succeed the words “The country to the south of Rwanda is” is not the same as knowing that Burundi is to the south of Rwanda. To confuse those two things is to make a profound category mistake.” (Page 5)
This is because “the LLM itself has no access to any external reality against which its words might be measured, nor the means to apply any other external criteria of truth, such as agreement with other language-users”.
In summary, LLMs, including ChatGPT, can easily generate sentences that “look good one after the other”, and get continuously better at it. However, they can’t judge whether the content of those sentences is correct.
So, the next question is:
Will students use ChatGPT to help them with their coursework?
With ChatGPT becoming so popular, and performing so well at generating text that is very readable, I think that we should assume that students will use this tool to help them with their coursework. And, in a way, I actually hope that they do. As I said when invited to comment on a LinkedIn discussion on this topic:
Will students use this technology? I hope they do. Just like I hope they will use Google search, or YouTube or even Wikipedia to familiarise themselves with a new topic; or use Excel to perform calculations and find trends in data… I hope that they do because they need to learn how to use this technology; and, most importantly, so that they can understand the tool’s limitations (the sooner they do so, the more future / bigger mistakes they avoid) and the sooner they can start devoting their energy to the skills that they need to succeed in a workplace “powered by AI”.
We will need to adapt the coursework and to teach students about the strengths and weaknesses of this tool. The learning will be in reflecting on how the results were produced and critiquing them, just as we had to adapt when Wikipedia became a popular and widely available source of information, or when Excel became a popular tool for performing calculations. As employees, if they are going to be at the receiving end of documents produced by ChatGPT, they need to have the confidence and ability to question them.
We will need to have discussions about plagiarism and ethics, of course. But also about judging the quality of the replies produced by ChatGPT, because the answers it currently produces read well but are full of inaccuracies and flawed reasoning. Students need to learn to identify flawed reasoning, check the accuracy of data, and so on – which will be great preparation for the workplace (just like it was great preparation to talk about Wikipedia and its flaws, or to question what a particular number or trend in Excel is really telling us).
It will be a huge challenge for overworked faculty who were just beginning to get back on their feet after the disruption caused by Covid-19! But it will need to be done.
Which leads us to the next question:
What do we need to change in how we teach and assess business students?
As I am preparing to start a new semester, where I will be teaching a new module, this question has been at the front of my mind. I have been mulling over the opportunities and challenges presented by ChatGPT, and this is where I am at, so far.
First, the opportunity: ChatGPT is a quick and (for the time being) free way of producing examples to discuss in class. For instance, I can ask ChatGPT:
- For examples of application of theory A or B – and ask students to spot the flaws
- To make up an advert for product X or job Y – and ask students to assess how well that might appeal to certain functional vs hedonic customer needs or job applicants’ intrinsic vs extrinsic motivations
- To discuss the consequences of AI for industry Z – and ask students what might be missing from the list, or how organisations should prepare for them.
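To show how such class examples could be generated systematically, here is a small sketch. The task names and prompt templates are my own illustrative placeholders (not tested prompts); the resulting strings would then be pasted into ChatGPT to obtain the material for class discussion.

```python
def build_class_example_prompt(task: str, subject: str) -> str:
    """Assemble a ChatGPT prompt for one of the three class-example tasks.
    Templates are illustrative placeholders, not tested prompts."""
    templates = {
        "apply_theory": "Give three examples of how {subject} applies in practice.",
        "write_advert": "Write a short advert for {subject}.",
        "industry_impact": "Discuss the consequences of AI for the {subject} industry.",
    }
    return templates[task].format(subject=subject)

# Build one prompt per task, ready to paste into ChatGPT
for task, subject in [
    ("apply_theory", "Maslow's hierarchy of needs"),
    ("write_advert", "a graduate marketing role"),
    ("industry_impact", "retail banking"),
]:
    print(build_class_example_prompt(task, subject))
```

The point of scripting this is repeatability: the same template can generate a fresh, slightly different example for each seminar group or cohort.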
There are even a number of very interesting applications emerging, where ChatGPT is paired up with specific tools or platforms. For instance, there is now a Chrome extension for YouTube that allows you to use ChatGPT to generate text summaries of any video.
We could use this tool to generate transcripts to analyse in class to, for instance, look for differences in approaches by different types of influencers (e.g., Health vs Travel vs Study…). Or B2B vs B2C advertising. Or official sources vs conspiracy theorists. Or how the discourse around a topic (e.g., sustainability) has been changing over time. Or… so many applications!
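As a sketch of that kind of analysis, here is a minimal comparison of word frequencies across transcripts. The two snippet “transcripts” below are invented placeholders; in practice they would come from the summarisation tool.

```python
import re
from collections import Counter

def top_terms(transcript: str, n: int = 5) -> list:
    """Return the n most frequent words longer than three letters."""
    words = [w for w in re.findall(r"[a-z']+", transcript.lower()) if len(w) > 3]
    return [w for w, _ in Counter(words).most_common(n)]

# Invented placeholder snippets standing in for real video transcripts
travel = "the beach was amazing, the beach hotel had views of the beach and the harbour"
health = "sleep matters: good sleep, regular sleep, and exercise improve your health"

print(top_terms(travel, 3))
print(top_terms(health, 3))
```

Even something this crude would give students a starting point for discussing how the vocabulary of, say, travel influencers differs from that of health influencers, before moving on to richer analysis.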
Second, the challenge: it will be tempting for students to use ChatGPT as input for their coursework and, as a result, to be caught for academic misconduct (i.e., cheating, plagiarism, etc.)**. The key here is to choose assessments for which there is little or no advantage in using “historical knowledge”. For instance:
- We should design assessments that focus on the application of knowledge to very niche scenarios about which there isn’t much – or anything – written. For example, in my modules, I usually use live case studies from micro-businesses, or entrepreneurs, or grass-root organisations.
- We could also ask students to apply knowledge to very novel scenarios. E.g., no more asking about the impact of Covid-19 on X. Choose something that happened in the last 6-12 months***. Not only will there not be much written about it, but what has been written will probably not be directly relevant.
- If we are asking students to apply knowledge to a mainstream issue, or letting them choose their own examples, then we need to limit the resources that students can draw on to a small number of very recent sources****.
Finally, the breakthrough: The key skill that we need to develop in our students (as far as generative AI is concerned) is literacy about the strengths and limitations of ChatGPT. Students need to become familiar with the type of output produced by ChatGPT, so that they can look for red flags, ask the right questions and, eventually, detect when someone tries to use it on them. Moreover, students need to learn how to use it to their advantage. Thus, we can design assessments where students are actually prompted to use ChatGPT. They would also be required to critique the answer produced by the tool. Moreover, they would need to suggest improvements to the answer based on specific resources discussed in the class (very recent ones, as discussed above****).
Ethan R Mollick and Lilach Mollick discuss exactly this type of use of ChatGPT in their paper “New Modes of Learning Enabled by AI Chatbots: Three Methods and Assignments”, published on SSRN. In one of their examples, they asked ChatGPT to provide three examples of the application of a specific concept, and then asked students to detect the problems in the examples generated. In another, they asked ChatGPT to describe a process, and then told students to look for exceptions to the process described in the answer, or for missing steps.
In summary, ChatGPT is here to stay. Like other technologies, while ChatGPT in itself is neither good nor bad, it certainly will not be neutral. These are some of my thoughts so far about what this all means for how we teach and assess university students.
If you are an educator, what adjustments are you making? If you are a student or an employer, what do you feel you need?
*Thank you to Robin Croft for bringing it to my attention
**Detection tools are already starting to emerge – e.g., GPTZero.
***I appreciate that choosing a very recent scenario presents a challenge, in light of the long quality assessment cycles in many universities. In one University I worked at, we had to have ready, by end of July, the brief for the assessment that students would need to submit in the following January.
**** The corpus used to train ChatGPT only goes up to 2021, but other tools will continue to emerge, and they may use more recent data.