How to use AI chatbots in education
May 05, 2025
Discover how students and universities are using ChatGPT and AI chatbots. Learn about academic risks, AI hallucinations, ethical use, and low-cost alternatives like DeepSeek.

According to a recent study, 92% of British students use AI chatbots in their final papers, growth that experts and journalists describe as "explosive". It's no surprise: such tools are becoming increasingly affordable. Even if you can't afford the paid version of ChatGPT, you can use low-cost alternatives such as the Chinese service DeepSeek, which is of comparable quality to ChatGPT.
How are universities addressing this dramatic shift in technology and changing student habits? How can chatbots be used in educational settings with integrity? And what can generative AI chatbots do in education besides writing papers? Let's answer these questions.
How chatbots work
You've probably met many people who believe that AI chatbots are a magic wand that can answer any question with a single wave. This perception is far from accurate. ChatGPT and other chatbots are not supercomputers that remember everything at once. They are, however, capable of analysing massive amounts of data in real time. So how do chatbots work? In a nutshell, you send a query, and the AI generates text by predicting, word by word, which continuation is most likely in the given context.
AI-powered chatbots combine machine learning, natural language processing, and massive amounts of training data. When you type a message, the bot attempts to work out exactly what you meant by analysing meaning, intent, context, and keywords. It then activates a language model (such as GPT) that "invents" an answer on the fly rather than selecting one from a set of templates. It does this by predicting which word should come next in the chain, based on what it has seen in the texts it was trained on. In essence, it continues the text according to a probability estimate of the next word in the answer.
The models are trained on a wide variety of texts, including books, articles, websites, and dialogues, and they learn to recognise language patterns: how words combine, what sentence structures exist, and how topics and thoughts develop. Chatbots do not "understand" us the way humans do, but they can mimic understanding thanks to a vast store of examples of how people speak and write.
One simple example from a media publication:
Suppose you ask a bot, "What is the capital of France?" The chatbot first determines what the conversation is about:
- The main topic is "capital", and the location is "France".
- The word "is" indicates that you are asking about the current situation, rather than about ancient cities that may once have served as the capital of a region of modern France.
The model treats every word in the query this way. That is what allows it to complete the context and recognise that you are asking about the capital of a state, not capital as a financial resource. The sketch below shows this next-word prediction in practice.
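Here is a minimal sketch of that mechanism, assuming the Hugging Face transformers library and the small open GPT-2 model; these are illustrative choices, and commercial chatbots run far larger models behind an API:

```python
# A toy demonstration of next-word prediction with a small language model.
# Assumes: pip install torch transformers (illustrative setup).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every vocabulary token

# Turn the scores at the final position into probabilities and show
# the five continuations the model considers most likely.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
```

Typically " Paris" appears near the top of that list: the model is not looking anything up, only scoring which continuation best fits the patterns in its training data.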
What does all of this mean in practical terms? It means that chatbot output must be fact-checked. As ChatGPT itself points out in the chat window, the system is capable of making mistakes. No wonder: the Internet contains a wealth of information, much of it unreliable. This matters especially if you are a student who needs to produce high-quality academic work. While models can produce impressively accurate results, numerous studies have found significant shortcomings in academic settings. Below are some results of such tests, which cover the entire range of possible errors: factual inaccuracies, poor reasoning, misunderstanding of context, "hallucinations" of information, and the ethical issues associated with biased output.
Mistakes with chatbots in academia
Researchers regularly evaluate ChatGPT's performance on formal assignments, from medical, law, and business exams to school subject tests. The results have often been mediocre, and sometimes outright failures.
One of the first studies on questions for medical students found that the chatbot's answers on the United States Medical Licensing Examination (USMLE) were roughly 60% accurate, around the threshold needed to pass all three steps of the exam. However, ChatGPT failed more specialised medical tests. For example, ChatGPT-3.5 scored only 65.1% (and ChatGPT-4 scored 62.4%) on the 2021-2022 Gastroenterology exam, falling short of the 70% passing score. Similarly, ChatGPT performed poorly on two simulated ophthalmology exams. These results led experts to warn students not to rely on chatbots in high-stakes situations such as final exams.
ChatGPT has also failed final bar exams and MBA exams, with the latter plagued by errors in basic arithmetic despite "amazing" results on some theoretical questions.
ChatGPT can fail even elementary school exams. In one experiment, it was given actual questions from Singapore's sixth-grade final exam. The AI "failed", averaging only 16 out of 100 in math and 21 out of 100 in science. The bot scored zero on any question involving charts or graphs, and it even got simple text-based arithmetic tasks wrong. To be fair, later retests with newer models may yield correct answers to questions the bot previously failed.
Beyond multiple-choice exams, AI chatbots have been tested on open-ended academic tasks such as essays, often with disappointing results. Teachers have noted that while ChatGPT can write coherent texts, the content can be shallow, lacking the analysis and critical thinking required of academic writing.
In one telling experiment, a history professor asked ChatGPT to write an essay on the nineteenth-century abolitionist Frederick Douglass. The essay was grammatically correct and on topic, essentially a basic biographical overview, but it was extremely superficial: "the outlines were there, but the details were missing." Most importantly, the essay "lacked arguments or references" to back up its facts. In other words, it contained no thesis statement or analysis, only a general summary. Graded against the instructor's rubric, the ChatGPT essay received a D, the lowest passing grade. There were no obvious lies or invented facts in the essay, but it was an example of rote writing.
Students who read and critiqued the AI-generated essay identified its flaws. It was factually correct in outline and contained no obvious spurious details, but it lacked depth, critical argumentation, and original thought, all of which are necessary components of academic work. In short, ChatGPT's work read more like a high-school text than what is expected of university students.
"Hallucinations" of artificial intelligence
The most well-known failures of AI chatbots in academia are cases of "hallucinations", in which a bot confidently fabricates data or sources. ChatGPT's ability to generate plausible-sounding but inaccurate or nonexistent facts is well documented.
One of the most serious issues is the fabrication of citations and references. Tests show that ChatGPT can generate a large number of convincing fake scientific citations. These phantom citations frequently resemble real ones, including the names of actual researchers and properly formatted journal titles, volumes, page numbers, and DOIs. Closer inspection reveals that the cited study or article does not exist. Such an error can easily mislead students or researchers who aren't diligent about checking sources.
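As one line of defence, every DOI in a reference list can be checked against a bibliographic database. Here is a minimal sketch using the public Crossref REST API; the requests library and the sample DOI are illustrative assumptions, not a complete verification workflow:

```python
# Check whether a DOI actually resolves in Crossref's index.
# A 404 for a DOI the chatbot supplied is a strong red flag.
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200

# A made-up DOI of the kind a chatbot might invent (hypothetical example).
print(doi_exists("10.1234/phantom-citation.2023.001"))  # expected: False
```

Note that the reverse does not hold: many legitimate sources, such as books and older articles, have no DOI at all, so a missing record is a prompt to check manually, not proof of fabrication.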
Beyond citations, chatbots can "hallucinate" seemingly accurate information on any topic. Asked about a historical event or scientific theory it is unfamiliar with, a bot can produce a plausible but invented explanation or quote.
The examples above show that AI chatbots frequently fail to meet academic standards. They may fail exams, give incorrect answers, write poor essays, or fabricate information. More importantly, "hallucinations" are not random glitches; they are a direct consequence of how the model generates text. If the model cannot recall a quote or a specific detail, it will generate an answer that fits the pattern.
How do universities respond?
The arrival of AI chatbots in academia has raised significant ethical concerns among faculty. Used unfairly, a chatbot becomes a modern form of plagiarism or cheating, which the educational system must now address. There are also numerous concerns about AI's bias and objectivity: because the models are trained on large amounts of internet data, they may inadvertently reproduce biased or offensive content. In academia, for example, a chatbot may offer a biased interpretation of historical events.
Academic leaders are reviewing academic standards policies, issuing guidelines for the use of artificial intelligence, and changing course and exam rules. Responses range from strict bans to the active incorporation of AI into the learning process.
Many universities have updated their academic integrity or plagiarism policies to explicitly cover AI. The policies now state that using AI without authorisation, or without declaring the fact, constitutes academic misconduct. For example, the University of Sydney's Academic Integrity Policy now allows the use of AI assistants, but only with the lecturer's permission and an open declaration of the fact (i.e., a reference to the chatbot's contribution).
In a similar effort to increase transparency, Oxford now requires students to "provide clear evidence of how [AI] was used in preparing work for the exam." This is consistent with Oxford's general anti-plagiarism policy, which requires students to explicitly state any additional assistance of this nature.
Universities are also putting new practices in place to "keep a finger on the pulse" of the situation. Duke University in the United States has formed a committee to keep institutional policies on the use of AI in courses current. Such committees monitor AI developments and recommend policy changes as the tools evolve. In general, policy revision aims to maintain academic integrity by treating AI misuse as equivalent to other forms of cheating, while leaving room for legitimate use of the technology, with clear rules for attribution and transparency.
Many administrators believe it is beneficial to educate students and employees on how to use AI effectively and ethically. In the United Kingdom, the Russell Group (an association of 24 major universities including Oxford and Cambridge) issued a joint statement in July 2023 outlining five guiding principles for the use of artificial intelligence in education. Similarly, in Australia, the Group of Eight Universities (Australia's leading research institutions) reaffirmed the principles for the ethical and responsible use of generative AI, committing to "enable students, academic and research staff to work productively, effectively, and ethically with generative AI".
Universities are issuing practical guidelines for students and faculty on how to handle artificial intelligence. According to a recent survey of the top 50 universities in the United States, 94% have issued faculty-focused guidelines outlining the need to develop and disseminate AI policies for each specific course. These guidelines frequently encourage responsible use of AI and seek to educate students about what can and cannot be done. Harvard University is one example of this approach. In mid-2023, the Harvard administration sent an email advisory to all its schools regarding the use of chatbots in education. The message stated: "The University supports responsible experimentation with generative AI tools."
The most notable changes at universities have been in grading and course exams. Concerned faculty members are experimenting with ways to make assignments more "AI-resistant". This has already revived traditional in-person written examinations. In the United States, many universities have announced plans to reduce the number of take-home essays in favor of more in-person exams, either handwritten or oral. Beyond exams, some instructors are reworking assignments to make them more creative, applied, or personalised: tasks that AI finds difficult to complete correctly.
How can we use AI with integrity?
Chatbots can be a useful tool for students if used responsibly and in accordance with academic transparency guidelines. The key is to treat a chatbot as a tutor or assistant, not as a shortcut for completing a task. Think of a calculator in math: it is useful, but only when you already know what you are doing and when it is appropriate for the task at hand.
ChatGPT lets students break complex literature down into simpler outlines, explain difficult concepts in plain language, and even create practice questions to test their knowledge. A chatbot can also make good stylistic edits to your text: you can ask it to change the tone or style, simplify the wording, or, on the contrary, make the text denser and more formal. All of this complements, rather than replaces, your independent work. The greatest risk is that students let the chatbot think for them instead of using it to improve their own reasoning. The sketch below shows what a tutoring-style request might look like.
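For illustration, here is a minimal sketch of such a request through a chatbot API, assuming the openai Python SDK; the model name, the prompts, and the tutoring framing are illustrative assumptions, not a prescribed workflow:

```python
# Use a chatbot as a tutor: explain a passage and quiz the student,
# rather than writing the assignment for them.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

passage = "..."  # paste the textbook passage you are studying here

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice of model
    messages=[
        {"role": "system",
         "content": "You are a study tutor. Do not write essays for the student."},
        {"role": "user",
         "content": "Explain this passage in simple terms, then give me "
                    "three practice questions to test my understanding:\n"
                    + passage},
    ],
)
print(response.choices[0].message.content)
```

The point of the system instruction is to keep the tool in the assistant role described above: it explains and quizzes, while the essay remains your own work.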
Experts remain cautiously optimistic that AI models will evolve and improve. Newer versions, such as GPT-4, have already posted higher exam scores in certain areas. Still, the current consensus is to exercise caution when using AI in academia: until chatbots can reason, fact-check, and understand context at a human level, they will remain somewhat imperfect assistants. The key is to use these tools responsibly while staying alert to the pitfalls. Understanding why ChatGPT fails academically allows universities to better educate students and better deploy AI systems, ensuring that technology is an aid to learning rather than a substitute for it.
