Can computers really mark exams? Benefits of ELT automated assessments

Pearson Languages

Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking, removes human bias, and is at least as accurate and reliable as human examiners. As innovations go, this one is a real game-changer for teachers and students.

However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. Versant tests, for example, provide unbiased, fair and fast automated scoring for speaking and writing exams – irrespective of where test takers live, or what their accent or gender is.

This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

AI versus traditional automated scoring

First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, we generally mean the scoring of objective items such as multiple-choice or cloze questions. You may have to reorder sentences, choose from a drop-down list or insert a missing word – that sort of thing. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.
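
To make this concrete, here is a minimal sketch of how such objective items might be auto-scored. The function and item names are illustrative only, not the code behind any Pearson product:

```python
# Illustrative sketch: objective item types are marked by comparing the
# response against a fixed answer key, so every marking run is identical.

def score_objective_item(item_type: str, response, answer_key) -> int:
    """Return 1 for a correct response, 0 otherwise."""
    if item_type == "multiple_choice":
        return int(response == answer_key)              # single keyed option
    if item_type == "cloze":
        # accept any keyed spelling variant for the gap
        return int(response.strip().lower() in answer_key)
    if item_type == "reorder":
        return int(list(response) == list(answer_key))  # exact sequence match
    raise ValueError(f"unknown item type: {item_type}")

print(score_objective_item("cloze", "Went", {"went"}))  # -> 1
```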

While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

This is where AI comes in. 

We hear a lot about how AI is increasingly being used in areas where large amounts of unstructured data need to be processed quickly and accurately – in medical diagnostics, for example. In language testing, AI uses specialized computer software to grade written and oral tests.

How AI is used to score speaking exams

The first step is to build an acoustic model for each language that can recognize speech and convert the sound waves into text. This technology used to be rare, but most of our smartphones can do it now.
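
The acoustic models described in this article are proprietary, but the waveform-to-text step they perform is now widely available. As a rough illustration, here is the same step using the open-source Whisper model as a stand-in (the audio file name is hypothetical):

```python
# Sketch of speech-to-text with the open-source Whisper model, standing in
# for the proprietary acoustic models described in this article.
# pip install openai-whisper
import whisper

model = whisper.load_model("base")                 # general-purpose acoustic model
result = model.transcribe("student_response.wav")  # waveform -> text
print(result["text"])
```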

These acoustic models are then trained to score every single prompt or item on a test. We do this by having expert human raters score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine.
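
In spirit, the training step looks something like the sketch below: each response is double-marked, the two marks are resolved into a single ‘Standard’, and a model learns to predict that score. The features and model choice here are assumptions for illustration, not Pearson’s actual method:

```python
# Hypothetical training sketch: a regression model learns to reproduce
# resolved human 'Standard' scores from features of each oral response.
import numpy as np
from sklearn.linear_model import Ridge

# toy data: rows = responses, columns = e.g. [speech_rate, pause_ratio, accuracy]
features = np.array([[3.1, 0.20, 0.91],
                     [2.4, 0.35, 0.72],
                     [3.8, 0.12, 0.97]])
marker_a = np.array([4.0, 2.5, 5.0])    # first human mark per response
marker_b = np.array([4.5, 3.0, 5.0])    # second human mark (double marking)

standards = (marker_a + marker_b) / 2   # resolved human score: the 'Standard'
engine = Ridge(alpha=1.0).fit(features, standards)
print(engine.predict([[3.5, 0.15, 0.95]]))  # predicted score for a new response
```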

Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores correlate very highly with the human scores. If this doesn’t happen for any item, we remove it, as it must match the standard set by human markers. We expect a correlation of between .95 and .99 – in other words, the engine’s scores track the expert human scores almost perfectly.
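
The validation check itself is straightforward to express in code. A minimal sketch, with toy numbers:

```python
# Sketch of the validation step: keep an item only if the engine's scores
# correlate highly enough with held-out human scores.
import numpy as np

human   = np.array([2.0, 3.5, 4.0, 4.5, 5.0, 3.0])  # expert double-marked scores
machine = np.array([2.1, 3.4, 4.2, 4.4, 4.9, 3.1])  # trained engine's scores

r = np.corrcoef(human, machine)[0, 1]  # Pearson correlation coefficient
print(f"human-machine correlation: {r:.3f}")
if r < 0.95:
    print("item rejected: engine does not match the human standard")
```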

This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  

How AI is used to score writing exams

Our AI writing scoring uses a technology called Latent Semantic Analysis (LSA). LSA is a natural language processing technique that can analyze and score writing based on the meaning behind words – and not just their superficial characteristics.

Similarly to our speech recognition acoustic models, we first establish a language-specific text recognition model. We feed a large amount of text into the system, and LSA uses artificial intelligence to learn the patterns of how words relate to each other and are used in, for example, the English language. 
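
For readers who want to see the idea in code, here is a toy LSA pipeline built with standard open-source tools (scikit-learn); the corpus and dimensions are deliberately tiny and purely illustrative:

```python
# Minimal LSA sketch: TF-IDF vectors reduced with truncated SVD, so texts
# can be compared by underlying meaning rather than exact word overlap.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The company increased its profits last year.",
    "Earnings grew strongly over the previous year.",
    "The weather was cold and rainy all week.",
]
tfidf = TfidfVectorizer().fit_transform(corpus)
lsa = TruncatedSVD(n_components=2).fit_transform(tfidf)  # latent 'meaning' space

# compare the first text with the other two in the latent space
print(cosine_similarity(lsa[:1], lsa[1:]))
```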

Once the language model has been established, we train the engine to score every written item on a test. As with speaking items, we do this by having expert human raters score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items and checking that the machine scores correlate very highly with the human scores.

The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

AI’s ability to mark multiple traits 

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can mark items on any number of traits instantaneously – and with complete consistency.
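
One way to picture multi-trait scoring is a model with one output per trait, so a single pass over the response yields all trait scores at once. A hypothetical sketch, reusing the toy features from the training example above:

```python
# Sketch of multi-trait scoring: one regressor per trait wrapped together,
# so each response is scored on content, fluency and pronunciation at once.
# Features and trait scores are illustrative toy data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

features = np.array([[3.1, 0.20, 0.91],
                     [2.4, 0.35, 0.72],
                     [3.8, 0.12, 0.97]])
# columns: human 'Standard' scores for [content, fluency, pronunciation]
trait_scores = np.array([[4.0, 4.5, 4.0],
                         [3.0, 2.5, 3.0],
                         [5.0, 5.0, 4.5]])

engine = MultiOutputRegressor(Ridge(alpha=1.0)).fit(features, trait_scores)
print(engine.predict([[3.5, 0.15, 0.95]]))  # three trait scores in one pass
```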

AI’s lack of bias

A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

Our AI systems eradicate the issue of bias. This is done by ensuring our speaking and writing AI systems are trained on an extensive range of human accents and writing types. 

We don’t train our engines only on ‘perfect’ native-speaker accents or writing styles; we use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses, and we continue to do so as new items are developed.

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework, tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests every day or week is repetitive, time-consuming and not always reliable – and it takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time – lots of it – to progress to high levels of proficiency. The blended use of AI can:

  • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

  • allow students to practice and get instant feedback inside and outside of allocated teaching time

  • address the issue of teacher workload

  • create a virtuous combination between humans and machines, taking advantage of what humans do best and what machines do best

  • provide fair, fast and unbiased summative assessment scores in high-stakes testing.

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. Fei-Fei Li, Chief Scientist at Google and Stanford Professor, describes AI like this:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At Pearson, we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool for establishing language proficiency benchmarks in any school, organization or business. They are specifically designed for placement testing, to determine the appropriate level for each learner.

PTE Academic

The PTE Academic test is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests, and results are available within five days.

Pearson English International Certificate (PEIC)

The Pearson English International Certificate (PEIC) also uses automated assessment technology. The two-hour test is available on demand, to take at home, at school or at a secure test center. Using a combination of advanced speech recognition, exam grading technology and the expertise of professional ELT exam markers worldwide, our patented software can measure English language ability.

Read more about the use of AI in our learning and testing here, or if you're wondering which English test is right for your students, make sure to check out our post 'Which exam is right for my students?'.

More blogs from Pearson

  • What’s it like to teach English in France?

    By Steffanie Zazulak
    Reading time: 3 minutes

    Kirsty Murray taught English for a year at a collège (the French equivalent of a secondary school) in Villers-Cotterêts: a town in the north of France known for being the birthplace of Alexandre Dumas. She taught mixed-ability groups of 11- to 16-year-olds, with classes ranging in size from 10 to 35 students. Here, she shares the five lessons she learned from the experience.

  • How teachers can use the GSE for professional development

    By Fajarudin Akbar
    Reading time: 4.5 minutes

    As English teachers, we’re usually the ones helping others grow. We guide learners through challenges, celebrate their progress and push them to reach new heights. But what about our own growth? How do we, as educators, continue to develop and refine our practice?

    The Global Scale of English (GSE) is often seen as a tool for assessing students. However, in my experience, it can also be a powerful guide for teachers who want to become more intentional, reflective, and confident in their teaching. Here's how the GSE has helped me in my own journey as an English teacher and how it can support yours too.

    About the GSE

    The GSE is a proficiency scale developed by ɫèAV. It measures English ability across four skills – listening, speaking, reading and writing – on a scale from 10 to 90. It’s aligned with the CEFR but offers more detailed learning objectives, which can be incredibly useful in diverse teaching contexts.

    I first encountered the GSE while exploring ways to better personalize learning objectives in my Business English classes. As a teacher in a non-formal education setting in Indonesia, I often work with students who don’t fit neatly into one CEFR level. I needed something more precise, more flexible, and more connected to real classroom practice. That’s when the GSE became a turning point.

    Reflecting on our teaching practice

    The GSE helped me pause and reflect. I started reading through the learning objectives and asking myself important questions. Were my lessons really aligned with what learners at this level needed? Was I challenging them just enough or too much?

    By using the GSE as a mirror, I began to see areas where I could improve. For example, I realized that, although I was confident teaching speaking skills, I wasn’t always giving enough attention to writing development. The GSE didn’t judge me. It simply showed me where I could grow.

    Planning with purpose

    One of the best things about the GSE is that it brings clarity to lesson planning. Instead of guessing whether an activity is suitable for a student’s level, I now check the GSE objectives. If I know a learner is at GSE 50 in speaking, I can design a role-play that matches that level of complexity. If another learner is at GSE 60, I can challenge them with more open-ended tasks.

    Planning becomes easier and more purposeful. I don’t just create lessons; I design learning experiences that truly meet students where they are.
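
    If it helps to see the alignment concretely, here is a rough lookup in code. The band boundaries below are approximate and simplified from Pearson's published GSE-CEFR alignment – double-check the official GSE documentation before relying on them:

    ```python
    # Rough, illustrative GSE-to-CEFR lookup; boundaries are approximate.
    GSE_BANDS = [
        (10, 21, "<A1"), (22, 29, "A1"), (30, 35, "A2"), (36, 42, "A2+"),
        (43, 50, "B1"), (51, 58, "B1+"), (59, 66, "B2"), (67, 75, "B2+"),
        (76, 84, "C1"), (85, 90, "C2"),
    ]

    def gse_to_cefr(score: int) -> str:
        for low, high, band in GSE_BANDS:
            if low <= score <= high:
                return band
        raise ValueError("GSE scores run from 10 to 90")

    print(gse_to_cefr(50))  # B1 -- the role-play example above
    print(gse_to_cefr(60))  # B2 -- the more open-ended task
    ```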

    Collaborating with other teachers

    The GSE has also become a shared language for collaboration. When I run workshops or peer mentoring sessions, I often invite teachers to explore the GSE Toolkit together. We look at learning objectives, discuss how they apply to our learners, and brainstorm ways to adapt materials.

    These sessions are not just about theory: they’re energizing. Teachers leave with new ideas, renewed motivation and a clearer sense of how to bring their teaching to the next level.

    Getting started with the GSE

    If you’re curious about how to start using the GSE for your own growth, here are a few simple steps:

    • Visit the GSE Teacher Toolkit and explore the learning objectives for the skills and levels you teach.
    • Choose one or two objectives that resonate with you and reflect on whether your current lessons address them.
    • Try adapting a familiar activity to better align with a specific GSE range.
    • Use the GSE when planning peer observations or professional learning communities. It gives your discussions a clear focus.

    Case study from my classroom

    I once had a private Business English student preparing for a job interview. Her speaking skills were solid – around GSE 55 – but her writing was more limited, probably around GSE 45. Instead of giving her the same tasks across both skills, I personalized the lesson.

    For speaking, we practiced mock interviews using complex questions. For writing, I supported her with guided sentence frames for email writing. By targeting her actual levels, not just a general CEFR level, she improved faster and felt more confident.

    That experience reminded me that when we teach with clarity, learners respond with progress.

    Challenges and solutions

    Of course, using the GSE can feel overwhelming at first. There are many descriptors, and it can take time to get familiar with the scale. My advice is to start small: focus on one skill or one level. Also, use the Toolkit as a companion, not a checklist.

    Another challenge is integrating the GSE into existing materials, and this is where technology can help. I often use AI tools like ChatGPT to adjust or rewrite tasks so they better match specific GSE levels. This saves time and makes differentiation easier.
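
    As a sketch of what that can look like in practice, here is one way to ask a chat model to adapt a task – a minimal example assuming the OpenAI Python client, with an illustrative model name and prompt rather than a fixed recipe:

    ```python
    # Illustrative sketch: asking a chat model to rewrite a task for a GSE range.
    # The model name and prompt wording are assumptions, not a recommendation.
    # pip install openai
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    task = "Write an email to a colleague about a delayed project."
    prompt = (
        "Rewrite this Business English task so it suits a learner at "
        f"GSE 43-50 (around CEFR B1). Keep the topic the same.\n\nTask: {task}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat-capable model works
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
    ```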

    Teachers deserve development too

    Teaching is a lifelong journey. The GSE doesn’t just support our students, it also supports us. It helps us reflect, plan, and collaborate more meaningfully. Most of all, it reminds us that our growth as teachers is just as important as the progress of our learners.

    If you’re looking for a simple, practical, and inspiring way to guide your professional development, give the GSE a try. It helped me grow, and I believe it can help you too.


  • Five great film scenes that can help improve your English

    By Steffanie Zazulak

    Watching films can be a great way for people to learn English. We all have our favourite movie moments and, even as passive viewers, they're probably teaching you more than you realise. Here's a selection of our favourite scenes, along with the reasons why they're educational as well as entertaining.