Blog

Where Tech Meets Teaching: Custom AI for Exam Assessment

Education has come a long way. Classrooms are more connected, content is more interactive, and students are learning in ways that weren’t even possible a few decades ago. But behind the innovation in digital learning, some things remain the same, especially the pressure on educators.
From managing large classes to ensuring fair, consistent assessment, today’s instructors face an ever-growing list of challenges. The stakes are high, the work is complex, and too often, the tools simply aren’t designed with real educational settings in mind.

That’s where one professor at the University of Toronto found himself - managing a high volume of oral exams with no scalable, standardized way to grade them. So, he partnered with our team at Roca Mindhub to build a custom AI-powered platform that supports educators.

This case study explores how thoughtful custom software development reshaped the assessment process - and what it could mean for the future of scalable, fair, and supportive evaluation in higher education.

When Assessment Becomes the Obstacle

Oral exams have clear pedagogical benefits: they test for critical thinking, real-time reasoning, and conceptual understanding. But they also introduce a handful of issues that make them difficult to scale or standardize.
Our collaboration began with a deceptively simple question: How can we create fairer, faster, and more insightful assessments, without losing the human touch?
In tackling this, we focused on three systemic pain points in oral exams:

  • Subjectivity in assessment: Even within structured systems like the Canadian grading framework, oral exams often leave room for interpretation. Two instructors might evaluate the same student response differently - not because of content, but due to tone, confidence, or unconscious bias. These human factors make it difficult to ensure consistency across classrooms, undermining the fairness and reliability of student evaluation.
  • Time-consuming and logistically heavy: Scheduling and running dozens of oral exams takes hours of valuable time. In larger courses, that multiplies fast. Teaching assistants and professors spend more time on logistics than meaningful engagement.
  • Lack of constructive feedback: Oral assessments often leave no record. Unlike written exams, there’s no way to review what was said, how it was delivered, or why a student may have struggled. That makes it harder to offer constructive feedback - and even harder to improve future teaching practice.

The Solution: A Custom AI-Powered Assessment Tool

From day one, our goal was to support educators, not replace them. We weren’t building a generic AI tool or off-the-shelf educational software. This was a custom-built platform, designed to align with the course curriculum, the needs of teaching assistants, and the specific grading practices in place - all while accounting for the human factors that shape real classroom dynamics.

The result was an AI-driven assessment system for oral exams that could analyze student responses in real time. It didn’t just evaluate what students said, but how they said it - factoring in tone, pacing, and expression. By combining computer vision, natural language processing (NLP), and machine learning, the platform helped create a more consistent and supportive student evaluation process.

Facial Expression Analysis

The platform used AI models to detect general emotional cues during student responses - aspects like visible stress, confusion, or low engagement. Importantly, it didn’t try to decode specific micro-expressions. Instead, it flagged patterns that might warrant a second look, such as:

  • Noticeable shifts in facial expression or posture
  • Signs of hesitation or visible discomfort
  • Periods of reduced focus or attention

These signals didn’t penalize students, but helped the system offer support. If the platform picked up on signs of discomfort, it would serve a follow-up question to give the student a second chance to demonstrate their knowledge more clearly.
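To make the flag-then-follow-up flow above concrete, here is a minimal sketch in Python. The cue names, thresholds, and the idea of counting flagged video frames are all illustrative assumptions, not the platform's actual model or values:

```python
# Hypothetical sketch: how flagged visual cues might gate a follow-up
# question. Cue names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class FrameCues:
    """Per-frame signals from a (hypothetical) vision model, each in [0, 1]."""
    stress: float
    confusion: float
    attention: float

def needs_follow_up(cues: list[FrameCues],
                    stress_thresh: float = 0.7,
                    attention_thresh: float = 0.4,
                    min_flagged_ratio: float = 0.3) -> bool:
    """Flag the response if a sustained share of frames shows discomfort
    or reduced focus - a second-look signal, never a grade penalty."""
    if not cues:
        return False
    flagged = sum(
        1 for c in cues
        if c.stress >= stress_thresh
        or c.confusion >= stress_thresh
        or c.attention <= attention_thresh
    )
    return flagged / len(cues) >= min_flagged_ratio
```

The key design point survives even in this toy version: the output is a boolean "take a second look" signal that triggers another question, not a number that feeds into the grade.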

Tone and Cadence

We also analyzed vocal delivery using AI-driven speech tools. Why? Because delivery can distort how performance is assessed, especially in high-pressure situations. The system looked for patterns such as:

  • Long pauses, which might indicate confusion or stress
  • Overly fast delivery, often tied to anxiety or memorization
  • Repetition or filler, suggesting low understanding

Instead of letting these elements impact grades, the platform used them as signals. When something seemed off, it triggered follow-up questions from the curriculum, giving students a chance to explain further or clarify their thinking.
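The pause, pacing, and filler heuristics described above can be sketched from a timestamped transcript. Everything here is an assumption for illustration - the word-timing input format, the filler list, and the thresholds - not the platform's actual speech pipeline:

```python
# Hypothetical sketch of the delivery heuristics: long pauses, speech
# rate, and filler words, computed from a timestamped transcript.
# Thresholds and the filler list are illustrative assumptions.

FILLERS = {"um", "uh", "like", "basically"}

def delivery_signals(words: list[tuple[str, float, float]],
                     long_pause_s: float = 3.0,
                     fast_wpm: float = 200.0) -> dict:
    """words: (token, start_time, end_time) triples, times in seconds."""
    if not words:
        return {"long_pauses": 0, "too_fast": False, "filler_ratio": 0.0}
    # A "long pause" is a gap between one word ending and the next starting.
    long_pauses = sum(
        1 for (_, _, end), (_, start, _) in zip(words, words[1:])
        if start - end >= long_pause_s
    )
    duration_min = (words[-1][2] - words[0][1]) / 60.0
    wpm = len(words) / duration_min if duration_min > 0 else 0.0
    filler_ratio = sum(1 for w, _, _ in words if w.lower() in FILLERS) / len(words)
    return {"long_pauses": long_pauses,
            "too_fast": wpm >= fast_wpm,
            "filler_ratio": filler_ratio}
```

As in the text, these outputs would be treated as signals prompting a follow-up question, not as inputs to the grade itself.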

Measuring Conceptual Depth

The core of the assessment still came down to one question: did the student understand the material? The platform used NLP to analyze content depth and identify whether students were truly engaging with key concepts. It looked for:

  • Reference to core course ideas
  • Conceptual explanations vs. memorized definitions
  • Signs of guessing, vague phrasing, or off-topic responses

Just as with the other dimensions, if a student’s answer seemed shallow or rehearsed, the system tested for deeper understanding - pulling from a curated, course-specific question bank.

With the help of the AI system, TAs could focus on outlier responses - such as highly polished answers that might signal over-preparation, or unclear ones showing confusion or disengagement. This allowed them to spot patterns, better understand student performance, and apply their judgment where it mattered most. Rather than replacing the human element, the platform supported it, helping educators assess more fairly, consistently, and meaningfully.

Benefits for Educators and Students

  • Increased Objectivity and Confidence in Grading: Applying one clear set of rules to both what students say and how they say it takes much of the guesswork out of grading. TAs and instructors reported scoring with more confidence, because every mark ties back to transparent criteria.
  • Time Saved Without Losing Quality: With routine checks and follow-up prompts handled automatically, teaching staff spent less time on each exam and more on offering feedback, mentoring students, and updating the curriculum.
  • A More Supportive Learning Environment: Students appreciated the chance to respond more than once, especially those prone to exam anxiety. They also benefited from feedback based on clear criteria, which helped improve motivation, trust, and ultimately, student performance.

What’s Next?

The system continues to evolve by learning from ongoing feedback and usage data, with machine learning algorithms steadily improving their ability to identify patterns that indicate strong or weak student understanding. Future updates may include multilingual support to accommodate diverse learners, real-time transcription to enhance accessibility, and integration with learning management systems (LMS) for seamless use across various educational environments.

Looking beyond oral exams, this AI-driven assessment approach has broad potential applications. Because it emphasizes evaluating performance, behavior, and conceptual mastery, the platform can be adapted for interviews, language proficiency testing, remote certification, or any scenario that requires scalable, nuanced assessment of human understanding.

Where AI Supports, Not Replaces

This project wasn’t just about building AI software; it was about enhancing what educators do. It shows that technology doesn’t replace human judgment; it works alongside it. Together, they can take on real challenges like scale, fairness, and deeper insight into how students learn.

As education continues to evolve, the tools we build must evolve with it. With the right collaboration between people and technology, better learning outcomes aren’t just possible - they’re within reach.