AI-native assessment · Human-first teaching

Your students are using AI. Are you assessing the skill — or just the output?

MentorplAI turns every writing task into part of a longitudinal signal: how students work with AI, how their judgment develops across tasks, and how reliably they evaluate each other's work.


MentorplAI — Longitudinal Dashboard

Student	Task 1	Task 2	Task 3
A. Müller	82 (Agency 28%)	85 (Agency 31%)	89 (Agency 25%)
B. Okonkwo	71 (Agency 55%)	69 (Agency 52%)	78 (Agency 47%)
C. Nakamura	76 (Agency 61%)	61 (Agency 87%)	58 (Agency 91%)
D. Patel	54 (Agency 33%)	68 (Agency 29%)	75 (Agency 27%)
E. Johansson	80 (Agency 22%)	—	—

Three problems AI made harder to ignore

These existed before AI. They're just more visible now.

✗

Output quality isn't the same as skill

A polished submission can come from genuine skill or from minimal engagement with a capable model. Grading the output alone doesn't tell you which.

⚠

The process is invisible

Without seeing how a student works — what choices they make, whether their thinking develops — it's hard to give useful feedback or spot who needs help.

Assessment hasn't kept up

Working effectively with AI is a professional skill. Assessment that pretends AI doesn't exist fails to prepare students for what they're already doing.

How it works

One coherent loop per task. Each loop adds to the picture.

Lecturer sets up the task

Create a module, write a task prompt, and add quality baselines — sample responses at each level from naive to professional. The AI can generate them from your prompt. These baselines calibrate the grading system.

Students define their working principles

Before writing, each student authors a personal skill — a short set of DOs and DON'Ts capturing how they intend to work with AI on this task. This makes their approach explicit and gives them something to reflect on across tasks.

AI generates a draft — the student shapes it

The student's skill is used to prompt a personalised AI draft. Students write and edit from there in a structured studio. How far they diverge from the draft becomes the Agency signal: a measure of how much they shaped the output beyond the starting point.
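The page does not specify how divergence from the draft is measured. As a minimal sketch, assuming Agency is one minus a character-level similarity ratio between the AI draft and the submitted text, Python's standard `difflib.SequenceMatcher` is enough. The name `agency_score` and the choice of metric are hypothetical, not MentorplAI's implementation.

```python
from difflib import SequenceMatcher


def agency_score(draft: str, final: str) -> float:
    """Hypothetical Agency measure: how much the final text differs from the AI draft.

    Assumption (not from MentorplAI): divergence = 1 - SequenceMatcher.ratio(),
    where ratio() = 2*M / T (M = matched characters, T = characters in both texts).
    Returns a value in [0, 1]; 0 means the draft was submitted unchanged.
    """
    if not draft and not final:
        return 0.0
    # autojunk=False disables difflib's "popular character" heuristic on long texts
    similarity = SequenceMatcher(None, draft, final, autojunk=False).ratio()
    return 1.0 - similarity


draft = "AI tools can speed up drafting, but they also flatten voice."
print(agency_score(draft, draft))  # 0.0
print(agency_score(draft, "My own argument starts from a classroom case study."))
```

A word-level diff or an embedding distance would be equally defensible; the character-level ratio is used here only because it needs nothing beyond the standard library. Heavy rewording and heavy deletion both raise this score, so it measures change, not quality.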

Peers compare submissions through structured judgement

Students compare pairs of submissions against five dimensions. Comparing directly is more reliable than scoring on a scale — it's the same principle used to evaluate AI models. Each evaluator's reliability is measured alongside the submissions they judge.
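Pairwise judgements are usually turned into a scale with a model such as Bradley-Terry, where each submission i has a strength p_i and P(i beats j) = p_i / (p_i + p_j). The page does not say which model MentorplAI uses; the sketch below assumes plain Bradley-Terry fitted with the standard minorisation-maximisation (MM) update, on one dimension at a time. `bradley_terry` is a hypothetical helper name.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=500):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    MM (Zermelo) fixed-point update:
        p_i <- W_i / sum_j [ n_ij / (p_i + p_j) ]
    where W_i is the number of wins for i and n_ij the number of i-vs-j
    comparisons. Strengths are rescaled to average 1 after each step.
    Assumes every item wins and loses at least once and all items are linked
    by comparisons; otherwise the maximum-likelihood strengths are not finite.
    """
    wins = defaultdict(int)
    n = defaultdict(int)  # n[(i, j)] == n[(j, i)] == comparisons between i and j
    for winner, loser in comparisons:
        if winner == loser:
            continue  # skip malformed self-comparisons
        wins[winner] += 1
        n[(winner, loser)] += 1
        n[(loser, winner)] += 1

    items = {i for i, _ in n}
    if not items:
        return {}
    opponents = {i: [j for (a, j) in n if a == i] for i in items}
    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        updated = {
            i: wins[i] / sum(n[(i, j)] / (strength[i] + strength[j]) for j in opponents[i])
            for i in items
        }
        scale = len(items) / sum(updated.values())
        strength = {i: p * scale for i, p in updated.items()}
    return strength


judgements = [("A", "B"), ("A", "B"), ("B", "A"),
              ("B", "C"), ("B", "C"), ("C", "B"),
              ("A", "C"), ("A", "C"), ("C", "A")]
scores = bradley_terry(judgements)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

With five dimensions, one plausible (assumed) approach is to fit one model per dimension and combine the results afterwards.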

Two scores, one grade — and a longitudinal picture

Each task produces a grade from submission quality (65%) and evaluator reliability (35%). Across tasks, lecturers see how grades and agency evolve — patterns that a single snapshot cannot reveal.
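The weighting itself is simple arithmetic. A minimal sketch, assuming both component scores sit on the same 0 to 100 scale (the page does not say), with a least-squares slope as one hypothetical way to summarise a grade trajectory across tasks. The names `task_grade` and `trend` are illustrative.

```python
def task_grade(quality: float, reliability: float) -> float:
    """Combine the two per-task scores with the 65/35 weighting described above.

    Assumption: both inputs are already on the same 0-100 scale.
    """
    if not (0 <= quality <= 100 and 0 <= reliability <= 100):
        raise ValueError("scores must be on a 0-100 scale")
    return 0.65 * quality + 0.35 * reliability


def trend(grades: list[float]) -> float:
    """Least-squares slope of grade per task (tasks indexed 1, 2, 3, ...)."""
    if len(grades) < 2:
        raise ValueError("need at least two tasks to estimate a trend")
    xs = range(1, len(grades) + 1)
    mean_x = sum(xs) / len(grades)
    mean_y = sum(grades) / len(grades)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, grades))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


print(task_grade(quality=80, reliability=60))  # 73.0
# A. Müller's dashboard numbers, read as per-task grades (an assumption)
print(trend([82, 85, 89]))
```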

Three signals, read together

No single number is the story. Their combination — and how they change across tasks — is where the insight is.

Submission quality

A quality level derived from pairwise peer comparison, calibrated against the lecturer's quality baselines. Rising scores across tasks suggest real development.
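Calibration against baselines can be pictured as placing the lecturer's baseline responses in the same comparison pool and reading off which level each submission reaches. The sketch assumes the strengths already share one scale; the helper `calibrated_level`, the intermediate level names, and all numbers are invented for illustration (the page names only the naive and professional ends).

```python
from bisect import bisect_right


def calibrated_level(strength, anchors):
    """Map a submission's comparative-judgement strength to a baseline level.

    anchors: list of (level_name, anchor_strength), weakest level first.
    Assumption: baselines were judged in the same pairwise pool, so their
    strengths are on the same scale as the submissions'.
    Returns the highest level whose anchor the submission meets or exceeds,
    or None if it falls below every anchor.
    """
    values = [s for _, s in anchors]
    if values != sorted(values):
        raise ValueError("anchors must be ordered from weakest to strongest")
    met = bisect_right(values, strength)  # anchors at or below `strength`
    return anchors[met - 1][0] if met else None


# Hypothetical levels and strengths
anchors = [("naive", 0.4), ("developing", 0.9), ("competent", 1.6), ("professional", 2.5)]
print(calibrated_level(1.8, anchors))  # competent
print(calibrated_level(0.2, anchors))  # None
```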

Agency

How much the student shaped and steered the output beyond the AI starting draft. Persistently low agency across tasks may signal limited engagement — though context always matters.
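A longitudinal flag (a prompt for a conversation, not a verdict) could be as simple as checking whether Agency stayed below some threshold in each of the student's most recent tasks. The helper name, the 35% threshold and the three-task window are hypothetical; the inputs are Agency values from the dashboard rows above, treating a "—" cell as no data.

```python
def persistently_low_agency(agency_history, threshold=0.35, min_tasks=3):
    """Flag (not judge) a student whose Agency stays below `threshold`.

    agency_history: Agency per task as fractions in [0, 1], oldest first;
    None marks a task with no data. `threshold` and `min_tasks` are
    hypothetical defaults, not values from MentorplAI.
    """
    observed = [a for a in agency_history if a is not None]
    if len(observed) < min_tasks:
        return False  # too little data for a longitudinal pattern
    return all(a < threshold for a in observed[-min_tasks:])


print(persistently_low_agency([0.33, 0.29, 0.27]))  # D. Patel: True
print(persistently_low_agency([0.61, 0.87, 0.91]))  # C. Nakamura: False
print(persistently_low_agency([0.22, None, None]))  # E. Johansson: False
```

Note that A. Müller's row ([0.28, 0.31, 0.25]) would also be flagged despite rising grades, which is exactly why such a flag needs the lecturer's context before anything is concluded.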

Evaluator quality

How consistently and accurately a student judges peers' work — agreement with consensus, calibration against known quality anchors, and informativeness of their choices.
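The consensus-agreement part of evaluator quality can be sketched as the share of an evaluator's pairwise choices that match the pooled ranking. This is an assumption about the metric, and `evaluator_agreement` is a hypothetical name. Calibration against anchors could reuse the same function with only the baseline pairs and their known order; the informativeness component is not modelled here.

```python
def evaluator_agreement(judgements, consensus):
    """Share of one evaluator's pairwise choices that match the consensus order.

    judgements: list of (chosen, rejected) pairs made by one evaluator.
    consensus: dict mapping item -> consensus strength (e.g. from a pooled
    comparative-judgement fit). Pairs with unknown items or tied strengths
    are skipped. Returns None if no pair can be scored.
    """
    scored = [
        consensus[chosen] > consensus[rejected]
        for chosen, rejected in judgements
        if chosen in consensus and rejected in consensus
        and consensus[chosen] != consensus[rejected]
    ]
    return sum(scored) / len(scored) if scored else None


consensus = {"s1": 2.1, "s2": 1.0, "s3": 0.4}  # hypothetical strengths
print(evaluator_agreement([("s1", "s2"), ("s2", "s3"), ("s3", "s1")], consensus))  # 0.6666666666666666
```

Because an evaluator's own choices feed the consensus, a fairer variant would compare each evaluator against a fit that leaves their judgements out.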

Built on established ground

Each design choice is grounded in something that already works.

⚖️

Comparative judgement

Comparing two things directly is more reliable than scoring them on an abstract scale. Pairwise preference comparison is also widely used to evaluate AI models at scale, and MentorplAI applies the same principle to student work.

📝

Explicit working principles

Having students articulate and revise their own DOs and DON'Ts creates a record of how their approach to AI develops across tasks — and surfaces misalignments between what they say they'll do and what they produce.

👥

Peer assessment

Evaluating others' work develops critical judgment that producing work alone doesn't. Done systematically, peer evaluation signals both the quality of the work reviewed and the quality of the reviewer.

📈

Longitudinal observation

A single data point is ambiguous. Patterns across multiple tasks are meaningful. Tracking how a student behaves over time reveals what a snapshot assessment cannot.

What lecturers actually get

Signals to inform teaching — not replace it.

🔭

See patterns across your whole class

The longitudinal dashboard shows every student across every task — grade trajectory, agency score, and evaluator reliability — in one view. Spot who's improving and who needs a conversation.

💬

Targeted feedback to specific students

Deliver personalised written feedback — with attachments or links — directly to a student, visible only to them in their results. High-signal conversations, not bulk annotation.

🚀

Live test runs — no pre-registration needed

Share a QR code in class. Students join, write, and evaluate each other's work in minutes. Ranked grades on the spot. Works without setting up a full module first.

What this is — and what it isn't

Being clear about limits is part of using any tool well.

✓ What MentorplAI is

✓ A structured environment that makes how students work with AI visible, not just what they produce.
✓ A longitudinal record that helps lecturers have better, more specific conversations with students.
✓ An assessment design that treats AI collaboration as a skill to develop, one that can be observed across tasks.
✓ A peer evaluation framework that develops critical judgment rather than just producing a grade.

✗ What MentorplAI is not

✗ An AI detector. Signals are for lecturers to interpret with context, not automatic verdicts.
✗ A surveillance tool. The system surfaces patterns for teaching decisions; it doesn't replace professional judgment.
✗ Foolproof. Students can game any system. What changes here is the cost: shallow work leaves consistent patterns across tasks, and masking them convincingly over several tasks is harder than gaming a single one.
✗ A replacement for good pedagogy. The system works best as a starting point for teaching, not an endpoint for evaluation.