AI-native assessment · Human-first teaching

Your students are using AI. Are you assessing the skill — or just the output?

MentorplAI turns every writing task into a longitudinal signal: how students work with AI, how their judgment develops across tasks, and how reliably they evaluate each other's work.

MentorplAI · Live Demo
Task Setup
Task Prompt
Write a research proposal that clearly states your hypothesis, methodology and expected impact on the field.
Quality Baselines
Naive — one vague sentence AI ✓
Basic — idea, no structure AI ✓
Proficient — clear sections AI ✓
Expert — cited, rigorous AI ✓
Expert Baseline · Preview
This study investigates the relationship between peer feedback quality and skill development in AI-assisted writing. Drawing on comparative judgement methodology, we hypothesize that structured evaluation practice correlates with measurable gains across consecutive tasks…
Publish Task →
✓ DO
Explore multiple angles before choosing a direction
Write my own intro before asking AI to draft
Check every claim against the source material
✗ DON'T
Accept the first draft without editing for tone
Submit without reading the full text aloud
↑ Evolving from Task 1
Added: "check every claim against sources"
AI Draft
This research investigates the relationship between feedback frequency and academic writing quality. The study will employ a mixed-methods approach, combining quantitative analysis of pre/post assessments with qualitative thematic analysis…
Your Version ✎
Our study questions a core assumption in writing pedagogy: that more feedback automatically improves quality. Using longitudinal data from three cohorts, we test whether structured peer comparison outperforms frequency-based models…
Agency
67% diverged from draft
Dimension: Argumentation · 3 of 5 comparisons
Submission A
Our research builds on Smith's framework to argue that peer comparison creates more durable assessment literacy than traditional rubric-based scoring approaches…
A is stronger →
Submission B
The evidence across three studies shows consistent gains when students evaluate before they produce. Evaluation is not merely summative — it is generative…
← B is stronger
Student        Task 1   Task 2   Task 3   Agency   Trend
A. Silva       78       82       89       25%
B. Santos      71       68       75       47%
C. Oliveira    76       61       58       91%      ↑
D. Ferreira    54       68       75       27%

Most assessment measures what was produced.
MentorplAI measures how students work — and whether that improves.

Three problems AI made harder to ignore

These existed before AI. They're just more visible now.

Output quality isn't the same as skill

A polished submission can come from genuine skill or from minimal engagement with a capable model. Grading the output alone doesn't tell you which.

The process is invisible

Without seeing how a student works — what choices they make, whether their thinking develops — it's hard to give useful feedback or spot who needs help.

Assessment hasn't kept up

Working effectively with AI is a professional skill. Assessing as though AI doesn't exist leaves students unprepared for what they're already doing.

How it works

One coherent loop per task. Each loop adds to the picture.

1

Lecturer sets up the task

Create a module, write a task prompt, and add quality baselines: sample responses at each level, from naive to expert. The AI can generate them from your prompt. These baselines calibrate the grading system.

2

Students define their working principles

Before writing, each student authors a personal skill — a short set of DOs and DON'Ts capturing how they intend to work with AI on this task. This makes their approach explicit and gives them something to reflect on across tasks.

3

AI generates a draft — the student shapes it

Their skill is used to prompt a personalised AI draft. Students write and edit from there in a structured studio. How far they diverge from the draft becomes the Agency signal: a measure of how much they shaped the output beyond the starting point.
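The page does not say how divergence is computed. As a minimal sketch, assume agency is one minus a word-level similarity between the AI draft and the submitted text. Python's standard `difflib.SequenceMatcher` then gives a workable proxy. The function name `agency_score` is hypothetical.

```python
from difflib import SequenceMatcher


def agency_score(ai_draft: str, final_text: str) -> float:
    """Hypothetical agency proxy: how much of the submission differs from the AI draft.

    0.0 means the draft was submitted verbatim; 1.0 means no words were kept in order.
    """
    draft_words = ai_draft.split()
    final_words = final_text.split()
    # ratio() = 2*M / (len(a) + len(b)), where M is the number of matched words
    similarity = SequenceMatcher(None, draft_words, final_words, autojunk=False).ratio()
    return 1.0 - similarity
```

Word-level matching is a simplification. It measures how much changed, not whether the changes were good.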

4

Peers compare submissions through structured judgement

Students compare pairs of submissions on each of five dimensions. Comparing directly is more reliable than scoring on a scale; it's the same principle used to evaluate AI models. Each evaluator's reliability is measured alongside the submissions they judge.
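The page does not name the statistical model behind the comparisons. A common choice for comparative judgement, and for ranking AI models from pairwise votes, is the Bradley-Terry model. The sketch below assumes that model. It fits one set of strengths for a single dimension from (winner, loser) pairs. `fit_bradley_terry` and its parameters are hypothetical.

```python
import math


def fit_bradley_terry(comparisons, iters=1000, lr=0.1, l2=0.01):
    """Fit Bradley-Terry log-strengths by L2-regularised gradient ascent.

    comparisons: list of (winner_id, loser_id) tuples for ONE dimension.
    Returns {item_id: theta}, where the model's
    P(i beats j) = 1 / (1 + exp(theta_j - theta_i)).
    """
    items = {x for pair in comparisons for x in pair}
    theta = {x: 0.0 for x in items}
    for _ in range(iters):
        # Gradient of the log-likelihood minus an L2 penalty that keeps
        # undefeated (or winless) items from drifting to +/- infinity.
        grad = {x: -l2 * theta[x] for x in items}
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for x in items:
            theta[x] += lr * grad[x]
    return theta
```

Run it once per dimension, such as Argumentation, because a submission can be strong on one dimension and weak on another.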

5

Two scores, one grade — and a longitudinal picture

Each task produces a grade from submission quality (65%) and evaluator reliability (35%). Across tasks, lecturers see how grades and agency evolve — patterns that a single snapshot cannot reveal.
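The weighting is stated directly, so the arithmetic is simple. The page does not give a scale, so assume both scores are on 0-100. A task grade is then 0.65 × Q + 0.35 × EQ. The name `task_grade` is hypothetical.

```python
def task_grade(quality: float, evaluator_reliability: float) -> float:
    """Combine the two per-task scores with the stated 65/35 weighting.

    Both inputs are assumed to share the same 0-100 scale.
    """
    return 0.65 * quality + 0.35 * evaluator_reliability
```

For example, quality 80 and evaluator reliability 60 give 0.65 × 80 + 0.35 × 60 = 52 + 21 = 73.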

Three signals, read together

No single number is the story. Their combination — and how they change across tasks — is where the insight is.

Q

Submission quality

A quality level derived from pairwise peer comparison, calibrated against the lecturer's quality baselines. Rising scores across tasks suggest real development.
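One way to read "calibrated against the lecturer's quality baselines" is as follows: include the baselines in the peer comparisons as anchors, then place each submission's fitted strength between them. The sketch makes two assumptions. Each item has a latent score, for example from a Bradley-Terry fit. The four baselines (Naive, Basic, Proficient, Expert) map to numeric levels 1 to 4. Both this mapping and `calibrate_level` are hypothetical.

```python
from bisect import bisect_left


def calibrate_level(theta_submission, anchor_thetas):
    """Map a latent comparison score onto the baseline scale by linear interpolation.

    anchor_thetas: list of (level, theta) for the lecturer's baselines,
    e.g. [(1, naive), (2, basic), (3, proficient), (4, expert)].
    Scores outside the anchor range are clamped to the lowest/highest level.
    """
    anchors = sorted(anchor_thetas, key=lambda a: a[1])
    thetas = [t for _, t in anchors]
    if theta_submission <= thetas[0]:
        return float(anchors[0][0])
    if theta_submission >= thetas[-1]:
        return float(anchors[-1][0])
    # First anchor with theta >= the submission; the one before it is strictly lower.
    i = bisect_left(thetas, theta_submission)
    (lo_level, lo_t), (hi_level, hi_t) = anchors[i - 1], anchors[i]
    frac = (theta_submission - lo_t) / (hi_t - lo_t)
    return lo_level + frac * (hi_level - lo_level)
```
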

Ag

Agency

How much the student shaped and steered the output beyond the AI starting draft. Persistently low agency across tasks may signal limited engagement — though context always matters.

EQ

Evaluator quality

How consistently and accurately a student judges peers' work — agreement with consensus, calibration against known quality anchors, and informativeness of their choices.
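The page names three components but not their formulas. Below is a minimal sketch of the first two. It assumes consensus strengths `theta` fitted from all judgements, as in a Bradley-Terry fit, and known levels for the baseline anchors. Consensus agreement is the share of an evaluator's choices that match the consensus ordering. Anchor accuracy is the share of baseline-versus-baseline pairs they order correctly. Informativeness is left out because its definition isn't given. The function and argument names are hypothetical.

```python
def evaluator_quality(judgements, theta, anchor_levels):
    """Score one evaluator's pairwise choices.

    judgements: list of (chosen_id, rejected_id) made by this evaluator.
    theta: consensus log-strengths fitted from all evaluators' judgements.
    anchor_levels: dict baseline_id -> known level (e.g. 1 = naive ... 4 = expert).
    Returns (consensus_agreement, anchor_accuracy); either is None if no pairs apply.
    """
    # Did the evaluator pick the item the consensus ranks higher?
    agree = [theta[c] > theta[r] for c, r in judgements if c in theta and r in theta]
    # Pairs of two baselines with different known levels have a "right answer".
    anchor_pairs = [
        (c, r) for c, r in judgements
        if c in anchor_levels and r in anchor_levels and anchor_levels[c] != anchor_levels[r]
    ]
    correct = [anchor_levels[c] > anchor_levels[r] for c, r in anchor_pairs]
    consensus = sum(agree) / len(agree) if agree else None
    anchors = sum(correct) / len(correct) if correct else None
    return consensus, anchors
```

Ideally the consensus would be refitted without the evaluator's own judgements. Otherwise a prolific evaluator ends up agreeing with a consensus they largely shaped.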

Built on established ground

Each design choice is grounded in something that already works.

⚖️

Comparative judgement

Comparing two things directly is more reliable than scoring them on an abstract scale. This is how AI models are evaluated at scale — MentorplAI applies the same principle to student work.

📝

Explicit working principles

Having students articulate and revise their own DOs and DON'Ts creates a record of how their approach to AI develops across tasks — and surfaces misalignments between what they say they'll do and what they produce.

👥

Peer assessment

Evaluating others' work develops critical judgment that producing work alone doesn't. Done systematically, peer evaluation signals both the quality of the work reviewed and the quality of the reviewer.

📈

Longitudinal observation

A single data point is ambiguous. Patterns across multiple tasks are meaningful. Tracking how a student behaves over time reveals what a snapshot assessment cannot.

What lecturers actually get

Signals to inform teaching — not replace it.

🔭

See patterns across your whole class

The longitudinal dashboard shows every student across every task — grade trajectory, agency score, and evaluator reliability — in one view. Spot who's improving and who needs a conversation.
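"Grade trajectory" could be summarised in many ways. One simple, assumed option is the least-squares slope of grades against task number, in points per task. `grade_trend` is a hypothetical helper, not the dashboard's documented method.

```python
def grade_trend(grades):
    """Least-squares slope of grades over task index (points per task).

    grades: per-task grades in task order, e.g. [78, 82, 89].
    Returns 0.0 when fewer than two tasks are available.
    """
    n = len(grades)
    if n < 2:
        return 0.0
    xs = range(1, n + 1)  # task numbers 1..n
    x_mean = (n + 1) / 2
    y_mean = sum(grades) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, grades))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den
```

With the demo data above, A. Silva's grades [78, 82, 89] give a slope of +5.5 points per task. C. Oliveira's [76, 61, 58] give -9.0.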

💬

Targeted feedback to specific students

Deliver personalised written feedback — with attachments or links — directly to a student, visible only to them in their results. High-signal conversations, not bulk annotation.

🚀

Live test runs — no pre-registration needed

Share a QR code in class. Students join, write, and evaluate each other's work in minutes. Ranked grades on the spot. Works without setting up a full module first.

What this is — and what it isn't

Being clear about limits is part of using any tool well.

What MentorplAI is

  • A structured environment that makes how students work with AI visible — not just what they produce.

  • A longitudinal record that helps lecturers have better, more specific conversations with students.

  • An assessment design that treats AI collaboration as a skill to develop — one that can be observed across tasks.

  • A peer evaluation framework that develops critical judgment rather than just producing a grade.

What MentorplAI is not

  • An AI detector. Signals are for lecturers to interpret with context — not automatic verdicts.

  • A surveillance tool. The system surfaces patterns for teaching decisions — it doesn't replace professional judgment.

  • Foolproof. Students can game any system. What changes here is the cost: shallow work leaves consistent patterns across tasks, and masking them convincingly over many tasks is hard.

  • A replacement for good pedagogy. The system works best when used as a starting point for teaching — not an endpoint for evaluation.

"I haven't seen any system designed to teach students how to work with AI — and that's what this is."

— Assessment researcher, education technology

See your class differently.
Teach at your best.

MentorplAI is live and being used in courses today. Try it with your next assignment.