AI-native assessment · Human-first teaching

Your students are using AI. Are you assessing the skill — or just the output?

MentorplAI turns every writing task into a longitudinal signal: how students work with AI, how their judgment develops across tasks, and how reliably they evaluate each other's work.

MentorplAI · Live Demo

Task Setup

Task Prompt

Write a research proposal that clearly states your hypothesis, methodology and expected impact on the field.

Quality Baselines

Naive — one vague sentence AI ✓

Basic — idea, no structure AI ✓

Proficient — clear sections AI ✓

Expert — cited, rigorous AI ✓

Expert Baseline · Preview

This study investigates the relationship between peer feedback quality and skill development in AI-assisted writing. Drawing on comparative judgement methodology, we hypothesize that structured evaluation practice correlates with measurable gains across consecutive tasks…

Publish Task →

✓ DO

Explore multiple angles before choosing a direction

Write my own intro before asking AI to draft

Check every claim against the source material

✗ DON'T

Accept the first draft without editing for tone

Submit without reading the full text aloud

↑ Evolving from Task 1

Added: "check every claim against sources"

AI Draft

This research investigates the relationship between feedback frequency and academic writing quality. The study will employ a mixed-methods approach, combining quantitative analysis of pre/post assessments with qualitative thematic analysis…

Your Version ✎

Our study questions a core assumption in writing pedagogy: that more feedback automatically improves quality. Using longitudinal data from three cohorts, we test whether structured peer comparison outperforms frequency-based models…

Agency

67% diverged from draft

Dimension: Argumentation 3 of 5 comparisons

Submission A

Our research builds on Smith's framework to argue that peer comparison creates more durable assessment literacy than traditional rubric-based scoring approaches…

A is stronger →

Submission B

The evidence across three studies shows consistent gains when students evaluate before they produce. Evaluation is not merely summative — it is generative…

← B is stronger

Student	Task 1	Task 2	Task 3	Agency
A. Silva	78	82	89	25%
B. Santos	71	68	75	47%
C. Oliveira ⚠	76	61	58	91% ↑
D. Ferreira	54	68	75	27%

How it works

One coherent loop per task. Each loop adds to the picture.

Lecturer sets up the task

Create a module, write a task prompt, and add quality baselines: sample responses at each level, from naive to expert. The AI can generate them from your prompt. These baselines calibrate the grading system.

Students define their working principles

Before writing, each student authors a personal skill — a short set of DOs and DON'Ts capturing how they intend to work with AI on this task. This makes their approach explicit and gives them something to reflect on across tasks.

AI generates a draft — the student shapes it

The AI uses each student's skill to generate a personalised draft. Students then write and edit from there in a structured studio. How far the final text diverges from the draft becomes the Agency signal: a measure of how much the student shaped the output beyond the starting point.
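To make the Agency idea concrete, here is a minimal sketch of one way divergence from a draft could be measured. The page does not say how MentorplAI computes it; the word-level comparison with Python's `difflib` and the function name `agency` are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def agency(ai_draft: str, student_version: str) -> float:
    """Return how far the student's text diverges from the AI draft (0.0 to 1.0).

    Assumption: divergence is 1 minus a word-level similarity ratio.
    The real Agency signal may be computed differently.
    """
    draft_words = ai_draft.lower().split()
    student_words = student_version.lower().split()
    matcher = SequenceMatcher(None, draft_words, student_words, autojunk=False)
    return 1.0 - matcher.ratio()


# An unchanged draft gives 0.0; a fully rewritten one gives 1.0.
unchanged = agency("the study uses mixed methods", "the study uses mixed methods")
rewritten = agency("the study uses mixed methods", "we question a core assumption")
```

A word-level ratio is easy to explain to students, but it treats paraphrase as full divergence, so a production measure would likely need something more careful.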

Peers compare submissions through structured judgement

Students compare pairs of submissions against five dimensions. Comparing directly is more reliable than scoring on a scale — it's the same principle used to evaluate AI models. Each evaluator's reliability is measured alongside the submissions they judge.
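The sketch below shows one common way to turn pairwise "A is stronger" choices into a ranking, plus a simple reliability check for an evaluator. It assumes a Bradley-Terry model and an agreement-rate measure; the page names neither, so both functions (`bradley_terry`, `evaluator_agreement`) are hypothetical stand-ins, not the product's method.

```python
from collections import Counter


def bradley_terry(comparisons, iterations=200):
    """Fit Bradley-Terry strengths from a list of (winner, loser) pairs."""
    items = sorted({s for pair in comparisons for s in pair})
    wins = Counter(winner for winner, _ in comparisons)
    # Number of times each unordered pair was compared.
    met = Counter(frozenset(pair) for pair in comparisons)

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                met[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and met[frozenset((i, j))]
            )
            # A submission that never wins ends up with strength 0.
            updated[i] = wins[i] / denom
        # Rescale so strengths average 1; only their ratios matter.
        scale = len(items) / sum(updated.values())
        strength = {i: s * scale for i, s in updated.items()}
    return strength


def evaluator_agreement(judgements, strength):
    """Share of one evaluator's (winner, loser) picks that match the fitted ranking."""
    agree = sum(strength[w] > strength[l] for w, l in judgements)
    return agree / len(judgements)


# Illustrative data: each pair of submissions is compared three times.
all_comparisons = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"), ("C", "A"),
]
strengths = bradley_terry(all_comparisons)
```

Fitting the ranking from everyone's comparisons and then scoring each evaluator against it is what lets submission quality and evaluator reliability come out of the same data.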

Two scores, one grade — and a longitudinal picture

Each task produces a grade from submission quality (65%) and evaluator reliability (35%). Across tasks, lecturers see how grades and agency evolve — patterns that a single snapshot cannot reveal.
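The weighting above is simple arithmetic, sketched here for clarity. The 65% and 35% weights come from the page; the 0 to 100 scale for both inputs and the function name `task_grade` are assumptions.

```python
QUALITY_WEIGHT = 0.65      # weight stated on the page
RELIABILITY_WEIGHT = 0.35  # weight stated on the page


def task_grade(submission_quality: float, evaluator_reliability: float) -> float:
    """Combine the two scores into one grade.

    Assumption: both inputs are on the same 0-100 scale.
    """
    return (QUALITY_WEIGHT * submission_quality
            + RELIABILITY_WEIGHT * evaluator_reliability)


# Example: strong submission, weaker evaluation record.
grade = task_grade(submission_quality=80, evaluator_reliability=60)
```

With these inputs the grade is 0.65 × 80 + 0.35 × 60 = 73, so a careless evaluator loses points even on a strong submission.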

Built on established ground

Each design choice is grounded in something that already works.

⚖️

Comparative judgement

Comparing two things directly is more reliable than scoring them on an abstract scale. This is how AI models are evaluated at scale — MentorplAI applies the same principle to student work.

📝

Explicit working principles

Having students articulate and revise their own DOs and DON'Ts creates a record of how their approach to AI develops across tasks — and surfaces misalignments between what they say they'll do and what they produce.

👥

Peer assessment

Evaluating others' work develops critical judgment in a way that producing work does not on its own. Done systematically, peer evaluation signals both the quality of the work reviewed and the quality of the reviewer.

📈

Longitudinal observation

A single data point is ambiguous. Patterns across multiple tasks are meaningful. Tracking how a student behaves over time reveals what a snapshot assessment cannot.
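As a rough illustration of reading a pattern rather than a snapshot, the sketch below fits a per-student trend to the three task scores shown in the demo table. The least-squares slope and the "negative slope" flag are assumptions; the page does not state how its own warnings are derived.

```python
from statistics import mean

# Scores per task, copied from the demo table on this page.
scores = {
    "A. Silva": [78, 82, 89],
    "B. Santos": [71, 68, 75],
    "C. Oliveira": [76, 61, 58],
    "D. Ferreira": [54, 68, 75],
}


def trend(values):
    """Least-squares slope of scores against task number (points per task)."""
    xs = range(1, len(values) + 1)
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical rule: flag any student whose scores trend downward.
declining = [name for name, s in scores.items() if trend(s) < 0]
```

On this data only C. Oliveira is flagged. B. Santos dips on Task 2 but recovers, which is exactly the kind of case a single snapshot would misread.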

What this is — and what it isn't

Being clear about limits is part of using any tool well.

✓ What MentorplAI is

✓ A structured environment that makes how students work with AI visible, not just what they produce.
✓ A longitudinal record that helps lecturers have better, more specific conversations with students.
✓ An assessment design that treats AI collaboration as a skill to develop, one that can be observed across tasks.
✓ A peer evaluation framework that develops critical judgment rather than just producing a grade.

✗ What MentorplAI is not

✗ An AI detector. Signals are for lecturers to interpret with context; they are not automatic verdicts.
✗ A surveillance tool. The system surfaces patterns for teaching decisions; it doesn't replace professional judgment.
✗ Foolproof. Students can game any system. What changes here is the cost: shallow work leaves consistent patterns across tasks, so gaming the system convincingly gets harder to sustain.
✗ A replacement for good pedagogy. The system works best as a starting point for teaching, not an endpoint for evaluation.

Three problems AI made harder to ignore

Output quality isn't the same as skill

The process is invisible

Assessment hasn't kept up

Three signals, read together

Submission quality

Agency

Evaluator quality

What lecturers actually get

See patterns across your whole class

Targeted feedback to specific students

Live test runs — no pre-registration needed

See your class differently.
Teach at your best.
