Methodology · A field manual
How Strata Mundo works
Transparent by design. Every choice — sources, vocabulary, the analysis rules, the plan logic — is explained here.
The three questions Strata Mundo answers
Question I
Where is the learner in their math journey, really?
A telemetry-based assessment that reads the trajectory — drags, removals, commits, resets, timing — and produces a categorical mastery map per CCSS standard. Four states (Mastered / Working on / Needs attention / Not yet probed) with named misconceptions and traceable evidence.
Question II
What should they work on next, exactly?
A mastery atlas grounded in the published Common Core Coherence Map and the Illustrative Mathematics curriculum sections. Concept dependencies are visible. The Plan Architect skips what is mastered and starts at the first section with any flagged standard.
Question III
What effective tools are out there to truly master that skill?
A tailored plan from a curated, multimodal library. Concrete → representational → abstract sequencing. On-screen + off-screen + hands-on activities per concept. The plan is a living document — it shifts as the learner’s mastery evolves.
Authoritative sources
Strata Mundo doesn't invent terminology, groupings, or sequencing. Every level of the hierarchy comes from a published, authoritative source.
- Progression — cross-grade developmental arc through one mathematical domain (e.g., Fractions, K-5). Source: Progressions for the Common Core State Standards in Mathematics, by Bill McCallum, Hung-Hsi Wu, Phil Daro, et al. (University of Arizona, Institute for Mathematics and Education). https://ime.math.arizona.edu/progressions
- Section — in-grade grouping of lessons within a Unit; each Section contains several lessons targeting specific Standards. Source: Illustrative Mathematics K-5 Curriculum, by Illustrative Mathematics, distributed by Kendall Hunt (CC BY 4.0 license). https://im.kendallhunt.com/k5
- Standard — individual learning target; each standard has a CCSS-M code (e.g., 4.NF.A.1). Source: Common Core State Standards for Mathematics, National Governors Association and Council of Chief State School Officers. https://www.corestandards.org/Math/
- Coherence Map — cross-grade prerequisite arrows showing which standards depend on which others; used for differential diagnosis (within-concept vs. prerequisite gap). Source: The Coherence Map, Student Achievement Partners, based on Jason Zimba's Wiring Diagram. https://tools.achievethecore.org/coherence-map/
The hierarchy: Progression → Section → Standard. Each level uses its source's actual published terminology.
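To make the hierarchy concrete, here is a minimal TypeScript sketch of how the three levels might be modeled. The interface and field names are illustrative assumptions, not the actual schema.

```ts
// Illustrative sketch of the Progression → Section → Standard hierarchy.
// All names here are hypothetical; the production schema may differ.

/** Individual learning target with a CCSS-M code, e.g. "4.NF.A.1". */
interface Standard {
  code: string;            // CCSS-M code from corestandards.org
  description: string;
  prerequisites: string[]; // prerequisite standard codes, per the Coherence Map
}

/** In-grade grouping of lessons, named per the IM K-5 curriculum. */
interface Section {
  imSectionName: string;   // Illustrative Mathematics section name, used verbatim (CC BY 4.0)
  standards: Standard[];
}

/** Cross-grade developmental arc through one domain, e.g. Fractions K-5. */
interface Progression {
  domain: string;          // from the Arizona Progressions documents
  sections: Section[];
}
```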
Glossary
- Mastery map — the structured output of analysis. Every standard gets one of four states (modeled as a type in the sketch after this glossary).
- Mastered (green, emerald-600) — reliably understood with clear reasoning across multiple problems. Meets analysis rule R10.
- Working on (amber, amber-700 for text) — partial understanding. Some right, some wrong, OR right only after multiple attempts without clear reasoning.
- Needs attention (red-600) — a specific named misconception detected with evidence in the telemetry.
- Not yet probed (stone-400) — this standard hasn't been touched in any completed assessment yet. Neither known nor unknown.
- Telemetry — every interaction during an assessment recorded as a timestamped event: placement, removal, commit_attempt, reset.
- Focused probe — a narrow re-assessment of one standard (4–6 problems, ~10 min). Run after the recommended activities to verify a misconception has resolved.
- Plan Architect — Anthropic Managed Agent that reads a mastery map and writes a tailored plan with 2–3 activities per priority gap.
- Smart-skip — when generating a plan, the Plan Architect skips Sections that are already mastered and starts at the first Section with any flagged standard.
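A minimal sketch, under assumed names, of how the four states and their evidence might be represented (the production types may differ):

```ts
// The four categorical states. No percentages, by design.
type MasteryState = "mastered" | "working_on" | "needs_attention" | "not_assessed";

interface StandardMastery {
  standardCode: string;         // e.g. "4.NF.A.1"
  state: MasteryState;
  misconception?: string;       // named misconception, present only for "needs_attention"
  evidenceProblemIds: string[]; // audit field: problem IDs stay out of prose (rule R9)
}

/** The mastery map: one entry per standard. */
type MasteryMap = StandardMastery[];
```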
How the assessment works
- ~10 minutes, ~9 problems. Drag-and-build mechanic: the learner drags unit fraction pieces (1/2, 1/3, 1/4, 1/6, 1/8) onto a target bar to construct the requested fraction.
- Some problems force equivalence reasoning by restricting the palette (e.g., "build 2/3 using only sixths").
- No typed answers. No multiple choice. The mechanic asks the learner to show, not tell.
- Every interaction is recorded as process telemetry — drags, removals, commits, resets, timing (see the event sketch after this list).
- v1 covers 11 standards across grades 2–4 (the part of the K-5 Fractions Progression we currently probe, plus 2 prerequisite Geometry standards on partitioning).
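The telemetry stream could be modeled as a discriminated union of timestamped events. This is a sketch under assumed field names, not the actual wire format:

```ts
// One timestamped event per interaction. Field names are illustrative.
type TelemetryEvent =
  | { kind: "placement"; piece: "1/2" | "1/3" | "1/4" | "1/6" | "1/8"; at: number }
  | { kind: "removal"; piece: string; at: number }
  | { kind: "commit_attempt"; composition: string[]; correct: boolean; at: number }
  | { kind: "reset"; at: number };

interface ProblemTelemetry {
  problemId: string;
  events: TelemetryEvent[]; // ordered by timestamp; pacing between events carries signal too
}
```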
How analysis works
A single Claude Opus 4.7 call reads the telemetry and produces the mastery map. Analysis follows ten reasoning rules (R1–R10) that prioritize process over outcome; a code sketch of rules R3/R4 follows the list.
- R1 — Process over outcome. Don't infer mastery from a correct final answer alone.
- R2 — First-commit-success with deliberate pacing is a strong "demonstrated" signal.
- R3 — Strategy-switching on reset (different denominators on the second attempt) is comparably strong evidence — self-correction is one of the strongest mastery signals research has (Rittle-Johnson 2017, Siegler's overlapping-waves theory).
- R4 — Same-strategy resets = guessing/fiddling, not reasoning.
- R5 — Three or more commit attempts with the same composition = working, not mastered.
- R6 — Rapid commits AND wrong = guessing (Wise 2017). Speed alone is not a guessing signal.
- R7 — Specific wrong-commit content maps to specific named misconceptions, declared in each problem's response map.
- R8 — No commit attempt → not_assessed.
- R9 — Evidence in data, not narrative. Use plain language for guides; problem IDs go in audit fields, not in prose.
- R10 — "Mastered" requires success across multiple problems for a standard, with clear reasoning.
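To make rules R3 and R4 concrete, here is a hedged sketch of how strategy-switching across a reset might be detected, building on the TelemetryEvent shape sketched earlier (the helper name is hypothetical):

```ts
// Sketch: did the learner switch strategies across a reset (R3) or repeat the
// same one (R4)? Compares the denominators placed before and after the reset.
function strategySwitchOnReset(events: TelemetryEvent[]): "switched" | "same" | "no_reset" {
  const resetIdx = events.findIndex((e) => e.kind === "reset");
  if (resetIdx === -1) return "no_reset";

  const denominatorsIn = (slice: TelemetryEvent[]) =>
    new Set(
      slice
        .filter((e): e is Extract<TelemetryEvent, { kind: "placement" }> => e.kind === "placement")
        .map((e) => e.piece.split("/")[1]) // "1/6" -> "6"
    );

  const before = denominatorsIn(events.slice(0, resetIdx));
  const after = denominatorsIn(events.slice(resetIdx + 1));
  // Different denominator sets after the reset suggest deliberate re-strategizing (R3);
  // identical sets suggest guessing or fiddling (R4).
  const same = before.size === after.size && [...before].every((d) => after.has(d));
  return same ? "same" : "switched";
}
```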
How the plan is generated
The Plan Architect is an Anthropic Managed Agent running on Claude Opus 4.7. It reads the mastery map and writes a guide-facing plan in 1–3 minutes.
- Differential diagnosis. For each priority gap, the agent decides whether the issue is within-concept or whether it's actually a prerequisite gap from an earlier standard.
- Smart-skip. The agent identifies the FIRST IM Section containing any flagged standard and starts there. Earlier sections where everything is mastered or untouched are marked accordingly (a minimal sketch follows this list).
- 2–3 activities per priority gap. Activities are picked from a curated resource library, never generated.
- Concrete → Representational → Abstract sequencing. Hands-on first, video/digital second, worksheet/symbolic last (Van de Walle 2014).
- Avoids failed resources. If a previous plan tried a resource and the misconception is still flagged, the agent picks a different resource for the next plan.
- Plain-language rationale. Every activity gets a 1–2 sentence explanation of why it's prescribed for this learner's specific misconception.
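A minimal sketch of the smart-skip rule, reusing the Section and MasteryMap shapes sketched earlier (an illustration, not the agent's actual code):

```ts
// Sketch of smart-skip: return the first IM Section containing any flagged
// standard. Everything before it is either mastered or untouched.
const FLAGGED: MasteryState[] = ["working_on", "needs_attention"];

function firstFlaggedSection(sections: Section[], map: MasteryMap): Section | undefined {
  const stateOf = new Map(map.map((m) => [m.standardCode, m.state] as [string, MasteryState]));
  return sections.find((section) =>
    section.standards.some((s) => FLAGGED.includes(stateOf.get(s.code) ?? "not_assessed"))
  );
}
```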
The probe loop
- The general assessment maps the broad mastery picture across many standards.
- The plan prescribes activities for the flagged standards.
- After the learner does the activities, the guide runs a focused probe on one standard — ~4–6 problems, ~10 minutes — to verify the misconception has resolved.
- If resolved → the standard moves to Mastered in the parent mastery map.
- If not resolved → the Plan Architect re-plans with options: same activities + more time, different modality, or escalate to a prerequisite.
- This loop is what distinguishes a diagnosis of current misconceptions from proof of mastery. Mastery is earned through the loop over time, not claimed from a single assessment. A sketch of the loop's decision step follows.
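A hedged sketch of that decision step, using the StandardMastery shape from earlier (names are illustrative):

```ts
// Sketch of the probe-loop decision: a focused probe either promotes the
// standard in the parent mastery map or hands the Plan Architect its re-plan options.
type ReplanOption = "same_activities_more_time" | "different_modality" | "escalate_to_prerequisite";

function afterProbe(entry: StandardMastery, probeResolved: boolean): StandardMastery | ReplanOption[] {
  if (probeResolved) {
    // Resolved: the standard moves to Mastered in the parent mastery map.
    return { ...entry, state: "mastered", misconception: undefined };
  }
  // Not resolved: the Plan Architect re-plans from these options.
  return ["same_activities_more_time", "different_modality", "escalate_to_prerequisite"];
}
```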
How the diagnosis is grounded
- Named misconception detection with traceable evidence. Wrong-answer patterns are mapped to misconceptions from the literature, citing the problems where they fired. Educators see a specific cognitive error, not a percentage.
- Strategy-switching on reset is positive evidence of mastery. A learner who tries one approach, fails, resets, and tries another successfully is demonstrating self-correction — one of the strongest mastery signals research has (Rittle-Johnson 2017, Siegler's overlapping-waves theory).
Community contributions: AI-vetted, human-approved
The library of activities grows the way good teaching practice has always grown — through the contributions of many practitioners. Anyone can submit a new activity for any standard via the Contribute page, or directly from a learner's plan via the "Suggest an activity for this standard" link next to each gap.
Every submission goes through a two-stage review: an AI reviewer (Claude Opus 4.7) first, then a human reviewer. The AI never approves directly — it only passes, flags, or rejects. Final approval is always human. Both sets of criteria are documented below.
Stage 1 — AI vetting criteria
The AI applies the criteria in order. Each criterion has an ID; when a submission is flagged or rejected, the specific IDs are cited so the contributor knows exactly what to address. A hypothetical response shape follows the list.
- 1.1 Title is a specific name, not a generic phrase. ✗ "Math activity" ✓ "Build-a-fraction interactive — PhET"
- 1.2 Description explains what the learner does (action + concept), not just what they learn.
- 1.3 Modality matches the description.
- 1.4 At least one CCSS-M standard is selected, plausibly related to the description.
- 2.1 NOT a learner-facing chatbot or AI tutor.
- 2.2 NOT primarily gamified with tokens, coins, streaks, or leaderboards.
- 2.3 Grade band fits 3rd–4th grade (or a Coherence Map prerequisite like 2.G.A.3).
- 2.4 Activity actually teaches the standards selected, not adjacent ones.
- 3.1 URL (if provided) is from a recognizable educational source OR is specific enough to verify.
- 3.2 Source/vendor name matches the URL's domain.
- 3.3 For physical materials, brand or vendor identifiable.
- 3.4 No obvious blocklist domains (gambling, ads, content farms).
- 4.1 Description is on-topic for math education.
- 4.2 No promotional/advertorial language.
- 4.3 No personally identifying info about specific children.
- 4.4 Language appropriate for an educational context.
- 5.1 If submission appears identical to a known curated resource, flag.
- 6.1 Never approve. Only humans approve.
- 6.2 Never reject for stylistic preferences.
- 6.3 Never reject "different from typical" approaches that meet pedagogical fit. Distinctive approaches are valuable.
- 6.4 When uncertain, prefer "borderline" over "rejected."
- 6.5 Always cite the specific criterion ID(s) violated.
- 6.6 Respond only with valid JSON.
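Criterion 6.6 requires a JSON-only response. A hypothetical shape for that response is sketched below; the actual schema is versioned in lib/ai-vet-activity.ts and may differ.

```ts
// Hypothetical response shape for the AI vetting stage. The AI can only pass,
// flag as borderline, or reject. It never approves (criterion 6.1).
interface VetResult {
  verdict: "pass" | "borderline" | "rejected"; // prefer "borderline" when uncertain (6.4)
  violatedCriteria: string[];                  // specific criterion IDs, e.g. ["2.2", "4.2"] (6.5)
  notes: string;                               // plain-language summary for the human reviewer
}

// Example: a gamified submission written as ad copy.
const example: VetResult = {
  verdict: "rejected",
  violatedCriteria: ["2.2", "4.2"],
  notes: "Leaderboard-driven game; description reads as promotional copy.",
};
```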
Stage 2 — Human review criteria
The human reviewer applies all of the AI criteria above plus the following judgments, which require human discernment:
- H1 Does the activity exemplify quality teaching practice — does it model the kind of learning we want children to have?
- H2 Is the activity additive to the existing library, or is it materially redundant with what we already have?
- H3 If a URL is provided, the human verifies it actually points to the activity described (the AI cannot fetch URLs).
- H4 For physical materials, the human verifies the material is purchasable/findable.
- H5 Does the description set realistic expectations? (Misleading promises about what a learner will achieve are rejected.)
- H6 Final pedagogical judgment: does this belong in a Strata Mundo learner's plan? When the human says yes, the activity is approved.
The criteria are versioned with the codebase and revised as we learn what works. The current source of truth lives in lib/ai-vet-activity.ts.
What we deliberately don't do
- No learner-facing chatbot. All learner-facing interactions are structured: forms, problems, visual feedback. The LLM does cognitive work behind the scenes, never as a chat with a child.
- No percentage scores. Categorical states only. Percentages collapse different mastery realities (fluent guessing, slow reasoning, partial understanding) into one number that hides the diagnosis.
- No gamification. No XP, streaks, leaderboards, badges, or extrinsic rewards. Mastery-based settings reject these mechanics; we honor that.
- No gated progression. The system suggests; the guide decides. Mastery is declared by the guide, supported by the evidence we surface.
- No selling of learner data. Ever.
What v1 doesn't yet do
- v1 only renders build_fraction problems. Problem types for number-line placement, comparison, identification, and partitioning are in the bank but not yet rendered. Each focused probe currently varies in surface features (denominators, palettes, magnitudes) but not across representation types. That arrives in v1.1.
- v1 covers fractions only. v1.5 extends to all of 4th-grade math (Operations, Place Value, Measurement, Geometry).
- Multi-curriculum resource picker (Beast Academy / Saxon / Singapore / Math-U-See / Montessori) is post-v2.
Privacy and data
- Learner data lives in a Supabase Postgres database with row-level security.
- Anonymous authentication — no email, no password collection in v1.
- Telemetry events (drags, commits, etc.) are stored alongside the assessment row. No third-party analytics.
- The Plan Architect agent runs on Anthropic's Managed Agents infrastructure. No learner names or PII are sent to the agent — only mastery-map states + standard codes.
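A sketch of what that payload might look like, given the PII constraint above (field names are assumptions):

```ts
// Hypothetical payload sent to the Plan Architect agent: mastery-map states
// and standard codes only. No learner names, emails, or other PII.
interface PlanArchitectPayload {
  masteryMap: { standardCode: string; state: MasteryState; misconception?: string }[];
  previouslyTriedResources?: string[]; // resource IDs already tried, so failed ones can be avoided
}
```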
License
- Code: MIT licensed.
- Illustrative Mathematics K-5 Section structure: CC BY 4.0. We use IM section names verbatim with attribution.
- PhET Interactive Simulations referenced as a resource: CC BY 4.0. Attribution: "PhET Interactive Simulations, University of Colorado Boulder."
- CCSS-M and the Coherence Map: referenced for the standards taxonomy and prerequisite structure.