The Motivation Ceiling
Better AI tutoring can't solve a social motivation problem; new research hints at why.

“Stop asking me questions and just tell me what to do next.” The student was a few rounds into a collaborative project with a custom chatbot, and she’d had enough. The chatbot, designed for a study earlier in my PhD, was performing its best practices: asking guiding questions, probing the roots of her interest, trying to scaffold understanding rather than just delivering answers. The student bluntly wanted none of it.
This, of course, wasn’t an isolated case. In transcripts from students working on their own homework with AI tools, the dominant interaction pattern is often just: “tell me.” “tell me.” “do it.” Even in the best cases, the AI is designed for dialogue and the student wants a vending machine.
Khan Academy effectively retired its chatbot Khanmigo this month. The scale of the failure is significant. Khanmigo had arguably the best possible conditions for an AI tutoring product: an existing platform with millions of engaged users, a pedagogically thoughtful design team, early access to OpenAI’s models, years of testing and tweaking different interaction behaviors. There’s been a lot of good writing about what happened and why (some here: Dan Meyer, Chalkbeat). The short version, from Khan Academy’s own leadership: students were passive. “We see more ‘IDK IDK,’” said Chief Academic Officer Kristen DiCerbo, “more passive kinds of interaction than we would like.”
For most of my PhD, I’ve been making a case that’s well-established in learning sciences but still runs against the dominant assumptions in AI and edtech. The prevailing product bet, going back decades, is that more data and better personalization will unlock better learning. Many people thought: if the AI is good enough, if the pedagogy is sound enough, if the interface is polished enough, the engagement will follow. Khanmigo was the strongest contender in that argument, and students still responded “IDK. Tell me.”
The failure underscores what my research has been probing from a different angle: the bottleneck isn’t the product, it’s the roots of motivation. The industry has tried to close the motivation gap with extrinsic incentives like gamification and point systems. Alpha School has built its model around extrinsic rewards for AI-tutored learning; Duolingo uses streaks and leaderboards. These approaches acknowledge the gap, but they’re treating the symptoms.

A few months ago I ran a study to test the motivation question empirically: whether the source of feedback — human or AI — changes how students engage, even when the content itself is identical. It does. The more surprising result came from students who didn't quite believe the framing they were given. Together, the two findings suggest that the motivational ceiling on AI tutoring is lower than the industry assumes — and that raising it requires a different approach than making AI feel more human. The fix is using AI to amplify genuine human connection in learning.
We tested this with students receiving personalized feedback in an online tutorial. On a custom platform, ~150 participants completed a creative coding tutorial and got feedback at four checkpoints. Every piece of feedback was generated by the same LLM, using the same prompting structure, degree of specificity, and tone, tailored to each person’s specific code. The only difference was what participants were told: some were told it came from AI, others were told it came from a human teaching assistant. We included a third condition — AI feedback with a built-in delay — to separate the effect of AI-vs-human source attribution from the confound of timing (in a natural environment, AI feedback arrives almost instantly, while an online human TA takes a few moments to read and respond).
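For readers who think in code, the design above can be sketched roughly as follows. This is a minimal illustration, not the study's actual platform code: the condition names, the 60-second delay, the balanced round-robin assignment, and the assumption that the human-attributed condition was also delayed are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass

# Hypothetical condition table: who the feedback is attributed to, and
# whether delivery is delayed. The feedback text itself never varies.
CONDITIONS = {
    "ai_instant": {"label": "AI", "delayed": False},
    "ai_delayed": {"label": "AI", "delayed": True},
    # Assumption: the human-attributed condition also arrives after a delay.
    "human_ta": {"label": "a human teaching assistant", "delayed": True},
}


@dataclass(frozen=True)
class Delivery:
    attributed_to: str    # what the participant is told
    delay_seconds: float  # how long the platform waits before showing it
    text: str             # the LLM-generated feedback, identical across conditions


def assign_conditions(participant_ids, seed=0):
    """Balanced random assignment: shuffle the ids, then deal them round-robin."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    names = list(CONDITIONS)
    return {pid: names[i % len(names)] for i, pid in enumerate(ids)}


def deliver(condition, feedback_text, delay_seconds=60.0):
    """Wrap the same feedback text in a condition-specific attribution and delay."""
    cfg = CONDITIONS[condition]
    delay = delay_seconds if cfg["delayed"] else 0.0
    return Delivery(cfg["label"], delay, feedback_text)


if __name__ == "__main__":
    assignment = assign_conditions(range(150))
    for pid in list(assignment)[:3]:
        print(pid, deliver(assignment[pid], "Nice use of a loop here."))
```

The point of the sketch is that `deliver` changes only the attribution and the timing. Any difference in engagement between groups therefore cannot come from the feedback content.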
The paper, under review, is on arXiv; I’ll share the two findings that matter most here.
First: participants who believed a human was reviewing their work spent significantly more time on task and in active focus (about 28% more), even though they rated the feedback itself as equally accurate and useful. They also wrote longer, more complex code. The behavioral gap was invisible in the self-report data: people rated the AI feedback as just as helpful as human feedback, and then they engaged with it less.

We were also curious what drove the difference, as far as students could reflect on their own motivations. Consistently, their reports pointed to something more nuanced than feeling watched — it manifested as their own sense of intrinsic interest. We measured several self-report factors of social presence, and the scores were flat across all groups. Everyone felt similarly observed or judged.
Participants who believed a person had engaged with their work reported more genuine interest and more autonomous investment — saying they’d have put in the same effort even without feedback (we controlled for baseline differences in reported interest and experience in the topic). The mechanism looks closer to perceived authenticity — the sense that a real person is on the other end — than to accountability or surveillance.
So Khanmigo’s struggles didn’t shock me: when you isolate the variables in a controlled experiment, the AI’s problem is motivation, not content. Much like trying to change someone’s mind with logic rather than emotion, you can’t fix a motivation problem by making the content better.
The second finding stood out to a surprising degree, even to me — and it may matter more for what comes next.
Nearly half of the participants who were told their feedback came from a human didn’t entirely believe it (this was a particularly AI-savvy pool of participants). These skeptics didn’t just revert to AI-level engagement; they did worse than the group who knew from the start they were working with AI. They produced shorter programs, wrote less complex code, spent less time in focus, and skimmed feedback at three times the rate of every other group. By their own report, too, they were more likely to skim the feedback and less likely to be motivated by it, essentially withdrawing from the feedback loop entirely.
Believers and skeptics were statistically identical on every baseline measure; the only difference was whether the framing felt credible. The outcomes are consistent with something like a trust violation: transparent AI makes no relational claim, so there's nothing to violate. But when a system claims human involvement — or, perhaps, when a teacher uses AI-generated content under their own name — and there's a sense of deception, it can activate a form of betrayal aversion. The result is active disengagement.
This is worth considering, because one obvious response to the Khanmigo failure is to make AI tutors feel more human: warmer, more relational. I recently gave a talk on a paper about the specific nuances of peer-to-peer interaction that drive greater interest and curiosity, and got multiple questions along the lines of "so we could bake these qualities into AI chatbots, right?" But the credibility data suggests that if students sense the performance is artificial, the effect could be much worse than with straightforward, honestly attributed AI. The path forward isn’t making AI better at pretending to be a person.
Sal Khan now says the biggest lever is human systems. I agree — it's where my research has pointed, and where a growing body of evidence across learning sciences is converging. But “humans matter” isn’t specific enough to be actionable. The question is: what specifically about human presence drives engagement that AI alone can’t produce?
Our feedback study points to one answer: rather than surveillance or evaluation pressure, the critical factor looks like authenticity and relational quality. People invest more when they believe a real person has engaged with their work. They invest less when they suspect a system is performing a connection it doesn't actually feel.
There’s a motivational ceiling on interactions where no human is present, and that ceiling is lower than the ed-tech industry assumed. Khanmigo hit it with every advantage in the world. The question now is whether we keep trying to raise that ceiling with better product design — or start designing systems that use AI to amplify the human dynamics that actually drive motivation to learn.

“…designing systems that use AI to amplify the human dynamics that actually drive motivation to learn.” Do you have ideas of what this could look like? Seems almost oxymoronic (at least with respect to trying to design systems that have no humans, except the student, in the loop).