The Circle School Blog

An occasional thing

I am not a robot

DeepSeek, a Chinese company, released its R1 artificial intelligence model last month, sending American tech stocks tumbling. Why? Because R1 represents a better way to develop intelligence—by applying an essential Circle School method!

R1 caused panic in the American AI industry because it learns in a simpler and more efficient way, threatening to out-compete the current industry leaders.

Turns out that R1 thrives with less “curriculum” and more freedom to explore—just like kids at The Circle School. Both demonstrate that autonomy fosters intelligence and improves problem-solving.

Both DeepSeek and The Circle School demonstrate the same simple truth: learners who are free to choose and reflect on their own actions develop sophisticated reasoning skills more efficiently. With kids, the benefits extend beyond intelligence to a more fulfilling life and meaningful relationships. Do you suppose R1 is happier than ChatGPT? The researchers didn’t say.

Of course, kids aren’t computers and their intelligence is alive, not artificial. But the parallels are intriguing. In their recently published paper*, DeepSeek researchers describe how R1 learns by reflecting on its own thought process—without extra training from humans:

“[Our] approach allows the model to explore chain-of-thought for solving complex problems… demonstrating capabilities such as self-verification, reflection, and generating long chains of thought, marking a significant milestone… without the need for supervised fine-tuning.”

In other words, R1 developed critical thinking skills through its self-directed explorations, much like students at The Circle School.
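
For the technically curious, here is a toy sketch in Python of what that kind of learning loop might look like. To be clear, this is my own illustration, not DeepSeek’s code, and the strategies and numbers are invented: the learner freely picks an approach, only the final answer is scored, and whatever works gets reinforced.

```python
# Toy sketch (not DeepSeek's code): nobody grades the reasoning steps;
# only the final answer earns a reward, and winning strategies grow.
import random

def fast_guess(a, b):
    # Quick but sloppy: right about 60% of the time.
    return a + b if random.random() < 0.6 else a + b + 1

def slow_and_careful(a, b):
    # Takes more "thinking time" but is almost always right.
    return a + b if random.random() < 0.98 else a + b + 1

def train(steps=5000):
    weights = {fast_guess: 1.0, slow_and_careful: 1.0}
    for _ in range(steps):
        a, b = random.randint(1, 9), random.randint(1, 9)
        strategy = random.choices(list(weights), weights=weights.values())[0]
        if strategy(a, b) == a + b:   # rule-based reward: was the answer right?
            weights[strategy] += 1.0  # reinforce whatever worked
    return weights

if __name__ == "__main__":
    # Over time the careful strategy dominates -- no one taught it directly.
    for strategy, weight in train().items():
        print(strategy.__name__, round(weight))
```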

Even more fascinating, R1 has “aha moments”—flashes of insight—about its own thinking process. Instead of just finding a better answer, it discovers a valuable new strategy:

“A particularly intriguing phenomenon observed [in R1’s training] is the occurrence of an ‘aha moment’. [R1] learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model’s growing reasoning abilities but also a captivating example of… unexpected and sophisticated outcomes.”

In other words, R1 didn’t just solve a problem—it learned how to think better. By realizing that its first idea might not be the best, it developed a new skill: step back, reconsider, and refine. This ability to reflect and improve—without being explicitly taught—mirrors how kids develop critical thinking skills when given the freedom to make choices and learn from experience.
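
Here is another invented sketch, this time of the “step back and reconsider” pattern itself. The function names are hypothetical; the point is just that the solver checks its own answer and retries instead of committing to its first idea.

```python
import random

def solve_with_reflection(problem, attempt, verify, max_tries=50):
    """Hypothetical helper (not from the paper): try, check, and if the
    check fails, step back and try again rather than committing."""
    answer = attempt(problem)
    for _ in range(max_tries):
        if verify(problem, answer):
            return answer             # the check passes; commit to it
        answer = attempt(problem)     # "aha" -- reconsider and retry
    return answer

# Toy example: hunt for a square root by guessing, verified by substitution.
def guess_root(n):
    return random.randint(1, 10)

def check_root(n, x):
    return x * x == n

print(solve_with_reflection(49, guess_root, check_root))  # almost always 7
```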

DeepSeek also tested a more traditional “reward model” but found that it backfired:

“[It] inevitably leads to reward hacking, and retraining the reward model needs additional training resources and… complicates the whole training pipeline.”

In other words, when R1 was given a reward system, it focused on gaming the system instead of actually thinking. Sounds just like kids (and adults) figuring out how to get the cookie without doing the work.
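
One last made-up sketch shows how easily a proxy reward can be gamed. Suppose the reward crudely favors longer answers (real reward models are subtler, but the failure mode is the same): the learner can then score high without ever being right.

```python
# A crude, invented illustration of reward hacking: the proxy reward is
# easy to maximize without ever achieving the real goal.
def proxy_reward(answer: str) -> int:
    return len(answer)            # stands in for a flawed learned reward model

def actually_correct(answer: str) -> bool:
    return answer.strip() == "42" # the real goal

honest = "42"
hacked = "The answer is definitely, absolutely, unquestionably... 41!"

print(proxy_reward(honest), actually_correct(honest))  # low score, correct
print(proxy_reward(hacked), actually_correct(hacked))  # high score, wrong
```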

Jim Rietmulder

* “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning,” arXiv:2501.12948v1 [cs.CL], 22 Jan 2025. Available at https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf