The Self-Referential Testing Effect

Trace Underwood
9 min read · Jul 29, 2019




When I was in grade school, I read this great book called The Mysterious Benedict Society. It started with a bunch of kids taking an absurdly difficult test to obtain entrance into a shadowy organization, which I appreciated as a kid who thought far too highly of his own test-taking skills and dreamed of participating in shadowy organizations. The part that really stuck in my mind, though, was the nature of the test as described. See, it contained all these bizarre, overspecific factual questions: which set of laws did king so-and-so of such-and-such country introduce in the mid-1500s, which chemicals are involved in the formula for some obscure compound, and so forth.

The twist that each protagonist uncovered to join the team left me eagerly smug, certain that if I only got the chance I too could have solved it, as any good twist in a children’s book should. The first twenty-five of the fifty questions were like I stated above, but from the twenty-sixth, they shifted: In the mid-1500s, this set of laws was introduced in which country by which king? Which compound contains this particular group of chemicals? So it went down the list, each problem in the second half answering one in the first, and vice versa. All the material the kids needed was on the test itself.

Parts of tests became a game for me, following that example. I would delight in rooting out bits of info that the testers would let slip, feeling insufferably clever for “cheating the system” instead of paying attention in class and learning the material in the first place. The highlight, or lowlight, came when I realized I had never properly learned how to use radians to measure angles in an algebra class and shuffled frantically through problems on the final to gather what I needed to solve the questions that used them.

Whatever residual smugness I felt from any of that was dispelled handily when I later discovered the brilliant Self-Referential Aptitude Test, which distills that concept to its essential elements with questions like this:

Give it a shot if you’d like. It’s a fascinating exercise. I only solved a few questions before backing down and checking the answer, but it was great fun.

Through pleasant years away from standardized tests those memories faded, and were only brought back by some quizzes in the database course I recently finished. Before each section, I would take a pre-test. The quizzes were unusually long, and at some point they may have started scraping the content barrel a bit, since the same trend described above began to happen. Not for every question, of course, but for enough that some terms I had never heard at the start of the quizzes started feeling like old friends by the end.


All that has piqued my curiosity. Testing, particularly standardized testing, has fallen out of favor in recent years, if indeed it ever was in favor. At the same time, the testing effect is well-documented at this point: testing content is among the most effective study tools we know of.

Roediger and Karpicke ran a series of landmark experiments to demonstrate its use, including one where undergraduates either read a passage four times, read it three times and were tested once, or read it once and were tested three times. The students found the passage more interesting when they were tested after reading, but expected to remember it better when reading repeatedly. Five minutes later, the repeated readers *did* remember a bit better, but when it came time to retest a week later, the readers had forgotten almost half the material while the testers retained almost all they initially learned:

In another brilliant study, Richland, Kornell, and Kao demonstrated the effectiveness of testing before learning material, even when learners can’t answer the pre-test questions. Pre-testing study items helped learners more than emphasizing relevant points, providing extended study time, or simply providing the test questions for review before reading:

The testing effect, that unusual and striking efficacy of active recall, is essential to spaced repetition, itself one of the greatest innovations in learning retention.

But what if we went further? What if we built on that? Take a course, or a lesson, or even a simple passage of text — whatever it is you want to teach someone — and isolate the core elements you want them to learn. Don’t explain any of it directly to the learner. Instead, create a series of problems and questions out of that material. Rather than focusing on explaining the material and leaving testing only as an afterthought for evaluation, what if some people focused attention on creating meaningful self-explaining tests, leaving lectures or videos as an afterthought?

Here’s a quick first shot at a six-question version of what I’m thinking about. It’s cheating, since it repeats some of what I already wrote, but it’s a decent proof of concept and the topic is important enough to be worth repeating. Answers to the quiz are provided below.

1. According to Henry L. Roediger and Jeffrey D. Karpicke, repeatedly studying material helps with __________ recall, while studying followed immediately by testing helps with ________ recall.

2. In the 2009 study “The Pretesting Effect”, students quizzed about a paper before reading it do ________ on a post-test of questions they missed than students who see key items highlighted in the paper.

3. In 2006, Roediger and Karpicke published their findings on the testing effect in a landmark study titled ____________.

4. In “The Pretesting Effect”, _____________ outperformed students who _______________, demonstrating the value of active searching for answers over simple exposure to questions.

5. When Roediger and Karpicke asked students to predict their future performance, those who ___________ predicted they would outperform those who _______________.

6. Which authors wrote the 2009 study “The Pretesting Effect”?


There are some obvious objections to using this as a teaching method. As this is primarily a proposal, not a comprehensive review, I’ll focus on the three that are to my mind the most important. The first is the most basic: What’s wrong with having people read or watch something, then giving them a test on it? Why bother with “pure testing” in the first place?

In the abstract, I don’t think anything is “wrong” with a study-then-test approach. It’s reliable, it leads to recall, and it’s time-proven. My primary concern is that from my experience, when learners are largely self-guided, they tend to disregard content they see as peripheral. Every textbook under the sun has comprehension questions before and after its chapters, but at least in my own learning, I’ve often been guilty of skipping those questions, more interested in just getting through the reading. It’s one thing to lead a class through both a reading and a test, but if the goal is to enable self-directed learning, the most effective methods should be the overwhelming default.

The next: wouldn’t it lead to learners receiving a series of scattered ideas without adequate context?

A proper conclusion would require some experimentation, but let me explain one reason I’m not terribly concerned about this. Have you ever watched a YouTube video or read an article that gave you a perfect explanation of a difficult concept? Perhaps you found yourself nodding along, or thinking to yourself, “Yes! Exactly!”, or feeling like you suddenly understood an argument in great depth. Maybe, then, you wanted to explain that same point to someone else, or even simply remember what it was talking about afterwards.

Unless you’re very much unlike me, you would *not* suddenly burst into a thorough, lucid explanation of the entire idea. A couple of moments would spring to mind — a phrase here, a statistic there — but nothing near so coherent or perfect as what you heard. Try to explain the same thing a month later, and you’d be lucky to get a sentence or two out before wearing your material thin, and without regard for the order in which it was presented to you. In my experience, we build edifices of gossamer, feeling convinced we know a great deal of context because we heard material once, but actively able to pull up only a few threads when we actually need it.

If you construct a test instead of an article to teach material, you will likely end up providing less context than a lecture, a video, or an article. Context is costly when each piece requires a few questions. But I’m not so sure the learner would end up learning less, ultimately. At the least, they would have isolated the essential.

The third is less conceptual, more practical: Wouldn’t it take more time and effort to develop lessons this way?

Probably. And for a teacher, time-strapped and working for a couple of classes in a live room, that’s a major concern. It’s important to be able to prepare material quickly for the sake of both students and sanity. Online, though, the equation changes dramatically. It’s already possible to read about or watch almost anything via a quick Google search. What is most convenient to produce has been done many times over, and long ago. Each effort scales, too, able to reach an arbitrarily large audience without regard to time. At this point, it is much less useful to create convenient material (videos of lectures, dumps of textbooks, explainer videos, thinkpiece articles, and so forth) than it is to create memorable, effective material that fills less convenient gaps.

I don’t know for sure that a pure testing approach would do that. But I think it’s underexplored, and it has a good shot.

If testing without prior explanations sounds strange to you, consider that video games do it successfully all the time. The puzzle game The Witness winds the player wordlessly through a world of increasingly complex puzzles, hinting step-by-step at the routes to solutions. Dungeon crawlers like Crypt of the NecroDancer and platformers like Super Meat Boy tell you a few basic controls, then toss you repeatedly against ever-more-complex enemies that you figure out on the fly or die trying. In many games that aim to provide more guidance, hints pop up only at the moment a player needs to do something new or has failed at a task, rather than being presented in advance in longer self-contained blocks.


Oh, and here are the answers to the above quiz, for those wondering:

1. Lindsey E. Richland, Nate Kornell, and Liche Sean Kao studied the value of unsuccessful test answers in the 2009 paper ____________.

2. The study “Test-enhanced learning” demonstrates that students who study material and immediately test predict that long-term they will ___________ students who repeatedly study material, but the reverse is true.

3. Richland, Kornell, and Kao demonstrated that students who incorrectly answered pre-test questions outperformed those who attempted to memorize the test questions, indicating that ________ is better than __________.

4. Which researchers wrote the landmark 2006 study “Test-enhanced learning”?

5. According to Richland, Kornell, and Kao, students who ____________ outperform students who see key test items highlighted in a paper before reading.

6. In “Test-enhanced learning”, researchers found that students are more confident in their recall after _________ which is more effective for immediate recall, but __________ allows better long-term retention.

It would hardly be sporting of me to provide an actual key, after all. If you scoured the first six questions wondering where the connections were, congratulations, you’ll probably remember more later.

The above test is far from perfect. For the information contained, I think it’s too short, overreliant on cloze deletion over other testing methods, and a bit clunky overall. That’s okay, though. There are a lot of ways to play around with the concept. My intuition is that on balance, completing even a clunky self-referential quiz about a topic will lead to more long-term learning than simply reading an article or watching a video. I intend to experiment more with the genre, and I would be delighted to see other examples.
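For anyone who wants to play with the format in code, here is a minimal Python sketch of one way such a quiz could be assembled. It is only an illustration under my own assumptions, not a tested teaching tool: the names `cloze` and `build_quiz`, the sentence template, and the choice of which field to hide in each half are all hypothetical. The two facts are the studies already cited in this post.

```python
def cloze(template: str, fact: dict[str, str], hidden: str) -> str:
    """Fill the template with a fact, replacing one field with a blank."""
    shown = {key: ("________" if key == hidden else value)
             for key, value in fact.items()}
    return template.format(**shown)


def build_quiz(template: str, facts: list[dict[str, str]],
               first_hidden: str, second_hidden: str) -> list[str]:
    """Ask about every fact twice, hiding a different field each time.

    The second half runs in reverse order, so the question that reveals
    an answer sits far away from the question that asks for it.
    """
    first_half = [cloze(template, fact, first_hidden) for fact in facts]
    second_half = [cloze(template, fact, second_hidden)
                   for fact in reversed(facts)]
    return first_half + second_half


# Hypothetical template; the facts are the two studies cited in this post.
TEMPLATE = 'In {year}, {authors} published the study "{title}".'
FACTS = [
    {"year": "2006", "authors": "Roediger and Karpicke",
     "title": "Test-enhanced learning"},
    {"year": "2009", "authors": "Richland, Kornell, and Kao",
     "title": "The Pretesting Effect"},
]

quiz = build_quiz(TEMPLATE, FACTS, first_hidden="authors", second_hidden="title")

if __name__ == "__main__":
    for number, question in enumerate(quiz, start=1):
        print(f"{number}. {question}")
```

Every blank in the first half is filled in somewhere in the second half, and vice versa, which is the whole trick. A single template makes for a dull quiz, so a real version would presumably want several phrasings per fact and more than two fields to hide.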

And hey, if it turns out to be useless for actual learning, at least I’ll be providing my grade-school self the test of his dreams.


