At Pencil, our mission has always been clear: to build AI that serves as a powerful tool for creative people, not a replacement for them. But as the industry moves from AI experimentation to AI integration, we’ve noticed a growing problem.
Ninety-five percent of AI pilots never reach production. This isn't because the technology doesn't work; it's because it is incredibly difficult to prove that it works in a repeatable, professional context. Most AI vendors showcase highly curated "hero" demos where everything is perfect. They don't show the failures, the hallucinations, or the human hours required to get to that "one-shot" miracle.
Today, we are launching Adevals: a systematic, transparent, and public evaluation of AI capabilities across 100 real-world marketing use cases.
CMOs are under immense pressure to justify AI investments. Creatives are rightfully skeptical of "magic button" promises. The industry needs a source of truth: an honest, use-case-level evidence base for decision-making.
Adevals provides that evidence. It is a live laboratory where we test everything from video scriptwriting and audience strategy to technical measurement reporting. We aren't just showing the highlights; we are publishing the failures too. Because if you cannot see the failure cases, you cannot trust the success cases.
The Methodology: Beyond the Prompt
Every evaluation in the Adevals library follows a strict, transparent protocol:
We believe "greatness" comes from the synergy of model performance and human skill. Adevals documents that journey, showing exactly what is required to move from a prompt to a production-ready asset.
To test these use cases properly, we needed a brand that pushes the envelope. Traditional corporate brands can be safe and, frankly, a bit boring for AI testing. Enter Plaque Slayer. Plaque Slayer is our fictional, "rabid" oral care brand. It's bold, irreverent, and unapologetically aggressive. Its brand voice is a battle cry against "party funk" and "plaque armies." We chose this direction because it gives us, and you, the creative permission to see whether AI can handle raw, visceral energy and unconventional storytelling.
We are open-sourcing the Plaque Slayer brand assets: the brand principles, tone of voice, and visual identity are available to the community as a standardized "sandbox" for AI testing.
We don’t want Adevals to be a closed shop. We want it to be an industry-wide benchmark. If you are a creative, a prompt engineer, or an AI enthusiast, we invite you to contribute your own evals to the library. To support this:
For brands looking to move beyond experimentation, we are opening a dedicated path for enterprise testing. Advertisers can now submit their own specific use cases to our team. In return, we provide a comprehensive Adevals Readiness Report. This report outlines:
At Pencil, our goal is to surprise you with what creativity can do when it is unburdened by mundane production tasks. Adevals is not a static project; it is a living map of the industry’s progress.
We will extend Adevals over time as models and techniques improve, from foundational LLMs to custom workflows. As the "AI magic" becomes more powerful, the human skill required to direct it becomes even more vital. By documenting this evolution, we want to make brilliant, high-performing creative advertising achievable for everyone.
The "90s-style," non-creative way of visualizing AI data is over. We are building a data-driven experience that celebrates creative work and stands up to the scrutiny of the best in the business.
Will Hanschell
Co-founder, Pencil