
Introducing Adevals: The Open Benchmark for Marketing AI

At Pencil, our mission has always been clear: to build AI that serves as a powerful tool for creative people, not a replacement for them. But as the industry moves from AI experimentation to AI integration, we’ve noticed a growing problem.

95% of AI pilots never reach production. This isn't because the technology doesn't work; it’s because it is incredibly difficult to prove how it works in a repeatable, professional context. Most AI vendors showcase highly curated "hero" demos where everything is perfect. They don’t show the failures, the hallucinations, or the human hours required to get to that "one-shot" miracle. Today, we are launching Adevals: a systematic, transparent, and public evaluation of AI capabilities across 100 real-world marketing use cases.

Why Adevals Matters

CMOs are under immense pressure to justify AI investments. Creatives are rightfully skeptical of "magic button" promises. The industry needs a source of truth—an honest, use-case-level evidence base for decision-making.

Adevals provides that evidence. It is a live laboratory where we test everything from video scriptwriting and audience strategy to technical measurement reporting. We aren’t just showing the highlights; we are publishing the failures too. Because if you cannot see the failure cases, you cannot trust the success cases.

The Methodology: Beyond the Prompt

Every evaluation in the Adevals library follows a strict, transparent protocol:

  • The Top Use Cases: We’ve identified 100 of the most critical tasks in the modern marketing lifecycle.
  • The Execution: The exact prompts, workflows, and AI models used.
  • The Output: The raw, un-airbrushed result.
  • The Verdict: A clear Pass/Fail based on technical specs and creative quality.
  • Human Investment: We track the human time required and the number of generations needed to reach the result.

We believe "greatness" comes from the synergy of model performance and human skill. Adevals documents that journey, showing exactly what is required to move from a prompt to a production-ready asset.
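
The protocol above can be pictured as one structured record per evaluation. The following is a minimal sketch in Python, not Pencil's actual schema: the `EvalRecord` class, its field names, and the `summarize` helper are hypothetical, chosen only to mirror the items in the list, and the sample values are placeholders rather than real Adevals results.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"


@dataclass
class EvalRecord:
    # Hypothetical field names that mirror the protocol items above.
    use_case: str          # the marketing task being tested
    prompts: List[str]     # the exact prompts used
    models: List[str]      # the AI models used
    output_ref: str        # pointer to the raw, unedited output
    verdict: Verdict       # Pass/Fail
    human_minutes: float   # human time required
    generations: int       # generations needed to reach the result


def summarize(records: List[EvalRecord]) -> Dict[str, float]:
    """Aggregate pass rate and human investment, failures included."""
    if not records:
        return {"total": 0, "pass_rate": 0.0,
                "avg_human_minutes": 0.0, "avg_generations": 0.0}
    n = len(records)
    passed = sum(1 for r in records if r.verdict is Verdict.PASS)
    return {
        "total": n,
        "pass_rate": passed / n,
        "avg_human_minutes": sum(r.human_minutes for r in records) / n,
        "avg_generations": sum(r.generations for r in records) / n,
    }


# Placeholder data for illustration only; not real Adevals results.
records = [
    EvalRecord("video scriptwriting", ["<prompt text>"], ["<model name>"],
               "outputs/script.txt", Verdict.PASS, 30.0, 4),
    EvalRecord("technical measurement reporting", ["<prompt text>"],
               ["<model name>"], "outputs/report.txt", Verdict.FAIL, 90.0, 12),
]
summary = summarize(records)
print(summary)
```

Keeping failed evaluations in the same list as the passes means the pass rate and the human-investment averages reflect every attempt, not only the successful ones.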

Open Sourcing "Plaque Slayer"

To test these use cases properly, we needed a brand that pushes the envelope. Traditional corporate brands can be safe and, frankly, a bit boring for AI testing. Enter Plaque Slayer.

Plaque Slayer is our fictional, "rabid" oral care brand. It’s bold, irreverent, and unapologetically aggressive. Its brand voice is a battle cry against "party funk" and "plaque armies." We chose this direction because it gives us, and you, the creative permission to see if AI can handle raw, visceral energy and unconventional storytelling.

We are open-sourcing the Plaque Slayer brand assets. We are making the brand principles, tone of voice, and visual identity available to the community to use as a standardized "sandbox" for AI testing.

A Call to the Creative Community

We don’t want Adevals to be a closed shop. We want it to be an industry-wide benchmark. If you are a creative, a prompt engineer, or an AI enthusiast, we invite you to contribute your own evals to the library. To support this:


  1. 500 Free Generations: We are giving 500 generations within the Pencil platform to anyone who wants to contribute a verified eval using the Plaque Slayer brand.
  2. Public Credit: Every contributor will be listed as the author of their eval, creating a public-facing gallery of their skill in "working with" the AI.

An Offer to Enterprise Advertisers

For brands looking to move beyond experimentation, we are opening a dedicated path for enterprise testing. Advertisers can now submit their own specific use cases to our team. In return, we provide a comprehensive Adevals Readiness Report. This report outlines:

  • Efficiency Gains: A data-backed projection of the time and cost savings available for your specific workflows.
  • Implementation Roadmap: A detailed breakdown of the technology and people investments required to move these use cases from pilot to production.
  • Benchmarking: How your specific needs compare to the top 100 use cases currently being tracked.

Looking Ahead

At Pencil, our goal is to surprise you with what creativity can do when it is unburdened by mundane production tasks. Adevals is not a static project; it is a living map of the industry’s progress.

We will be extending Adevals over time as models and techniques—from foundational LLMs to custom workflows—improve. As the "AI magic" becomes more powerful, the human skill required to direct it becomes even more vital. By documenting this evolution, we want to make the achievement of brilliant, high-performing creative advertising possible for everyone.

The "90s-style," non-creative way of visualizing AI data is over. We are building a data-driven experience that celebrates creative work and stands up to the scrutiny of the best in the business.

Will Hanschell, Co-founder, Pencil