The Proof Ledger

We audited 20 B2B ad accounts. Here is what we found


Louis Newman

Founder, Magnetite · 11 June 2026 · 7 min read

TL;DR: Across 20 B2B ad account audits by Magnetite (magnetite.ai), the Proof Engine, every product company with a real user base had genuine, unused third-party praise, and a 113-product automated discovery sweep found genuine praise for 109 of the products. The oldest unchanged ad we found had been running for 10 months, and one full discovery sweep costs about five cents. The aggregate numbers, the exclusion taxonomy and the methodology are below.

Over the past ten months I audited 20 B2B ad accounts and built deep proof inventories for 9 of those companies. This post is the aggregate of what we found. Every number in it comes from that work or from our engine's live metering, and where a finding is anonymized I say so.

The aggregate, in one table

| Metric | Value | Provenance |
| --- | --- | --- |
| B2B ad accounts audited | 20 | Manual audit work, Aug 2025 to Jun 2026 |
| Deep proof inventories built | 9 | Subset of the audited companies |
| Product companies with unused praise | 100% | Every audited company with a real user base |
| Oldest unchanged ad found | 10 months | Same creative, same copy, same audience |
| Products swept by the engine | 109 of 113 had genuine praise | Live engine metering, June 2026; full dataset in the Proof Index |
| Cost of one full discovery sweep | about $0.05 | Live engine metering, June 2026 |

Sources and the monthly-updated industry benchmarks sit on the benchmarks page.

The headline finding

Every product company with a real user base had unused praise. Not most. All of them.

By unused praise I mean a specific thing: a public post, review or mention from a genuine third party that praised the product, that the company never amplified, never cleared rights on, and in most cases never saw. Not testimonials they collected. Not case studies they wrote. Praise that already existed in the wild, written by people their buyers actually believe.

The freshest example: during one audit of a sales-intelligence platform selling into the DACH market, we found unprompted praise in German from their exact ICP, posted the day before our audit. One reaction. Their only live ad had been running unchanged since August 2025.

The state of the ad accounts

The accounts themselves told a consistent story.

The oldest unchanged ad we found had been running for 10 months. The same creative, the same copy, the same audience, while the account quietly paid more per click every month as the audience tuned it out.

A proposal-software company had a founder-praising workflow post hit 268 reactions in a single day, organic, free, written by someone their buyers trusted. Their ad account at the time was running one pain line, rewritten six ways. Six variations of the same sentence is not a creative strategy. It is what supply exhaustion looks like in an interface.

These are not lazy teams. They are teams doing what the playbook says: write copy, test variants, refresh when performance decays. The playbook just ignores the largest supply of credible creative in their market, which is other people's posts.

What the engine sees

The manual audits had a ceiling: my time. So we built the discovery engine and pointed it at the problem.

In June 2026 we swept 113 B2B SaaS products through the engine: audit clients, famous benchmarks (Figma, Linear, HubSpot, Gong, Vercel) and a mid-market cohort that matches our own customer profile. The result: 109 of 113 had genuine third-party praise on LinkedIn within the last month. Four products had every discovered mention excluded by the grading gates, which we report rather than hide. The aggregate is published as an open dataset in the Proof Index. The full cost of discovery for one product was about five cents.
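As a quick check on the sweep figures above, the arithmetic can be written out. This is only a worked calculation on the two published counts (113 swept, 109 with genuine praise); the variable names are illustrative.

```python
swept = 113          # products swept in June 2026
with_praise = 109    # products with genuine third-party praise

# Products where every discovered mention was excluded by the grading gates.
fully_excluded = swept - with_praise

praise_rate = with_praise / swept
excluded_rate = fully_excluded / swept

print(f"{fully_excluded} products fully excluded")
print(f"{praise_rate:.1%} with genuine praise")    # 96.5%
print(f"{excluded_rate:.1%} fully excluded")       # 3.5%
```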

Two of the public benchmark sweeps are worth quoting because the products are famous enough to check our work:

| Product | Posts found in one sweep | Graded as genuine praise | Gold grade |
| --- | --- | --- | --- |
| Notion | 74 | 32 | 11 |
| Loom | 74 | 45 | 24 |

Everything in the Notion row was posted within days of the sweep. Loom is the richest inventory we have measured so far.

We published the Notion sample audit with linked sources so you can verify the units yourself.

What gets thrown away matters as much as what gets kept

A discovery sweep is noisy by design, and most of what it finds is not usable proof. Across the June sweeps the engine excluded roughly two thirds of raw findings. The reasons are instructive:

  • Wrong entity. "CloudTalk" the sales dialer shares a name with an unrelated SIP provider. "Storylane" collides with Articulate Storyline. A meeting tool collided with the acronym TL;DR. If you are not checking entity identity, your proof inventory is contaminated.
  • Employees and partners. An employee praising their own company is allowed on LinkedIn, but it is not customer proof, and grading it as such would be dishonest. Integration partners announcing a feature are marketing, not praise.
  • No proof content. Mentions in passing, requests for recommendations, posts that name the product without endorsing it.

The same reasons, with the real collisions that taught us each one:

| Exclusion reason | What it looks like in the wild | Real examples from our sweeps |
| --- | --- | --- |
| Wrong entity | The name matches, the company does not | "CloudTalk" the sales dialer versus an unrelated SIP provider; "Walnut" the demo platform versus real estate firms; "Surfe" the CRM connector versus surf shops; "Storylane" versus Articulate Storyline |
| Employee or partner | Reads like customer love, is actually marketing | Employees praising their employer; integration partners announcing a feature |
| No proof content | The product is named but not endorsed | Tool-stack listicles with no experience attached; requests for recommendations |

Exclusion is not failure. It is the work. The value of a proof inventory is exactly the credibility of what survives the filter.
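The three exclusion reasons can be read as an ordered set of checks. The sketch below is illustrative only and is not the engine's code: the `Mention` fields and the check order are assumptions, and the hard part (actually resolving entity identity, author relationship and endorsement) is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    # All three fields are hypothetical signals produced by earlier steps.
    refers_to_target_company: bool   # False for name collisions
    author_relation: str             # "customer", "employee", "partner" or "unknown"
    endorses_product: bool           # False for passing mentions and recommendation requests


def exclusion_reason(m: Mention) -> Optional[str]:
    """Return the first exclusion reason that applies, or None to keep the mention."""
    if not m.refers_to_target_company:
        return "wrong entity"
    if m.author_relation in {"employee", "partner"}:
        return "employee or partner"
    if not m.endorses_product:
        return "no proof content"
    return None


# A same-named company is excluded before anything else is considered.
print(exclusion_reason(Mention(False, "customer", True)))
# A genuine customer endorsement survives the filter.
print(exclusion_reason(Mention(True, "customer", True)))
```

Checking entity identity first is a design choice in this sketch: there is little point asking who wrote a post, or whether it endorses anything, if it is about a different company.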

The taxonomy that emerged

Grading hundreds of units produced a stable taxonomy, which we wrote up in detail in the proof grading system. The short version:

  • Gold: a first-person account from a genuine user describing their actual workflow and outcome.
  • Expert: an authority figure in the buyer's space endorsing the product with less personal detail.
  • Mention: listicle and tool-stack inclusions. Low specificity, still useful in volume.
  • Review: structured review-site content. Credible, but written in a solicited context.

One pattern surprised us enough to become a grading rule: the best units are often the quietest. The DACH find had one reaction. A gold-grade workflow post from a practitioner with exactly the right audience routinely outperforms a 700-reaction listicle on credibility, because the listicle's reach is broad and shallow while the practitioner's reach is narrow and exactly your buyer. We made it a hard rule that low engagement must not cap a unit's grade.
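One way to make the "low engagement must not cap a unit's grade" rule concrete is a grader that never reads the engagement count. This is a minimal sketch under stated assumptions: the `Unit` fields and the order of the checks are hypothetical simplifications, not the rubric from the proof grading system.

```python
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    GOLD = "gold"
    EXPERT = "expert"
    MENTION = "mention"
    REVIEW = "review"


@dataclass
class Unit:
    # Hypothetical content signals; how they are detected is not shown.
    first_person_workflow: bool   # genuine user describing workflow and outcome
    author_is_authority: bool     # authority figure in the buyer's space
    from_review_site: bool        # structured review-site content
    reactions: int                # engagement: stored, but never used for grading


def grade_unit(unit: Unit) -> Grade:
    # `unit.reactions` is deliberately absent from every branch, so a
    # one-reaction post can still grade as gold.
    if unit.from_review_site:
        return Grade.REVIEW
    if unit.first_person_workflow:
        return Grade.GOLD
    if unit.author_is_authority:
        return Grade.EXPERT
    return Grade.MENTION


quiet_practitioner = Unit(True, False, False, reactions=1)
loud_listicle = Unit(False, False, False, reactions=700)
print(grade_unit(quiet_practitioner).value)
print(grade_unit(loud_listicle).value)
```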

How the audits were run

The manual audits followed the same shape every time. We looked at what the company was actually running (formats, how long each creative had been live, how many distinct voices appeared in the account), then searched the public web for genuine third-party praise the company had never used: LinkedIn first, then X, Reddit, review sites and podcasts. Each finding was logged with its source link, author type and date, and graded by hand against the rubric that later became the proof grading system.

The engine sweeps automated the discovery half of that work in June 2026. One sweep unions multiple discovery connectors, dedupes by URL and author, runs every candidate through the entity gate, and grades the survivors. The metering is part of the system: every sweep logs its own cost, which is how we can say a full sweep costs about five cents rather than estimating it.
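The shape of one sweep can be sketched roughly as follows. This is an illustration, not Magnetite's engine: the `Candidate` fields, the connector signature, the `entity_gate` and `grade` callables and the per-call costs are all hypothetical, and "dedupes by URL and author" is interpreted here as deduping on the (URL, author) pair.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    url: str
    author: str
    text: str


@dataclass
class SweepResult:
    graded: list        # (Candidate, grade) pairs that passed the entity gate
    excluded: int       # unique candidates rejected by the entity gate
    cost_usd: float     # metered cost of this sweep


# Hypothetical connector shape: returns its findings plus what the call cost.
Connector = Callable[[str], tuple[list[Candidate], float]]


def run_sweep(
    product: str,
    connectors: Iterable[Connector],
    entity_gate: Callable[[str, Candidate], bool],
    grade: Callable[[Candidate], str],
) -> SweepResult:
    cost = 0.0
    seen: set[tuple[str, str]] = set()
    unique: list[Candidate] = []

    # 1. Union every connector's findings, metering each call as it happens.
    for connector in connectors:
        found, call_cost = connector(product)
        cost += call_cost
        # 2. Dedupe on (URL, author), keeping the first occurrence.
        for c in found:
            key = (c.url.rstrip("/"), c.author.strip().lower())
            if key not in seen:
                seen.add(key)
                unique.append(c)

    # 3. Run every unique candidate through the entity gate.
    survivors = [c for c in unique if entity_gate(product, c)]
    # 4. Grade only the survivors.
    graded = [(c, grade(c)) for c in survivors]
    return SweepResult(graded, len(unique) - len(survivors), round(cost, 4))
```

Because the cost is accumulated inside the sweep itself, the figure reported at the end is measured rather than estimated, which mirrors the metering point made above.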

Two honest limits of the data. First, 20 accounts is a meaningful sample of mid-market B2B SaaS, not a census; we publish the aggregate because the pattern has been uniform, and we will keep updating the numbers as the count grows. Second, the audits skew toward companies that talk to us, which likely means teams that already suspect their advertising has a problem. If anything, that biases the unused-praise finding downward: companies that never think about advertising probably sit on more unused proof, not less.

What this means if you run B2B ads

Three takeaways from the aggregate:

  1. Your proof inventory almost certainly exists. Across 20 audits and 113 engine sweeps, fewer than 4% of products (4 of the 113 swept) came back without a single genuine-praise unit on the first pass. The question is rarely whether yours exists; it is how big it is and how fresh.

  2. Your creative supply problem is artificial. The accounts running one pain line six ways were sitting next to organic praise they had never amplified. The supply is there. What is missing is the pipeline: discovery, entity-checked grading, rights clearance, rotation. I wrote more about this in why your customers already wrote your best ads.

  3. The economics of looking are trivial. Five cents of discovery per product, a few cents of grading. The expensive part was never finding the proof. It is the operational discipline of clearing rights and running it properly, which is exactly the part a tool cannot do alone and an agency will not do continuously.

See your own numbers

The free Proof Audit applies this exact process to your product: a discovery sweep, a graded inventory, and a side-by-side with the ads you are running now, priced against your own CPC baseline rather than industry averages. The inventory is yours to keep either way. Book the audit here.
