The proof grading system: gold, expert, mention, review

TL;DR: Magnetite (magnetite.ai), the Proof Engine, grades every discovered mention of a product into one of five verdicts: gold, expert, mention, review or exclude. The entity gate comes first (is this actually about the product, not a same-named company), engagement never caps a grade, and the rubric below is the one that survived hand-grading hundreds of real units.

Every proof unit that enters our engine leaves with one of five verdicts: gold, expert, mention, review, or exclude. The grade decides everything downstream. Whether we ask the author for rights. Whether the unit is a candidate for sponsorship as a LinkedIn Thought Leader Ad. Where it sits in the client's inventory when budget gets allocated.

I built this rubric by hand-grading hundreds of units during our audits, before any model touched them. The taxonomy below is the one that survived contact with real data. Every example in this post is a real unit from our grading fixtures, paraphrased, with authors reduced to initials. The premise behind all of it is the one I keep returning to: your customers already wrote your best ads. Grading is how you find out which ones.

Before any grade: the entity gate

The first check is not about quality at all. It is about identity, and it is the number-one failure mode in proof discovery.

Companies share names with unrelated businesses far more often than you would guess. "CloudTalk" the sales dialer shares its name with a Nigerian SIP trunking provider; one of our sweeps pulled in a post from the wrong CloudTalk, offering ten business owners a free SIP setup, minutes after it went live. "Walnut" the demo platform collides with real-estate firms. "Surfe" the LinkedIn-to-CRM connector collides with surf shops. And "Storylane" the interactive demo tool collides with Articulate Storyline, the e-learning authoring product, which produces mentions that look plausible right up until you read them carefully.

So before the model considers tier, it must answer one question: is this text actually about the client's product, at the client's domain, doing what the client's product does? We give it the company name, the domain, and a product description, and require an explicit yes or no with evidence. If the answer is no, the unit is excluded regardless of how glowing the text is. In our eval set, passing a wrong-entity unit as client proof is a critical failure, the worst mistake the grader can make. Praise for somebody else's company in your ad account is not a quality problem. It is contamination.

One subtlety we learned to encode: the entity gate is strictly about identity. An employee posting about their own employer's product is genuinely about the client, so it passes the gate. The employee problem is handled next, as an exclusion, not as an identity failure. Keeping those two checks separate made the grader markedly more consistent.
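
As a rough illustration, the gate's contract can be sketched in Python. This is a minimal sketch under stated assumptions, not our production code: the names ClientProfile, EntityGateResult, gate_question and apply_entity_gate are hypothetical, the example domain is a placeholder, and the model call that would turn the question into a result is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientProfile:
    """What the grader is told about the client (hypothetical shape)."""
    name: str
    domain: str
    description: str


@dataclass(frozen=True)
class EntityGateResult:
    """The explicit yes or no, plus the evidence behind it."""
    is_client: bool
    evidence: str


def gate_question(client: ClientProfile) -> str:
    """Build the single identity question the gate has to answer."""
    return (
        f"Is this text about {client.name} ({client.domain}), "
        f"the product that does the following: {client.description}? "
        "Answer yes or no and quote your evidence."
    )


def apply_entity_gate(result: EntityGateResult) -> Optional[str]:
    """Return "exclude" for a wrong-entity unit, or None to continue grading."""
    if not result.evidence.strip():
        # An answer without evidence is rejected rather than trusted.
        raise ValueError("entity gate answer must include evidence")
    return None if result.is_client else "exclude"


client = ClientProfile("CloudTalk", "example.com", "a sales dialer")
wrong = EntityGateResult(False, "Post offers a free SIP setup, not a sales dialer.")
employee = EntityGateResult(True, "Author is posting about their own employer's product.")

print(gate_question(client))
print(apply_entity_gate(wrong))     # exclude
print(apply_entity_gate(employee))  # None
```

Note that the employee post passes here. In this sketch the gate only answers the identity question; whether the author is an insider is a separate check.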

The exclude verdict

After the entity gate, three things disqualify a unit outright:

The author is an employee, founder, affiliate, or agency of the client. An account executive posting "so much cool stuff in the pipeline" from the company offsite is employer voice, not customer proof. Same for the client's own company page. Employees can run Thought Leader Ads, but they belong to a different pipeline and must never be graded as third-party proof.
The praise appears incentivized or paid. We never amplify incentivized praise, full stop. It is an FTC problem, an astroturfing problem, and a trust problem, and the entire value of this category rests on the praise being real.
There is no actual proof content. A post that names the product without endorsing it, an event recap that mentions someone's job title, a request for recommendations. Mentions in the literal sense, but nothing a buyer would read as evidence.
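
The three disqualifiers above can be sketched as a short chain of checks. This is an illustration only: UnitFacts, exclusion_reason and the reason strings are hypothetical names, the unit is assumed to have already passed the entity gate, and the order of the checks is my assumption, since any one of them is enough to exclude.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnitFacts:
    """Facts about a unit that has already passed the entity gate (hypothetical)."""
    author_is_insider: bool   # employee, founder, affiliate, agency, or company page
    incentivized: bool        # praise appears paid or incentivized
    has_proof_content: bool   # something a buyer would read as evidence


def exclusion_reason(unit: UnitFacts) -> Optional[str]:
    """Return the first disqualifier that applies, or None if the unit can be graded."""
    if unit.author_is_insider:
        return "insider_author"
    if unit.incentivized:
        return "incentivized"
    if not unit.has_proof_content:
        return "no_proof_content"
    return None


# The account executive posting from the company offsite.
offsite_post = UnitFacts(author_is_insider=True, incentivized=False, has_proof_content=False)
# A practitioner describing their own workflow, unprompted.
workflow_post = UnitFacts(author_is_insider=False, incentivized=False, has_proof_content=True)

print(exclusion_reason(offsite_post))   # insider_author
print(exclusion_reason(workflow_post))  # None
```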

When we audited 20 B2B ad accounts and later ran engine sweeps across 18 products, roughly two-thirds of raw findings were excluded for one of these reasons. That ratio is not waste. The filter is the product.

Gold: first-person use, in the wild

Gold is a first-person account from a genuine user: they name the product, and they describe their actual workflow or outcome with specifics. The archetype is a practitioner walking through exactly how the product fits their process.

The fixture I always reach for: K.F., a business development coach who sells to recruiting-firm owners, published a Sunday workflow post for her audience. Step four of her post-call automation: her meeting note-taker drops transcripts into Google Drive automatically via Zapier, and her AI assistant picks them up from there. Nobody asked her to write it. The product is load-bearing in a process she is teaching to exactly the people the client sells to. Sixteen reactions.

That post is worth more as advertising raw material than almost anything a copywriter could produce, because it is not copy. It is evidence.

Expert: authority without the workflow

Expert is an authority figure in the buyer's space endorsing or recommending the product, with less personal-workflow detail than gold. Recognised practitioners, analysts, community voices.

Example: A.L., a founding team member of a startup acquired by Adobe and now a principal PM there, published a roundup of five brands setting the course for B2B social in 2026. One of our fixture clients made the list for its creator strategy: finding creators already speaking to the ICP on TikTok, testing them on LinkedIn, and hiring the ones who performed. High credibility, real specificity about why the company is worth watching, but no personal usage story.

The boundary rule between gold and expert took us a while to get right: if the author describes their own use of the product (their workflow, their results, their team), we grade gold, even when the author is also a famous authority. Expert is reserved for endorsements without first-person usage detail. Authority does not demote lived experience; it compounds it.

Mention: the listicle tier

Mention covers inclusion in listicles, tool stacks and roundups, plus passing references. Low specificity by definition.

Example: M.R., a Top Voice listicle account, posted "80+ AI tools to finish hours of work in minutes." Our client appeared as one of five tools in the meetings category. The post pulled over two hundred reactions and ninety-five comments, yet it tells a buyer almost nothing.

Mentions are not worthless. In volume they are a visibility signal, and a category-winner listicle with an explicit no-affiliate disclaimer sits at the top of the tier. But the tier also has a floor worth respecting: one fixture listicle named the client and then recommended a free alternative in the same line. A mention can cut against you. Nothing in this tier should ever be graded above it just because it went viral.

Review: real, but solicited-context

Review is structured review-site content: G2, Capterra, OMR, app stores. Example: a verified G2 reviewer of one fixture client praised how reliably it auto-joined every one of their meetings, including meetings owned by other people, something competing tools had failed to do consistently. Genuine, specific, useful.

So why is it not gold? Because the source context caps it. Review platforms exist to solicit feedback, reviewers are often nudged by vendors or the platform itself, and the author identity usually cannot be independently promoted as a person-voice ad. A detailed first-person workflow on a review site is still review. That is not a judgment about the reviewer's honesty. It is a judgment about what a sceptical buyer discounts, and sceptical buyers discount solicited contexts.
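
Taken together, the four tier definitions reduce to a small decision order. The sketch below is a simplification under stated assumptions: assign_tier, REVIEW_SOURCES and the three inputs are hypothetical names, the unit is assumed to have already survived the entity gate and the exclusion checks, and in practice each input is a judgment call rather than a clean boolean.

```python
# Source labels are illustrative; the real list of review platforms may differ.
REVIEW_SOURCES = {"g2", "capterra", "omr", "app_store"}


def assign_tier(source: str, first_person_use: bool, author_is_authority: bool) -> str:
    """Pick a tier for a unit that was not excluded."""
    if source in REVIEW_SOURCES:
        # Source context caps the tier, however detailed the review is.
        return "review"
    if first_person_use:
        # Lived experience wins, even when the author is also an authority.
        return "gold"
    if author_is_authority:
        return "expert"
    return "mention"


print(assign_tier("linkedin", first_person_use=True, author_is_authority=True))    # gold
print(assign_tier("g2", first_person_use=True, author_is_authority=False))         # review
print(assign_tier("linkedin", first_person_use=False, author_is_authority=True))   # expert
print(assign_tier("linkedin", first_person_use=False, author_is_authority=False))  # mention
```

Reaction counts are deliberately not an input to this function.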

Underneath the tier: five sub-scores

Each graded unit also carries five 0-100 scores, because two gold units are not equal:

Authenticity. Does it read as unprompted, human, specific, versus marketing-speak that happens to be on a personal profile?
Specificity. Concrete details of use and outcome, with extra weight for specificity of pain. "We were two months from shutting our doors" beats "it was great" in every test we have run. Vague praise is interchangeable; specific pain is unforgeable.
Credibility. The author's standing relative to the claim: role, audience, expertise.
ICP fit. How relevant this proof is to the client's actual buying decision.
Persona match. Distinct from ICP fit, and the most underrated of the five: does the author resemble the buyer? Would a prospect reading the post think "this person is me"? A famous analyst can be maximally credible and high ICP fit while scoring low here, because no prospect sees themselves in an analyst.

On top of the scores we extract quantified outcomes verbatim when they exist, flag third-party verification markers like badges and awards, and tag media as raw or produced. Numbers beat adjectives, third-party verification compounds person-proof, and raw phone-shot media beats designed assets. Recency matters too: stale praise reads as "nothing good has happened since."
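
One way to picture the five sub-scores is as a small validated record. This is a sketch only: the SubScores class and its field names are hypothetical, and the numbers in the example are invented to illustrate the analyst case, not taken from a real unit.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SubScores:
    """The five 0-100 scores carried by every graded unit (hypothetical shape)."""
    authenticity: int
    specificity: int
    credibility: int
    icp_fit: int
    persona_match: int

    def __post_init__(self) -> None:
        # Reject anything outside the 0-100 range at construction time.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 100:
                raise ValueError(f"{f.name} must be between 0 and 100, got {value}")


# Illustrative numbers: a famous analyst can be credible and on-ICP
# while still scoring low on persona match.
famous_analyst = SubScores(
    authenticity=80,
    specificity=60,
    credibility=95,
    icp_fit=90,
    persona_match=25,
)
print(famous_analyst)
```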

The rule that surprised people: engagement does not cap the tier

The most counterintuitive rule in the rubric, and the one we enforce hardest: low engagement must not penalize the tier.

K.F.'s gold workflow post had sixteen reactions. Another fixture, a German consultant recommending a sales-intelligence platform to his DACH audience in their own language, had exactly one reaction and graded gold. Meanwhile the 80-tools listicle had two hundred plus. If engagement set the tier, the grader would systematically promote engagement-bait and bury the practitioner posts that are the entire point of the system.

Quiet posts from the right people outperform loud posts from the wrong ones, because sponsorship supplies the reach and the post supplies the credibility. Reach is exactly the thing an ad budget can buy; credibility is the thing it cannot. We call this the quiet-gold phenomenon, and there is a full deep-dive on it coming soon on this blog.

From grades to an ad-ready inventory

The grade is not a label for a report. It is a ranking function. Tier sets the base, recency decays it, and numbers, persona match and raw media push units up the queue. What comes out is a ranked, ad-ready inventory: at the top, fresh first-person workflow posts from people who look like the buyer; underneath, expert endorsements and strong review quotes; at the bottom, mentions held in reserve.
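
To make "ranking function" concrete, here is a minimal sketch of the shape described above. Everything numeric in it is an assumption made for illustration: the base values in TIER_BASE, the 180-day half-life and the boost weights are invented, and GradedUnit, rank_score and rank_inventory are hypothetical names. It also leaves out the correction from ad results.

```python
from dataclasses import dataclass
from typing import List

# Assumed base values per tier; only their order reflects the rubric.
TIER_BASE = {"gold": 100.0, "expert": 70.0, "review": 60.0, "mention": 20.0}
HALF_LIFE_DAYS = 180.0  # assumed: the score halves every 180 days


@dataclass(frozen=True)
class GradedUnit:
    unit_id: str
    tier: str
    age_days: float
    persona_match: int            # 0-100
    has_quantified_outcome: bool  # a number was extracted verbatim
    raw_media: bool               # raw rather than produced media


def rank_score(unit: GradedUnit) -> float:
    """Tier sets the base, recency decays it, and the boosts push it up."""
    decayed = TIER_BASE[unit.tier] * 0.5 ** (unit.age_days / HALF_LIFE_DAYS)
    boost = 1.0
    boost += 0.2 if unit.has_quantified_outcome else 0.0
    boost += 0.2 * unit.persona_match / 100
    boost += 0.1 if unit.raw_media else 0.0
    return decayed * boost


def rank_inventory(units: List[GradedUnit]) -> List[GradedUnit]:
    """Return units ordered from highest to lowest score."""
    return sorted(units, key=rank_score, reverse=True)


inventory = [
    GradedUnit("mention-1", "mention", 7, 50, False, False),
    GradedUnit("expert-1", "expert", 30, 20, False, False),
    GradedUnit("gold-1", "gold", 7, 90, True, True),
    GradedUnit("review-1", "review", 30, 60, True, False),
]
for unit in rank_inventory(inventory):
    print(unit.unit_id, round(rank_score(unit), 1))
```

With these made-up weights the fresh gold unit lands first and the mention last, which is the ordering the inventory is meant to produce.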

That ranked list drives everything operational. Rights outreach starts at the top, because clearing rights costs the author's attention and ours. Sponsorship budget follows cleared units in rank order. And as ad results come back per unit, the ranking gets corrected by reality. The inventory is the asset; the grades are what make it legible.

See your own inventory

If you want to know what this looks like for your product, the free Proof Audit runs this exact pipeline against your company: discovery sweep, entity gate, full grading, and a ranked inventory you keep whether or not we ever work together. It usually surfaces gold units the team has never seen, and it takes us, not you, a few days. Book the free Proof Audit here.