Five Quality Gates That Score Content for AI Extractability Before Deploy
Five automated gates between written and deployed. If any gate fails, the piece gets blocked with a specific reason.

We publish 12+ articles per day across four domains. Every piece needs to pass structural and citation checks before it goes live. Doing this manually doesn't scale. Doing it after publish means broken content sits in production for hours.
The solution is a publish pipeline — a series of automated gates that run between "content is written" and "content is deployed." If any gate fails, the piece gets blocked with a specific failure reason. No human has to review every article. The pipeline catches what humans miss.
The problem with post-publish quality checks
Most content teams check quality after publishing. Someone reviews the live page, spots a missing citation, fixes it, redeploys. The problem with this approach at scale:
- The broken window is open. Between publish and fix, the page is live with errors. AI engines may crawl it during that window. First impressions in the index are hard to undo.
- The review bottleneck. One person reviewing 12+ articles per day will miss things. The error rate compounds with volume.
- No mechanical enforcement. Guidelines exist in documents. Compliance depends on the writer remembering them. At scale, memory is not a reliable enforcement mechanism.
Pre-publish gates solve all three problems by making quality mechanical rather than aspirational.
The pipeline architecture
The publish pipeline runs as a sequential chain of gates. Each gate receives the content markdown and returns either PASS or FAIL with a specific reason. If any gate returns FAIL, the pipeline halts and surfaces the failure.
```text
      Content Markdown
              │
              ▼
   ┌─────────────────────┐
   │ Gate 1: Schema      │ → validates frontmatter fields
   └──────────┬──────────┘
              │ PASS
              ▼
   ┌─────────────────────┐
   │ Gate 2: Structure   │ → checks heading hierarchy, sections, word count
   └──────────┬──────────┘
              │ PASS
              ▼
   ┌─────────────────────┐
   │ Gate 3: Citations   │ → verifies minimum citation count, URL validity
   └──────────┬──────────┘
              │ PASS
              ▼
   ┌─────────────────────┐
   │ Gate 4: Extract     │ → tests AI extractability (citable blocks, tables)
   └──────────┬──────────┘
              │ PASS
              ▼
   ┌─────────────────────┐
   │ Gate 5: Dedup       │ → checks for overlap with existing content
   └──────────┬──────────┘
              │ PASS
              ▼
       Deploy to repo
```
Five gates. Each is independent and testable. The order matters — cheap checks run first, expensive checks run last.
Gate 1: schema validation
The cheapest gate. Parses the markdown frontmatter and validates required fields exist with correct types.
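```javascript
// SCHEMA_RULES (defined elsewhere) maps each content type to its required
// fields and their rules ({ maxLength, pattern }).
function validateSchema(frontmatter, contentType) {
  const required = SCHEMA_RULES[contentType];
  if (!required) {
    return { pass: false, failures: [`Unknown content type: ${contentType}`] };
  }
  const failures = [];

  for (const [field, rule] of Object.entries(required)) {
    const value = frontmatter[field];

    // Absent or empty fields fail immediately; skip the remaining checks.
    if (!value) {
      failures.push(`Missing required field: ${field}`);
      continue;
    }

    // Length limits, e.g. platform caps on descriptions.
    if (rule.maxLength && value.length > rule.maxLength) {
      failures.push(`${field} exceeds ${rule.maxLength} chars (got ${value.length})`);
    }

    // Format checks, e.g. dates.
    if (rule.pattern && !rule.pattern.test(value)) {
      failures.push(`${field} doesn't match expected format`);
    }
  }

  return failures.length === 0
    ? { pass: true }
    : { pass: false, failures };
}
```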
This catches: missing titles, missing descriptions, descriptions exceeding platform limits, malformed dates, missing content type tags. Roughly 8% of content fails this gate on first attempt, usually from description length overflows.
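validateSchema reads its rules from `SCHEMA_RULES`, which is not shown. It also expects the frontmatter to arrive already parsed into an object. The sketch below covers both, assuming flat `key: value` frontmatter. The rule values, the `date` and `contentType` field names, and the `parseFrontmatter` helper are all hypothetical, and a production pipeline would more likely use a real YAML parser.

```javascript
// Hypothetical rules table: field names, limits, and patterns are illustrative.
const SCHEMA_RULES = {
  blog: {
    title: {},                                   // presence check only
    description: { maxLength: 160 },
    date: { pattern: /^\d{4}-\d{2}-\d{2}$/ },
    contentType: { pattern: /^(blog|curated|research)$/ }
  }
};

// Minimal frontmatter splitter: handles flat `key: value` lines only,
// not full YAML (nested keys, lists, multi-line strings).
function parseFrontmatter(markdown) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?/);
  if (!match) return { frontmatter: {}, body: markdown };

  const frontmatter = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    // Strip optional matching quotes around the value.
    const value = line.slice(idx + 1).trim().replace(/^(["'])(.*)\1$/, '$2');
    frontmatter[key] = value;
  }
  return { frontmatter, body: markdown.slice(match[0].length) };
}

const sample = '---\ntitle: "Five Quality Gates"\ndate: 2025-01-15\n---\n# Five Quality Gates\n';
const parsed = parseFrontmatter(sample);
console.log(parsed.frontmatter.title); // Five Quality Gates
```

Passing `parsed.frontmatter` and `'blog'` to validateSchema with these rules would report `description` and `contentType` as missing.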
Gate 2: structural validation
Parses the markdown into an AST and validates the document structure against rules that affect AI extractability.
Checks include:
- Heading hierarchy: Single H1, logical H2/H3 nesting, no skipped levels
- Section count: Minimum 4 H2 sections for long-form content
- Word count: Within range for content type (blog: 3,500-5,000; curated: 900-1,600)
- Paragraph length: No paragraph exceeds 150 words (long paragraphs produce bad embedding chunks)
- List/table presence: At least one structured element for content containing comparison data
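```javascript
// `ast` is an mdast tree (the format remark produces); headings are top-level
// nodes with a `depth` of 1 to 6. getHeadingText() is a helper not shown here.
// The paragraph-length and word-count rules are not implemented in this function.
function validateStructure(ast) {
  const headings = ast.children.filter(n => n.type === 'heading');
  const h1s = headings.filter(h => h.depth === 1);
  const h2s = headings.filter(h => h.depth === 2);
  const failures = [];

  if (h1s.length !== 1) failures.push(`Expected 1 H1, found ${h1s.length}`);
  if (h2s.length < 4) failures.push(`Expected ≥4 H2 sections, found ${h2s.length}`);

  // Check for skipped heading levels (e.g. H2 followed directly by H4).
  // Moving back up (H4 to H2) is allowed.
  for (let i = 1; i < headings.length; i++) {
    if (headings[i].depth - headings[i - 1].depth > 1) {
      failures.push(`Skipped heading level at "${getHeadingText(headings[i])}"`);
    }
  }

  return failures.length === 0
    ? { pass: true }
    : { pass: false, failures };
}
```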
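validateStructure calls `getHeadingText`, which is not shown. Assume mdast nodes: leaf nodes such as `text` and `inlineCode` carry a `value` string, and container nodes such as `emphasis` or `link` carry `children`. Under that assumption, a minimal version is a recursive flatten. For remark users, the `mdast-util-to-string` package provides an equivalent `toString`.

```javascript
// Flattens an mdast node to plain text: leaves contribute their `value`,
// containers contribute the concatenated text of their children.
function getHeadingText(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(getHeadingText).join('');
}
```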
The paragraph length check is the most frequently triggered rule: writers naturally produce paragraphs of 200+ words. Retrieval systems commonly chunk content at roughly 500 tokens, and a long paragraph is more likely to straddle a chunk boundary and get split mid-claim, producing two chunks that are each incomplete. The 150-word limit keeps each paragraph small relative to a chunk, so boundaries tend to fall between paragraphs rather than inside them.
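validateStructure implements only the heading rules. The sketch below shows what the paragraph-length and word-count rules could look like. To stay self-contained, it works on the raw markdown body rather than the AST. `checkParagraphsAndLength` is a hypothetical name. The ranges come from the rule list for this gate, and excluding fenced code from the word count is an assumption.

```javascript
const MAX_PARAGRAPH_WORDS = 150;
// Ranges from the structural rules; other content types are not checked here.
const WORD_RANGES = { blog: [3500, 5000], curated: [900, 1600] };

function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Strips fenced code, then treats blank-line-separated blocks that are not
// headings, list items, or table rows as paragraphs.
function checkParagraphsAndLength(body, contentType) {
  const failures = [];
  const prose = body.replace(/`{3}[\s\S]*?`{3}/g, '');

  const blocks = prose.split(/\n\s*\n/).map(b => b.trim()).filter(Boolean);
  blocks.forEach((block, i) => {
    if (/^(#|[-*+] |\d+\. |\|)/.test(block)) return; // not a paragraph
    const words = countWords(block);
    if (words > MAX_PARAGRAPH_WORDS) {
      failures.push(`Block ${i + 1} has ${words} words (max ${MAX_PARAGRAPH_WORDS})`);
    }
  });

  const range = WORD_RANGES[contentType];
  if (range) {
    const total = countWords(prose);
    if (total < range[0] || total > range[1]) {
      failures.push(`Word count ${total} outside ${range[0]}-${range[1]} for ${contentType}`);
    }
  }

  return failures.length === 0 ? { pass: true } : { pass: false, failures };
}
```

Splitting on blank lines is cruder than walking `paragraph` nodes in the AST. It keeps the example runnable without a markdown parser.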
Gate 3: citation validation
Counts external citations and validates that URLs resolve. This is the gate that enforces data density, which the GEO research (Aggarwal et al., SIGKDD 2024) identifies as one of the most effective levers for visibility in AI-generated answers.
```javascript
// extractURLs() and checkURLs() are pipeline helpers not shown here.
async function validateCitations(content, contentType) {
  const urls = extractURLs(content);
  const minimums = { 'blog': 12, 'curated': 5, 'research': 8 };
  const minimum = minimums[contentType] || 5;
  const failures = [];

  if (urls.length < minimum) {
    failures.push(`Found ${urls.length} citations, minimum is ${minimum}`);
  }

  // Validate that URLs resolve (at most 5 requests in flight; timeout in ms)
  const results = await checkURLs(urls, { concurrency: 5, timeout: 10000 });
  const broken = results.filter(r => !r.ok);
  broken.forEach(b => failures.push(`Broken citation: ${b.url} (${b.status})`));

  return failures.length === 0
    ? { pass: true, citationCount: urls.length }
    : { pass: false, failures };
}
```
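`extractURLs` is not shown either. The sketch below assumes citations appear as http(s) URLs, either as markdown link targets or bare in the text. It de-duplicates them so a source linked twice counts once; the article does not say whether the production gate does this. URLs containing parentheses, common on Wikipedia, are truncated by this regex.

```javascript
// Matches http(s) URLs in link targets and running text, trims trailing
// sentence punctuation, and returns each distinct URL once, in first-seen order.
function extractURLs(content) {
  const matches = content.match(/https?:\/\/[^\s<>()\[\]"']+/g) || [];
  const cleaned = matches.map(u => u.replace(/[.,;:!?]+$/, ''));
  return [...new Set(cleaned)];
}
```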
The minimum thresholds come from testing: content with fewer than 12 citations for long-form blog posts consistently scored lower on AI extractability audits. The Princeton GEO paper found that adding statistics improves AI visibility by 30-40%. Each citation is a potential statistic or claim that an AI engine can extract.
URL validation catches a surprisingly common failure: stale links. Academic papers move. Company blogs restructure. A citation that resolved last month might 404 today. Running this check pre-publish prevents deploying content with dead references.
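`checkURLs` is called with `{ concurrency: 5, timeout: 10000 }`. One way to meet that contract is a small worker pool over the global `fetch` (Node 18+), with `AbortSignal.timeout` bounding each request. The sketch makes three assumptions. The `fetchFn` option is a hypothetical addition that makes the function testable offline. Treating any non-2xx final status as broken is an assumption. So is using `HEAD`: some servers reject it, and a fallback to `GET` is not shown.

```javascript
// Worker pool: at most `concurrency` requests in flight, each aborted after
// `timeout` ms. Results keep the same order as the input URLs.
async function checkURLs(urls, { concurrency = 5, timeout = 10000, fetchFn = globalThis.fetch } = {}) {
  const results = new Array(urls.length);
  let next = 0;

  async function worker() {
    // Each worker claims the next unchecked index until none remain.
    while (next < urls.length) {
      const i = next++;
      const url = urls[i];
      try {
        const res = await fetchFn(url, {
          method: 'HEAD',
          redirect: 'follow',
          signal: AbortSignal.timeout(timeout)
        });
        results[i] = { url, ok: res.ok, status: res.status };
      } catch (err) {
        // Timeouts and network errors both count as broken citations.
        results[i] = { url, ok: false, status: err.name === 'TimeoutError' ? 'timeout' : 'error' };
      }
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, urls.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```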
Gate 4: extractability scoring
The most complex gate. It scores the content on six dimensions that predict whether AI engines will extract and cite claims from the page.
| Dimension | Weight | What it measures |
|---|---|---|
| Answer-first structure | 20% | Do the first 60 words define the core concept declaratively? |
| Citable blocks | 20% | Does every H2 section contain an independently extractable claim? |
| Data density | 20% | Does the piece meet minimum citation count? |
| Heading keywords | 15% | Do headings contain target query terms? |
| Entity attribution | 15% | Are key entities stated in third person? |
| FAQ coverage | 10% | Are direct Q&A pairs present? |
Each dimension is scored from 0 to 1, and the gate combines them into a weighted score from 0 to 10. The pass threshold is 8.0.
```javascript
// Each score*() helper is expected to return a value between 0 and 1.
function scoreExtractability(content, ast) {
  const scores = {
    answerFirst: scoreAnswerBlock(content),
    citableBlocks: scoreCitableBlocks(ast),
    dataDensity: scoreDataDensity(content),
    headingKeywords: scoreHeadingKeywords(ast),
    entityAttribution: scoreEntityAttribution(content),
    faqCoverage: scoreFAQ(ast)
  };

  // Weights sum to 1.0; multiplying by 10 puts the total on a 0 to 10 scale.
  const weights = {
    answerFirst: 0.20, citableBlocks: 0.20, dataDensity: 0.20,
    headingKeywords: 0.15, entityAttribution: 0.15, faqCoverage: 0.10
  };

  const total = Object.entries(weights).reduce(
    (sum, [key, weight]) => sum + (scores[key] * weight * 10), 0
  );

  return {
    pass: total >= 8.0,
    score: total,
    breakdown: scores
  };
}
```
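The example below applies the same weights to a hypothetical piece that is strong everywhere except its opening. The dimension scores are invented for illustration, and each is assumed to lie between 0 and 1.

```javascript
// Same weights as scoreExtractability.
const WEIGHTS = {
  answerFirst: 0.20, citableBlocks: 0.20, dataDensity: 0.20,
  headingKeywords: 0.15, entityAttribution: 0.15, faqCoverage: 0.10
};

function weightedScore(scores) {
  return Object.entries(WEIGHTS).reduce((sum, [k, w]) => sum + scores[k] * w * 10, 0);
}

// Hypothetical piece: solid body, weak opening.
const weakOpening = {
  answerFirst: 0.2, citableBlocks: 0.9, dataDensity: 1.0,
  headingKeywords: 0.8, entityAttribution: 0.9, faqCoverage: 0.6
};

console.log(weightedScore(weakOpening).toFixed(2));                       // "7.35", below 8.0
console.log(weightedScore({ ...weakOpening, answerFirst: 1.0 }).toFixed(2)); // "8.95", passes
```

With a 20% weight, moving answer-first from 0.2 to 1.0 adds 1.6 points. That alone is enough to turn a fail into a pass.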
This gate fails roughly 15% of content on first pass. The most common failure: weak answer-first blocks. Writers lead with narrative context when AI engines need a declarative definition in the first 60 words.
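The article does not describe how `scoreAnswerBlock` detects a declarative definition. One crude, hypothetical heuristic looks for a copula such as "is" or "refers to" in the opening sentence, with partial credit if one appears later in the first 60 words. It will produce false positives ("This is hard") and only illustrates the shape of such a scorer. It also assumes `content` excludes frontmatter.

```javascript
// Hypothetical heuristic, not the pipeline's actual scorer.
// 1.0: first sentence contains a definitional verb ("X is ...").
// 0.5: a definitional verb appears later, within the first 60 words.
// 0.0: none in the first 60 words.
const DEFINITION = /\b(is|are|refers to|means)\b/i;

function scoreAnswerBlock(content) {
  // Skip heading lines so the H1 text is not scored.
  const prose = content.split('\n').filter(l => !l.startsWith('#')).join(' ');
  const first60 = prose.split(/\s+/).filter(Boolean).slice(0, 60).join(' ');
  const sentences = first60.split(/(?<=[.!?])\s+/);

  if (DEFINITION.test(sentences[0] || '')) return 1.0;
  if (sentences.slice(1).some(s => DEFINITION.test(s))) return 0.5;
  return 0.0;
}
```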
Gate 5: dedup check
The final gate checks whether the new content overlaps substantially with existing published content. This prevents publishing a second piece that covers the same topic from the same angle.
The implementation computes the cosine similarity between TF-IDF term vectors for the new content and for every existing piece in the registry for the same domain:
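```javascript
// extractTermVector() and cosineSimilarity() are helpers not shown here.
// Each registry entry carries a precomputed termVector, title, and url.
function checkDedup(newContent, registry, threshold = 0.35) {
  const newTerms = extractTermVector(newContent);

  for (const existing of registry) {
    const similarity = cosineSimilarity(newTerms, existing.termVector);
    // Fail on the first existing piece above the threshold.
    if (similarity > threshold) {
      return {
        pass: false,
        failures: [`${(similarity * 100).toFixed(0)}% overlap with "${existing.title}" (${existing.url})`]
      };
    }
  }
  return { pass: true };
}
```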
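checkDedup depends on `extractTermVector` and `cosineSimilarity`, sketched below with one deliberate difference. Here `extractTermVector` also takes an IDF function, built once from the registry corpus, because TF-IDF weights cannot be computed from a single document. The smoothed IDF formula matches the one scikit-learn uses by default. The tokenizer is a simplistic assumption.

```javascript
const tokenize = text => text.toLowerCase().match(/[a-z0-9]+/g) || [];

// Document frequencies from the corpus, turned into a smoothed IDF function:
// idf(t) = ln((1 + N) / (1 + df(t))) + 1, so unseen terms get the highest weight.
function buildIdf(docs) {
  const df = new Map();
  for (const doc of docs) {
    for (const term of new Set(tokenize(doc))) df.set(term, (df.get(term) || 0) + 1);
  }
  return term => Math.log((1 + docs.length) / (1 + (df.get(term) || 0))) + 1;
}

// Raw term counts scaled by IDF, as a sparse Map of term -> weight.
function extractTermVector(text, idf) {
  const tf = new Map();
  for (const term of tokenize(text)) tf.set(term, (tf.get(term) || 0) + 1);
  const vec = new Map();
  for (const [term, count] of tf) vec.set(term, count * idf(term));
  return vec;
}

// Cosine of the angle between two sparse vectors; 0 if either is empty.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [term, w] of a) dot += w * (b.get(term) || 0);
  const norm = v => Math.sqrt([...v.values()].reduce((s, w) => s + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}
```

Registry entries would store `extractTermVector(body, idf)` at publish time. The IDF function would need rebuilding as the corpus grows.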
The 0.35 threshold was calibrated by testing against known duplicate and non-duplicate pairs. Below 0.35, most content about the same broad topic passes. Above 0.35, the piece is covering substantially the same ground as something already published.
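The calibration procedure itself is not given. One simple approach sweeps candidate thresholds and keeps the one that classifies the most labelled pairs correctly. It uses the same `similarity > threshold` comparison as checkDedup. `calibrateThreshold` is a hypothetical helper, and any pairs fed to it would come from your own labelled data.

```javascript
// pairs: [{ similarity, isDuplicate }]; candidates: thresholds to try.
// Returns the candidate with the highest accuracy; ties keep the earlier one.
function calibrateThreshold(pairs, candidates) {
  if (pairs.length === 0) throw new Error('calibrateThreshold needs labelled pairs');
  let best = { threshold: candidates[0], accuracy: -1 };
  for (const t of candidates) {
    const correct = pairs.filter(p => (p.similarity > t) === p.isDuplicate).length;
    const accuracy = correct / pairs.length;
    if (accuracy > best.accuracy) best = { threshold: t, accuracy };
  }
  return best;
}
```

Accuracy treats both error types equally. If a false duplicate (blocking a legitimate piece) costs more than a missed one, weighting the errors or maximizing precision would shift the chosen threshold.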
Running the pipeline
The full pipeline executes in under 10 seconds for most content. Schema and structure validation are effectively instant. Citation URL checks run in parallel with a concurrency limit of 5. The extractability scorer does string analysis only, with no LLM calls.
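```javascript
async function runPublishPipeline(markdown, contentType) {
  const { frontmatter, content, ast } = parseMarkdown(markdown);

  // Cheapest checks first. Each entry is a thunk, so a gate only runs
  // if every gate before it passed.
  const gates = [
    () => validateSchema(frontmatter, contentType),
    () => validateStructure(ast),
    () => validateCitations(content, contentType),
    () => scoreExtractability(content, ast),
    () => checkDedup(content, loadRegistry(contentType))
  ];

  const results = [];
  for (const [i, gate] of gates.entries()) {
    const result = await gate(); // await also works for the synchronous gates
    if (!result.pass) {
      return {
        passed: false,
        failedGate: i + 1,
        // The extractability gate reports a score instead of a failures list.
        failures: result.failures || [`Score: ${result.score.toFixed(1)}/10`]
      };
    }
    results.push(result);
  }

  // Reuse the stored extractability result instead of re-running gate 4.
  return { passed: true, score: results[3].score };
}
```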
The pipeline halts on first failure. This is deliberate. Fixing a schema error might change the structure, which might change the citation count. Running all gates when the first one fails produces misleading downstream results.
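Halting behaviour is easy to verify in isolation with stub gates. The sketch below is a hypothetical generic runner, separate from runPublishPipeline. It stops at the first failing gate and records per-gate timings, which is one way to watch the under-10-second budget. `performance.now()` is available globally in Node 16+ and in browsers.

```javascript
// gates: [{ name, run }] where run() returns (or resolves to) { pass, failures? }.
// Stops at the first failure; timings cover only the gates that actually ran.
async function runGates(gates) {
  const timings = [];
  for (const [i, { name, run }] of gates.entries()) {
    const start = performance.now();
    const result = await run();
    timings.push({ name, ms: performance.now() - start });
    if (!result.pass) {
      return { passed: false, failedGate: i + 1, name, failures: result.failures || [], timings };
    }
  }
  return { passed: true, timings };
}
```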
What the pipeline catches in practice
Over 30 days of production use across four domains and 360+ published pieces:
| Gate | Failure rate (first attempt) | Most common failure |
|---|---|---|
| Schema | 8% | Description length overflow |
| Structure | 12% | Missing H2 sections, long paragraphs |
| Citations | 18% | Below minimum count |
| Extractability | 15% | Weak answer-first block |
| Dedup | 3% | Topic overlap with recent publish |
Roughly 40% of content fails at least one gate on first attempt. After revision, 100% passes — the failures are specific enough to fix mechanically.
The insight that made this approach work: quality enforcement before deploy is cheaper than quality correction after deploy. Banafea (2026) describes this as treating editorial content as "persistent state rather than transient documents" — each piece is an asset that should be validated before entering the production index, not patched after.
This methodology runs on every piece of content published across four domains.
AuthorityTech is the first AI-native Machine Relations agency.