How to trust the answers

An AI that talks confidently about your Salesforce org is easy to build. One you can trust is not. Here is exactly how sf-intelligence earns that trust: the test suite, the CI gate every change must pass, the per-answer trust labels, and the boundaries it refuses to cross.

3,100+ automated tests
9 tested packages
11 CI gates per change
Zero org data in the repo

automated testing

~3,100 tests across the stack

Every layer of the product has its own test suite — from the metadata extractors and the dependency-graph engine to the 148 MCP tools and the CLI. These run on every change.

mcp: the 148 read-only tools, router, and grounding layer (≈1,775 cases across 148 files).
extractors: metadata parsing for every Salesforce component type (≈707 cases).
graph: the DuckDB dependency-graph engine (≈153 cases).
cli: the sfi command-line tool (≈114 cases).
patterns: heuristic recognizers for PII, naming, and code quality (≈99 cases).
parsers: formula, SOQL, and source tokenizers (≈90 cases).
renderers: Markdown vault rendering (≈72 cases).
vault: vault layout, manifest, and freshness (≈61 cases).
tooling-api: optional Tooling API enrichment (≈49 cases).

Plus 7 integration suites that drive the real MCP server end-to-end — including post-refresh golden tests, a fixed reference-question set, and deep smoke tests.
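As a quick sanity check on the headline figure, the approximate per-package counts listed above can simply be summed. This minimal TypeScript sketch does only that arithmetic; the numbers are the ≈ values from this page, not counts pulled live from the repo.

```typescript
// Approximate case counts per package, as listed on this page (≈ values, not live repo data).
const casesByPackage: Array<[string, number]> = [
  ["mcp", 1775],
  ["extractors", 707],
  ["graph", 153],
  ["cli", 114],
  ["patterns", 99],
  ["parsers", 90],
  ["renderers", 72],
  ["vault", 61],
  ["tooling-api", 49],
];

const packageCount = casesByPackage.length;
// Sum the second element of each [package, count] pair.
const totalCases = casesByPackage.reduce((sum, [, count]) => sum + count, 0);

console.log(`${packageCount} packages, ${totalCases} cases`); // prints "9 packages, 3120 cases"
```

The total of roughly 3,120 is consistent with the "3,100+" headline, and it does not include the 7 integration suites, which are listed separately.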

the ship gate

Nothing ships unless all of this is green

Every change runs the full gate in CI. A single red check blocks the release.

Type-check

Strict TypeScript across every package.

Lint

ESLint, including import-order rules.

~3,100 unit tests

Every package suite must pass.

Integration + golden tests

The real MCP server, end-to-end.

End-to-end smoke

The server boots and answers over stdio.

Natural-language regression

A large set of natural-language questions is routed and the results checked, not just the plumbing.

Analytical correctness eval

Answers are checked for the right verdict, not just a response.

Scale benchmarks

Import / refresh / resolve budgets enforced (below).

SAST

Static application security scanning.

Release guard

Validates the exact shipping set before publish.

Org-data leak scan

Fails the build if any real org identifier is found in the shipping set.
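The blocking rule is all-or-nothing: the release ships only if every check is green. Here is a minimal sketch of that rule, assuming each CI check reduces to a pass/fail result. `GateResult` and `evaluateShipGate` are hypothetical names for illustration, not the project's actual CI configuration.

```typescript
// Hypothetical model of the ship gate: each CI check reduces to a pass/fail result.
interface GateResult {
  name: string;
  passed: boolean;
}

// Collect every red check; the release may ship only if there are none.
function evaluateShipGate(results: GateResult[]): { ship: boolean; failed: string[] } {
  const failed = results.filter((r) => !r.passed).map((r) => r.name);
  return { ship: failed.length === 0, failed };
}

// Example: three green checks cannot outvote one red one.
const gateRun: GateResult[] = [
  { name: "Type-check", passed: true },
  { name: "Lint", passed: true },
  { name: "Unit tests", passed: true },
  { name: "Org-data leak scan", passed: false },
];
const verdict = evaluateShipGate(gateRun);
console.log(verdict.ship ? "ship" : `blocked by: ${verdict.failed.join(", ")}`);
// prints "blocked by: Org-data leak scan"
```

Reporting every failed check, rather than stopping at the first, lets one CI run surface all the red checks at once.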

per-answer trust

Every answer is labelled

Testing keeps the code correct. These labels keep each answer honest — you always know where it came from, how it was derived, and how complete the evidence was.

confidence

declared: Salesforce states it directly. Highest trust.
parsed: from AST/XML parsing of source. High trust.
heuristic: regex / token analysis. Spot-check it.

provenance

offline_snapshot: the last refresh's local vault (the default).
live_org: an opt-in, capped, read-only SOQL read.
hybrid: fuses vault + live data and discloses both.

completeness

complete: every needed family was modeled.
partial: a dependency wasn't retrieved, so an empty result means "not checked", not "none".
unknown: coverage couldn't be determined.
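One way to picture the three labels is as independent fields on every answer, with completeness deciding how an empty result may be read. This is a minimal sketch under that assumption: the type names, `readEmptyResult`, and `needsSpotCheck` are hypothetical and may not match the product's real schema.

```typescript
// Hypothetical shape of the three per-answer labels; the product's real schema may differ.
type Confidence = "declared" | "parsed" | "heuristic";
type Provenance = "offline_snapshot" | "live_org" | "hybrid";
type Completeness = "complete" | "partial" | "unknown";

interface TrustLabels {
  confidence: Confidence;
  provenance: Provenance;
  completeness: Completeness;
}

// How to read an answer that found nothing: only complete coverage supports "none".
function readEmptyResult(labels: TrustLabels): string {
  switch (labels.completeness) {
    case "complete":
      return "none";
    case "partial":
      return "not checked"; // a needed dependency wasn't retrieved
    case "unknown":
      return "coverage unknown";
  }
}

// Heuristic findings should be spot-checked before anyone acts on them.
function needsSpotCheck(labels: TrustLabels): boolean {
  return labels.confidence === "heuristic";
}

const answer: TrustLabels = {
  confidence: "heuristic",
  provenance: "offline_snapshot",
  completeness: "partial",
};
console.log(readEmptyResult(answer), needsSpotCheck(answer)); // prints "not checked true"
```

Because the switch covers every `Completeness` value, the TypeScript compiler will flag the function if a new completeness label is added without deciding how its empty results should read.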

tested at scale

Budgets, gated in CI

Graph import

10,000 nodes in under 90 seconds.

Full refresh

1,000 object + field files in under 10 minutes.

Resolve

p95 under 2 seconds on the CI vault.
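The resolve budget is stated as a 95th-percentile latency. Here is a minimal sketch of checking such a budget, assuming the nearest-rank definition of percentile (the page does not say which definition the benchmarks use). `percentile`, `withinBudget`, and the sample latencies are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of all samples <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // Compute the rank from integers first to keep ceil() away from floating-point rounding.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical budget check: p95 must be strictly under the budget ("under 2 seconds").
function withinBudget(samplesMs: number[], budgetMs: number): boolean {
  return percentile(samplesMs, 95) < budgetMs;
}

const RESOLVE_P95_BUDGET_MS = 2000;
const resolveLatenciesMs = [180, 220, 250, 90, 1900, 310, 140, 205, 175, 160];
console.log(percentile(resolveLatenciesMs, 95), withinBudget(resolveLatenciesMs, RESOLVE_P95_BUDGET_MS));
// prints "1900 true"
```

Gating on p95 rather than the mean means a single slow outlier cannot hide behind many fast calls. Note that with fewer than 20 samples the nearest-rank p95 is simply the maximum, so small sample sets make the gate stricter.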

adversarial QA

Tested on real orgs

Beyond synthetic fixtures, the product is exercised against real production Salesforce orgs with a deliberately adversarial, failure-finding QA pass — not happy-path validation. Bugs found on real metadata are fixed and turned into regression tests so they stay fixed.

No customer org data is published or shipped — see the leak scan above.
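The leak scan is described here only at a high level. Here is a minimal sketch of the idea, assuming two hypothetical patterns: Salesforce Organization IDs (which carry the `00D` key prefix and are 15 or 18 characters long) and My Domain hostnames. `scanForOrgData` and `ShippedFile` are illustrative names, not the repo's implementation, and pattern matching alone cannot tell a real ID from a placeholder.

```typescript
// Hypothetical leak patterns; a real scanner would need more patterns plus an allowlist.
const LEAK_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  // Salesforce Organization IDs start with the "00D" key prefix and are 15 or 18 characters.
  { label: "org ID", pattern: /\b00D[0-9A-Za-z]{12}(?:[0-9A-Za-z]{3})?\b/ },
  // My Domain hostnames name a specific org.
  { label: "My Domain host", pattern: /\b[a-z0-9-]+\.my\.salesforce\.com\b/i },
];

interface ShippedFile {
  path: string;
  content: string;
}

// Reports the first hit of each pattern in each file; any finding should fail the build.
function scanForOrgData(files: ShippedFile[]): string[] {
  const findings: string[] = [];
  for (const file of files) {
    for (const { label, pattern } of LEAK_PATTERNS) {
      const match = pattern.exec(file.content);
      if (match !== null) {
        findings.push(`${file.path}: ${label} ${match[0]}`);
      }
    }
  }
  return findings;
}

const shippingSet: ShippedFile[] = [
  { path: "README.md", content: "Run a refresh, then ask a question." },
  { path: "fixtures/org.json", content: '{"orgId": "00DXX0000000001AAA"}' },
];
const leaks = scanForOrgData(shippingSet);
console.log(leaks.length === 0 ? "clean" : `build failed:\n${leaks.join("\n")}`);
// prints "build failed:" then "fixtures/org.json: org ID 00DXX0000000001AAA"
```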

the honest part

What testing can't promise

Trust also means stating the limits plainly.

Heuristic findings can be wrong.

Anything labelled heuristic (the Apex scanner, name-pattern detection) may have false positives. Spot-check before acting.

Static analysis has blind spots.

Dynamic SOQL and reflective Apex are invisible — "no references found" means "no static evidence", not "definitely unused".
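To see why "no references found" is weaker than "unused", consider a deliberately naive scanner that only recognizes static SOQL (the bracketed `[SELECT ... FROM X]` form). This is a hypothetical toy, not the product's Apex scanner, but any purely static analysis shares the limitation it shows: an object name assembled at runtime and passed to `Database.query` never appears as a reference in the source.

```typescript
// Toy static scanner: finds the FROM object only in bracketed static SOQL like [SELECT Id FROM Account].
function staticSoqlObjects(apexSource: string): string[] {
  const found: string[] = [];
  const staticSoql = /\[\s*SELECT\b[^\]]*?\bFROM\s+(\w+)/gi;
  let match: RegExpExecArray | null;
  while ((match = staticSoql.exec(apexSource)) !== null) {
    found.push(match[1]);
  }
  return found;
}

const apex = `
  List<Account> accts = [SELECT Id FROM Account LIMIT 10];
  String objName = 'Invoice__c';
  List<SObject> rows = Database.query('SELECT Id FROM ' + objName);
`;

// Account is found; Invoice__c, queried dynamically, is invisible to the scan.
console.log(staticSoqlObjects(apex));
```

A scan like this would report "no references" for `Invoice__c` even though the code queries it at runtime, which is exactly why such a result means "no static evidence".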

Answers are as fresh as your last refresh.

The vault is a snapshot. health_check tells you when it's stale.
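How health_check decides a vault is stale isn't described here. One simple rule is a maximum age since the last refresh. This minimal sketch assumes the vault records its last refresh time; `isStale` and the 48-hour threshold are hypothetical, and a real check might instead compare against when the org's metadata last changed.

```typescript
// Hypothetical staleness rule: a snapshot older than maxAgeHours counts as stale.
function isStale(lastRefreshIso: string, now: Date, maxAgeHours: number): boolean {
  const ageMs = now.getTime() - new Date(lastRefreshIso).getTime();
  return ageMs > maxAgeHours * 60 * 60 * 1000;
}

const checkedAt = new Date("2025-01-10T12:00:00Z");
console.log(isStale("2025-01-09T12:00:00Z", checkedAt, 48)); // false: 24 hours old
console.log(isStale("2025-01-01T12:00:00Z", checkedAt, 48)); // true: 9 days old
```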

It's read-only by design.

No write path means no accidental changes — and no claim to fix anything for you.

Trust, but verify.

The code behind every claim above is in the open-source repo and runs in CI. Point it at your own org and check the answers against reality.