
LexBuild assigns every piece of legal content a canonical URI identifier. These identifiers appear in YAML frontmatter, _meta.json sidecar files, and cross-reference links throughout the output. They are stable, deterministic, and shared across all LexBuild tools. This page is the definitive reference for the identifier format and how cross-references are resolved.

Identifier Schemes

LexBuild uses three identifier schemes, one for each source type. All identifiers begin with /us/ (jurisdiction) followed by a code designating the source.

USC Identifiers

USC identifiers come directly from the identifier attribute on USLM XML elements published by the Office of the Law Revision Counsel (OLRC). The format uses path segments with prefixed level codes:

/us/usc/t{title}/s{section}/{subsection}/{paragraph}

For example, /us/usc/t1/s201/a/2 breaks down as:

| Segment | Meaning |
| --- | --- |
| /us | Jurisdiction (United States) |
| /usc | Code (United States Code) |
| /t1 | Title 1 |
| /s201 | Section 201 |
| /a | Subsection (a) |
| /2 | Paragraph (2) |

Big levels (title through section) use a short alphabetic prefix to indicate their level:

| Prefix | Level |
| --- | --- |
| t | Title |
| st | Subtitle |
| ch | Chapter |
| sch | Subchapter |
| art | Article |
| p | Part |
| sp | Subpart |
| d | Division |
| sd | Subdivision |
| s | Section |

Small levels (subsection and below) use their value directly without a prefix: /us/usc/t1/s1/a, /us/usc/t1/s1/a/2/A/i.
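As an illustration of the big-level and small-level split, the following is a minimal sketch of a parser for USC identifiers. The function name `parse_usc_identifier` is hypothetical and not part of LexBuild. The sketch assumes that longest-prefix matching is enough to tell prefixes apart (for example `sch1` versus `s201`), and that everything after the section segment is a small level.

```python
import re

# Big-level prefixes copied from the table above, mapped to level names.
BIG_LEVEL_PREFIXES = {
    "t": "title", "st": "subtitle", "ch": "chapter", "sch": "subchapter",
    "art": "article", "p": "part", "sp": "subpart", "d": "division",
    "sd": "subdivision", "s": "section",
}

# Try longer prefixes first so "sch1" is read as subchapter 1, not section "ch1".
_ALTERNATION = "|".join(sorted(BIG_LEVEL_PREFIXES, key=len, reverse=True))
_BIG_SEGMENT = re.compile(rf"^({_ALTERNATION})(.+)$")


def parse_usc_identifier(identifier):
    """Split a USC identifier into (big_levels, small_levels).

    big_levels is a list of (level_name, value) pairs; small_levels is a
    list of raw values such as "a", "2", "A", "i".
    """
    parts = identifier.strip("/").split("/")
    if parts[:2] != ["us", "usc"]:
        raise ValueError(f"not a USC identifier: {identifier!r}")
    big, small = [], []
    past_section = False
    for segment in parts[2:]:
        if past_section:
            # Assumption: segments after the section carry no prefix.
            small.append(segment)
            continue
        match = _BIG_SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unrecognized big-level segment: {segment!r}")
        level = BIG_LEVEL_PREFIXES[match.group(1)]
        big.append((level, match.group(2)))
        past_section = level == "section"
    return big, small
```

Calling `parse_usc_identifier("/us/usc/t1/s201/a/2")` yields the title and section as big levels and `["a", "2"]` as small levels.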

USC Identifier Examples

| Identifier | Description |
| --- | --- |
| /us/usc/t1 | Title 1 |
| /us/usc/t1/ch1 | Chapter 1 of Title 1 |
| /us/usc/t26/sch1 | Subchapter 1 of Title 26 |
| /us/usc/t1/s1 | Section 1 of Title 1 |
| /us/usc/t1/s1/a | Subsection (a) of Section 1 of Title 1 |
| /us/usc/t1/s1/a/2/A/i | Clause (i) of subparagraph (A) of paragraph (2) of subsection (a) |

CFR Identifiers

CFR identifiers are constructed by the eCFR builder from NODE and N attributes in the GPO/SGML XML. The format mirrors the USC scheme:

/us/cfr/t{title}/s{section}

Identifiers use /us/cfr/ (the content type), not /us/ecfr/ (the data source). eCFR and any future annual CFR edition share the same identifier space because they represent the same regulatory content.

CFR Identifier Examples

| Identifier | Description |
| --- | --- |
| /us/cfr/t17 | Title 17 |
| /us/cfr/t17/ch2 | Chapter II of Title 17 (Roman numeral converted to Arabic) |
| /us/cfr/t17/pt240 | Part 240 of Title 17 |
| /us/cfr/t17/s240.10b-5 | Section 240.10b-5 of Title 17 |

Note that CFR chapter identifiers convert Roman numerals to Arabic numbers (e.g., Chapter II becomes ch2).

CFR Identifier Construction

The eCFR builder constructs identifiers from XML attributes during parsing:

| Source XML | Constructed Identifier |
| --- | --- |
| DIV1 N="17" TYPE="TITLE" | /us/cfr/t17 |
| DIV3 N="II" TYPE="CHAPTER" | /us/cfr/t17/ch2 |
| DIV5 N="240" TYPE="PART" | /us/cfr/t17/pt240 |
| DIV8 N="§ 240.10b-5" TYPE="SECTION" | /us/cfr/t17/s240.10b-5 |

The § prefix in section N values (present in govinfo bulk XML but not ecfr.gov API XML) is stripped during identifier construction.
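The construction rules above can be sketched roughly as follows. This is not the eCFR builder's actual code: `cfr_identifier` and `roman_to_int` are hypothetical names, and the sketch assumes, based only on the examples shown, that every identifier consists of the title plus a single level segment.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# TYPE attribute -> identifier prefix, taken from the examples above.
TYPE_PREFIXES = {"TITLE": "t", "CHAPTER": "ch", "PART": "pt", "SECTION": "s"}


def roman_to_int(numeral):
    """Convert a Roman numeral such as "II" or "XIV" to an integer."""
    total, previous = 0, 0
    # Walk right to left; a smaller value before a larger one is subtracted.
    for char in reversed(numeral.strip().upper()):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total


def cfr_identifier(title, div_type, n):
    """Build a CFR identifier from a title number, a TYPE value, and an N value."""
    prefix = TYPE_PREFIXES[div_type]
    if div_type == "TITLE":
        return f"/us/cfr/{prefix}{n}"
    if div_type == "CHAPTER":
        value = str(roman_to_int(n))  # Chapter II -> ch2
    elif div_type == "SECTION":
        value = n.replace("§", "").strip()  # drop the govinfo "§ " prefix if present
    else:
        value = n.strip()
    return f"/us/cfr/t{title}/{prefix}{value}"
```

Because the § prefix is stripped only when present, the same function handles N values from both the govinfo bulk XML and the ecfr.gov API XML.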

FR Identifiers

FR identifiers use document numbers from the FederalRegister.gov API. Document numbers are unique, stable, and serve as the API’s primary key:

/us/fr/{document_number}

FR Identifier Examples

| Identifier | Description |
| --- | --- |
| /us/fr/2026-06029 | FR document 2026-06029 |
| /us/fr/2026-06086 | FR document 2026-06086 |

FR identifiers use document numbers rather than FR citations (e.g., 91 FR 14523) because citations are human-readable but not reliably unique: more than one document can begin on the same Federal Register page.

Non-Resolvable References

Some identifier schemes found in cross-references within USLM XML cannot be resolved to files or URLs. These are always rendered as plain text:

| URI Prefix | Treatment |
| --- | --- |
| /us/stat/... | Plain text citation (Statutes at Large) |
| /us/pl/... | Plain text citation (Public Laws) |
| /us/act/... | Plain text citation (Acts) |

When the link resolver encounters one of these prefixes, it returns null, signaling the renderer to output the reference text without a link.
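A minimal sketch of this behavior is shown below, with Python's `None` standing in for the null return value. The names `is_non_resolvable` and `render_reference` are hypothetical, and the `resolve` callback is a placeholder for whatever lookup the real resolver performs.

```python
# Prefixes from the table above that never resolve to a file or URL.
NON_RESOLVABLE_PREFIXES = ("/us/stat/", "/us/pl/", "/us/act/")


def is_non_resolvable(identifier):
    """Return True for Statutes at Large, Public Law, and Act identifiers."""
    return identifier.startswith(NON_RESOLVABLE_PREFIXES)


def render_reference(text, identifier, resolve):
    """Render a cross-reference as a Markdown link, or as plain text.

    resolve is a hypothetical callback: identifier -> link target or None.
    """
    target = None if is_non_resolvable(identifier) else resolve(identifier)
    if target is None:
        return text  # no link; output the reference text unchanged
    return f"[{text}]({target})"
```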

When you convert with --link-style relative, LexBuild resolves cross-references embedded in the source XML into Markdown links. The resolver uses a three-tier priority chain.

Resolution Priority

  1. Exact match — The identifier is found in the link registry. This always works for same-title references and for cross-title references when the target title has already been converted. Returns a relative file path.

  2. Section-level fallback — If the exact identifier is not registered (common for subsection-level references like /us/usc/t1/s1/a/2), the resolver strips the subsection path and tries the parent section identifier (/us/usc/t1/s1). If found, returns a relative path to the section file.

  3. External URL fallback — If neither exact nor section-level lookup succeeds, the resolver generates a fallback URL for USC and FR identifiers, or returns null for all other identifier types.
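The three tiers can be sketched as a single lookup function. This is an illustration under stated assumptions, not LexBuild's implementation: the registry is modeled as a plain dict from identifier to output path, `external_url` is a hypothetical callback for the third tier, and the section segment is recognized as `s` followed by a digit.

```python
import posixpath
import re

# Assumption: a section segment is "s" followed by a digit (s1, s201, s240.10b-5),
# which keeps it distinct from prefixes such as "sch", "st", "sp", and "sd".
_SECTION_SEGMENT = re.compile(r"^s\d")


def section_identifier(identifier):
    """Strip small levels: /us/usc/t1/s1/a/2 -> /us/usc/t1/s1."""
    parts = identifier.split("/")
    for index, segment in enumerate(parts):
        if _SECTION_SEGMENT.match(segment):
            return "/".join(parts[: index + 1])
    return None


def resolve(identifier, registry, from_file, external_url=lambda _id: None):
    """Return a relative path, an external URL, or None."""
    def relative(target):
        return posixpath.relpath(target, posixpath.dirname(from_file))

    # Tier 1: exact match in the registry.
    if identifier in registry:
        return relative(registry[identifier])
    # Tier 2: fall back to the parent section.
    parent = section_identifier(identifier)
    if parent is not None and parent in registry:
        return relative(registry[parent])
    # Tier 3: external URL, or None when the scheme has no fallback.
    return external_url(identifier)
```

With a registry containing only `/us/usc/t1/s1`, a reference to `/us/usc/t1/s1/a/2` falls through to tier 2 and links to the section file.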

Fallback URLs

When a cross-reference cannot be resolved within the converted corpus, the resolver produces external URLs for supported schemes:

| Identifier Scheme | Fallback URL Pattern |
| --- | --- |
| /us/usc/t{N}/s{N} | https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title{N}-section{N} |
| /us/usc/t{N} | https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title{N} |
| /us/fr/{doc_number} | https://www.federalregister.gov/d/{doc_number} |

Unresolved CFR references (/us/cfr/) are rendered as plain text. No automatic ecfr.gov fallback URLs are generated. Statutes at Large (/us/stat/) and Public Law (/us/pl/) references are always rendered as plain text.
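The URL patterns above could be generated along these lines. `fallback_url` is a hypothetical name. The sketch assumes that any small levels after the section are dropped, and that a USC identifier without a section segment (a chapter, for example) falls back to the title-level URL, which the table does not state explicitly.

```python
import re

USC_BASE = "https://uscode.house.gov/view.xhtml?req=granuleid:USC-prelim-title"
FR_BASE = "https://www.federalregister.gov/d/"


def fallback_url(identifier):
    """Return an external URL for USC and FR identifiers, otherwise None."""
    parts = identifier.strip("/").split("/")
    if parts[:2] == ["us", "fr"] and len(parts) == 3:
        return FR_BASE + parts[2]
    if parts[:2] == ["us", "usc"] and len(parts) >= 3 and parts[2].startswith("t"):
        url = USC_BASE + parts[2][1:]
        for segment in parts[3:]:
            # Assumption: a section segment is "s" followed by a digit.
            if re.match(r"s\d", segment):
                return f"{url}-section{segment[1:]}"
        return url  # no section segment: title-level URL
    # CFR, Statutes at Large, Public Laws, and anything else: no fallback.
    return None
```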

The --link-style option controls how cross-references render in the output Markdown:

| Style | Output | Use Case |
| --- | --- | --- |
| plaintext (default) | Reference text only, no link | RAG pipelines where links add noise |
| relative | [text](../chapter-03/section-201.md) | Local browsing, documentation sites |
| canonical | [text](https://uscode.house.gov/...) | External publication, standalone documents |
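To make the three styles concrete, here is a small sketch of how a renderer might choose its output. The function `render_link` and its parameters are hypothetical; the sketch assumes the relative path and canonical URL have already been resolved (or are `None` when resolution failed).

```python
LINK_STYLES = ("plaintext", "relative", "canonical")


def render_link(text, style="plaintext", relative_path=None, canonical_url=None):
    """Render reference text according to the chosen link style."""
    if style not in LINK_STYLES:
        raise ValueError(f"unknown link style: {style!r}")
    if style == "plaintext":
        return text
    target = relative_path if style == "relative" else canonical_url
    # An unresolved target degrades to plain text rather than a broken link.
    return text if target is None else f"[{text}]({target})"
```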

In relative mode, the resolver computes paths relative to the current file:

| Reference Type | Example Link |
| --- | --- |
| Same chapter | section-7.md |
| Cross-chapter | ../chapter-03/section-201.md |
| Cross-title | ../../title-02/chapter-05/section-100.md |
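The example links above follow from an ordinary relative-path computation. The sketch below uses Python's `posixpath.relpath`; the helper name `relative_link` and the `title-NN/chapter-NN/section-N.md` layout are assumptions inferred from the examples.

```python
import posixpath


def relative_link(from_file, to_file):
    """Path to to_file, relative to the directory containing from_file."""
    return posixpath.relpath(to_file, posixpath.dirname(from_file))


current = "title-01/chapter-01/section-1.md"
same_chapter = relative_link(current, "title-01/chapter-01/section-7.md")
cross_chapter = relative_link(current, "title-01/chapter-03/section-201.md")
cross_title = relative_link(current, "title-02/chapter-05/section-100.md")
```

Here `same_chapter`, `cross_chapter`, and `cross_title` reproduce the three example links in the table.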

Canonical Mode

In canonical mode, all resolvable USC references link to OLRC URLs regardless of whether the target has been converted locally. Non-USC/CFR references render as plain text.

Two-Pass Requirement

Link resolution requires that all section identifiers and output paths are known before rendering begins. Both forward references (section A cites section B, which appears later in the title) and backward references must resolve correctly.

This is why LexBuild uses a collect-then-write pattern:

  1. Parse phase — SAX events fire synchronously, collecting all sections into an array.
  2. Write phase, pass 1 — Compute output paths, detect duplicates, register all identifiers with the link resolver.
  3. Write phase, pass 2 — Render Markdown and write files. All cross-references can now resolve.
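The collect-then-write pattern can be sketched in memory as follows. Everything here is illustrative: `collect_then_write` is a hypothetical name, sections are modeled as plain dicts, output is returned instead of written to disk, and disambiguating duplicate paths with a numeric suffix is an assumption rather than LexBuild's documented behavior.

```python
import posixpath


def collect_then_write(sections):
    """sections: dicts with "identifier", "path", and "refs" [(text, identifier)].

    Returns {output_path: rendered_text}.
    """
    # Pass 1: assign output paths, detect duplicates, register identifiers.
    registry, seen, assigned = {}, {}, []
    for section in sections:
        path = section["path"]
        seen[path] = seen.get(path, 0) + 1
        if seen[path] > 1:
            # Assumption: duplicates get a numeric suffix (section-1-2.md).
            stem, ext = posixpath.splitext(path)
            path = f"{stem}-{seen[path]}{ext}"
        assigned.append(path)
        registry.setdefault(section["identifier"], path)

    # Pass 2: render. Every identifier is registered, so forward and
    # backward references resolve the same way.
    outputs = {}
    for section, path in zip(sections, assigned):
        rendered = []
        for text, target in section["refs"]:
            if target in registry:
                link = posixpath.relpath(registry[target], posixpath.dirname(path))
                rendered.append(f"[{text}]({link})")
            else:
                rendered.append(text)
        outputs[path] = "\n".join(rendered)
    return outputs
```

If rendering happened during pass 1 instead, a reference from the first section to a later one would find an empty registry entry and degrade to plain text.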