Bulk Download

This guide walks you through downloading the full corpus of U.S. legal content and converting it to Markdown. You can process all three sources (U.S. Code, eCFR, and Federal Register) or only the ones you need.

Prerequisites

Node.js 22 or later
@lexbuild/cli installed globally or available via npx
Sufficient disk space (see requirements below)

npm install -g @lexbuild/cli
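
To confirm the Node.js prerequisite before installing, a version check along these lines may help. This is a minimal sketch: node_ok is a hypothetical helper name, and it only assumes that `node --version` prints a string such as v22.3.0.

```shell
# node_ok is a hypothetical helper, not part of LexBuild.
# $1 is a version string as printed by `node --version`, e.g. v22.3.0.
node_ok() {
  major=${1#v}         # drop the leading "v"
  major=${major%%.*}   # keep only the major version number
  [ "$major" -ge 22 ] 2>/dev/null
}

# Usage (requires node on PATH):
#   node_ok "$(node --version)" || echo "Node.js 22 or later is required" >&2
```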

Disk Requirements

Source	XML Download	Converted Markdown (section)	Converted Markdown (title)
U.S. Code	~2 GB	~1.5 GB (~60k files)	~500 MB (54 files)
eCFR	~1.5 GB	~2 GB (~200k files)	~800 MB (50 files)
Federal Register	Varies by date range	Varies	N/A (one file per document)

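XML downloads are stored in ./downloads/ and can be deleted after conversion. Converted Markdown goes to ./output/ by default. If you use the update script described under Update Scripts, keep the checkpoint files in downloads/<source>/, since the script relies on them to run incrementally.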
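
Going by the table above, U.S. Code and eCFR together need roughly 7 GB for XML plus section-level Markdown (2 + 1.5 + 1.5 + 2), before any Federal Register data. A quick free-space check could look like the sketch below; has_free_gb is a hypothetical helper built on POSIX df, not a LexBuild command.

```shell
# has_free_gb DIR NEEDED_GB: succeed if the filesystem holding DIR has at
# least NEEDED_GB gigabytes available. Hypothetical helper for illustration.
has_free_gb() {
  # df -P prints one line per filesystem; -k reports 1024-byte blocks.
  # Column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage:
#   has_free_gb . 7 || echo "Less than 7 GB free here" >&2
```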

Full Corpus Download

U.S. Code

Download all 54 titles from the Office of the Law Revision Counsel (OLRC), then convert them to Markdown:

lexbuild download-usc --all
lexbuild convert-usc --all

Download takes 2-5 minutes depending on your connection. Conversion takes roughly 2 minutes for all 54 titles.

eCFR

Download all 50 titles from the eCFR API, then convert:

lexbuild download-ecfr --all
lexbuild convert-ecfr --all

Download takes 3-8 minutes. Conversion takes roughly 3 minutes for all 50 titles.

Federal Register

FR downloads are date-range based. To get the full historical archive from 2000 onward:

lexbuild download-fr --from 2000-01-01
lexbuild convert-fr --all

This downloads both XML and JSON metadata for every FR document since 2000 using the default fr-api source. The converter automatically uses the JSON sidecar to populate rich frontmatter fields (agencies, CFR references, docket IDs, citations, etc.), so no separate enrichment step is needed.

Note: The enrich-fr command is only needed when using --source govinfo for historical backfill. The govinfo source provides XML only, without JSON metadata. See the Federal Register CLI docs for details.

Everything at Once

# Download all sources
lexbuild download-usc --all
lexbuild download-ecfr --all
lexbuild download-fr --from 2000-01-01

# Convert all sources
lexbuild convert-usc --all
lexbuild convert-ecfr --all
lexbuild convert-fr --all
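
If you script this sequence, it may be worth stopping at the first failure so that a broken download is never converted. A minimal sketch follows; the full_corpus function and the LEXBUILD override variable are assumptions made for illustration, and only the lexbuild subcommands come from this guide.

```shell
# LEXBUILD is a hypothetical override used only in this sketch. It defaults to
# the real lexbuild binary and can be set to echo to preview the commands.
full_corpus() {
  fr_from=${1:-2000-01-01}   # start date for the Federal Register download
  run=${LEXBUILD:-lexbuild}
  # Each && stops the chain at the first failing step.
  $run download-usc --all &&
  $run download-ecfr --all &&
  $run download-fr --from "$fr_from" &&
  $run convert-usc --all &&
  $run convert-ecfr --all &&
  $run convert-fr --all
}

# Usage:
#   full_corpus                          # FR from 2000-01-01
#   LEXBUILD=echo; full_corpus 2020-01-01   # print the commands only
```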

Single Title Processing

You do not need to download the entire corpus. Process individual titles:

# Download and convert USC Title 17 (Copyrights)
lexbuild download-usc --title 17
lexbuild convert-usc --title 17

# Download and convert eCFR Title 40 (Environmental Protection)
lexbuild download-ecfr --title 40
lexbuild convert-ecfr --title 40

Preview with Dry Run

Before running a large conversion, use --dry-run to preview the file count without writing anything:

lexbuild convert-usc --all --dry-run

The command scans the XML and reports how many files it would generate.

Incremental Updates

You do not need to re-download and re-convert the entire corpus to stay current.

USC Release Points

The OLRC publishes a new release point each time it incorporates newly enacted public laws, which typically happens several times a year. Check what is available:

lexbuild list-release-points

Then download and convert only the titles that changed.
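
Once you know which titles changed, the per-title commands shown earlier can be looped. A sketch, assuming you pass the changed title numbers yourself; update_usc_titles and the LEXBUILD override are hypothetical names, and the title numbers in the usage line are placeholders.

```shell
# update_usc_titles TITLE...: download and convert each listed USC title,
# stopping at the first failure. Hypothetical helper for illustration.
update_usc_titles() {
  run=${LEXBUILD:-lexbuild}   # set LEXBUILD=echo to preview the commands
  for title in "$@"; do
    $run download-usc --title "$title" &&
    $run convert-usc --title "$title" || return 1
  done
}

# Usage (placeholder title numbers):
#   update_usc_titles 17 26 42
```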

eCFR Point-in-Time

eCFR is updated daily. Use --date to download a specific point-in-time snapshot:

lexbuild download-ecfr --all --date 2026-04-01
lexbuild convert-ecfr --all

Without --date, you get the most current version.

FR Rolling Updates

Use --recent to download only the most recent documents:

# Download FR documents from the last 30 days (XML + JSON from FR API)
lexbuild download-fr --recent 30

# Convert only the new date range (not --all, which reconverts everything).
# Set --from to the first day of the window you just downloaded.
lexbuild convert-fr --from 2026-03-01
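
Rather than hard-coding the date, you could compute it. The sketch below assumes that --recent N covers the N days before today, which this guide does not spell out; days_ago is a hypothetical helper that tries GNU date first and falls back to the BSD/macOS form.

```shell
# days_ago N: print the date N days before today as YYYY-MM-DD.
# Hypothetical helper, not part of LexBuild.
days_ago() {
  # GNU date understands -d "N days ago"; BSD/macOS date uses -v-Nd instead.
  date -d "$1 days ago" +%Y-%m-%d 2>/dev/null || date -v-"$1"d +%Y-%m-%d
}

# Usage:
#   lexbuild download-fr --recent 30
#   lexbuild convert-fr --from "$(days_ago 30)"
```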

Update Scripts

A single orchestrator script handles change detection, download, conversion, and deployment across every source. By default, it runs incrementally from each source’s last checkpoint:

# All sources, incremental from each source's checkpoint
./scripts/update.sh

# Restrict to specific sources
./scripts/update.sh --source fr
./scripts/update.sh --source ecfr,fr

# Source-scoping
./scripts/update.sh --source ecfr --titles 1,17    # eCFR titles 1, 17 only
./scripts/update.sh --source fr --days 7           # FR last 7 days

# Force a full redownload + reconvert
./scripts/update.sh --source usc --force
./scripts/update.sh --force --from 2026-01-01      # All sources (FR force requires --from)

# Local only (no VPS deploy, no search reindex)
./scripts/update.sh --skip-deploy

# Preview without running
./scripts/update.sh --dry-run

# Verbose convert output
./scripts/update.sh -v

Checkpoints live in downloads/<source>/:

eCFR (.ecfr-titles-state.json) snapshots each title’s latestAmendedOn date. The script compares against the live eCFR API to detect which titles have new amendments.
USC (.usc-release-point) stores the latest OLRC release point ID; the pipeline runs only when the API returns a newer one.
FR (.fr-state.json) stores lastRun and lastDate. Default invocations use lastDate as the --from argument and update lastDate to today after a successful run.

If a checkpoint is missing, eCFR and USC automatically fall back to a full first run. FR has no inherent “all” (it spans decades of documents), so a missing checkpoint requires an explicit --from YYYY-MM-DD or --days N.
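
To see where the next incremental FR run would start, you can read the checkpoint directly. This sketch assumes .fr-state.json stores lastDate as a plain string value and that the file sits at downloads/fr/.fr-state.json; the exact layout is not documented here, and fr_last_date is a hypothetical helper.

```shell
# fr_last_date FILE: print the value of the "lastDate" string in FILE,
# or nothing if the key is absent. Assumes a simple "lastDate": "..." pair.
fr_last_date() {
  sed -n 's/.*"lastDate"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage (assumed path):
#   fr_last_date downloads/fr/.fr-state.json
```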

All converters use writeFileIfChanged() internally, so unchanged sections keep their original file timestamps. Downstream tools (Shiki highlighting, Meilisearch indexing) automatically skip reprocessing unchanged content.
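
Because unchanged files keep their timestamps, your own tooling can apply the same skip logic with a marker file. A minimal sketch using POSIX find -newer; the changed_since helper and the marker file names are hypothetical.

```shell
# changed_since DIR MARKER: list Markdown files under DIR whose modification
# time is newer than that of MARKER. Hypothetical helper for illustration.
changed_since() {
  find "$1" -type f -name '*.md' -newer "$2"
}

# Usage (marker names are placeholders):
#   touch .next-index                  # stamp the start of this run
#   changed_since output .last-index   # files rewritten since the last run
#   mv .next-index .last-index         # promote the stamp for next time
```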

Output Granularity

The --granularity (or -g) flag controls how much content goes into each file. You can convert the same source at several granularity levels by sending each run to a different output directory.

Section Granularity (Default)

One Markdown file per legal section. Best for search indexing, RAG pipelines, and fine-grained retrieval.

lexbuild convert-usc --all -g section -o ./output/section

Produces ~60,000 files for USC, ~200,000 for eCFR. Each file is typically 1-50 KB.

Chapter / Part Granularity

One file per chapter (USC) or part (eCFR). Sections are inlined under their parent headings.

lexbuild convert-usc --all -g chapter -o ./output/chapter
lexbuild convert-ecfr --all -g part -o ./output/part

Produces 2,000-5,000 files. Each file is typically 50-500 KB.

Title Granularity

One file per title. The entire title hierarchy is rendered as nested headings.

lexbuild convert-usc --all -g title -o ./output/title

Produces 54 files for USC, 50 for eCFR. Files can be large (1-100 MB). Title-level files include extra frontmatter fields: chapter_count, section_count, and total_token_estimate.

All Granularities in One Pass

If you need more than one granularity, --granularities emits them all from a single parse of the source XML (~40–50% faster than running convert-* once per granularity):

# USC: section + chapter + title from one parse
lexbuild convert-usc --all \
  --granularities section,title,chapter \
  --output ./output \
  --output-title ./output-title \
  --output-chapter ./output-chapter

# eCFR: all four granularities from one parse
lexbuild convert-ecfr --all \
  --granularities section,title,chapter,part \
  --output ./output \
  --output-title ./output-title \
  --output-chapter ./output-chapter \
  --output-part ./output-part

--granularities is mutually exclusive with -g/--granularity.

Note: The -o flag appends source subdirectories automatically. convert-usc -o /some/path writes to /some/path/usc/, not /some/path/ directly.

API Alternative

If you do not need local files, you can access all content programmatically through the LexBuild API without downloading anything:

# List USC sections in Title 42
curl "https://lexbuild.dev/api/usc/documents?title_number=42&limit=100"

# Get a single section as raw Markdown
curl -H "Accept: text/markdown" \
  "https://lexbuild.dev/api/usc/documents/t42%2Fs1395"

# Get a single eCFR section
curl -H "Accept: text/markdown" \
  "https://lexbuild.dev/api/ecfr/documents/t17%2Fs240.10b-5"

The API supports pagination, filtering, and three response formats (JSON, Markdown, plaintext). See API Overview for authentication details.
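
The document paths in the examples above percent-encode the slash in the identifier as %2F. A small helper can build such URLs; this sketch infers the unencoded form (t42/s1395) from those examples, and doc_url is a hypothetical name.

```shell
# doc_url SOURCE IDENTIFIER: print the document URL for a source (usc, ecfr)
# and an identifier path such as t42/s1395, encoding each "/" as %2F.
doc_url() {
  encoded=$(printf '%s' "$2" | sed 's|/|%2F|g')
  printf 'https://lexbuild.dev/api/%s/documents/%s\n' "$1" "$encoded"
}

# Usage:
#   curl -H "Accept: text/markdown" "$(doc_url usc t42/s1395)"
```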

Working with the Output

LexBuild output files are standalone Markdown. You can work with them using any text processing tool:

# Search for a term across the corpus
rg "due process" output/ --glob "*.md" -l

# Count total sections per source
find output/usc/sections -name "section-*.md" | wc -l
find output/ecfr/sections -name "section-*.md" | wc -l

# Extract all identifiers from frontmatter
rg "^identifier:" output/usc/ --glob "*.md"

# View a section
cat output/usc/sections/title-17/chapter-01/section-107.md

Each file can be imported into databases, fed to LLMs, indexed in search engines, or processed by any tool that reads text.
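
When feeding files to other tools, it is often handy to separate the metadata from the body text. The sketch below assumes the frontmatter is a block delimited by --- lines at the top of each file, which is the usual convention but is not stated in this guide; frontmatter is a hypothetical helper.

```shell
# frontmatter FILE: print the lines between the opening '---' on line 1 and
# the next '---' line. Prints nothing if the file does not start with '---'.
frontmatter() {
  awk '
    NR == 1 && $0 == "---" { inside = 1; next }   # opening delimiter
    inside && $0 == "---"  { exit }               # closing delimiter
    inside                 { print }
  ' "$1"
}

# Usage:
#   frontmatter output/usc/sections/title-17/chapter-01/section-107.md | grep '^identifier:'
```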

Directory Structure

After a full conversion at section granularity, the output directory looks like this:

output/
├── usc/
│   └── sections/
│       ├── title-01/
│       │   ├── _meta.json
│       │   ├── README.md
│       │   ├── chapter-01/
│       │   │   ├── _meta.json
│       │   │   ├── README.md
│       │   │   ├── section-1.md
│       │   │   └── section-2.md
│       │   └── chapter-02/
│       │       └── ...
│       └── title-02/
│           └── ...
├── ecfr/
│   └── sections/
│       ├── title-01/
│       │   └── ...
│       └── ...
└── fr/
    └── documents/
        ├── 2026/
        │   ├── 01/
        │   │   ├── 2026-00001.md
        │   │   └── ...
        │   └── ...
        └── ...

The _meta.json files in each directory provide a machine-readable index of all children, useful for building navigation or listing contents without parsing every Markdown file.

Next Steps

CLI Commands — Full command reference with all flags and options
Output Format — Frontmatter schema, sidecar files, and token estimates
RAG Pipeline Integration — Feed the corpus into AI systems
Legal Research — Cross-reference statutes, regulations, and FR documents