Tutorials

Each tutorial embeds the exact commands and trimmed output captured on 2026-01-03. Swap URLs as needed, but keep the verification steps so you know when to trust the scorer.

Audience: Developers running the CLI, Docker service, or Python API for the first time.
Prerequisites: Python 3.12+, outbound HTTPS, and uv on PATH (Docker optional).
Time: ~10-20 minutes depending on the path you pick.
What you'll learn: How to run the CLI, start the Docker server, call the Python API, and validate the results.

CLI Fast Path

Arrange
- Python 3.12+, outbound HTTPS, and uv on PATH.
- A writable ./tmp/ directory for scratch files.

Act

uv pip install article-extractor --upgrade
uv run article-extractor https://en.wikipedia.org/wiki/Wikipedia --output markdown > ./tmp/article-extractor-cli.md
head -n 12 ./tmp/article-extractor-cli.md

Sample output:

Resolved 9 packages in 425ms
Installed 1 package in 5ms
Title: Wikipedia - Wikipedia
Author: Unknown
Words: 33414
[<img alt="Page extended-confirmed-protected" ...]

Assert
- The install command exits 0 and reports the new version.
- The CLI banner shows the title and word count, and non-empty Markdown is written to ./tmp/article-extractor-cli.md.
- Use --output json whenever you need structured warnings for pipelines.
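A pipeline can gate on that JSON instead of eyeballing the banner. The sketch below is hypothetical: gate_cli_json is not part of article-extractor, and it assumes the JSON output is an object with title and word_count (the field names the Docker endpoint returns in the next tutorial) plus an optional warnings list. Confirm the real schema with one --output json run before wiring it into CI.

```python
import json


def gate_cli_json(payload: str, min_words: int = 1, strict: bool = False) -> list[str]:
    """Return the problems found in one CLI JSON result; an empty list means pass.

    Assumed schema (verify against your CLI version): a JSON object with
    "title", "word_count", and an optional "warnings" list.
    """
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["expected a JSON object at the top level"]

    problems: list[str] = []
    if not data.get("title"):
        problems.append("missing or empty title")
    word_count = data.get("word_count")
    if not isinstance(word_count, int) or word_count < min_words:
        problems.append(f"word_count {word_count!r} is below {min_words}")
    # Warnings (assumed key) only count as failures in strict mode;
    # otherwise this helper ignores them.
    if strict:
        problems.extend(f"warning: {w}" for w in data.get("warnings") or [])
    return problems
```

In a job, pipe uv run article-extractor <url> --output json into a small wrapper that calls gate_cli_json(sys.stdin.read()) and exits non-zero when the returned list is non-empty; pass strict=True if any warning should stop the pipeline.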

Docker Service

Arrange
- Docker 24+, with permission to pull from ghcr.io and to bind host port 3000.
- curl and jq on PATH for the health check and the response filter.
- Optional: a host volume for Playwright storage ($HOME/.article-extractor).

Act

docker run --rm -d -p 3000:3000 --name article-extractor-docs ghcr.io/pankaj28843/article-extractor:latest
curl -sf http://localhost:3000/health
docker logs --tail 6 article-extractor-docs
curl -s -XPOST http://localhost:3000/ \
  -H "Content-Type: application/json" \
  -d '{"url":"https://en.wikipedia.org/wiki/Wikipedia"}' | jq '.title, .word_count'
docker stop article-extractor-docs

Observed output (2026-01-03):

INFO:     Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)
"Wikipedia - Wikipedia"
33414

Assert
- While the container is up, docker ps lists it as running and docker logs shows the Uvicorn banner.
- /health responds with HTTP 200 and a JSON body.
- The POST returns a title and a positive word count before you stop the container.
- Containers are ephemeral by default; add -v $HOME/.article-extractor:/data plus -e ARTICLE_EXTRACTOR_STORAGE_STATE_FILE=/data/storage_state.json to the docker run command only when you want to persist cookies.
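When you script this check instead of typing it, watch for the startup race: docker run -d can return before Uvicorn is listening, so a single health probe may fail. The standard-library sketch below relies only on what the commands above show (GET /health, POST / with a {"url": ...} JSON body, and title and word_count in the response) and polls the health endpoint before posting. The function names wait_for_health and extract_via_service are hypothetical, not part of the service.

```python
import json
import time
import urllib.request

# Matches the -p 3000:3000 mapping used in the docker run command above.
BASE_URL = "http://localhost:3000"


def wait_for_health(base_url: str = BASE_URL, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll GET /health until it answers HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections, and socket timeouts are
            # all OSError subclasses: treat them as "server not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def extract_via_service(url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> tuple[str, int]:
    """POST {"url": ...} to the service root and return (title, word_count)."""
    request = urllib.request.Request(
        f"{base_url}/",
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        data = json.load(resp)
    return data["title"], data["word_count"]
```

With the container running, wait_for_health() followed by extract_via_service("https://en.wikipedia.org/wiki/Wikipedia") should yield a title and a positive word count, mirroring the jq output above.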

Python Embedding

Arrange
- Same environment as the CLI fast path. The first snippet parses inline HTML; the second fetches a live page over outbound HTTPS.

Act

uv pip install article-extractor --upgrade
uv run python - <<'PY'
from article_extractor import extract_article
sample_html = """
<html><body><article><h1>Sample Title</h1><p>Docs content.</p></article></body></html>
"""
result = extract_article(sample_html, url="https://example.com/demo")
print("Local title:", result.title)
print("Local words:", result.word_count)
PY

uv run python - <<'PY'
import asyncio
from article_extractor import extract_article_from_url
async def fetch_remote():
    result = await extract_article_from_url("https://en.wikipedia.org/wiki/Wikipedia")
    print("Remote success:", result.success)
    print("Remote words:", result.word_count)
asyncio.run(fetch_remote())
PY

Assert
- The inline HTML example prints the provided title (Sample Title) and a positive word count.
- The async fetch prints Remote success: True and a word count around 33k, matching the CLI and Docker runs.
- Pass NetworkOptions or FetchPreferences arguments whenever you need proxies, custom user agents, or headed Playwright (see Networking Controls).
