Operations Runbook¶
Five Arrange/Act/Assert recipes keep cache tuning, networking, diagnostics, validation, and release automation in one place so you can work through incidents without hopping across multiple files.
Audience: Operators and contributors maintaining production deployments.
Prerequisites: Working CLI or Docker deployment and access to logs.
Time: ~15-30 minutes for a single workflow.
What you'll learn: How to tune caches, manage storage, and validate releases.
Cache and Playwright Storage¶
Arrange
- Running CLI or Docker deployment.
- Optional host directory when you want persistence (default runs remain ephemeral), e.g., $HOME/.article-extractor.
Act
mkdir -p $HOME/.article-extractor
export AE_STORAGE=$HOME/.article-extractor/storage_state.json
ARTICLE_EXTRACTOR_CACHE_SIZE=2000 \
ARTICLE_EXTRACTOR_THREADPOOL_SIZE=8 \
uv run article-extractor https://en.wikipedia.org/wiki/Wikipedia \
--output text \
--storage-state "$AE_STORAGE" | head -n 5
docker run --rm -d -p 3001:3000 --name article-extractor-tuned \
-e ARTICLE_EXTRACTOR_CACHE_SIZE=2000 \
-e ARTICLE_EXTRACTOR_THREADPOOL_SIZE=8 \
-e ARTICLE_EXTRACTOR_PREFER_PLAYWRIGHT=true \
-e ARTICLE_EXTRACTOR_STORAGE_STATE_FILE=/data/storage_state.json \
-v $HOME/.article-extractor:/data \
ghcr.io/pankaj28843/article-extractor:latest
curl -s -XPOST http://localhost:3001/ \
-H "Content-Type: application/json" \
-d '{"url":"https://en.wikipedia.org/wiki/Wikipedia"}' | jq '.title, .word_count'
ls -R $HOME/.article-extractor | head
docker stop article-extractor-tuned
Skip
--storage-state,ARTICLE_EXTRACTOR_STORAGE_STATE_FILE, and the--volumeflag when you prefer the default ephemeral Playwright session. See the Docker CLI reference for volume/env semantics.
Assert
- CLI logs show the increased cache size; Docker logs show Playwright fetching with persisted storage.
- POST response returns the same title/word count as the CLI tutorial.
- storage_state.json plus a storage_state.json.changes/processed/ tree exist under the mounted directory.
- Adjust queue thresholds via ARTICLE_EXTRACTOR_STORAGE_QUEUE_* when logs warn about backlog; values are listed in the Reference.
- uv run scripts/debug_docker_deployment.py --disable-storage --skip-build mimics the default ephemeral behavior, while the default harness run continues to verify persistence.
Networking Controls¶
Arrange - Proxy credentials if required. - Playwright extras installed for headed sessions.
Act
export HTTPS_PROXY=https://proxy.example.com:8443
export NO_PROXY=localhost,127.0.0.1
uv run article-extractor https://en.wikipedia.org/wiki/Wikipedia \
--output text --prefer-playwright | head -n 4
(uv run uvicorn article_extractor.server:app --host 127.0.0.1 --port 3002 \
> ./tmp/uvicorn.log 2>&1 & echo $! > ./tmp/uvicorn.pid) && sleep 3 && \
curl -s -XPOST http://127.0.0.1:3002/ \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","prefer_playwright":false,"network":{"user_agent":"DocsSample/1.0","random_user_agent":false,"headed":false,"proxy":null,"proxy_bypass":["metadata.internal"],"storage_state":null,"user_interaction_timeout":0}}' \
| jq '{title, word_count, warnings}' && \
kill $(cat ./tmp/uvicorn.pid) && rm ./tmp/uvicorn.pid
Assert
- CLI banner shows Fetched with Playwright (diagnostics) or prints the expected word count even when a proxy is active.
- Server response returns "Example Domain" with word_count: 19 and warnings: null.
- ./tmp/uvicorn.log includes a JSON line logging the resolved proxy, matching the overrides you posted.
- Unset HTTPS_PROXY/NO_PROXY after debugging to avoid surprises.
Diagnostics and Metrics¶
Arrange - CLI or Docker runtime. - Optional StatsD endpoint (defaults to UDP 8125 locally).
Act
ARTICLE_EXTRACTOR_LOG_LEVEL=info \
ARTICLE_EXTRACTOR_LOG_DIAGNOSTICS=1 \
uv run article-extractor https://en.wikipedia.org/wiki/Wikipedia --output text 2>&1 | head -n 10
docker run --rm -d -p 3002:3000 --name article-extractor-diag \
-e ARTICLE_EXTRACTOR_LOG_LEVEL=info \
-e ARTICLE_EXTRACTOR_LOG_FORMAT=json \
-e ARTICLE_EXTRACTOR_LOG_DIAGNOSTICS=1 \
ghcr.io/pankaj28843/article-extractor:latest
docker logs --tail 8 article-extractor-diag
docker stop article-extractor-diag
ARTICLE_EXTRACTOR_METRICS_ENABLED=1 \
ARTICLE_EXTRACTOR_METRICS_SINK=statsd \
ARTICLE_EXTRACTOR_METRICS_STATSD_HOST=127.0.0.1 \
ARTICLE_EXTRACTOR_METRICS_STATSD_PORT=8125 \
ARTICLE_EXTRACTOR_METRICS_NAMESPACE=article_extractor \
uv run article-extractor https://en.wikipedia.org/wiki/Wikipedia --output text | head -n 3
Assert
- CLI diagnostics show structured JSON with request_id, cache hints, and fetcher selection.
- Docker logs stream the same fields for every HTTP request until you stop the container.
- Enabling StatsD emits UDP packets (use tcpdump -i lo udp port 8125 if you need proof) containing counters like article_extractor.cli_extractions.success.
- Reset ARTICLE_EXTRACTOR_LOG_DIAGNOSTICS=0 and LOG_LEVEL=warning after collection to reduce noise.
Development Workflow and Validation¶
Arrange
- Cloned repo plus uv sync --all-extras.
- Follow .github/instructions/validation.instructions.md to the letter.
Act
uv run ruff format .
uv run ruff check --fix .
PYTHONPATH=src uv run pytest tests/ --cov=src/article_extractor --cov-report=term-missing
uv run article-extractor --help
uv run article-extractor https://en.wikipedia.org/wiki/Wikipedia
uv run mkdocs build --strict
uv run mkdocs build --strict --clean
Add uv run scripts/debug_docker_deployment.py --skip-build --tail-lines 120 whenever Docker or Playwright behavior changes.
Assert - Coverage stays ≥93%. - CLI help exits 0 and prints the latest options. - MkDocs builds succeed without nav/anchor warnings.
Release Automation¶
Arrange
- Access to GitHub Actions plus gh CLI (always pipe output; see .github/instructions/gh-cli.instructions.md).
- Docker installed if you need to run the smoke harness locally.
Act
uv run uvicorn article_extractor.server:app --host 0.0.0.0 --port 3000 &
sleep 2
curl -sf http://localhost:3000/health
kill %1
uv run scripts/debug_docker_deployment.py --skip-build --tail-lines 120
gh run list --workflow docs --limit 1 | head -n 10
gh api repos/:owner/:repo/pages | head -n 20
uv run mkdocs gh-deploy --remote-branch gh-pages
Assert
- Local server health check succeeds before tagging.
- Smoke harness logs Docker validation harness completed successfully and HTTP 200s.
- gh run list shows the latest docs workflow green; gh api repos/:owner/:repo/pages confirms Actions-based publishing.
- mkdocs gh-deploy serves as the documented fallback. Record the fallback in the PR description so the workflow can resume control on the next push.