url to markdown
Convert any webpage or document to clean Markdown. HTML, PDF, DOCX, XLSX, CSV, YouTube transcripts. MCP server for AI agents. TLS fingerprint impersonation. Built for RAG pipelines.
Why md.succ.ai
Clean content, not page cruft
MCP server for AI agents. 9 extraction passes. TLS fingerprint impersonation. Ad blocking. YouTube transcripts. Citation links. Structured extraction.
Architecture
Multi-tier conversion pipeline
9-pass extraction, quality scoring, automatic fallback through 3 tiers. YouTube transcripts, document formats, citation links, and fit_markdown.
URL
│
├─ YouTube? → Transcript extraction
│
├─ Document? (PDF, DOCX, XLSX, CSV)
│   └─ Document converter → Markdown
│
├─ TLS fingerprint impersonation (Chrome ciphers)
│
├─ Tier 1: 9-pass extraction (200-500ms)
│   1. Readability (standard)
│   2. Defuddle (Obsidian team)
│   3. Article Extractor (alt heuristics)
│   4. Readability on cleaned HTML
│   5. CSS content selectors
│   6. Schema.org / JSON-LD
│   7. Open Graph / meta tags
│   8. Text density analysis
│   9. Cleaned body fallback
│   Quality ratio check (< 15% → skip)
│
│   Quality ≥ B? → return Markdown
│
├─ Tier 2: Camoufox browser (3-15s)
│   ├─ Ad/tracker blocking (~35 domains)
│   └─ Same 9-pass on rendered DOM
│
└─ Tier 2.5: LLM extraction
    └─ Llama 3.3 70B via nano-gpt
Post-processing:
├─ ?links=citations → numbered references
├─ ?mode=fit → prune boilerplate
└─ ?max_tokens=N → truncate output
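The tiered fallback in the diagram can be sketched in a few lines of Python. This is an illustration only, not the service's code: `toy_grade`, `run_passes`, `convert`, and the `render` and `llm` callables are hypothetical names, the ratio is assumed to be extracted length over source length, and returning the first pass graded B or better (rather than the best of all nine) is also an assumption.

```python
from typing import Callable, List, Optional, Tuple

# An extractor takes HTML and returns extracted text, or None if it found nothing.
Extractor = Callable[[str], Optional[str]]

MIN_RATIO = 0.15              # "Quality ratio check (< 15% → skip)"
PASSING_GRADES = {"A", "B"}   # "Quality ≥ B? → return Markdown"


def toy_grade(text: str) -> str:
    """Hypothetical stand-in for the real A-F quality score (word count only)."""
    words = len(text.split())
    if words >= 200:
        return "A"
    if words >= 50:
        return "B"
    if words >= 10:
        return "C"
    return "F"


def run_passes(html: str, passes: List[Extractor]) -> Optional[Tuple[str, str]]:
    """Try each pass in order; return (text, grade) for the first B-or-better result."""
    for extract in passes:
        text = extract(html)
        if not text:
            continue
        # Assumed ratio definition: extracted length over source length.
        if len(text) / max(len(html), 1) < MIN_RATIO:
            continue
        grade = toy_grade(text)
        if grade in PASSING_GRADES:
            return text, grade
    return None


def convert(html: str, passes: List[Extractor],
            render: Callable[[str], str], llm: Extractor) -> Tuple[str, str]:
    """Fall through the tiers: static passes, same passes on rendered DOM, then LLM."""
    hit = run_passes(html, passes)
    if hit:
        return hit[0], "tier1"
    hit = run_passes(render(html), passes)  # Tier 2: rendered DOM, same passes
    if hit:
        return hit[0], "tier2"
    return llm(html) or "", "tier2.5"       # Tier 2.5: last resort
```

The point of the ordering is cost: the cheap static passes run first, and the browser and LLM tiers are reached only when nothing scores well enough.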
API
One URL, clean Markdown
No API key. No SDK. Just a URL. Works from curl, fetch, or any HTTP client.
Fetch any URL
Prepend md.succ.ai/ to any URL. YouTube links return transcripts automatically. Add query params for citation links or fit mode.
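For callers that build request URLs in code, a minimal Python sketch of the two forms follows: the prefix form and the `?url=` query form with the `links`, `mode`, and `max_tokens` parameters. The helper names `prefix_url` and `query_url` are hypothetical, and the sketch assumes the service decodes standard percent-encoding in the `url` parameter.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://md.succ.ai"


def prefix_url(target: str) -> str:
    """Prefix form: the service host followed by the target URL as-is."""
    return f"{BASE}/{target}"


def query_url(target: str, links: Optional[str] = None,
              mode: Optional[str] = None,
              max_tokens: Optional[int] = None) -> str:
    """Query form: the target goes in ?url=, options in their own parameters."""
    params = {"url": target}
    if links:
        params["links"] = links            # e.g. "citations"
    if mode:
        params["mode"] = mode              # e.g. "fit"
    if max_tokens is not None:
        params["max_tokens"] = max_tokens
    # urlencode percent-encodes the target, so its own query string
    # cannot be confused with the service's parameters.
    return f"{BASE}/?{urlencode(params)}"
```

For example, `query_url("https://htmx.org/docs/", mode="fit")` yields `https://md.succ.ai/?url=https%3A%2F%2Fhtmx.org%2Fdocs%2F&mode=fit`.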
# Basic conversion
curl https://md.succ.ai/https://example.com
# YouTube transcript
curl "https://md.succ.ai/https://youtube.com/watch?v=dQw4w9WgXcQ"
# Citation-style links
curl "https://md.succ.ai/?url=https://en.wikipedia.org/wiki/Markdown&links=citations"
# LLM-optimized (pruned boilerplate)
curl "https://md.succ.ai/?url=https://htmx.org/docs/&mode=fit"
Get clean Markdown
9-pass extraction finds the content. Turndown converts to Markdown. Navigation, ads, and sidebars are stripped. Quality scored A-F.
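The sample response in this section has a few `Key: value` header lines followed by a `Markdown Content:` marker. A client can split the two parts with a small parser. The sketch below assumes that layout holds for every response, which the source does not state; `split_response` is a hypothetical name.

```python
from typing import Dict, Tuple

MARKER = "Markdown Content:"


def split_response(body: str) -> Tuple[Dict[str, str], str]:
    """Split a response into (header fields, Markdown text)."""
    head, sep, markdown = body.partition(MARKER)
    if not sep:
        # No marker found: treat the whole body as Markdown.
        return {}, body
    meta: Dict[str, str] = {}
    for line in head.splitlines():
        # Split on the first ": " so values such as URLs stay intact.
        key, colon, value = line.partition(": ")
        if colon:
            meta[key.strip()] = value.strip()
    return meta, markdown.lstrip("\n")
```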
Title: Example Domain
URL Source: https://example.com
Description: This domain is for use in...
Markdown Content:
# Example Domain
This domain is for use in documentation
examples without needing permission.
[Learn more](https://iana.org/domains/example)
Extract structured data
POST a JSON schema to /extract. The LLM extracts matching fields from any page. Validated with Ajv.
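When the schema is a flat object, the request body can be generated from a field-to-type map. This is a minimal sketch: `extract_request` is a hypothetical helper, and the body shape (`url` plus `schema`) is taken from the curl example in this section.

```python
import json
from typing import Dict


def extract_request(url: str, fields: Dict[str, str]) -> str:
    """Build the JSON body for POST /extract from {field name: JSON type}."""
    schema = {
        "type": "object",
        "properties": {name: {"type": json_type} for name, json_type in fields.items()},
    }
    return json.dumps({"url": url, "schema": schema})
```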
curl -X POST https://md.succ.ai/extract \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"schema": {
"type": "object",
"properties": {
"title": { "type": "string" },
"heading": { "type": "string" }
}
}
}'
# → { "title": "Example Domain", "heading": "Example Domain" }
Batch & async
Convert up to 50 URLs in one request with POST /batch. Or use POST /async for background processing with optional webhook delivery.
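Because a batch request is capped at 50 URLs, a longer list has to be split client-side. A minimal sketch, assuming the `{"urls": [...]}` body used by the batch example in this section; `batch_payloads` is a hypothetical helper.

```python
import json
from typing import Iterator, List

MAX_BATCH = 50  # "up to 50 URLs in one request"


def batch_payloads(urls: List[str], size: int = MAX_BATCH) -> Iterator[str]:
    """Yield one JSON body per chunk of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield json.dumps({"urls": urls[start:start + size]})
```

Each yielded string can be sent as the body of a separate POST /batch request.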
# Batch: convert multiple URLs at once
curl -X POST https://md.succ.ai/batch \
-H "Content-Type: application/json" \
-d '{"urls": ["https://example.com", "https://htmx.org"]}'
# Async with webhook callback
curl -X POST https://md.succ.ai/async \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"callback_url": "https://your-api.com/hook"
}'
Connect via MCP
AI agents connect directly via Model Context Protocol. Streamable HTTP transport at /mcp. Works with Claude Desktop, Claude Code, and any MCP-compatible client.
# Claude Code (.mcp.json)
{
"mcpServers": {
"md-succ-ai": {
"type": "url",
"url": "https://md.succ.ai/mcp"
}
}
}
# Claude Desktop (claude_desktop_config.json)
{
"mcpServers": {
"md-succ-ai": {
"command": "npx",
"args": ["mcp-remote@latest", "https://md.succ.ai/mcp"]
}
}
}
Self-Hosting
Run your own instance
Docker image with Camoufox included. One command to deploy. No external dependencies, no API keys, no accounts.
Docker (recommended)
Clone the repo and start the container. Camoufox is bundled in the image. API at localhost:3100, MCP at localhost:3300.
git clone https://github.com/vinaes/md-succ-ai.git
cd md-succ-ai
docker compose up -d
Local (no Docker)
Install dependencies and start the server. Requires Node.js 20+ and Camoufox binaries.
npm install
npx camoufox-js fetch
npm start