md.succ.ai
Live · Open Source

url to markdown

Convert any webpage or document to clean Markdown. HTML, PDF, DOCX, XLSX, CSV, YouTube transcripts. MCP server for AI agents. TLS fingerprint impersonation. Built for RAG pipelines.

md.succ.ai/ · MCP Server · 200-500ms · 9-pass extraction · PDF / DOCX / YouTube

Why md.succ.ai

Clean content, not page cruft

MCP server for AI agents. 9 extraction passes. TLS fingerprint impersonation. Ad blocking. YouTube transcripts. Citation links. Structured extraction.

MCP Server
Native Model Context Protocol support. AI agents connect directly via Streamable HTTP at /mcp. Works with Claude Desktop, Claude Code, and any MCP client.
9-Pass Extraction
Readability, Defuddle, article-extractor, CSS selectors, Schema.org, Open Graph, text density — 9 extraction passes with quality ratio checks.
TLS Fingerprint Impersonation
Chrome TLS cipher ordering, full browser header profiles (Sec-Ch-Ua, Sec-Fetch-*). 10 rotating profiles across Chrome, Firefox, Edge, Safari.
YouTube Transcripts
YouTube URLs are automatically detected and transcripts extracted with timestamps. No browser needed — direct innertube API.
Ad & Tracker Blocking
Browser tier blocks ~35 ad/tracker domains and 7 resource types (images, fonts, media). Faster page loads, cleaner content extraction.
Citation Links
?links=citations converts inline links to numbered references with a footer. Saves tokens on repeated URLs. Academic-style output for LLMs.
fit_markdown Mode
?mode=fit prunes boilerplate sections — navigation, footers, low-value content. Smaller context for LLMs without losing signal.
Schema Extraction
POST /extract with a JSON schema. Returns structured data extracted by LLM. Any page, any schema — validated with Ajv.
Security Hardened
SSRF protection with DNS validation, private IP blocking, redirect validation. Prompt injection hardening. Schema field whitelist.
Batch Conversion
POST /batch with up to 50 URLs. Parallel processing with 10-worker concurrency. Per-URL errors and timeout protection.
Async + Webhooks
POST /async for background jobs. Poll /job/:id or receive results via HTTPS webhook callback. SSRF-protected callbacks.
RSS/Atom Feeds
Feed URLs auto-detected by content-type. Parsed into structured JSON with titles, links, dates, and descriptions.
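The citation-link rewrite described above can be sketched in a few lines. This is a minimal illustration of the idea (deduplicate repeated URLs into numbered footer references), not the service's actual implementation:

```python
import re

def to_citations(markdown: str) -> str:
    """Rewrite inline [text](url) links as numbered references
    with a footer, deduplicating repeated URLs to save tokens."""
    refs: dict[str, int] = {}

    def repl(m: re.Match) -> str:
        text, url = m.group(1), m.group(2)
        n = refs.setdefault(url, len(refs) + 1)  # reuse number for repeat URLs
        return f"{text} [{n}]"

    body = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", repl, markdown)
    footer = "\n".join(f"[{n}] {url}" for url, n in refs.items())
    return f"{body}\n\n{footer}" if refs else body
```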

Architecture

Multi-tier conversion pipeline

9-pass extraction, quality scoring, automatic fallback through 3 tiers. YouTube transcripts, document formats, citation links, and fit_markdown.

conversion pipeline
URL
 │
 ├─ YouTube? → Transcript extraction
 │
 ├─ Document? (PDF, DOCX, XLSX, CSV)
 │  └─ Document converter → Markdown
 │
 ├─ TLS fingerprint impersonation (Chrome ciphers)
 │
 ├─ Tier 1: 9-pass extraction (200-500ms)
 │  1. Readability (standard)
 │  2. Defuddle (Obsidian team)
 │  3. Article Extractor (alt heuristics)
 │  4. Readability on cleaned HTML
 │  5. CSS content selectors
 │  6. Schema.org / JSON-LD
 │  7. Open Graph / meta tags
 │  8. Text density analysis
 │  9. Cleaned body fallback
 │  Quality ratio check (< 15% → skip)
 │
 │  Quality ≥ B? → return Markdown
 │
 ├─ Tier 2: Camoufox browser (3-15s)
 │  ├─ Ad/tracker blocking (~35 domains)
 │  └─ Same 9-pass on rendered DOM
 │
 └─ Tier 2.5: LLM extraction
    └─ Llama 3.3 70B via nano-gpt

Post-processing:
 ├─ ?links=citations → numbered references
 ├─ ?mode=fit        → prune boilerplate
 └─ ?max_tokens=N    → truncate output
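The top of the diagram is a routing decision. It can be sketched as follows (hypothetical helper names; the real pipeline lives in the md-succ-ai repo):

```python
from urllib.parse import urlparse

DOC_EXTS = (".pdf", ".docx", ".xlsx", ".csv")

def route(url: str) -> str:
    """Decide which converter a URL should hit, mirroring the
    pipeline diagram: YouTube -> transcript, documents -> document
    converter, everything else -> tiered HTML extraction."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.endswith(("youtube.com", "youtu.be")):
        return "youtube"
    if parsed.path.lower().endswith(DOC_EXTS):
        return "document"
    return "html"  # Tier 1 fetch -> Tier 2 browser -> Tier 2.5 LLM
```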

Response Headers

x-markdown-tokens    Token count
x-conversion-tier    fetch | browser | llm | youtube | document:*
x-extraction-method  readability | defuddle | pdf | ...
x-quality-score      0-1
x-quality-grade      A-F
x-cache              hit | miss
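Clients can gate on these headers, for example rejecting a conversion whose grade is too low. A sketch, assuming the headers have been collected into a dict:

```python
def accept(headers: dict[str, str], min_grade: str = "C") -> bool:
    """Accept a conversion only if x-quality-grade is at or above
    min_grade (A best, F worst; missing header counts as F)."""
    grade = headers.get("x-quality-grade", "F")
    return "ABCDEF".index(grade) <= "ABCDEF".index(min_grade)
```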

Endpoints

GET  /{url}
GET  /?url={url}
POST /extract
POST /batch
POST /async
GET  /job/:id
GET  /health
GET  /openapi.json
POST /mcp
GET  /docs

Supported Formats

HTML        9-pass + Turndown
PDF         unpdf text extraction
DOCX        mammoth → Turndown
XLSX / CSV  SheetJS → tables
YouTube     transcript + timestamps
RSS / Atom  rss-parser → JSON

API

One URL, clean Markdown

No API key. No SDK. Just a URL. Works from curl, fetch, or any HTTP client.
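For example, building request URLs from Python (a sketch; the query parameters shown are the ones documented on this page):

```python
from urllib.parse import urlencode

BASE = "https://md.succ.ai/"

def md_url(target: str, **params: str) -> str:
    """Build a md.succ.ai request URL. With extra params, the
    ?url= form is used so they can ride alongside the target."""
    if params:
        return BASE + "?" + urlencode({"url": target, **params})
    return BASE + target  # plain prepend form
```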

01

Fetch any URL

Prepend md.succ.ai/ to any URL. YouTube links return transcripts automatically. Add query params for citation links or fit mode.

terminal
# Basic conversion
curl https://md.succ.ai/https://example.com

# YouTube transcript (quote the URL so the shell doesn't interpret ?)
curl "https://md.succ.ai/https://youtube.com/watch?v=dQw4w9WgXcQ"

# Citation-style links
curl "https://md.succ.ai/?url=https://en.wikipedia.org/wiki/Markdown&links=citations"

# LLM-optimized (pruned boilerplate)
curl "https://md.succ.ai/?url=https://htmx.org/docs/&mode=fit"
02

Get clean Markdown

9-pass extraction finds the content. Turndown converts to Markdown. Navigation, ads, and sidebars are stripped. Quality scored A-F.

response (text/markdown)
Title: Example Domain
URL Source: https://example.com
Description: This domain is for use in...

Markdown Content:
# Example Domain

This domain is for use in documentation
examples without needing permission.

[Learn more](https://iana.org/domains/example)
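The envelope above can be split back into fields and body. A sketch assuming exactly the layout shown:

```python
def parse_envelope(text: str) -> dict[str, str]:
    """Split a md.succ.ai text/markdown response into its header
    fields (Title, URL Source, ...) and the Markdown body."""
    head, _, body = text.partition("Markdown Content:")
    fields: dict[str, str] = {}
    for line in head.splitlines():
        if ": " in line:
            key, _, value = line.partition(": ")
            fields[key] = value
    fields["markdown"] = body.strip()
    return fields
```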
03

Extract structured data

POST a JSON schema to /extract. The LLM extracts matching fields from any page. Validated with Ajv.

terminal
curl -X POST https://md.succ.ai/extract \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "heading": { "type": "string" }
      }
    }
  }'

# → { "title": "Example Domain", "heading": "Example Domain" }
04

Batch & async

Convert up to 50 URLs in one request with POST /batch. Or use POST /async for background processing with optional webhook delivery.

terminal
# Batch: convert multiple URLs at once
curl -X POST https://md.succ.ai/batch \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com", "https://htmx.org"]}'

# Async with webhook callback
curl -X POST https://md.succ.ai/async \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "callback_url": "https://your-api.com/hook"
  }'
05

Connect via MCP

AI agents connect directly via Model Context Protocol. Streamable HTTP transport at /mcp. Works with Claude Desktop, Claude Code, and any MCP-compatible client.

mcp config
# Claude Code (.mcp.json)
{
  "mcpServers": {
    "md-succ-ai": {
      "type": "url",
      "url": "https://md.succ.ai/mcp"
    }
  }
}

# Claude Desktop (claude_desktop_config.json)
{
  "mcpServers": {
    "md-succ-ai": {
      "command": "npx",
      "args": ["mcp-remote@latest", "https://md.succ.ai/mcp"]
    }
  }
}

Self-Hosting

Run your own instance

Docker image with Camoufox included. One command to deploy. No external dependencies, no API keys, no accounts.

01

Docker (recommended)

Clone the repo and start the container. Camoufox is bundled in the image. API at localhost:3100, MCP at localhost:3300.

terminal
git clone https://github.com/vinaes/md-succ-ai.git
cd md-succ-ai
docker compose up -d
02

Local (no Docker)

Install dependencies and start the server. Requires Node.js 20+ and Camoufox binaries.

terminal
npm install
npx camoufox-js fetch
npm start

Environment Variables

PORT                     Server port (default: 3000)
ENABLE_BROWSER           Camoufox browser fallback (default: true)
NANOGPT_API_KEY          LLM tier API key (optional)
NANOGPT_MODEL            LLM model (default: llama-3.3-70b-instruct)
REDIS_URL                Redis connection (default: redis://redis:6379)
REDIS_PASSWORD           Redis auth (required for caching + async)
BROWSER_BLOCK_RESOURCES  Ad/tracker blocking (default: true)
PROXY_URLS               Proxy rotation URLs (optional, comma-separated)
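A minimal .env for a Docker deployment might look like this (values are illustrative, based on the defaults listed above):

```env
PORT=3000
ENABLE_BROWSER=true
REDIS_URL=redis://redis:6379
REDIS_PASSWORD=change-me
BROWSER_BLOCK_RESOURCES=true
# Optional LLM tier
# NANOGPT_API_KEY=your-key-here
```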