md.succ.ai
Live · Open Source

url to markdown

Convert any webpage or document to clean Markdown. HTML, PDF, DOCX, XLSX, CSV, YouTube transcripts. MCP server for AI agents. TLS fingerprint impersonation. Built for RAG pipelines.

md.succ.ai/ · MCP Server · 200-500ms · 9-pass extraction · PDF / DOCX / YouTube

Why md.succ.ai

Clean content, not page cruft

MCP server for AI agents. 9 extraction passes. TLS fingerprint impersonation. Ad blocking. YouTube transcripts. Citation links. Structured extraction.

MCP Server
Native Model Context Protocol support. AI agents connect directly via Streamable HTTP at /mcp. Works with Claude Desktop, Claude Code, and any MCP client.
9-Pass Extraction
Readability, Defuddle, article-extractor, CSS selectors, Schema.org, Open Graph, text density — 9 extraction passes with quality ratio checks.
TLS Fingerprint Impersonation
Chrome TLS cipher ordering, full browser header profiles (Sec-Ch-Ua, Sec-Fetch-*). 10 rotating profiles across Chrome, Firefox, Edge, Safari.
YouTube Transcripts
YouTube URLs are automatically detected and transcripts extracted with timestamps. No browser needed — direct innertube API.
Ad & Tracker Blocking
Browser tier blocks ~35 ad/tracker domains and 7 resource types (images, fonts, media). Faster page loads, cleaner content extraction.
Citation Links
?links=citations converts inline links to numbered references with a footer. Saves tokens on repeated URLs. Academic-style output for LLMs.
fit_markdown Mode
?mode=fit prunes boilerplate sections — navigation, footers, low-value content. Smaller context for LLMs without losing signal.
Schema Extraction
POST /extract with a JSON schema. Returns structured data extracted by LLM. Any page, any schema — validated with Ajv.
Security Hardened
SSRF protection with DNS validation, private IP blocking, redirect validation. Prompt injection hardening. Schema field whitelist.
Batch Conversion
POST /batch with up to 50 URLs. Parallel processing with 10-worker concurrency. Per-URL errors and timeout protection.
Async + Webhooks
POST /async for background jobs. Poll /job/:id or receive results via HTTPS webhook callback. SSRF-protected callbacks.
RSS/Atom Feeds
Feed URLs auto-detected by content-type. Parsed into structured JSON with titles, links, dates, and descriptions.
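The citation-link rewrite described above can be sketched in a few lines. This is a minimal illustration of the idea (deduplicate repeated URLs into numbered footer references), not the service's actual implementation:

```python
import re

def to_citations(markdown: str) -> str:
    """Rewrite inline [text](url) links as numbered references
    with a footer, deduplicating repeated URLs to save tokens."""
    refs: dict[str, int] = {}

    def repl(m: re.Match) -> str:
        text, url = m.group(1), m.group(2)
        n = refs.setdefault(url, len(refs) + 1)  # reuse number for repeat URLs
        return f"{text} [{n}]"

    body = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", repl, markdown)
    footer = "\n".join(f"[{n}] {url}" for url, n in refs.items())
    return f"{body}\n\n{footer}" if refs else body
```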

Architecture

Multi-tier conversion pipeline

9-pass extraction, quality scoring, automatic fallback through 3 tiers. YouTube transcripts, document formats, citation links, and fit_markdown.

conversion pipeline
URL
 │
 ├─ YouTube? → Transcript extraction
 │
 ├─ Document? (PDF, DOCX, XLSX, CSV)
 │  └─ Document converter → Markdown
 │
 ├─ TLS fingerprint impersonation (Chrome ciphers)
 │
 ├─ Tier 1: 9-pass extraction (200-500ms)
 │  1. Readability (standard)
 │  2. Defuddle (Obsidian team)
 │  3. Article Extractor (alt heuristics)
 │  4. Readability on cleaned HTML
 │  5. CSS content selectors
 │  6. Schema.org / JSON-LD
 │  7. Open Graph / meta tags
 │  8. Text density analysis
 │  9. Cleaned body fallback
 │  Quality ratio check (< 15% → skip)
 │
 │  Quality ≥ B? → return Markdown
 │
 ├─ Tier 2: Camoufox browser (3-15s)
 │  ├─ Ad/tracker blocking (~35 domains)
 │  └─ Same 9-pass on rendered DOM
 │
 └─ Tier 2.5: LLM extraction
    └─ Llama 3.3 70B via nano-gpt

Post-processing:
 ├─ ?links=citations → numbered references
 ├─ ?mode=fit        → prune boilerplate
 └─ ?max_tokens=N    → truncate output
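The top of the diagram is a routing decision. It can be sketched as follows (hypothetical helper names; the real pipeline lives in the md-succ-ai repo):

```python
from urllib.parse import urlparse

DOC_EXTS = (".pdf", ".docx", ".xlsx", ".csv")

def route(url: str) -> str:
    """Decide which converter a URL should hit, mirroring the
    pipeline diagram: YouTube -> transcript, documents -> document
    converter, everything else -> tiered HTML extraction."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.endswith(("youtube.com", "youtu.be")):
        return "youtube"
    if parsed.path.lower().endswith(DOC_EXTS):
        return "document"
    return "html"  # Tier 1 fetch -> Tier 2 browser -> Tier 2.5 LLM
```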

Response Headers

x-markdown-tokens    Token count
x-conversion-tier    fetch | browser | llm | youtube | document:*
x-extraction-method  readability | defuddle | pdf | ...
x-quality-score      0-1
x-quality-grade      A-F
x-cache              hit | miss
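Clients can gate on these headers, for example rejecting a conversion whose grade is too low. A sketch, assuming the headers have been collected into a dict:

```python
def accept(headers: dict[str, str], min_grade: str = "C") -> bool:
    """Accept a conversion only if x-quality-grade is at or above
    min_grade (A best, F worst; missing header counts as F)."""
    grade = headers.get("x-quality-grade", "F")
    return "ABCDEF".index(grade) <= "ABCDEF".index(min_grade)
```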

Endpoints

GET  /{url}
GET  /?url={url}
POST /extract
POST /batch
POST /async
GET  /job/:id
GET  /health
GET  /openapi.json
POST /mcp
GET  /docs

Supported Formats

HTML        9-pass + Turndown
PDF         unpdf text extraction
DOCX        mammoth → Turndown
XLSX / CSV  SheetJS → tables
YouTube     transcript + timestamps
RSS / Atom  rss-parser → JSON

API

One URL, clean Markdown

No API key. No SDK. Just a URL. Works from curl, fetch, or any HTTP client.
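For example, building request URLs from Python (a sketch; the query parameters shown are the ones documented on this page):

```python
from urllib.parse import urlencode

BASE = "https://md.succ.ai/"

def md_url(target: str, **params: str) -> str:
    """Build a md.succ.ai request URL. With extra params, the
    ?url= form is used so they can ride alongside the target."""
    if params:
        return BASE + "?" + urlencode({"url": target, **params})
    return BASE + target  # plain prepend form
```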

01

Fetch any URL

Prepend md.succ.ai/ to any URL. YouTube links return transcripts automatically. Add query params for citation links or fit mode.

terminal
# Basic conversion
curl https://md.succ.ai/https://example.com

# YouTube transcript (quote the URL so the shell doesn't interpret ?)
curl "https://md.succ.ai/https://youtube.com/watch?v=dQw4w9WgXcQ"

# Citation-style links
curl "https://md.succ.ai/?url=https://en.wikipedia.org/wiki/Markdown&links=citations"

# LLM-optimized (pruned boilerplate)
curl "https://md.succ.ai/?url=https://htmx.org/docs/&mode=fit"
02

Get clean Markdown

9-pass extraction finds the content. Turndown converts to Markdown. Navigation, ads, and sidebars are stripped. Quality scored A-F.

response (text/markdown)
Title: Example Domain
URL Source: https://example.com
Description: This domain is for use in...

Markdown Content:
# Example Domain

This domain is for use in documentation
examples without needing permission.

[Learn more](https://iana.org/domains/example)
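The envelope above can be split back into fields and body. A sketch assuming exactly the layout shown:

```python
def parse_envelope(text: str) -> dict[str, str]:
    """Split a md.succ.ai text/markdown response into its header
    fields (Title, URL Source, ...) and the Markdown body."""
    head, _, body = text.partition("Markdown Content:")
    fields: dict[str, str] = {}
    for line in head.splitlines():
        if ": " in line:
            key, _, value = line.partition(": ")
            fields[key] = value
    fields["markdown"] = body.strip()
    return fields
```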
03

Extract structured data

POST a JSON schema to /extract. The LLM extracts matching fields from any page. Validated with Ajv.

terminal
curl -X POST https://md.succ.ai/extract \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "heading": { "type": "string" }
      }
    }
  }'

# → { "title": "Example Domain", "heading": "Example Domain" }
04

Batch & async

Convert up to 50 URLs in one request with POST /batch. Or use POST /async for background processing with optional webhook delivery.

terminal
# Batch: convert multiple URLs at once
curl -X POST https://md.succ.ai/batch \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com", "https://htmx.org"]}'

# Async with webhook callback
curl -X POST https://md.succ.ai/async \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "callback_url": "https://your-api.com/hook"
  }'
05

Connect via MCP

AI agents connect directly via Model Context Protocol. Streamable HTTP transport at /mcp. Works with Claude Desktop, Claude Code, and any MCP-compatible client.

mcp config
# Claude Code (.mcp.json)
{
  "mcpServers": {
    "md-succ-ai": {
      "type": "url",
      "url": "https://md.succ.ai/mcp"
    }
  }
}

# Claude Desktop (claude_desktop_config.json)
{
  "mcpServers": {
    "md-succ-ai": {
      "command": "npx",
      "args": ["mcp-remote@latest", "https://md.succ.ai/mcp"]
    }
  }
}

Self-Hosting

Run your own instance

Docker image with Camoufox included. One command to deploy. No external dependencies, no API keys, no accounts.

01

Docker (recommended)

Clone the repo and start the container. Camoufox is bundled in the image. API at localhost:3100, MCP at localhost:3300.

terminal
git clone https://github.com/vinaes/md-succ-ai.git
cd md-succ-ai
docker compose up -d
02

Local (no Docker)

Install dependencies and start the server. Requires Node.js 20+ and Camoufox binaries.

terminal
npm install
npx camoufox-js fetch
npm start

Environment Variables

PORT                     Server port (default: 3000)
ENABLE_BROWSER           Camoufox browser fallback (default: true)
NANOGPT_API_KEY          LLM tier API key (optional)
NANOGPT_MODEL            LLM model (default: llama-3.3-70b-instruct)
REDIS_URL                Redis connection (default: redis://redis:6379)
REDIS_PASSWORD           Redis auth (required for caching + async)
BROWSER_BLOCK_RESOURCES  Ad/tracker blocking (default: true)
PROXY_URLS               Proxy rotation URLs (optional, comma-separated)
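A minimal .env for a Docker deployment might look like this (values are illustrative, based on the defaults listed above):

```env
PORT=3000
ENABLE_BROWSER=true
REDIS_URL=redis://redis:6379
REDIS_PASSWORD=change-me
BROWSER_BLOCK_RESOURCES=true
# Optional LLM tier
# NANOGPT_API_KEY=your-key-here
```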