Show HN: local_faiss_mcp – A tiny MCP server for local RAG (FAISS and MiniLM)
I built this because I got frustrated with the current state of "local" RAG. It felt like I had to spin up a Docker container, configure a vector DB, and manage an ingestion pipeline just to let Claude ask questions about a few PDFs in a folder.
We seem to have turned "grep with semantics" into a microservices architecture problem.
What this is: local_faiss_mcp is a minimal MCP (Model Context Protocol) server that wraps FAISS and sentence-transformers. It runs entirely locally (no API keys, no external services) and connects to Claude Desktop over stdio.
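Setup is a single entry in Claude Desktop's claude_desktop_config.json. A minimal sketch, assuming a server name of "local_faiss" and a placeholder path (adjust both to wherever you cloned the repo):

    {
      "mcpServers": {
        "local_faiss": {
          "command": "python",
          "args": ["/path/to/local_faiss_mcp/server.py"]
        }
      }
    }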
How it works:
You run server.py (Claude Desktop launches it automatically via the config entry above).
It uses all-MiniLM-L6-v2 (on CPU) to embed text.
It stores the vectors in a flat FAISS index on disk, alongside a JSON metadata file.
It exposes two tools to the LLM: ingest_document and query_rag_store (both sketched right after this list).
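To give a feel for how little is going on under the hood, here is a minimal sketch of the ingest/query path. The function names, file names, and the choice of inner-product metric are my illustrative assumptions, not necessarily what the repo ships:

    import json
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # CPU-friendly, 384-dim embeddings

    def ingest(chunks, index_path="index.faiss", meta_path="meta.json"):
        # Embed and L2-normalize so inner product behaves like cosine similarity.
        vecs = model.encode(chunks, normalize_embeddings=True).astype("float32")
        index = faiss.IndexFlatIP(vecs.shape[1])  # flat index: exact search, no training step
        index.add(vecs)
        # Rebuilt from scratch each call to keep the sketch short; the real
        # server presumably handles incremental adds.
        faiss.write_index(index, index_path)
        with open(meta_path, "w") as f:
            json.dump({str(i): c for i, c in enumerate(chunks)}, f)

    def query(text, k=3, index_path="index.faiss", meta_path="meta.json"):
        index = faiss.read_index(index_path)
        with open(meta_path) as f:
            meta = json.load(f)
        qvec = model.encode([text], normalize_embeddings=True).astype("float32")
        scores, ids = index.search(qvec, k)
        # ids can contain -1 when k exceeds the number of stored vectors.
        return [(float(s), meta[str(i)]) for s, i in zip(scores[0], ids[0]) if i != -1]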
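Exposing those as MCP tools over stdio is then only a few lines with the official mcp Python SDK. This sketch uses the SDK's FastMCP helper for brevity; the actual server.py may be structured differently, and chunk_file is a hypothetical helper I made up here:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("local_faiss")  # server name shown to the client

    def chunk_file(path: str, size: int = 800, overlap: int = 100) -> list[str]:
        # Naive fixed-size chunking; this is exactly the part I'd like ideas on.
        text = open(path, encoding="utf-8").read()
        return [text[i:i + size] for i in range(0, len(text), size - overlap)]

    @mcp.tool()
    def ingest_document(path: str) -> str:
        """Chunk, embed, and index a local text file."""
        chunks = chunk_file(path)
        ingest(chunks)  # from the sketch above
        return f"Indexed {len(chunks)} chunks from {path}"

    @mcp.tool()
    def query_rag_store(question: str, k: int = 3) -> str:
        """Return the k chunks most similar to the question."""
        hits = query(question, k)  # from the sketch above
        return "\n\n".join(f"[{score:.2f}] {chunk}" for score, chunk in hits)

    if __name__ == "__main__":
        mcp.run(transport="stdio")  # Claude Desktop talks to this over stdin/stdout

A nice property of FastMCP is that the type hints and docstrings become the tool schema the client sees, so there's no separate schema file to maintain.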
The stack:
Python
mcp (Python SDK)
faiss-cpu
sentence-transformers
It’s intended for personal workflows (notes, logs, specs) where you want persistent memory for an agent without the infrastructure overhead.
Repo: https://github.com/nonatofabio/local_faiss_mcp
I’d love feedback on the implementation. In particular: is there a better way to handle the chunking logic without bloating the dependencies, and does anyone run into performance issues with larger indices (10k+ vectors)?