Show HN: SafeKey – PII redaction for LLM inputs (text, image, audio, video)

safekeylab.com

4 points by safekeylab 4 hours ago

Hey HN, I built SafeKey after handling patient data as an Army medic and then doing AI research at Cornell. Every time we tried to use LLMs with sensitive data, something leaked. Existing tools only covered text, at roughly 85% accuracy, and nothing worked across modalities.

SafeKey is an AI input firewall: it sits between your app and the model, redacting PII before data leaves your environment. What we built:

PII Guard: 99%+ accuracy across text, images, audio, and video

AI Guard: blocks prompt injection and jailbreaks (95%+ F1, zero false positives)

Agent Security: protects autonomous AI workflows

RAG Security: secures retrieval-augmented generation pipelines

Sub-30ms latency. Drop-in SDK for OpenAI, Anthropic, Azure, AWS Bedrock. Runs in your VPC or our cloud.
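
Here's an illustrative sketch of the pattern (not our actual SDK API; the firewall function and regexes below are made-up stand-ins, and the real pipeline is model-based and multimodal):

    import os, re
    from openai import OpenAI  # standard OpenAI Python client

    # Toy firewall standing in for SafeKey: strip PII before the
    # prompt leaves your environment, then call the model as usual.
    SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
    EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

    def firewall(text: str) -> str:
        return EMAIL.sub("[EMAIL]", SSN.sub("[SSN]", text))

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": firewall(
            "Summarize: patient SSN 123-45-6789, contact jane@example.com."
        )}],
    )
    print(resp.choices[0].message.content)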

Would love feedback on the approach. Happy to answer questions.

Thanks, Sukin

tonetegeatinst 3 hours ago

Awesome tool and team you have.

A few questions I thought of, and I apologize if they seem stupid, as ML is not my focus of study.

1. Has your team ever considered formal verification of the code, to show how reliable your process is?

2. If data has been removed via your pipeline, is it still possible to infer the type of data from its position or format? (Names tend to sit in certain places in a sentence, and the way a value is formatted could reveal that it's a date or timestamp.)

3. You mentioned clients can deploy into their own VPC. Does that make this a FedRAMP-ready product? (Do you see this tool being offered to public institutions?)

4. Do you have any internship openings for college students in the summer of 2026?

  • safekeylab 3 hours ago

    Thanks! Great questions.

    Formal verification: We haven't pursued formal verification itself yet. So far we've validated the pipeline through pilot deployments, where customers' CS/DevOps teams have stress-tested it in production.

    Positional inference: Good catch. We replace PII with type-consistent tokens (e.g., [NAME], [DATE]) so format is preserved for downstream tasks, but the actual value is gone. For higher security, we offer synthetic replacement (fake but realistic values) so position and format don't leak information.
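
    To make that concrete, here's a toy sketch of the two modes (a couple of regexes for illustration; nothing like the production pipeline):

        import re, random

        PATTERNS = {
            "NAME": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),  # naive: two capitalized words
            "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
        }

        def synthetic(label):
            # fake-but-realistic values, so position and format reveal nothing new
            if label == "NAME":
                return random.choice(["Alex Morgan", "Jordan Reyes", "Sam Carter"])
            return f"{random.randint(1, 12):02d}/{random.randint(1, 28):02d}/{random.randint(1950, 2005)}"

        def redact(text, mode="token"):
            for label, pat in PATTERNS.items():
                if mode == "token":
                    text = pat.sub(f"[{label}]", text)  # type-consistent placeholder
                else:
                    text = pat.sub(lambda m, l=label: synthetic(l), text)
            return text

        s = "John Smith was admitted on 03/14/1988."
        print(redact(s, "token"))      # [NAME] was admitted on [DATE].
        print(redact(s, "synthetic"))  # e.g. Sam Carter was admitted on 07/09/1963.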

    FedRAMP: Not yet certified, but the architecture supports it. Everything runs inside the customer's VPC, no data leaves the environment, and we keep full audit logs. FedRAMP and StateRAMP are on our compliance roadmap after SOC 2 and HIPAA. Yes, the public sector is a major target market.

    Internships: Not formally open yet, but email me at sukin@safekeylab.com — always interested in students working on AI security.

vunderba 3 hours ago

The only link on the site to a source repository is a GitHub repo that 404s.

https://github.com/safekeylab

EDIT: Manually searching GitHub leads to https://github.com/sukincornell/safekeylab (assuming that is the correct one)

  • safekeylab 3 hours ago

    Thanks for flagging. We're not open source; that GitHub link shouldn't have been on the site, and we're removing it now. We offer a private SDK for customers. If you want to test it, create an account on the website or ping me at sukin@safekeylab.com.