We have developed DocStrange to create LLM-ready data from images and PDFs. We have also open-sourced a fine-tuned 3B model. You can try both the open-source and private models in the demo.
This model is an improvement over our last open-source model. We have fixed some of the issues the community ran into and added some of the requested features (handwriting, multilingual support).
The models are trained on 3 million documents, including handwritten documents, financial reports, complex tables, and documents with watermarks and stamps. Feel free to try it and share feedback.
HF: https://huggingface.co/nanonets/Nanonets-OCR2-3B
Demo: https://docstrange.nanonets.com/
Blog: https://nanonets.com/research/nanonets-ocr-2/
Do you guys also provide API support? I am processing documents for a project.
Yeah, we do have API support. Currently, you can process 10k documents per month for free. Let me know if you face any issues.
Is it better than Gemini Pro or Flash? Do you have any benchmarking data? I want to use it to generate Markdown from scanned PDFs.
We have evaluated it against Gemini-2.5-flash. You can check the benchmarks here: https://nanonets.com/research/nanonets-ocr-2/#markdown-evalu...