Curlie web directory download – 2.9M editor-approved websites for your AI

KnowledgeWeaver an hour ago

I initiated the directory download because we had a huge trove of data about the entire internet that, until now, could only be accessed through thousands of manual clicks.

Now the global community behind the Curlie.org internet directory offers the entire directory for download (open source, licensed CC-BY).

The directory contains editorial descriptions of 2.9 million hand-picked websites, sorted into categories.

Want to start your own search engine or LLM, but lack the resources to find quality training data? Just use the directory data and crawl the listed websites.

In fact, our scientific partner OpenWebIndex.eu has already integrated the directory data into its open search index, so even the crawling is already done for you.

This is the Curlie community's contribution to a knowledge infrastructure that is distributed, open source, and under the users' control!

(The nitty-gritty technical details are in the download itself, or ask here.)