Show HN: E-commerce data from 100k stores that is refreshed daily

searchagora.com

9 points by astronautmonkey 11 hours ago

Hi HN! I'm building Agora, an AI search engine for e-commerce that returns results in under 300ms. We've indexed 30M products from 100k stores and made them easy to purchase using AI agents.

After launching here on HN, a large enterprise reached out to pay for access to the raw data. We serviced the contract manually to learn the exact workflow and then decided to productize the "Data Connector" to help us scale to more customers.

The Data Connector enables developers to select any of our 100k stores in the index, view sample data, format the output, and export the up-to-date data. Data can be exported as CSV or JSON.
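
As a rough illustration of the "format the output, then export" step, here is a minimal Python sketch. It is not Agora's code: the function name `export_products`, the field names, and the sample record are all hypothetical, and it only assumes the records are already available as plain dictionaries.

```python
import csv
import io
import json


def export_products(products: list[dict], fields: list[str], fmt: str = "json") -> str:
    """Serialize product records as JSON or CSV, keeping only the chosen fields.

    Hypothetical helper: `fields` stands in for the "format the output" step.
    """
    # Project every record onto the selected columns; missing values become "".
    rows = [{field: product.get(field, "") for field in fields} for product in products]

    if fmt == "json":
        return json.dumps(rows, indent=2)

    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()

    raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = [{"title": "Mug", "price": 12.5, "in_stock": True}]
    print(export_products(sample, ["title", "price"], fmt="csv"))
```

Projecting onto an explicit field list before serializing keeps the CSV header and the JSON keys in sync, so the same selection works for both formats.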

We've built crawlers for Shopify, WooCommerce, Squarespace, Wix, and custom-built stores to index the store information, product data, stock, reviews, and more. The primary technical challenge is recrawling the entire dataset every 24 hours. We do this with a series of servers that each "recrawl" a different store type through rotating local proxies and then add any changes to a queue, from which our search index is updated. Our primary database is Mongo and our search runs on self-hosted Meilisearch on high-RAM servers.
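
The "recrawl, then queue only the changes" idea can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the actual pipeline: the hashing scheme, the `diff_crawl` function, the in-memory `deque`, and the `("upsert" | "delete", payload)` message shape are all hypothetical stand-ins for whatever the real crawlers and queue use.

```python
import hashlib
import json
from collections import deque


def fingerprint(product: dict) -> str:
    """Stable content hash of a product record (key order does not matter)."""
    payload = json.dumps(product, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_crawl(previous: dict[str, str], crawled: list[dict], queue: deque) -> dict[str, str]:
    """Compare a fresh crawl with the last known hashes and enqueue only changes.

    `previous` maps product id -> hash from the prior crawl.
    Returns the new id -> hash map to store for the next run.
    """
    current: dict[str, str] = {}
    for product in crawled:
        product_id = product["id"]
        digest = fingerprint(product)
        current[product_id] = digest
        # New or modified products need to be (re)indexed.
        if previous.get(product_id) != digest:
            queue.append(("upsert", product))

    # Products that disappeared from the store should leave the index.
    for product_id in previous.keys() - current.keys():
        queue.append(("delete", {"id": product_id}))

    return current


if __name__ == "__main__":
    updates: deque = deque()
    state = diff_crawl({}, [{"id": "a", "price": 10}, {"id": "b", "price": 5}], updates)
    updates.clear()
    diff_crawl(state, [{"id": "a", "price": 12}], updates)
    print(list(updates))
```

Hashing each record means an unchanged product costs one comparison and never reaches the index, which matters when most of a daily recrawl is expected to be unchanged. A separate worker would drain the queue and apply the upserts and deletes to the search index.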

My vision is to index the world's e-commerce data. I believe this will create market efficiencies for customers, developers, and merchants.

I'd love your feedback!

eastbayjake 4 hours ago

Couple thoughts for you:

(1) What are the use cases you envision? I can see the value for a really large marketplace in having a ton of pricing data, or the value to a hedge fund etc in having raw data to analyze macro trends... what is the use case for someone paying $200/month for the developer tier? (If I'm a retailer myself I probably only need data on my direct competitors, unless there's something cool you're imagining that I've failed to see.)

(2) You've got some logos on the store splash that don't show up in store search (e.g. Nike). Is that a data error or a coding error?

(3) You should probably think about how you scrape and categorize marketplace data... the Walmart tab has a lot of products that are clearly third-party sellers transacting via walmart.com, which pollutes quite a bit of the data value if I primarily want to know what a big retailer is doing on products where they actually set the prices.

(4) Have you looked at grocery data? Have wished someone would build a grocery prices API for like a decade now... lots of cool consumer and hedge-fund monetization opportunities if you can show the price of strawberries in every store across the US (and graph the trendlines over time).

  • astronautmonkey 3 hours ago

    Thanks for checking it out!

    1. Here are the use-cases we've seen so far: marketplaces, search apps, fashion try-on apps, shopping agents, general purpose agents, web search for LLMs, e-commerce aggregators, hedge funds, etc. The most surprising has been new discovery experiences. Here's an example of an app that uses our data: https://www.forbes.com/sites/charliefink/2025/06/04/glance-a...

    2. Great catch. We need to make this clearer on the site: we provide ~100k stores out of the box but keep the bigger brands behind an Enterprise paywall. We're working on fixing this.

    3. Absolutely. On the home page, we have purposely separated searching our core index from searching Amazon, Walmart, etc. from within Agora. We haven't indexed products from the major marketplaces yet because of this challenge. Generally, we also focus on direct sellers and have filters in our crawler to screen out resellers.

    4. Haven't looked at this, but it sounds interesting. It's similar to how we think about storing e-commerce data with price history over time.

    I'd love to chat more. I'm at param [at] searchagora.com if you want to reach out.

amcunicorns 11 hours ago

Nice idea! Sounds like a lot of servers are needed to pull this off.

  • astronautmonkey 11 hours ago

    Thank you! And yes, the number of servers needed to scale from 100k to 1M stores (the next goal) will be significant.