Best vector DB benchmark I have seen, and solid benchmark design, but it would have been good if you had shown who the competitors are in the graphs instead of anonymizing the numbers.
Hi, I'm Jergus, one of the founders of TopK. We can't share the results publicly, but we're happy to share them privately (@jerguslejko on twitter, or jergus@topk.io).
My friends benchmarked managed vector databases under production-like conditions: high-throughput ingest, concurrent queries, filtering, and read–write mixed workloads.
The post includes the methodology, the dataset, and the open-source tool they published for running the benchmarks.
Is pgvector one of the systems you tested, or was it intentionally left out?
We didn’t include pgvector because we focused on managed services to keep things comparable. TopK is managed/serverless, so the fair match would be a managed Postgres. And pgvector just doesn’t really scale to the kinds of workloads we ran here.
Feels like a sales pitch, only because of the abstraction of Provider A, B, C instead of actually naming the products. Guess that's what you get with a vendor blog.
Hey, author of the post here.
We're actually not allowed to post head-to-head comparisons with competitors or share their names; that's why :) The post contains the dataset, the tool, and the methodology for how the data was collected, which hopefully gives confidence in the fairness of the benchmark.