The state of "super fast inference" is frustrating
I am talking about the three providers I know of that claim super fast inference: Groq, Cerebras, and SambaNova. Each of them claims inference speeds of several hundred tokens per second on reasonably large models, and each has a chat demo on its website that appears to confirm those numbers.
However, for many months now, each of those providers has shown essentially the same API access page, where only a free tier with low rate limits is available. Everything else is "Coming Soon". No updates, no dates, no estimates, nothing.
Come to think of it, there is not a single good inference provider in the whole open-source model space that offers an unthrottled paid API delivering over 50 tokens/second consistently. There's money to be made here, and surprisingly nobody is going after it aggressively.
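For what it's worth, the advertised numbers are easy enough to sanity-check yourself. Here's a rough sketch that measures decode throughput against an OpenAI-compatible streaming endpoint; the base URL, API key, and model name are placeholders you'd swap for whichever provider you're testing, and counting streamed chunks is only an approximation of the true token count.

```python
# Rough throughput check against an OpenAI-compatible endpoint.
# Base URL, API key, and model name are placeholders, not endorsements.
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # swap for Cerebras / SambaNova / etc.
    api_key="YOUR_API_KEY",
)

def measure_tps(model: str, prompt: str) -> float:
    """Return approximate tokens/second of the decode phase (excludes time to first token)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=512,
    )
    first_token_time = None
    chunks = 0
    for chunk in stream:
        # Some chunks (e.g. the final usage chunk) carry no content; skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_time is None:
                first_token_time = time.perf_counter()
            chunks += 1
    elapsed = time.perf_counter() - first_token_time
    return chunks / elapsed if elapsed > 0 else float("nan")

print(f"~{measure_tps('llama-3.1-70b-versatile', 'Explain TCP slow start.'):.1f} tok/s")
```

Run it a few times on the free tier and you'll see exactly the gap I'm complaining about: the per-request speed is real, but the rate limits make it useless for anything beyond a demo.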