Why would I care? I write libraries to help humans write code. If an LLM was actually good at writing code, it would engage with the library in the same way as a human. If it can't, that's not a problem with the library. I don't lose anything if an LLM doesn't use a library I've written.
Perhaps you shouldn't care, although I find this to be a short-sighted perspective.
Agents are users now, and agent-friendly docs and libraries will be standard practice for the tools that want to thrive through broader industry adoption.
Similarly, perhaps you wouldn't care if your website wasn't easily parsable by crawlers, but if you'd like your work to appear in search results you might like to include a sitemap or structure the HTML a little.
I couldn't agree more. I don't want my code included in crappy vibe coded software.
That's not how FOSS works.
If coding agents are the new entry point to your library, how sure are you that they’re using it well?
I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.
Existing code generation benchmarks focus mainly on self-contained code snippets and compare models, not agents. Almost none focus on library-specific generation.
So we built a simple app to test how well coding agents interact with libraries:
• Takes your library’s docs
• Automatically extracts usage examples
• Tasks AI agents (like Claude Code) with generating those examples from scratch
• Logs mistakes and analyzes performance
We’re testing libraries now, but it’s early days. If you're interested: Input your library, see what breaks, spot patterns, and share the results below.
We plan to expand to more coding agents, more library-specific tasks, and new metrics. Let us know what we should prioritize next.
> If coding agents are the new entry point to your library, how sure are you that they’re using it well?
> I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.
Why should they even bother to answer such a loaded and hypothetical question?
I'm paraphrasing; the questions I asked dev tool builders were more neutral.
If making dev tooling is selling shovels to the miners, then this is like selling sheet metal to the shovel makers.
Yeah. Feels like a data mining operation for training data.
I could be wrong.
Note that this comment is not hijacking. The author of this comment is also the author of the post.
That's the more likely assumption. Accounts with only self-promotion spam activity have become more the rule here than the exception.
Let’s meet and see if it might make sense for us to team up. We’re working on this from the agent/library-specific-task side, and we might be better than chatgpt at marketing your product :)
Why do we need to log in?
We send out an email when the tests are finished (takes about 30 minutes).
That makes you sound like you are dodging the question.
I mean that we wanted an email address to send the results to when they finish.
Based on comments here, I do think we should allow users to run the audit first (and provide an email address if they want us to follow up with results later).
IMO a tool like this doesn’t make sense until the hallucination problem is fixed
Was this vibe-coded itself? Just wondering, because the login screen had this warning:
> Dev Keys: One or more of your connections are currently using Auth0 development keys and should not be used in production.
In Elixir land, the Ash Framework created a package called usage_rules[0] as an experimental attempt to solve this problem a few months ago. The latest version of the Phoenix Framework (1.8) includes it in their `mix phx.new` generator and in their own hex packages[1]. Library owners would need to add their own usage rules, but it seems to help even for just the core packages Phoenix includes.
[0] https://hexdocs.pm/usage_rules/readme.html
[1] https://github.com/phoenixframework/phoenix/tree/main/usage-...
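If you want to try it, the flow is roughly this (the dependency version and task invocation are my best reading of the README, so double-check [0] for the exact names):

    # mix.exs — add the dev-only dependency (version is a guess)
    {:usage_rules, "~> 0.1", only: [:dev]}

    # then collect your deps' usage rules into the file your agent reads
    mix deps.get
    mix usage_rules.sync CLAUDE.md --all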
What I often do, and it works completely, is tell Claude Code to git-clone the library in question. Then I /add-dir it, so I can tell it to understand the code from the actual source, at the exact version I'm using.
You often don't even need to clone. In the Ruby ecosystem, for instance, `bundle show pundit` tells you `/Users/vemv/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/pundit-2.3.2`, which is a vanilla (non-zipped) directory, ready to be added to the Claude context.
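As a sketch of the whole flow (the GitHub URL and tag name are from memory, so verify them against your Gemfile.lock):

    # clone the library and check out the tag matching the version you use
    git clone https://github.com/varvet/pundit.git /tmp/pundit
    git -C /tmp/pundit checkout v2.3.2   # tag name assumed; match your lockfile

    # then, inside Claude Code:
    #   /add-dir /tmp/pundit

    # or skip the clone and point /add-dir at the installed gem directly
    bundle show pundit
    # => /Users/you/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/pundit-2.3.2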
It's a neat idea. But if, as we're told, LLMs will get better and better, something like this, in theory, will be increasingly unnecessary.
I feel like most of the problems with AI using a library come from how we mix specification and implementation. C and C++ got it right (even if by accident) with separating specification from implementation.
Instead of lamenting the design trend of not maintaining this split, I wrote a utility to extract specifications from my own existing code.
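For anyone who hasn't worked in C, the split looks like this (a minimal, made-up example): the header is the specification an agent or a human can read without ever opening the implementation.

    /* stack.h -- the specification: types and contracts, nothing else */
    #ifndef STACK_H
    #define STACK_H
    #include <stddef.h>

    typedef struct Stack Stack;            /* opaque handle; fields stay private */

    Stack *stack_new(size_t capacity);     /* returns NULL on allocation failure */
    int    stack_push(Stack *s, int v);    /* 0 on success, -1 if full */
    int    stack_pop(Stack *s, int *out);  /* 0 on success, -1 if empty */
    void   stack_free(Stack *s);

    #endif

    /* stack.c -- the implementation a caller (or an LLM) never needs to read */
    #include <stdlib.h>
    #include "stack.h"

    struct Stack { size_t len, cap; int *data; };

    Stack *stack_new(size_t capacity) {
        Stack *s = malloc(sizeof *s);
        if (!s) return NULL;
        s->data = malloc(capacity * sizeof *s->data);
        if (!s->data) { free(s); return NULL; }
        s->len = 0;
        s->cap = capacity;
        return s;
    }

    int stack_push(Stack *s, int v) {
        if (s->len == s->cap) return -1;
        s->data[s->len++] = v;
        return 0;
    }

    int stack_pop(Stack *s, int *out) {
        if (s->len == 0) return -1;
        *out = s->data[--s->len];
        return 0;
    }

    void stack_free(Stack *s) {
        if (s) { free(s->data); free(s); }
    }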
> It's a neat idea. But if, as we're told, LLMs will get better and better, something like this, in theory, will be increasingly unnecessary.
I don't think so. I think understanding the context of a project will always produce superior results. I think instead we'll just make it a lot easier to add to the training corpus the LLM pulls from.
Respectfully, I disagree. It is much faster and cheaper to direct an LLM to add a call to a battle-tested library that encapsulates complex logic than it is to design and implement that logic from scratch, even if it’s capable of that.
We’re betting on almost the exact opposite idea: we can make agentic software engineering cheaper and more reliable by making it easy for LLMs to write, find, and integrate libraries and other third party software.
I get so much hallucination from gpt-5 about library APIs ...
Please write the APIs for people, so we can go and see where the LLM fails.
If you write it for the LLM, the whole thing is broken the moment it starts lying. And if it's not written for me, I don't go digging in; I'd rather use something else.
I've done a lot of work recently to make my library more "LLM Friendly", but I'm not willing at this time to sign up to a service which I don't know I'd ever use again just to run a test on your behalf. If you want to run the test on my library then its GitHub can be found here: https://github.com/KaliedaRik/Scrawl-canvas
We’ve been working on this problem off and on for over a year now. Models bake knowledge of some particular tools/libraries/patterns into their weights very well, and of others quite poorly. In my experience, Claude is quite good at integrating the dog.ceo API, noticeably ignorant when it comes to Postgres features, and knows gcloud commands just well enough to hallucinate arguments very confidently and consistently.
We’ve baked a solution to this into our product, so if anybody is working on an API/SDK/etc feel free to contact me if your users are running into problems using LLMs to integrate them.
One thing we’ve noticed is that subtle changes to a library/API integration prompt’s context can be surprisingly impactful. LLMs do very well with example commands and explicit instructions to consider X, Y, and Z. If you just dump an API reference plus information that only implicitly suggests X, Y, and Z might be beneficial, they won’t reliably make the logical leaps you want unless you let them iterate or “think” (spend more tokens) more. But you can’t easily provide an example for everything, and the ones you do provide will bias the models towards them, so you may need a bit of both.
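Concretely, something like this (the `acme` CLI and its flags are invented, just to show the shape of context that works vs. what doesn't):

    Weak context:   a 5,000-token dump of the acme API reference, with the caveats buried in prose

    Strong context: To create a project, run exactly:
                      acme projects create --name <name> --region us-east-1
                    Always pass --region explicitly; the account default varies.
                    Do NOT use `acme init`; it is deprecated.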
I filed a provisional patent this year on exactly how I would solve this problem. Imagine hiring a "team of developers" who can learn your library and iterate 24/7, improving things, doing support, even letting the pointy-haired boss turn his ideas into reality in a forked sandbox on the weekend.
For the last 15 years I've been writing against software patents, and producing open source software that cost me about $1M to develop, but in the case of AI, I have started to make an exception. I have also rethought how I am going to do open source vs closed source in my AI business. A few weeks ago I posted on HN asking whether it's a good idea, and no one responded: https://news.ycombinator.com/item?id=44425545
(If anyone wants to work with me on this, hit me up, email is in my profile)
I hope we don’t have to challenge it!
We’re trying to build a similar kind of experience but for both “sides” of the problem: software provider and software users/integrators.
I guess that's why patents are annoying. I have been Mr. Open Source and against intellectual property for most of the past 15 years. But with AI companies rampantly taking everyone's work and repurposing it, and with VC companies not being very eager to invest in open source, I'm taking a different tack with my AI ventures.
My first two companies are radically open source, and no one cared:
https://github.com/Qbix
https://github.com/Intercoin
And this is what we're doing now with AI, but it's not going to be as open: https://engageusers.ai/deck.pdf
Don't worry, we're not looking to get into it with random other projects. It's mostly to protect our business model against Big Tech and enterprises.
I think I gave you product feedback on Qbix at some point in the past. I also know several founders who’ve secured funding for open source products and built successful businesses off of them. Open-core is pretty popular out here in the Bay Area.
One thing I’ve learned since starting a company is that early on, your greatest asset is trust in your founder/brand, because it’s the only reason for someone to pay you for something until you get your shit together. I’ve personally had a hard time noticing it in myself sometimes, but I think it’s easy to overlook how outward signaling that might look like distrust (e.g. making users sign NDAs) damages your own ability to build trust. Since early startups tend to be considered untrustworthy by default, it can be really counterproductive. Anyway, I appreciate your non-aggression policy.
Would you consider arranging a call to discuss our respective projects? If you're building something along these lines, then I think we might end up joining forces.
I've always preferred collaboration and joining forces to build on each other's work over competition and incompatibility.
https://calendly.com/engageusers/meeting
Boooo software patents.
don't worry everybody, this guy's profile shows he was a blockchain booster five minutes ago, just another grifter, nothin to see here
I'd actually consider your criticism seriously if it were anything other than the usual HN "saw the word blockchain, did an immediate TLDR with the word grift", regardless of what was done or built.
If you had anything substantive to back up what you're saying, we could discuss it, but since you don't... well, I'm actually disappointed but w/e.
Why on earth would we adapt libraries to the LLMs, rather than improving the LLMs to do their jobs correctly? This seems completely backwards to me.
I think you probably need a bit of both. Would you add llms.txt to your docs, or make it crawl them like a human?
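For reference, an llms.txt is just a small markdown file at the root of the docs, something like this (names and links invented; format per the llmstxt.org proposal):

    # ExampleLib
    > A small drawing library. Prefer the documented factory functions; internal classes are not a public API.

    ## Docs
    - [Quick start](https://example.com/docs/quickstart.md): minimal working example
    - [API reference](https://example.com/docs/api.md): every public function with arguments

    ## Optional
    - [Changelog](https://example.com/changelog.md)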
Nah. I write software and docs for humans.
The skip-to-the-end answer: Context7 MCP is so good it seems like magic, even to many well-informed, highly capable hackers. Simply wildly good for libraries and SDKs. All it takes to start using it is to add the MCP provider to your agent config and tell the agent, "Use Context7 for this".
https://context7.com/
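For anyone wondering what "add the MCP provider" means in practice, the agent config is roughly this (taken from Context7's README as I remember it; verify the package name there):

    {
      "mcpServers": {
        "context7": {
          "command": "npx",
          "args": ["-y", "@upstash/context7-mcp"]
        }
      }
    }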
I'm confused a bit by this. For instance, Gemini was struggling to write proper Java code for using Firebase Admin SDK. It would write Java code using methods that only exist in the JavaScript SDK. And when I would correct it, it would give other options that also were only in the JavaScript SDK or were invalid.
So I thought this is where context7 would be useful, but I'm confused what I'm looking at in the detail page: https://context7.com/firebase/firebase-admin-java
I was expecting some sort of dump of all the admin methods, but it gives a single example of one library function and info on how to build javadoc.
You're looking at a summary of chunks of code that are relevant to the given library. If you type in what you specifically need documentation for and adjust the output token count, it will give the LLM the relevant fragments.
It lets you emulate RAG.
I think the main problem is that the source GP is using, https://github.com/firebase/firebase-admin-java, contained almost nothing that context7 extracted as "docs".
It looks like https://firebase.google.com/docs/ is being refreshed as I type this. I imagine that using that as a source and including "Java" in the topic filter might give more results (or maybe https://github.com/firebase/firebase-docs has the same content).
What is the best approach to have something like context7 for internal tools and libraries?
context7 is open source: https://github.com/upstash/context7
If LLM coverage is going to be as important as documentation for future API adoption, it’d better be good, sadly…
I'd use this if this was an open source tool.
Needing to sign up before I can see or do anything made me close the tab immediately.
Good to know. I think we're likely to move towards running coding agents locally in the next iteration.
Inline comments and docs help a lot with this.
What do coding agents need my library for?
Don't they know how to write their own code? Isn't that a coding agent's entire purpose in life?
There must be conflicting definitions out there. What does "coding agent" mean in this context?
Do we want every coding agent writing everything from scratch for every project?
Or would reusability/modularity across projects and teams be beneficial?
The human desire for interoperability would prevent writing everything from scratch, but to the extent that it is reasonably possible, yes. We would already do that now if humans weren't a limiting factor. But a coding agent, as I know the term, is about removing humans from the equation of writing code. I did ask if there is another interpretation of "coding agent" being used.
Of course, distinct projects are a human construct that computers don't care about. As coding agents evolve (if they do), it isn't likely that the idea of projects will persist long-term.
Why did my engineering team handle payments through Stripe instead of building a custom payment processor? Aren’t they supposed to be engineering things?
Coding agents presumably don't know how to deal with non-coding things. Stripe's real value isn't in its technology, but in sorting out the complex human problems associated with payment processing. Sending a number over a network is not any great feat; getting humans to agree that the number has meaning is another matter.
This is an extremely roundabout way of saying that you need a payment processor license.
AI doesn't need your library.