RAG is more than just embedding
90% of the time when people say "Retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-ada-002 and a
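For anyone who hasn't built that kind of index before, here is a minimal sketch of the "embed everything, then nearest-neighbour search" baseline, assuming the openai>=1.0 Python client and an OPENAI_API_KEY in the environment; the helper names, sample documents and model choice are illustrative, not taken from the post.

```python
# Minimal sketch of an embedding-based index: embed the documents once,
# then answer queries with nearest-neighbour (cosine similarity) search.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of strings with OpenAI's text-embedding-ada-002."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in resp.data])

# The "index" is just the documents plus their embedding matrix.
docs = [
    "Our refund policy allows returns within 30 days.",
    "Support is available Monday to Friday, 9am-5pm CET.",
    "The Pro plan includes priority support and SSO.",
]
doc_vectors = embed(docs)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    q = embed([query])[0]
    # Cosine similarity between the query and every document vector.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]

print(retrieve("How long do I have to return a product?"))
```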
Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea. Problem: LLMs can reason, but they don't have the most relevant facts about your
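In code, that simple idea usually amounts to pasting the retrieved facts into the prompt before asking the question. A hedged sketch, again assuming the openai>=1.0 client; the prompt wording, model choice and helper names are mine, not the post's:

```python
# Sketch of the generation half of RAG: retrieved snippets are pasted into the
# prompt so the model answers from your facts rather than its training data.
from openai import OpenAI

client = OpenAI()

def answer(question: str, snippets: list[str]) -> str:
    """Ask the model to answer using only the retrieved snippets as context."""
    context = "\n".join(f"- {s}" for s in snippets)
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only the facts below. Say 'I don't know' "
                        "if the facts are not enough.\n\nFacts:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

# `snippets` would come from whatever retrieval step you use
# (embedding search, keyword search, a database query, ...).
print(answer(
    "How long do I have to return a product?",
    ["Our refund policy allows returns within 30 days."],
))
```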
Apps with no Memory are boring. Compare a static website from the 90s with any SaaS or social network or phone app: the former knows nothing about you,
After attending the Ray Summit in San Francisco this week, I realized I had previously discounted several interesting things. Here's what I now want to explore
Across the many experiments I've run this year (which I've written about here), I've felt the need for better tools. Specifically,
The stages of an LLM app seem to go like this:
* Hardcode the first prompt, get the end-to-end app working.
* Realise that the answers are bad.
* Do some
GPT APIs are slow. Just in the past week, the OpenAI community has had 20+ questions about that. And not only is it rare for users to tolerate
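To put a rough number on "slow", timing a single call is enough. A small sketch assuming the openai>=1.0 client; the model and prompt are arbitrary:

```python
# Quick-and-dirty latency check for one chat completion call.
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarise RAG in one sentence."}],
)
elapsed = time.perf_counter() - start

print(f"{elapsed:.1f}s for {resp.usage.completion_tokens} completion tokens")
```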
This is a short link-post to a new repo I just released. While working on "Why AutoGPT fails and how to fix it" I created a handy web-browsing
A couple of weeks after AutoGPT came out, we tried to make it actually usable. If you don't know it yet, it looks amazing at first glance, but
AutoGPT (repo) went viral on GitHub and looks impressive on Twitter, but almost never works. In the process of trying to improve it, I dug into how it