RAG is more than just embedding

90% of the time, when people say "retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-ada-002 and a…

21 Sep 2023 · 1 min read

Retrieval-augmented generation

Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea. The problem: LLMs can reason, but they don't have the most relevant facts about your…

21 Sep 2023 · 1 min read

Properties of a good memory

Apps with no memory are boring. Compare a static website from the 90s with any SaaS product, social network, or phone app: the former knows nothing about you,…

21 Sep 2023 · 2 min read

Things I've underestimated - Sep 2023

After attending the Ray Summit in San Francisco this week, I realized I had previously discounted several interesting things. Here's what I now want to explore…

21 Sep 2023 · 2 min read

Launching OpenCopilot

Across the many experiments I've run this year (and which I've written about here), I've felt the need for better tools. Specifically,…

23 Aug 2023 · 1 min read

Context engineering is information retrieval

The stages of an LLM app seem to go like this: * Hardcode the first prompt, get the end-to-end app working. * Realise that the answers are bad. * Do some…

20 Jun 2023 · 1 min read

Making GPT API responses faster

GPT APIs are slow. In the past week alone, the OpenAI community has had 20+ questions about this. And not only is it rare for users to tolerate…

30 May 2023 · 5 min read

Why AutoGPT fails and how to fix it

A couple of weeks after AutoGPT came out, we tried to make it actually usable. If you don't know it yet, it looks amazing at first glance, but…

29 May 2023 · 3 min read

Core innovations of AutoGPT

AutoGPT (repo) went viral on GitHub and looks impressive on Twitter, but almost never works. In the process of trying to improve it, I dug into how it…

19 May 2023 · 1 min read