LLM+API vs LMM+UI

The two most famous startups focused on building Agent middleware seem to be Imbue and Adept. Both companies' goal is to have a large model use a

19 Oct 2023 · 1 min read

Diverge-converge cycles in LLMs

The Double Diamond is, roughly, a design framework consisting of 4 steps: 1. Diverging on problem (Discover). Explore widely to gather a broad range of insights and challenges

3 Oct 2023 · 1 min read

LLM latency is linear in output token count

All top LLMs, including all GPT-family and Llama-family models, generate predictions one token at a time. It's inherent to the architecture, and applies to models running

21 Sep 2023 · 2 min read

RAG is more than just embedding

90% of the time, when people say "Retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-ada-002 and a

21 Sep 2023 · 1 min read

Retrieval-augmented generation

Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea: Problem: LLMs can reason, but they don't have the most relevant facts about your

21 Sep 2023 · 1 min read

Properties of a good memory

Apps with no Memory are boring. Compare a static website from the 90s with any SaaS or social network or phone app: the former knows nothing about you,

21 Sep 2023 · 2 min read

Things I've underestimated - Sep 2023

After attending the Ray Summit in San Francisco this week, I realized I had previously discounted several interesting things. Here's what I now want to explore

21 Sep 2023 · 2 min read

Launching OpenCopilot

Across the many experiments I've run this year (and written about here), I've felt the need for better tools. Specifically,

23 Aug 2023 · 1 min read

Context engineering is information retrieval

The stages of an LLM app seem to go like this: * Hardcode the first prompt, get the end-to-end app working. * Realise that the answers are bad. * Do some

20 Jun 2023 · 1 min read