I'll share more thoughts on the OpenAI Dev Day announcements soon, but one huge problem for any developer is LLM API latency. And boy, did OpenAI deliver. On a quick benchmark I ran:

* gpt-4-1106-preview ("gpt-4-turbo") runs at 18ms/token
* gpt-3.5-turbo-1106 ("the newest version of gpt-3.5") runs at just 6ms/token
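A quick-and-dirty version of this benchmark can be sketched as below. The timing helper is mine, not OpenAI's; with the real API you would feed it the streamed chunks from a chat completion. Here a stub generator stands in for the network call so the sketch is self-contained:

```python
import time

def ms_per_token(token_stream):
    """Consume an iterable of tokens and return average milliseconds per token."""
    start = time.perf_counter()
    n = 0
    for _ in token_stream:
        n += 1
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / max(n, 1)

# With the real API you would pass the streamed response, roughly:
#   stream = client.chat.completions.create(model="gpt-4-1106-preview",
#                                           messages=..., stream=True)
#   print(ms_per_token(stream))
# Stub stream: yields a token every ~2 ms, standing in for the API.
def stub_stream(n_tokens=50, delay_s=0.002):
    for _ in range(n_tokens):
        time.sleep(delay_s)
        yield "tok"

print(f"{ms_per_token(stub_stream()):.1f} ms/token")
```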
The Double Diamond is, roughly, a design framework consisting of 4 steps:

1. Diverging on the problem (Discover). Explore widely to gather a broad range of insights and challenges related to the problem.
2. Converging on the problem (Define). Analyze and synthesize the gathered insights to define a clear and specific problem
90% of the time when people say "retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-ada-002 and a vector database like Chroma, but it doesn't have to be this way. Retrieval is a long-standing problem in computer science -- a couple of PhD students
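To make "it doesn't have to be this way" concrete, here is a minimal lexical retriever (my own sketch, not from the post): classic TF-IDF scoring, no embedding model and no vector database required.

```python
import math
from collections import Counter

def tfidf_search(query, docs):
    """Rank docs against query with a simple TF-IDF score.
    Purely lexical retrieval -- no embeddings, no vector DB."""
    def tokenize(s):
        return s.lower().split()

    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency and inverse document frequency per term.
    df = Counter(t for toks in doc_tokens for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    q = vec(tokenize(query))

    def score(dv):
        dot = sum(q[t] * dv.get(t, 0.0) for t in q)
        norm = math.sqrt(sum(v * v for v in dv.values())) or 1.0
        return dot / norm

    ranked = sorted(range(n), key=lambda i: score(vec(doc_tokens[i])), reverse=True)
    return [docs[i] for i in ranked]

docs = ["the cat sat on the mat", "stock prices fell sharply", "a cat chased a mouse"]
print(tfidf_search("cat mouse", docs)[0])  # → a cat chased a mouse
```

In practice you would reach for BM25 or a proper search engine rather than hand-rolled TF-IDF, but the point stands: "retrieval" is much older and broader than embeddings.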
AutoGPT (repo) went viral on GitHub and looks impressive on Twitter, but it almost never works. In the process of trying to improve it, I dug into how it works. There are really two important parts to AutoGPT: a plan-and-execute workflow, and looped chain-of-thought (CoT).

Plan-and-execute

Say the user input is
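The plan-and-execute workflow boils down to a short loop, sketched below. `call_llm` is a stand-in for a real model call, and the numbered-plan format is my assumption for illustration, not AutoGPT's actual prompt:

```python
def call_llm(prompt):
    """Stand-in for a real LLM call; returns a canned plan or a step result."""
    if prompt.startswith("Plan:"):
        return "1. search the web\n2. summarize findings\n3. write the report"
    return f"done: {prompt}"

def plan_and_execute(goal):
    # 1) Ask the model for a numbered plan toward the goal.
    plan = call_llm(f"Plan: {goal}")
    steps = [line.split(". ", 1)[1] for line in plan.splitlines()]
    # 2) Execute each step in a loop, one model call per step.
    results = []
    for step in steps:
        results.append(call_llm(f"Execute '{step}' toward goal '{goal}'"))
    return results

for r in plan_and_execute("research LLM latency"):
    print(r)
```

The fragility is easy to see from the sketch: a bad plan up front dooms every downstream step, which matches the "almost never works" experience.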
Software 1.0 -- the non-AI, non-ML sort -- extensively uses testing to validate that things work. These tests are basically hand-written rules and assertions. For example, a regular expression can easily be tested with strings that should and should not give a match. In software 2.0, and specifically supervised
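The regex example can be made concrete with a few hand-written assertions. The pattern below (a deliberately simple email matcher) is mine, chosen only to illustrate the testing pattern:

```python
import re

# A deliberately simple email pattern, used here only to illustrate testing.
EMAIL = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")

# Strings that should match...
for s in ["a@b.com", "first.last+tag@example.co.uk"]:
    assert EMAIL.match(s), s

# ...and strings that should not.
for s in ["not-an-email", "a@b", "@example.com"]:
    assert not EMAIL.match(s), s

print("all regex tests passed")
```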
Chain-of-thought reasoning is surprisingly powerful when combined with tools. It feels like a natural programming pattern of LLMs: thinking by writing. And it's easy to see the analogy to humans: verbalizing your thoughts in written (journalling) or spoken ("talking things through") form is a good way to actually make progress.
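A minimal version of the pattern looks like the loop below (my sketch; the Thought/Action format echoes the ReAct style rather than any specific library). The model "thinks by writing", calls a tool, reads the observation, and continues:

```python
def tool_calculator(expr):
    """A trivial 'tool' the model can call: evaluate an arithmetic expression."""
    return str(eval(expr, {"__builtins__": {}}))  # illustration only, not safe for real input

def call_llm(history):
    """Stand-in for a model that thinks in writing, then acts."""
    if "Observation" not in history:
        return "Thought: I need the product first.\nAction: calculator(6 * 7)"
    return "Thought: I have the answer.\nFinal Answer: 42"

def run(question):
    history = f"Question: {question}"
    for _ in range(5):  # bounded loop of think -> act -> observe
        out = call_llm(history)
        if "Final Answer:" in out:
            return out.split("Final Answer:")[1].strip()
        expr = out.split("calculator(")[1].split(")")[0]
        history += f"\n{out}\nObservation: {tool_calculator(expr)}"
    return None

print(run("What is 6 times 7?"))  # → 42
```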
There is a very simple, standardized way of solving the problem of too-small GPT context windows. This is what to do when the context window gets full:

Indexing:
1. Chunk up your context (book text, documents, messages, whatever).
2. Put each chunk through the text-embedding-ada-002 embedding model, and store
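The indexing steps can be sketched as follows. `embed` here is a deterministic stand-in for the text-embedding-ada-002 API call, and fixed-size character chunking is one simple choice among many:

```python
import hashlib

def chunk(text, size=200):
    """Step 1: split the context into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text, dim=8):
    """Stand-in for the embedding model: a deterministic pseudo-vector.
    In practice this would be a call to the text-embedding-ada-002 API."""
    h = hashlib.sha256(chunk_text.encode()).digest()
    return [b / 255 for b in h[:dim]]

def build_index(text):
    """Step 2: embed each chunk and store the pairs (a list stands in for the vector DB)."""
    return [(embed(c), c) for c in chunk(text)]

index = build_index("some long document " * 50)
print(len(index), "chunks indexed")  # → 5 chunks indexed
```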
Microsoft recently held an event where they announced the "Office 365 Copilot". It was extremely impressive to me, to the extent that (when this launches) I will consider switching from Google's work suite to Microsoft's. Why is this announcement so impressive? In one word, integration. Pretty much every Office app
GPT-4 came out yesterday and overshadowed a string of announcements, each of which would have been bombshell news otherwise:

* Anthropic announcing their ChatGPT-like API -- likely the strongest competitor to OpenAI today (waitlist only)
* Google announcing the PaLM API (currently it's just marketing -- no public access yet)
* Adept AI announcing their $350M