Taivo Pungas
stream

LLM latency is linear in output token count

All top LLMs, including all GPT-family and Llama-family models, generate predictions one token at a time. It's inherent to the architecture, and applies to models running behind an API as well as local or self-deployed models. Armed with this knowledge, we can make a very accurate model of what the
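The linear-latency claim above can be sketched as a back-of-the-envelope model: a fixed overhead plus a constant cost per generated token. The constants below are illustrative placeholders, not measured values from any particular API.

```python
# Toy model of LLM response time: latency grows linearly with the
# number of output tokens (overhead_s and per_token_s are made-up
# illustrative constants, not benchmarks).

def estimate_latency(n_output_tokens, overhead_s=0.5, per_token_s=0.02):
    """Estimate total response time: fixed overhead plus per-token cost."""
    return overhead_s + n_output_tokens * per_token_s

short = estimate_latency(50)     # a short answer
long = estimate_latency(1000)    # a long answer, ~20x slower to generate
print(f"{short:.1f}s vs {long:.1f}s")
```

The practical consequence: capping or shrinking the requested output is the single biggest lever on perceived speed.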
Sep 21, 2023 2 min read
stream

RAG is more than just embedding

90% of the time when people say "Retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-ada-002 and a vector database like Chroma, but it doesn't have to be this way. Retrieval is a long-standing problem in computer science -- a couple of PhD students
Sep 21, 2023 1 min read
stream

Retrieval-augmented generation

Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea: Problem: LLMs can reason, but they don't have the most relevant facts about your situation. They don't know the location of your user, or the most relevant passage from the knowledge base, or what the current list of
Sep 21, 2023 1 min read
stream

Properties of a good memory

Apps with no Memory are boring. Compare a static website from the 90s with any SaaS or social network or phone app: the former knows nothing about you, the latter knows a lot. From UI preferences (dark or light mode?) to basic personal data (name?) to your friend list to
Sep 21, 2023 2 min read
stream

Things I've underestimated - Sep 2023

After attending the Ray Summit in San Francisco this week, I realized I had previously discounted several interesting things. Here's what I now want to explore more. Semantic Kernel I've gotten so used to langchain that I haven't really considered switching... all the while loving to hate it. As I
Sep 21, 2023 2 min read
stream

Launching OpenCopilot

Across the many experiments I've made this year (and which I've written about here) I've felt the need for better tools. Specifically, for the past few months I have been building copilots, and doing so from scratch takes a bunch of work every time. So we decided to release OpenCopilot:
Aug 23, 2023 1 min read
stream

Context engineering is information retrieval

The stages of an LLM app seem to go like this: * Hardcode the first prompt, get the end-to-end app working. * Realise that the answers are bad. * Do some prompt engineering. * Realise the answers are still bad. * Do some more prompt engineering. * Discover vector databases!!!1 * Dump a ton of data
Jun 20, 2023 1 min read
stream

Making GPT API responses faster

GPT APIs are slow. Just in the past week, the OpenAI community has had 20+ questions around that. And not only is it rare for users to tolerate 30-second response times in any app, it is also extremely annoying to develop when even basic tests take several minutes to run.
May 30, 2023 5 min read
stream

agentreader - simple web browsing for your Langchain agent

This is a short link-post to a new repo I just released. While working on Why AutoGPT fails and how to fix it I created a handy web-browsing Tool for langchain Agents and now finally got around to open-sourcing it. Here is the repository: github.com/taivop/agentreader.
May 29, 2023
stream

Why AutoGPT fails and how to fix it

A couple of weeks after AutoGPT came out we tried to make it actually usable. If you don't know it yet, it looks amazing at first glance, but then completely fails because it creates elaborate plans that are completely unnecessary. Even asking it to do something simple like "find the turning circle
May 29, 2023 3 min read
stream

Core innovations of AutoGPT

AutoGPT (repo) went viral on GitHub and looks impressive on Twitter, but almost never works. In the process of trying to improve it I dug into how it works. Really there are two important parts to AutoGPT: a plan-and-execute workflow, and looped Chain-of-thought (CoT). Plan-and-execute Say the user input is
May 19, 2023 1 min read
stream Featured

GPT-3.5 and GPT-4 response times

Some of the LLM apps we've been experimenting with have been extremely slow, so we asked ourselves: what do GPT APIs' response times depend on? It turns out that response time mostly depends on the number of output tokens generated by the model. Why? Because LLM latency is linear in
May 11, 2023 3 min read
stream

How data is used for LLM programming

Software 1.0 -- the non-AI, non-ML sort -- extensively uses testing to validate things work. These tests are basically hand-written rules and assertions. For example, a regular expression can be easily tested with strings that should and should not give a match. In software 2.0, and specifically supervised
May 9, 2023 1 min read
stream

Hacky multimodality

GPT-4 supports images as an optional input, according to OpenAI's press release. As far as I can tell, only one company has access. Which makes you wonder: how can you get multimodality support already today? There are basically two ways for adding image support to an LLM: 1. Train a
May 4, 2023 1 min read
stream

First thoughts on AI moratorium

Context: first thoughts on Pause Giant AI experiments. I will refine my thinking over time. * I had not thought about AI safety much since ~2017, after thinking a lot about it in 2014-2017. In 2017, I defended my MSc thesis on an AI-safety-inspired topic (though very narrow and technical in
Mar 31, 2023 3 min read
stream

Agents are self-altering algorithms

Chain-of-thought reasoning is surprisingly powerful when combined with tools. It feels like a natural programming pattern of LLMs: thinking by writing. And it's easy to see the analogy to humans: verbalizing your thoughts in written (journalling) or spoken ("talking things through") form is a good way to actually make progress.
Mar 27, 2023 2 min read
stream

Index to reduce context limitations

There is a very simple, standardized way of solving the problem of too small GPT context windows. This is what to do when the context window gets full: Indexing: 1. Chunk up your context (book text, documents, messages, whatever). 2. Put each chunk through the text-embedding-ada-002 embedding model, and store
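The chunk → embed → store → retrieve loop described above can be sketched in a few lines. The `embed` function here is a stand-in for a real embedding model like text-embedding-ada-002; a simple bag-of-words vector is enough to demonstrate the pipeline shape.

```python
# Minimal sketch of indexing and retrieval. embed() is a toy stand-in
# for a real embedding model; cosine() scores similarity between vectors.
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (a real system would call an
    embedding model such as text-embedding-ada-002 here)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = lambda v: math.sqrt(sum(n * n for n in v.values()))
    return dot / (norm(a) * norm(b))

# 1. Chunk up your context (here: two toy "documents").
chunks = ["The cat sat on the mat.", "Paris is the capital of France."]
# 2. Put each chunk through the embedding model and store the vectors.
index = [(chunk, embed(chunk)) for chunk in chunks]
# 3. At query time, embed the query and return the most similar chunk.
query_vec = embed("What is the capital of France?")
best_chunk, _ = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best_chunk)  # → Paris is the capital of France.
```

A vector database does exactly this at scale: it stores the chunk vectors and answers nearest-neighbor queries quickly.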
Mar 24, 2023 2 min read
stream

Langchain frustration

I was building an agent with langchain today, and was very frustrated with the developer experience. It felt hard to use, hard to get things right, and overall I was confused a lot while developing (for no good reason). Why is it so hard to use? Here are some of
Mar 22, 2023 1 min read
stream

Office 365 copilot is extremely impressive

Microsoft recently held an event where they announced the "Office 365 Copilot". It was extremely impressive to me, to the extent that (when this launches) I will consider switching from Google's work suite to Microsoft's. Why is this announcement so impressive? In one word, integration. Pretty much every Office app
Mar 21, 2023 1 min read
stream

Your non-AI moat

"What's the Moat of your AI company?" That seems to be top of mind for founders pitching their novel idea to VCs -- and the most common question they get. But I think it might not be the right question. Right now, nobody seems to have a good answer. a16z
Mar 15, 2023 1 min read
stream

GPT-4, race, and applications

GPT-4 came out yesterday and overshadowed other announcements, each of which would have been bombshell news otherwise: * Anthropic AI announcing their ChatGPT-like API -- likely the strongest competitor to OpenAI today (only waitlist) * Google announcing PaLM API (currently it's just marketing -- no public access yet) * Adept AI announcing their $350M
Mar 15, 2023 1 min read
stream

Alpaca is not as cheap as you think

"Alpaca is just $100 and competitive with InstructGPT" -- takes like this are going around Twitter, adding to the (generally justified!) hype around AI models. It is indeed a very encouraging result. Specifically, that it took so little compute to train something that achieves competitive results on benchmarks and is
Mar 14, 2023 1 min read
stream

Chain-of-thought reasoning

Matt Rickard has a concise overview of Chain-of-thought: the design pattern of having an LLM think step by step. To summarize, the four mentioned approaches from simpler to more nuanced are: 1. Add "Let's think step-by-step" to the prompt. 2. Produce multiple solutions, have the LLM self-check on each one,
Mar 13, 2023 1 min read
stream

Define a vector space with words

I've been experimenting with embedding ideas into GPT-space, and using the resulting vectors to visualize. For example, you could plot different activities based on two axes: how safe they are, and how adrenaline-inducing they are. How does it work? The intuition is the following. In words, you can define an
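The axis idea above can be sketched by subtracting the embedding of one anchor word from another and projecting items onto the result. The 3-d vectors below are made up for illustration; a real setup would fetch them from an embedding model.

```python
# Define an axis in embedding space from two anchor words, then score
# items by projecting onto it. All vectors here are hypothetical toy
# values, not outputs of a real embedding model.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axis(pos, neg):
    """Axis pointing from the 'neg' concept toward the 'pos' concept."""
    return [p - n for p, n in zip(pos, neg)]

# Toy 3-d "embeddings" (illustrative values only).
emb = {
    "safe":      [0.9, 0.1, 0.2],
    "dangerous": [0.1, 0.9, 0.3],
    "knitting":  [0.8, 0.2, 0.1],
    "skydiving": [0.2, 0.8, 0.4],
}

adrenaline_axis = axis(emb["dangerous"], emb["safe"])
for activity in ("knitting", "skydiving"):
    print(activity, round(dot(emb[activity], adrenaline_axis), 2))
```

With real embeddings, two such axes (e.g. safe↔dangerous and calm↔adrenaline-inducing) give you the x and y coordinates for the plot described above.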
Mar 12, 2023 3 min read
stream

AI product overhang

There is a massive AI overhang today. The term is analogous to snow overhang. When snow slides down a roof, sometimes a section goes over the edge and is seemingly supported by nothing. You know it will eventually fall; the tension is in the air. It's a question of time.
Mar 9, 2023 1 min read
Taivo Pungas © 2023