Taivo Pungas

Future of e-Estonia - a young engineer’s view

Taivo Pungas

Sep 16, 2019


LLM latency is linear in output token count

All top LLMs, including all GPT-family and Llama-family models, generate predictions one token at a time. It's inherent to the architecture, and applies to models running behind an API as well as local or self-deployed models. Armed with this knowledge, we can make a very accurate model of what the
Sep 21, 2023 2 min read
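
The upshot is a simple back-of-the-envelope latency model: a fixed time to the first token plus a constant cost per generated token. Here is a minimal sketch in Python; the timing constants are hypothetical placeholders, not measurements from the post.

    # Sketch of a linear latency model: latency ~= time to first token
    # plus a constant per-token cost. Both constants below are assumed.
    def estimated_latency_s(n_output_tokens: int,
                            time_to_first_token_s: float = 0.5,
                            seconds_per_output_token: float = 0.02) -> float:
        return time_to_first_token_s + n_output_tokens * seconds_per_output_token

    # Under these assumed constants, a 500-token completion takes ~10.5 s,
    # and halving the output length roughly halves the wait.
    print(round(estimated_latency_s(500), 2))  # 10.5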

RAG is more than just embedding

90% of the time, when people say "Retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-002 and a vector database like Chroma, but it doesn't have to be this way. Retrieval is a long-standing problem in computer science -- a couple of PhD students
Sep 21, 2023 1 min read
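
To make the point concrete, here is a toy retriever built on plain keyword overlap instead of embeddings; the documents and query are invented for the example, and a real non-embedding system would more likely use something like TF-IDF or BM25.

    # Toy keyword-overlap retriever: no embedding model, no vector database.
    # A crude stand-in for classical retrieval methods such as TF-IDF or BM25.
    def keyword_score(query: str, doc: str) -> int:
        doc_words = set(doc.lower().split())
        return sum(1 for w in query.lower().split() if w in doc_words)

    docs = [
        "Refunds are processed within 5 business days.",
        "Passwords can be reset from the account settings page.",
        "Our office is open Monday to Friday.",
    ]

    query = "reset my account password"
    best = max(docs, key=lambda d: keyword_score(query, d))
    print(best)  # "Passwords can be reset from the account settings page."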

Retrieval-augmented generation

Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea: Problem: LLMs can reason, but they don't have the most relevant facts about your situation. They don't know the location of your user, or the most relevant passage from the knowledge base, or what the current list of
Sep 21, 2023 1 min read
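
The pattern itself fits in a few lines: retrieve the missing facts, splice them into the prompt, and let the model reason over them. In the sketch below, retrieve and llm are hypothetical stand-ins for whatever retriever and model API a real system would use.

    # Minimal RAG loop: look up the facts the LLM does not have, then put
    # them into the prompt. Both helpers are hypothetical stand-ins.
    def retrieve(query: str) -> str:
        # Real systems use vector search, BM25, a database query, etc.
        return "The user is located in Tallinn. Support hours are 9:00-17:00 EET."

    def llm(prompt: str) -> str:
        # Real systems call an LLM API or a local model here.
        return "(model output)"

    def answer(question: str) -> str:
        context = retrieve(question)
        prompt = ("Answer using only the context below.\n\n"
                  f"Context:\n{context}\n\n"
                  f"Question: {question}")
        return llm(prompt)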