stream - Taivo Pungas (Page 4)

Momentum

An object that has a lot of momentum is hard to stop. A bowling ball. An ocean liner. A person who will not allow themselves to be derailed.

29 Jan 2024 · 1 min read

Non-judgemental awareness is curative

Rather than try to fix things, all you have to do is notice, non-judgementally, that you’re doing them. Even while you’re involved in this non-judgemental noticing,

25 Jan 2024 · 1 min read

Curiosity arises from lack of feed

Here's a hypothesis: I think I'll always find something I am curious about. I am a naturally curious person, and I would actually guess

25 Jan 2024 · 2 min read

Are LLMs deterministic?

No. You can see for yourself: setting the temperature variable to 0 (meaning you always sample the most likely token from the output distribution), you'd expect

23 Jan 2024 · 2 min read
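
As a rough illustration of what "temperature 0" means in the excerpt above, here is a minimal sketch of temperature sampling over a toy distribution. The function name `sample_token`, the three-token vocabulary, and the logit values are all hypothetical; treating temperature 0 as a plain argmax is a common convention, assumed here. This toy is fully deterministic at temperature 0, so it only shows the expectation, not whatever makes real LLM APIs deviate from it.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits at the given temperature."""
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise scale the logits and sample from the softmax distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.0, 3.5, 0.2]  # hypothetical scores for a 3-token vocabulary
rng = random.Random(0)

# Greedy picks never consult the random generator, so they always agree.
greedy_picks = [sample_token(logits, 0, rng) for _ in range(5)]

# At temperature 1.0 the pick depends on the random draw.
sampled_pick = sample_token(logits, 1.0, rng)
```

In this sketch `greedy_picks` is the same index every time, which is the behaviour you would naively expect from a hosted model at temperature 0.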

Recent LLM launches, and LLM rumors

Llama 3 is already training according to Zuck. There are conflicting sources & rumors, and the release date claims vary across all of 2024. For GPT-5 there are

23 Jan 2024 · 1 min read

A wild speed-up from OpenAI Dev Day

I'll share more thoughts on OpenAI Dev Day announcements soon, but one huge problem for any developer is LLM API latency. And boy, did OpenAI deliver.

7 Nov 2023 · 1 min read

LLM+API vs LMM+UI

The two most famous startups focused on building agent middleware seem to be Imbue and Adept. Both companies' goal is to have a large model use a

19 Oct 2023 · 1 min read

Simplicity is essential in a generative world

* "less is more" (proverb)
* "perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to

18 Oct 2023 · 1 min read

Diverge-converge cycles in LLMs

The Double Diamond is, roughly, a design framework consisting of 4 steps:

1. Diverging on problem (Discover). Explore widely to gather a broad range of insights and challenges

3 Oct 2023 · 1 min read

LLM latency is linear in output token count

All top LLMs, including all GPT-family and Llama-family models, generate predictions one token at a time. It's inherent to the architecture, and applies to models running

21 Sep 2023 · 2 min read
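
The claim in the title above can be written as a back-of-the-envelope model: if tokens come out one at a time, total latency is roughly a fixed startup cost plus a per-token cost times the number of output tokens. The sketch below assumes exactly that; the function name `estimate_latency` and the numbers (0.5 s to first token, 20 ms per token) are made up for illustration and are not measurements of any real model.

```python
def estimate_latency(output_tokens, time_to_first_token_s, seconds_per_token):
    """Estimate response time under a simple linear model.

    Assumes a fixed startup cost followed by a constant cost for each
    generated token, since tokens are produced one at a time.
    """
    return time_to_first_token_s + output_tokens * seconds_per_token


# Hypothetical numbers, purely for illustration.
short_reply = estimate_latency(100, time_to_first_token_s=0.5, seconds_per_token=0.02)
long_reply = estimate_latency(200, time_to_first_token_s=0.5, seconds_per_token=0.02)

# Doubling the output adds a fixed increment (100 tokens * 0.02 s here).
extra_seconds = long_reply - short_reply
```

Under this model the practical lever is output length: asking for a shorter answer cuts latency in proportion to the tokens saved, while the startup cost stays the same.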