Taivo Pungas
stream

LLM latency is linear in output token count

All top LLMs, including all GPT-family and Llama-family models, generate predictions one token at a time. It's inherent to the architecture, and applies to models running behind an API as well as local or self-deployed models. Armed with this knowledge, we can make a very accurate model of what the
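The linear-latency claim above can be sketched as a back-of-the-envelope model: a fixed overhead plus a constant cost per generated token. The constants below are illustrative placeholders, not measured values from any particular API.

```python
# Toy model of LLM response time: latency grows linearly with the
# number of output tokens (overhead_s and per_token_s are made-up
# illustrative constants, not benchmarks).

def estimate_latency(n_output_tokens, overhead_s=0.5, per_token_s=0.02):
    """Estimate total response time: fixed overhead plus per-token cost."""
    return overhead_s + n_output_tokens * per_token_s

short = estimate_latency(50)     # a short answer
long = estimate_latency(1000)    # a long answer, ~20x slower to generate
print(f"{short:.1f}s vs {long:.1f}s")
```

The practical consequence: capping or shrinking the requested output is the single biggest lever on perceived speed.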
Sep 21, 2023 2 min read
stream

RAG is more than just embedding

90% of the time when people say "Retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-ada-002 and a vector database like Chroma, but it doesn't have to be this way. Retrieval is a long-standing problem in computer science -- a couple of PhD students
Sep 21, 2023 1 min read
stream

Retrieval-augmented generation

Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea: Problem: LLMs can reason, but they don't have the most relevant facts about your situation. They don't know the location of your user, or the most relevant passage from the knowledge base, or what the current list of
Sep 21, 2023 1 min read
stream

Properties of a good memory

Apps with no Memory are boring. Compare a static website from the 90s with any SaaS or social network or phone app: the former knows nothing about you, the latter knows a lot. From UI preferences (dark or light mode?) to basic personal data (name?) to your friend list to
Sep 21, 2023 2 min read
stream

Things I've underestimated - Sep 2023

After attending the Ray Summit in San Francisco this week, I realized I had previously discounted several interesting things. Here's what I now want to explore more. Semantic Kernel I've gotten so used to langchain that I haven't really considered switching... all the while loving to hate it. As I
Sep 21, 2023 2 min read
stream

Launching OpenCopilot

Across the many experiments I've made this year (and which I've written about here) I've felt the need for better tools. Specifically, for the past few months I have been building copilots, and doing so from scratch takes a bunch of work every time. So we decided to release OpenCopilot:
Aug 23, 2023 1 min read
stream

Context engineering is information retrieval

The stages of an LLM app seem to go like this: * Hardcode the first prompt, get the end-to-end app working. * Realise that the answers are bad. * Do some prompt engineering. * Realise the answers are still bad. * Do some more prompt engineering. * Discover vector databases!!!1 * Dump a ton of data
Jun 20, 2023 1 min read
stream

Making GPT API responses faster

GPT APIs are slow. Just in the past week, the OpenAI community has had 20+ questions around that. And not only is it rare for users to tolerate 30-second response times in any app, it is also extremely annoying to develop when even basic tests take several minutes to run.
May 30, 2023 5 min read
stream

agentreader - simple web browsing for your Langchain agent

This is a short link-post to a new repo I just released. While working on Why AutoGPT fails and how to fix it I created a handy web-browsing Tool for langchain Agents and now finally got around to open-sourcing it. Here is the repository: github.com/taivop/agentreader.
May 29, 2023
stream

Why AutoGPT fails and how to fix it

A couple of weeks after AutoGPT came out we tried to make it actually usable. If you don't know it yet, it looks amazing at first glance, but then completely fails because it creates elaborate plans that are completely unnecessary. Even asking it to do something simple like "find the turning circle
May 29, 2023 3 min read
stream

Core innovations of AutoGPT

AutoGPT (repo) went viral on GitHub and looks impressive on Twitter, but almost never works. In the process of trying to improve it I dug into how it works. Really there are two important parts to AutoGPT: a plan-and-execute workflow, and looped Chain-of-thought (CoT). Plan-and-execute Say the user input is
May 19, 2023 1 min read
stream Featured

GPT-3.5 and GPT-4 response times

Some of the LLM apps we've been experimenting with have been extremely slow, so we asked ourselves: what do GPT APIs' response times depend on? It turns out that response time mostly depends on the number of output tokens generated by the model. Why? Because LLM latency is linear in
May 11, 2023 3 min read
stream

How data is used for LLM programming

Software 1.0 -- the non-AI, non-ML sort -- extensively uses testing to validate things work. These tests are basically hand-written rules and assertions. For example, a regular expression can be easily tested with strings that should and should not give a match. In software 2.0, and specifically supervised
May 9, 2023 1 min read
stream

Hacky multimodality

GPT-4 supports images as an optional input, according to OpenAI's press release. As far as I can tell, only one company has access. Which makes you wonder: how can you get multimodality support already today? There are basically two ways for adding image support to an LLM: 1. Train a
May 4, 2023 1 min read
stream

First thoughts on AI moratorium

Context: first thoughts on Pause Giant AI experiments. I will refine my thinking over time. * I had not thought about AI safety much since ~2017, after thinking a lot about it in 2014-2017. In 2017, I defended my MSc thesis on an AI-safety-inspired topic (though very narrow and technical in
Mar 31, 2023 3 min read
stream

Agents are self-altering algorithms

Chain-of-thought reasoning is surprisingly powerful when combined with tools. It feels like a natural programming pattern of LLMs: thinking by writing. And it's easy to see the analogy to humans: verbalizing your thoughts in written (journalling) or spoken ("talking things through") form is a good way to actually make progress.
Mar 27, 2023 2 min read
stream

Index to reduce context limitations

There is a very simple, standardized way of solving the problem of too small GPT context windows. This is what to do when the context window gets full: Indexing: 1. Chunk up your context (book text, documents, messages, whatever). 2. Put each chunk through the text-embedding-ada-002 embedding model, and store
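The chunk → embed → store → retrieve loop described above can be sketched in a few lines. The `embed` function here is a stand-in for a real embedding model like text-embedding-ada-002; a simple bag-of-words vector is enough to demonstrate the pipeline shape.

```python
# Minimal sketch of indexing and retrieval. embed() is a toy stand-in
# for a real embedding model; cosine() scores similarity between vectors.
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (a real system would call an
    embedding model such as text-embedding-ada-002 here)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = lambda v: math.sqrt(sum(n * n for n in v.values()))
    return dot / (norm(a) * norm(b))

# 1. Chunk up your context (here: two toy "documents").
chunks = ["The cat sat on the mat.", "Paris is the capital of France."]
# 2. Put each chunk through the embedding model and store the vectors.
index = [(chunk, embed(chunk)) for chunk in chunks]
# 3. At query time, embed the query and return the most similar chunk.
query_vec = embed("What is the capital of France?")
best_chunk, _ = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best_chunk)  # → Paris is the capital of France.
```

A vector database does exactly this at scale: it stores the chunk vectors and answers nearest-neighbor queries quickly.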
Mar 24, 2023 2 min read
stream

Langchain frustration

I was building an agent with langchain today, and was very frustrated with the developer experience. It felt hard to use, hard to get things right, and overall I was confused a lot while developing (for no good reason). Why is it so hard to use? Here are some of
Mar 22, 2023 1 min read
stream

Office 365 copilot is extremely impressive

Microsoft recently held an event where they announced the "Office 365 Copilot". It was extremely impressive to me, to the extent that (when this launches) I will consider switching from Google's work suite to Microsoft's. Why is this announcement so impressive? In one word, integration. Pretty much every Office app
Mar 21, 2023 1 min read
stream

Your non-AI moat

"What's the Moat of your AI company?" That seems to be top of mind for founders pitching their novel idea to VCs -- and the most common question they get. But I think it might not be the right question. Right now, nobody seems to have a good answer. a16z
Mar 15, 2023 1 min read
stream

GPT-4, race, and applications

GPT-4 came out yesterday and overshadowed other announcements, each of which would have been bombshell news otherwise: * Anthropic AI announcing their ChatGPT-like API -- likely the strongest competitor to OpenAI today (only waitlist) * Google announcing PaLM API (currently it's just marketing -- no public access yet) * Adept AI announcing their $350M
Mar 15, 2023 1 min read
stream

Alpaca is not as cheap as you think

"Alpaca is just $100 and competitive with InstructGPT" -- takes like this are going around Twitter, adding to the (generally justified!) hype around AI models. It is indeed a very encouraging result. Specifically, that it took so little compute to train something that achieves competitive results on benchmarks and is
Mar 14, 2023 1 min read
stream

Chain-of-thought reasoning

Matt Rickard has a concise overview of Chain-of-thought: the design pattern of having an LLM think step by step. To summarize, the four mentioned approaches from simpler to more nuanced are: 1. Add "Let's think step-by-step" to the prompt. 2. Produce multiple solutions, have the LLM self-check on each one,
Mar 13, 2023 1 min read
stream

Define a vector space with words

I've been experimenting with embedding ideas into GPT-space, and using the resulting vectors to visualize. For example, you could plot different activities based on two axes: how safe they are, and how adrenaline-inducing they are. How does it work? The intuition is the following. In words, you can define an
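The axis idea above can be sketched by subtracting the embedding of one anchor word from another and projecting items onto the result. The 3-d vectors below are made up for illustration; a real setup would fetch them from an embedding model.

```python
# Define an axis in embedding space from two anchor words, then score
# items by projecting onto it. All vectors here are hypothetical toy
# values, not outputs of a real embedding model.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axis(pos, neg):
    """Axis pointing from the 'neg' concept toward the 'pos' concept."""
    return [p - n for p, n in zip(pos, neg)]

# Toy 3-d "embeddings" (illustrative values only).
emb = {
    "safe":      [0.9, 0.1, 0.2],
    "dangerous": [0.1, 0.9, 0.3],
    "knitting":  [0.8, 0.2, 0.1],
    "skydiving": [0.2, 0.8, 0.4],
}

adrenaline_axis = axis(emb["dangerous"], emb["safe"])
for activity in ("knitting", "skydiving"):
    print(activity, round(dot(emb[activity], adrenaline_axis), 2))
```

With real embeddings, two such axes (e.g. safe↔dangerous and calm↔adrenaline-inducing) give you the x and y coordinates for the plot described above.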
Mar 12, 2023 3 min read
stream

AI product overhang

There is a massive AI overhang today. The term is analogous to snow overhang. When snow slides down a roof, sometimes a section goes over the edge and is seemingly supported by nothing. You know it will eventually fall; the tension is in the air. It's a question of time.
Mar 9, 2023 1 min read
Taivo Pungas © 2023