Agents are self-altering algorithms
Chain-of-thought reasoning is surprisingly powerful when combined with tools. It feels like a natural programming pattern for LLMs: thinking by writing. And it's easy to see the analogy to humans: verbalizing your thoughts in written (journalling) or spoken ("talking things through") form is a good way to actually make progress.
To put it another way: words are not just a way to transmit thoughts; generating sentences is intertwined with the process of thinking itself. Quite literally, prompting an LLM to output more text than strictly needed gives it space to think divergently before converging. Again, humans do this too: journalling is a good way to think, yet you could probably condense the same message into 10% of the words. The rest is your space to think.
Less philosophically, LangChain has a concept of Agents and Tools.
A tool is a function that can be called by an LLM; both the input and output of the function are text. A calculator could be a tool, or a Python interpreter. Internet search is a standard and valuable tool. You can also implement custom tools like "send SMS". Funnily enough, "ask a human" is a valid tool you could implement, and in fact one was recently introduced to LangChain.
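As a concrete sketch, here is a calculator tool using LangChain's classic `Tool` interface (exact imports vary across versions, so treat this as illustrative):

```python
from langchain.agents import Tool

def calculate(expression: str) -> str:
    """Evaluate an arithmetic expression and return the result as text."""
    # eval() is used only for brevity; a real tool would use a safe parser.
    try:
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"

# Text in, text out — plus a description the LLM reads to decide when to call it.
calculator = Tool(
    name="Calculator",
    func=calculate,
    description="Evaluates an arithmetic expression, e.g. '2 * (3 + 4)'.",
)
```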
An agent is an LLM that is prompted to complete a task and given the option to use tools in the process. In addition, it is prompted to use explicit Thought-Action-Observation loops, which are very effective.
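Wiring this up in classic LangChain takes only a few lines (the API has shifted between versions, so again a sketch):

```python
from langchain.agents import initialize_agent, AgentType
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
agent = initialize_agent(
    tools=[calculator],  # the tool defined above
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # ReAct-style Thought-Action-Observation prompting
    verbose=True,  # prints each Thought/Action/Observation as it happens
)
agent.run("What is 2 to the power of 12, minus 96?")
```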
The power of Agents becomes evident when you compare them to the simpler alternative: fixed Chain-of-thought reasoning. In that paradigm, you write a list of instructions in advance, and they are executed consecutively, like so:
- Write an outline of a blog post on {topic}.
- Write the blog post (given the previous step's output as input).
- Write the title for the blog post (given the previous step's output as input).
This is essentially an algorithm that the LLM follows: a fixed sequence of steps that produces the output.
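This fixed pipeline maps directly onto something like LangChain's `SimpleSequentialChain`, where each step feeds its single text output into the next (a sketch, assuming the classic API):

```python
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0.7)

outline = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
    "Write an outline of a blog post on {topic}."))
post = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
    "Write the blog post for this outline:\n\n{outline}"))
title = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
    "Write a title for this blog post:\n\n{post}"))

# The steps and their order are hard-coded; the LLM only fills in each one.
chain = SimpleSequentialChain(chains=[outline, post, title], verbose=True)
chain.run("agents and tools")
```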
Agents do essentially the same, but the steps are not defined in advance. The model can take any conceptual step (thought), use any of the available tools (action), and see what happens (observation), then adjust its behaviour accordingly. This means an Agent can solve any problem, as long as it has the right tools and the base LLM is capable enough. It's an algorithm that generates itself during execution.
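Stripped of any framework, the loop looks roughly like this (illustrative names, not a real API; assume `llm` is a callable that maps prompt text to completion text):

```python
import re

def parse_action(step: str):
    # Relies on the ReAct convention of "Action: <tool>" / "Action Input: <text>".
    action = re.search(r"Action: (\w+)", step).group(1)
    action_input = re.search(r"Action Input: (.*)", step).group(1)
    return action, action_input

def run_agent(llm, tools: dict, task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")  # the model itself decides the next step
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        action, action_input = parse_action(step)      # which tool, with what input
        observation = tools[action](action_input)      # run the chosen tool
        transcript += f"Observation: {observation}\n"  # the model sees the result next turn
    return "Ran out of steps."
```

Notice that nothing except the loop scaffolding is fixed: the sequence of thoughts and tool calls is generated at runtime.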
This is very human.
Also human is delegating to other Agents. You pay someone to deliver your food, write your speech, test your code, etc. Similarly, text-in-text-out agents can use each other as tools. But not only within one company! These agents will be able to expose public interfaces and charge each other (like you are charged for calling the Twilio API). Long before AGI, might we have an Agent-to-Agent economy?
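Mechanically, this is trivial, because an agent already satisfies the tool contract: wrap its `run` method and hand it to another agent. A sketch, reusing the imports and `llm` from above (`search_tools` is a hypothetical placeholder for, say, an internet search tool):

```python
researcher = initialize_agent(
    tools=search_tools,  # hypothetical: whatever tools the sub-agent needs
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

research_tool = Tool(
    name="Researcher",
    func=researcher.run,  # text in, text out — the same contract as any other tool
    description="Delegates a research question to a separate research agent.",
)
# Add a public endpoint and per-call billing, and you have an agent selling its services.
```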