Multi-Agent Conversation Frameworks: The Gestalt Of AI
Multi-agent conversation frameworks could lead to organized networks of specialized LLMs that are more than the sum of their parts. The desired Artificial General Intelligence (AGI) may not be a single extremely powerful LLM, but an organism of specialized AIs.
In dynamical systems theory, the parts of a system are discrete, time-dependent functional units that are interconnected and can interact. The interconnections can be mechanical, electrical, biological or biochemical, or linguistic, resulting in all kinds of systems, from microscopic organisms to mechanical or electronic devices to entire economies. Such systems as a whole are also functional units and interact with their environment through signal inputs and outputs.
The properties and behaviors of a system result from the relationships between its components and are not inherent in the components themselves. Aristotle recognized this supersummative nature of systems more than 2,300 years ago, writing, "That which is composed of parts in such a way as to form a unified whole [...] is obviously more than the sum of its parts" (Metaphysics, Book 8.6, 1045a).
In the early 20th century, the psychologists Max Wertheimer, Wolfgang Köhler, and Kurt Koffka founded Gestalt psychology, a theory of how we perceive forms and patterns rather than individual parts (“Gestalt” is German for “form”). A Gestalt, the set of properties of the whole system, emerges at the meta-level from the relationships between the individual components. The Austrian philosopher Christian von Ehrenfels had introduced the term earlier, using a melody as his example of a Gestalt: although it is made up of individual notes, a melody is obviously more than the sum of its parts. If the melody is transposed into another key, it remains the same melody, even though it is now made up of different individual notes.
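The melody example can be made concrete in a few lines of Python. The sketch below is only an illustration, and it rests on some assumptions that are not part of the original argument: notes are encoded as MIDI numbers (60 is middle C), the tune is the opening of "Frère Jacques", and the helper names `intervals` and `transpose` are made up for this example. It shows that moving the melody to another key replaces every single note, while the relationships between neighboring notes stay exactly the same.

```python
def intervals(notes):
    """Return the semitone steps between consecutive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


def transpose(notes, semitones):
    """Shift every note by the same number of semitones."""
    return [n + semitones for n in notes]


# Assumed encoding: MIDI note numbers, 60 = middle C.
melody_c = [60, 62, 64, 60]          # C D E C
melody_g = transpose(melody_c, 7)    # the same tune, a fifth higher: G A B G

shared_notes = set(melody_c) & set(melody_g)
same_shape = intervals(melody_c) == intervals(melody_g)

print("notes in common:", shared_notes)
print("same interval pattern:", same_shape)
```

The two versions share no notes at all, yet their interval patterns are identical. In this toy encoding, the Gestalt of the melody lives in the intervals, not in the notes.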
With Large Language Models (LLMs), such as OpenAI's recently released GPT-4o, machines can for the first time map the patterns and configurations - the Gestalt - of enormously complex systems, such as entire books. At the same time, what an LLM can generate depends on the data it was trained on; there is no clear evidence yet of true creativity. Critics of LLM-based AI are already talking about a coming peak of AI, i.e. a level of performance that individual LLMs will not exceed.
In 2023, Microsoft released AutoGen, a framework in which multiple agents (each backed by, e.g., an LLM, a tool, or a human) can converse with each other to solve problems. In the accompanying paper, the authors report that such a multi-agent conversation framework can significantly improve problem-solving performance. In one example, they streamlined automated code generation by implementing a Commander agent that handles user requests by coordinating a Writer agent and a Safeguard agent. They further report that this approach cut the user's time roughly threefold and reduced the number of user interactions by a factor of 3 to 5 compared to ChatGPT combined with a code interpreter, while also improving the detection of unsafe code.
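To give a feel for the Commander, Writer, and Safeguard pattern, here is a minimal sketch in plain Python. It deliberately does not use AutoGen's actual API, and it makes several assumptions: the `Agent` class, the `commander` function, and both reply functions are hypothetical names invented for this illustration, and the two "agents" are hard-coded stubs where a real system would call an LLM. The only thing it tries to show is the conversation flow: the Commander passes the request to the Writer, has the Safeguard check the result, and asks the Writer again if the code is rejected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Agent:
    name: str
    reply: Callable[[str], str]  # stands in for an LLM call


def writer_reply(request: str) -> str:
    # Hypothetical stub: a real Writer would ask an LLM to produce code.
    if "rejected as unsafe" in request:
        return "print('skipping file deletion')"
    if "delete" in request.lower():
        return "import os\nos.remove('data.csv')"
    return "print(sum(range(10)))"


def safeguard_reply(code: str) -> str:
    # Hypothetical stub: a real Safeguard would ask an LLM to judge the code.
    banned = ("os.remove", "subprocess", "shutil.rmtree")
    return "DANGER" if any(token in code for token in banned) else "SAFE"


def commander(
    request: str, writer: Agent, safeguard: Agent, max_rounds: int = 2
) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Coordinate Writer and Safeguard; return approved code and the transcript."""
    transcript: List[Tuple[str, str]] = []
    prompt = request
    for _ in range(max_rounds):
        code = writer.reply(prompt)
        transcript.append((writer.name, code))
        verdict = safeguard.reply(code)
        transcript.append((safeguard.name, verdict))
        if verdict == "SAFE":
            return code, transcript
        # Feed the rejection back to the Writer for another attempt.
        prompt = request + " (previous attempt was rejected as unsafe)"
    return None, transcript  # no approved code within the round limit


writer = Agent("Writer", writer_reply)
safeguard = Agent("Safeguard", safeguard_reply)

code, transcript = commander("Delete data.csv", writer, safeguard)
for speaker, message in transcript:
    print(f"{speaker}: {message!r}")
```

In this run, the Writer's first attempt is rejected and the second one is approved, so the user gets a result without having to step in. None of the three parts could deliver checked code alone; that behavior comes from how they are wired together.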
So while today we expect future AI to consist of increasingly powerful LLMs, today's LLMs may already be the building blocks of tomorrow's AI. Multi-agent conversation frameworks could lead to organized networks of specialized LLMs whose performance in solving complex problems exceeds our wildest expectations. The desired Artificial General Intelligence (AGI) may not be a single extremely powerful LLM, but an organism of specialized LLMs.