In the evolving world of artificial intelligence, we foresee a parallel with human development: AI starting as ‘polymaths’ and gradually specializing into ‘experts’. As we approach the physical limits of network size and data capacity, a ‘divide and conquer’ strategy emerges as the most scalable solution. Moreover, we can create domain-specific models that are an order of magnitude more efficient and adept at integrating relevant knowledge and tools.
The Leeroo Orchestrator is the first step towards this goal. It features a lightweight LLM that knows when and how to leverage underlying LLM experts to perform a task with the best outcome. Our experiments show new state-of-the-art results:
- State-of-the-art open-source: When leveraging only open-source models, the Leeroo Orchestrator is the top-performing open-source model on the MMLU benchmark, attaining 76% accuracy with the same budget as Mixtral (70.6%).
- Leeroo open-source vs. GPT3.5: The open-source Leeroo Orchestrator matches GPT3.5’s accuracy on MMLU at almost a fourth of the cost.
- Matching and surpassing GPT4 at a fraction of the cost: Combining open-source models with GPT4, the Orchestrator nearly matches GPT4’s accuracy at half the cost. Moreover, it surpasses GPT4’s accuracy at 25% lower cost.
- Accessible: Leeroo can be served on accessible consumer hardware (e.g., A10 and A100 GPUs) since it leverages small underlying models of 7B to 34B parameters, and it can be deployed on any cloud provider or on-prem.
Orchestration of Experts
At the heart of our innovation lies the Orchestrator, an advanced LLM-based system. This architecture is designed to intelligently deconstruct complex tasks into simpler sub-tasks, seamlessly identifying and engaging the most suitable ‘expert’ for each component and efficiently integrating their specialized knowledge to produce comprehensive and accurate responses. A key aspect of our Orchestrator is its strategic optimization based on predefined criteria such as speed, cost, and accuracy. For instance, when faced with a task that could be performed nearly equally well by a 7 billion parameter model or a more extensive 70 billion parameter model, the Orchestrator will opt for the former when factors like speed and cost-efficiency are prioritized. This approach ensures optimal resource utilization without compromising on quality.
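The trade-off described above can be illustrated with a minimal sketch. It is not the Orchestrator's actual routing logic: the `Expert` class, the `predicted_quality` scores, the relative `cost` values, and the `tolerance` parameter are all hypothetical names and numbers chosen for illustration. The idea is simply to pick the cheapest expert whose estimated quality is close enough to the best available estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    name: str
    cost: float               # hypothetical relative cost per query
    predicted_quality: float  # hypothetical quality estimate in [0, 1]


def select_expert(experts, tolerance=0.02):
    """Return the cheapest expert whose predicted quality is within
    `tolerance` of the best predicted quality in the pool."""
    best = max(e.predicted_quality for e in experts)
    candidates = [e for e in experts if e.predicted_quality >= best - tolerance]
    return min(candidates, key=lambda e: e.cost)


# Illustrative pool: a small and a large model that perform almost equally well.
experts = [
    Expert("small-7b", cost=1.0, predicted_quality=0.81),
    Expert("large-70b", cost=10.0, predicted_quality=0.82),
]
print(select_expert(experts).name)
```

With a tolerance of zero, the same function would fall back to the larger model, which is one way to express a preference for accuracy over cost.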
Our architecture marks a significant departure from traditional Mixture of Experts (MoE) models. While MoE relies on gating over various expert sub-networks within each layer to predict the next token, it requires all expert parameters to be loaded onto a single, high-end machine. This limitation hinders scalability in the number of experts. In contrast, each ‘expert’ within our system operates independently and can be hosted on different machines, potentially utilizing varied neural network architectures. This flexibility means we can incorporate a vast array of experts, ranging from those specializing in system-level Java programming to those adept at curating travel experiences in London. Furthermore, our architecture facilitates easy integration of additional optimization criteria, including cost, speed, and privacy considerations.
Performance on MMLU benchmark (Massive Multitask Language Understanding)
At its inception, the Orchestrator is trained to estimate the effectiveness of responses from a diverse range of experts, selecting the most suitable model for each task based on size and efficiency. Our initial pool of experts encompasses open-source Large Language Models (LLMs) from Hugging Face, spanning sizes of 7B, 13B, and 34B.
In our experiments, to estimate the cost of each model, we used the minimum available cost across all providers for each model, thereby avoiding any bias in evaluating model costs.
We first tuned the Orchestrator’s budget to reach performance parity with the best open-source model, Mixtral, which scores 70.6% on MMLU, and achieved this at just two-thirds of Mixtral’s cost. This cost-performance ratio marked our first success. We then raised our budget to match Mixtral’s ($0.6), at which point the Orchestrator outperforms it by more than 5 percentage points on the MMLU benchmark, establishing a new state of the art among open-source models. Additionally, we demonstrated that by permitting the Orchestrator to use GPT4 for tasks beyond the capabilities of open-source models, we could attain near-parity with GPT4’s performance at half the cost.
A standout area of success is in STEM domains, such as mathematics and computer science, where the Orchestrator particularly excels. This impressive performance is largely attributed to the incorporation of specialized small models (around 7 billion parameters) that are fine-tuned for tasks in mathematics and coding by the community.
Deployment Flexibility
The Orchestrator is optimized to work with smaller models. This optimization is particularly beneficial for users without access to high-end computing resources (e.g., GPUs with 100G), as it allows them to leverage advanced AI technologies in a more cost-effective manner.
In scenarios where deploying a large number of experts is not feasible, users can limit the size of the expert pool. The Orchestrator then selects the most complementary set of experts within these constraints, so that it remains highly effective even with a smaller pool and makes the best use of the available resources. For example, we can almost reproduce the MMLU results above using as few as six expert models.
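One way to think about choosing a complementary pool is as a coverage problem. The sketch below uses a simple greedy heuristic, which is an assumption for illustration and not necessarily how the Orchestrator selects its pool. The expert names and the sets of question IDs each expert answers correctly are made up.

```python
def select_complementary(correct_by_expert, k):
    """Greedily pick up to k experts so that as many questions as possible
    are answered correctly by at least one chosen expert."""
    chosen, covered = [], set()
    remaining = dict(correct_by_expert)
    for _ in range(k):
        if not remaining:
            break
        # Expert adding the most not-yet-covered questions (ties broken by name).
        name = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        if not remaining[name] - covered:
            break  # no remaining expert adds anything new
        chosen.append(name)
        covered |= remaining.pop(name)
    return chosen, covered


# Hypothetical data: question IDs each expert answers correctly.
correct_by_expert = {
    "math-7b": {1, 2, 3, 4},
    "code-7b": {5, 6},
    "general-13b": {1, 2, 5},
}
chosen, covered = select_complementary(correct_by_expert, k=2)
print(chosen, len(covered))
```

Here the second pick is the coding model and not the general one, because it adds more questions that the first pick does not already cover.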
Leeroo Orchestrator V1: The Inaugural Step in a Progressive Series
Launching with Leeroo Orchestrator V1, we set in motion a series of advancements that promise significant growth in AI capabilities. The training process begins by generating a variety of queries and running them through the Orchestrator. Each final response produced is then evaluated. This data serves as a crucial training input for the Orchestrator’s own LLM, teaching it to discern which expert model is best suited for any given question. The true power of our system lies in its continuous learning loop inspired by self-play learning in RL: as more questions are generated, processed, and assessed, the Orchestrator accumulates an ever-expanding wealth of training signals. This ongoing cycle allows it to grasp even the most subtle nuances in the capabilities of different experts. As a result, the accuracy and efficiency of the Orchestrator’s responses improve markedly with iterations of this self-play loop.
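The data-collection step of this loop can be sketched as follows. This is a simplified illustration under stated assumptions: every expert answers every query (the real loop routes queries through the Orchestrator), the experts are stub functions, and `is_correct` is a hypothetical exact-match evaluator. The resulting records are the kind of signal a router could be trained on.

```python
def build_training_examples(queries, experts, is_correct):
    """Run each query through each expert, evaluate the response, and
    record a (query, expert, label) training example."""
    examples = []
    for query, reference in queries:
        for name, answer_fn in experts.items():
            answer = answer_fn(query)
            examples.append(
                {"query": query, "expert": name, "label": int(is_correct(answer, reference))}
            )
    return examples


# Stub experts standing in for real models (hypothetical).
experts = {
    "math-7b": lambda q: {"2+2": "4"}.get(q, "unknown"),
    "general-13b": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
}
queries = [("2+2", "4"), ("capital of France", "Paris")]

examples = build_training_examples(queries, experts, lambda a, ref: a == ref)
for example in examples:
    print(example)
```

Repeating this over newly generated queries grows the set of examples, which is the sense in which the loop accumulates training signal over iterations.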
Looking ahead, we anticipate unveiling vastly superior versions in forthcoming releases. If an oracle always chose the best open-source model for each question in the MMLU benchmark, it would reach an accuracy of 98% at roughly the computational cost of a 13 billion parameter model. This upper bound indicates substantial room for growth and optimization.
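The upper bound mentioned above can be computed from per-question results. The sketch below is a toy illustration, not the evaluation code behind the reported numbers. The expert names, sizes, and correctness table are invented, and falling back to the smallest model when no expert is correct is an assumption.

```python
def oracle_upper_bound(results, sizes):
    """Return (accuracy, average model size) for an oracle that always
    routes each question to the smallest expert that answers it correctly."""
    solved = 0
    total_size = 0.0
    for per_question in results:
        correct = [name for name, ok in per_question.items() if ok]
        if correct:
            solved += 1
            total_size += min(sizes[name] for name in correct)
        else:
            # Assumption: if no expert is correct, charge the smallest model.
            total_size += min(sizes.values())
    n = len(results)
    return solved / n, total_size / n


# Hypothetical sizes (billions of parameters) and per-question correctness.
sizes = {"a-7b": 7, "b-13b": 13, "c-34b": 34}
results = [
    {"a-7b": True, "b-13b": True, "c-34b": True},
    {"a-7b": False, "b-13b": True, "c-34b": True},
    {"a-7b": False, "b-13b": False, "c-34b": True},
    {"a-7b": False, "b-13b": False, "c-34b": False},
]
accuracy, avg_size = oracle_upper_bound(results, sizes)
print(accuracy, avg_size)
```

The gap between this oracle figure and a trained router's accuracy is the headroom that better routing could recover.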
Additionally, our system is designed to incorporate and learn from new expert models as they become available. When new models are introduced, the Orchestrator evaluates their performance across various questions within the loop. This continuous process of integration and assessment enables the Orchestrator to make effective use of new models, keeping it at the cutting edge and steadily expanding its performance and scope.
Our Vision Unveiled
At its core, we are building an LLM-based operating system that unites diverse expert models. This integration not only streamlines processes and improves efficiency but also unlocks new possibilities across multiple areas of expertise. Consider the finance sector, where a financial risk management expert model works together with legal compliance and market analysis models. This collaboration enables sophisticated investment strategies that navigate risk, regulatory compliance, and market volatility together, something previously unattainable. The example illustrates how the Orchestrator’s ability to blend specialized expertise can produce solutions that no single model would reach, opening up new avenues for innovation and problem-solving.
This vision is driven by the principle that specialization within language models unlocks new levels of effectiveness and efficiency. Despite the broad capabilities of current LLMs, they often exhibit redundancy in knowledge and a lack of depth in specific domains. Our response to this challenge is a strategic pivot towards ultra-efficient, domain-specific models. These models are not generalists; each is a master of a specific domain, from algebra to financial risk.
This is where our Orchestrator steps into the spotlight. It’s not just a tool; it’s a guide, showing us precisely where to focus our efforts. The Orchestrator’s ability to assess and understand the performance of existing models allows us to identify gaps in the AI landscape. It pinpoints domains where no existing expert excels or where larger models are inefficiently juggling tasks. This insight is invaluable, enabling us to strategically develop domain-specific models where they are needed most.
Engage with Leeroo Orchestrator V1
As we gradually roll out access to the Leeroo V1 API on a first-come, first-served basis, we’re eager for you to experience the capabilities of Leeroo firsthand and encourage interested parties to request access. We’re open to discussions about tailored Virtual Private Cloud (VPC) or on-premise installations. Our production system, currently based on vLLM on EC2 and SageMaker, is expanding to accommodate more cloud providers.
As we start training Leeroo V2, your insights are invaluable to us. Whether you’re a researcher in AI, an AI practitioner, or simply an enthusiastic user, we’re eager to integrate your ideas into the next iteration of Leeroo. We invite you to share your thoughts and suggestions with us and join us in shaping the future of AI together.
Additionally, the deployment of Leeroo V1, as well as the training of Leeroo V2, requires significant GPU resources. If you are able to support us in this area, please reach out.
For a deeper dive into the Leeroo Orchestrator V1, refer to our publication and GitHub repository.
Don’t forget to request access through our API Access form.