What do you think of Llama 3?

by @Pietro Casella written in September 2014

This article offers a point of view on Llama 3.0, specifically on how it relates to GPT-4. I approach the question from a few different angles, from formal to commercial. I wrote it after the launch of Llama 3 to help me answer the question “What do you think about Llama 3?”

What is new?

Llama 3 is Meta's latest model release, with state-of-the-art performance across numerous benchmarks. It ships in several sizes and variants to suit different needs. It is available on all major platforms and comes with a set of trust and safety tools aimed at production readiness.

Why does this matter?

As of its release we identify the following:

State of the Art for Open Source - According to the benchmarks published for Llama 3 and its variants, these are arguably the best open-source models in each size class. This implies a significant improvement in aspects such as reasoning.

<aside> <img src="/icons/light-bulb_lightgray.svg" alt="/icons/light-bulb_lightgray.svg" width="40px" /> 💡 This means that developers can now build applications with more intelligence than ever before, using an open-source stack.

</aside>

Multiple Packagings - The plan to release in various sizes (8B, 70B) and forms (pre-trained, instruction fine-tuned), along with additional "packagings", has two main implications:

<aside> <img src="/icons/light-bulb_lightgray.svg" alt="/icons/light-bulb_lightgray.svg" width="40px" /> 💡 Developers have the freedom to choose the best model-application fit, optimizing according to their needs, while maintaining core features like data foundation, language support, and general ability.

💡 It initiates multiple lineages of derivative models. Organizations can utilize the base version to create fine-tuned models specific to their businesses, starting from a superior baseline.

</aside>

Towards Production Readiness - Llama 3 is distributed with significant system-level AI-safety features as standard. This means the model has a lower likelihood of encountering common issues, and it comes equipped with tools to enhance safety. To elaborate:

<aside> <img src="/icons/light-bulb_lightgray.svg" alt="/icons/light-bulb_lightgray.svg" width="40px" /> 💡 Developers can place more trust in this model where safety and security are a concern. They also receive a structured view of how to approach safety and security. While not perfect, it is unquestionably a step forward for the general application of AI.

</aside>

Data Focus - Llama 3 was trained on significantly more data, arguably the most crucial factor influencing its quality. While Llama 2 was trained on 2 trillion tokens, Llama 3 was trained on over 15 trillion tokens of higher quality and with a more diverse data mix, including over 5% non-English data covering more than 30 languages. Training on more data, the direction the "Chinchilla" scaling work pointed to, has proven to be a successful strategy for achieving better results. In this instance, Meta has trained far beyond the previously accepted compute-optimal amount of data for models of this size, setting a new standard.

<aside> <img src="/icons/light-bulb_lightgray.svg" alt="/icons/light-bulb_lightgray.svg" width="40px" /> 💡 This emphasis on data will likely mean that developers can rely on improved abilities compared to before, and the mix of skills (e.g., reasoning, multilingual performance) will be superior.

</aside>
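To put the token counts from the Data Focus point in perspective, here is a rough back-of-the-envelope sketch. It assumes the commonly cited rule of thumb of about 20 training tokens per parameter for compute-optimal training (an approximation associated with the Chinchilla work, not a figure from this article) and uses nominal parameter counts of 8B and 70B. The function names are made up for illustration.

```python
# Back-of-the-envelope comparison of training data versus model size.
# ASSUMPTION: ~20 tokens per parameter is a widely quoted rule of thumb
# for compute-optimal training; it is an approximation, not an exact law.
RULE_OF_THUMB_TOKENS_PER_PARAM = 20

LLAMA2_TOKENS = 2e12   # 2 trillion tokens, as quoted in the article
LLAMA3_TOKENS = 15e12  # 15 trillion tokens, as quoted in the article


def tokens_per_param(train_tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return train_tokens / params


def rule_of_thumb_tokens(params: float) -> float:
    """Token budget suggested by the assumed 20:1 rule of thumb."""
    return RULE_OF_THUMB_TOKENS_PER_PARAM * params


# Nominal parameter counts; the exact counts differ slightly.
for name, params in [("Llama 3 8B", 8e9), ("Llama 3 70B", 70e9)]:
    ratio = tokens_per_param(LLAMA3_TOKENS, params)
    multiple = LLAMA3_TOKENS / rule_of_thumb_tokens(params)
    print(f"{name}: {ratio:,.0f} tokens/param, "
          f"{multiple:.1f}x the rule-of-thumb budget")

print(f"Llama 3 vs Llama 2 data: {LLAMA3_TOKENS / LLAMA2_TOKENS:.1f}x")
```

Under these assumptions the 8B model sees roughly 1,875 tokens per parameter, about 94 times the rule-of-thumb budget, while the 70B model sees about 214 tokens per parameter, about 11 times that budget. The exact multiples depend on the assumed ratio, but the order of magnitude is what matters here.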

Virality and Network Effects - Within the first few hours of the model's release, we observed thousands of descendant models. These models, trained or modified from the base Llama 3.0, include domain-specific fine-tunes, performance-tuned versions, platform ports, and more. In addition, the new licensing model promotes a network effect by requiring applications and derivatives to acknowledge their origin: products must display "Built with Meta Llama 3", and derivative model names must begin with "Llama 3". This strategy will greatly boost the Meta Llama 3 brand and cultivate a substantial following. One notable consequence is that performance gains achieved by third-party tuned versions accrue directly to the Llama 3 brand.

Is it better than GPT-4?

The comparison to GPT-4 is relevant along several dimensions. I will dissect a few of the most important differentiating aspects: