Open source vs. commercial API-based: the philosophical frictions of AI foundation models
As the space of foundation models continues to evolve rapidly, two major philosophical frictions have surfaced: the friction between massively large and smaller models, and the friction between open source and commercial API-based distribution. In this blog post, let’s explore these frictions, their implications, and their future impact on the development and adoption of foundation models.
What are AI foundation models?
AI foundation models are machine learning models that serve as a general-purpose foundation for a wide range of artificial intelligence (AI) applications. These models, often based on deep learning architectures, are pre-trained on massive amounts of data and can be fine-tuned for specific tasks with relatively small amounts of task-specific data. Their main advantage is the ability to learn complex patterns and representations from data, enabling them to perform well across many tasks and domains. In current practice, foundation models fall into two broad classes: large-scale and small-scale.
Large-scale foundation models are typically built using vast amounts of training data, often encompassing diverse sources such as text, images, audio, and video. Their sheer scale helps them learn intricate patterns and representations, leading to strong performance on a wide range of tasks.
Small-scale foundation models serve the same foundational role for AI applications but are designed to be compact and efficient in computational cost and memory footprint. They still leverage pre-training and fine-tuning, but their smaller size makes them more accessible and easier to deploy, particularly on resource-constrained devices such as mobile phones and edge hardware.
Some key characteristics of AI foundation models include:
- Pre-trained: AI foundation models undergo an extensive pre-training phase, during which they learn general-purpose knowledge from the training data. This pre-training enables the models to develop a strong understanding of various domains, making them capable of handling a wide array of tasks.
- Fine-tuning: After pre-training, foundation models can be fine-tuned for specific tasks or domains using a smaller amount of task-specific data. This fine-tuning process allows the models to adapt their general knowledge to the nuances of the target task, often leading to improved performance.
- Transfer learning: AI foundation models leverage transfer learning, which is the process of applying knowledge learned from one task to another, often related task. Transfer learning enables foundation models to perform well across various tasks and domains, even with limited task-specific data.
- Multimodal: Some foundation models are designed to handle multiple modalities, such as text, images, and audio, simultaneously. These multimodal models can learn complex relationships between different types of data, enabling them to perform tasks that require an understanding of multiple data modalities.
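The pre-train/fine-tune pattern above can be illustrated with a deliberately tiny sketch. This is plain NumPy with made-up linear "models", not a real foundation model: a shared backbone is fit on plentiful generic data, then frozen while a small task-specific head is fit on only a handful of examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- "Pre-training": learn a shared feature extractor on a large generic task ---
# Toy setup: inputs are 4-d, and the useful signal lives in a 2-d subspace.
W_true = rng.normal(size=(4, 2))
X_pre = rng.normal(size=(1000, 4))          # plentiful generic data
Y_pre = X_pre @ W_true                      # generic pre-training targets

# A least-squares fit stands in for gradient-based pre-training.
backbone, *_ = np.linalg.lstsq(X_pre, Y_pre, rcond=None)   # shape (4, 2)

# --- "Fine-tuning": adapt to a new task with very little data ---
# The new task reuses the same features but needs a new 1-d output head.
head_true = rng.normal(size=(2, 1))
X_task = rng.normal(size=(20, 4))           # only 20 task-specific examples
y_task = X_task @ W_true @ head_true

# Freeze the backbone; fit only the small head on the extracted features.
feats = X_task @ backbone                   # shape (20, 2)
head, *_ = np.linalg.lstsq(feats, y_task, rcond=None)

# Transfer learning pays off: the fine-tuned model generalises to unseen
# inputs despite the tiny task-specific dataset.
X_test = rng.normal(size=(5, 4))
pred = X_test @ backbone @ head
truth = X_test @ W_true @ head_true
print(np.allclose(pred, truth, atol=1e-6))
```

The fine-tuning step touches only the small head (2 parameters here), which is why adapting a pre-trained model needs far less data and compute than training from scratch.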
Evolution and emerging frictions
Innovation in the foundation models space is accelerating at an unprecedented rate, giving rise to new models that exhibit remarkable cognitive capabilities. As the market for these models evolves, two major frictions have emerged, driving philosophical divisions within the industry:
- The friction between massively large and smaller models
- The friction between open-source and commercial API-based distribution
The friction between massively large models and smaller models
Historically, larger foundation models have outperformed their smaller counterparts in terms of cognitive capabilities. Recently, however, the emergence of models like LLaMA and its instruction-tuned variants (often trained with reinforcement learning from human feedback, or RLHF) has demonstrated that smaller models can approach the performance of much larger alternatives. This development raises several questions:
- What are the trade-offs between large and small models in terms of performance, efficiency, and cost?
- Can smaller models continue to close the performance gap with larger models, and under what conditions?
Examples of AI large-scale foundation models include OpenAI's GPT-3 and GPT-4, Google's BERT, and Meta's RoBERTa. These models have demonstrated impressive performance across tasks such as natural language processing, computer vision, and speech recognition. Despite their capabilities, however, foundation models also raise ethical concerns, including fairness, bias, and the potential for misuse. As a result, ongoing research and development efforts focus on addressing these challenges while continuing to improve the models' capabilities.
Examples of AI small-scale foundation models include Hugging Face's DistilBERT, Google's MobileBERT, Huawei's TinyBERT, EfficientNet, SqueezeNet, Meta's LLaMA and FastText, and mobile-optimized versions of Meta's Detectron2. Like their larger counterparts, these models rely on pre-training and fine-tuning, but their compact footprint makes them practical to deploy on devices with limited resources, such as mobile phones and edge devices.
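Compact models are typically produced by techniques such as knowledge distillation, pruning, quantization, or low-rank factorization. As one hedged, toy-sized sketch of the last of these (not how DistilBERT or TinyBERT are actually built, which use distillation), a large weight matrix that is approximately low-rank can be replaced by two skinny factors with a fraction of the parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Teacher": a large dense weight matrix, standing in for one layer of a big
# model. Constructed to be exactly low-rank for this toy; trained weight
# matrices are often approximately low-rank in practice.
U = rng.normal(size=(256, 4))
V = rng.normal(size=(4, 128))
W_teacher = U @ V                      # 256 * 128 = 32,768 parameters

# Compress via truncated SVD: keep two skinny factors instead of the full matrix.
u, s, vt = np.linalg.svd(W_teacher, full_matrices=False)
rank = 4
A = u[:, :rank] * s[:rank]             # shape (256, 4)
B = vt[:rank, :]                       # shape (4, 128)
# Student parameters: 256*4 + 4*128 = 1,536 — roughly 21x smaller.

# The compressed layer produces the same outputs (up to float error),
# because the teacher really is rank-4 in this toy.
x = rng.normal(size=(8, 256))
print(np.allclose(x @ W_teacher, x @ A @ B, atol=1e-8))
```

In real models the weight matrix is only approximately low-rank, so compression trades some accuracy for the smaller memory footprint that makes on-device deployment feasible.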
The friction between open source and commercial API-based distribution
The debate between open source and API-based distribution of foundation models is reminiscent of the iOS vs. Android debate. Commercial API models like GPT-4, LaMDA, and Claude are contrasted with open source models like Dolly 2.0 and Stable Diffusion. The debate extends beyond business models and encompasses concerns such as:
- Fairness and accessibility: Does open source distribution promote greater access to cutting-edge AI technologies for a broader range of stakeholders?
- Safety concerns: How do open source and commercial API-based distribution models address the potential risks associated with the misuse of powerful AI technologies?
The interplay between size and distribution models
Interestingly, the frictions between size and distribution models have given rise to two distinct camps. Vendors favoring large models also tend to rely on commercial API-based distribution, while open source models are often relatively smaller. Major players in these two camps include:
- Large models/commercial API-based distribution: OpenAI, Anthropic, Microsoft, Google
- Small models/open-source distribution: Databricks, Stability AI, Meta (formerly Facebook)
The future of foundation models
The philosophical frictions explored in this blog post are likely to evolve over time as new technologies and distribution models emerge. We may soon see open source distributions of large models or smaller models available exclusively via APIs. It is important to remember that generative AI is unlike any other market, and these frictions will continue to shape the development, distribution, and adoption of foundation models in unique and potentially unexpected ways.
The frictions between large and small models, and open source and commercial API-based distribution, are shaping the development and adoption of foundation models within the industry. Understanding the implications of these frictions, their interplay, and their potential impact on future models is essential for stakeholders to navigate the rapidly evolving landscape of generative AI technologies. As the market matures, it will be crucial for researchers, developers, and organizations to monitor these philosophical frictions and adapt their strategies accordingly.