In Natural Language Processing (NLP), the development of Large Language Models (LLMs) has proven to be a transformative endeavor. These models, with their massive parameter counts and training on extensive datasets, have demonstrated unprecedented proficiency across a wide range of NLP tasks. However, the exorbitant cost of training such models from scratch has prompted researchers to explore alternative strategies. One pioneering strategy for enhancing LLM capabilities is knowledge fusion, a concept explored in depth in the research paper titled “Knowledge Fusion of Large Language Models” by Wan, Huang, Cai, Quan, and others.
This approach addresses the redundancy that arises as newly developed LLMs replicate one another's functionality. The paper delves into the process of merging the knowledge of multiple LLMs, presenting a promising avenue for refining and amplifying the performance of these language models.
The fundamental idea is to combine the strengths of existing LLMs, transcending the limitations of any individual model. By fusing existing pre-trained LLMs, we can create a more powerful model that surpasses the individual strengths of each source model.
The paper begins by highlighting the challenges and costs of training LLMs from scratch. The authors propose knowledge fusion as an efficient and cost-effective alternative. Rather than merging weights directly, the approach focuses on externalizing the collective knowledge of source LLMs and transferring it to a target model. The research introduces FUSELLM, a method that leverages the generative distributions of source LLMs, aiming to enhance the target model’s capabilities beyond any individual source LLM.
The primary objective of LLM fusion is to externalize the knowledge embedded within multiple source LLMs and integrate their capabilities into a target LLM. The paper does this by having each source LLM predict the next token of a given text; the probabilistic distributions generated by the different source LLMs for the same text are then fused into a single representation, creating a unified probabilistic understanding of the text.
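To make this concrete, here is a minimal PyTorch sketch of a training objective in this spirit: the target model keeps its usual causal language-modeling loss and adds a divergence term that pulls its next-token distribution toward the fused distribution precomputed from the source LLMs. The function name, the KL-divergence choice, and the `lambda_clm` weighting here are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fusion_training_loss(target_logits, fused_probs, labels, lambda_clm=0.9):
    """Sketch of a combined objective: causal LM loss + fusion term.

    target_logits: (batch, seq_len, vocab) logits from the target LLM
    fused_probs:   (batch, seq_len, vocab) fused next-token probabilities
                   precomputed from the source LLMs and already aligned
                   to the target vocabulary
    labels:        (batch, seq_len) gold next-token ids
    """
    # Standard next-token prediction loss on the original text.
    clm = F.cross_entropy(
        target_logits.reshape(-1, target_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
    # Divergence between the target's distribution and the fused distribution.
    log_probs = F.log_softmax(target_logits, dim=-1)
    fusion = F.kl_div(log_probs, fused_probs, reduction="batchmean")
    # Illustrative weighting between the two terms.
    return lambda_clm * clm + (1.0 - lambda_clm) * fusion
```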
The paper introduces two crucial implementation details to ensure effective knowledge fusion: token alignment and fusion strategies.
Token alignment is achieved through a Minimum Edit Distance (MinED) strategy, enhancing the success rate of aligning tokens from different LLMs.
The fusion strategies, MinCE and AvgCE, weigh the quality of the different source LLMs by their cross-entropy scores on the text: MinCE keeps only the distribution matrix of the source with the lowest cross-entropy, while AvgCE takes a cross-entropy-weighted average of all the distribution matrices. Both ideas are sketched below.
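A rough Python sketch of both ideas, under the assumption that each source's distribution matrix has already been projected onto the target vocabulary, might look as follows; the softmax weighting used for AvgCE and the brute-force vocabulary scan for MinED are simplifications for illustration, not the paper's exact implementation.

```python
import editdistance  # pip install editdistance; any edit-distance helper works
import torch

def align_token(source_token, target_vocab):
    """MinED-style alignment: map a source-model token to the target-vocab
    token with the smallest edit distance (exact matches win outright)."""
    if source_token in target_vocab:
        return source_token
    return min(target_vocab, key=lambda t: editdistance.eval(source_token, t))

def fuse_distributions(dist_matrices, cross_entropies, strategy="min_ce"):
    """Fuse per-token distribution matrices from several source LLMs.

    dist_matrices:   list of (seq_len, vocab) tensors, one per source LLM,
                     already aligned to the target vocabulary
    cross_entropies: list of scalars, each source's cross-entropy on the text
                     (lower = better fit, so it gets more weight)
    """
    if strategy == "min_ce":
        # MinCE: keep only the distribution from the best-scoring source.
        best = min(range(len(dist_matrices)), key=lambda i: cross_entropies[i])
        return dist_matrices[best]
    elif strategy == "avg_ce":
        # AvgCE: cross-entropy-weighted average over all sources
        # (softmax over negative CE is an illustrative weighting choice).
        weights = torch.softmax(-torch.tensor(cross_entropies), dim=0)
        stacked = torch.stack(dist_matrices)  # (n_sources, seq_len, vocab)
        return (weights.view(-1, 1, 1) * stacked).sum(dim=0)
    raise ValueError(f"unknown strategy: {strategy}")
```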
The research conducts experiments on a challenging scenario of LLM fusion, in which the source models exhibit minimal commonalities. Three representative open-source models – Llama-2, OpenLLaMA, and MPT – are selected as the source LLMs, with another Llama-2 serving as the target LLM. The experiments span benchmarks assessing reasoning, commonsense, and code-generation capabilities.
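For orientation, the sketch below shows how the distribution-extraction step for such a setup could be wired up with Hugging Face transformers. The checkpoint IDs (the commonly used 7B variants) and the helper function are my assumptions, and loading three 7B models this way is meant to be illustrative rather than efficient.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed 7B checkpoints; the paper may use different variants.
SOURCE_REPOS = [
    "meta-llama/Llama-2-7b-hf",
    "openlm-research/open_llama_7b",
    "mosaicml/mpt-7b",
]

@torch.no_grad()
def next_token_distributions(text, repo_id):
    """Return one source model's per-position next-token probabilities
    for a piece of training text (its own tokenizer, its own vocabulary)."""
    tok = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tok(text, return_tensors="pt").to(model.device)
    logits = model(**inputs).logits  # (1, seq_len, vocab)
    return torch.softmax(logits.float(), dim=-1)

# Distributions from all three sources; token alignment (previous sketch)
# then maps them onto the target Llama-2 vocabulary before fusion.
dists = {r: next_token_distributions("The quick brown fox", r) for r in SOURCE_REPOS}
```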
The comprehensive evaluation of FUSELLM across various benchmarks provides valuable insight into its efficacy. Table 1 presents the overall results of FUSELLM compared to baseline methods on Big-Bench Hard (BBH). Notably, FUSELLM demonstrates an average relative performance gain of 5.16% over the original Llama-2 across all 27 tasks. Specific tasks, such as Hyperbaton, show substantial improvements, underscoring FUSELLM’s ability to leverage collective knowledge for better performance.
Moving on to the Common Sense (CS) benchmark in Table 2, FUSELLM consistently outperforms baselines across all tasks, achieving a relative performance improvement of 1.25% over Llama-2. This trend holds true even in challenging tasks like ARC-challenge and OpenBookQA, where FUSELLM exhibits significant improvements, highlighting its effectiveness in addressing intricate problems.
In the context of code generation, Table 3 illustrates the zero-shot performance of FUSELLM on the MultiPL-E (ME) benchmark. Outperforming Llama-2 in 9 out of 10 tasks, FUSELLM showcases a notable enhancement in the pass@1 score, particularly for specific programming languages like R. Despite a performance gap compared to OpenLLaMA or MPT, FUSELLM still achieves a remarkable average performance gain of 6.36%, surpassing the 1.37% improvement observed in Llama-2 CLM.
A crucial aspect of FUSELLM’s success lies in its ability to utilize fused probabilistic distributions from multiple LLMs. Figure 2 compares the few-shot Chain-of-Thought (CoT) performance of Llama-2 CLM and FUSELLM at varying scales of training data on BBH. FUSELLM improves exact match (EM) accuracy by 2.5% and reaches the best performance of Llama-2 CLM with only 0.52 billion tokens, a 3.9× reduction in token requirements. This indicates that the probabilistic distributions derived from the LLMs contain knowledge that is more readily learnable than the original text sequences, which accelerates the optimization process.
Delving into the implementation details of FUSELLM reveals critical considerations for its success. The number of source LLMs, token alignment criteria, and the choice of fusion function play pivotal roles in shaping FUSELLM’s performance.
Comparative analyses with traditional techniques like knowledge distillation and ensemble/merging shed light on FUSELLM’s unique strengths.
Also read: Knowledge Distillation: Theory and End to End Case Study
You can find the code, model weights, and data publicly available here: GitHub FUSELLM
The paper concludes with compelling results, showcasing the effectiveness of FUSELLM over the individual source LLMs and established baselines, and it opens a promising avenue for future exploration of LLM fusion. The findings emphasize the potential of combining the diverse capabilities of structurally different LLMs, pointing to a cost-effective and powerful approach to developing large language models.
The knowledge fusion of large language models is an innovative solution in a world where the demand for advanced natural language processing capabilities continues to rise. This research paves the way for future endeavors in creating unified models that harness the collective intelligence of diverse LLMs, pushing the boundaries of what is achievable in the realm of natural language understanding and generation.
I’m eager to know your opinions regarding the Knowledge Fusion of Large Language Models (LLMs). Feel free to share your insights on any other noteworthy and informative papers you may have encountered in the comments section.
Also read: A Comprehensive Guide to Fine-Tuning Large Language Models