MobileLLM

Full Title or Meme

MobileLLM is a research effort that optimizes sub-billion-parameter language models for on-device deployment, an instance of the Small Model Approach.

Context

1. **Motivation**: The need for efficient large language models (LLMs) on mobile devices has grown due to increasing cloud costs and latency concerns. Mobile deployment often requires models with fewer than a billion parameters.

2. **Model Architecture Matters**: Contrary to the prevailing belief that data and parameter quantity solely determine model quality, the study emphasizes the significance of model architecture for sub-billion scale LLMs.

3. **MobileLLM Baseline**: Leveraging deep-and-thin architectures together with embedding sharing and grouped-query attention, the researchers establish a strong baseline network called **MobileLLM**. It achieves accuracy boosts of 2.7% and 4.3% over the preceding 125M and 350M state-of-the-art models, respectively¹ (a sketch of the attention and embedding-sharing techniques follows this list).

4. **Layer Sharing**: The paper also proposes an immediate block-wise weight-sharing approach that leaves model size unchanged and adds only marginal latency overhead, since the repeated block's weights stay resident in fast memory. The resulting models, denoted **MobileLLM-LS**, further improve accuracy over MobileLLM 125M/350M¹ (a layer-sharing sketch also follows this list).

5. **Performance**: MobileLLM sets a new state of the art (SOTA) among sub-billion-parameter models. The MobileLLM-LS variants, which employ layer sharing, push accuracy higher still while remaining compatible with on-device memory and latency constraints².

6. **Comparison to Other Models**: An open question is how MobileLLM compares with small encoder-only or encoder-decoder models such as **instructionRoBERTa** or **flan-T5**¹.
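
To make the grouped-query attention and embedding-sharing points concrete, here is a minimal PyTorch sketch. It is an illustration under assumed sizes: the 576-wide, 9-head, 3-KV-head dimensions and the class name `GroupedQueryAttention` are this page's choices, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Grouped-query attention: n_heads query heads share n_kv_heads
    key/value heads, shrinking the K/V projections and the KV cache by a
    factor of n_heads // n_kv_heads."""
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0 and dim % n_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=1)  # each K/V head serves a group of query heads
        v = v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))

# Embedding sharing: reuse the input embedding matrix as the output
# classifier, saving vocab_size * dim parameters -- a large fraction of a
# sub-billion model's budget.
vocab_size, dim = 32000, 576  # illustrative sizes
embed = nn.Embedding(vocab_size, dim)
lm_head = nn.Linear(dim, vocab_size, bias=False)
lm_head.weight = embed.weight  # weight tying: one matrix, two roles

x = embed(torch.randint(0, vocab_size, (1, 16)))  # (batch, seq, dim)
attn = GroupedQueryAttention(dim, n_heads=9, n_kv_heads=3)
logits = lm_head(attn(x))                         # (1, 16, vocab_size)
```

Tying `lm_head.weight` to `embed.weight` removes an entire vocab_size × dim matrix; at the sizes assumed here that is roughly 18M parameters, a substantial share of a 125M-parameter budget.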
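
The layer sharing of point 4 can be sketched just as briefly. The following is a hedged illustration of *immediate* block-wise sharing, not the paper's code; the names `SharedBlockStack` and `repeats` are hypothetical. Each block runs twice back-to-back, so effective depth doubles with no new parameters, and because the second pass reuses weights that are already in fast memory, the latency overhead stays marginal.

```python
import torch
import torch.nn as nn

class SharedBlockStack(nn.Module):
    """Immediate block-wise weight sharing: run each block twice in a row.
    Effective depth doubles with zero extra parameters, and the second pass
    reuses weights already resident in fast memory."""
    def __init__(self, blocks: nn.ModuleList, repeats: int = 2):
        super().__init__()
        self.blocks, self.repeats = blocks, repeats

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            for _ in range(self.repeats):  # immediate repeat of the same weights
                x = block(x)
        return x

# Usage with stand-in blocks; a real model would use transformer blocks.
dim = 576
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(15)
)
model = SharedBlockStack(blocks)          # behaves like a 30-layer stack
y = model(torch.randn(1, 16, dim))
n_params = sum(p.numel() for p in model.parameters())  # counts 15 blocks only
```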

Source: Conversation with Copilot, 7/7/2024

References

(1) MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. https://huggingface.co/papers/2402.14905
(2) MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. https://www.emergentmind.com/papers/2402.14905
(3) MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. arXiv:2402.14905v2 [cs.LG], 27 Jun 2024. https://arxiv.org/pdf/2402.14905
(4) MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. tinyML GenAI Forum slides. https://cms.tinyml.org/wp-content/uploads/talks2023/GenAI_Forum_-Zechun-Liu_240327.pdf