Experiment
BitNet
Developed by Microsoft Research, BitNet b1.58 2B4T is the first open-source, native 1-bit large language model (LLM) at the 2-billion-parameter scale in which every weight is ternary (i.e., -1, 0, or +1). Trained on a corpus of 4 trillion tokens, the model demonstrates that native 1-bit LLMs can match the performance of leading open-weight, full-precision models of similar size while offering major gains in computational efficiency: a much smaller memory footprint, lower energy consumption, and reduced decoding latency.
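To make the ternary constraint concrete, the sketch below illustrates absmean-style weight quantization as described for BitNet b1.58: weights are scaled by their mean absolute value, rounded, and clipped to {-1, 0, +1}. This is a minimal NumPy illustration, not the model's actual training or inference code; the function name and the example usage are invented for clarity.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-6):
    """Map a full-precision weight matrix to ternary values {-1, 0, +1}.

    Follows the absmean scheme described for BitNet b1.58: scale by the
    mean absolute weight, round to the nearest integer, and clip.
    Returns the ternary matrix and the scale used to approximately
    reconstruct the original weights.
    """
    gamma = np.mean(np.abs(w))  # per-tensor scale (mean absolute value)
    w_ternary = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return w_ternary.astype(np.int8), gamma

# Example: quantize a random weight matrix and inspect the value set.
w = np.random.randn(4, 8).astype(np.float32)
w_q, scale = absmean_ternary_quantize(w)
print(np.unique(w_q))   # subset of [-1, 0, 1]
print(w_q * scale)      # dequantized approximation of w
```

Because every weight is one of three values, matrix multiplications reduce largely to additions and subtractions, which is the source of the memory and latency savings claimed above.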
To facilitate further research and adoption, Microsoft has released the model weights along with open-source inference implementations for both GPU and CPU architectures.
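For reference, a minimal sketch of loading the released checkpoint through the Hugging Face transformers API is shown below. The model id is illustrative (check the official release for the exact name), and running the model this way does not by itself deliver the efficiency gains; those come from the dedicated GPU and CPU inference implementations mentioned above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model id; verify against the official release.
model_id = "microsoft/bitnet-b1.58-2B-4T"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain ternary weights in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```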
Figure: Average score across 11 benchmarks (y-axis) versus memory footprint in GB (x-axis) for open-weight LLMs, with a blue dashed line marking the Pareto frontier. BitNet b1.58 2B (red star) sits well to the left of comparably accurate models from the Qwen2.5, SmolLM2, MiniCPM, Gemma, and LLaMA families, matching their benchmark performance at a much smaller memory footprint.