Model Language Experimental

BitNet b1.58 2B4T

Native 1-bit LLM at 2B Scale

15,908 USERS

About BitNet b1.58 2B4T

BitNet b1.58 2B4T is the first open-source, native 1-bit large language model trained from scratch at 2-billion-parameter scale on 4 trillion tokens. It uses W1.58A8 quantization: ternary weights, which carry log2(3) ≈ 1.58 bits each, and 8-bit activations. Inference is served through the bitnet.cpp framework, whose optimized CPU kernels deliver 1.37×–5.07× speed-ups on ARM and 2.37×–6.17× on x86, with 55–82% lower energy use depending on hardware. bitnet.cpp can also run a 100B-parameter BitNet b1.58 model on a single commodity CPU at 5–7 tokens per second, enabling local deployment scenarios previously infeasible for LLMs.
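To make the W1.58A8 idea concrete, here is a minimal sketch in plain Python. It assumes an absmean scheme for the weights (divide by the mean absolute value, round, clip to {-1, 0, +1}) and an absmax scheme for the activations (scale so the largest magnitude maps to 127). The function names, the `eps` value, and the per-tensor granularity are illustrative assumptions; this is not the model's training code or a bitnet.cpp kernel.

```python
def quantize_weights_ternary(weights, eps=1e-5):
    """Assumed absmean scheme: scale by mean |w|, round, clip to {-1, 0, +1}."""
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat) + eps
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def quantize_activations_int8(x, eps=1e-5):
    """Assumed absmax scheme: map the largest |x| to 127 and round to integers."""
    scale = 127.0 / (max(abs(v) for v in x) + eps)
    quantized = [max(-128, min(127, round(v * scale))) for v in x]
    return quantized, scale


def ternary_linear(x, weights):
    """Matrix-vector product using only integer adds and subtracts in the inner loop."""
    w_q, w_scale = quantize_weights_ternary(weights)
    x_q, x_scale = quantize_activations_int8(x)
    out = []
    for row in w_q:
        acc = 0
        for w, a in zip(row, x_q):
            if w == 1:
                acc += a      # +1 weight: add the activation
            elif w == -1:
                acc -= a      # -1 weight: subtract it; 0 weights are skipped
        out.append(acc * w_scale / x_scale)  # rescale back to float once per output
    return out


# Hypothetical toy inputs, chosen only for illustration.
toy_weights = [[0.9, -0.05, -1.2], [0.3, 0.0, 0.6]]
toy_input = [0.5, -1.0, 0.25]
print(quantize_weights_ternary(toy_weights)[0])
print(ternary_linear(toy_input, toy_weights))
```

The inner loop contains no multiplications, which is the property that lets CPU kernels for ternary models trade floating-point multiply-accumulate for integer addition.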

BitNet reframes extreme quantization as a training-time design choice rather than a post-hoc compression step, demonstrating that 1-bit models can match the quality of full-precision peers while dramatically reducing memory, energy, and latency. By open-sourcing both the model weights and the bitnet.cpp inference stack, Microsoft enables a new class of deployments on edge devices, offline workflows, and resource-constrained environments. The research challenges the long-held assumption that model expressiveness scales tightly with weight bitwidth and points toward a future in which serious language-model capability fits on devices that today cannot host even a 7B model.
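A rough back-of-the-envelope calculation shows why weight bitwidth matters for on-device deployment. The sketch below compares weight storage at 16 bits per parameter with the theoretical ternary minimum of log2(3) bits. It is an estimate under simplifying assumptions: it ignores embeddings, activations, the KV cache, and packing overhead, and practical storage formats may use more than the theoretical minimum per weight. The helper name `weight_memory_gb` is hypothetical.

```python
import math

BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585 bits for three possible values


def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, params in [("2B", 2e9), ("7B", 7e9), ("100B", 100e9)]:
    fp16 = weight_memory_gb(params, 16)
    ternary = weight_memory_gb(params, BITS_PER_TERNARY_WEIGHT)
    print(f"{name}: 16-bit {fp16:.1f} GB, ternary {ternary:.2f} GB")
```

Under these assumptions, 16-bit weights for a 7B model need about 14 GB, while ternary weights for a 2B model need roughly 0.4 GB.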

Key capabilities

  • Native ternary weights {-1, 0, +1} trained from scratch
  • W1.58A8 quantization across 4T training tokens
  • 2B-parameter open-source 1-bit LLM
  • Matches the quality of full-precision peers while reducing memory, energy, and latency
  • Optimized CPU/GPU inference via bitnet.cpp kernels

Technology Stack

  • PyTorch
  • bitnet.cpp
  • Custom CUDA kernels