When Tsinghua University’s Knowledge Engineering Group and Zhipu AI open-sourced ChatGLM-6B on March 14, 2023, they noted that with quantization “users can deploy locally on consumer-grade graphics cards (only 6GB of GPU memory is required at the INT4 quantization level).” The bilingual 6.2-billion-parameter chat model could therefore run on an ordinary gaming GPU rather than a data-center accelerator, which was a large part of why it spread so quickly.
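A rough back-of-the-envelope check shows why the 6 GB figure is plausible. The sketch below estimates the memory taken by the weights alone at different precisions, using the 6.2-billion parameter count given above. It assumes every parameter is stored at the stated bit width, which is a simplification: real quantization schemes typically leave some layers at higher precision, and inference also needs room for activations, the attention cache, and framework overhead. The function name `weight_memory_gib` is illustrative and not part of any ChatGLM tooling.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Assumes all parameters share the same bit width (a simplification).
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


PARAMS = 6.2e9  # parameter count stated in the text

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
# FP16: 11.5 GiB
# INT8: 5.8 GiB
# INT4: 2.9 GiB
```

Under these assumptions the INT4 weights come to roughly 3 GiB, leaving the rest of a 6 GB card for everything else inference requires, whereas the FP16 weights alone would not fit.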
ChatGLM-6B could run on a 6 GB consumer GPU
Sources
Last verified June 7, 2026