Google's TurboQuant has the internet joking about Pied Piper from HBO's "Silicon Valley." The compression algorithm promises ...
Nvidia's KV Cache Transform Coding (KVTC) compresses the LLM key-value cache by 20x without model changes, cutting GPU memory ...
Pruna AI, a European startup that has been working on compression algorithms for AI models, is making its optimization framework open source on Thursday. Pruna AI has been creating a framework that ...
Researchers working on text-to-image AI have introduced a pair of techniques that could bring high-quality image generation out of the cloud and onto smartphones. SANA-Sprint, a one-step diffusion ...