Neural Magic (@neuralmagic)'s Twitter Profile
Neural Magic

@neuralmagic

Deploy the fastest ML on CPUs and GPUs using only software. GitHub: https://t.co/99a5S2627M #sparsity #opensource

ID:997536616481722369

Website: https://neuralmagic.com | Joined: 18-05-2018 17:57:17

721 Tweets

4.8K Followers

1.6K Following

vLLM (@vllm_project)

One major production feature you should try out is Automatic Prefix Caching (APC). Thanks to the contribution from Neural Magic, vLLM can _automatically_ cache long system prompts or common prefixes. This drastically saves on the compute needed for multi-turn convos as well.

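For anyone who wants to try APC, here is a minimal sketch using vLLM's offline Python API. It assumes a recent vLLM build where the engine accepts an enable_prefix_caching argument; the model id is only an example, and flag names can differ between versions.

# Minimal sketch: Automatic Prefix Caching (APC) with vLLM's offline API.
# Assumes a recent vLLM install; the engine argument name may vary by version.
from vllm import LLM, SamplingParams

# A long system prompt shared by every request. With APC enabled, vLLM can
# reuse the KV cache computed for this prefix instead of recomputing it.
system_prompt = "You are a helpful assistant. Answer concisely. " * 20

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", enable_prefix_caching=True)
params = SamplingParams(temperature=0.7, max_tokens=128)

questions = ["What is sparsity?", "Why quantize an LLM?"]
outputs = llm.generate([system_prompt + q for q in questions], params)
for out in outputs:
    print(out.outputs[0].text)

The same idea applies to multi-turn chat: the accumulated conversation prefix stays cached, so each new turn only pays for its new tokens.
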
vLLM (@vllm_project)

We are doubling our committer base for vLLM to ensure it is best-in-class and a truly community effort. This is just a start. Let's welcome Kaichao You, Philipp Moritz, Nick Hill, Roger Wang, Cade Daniel 🇺🇸, Robert Shaw as committers and thank you for your great work! 👏

ridgelinevc (@ridgelinevc)

Generative AI is a treasure trove of opportunity. Has @NeuralMagic struck gold?

Dive into a Q&A with CEO Brian Stevens in TechBullion covering 👉

1️⃣ Neural Magic's technology story
2️⃣ Future generative AI capabilities
3️⃣ New strategic partnerships

twitter.com/TechBullion/st…

Neural Magic (@neuralmagic)

What a busy 2024 for our product and engineering teams! They summarize it all in our Q1 product release blog: neuralmagic.com/blog/neural-ma…

From substantial advancements in AI model training, optimization, and deployment for LLMs on CPUs... to the launch of nm-vllm, which enables GPU…

Neural Magic (@neuralmagic)

Had a great AI Day with Cerebras! Our CTOs, Mark Kurtz and Sean Lie, with their teams, released expertly optimized Llama 2 7B models that have been sparsified for superior performance and memory savings: huggingface.co/collections/ne…

🙏 James Wang (in NYC) and Julie Choi for including us!

Derrick Mwiti (@_mwitiderrick)

Have you had a chance to check out Marlin?

It's a state-of-the-art 4-bit quantized inference kernel that can deliver close to 4x speedups up to batch sizes of 16-32 tokens.

It is well-suited for larger-scale serving, speculative decoding, or advanced multi-inference schemes.

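As a rough sketch of how a Marlin-serialized 4-bit checkpoint can be served, the snippet below uses vLLM and assumes a build that exposes a "marlin" quantization backend; the model id is a placeholder, not a confirmed repository.

# Sketch: serving a 4-bit Marlin-format checkpoint with vLLM.
# Assumes vLLM ships the Marlin kernel; the model id below is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/llama-2-7b-chat-marlin",  # hypothetical Marlin-format repo
    quantization="marlin",                    # select the Marlin 4-bit kernel
)
outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
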
TechBullion (@TechBullion)

Democratize AI Using Optimized CPUs As The Onramp To Generative AI: Interview with Neural Magic CEO Brian Stevens techbullion.com/democratize-ai…

Robert Blumofe (@RobertBlumofe)

Doing things smarter with math, algorithms, and software... That's what Akamai Technologies is all about. Akamai and Neural Magic team up to accelerate AI workloads on edge CPU servers siliconangle.com/2024/03/12/aka… via SiliconANGLE

Dan Alistarh (@DAlistarh)

Happy to release AQLM, a new SOTA method for low-bitwidth LLM quantization, targeted to the “extreme” 2-3bit / parameter range.
With: @black_samorez @efrantar and collaborators
Arxiv: arxiv.org/abs/2401.06118
Compression & Inference Code: github.com/Vahe1994/AQLM
Snapshots:
[1/n]

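A minimal sketch of loading an AQLM-quantized checkpoint through Hugging Face Transformers follows; it assumes the aqlm package is installed alongside a Transformers release with AQLM support, and the repository id is a placeholder rather than a confirmed model.

# Sketch: running an AQLM-quantized model via Transformers.
# Assumes `pip install aqlm` and a Transformers version with AQLM support;
# the model id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/llama-2-7b-aqlm-2bit-1x16"  # hypothetical AQLM repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Extreme 2-bit quantization works because", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
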
Neural Magic (@neuralmagic)

We've been taking popular fine-tuned LLMs from Hugging Face and compressing them by 50% with sparsity and quantization to save on memory and compute during inference. Here are two examples:

Llama 2 7b chat: huggingface.co/neuralmagic/Ll…

Hermes 2 - Solar 10.7B:…

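As a rough sketch of running one of these 50%-compressed checkpoints on a CPU, the snippet below uses DeepSparse's text-generation pipeline; the import, constructor argument, and model id are assumptions to verify against the DeepSparse docs, and the model id in particular is a placeholder, not a confirmed repository.

# Sketch: CPU inference on a sparse, quantized LLM with DeepSparse.
# Assumes `pip install deepsparse[llm]`; the model id below is a placeholder.
from deepsparse import TextGeneration

pipeline = TextGeneration(model="neuralmagic/llama-2-7b-chat-pruned50-quant")  # placeholder
result = pipeline("What does 50% sparsity buy you at inference time?")
print(result.generations[0].text)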