Original tweet: Love how a 2.5x per annum cost decline expectation gets ~delivered within the first 3 days of the new year
(for inference anyway) https://t.co/clYHpDEti1
A much-needed paper. GPT-family models can be pruned to 50%+ sparsity in one shot, without any retraining and with minimal loss of accuracy:
– Achieves 60% sparsity on OPT-175B and BLOOM-176B
– Over 100 billion weights can be ignored at inference time
Paper: arxiv.org/abs/2301.00774 https://t.co/z0fk39oRwO
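To build intuition for what "60% sparsity" means here, the sketch below applies simple one-shot magnitude pruning to a random weight matrix: zero out the smallest 60% of weights by absolute value. This is a minimal illustration of unstructured sparsity, not the paper's method (SparseGPT uses a more sophisticated layer-wise reconstruction to pick which weights to drop); the function name and shapes are illustrative.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """One-shot magnitude pruning: zero the smallest-magnitude
    fraction `sparsity` of the weights, no retraining."""
    k = int(w.size * sparsity)  # number of weights to remove
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold  # keep only weights above threshold
    return w * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512))      # stand-in for one weight matrix
pruned = magnitude_prune(w, 0.60)
print(f"sparsity: {(pruned == 0).mean():.2f}")  # → sparsity: 0.60
```

At 60% sparsity on a 175B-parameter model, this kind of mask is what lets ~100 billion weights be skipped at inference time.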