Lossless LLM compression for efficient GPU inference via dynamic-length float (arxiv.org)

×1.58 #5 | 326 points by CharlesW 13 hours ago | 106 comments

Story Stats

This chart shows the history of this story's rank on the Hacker News "Top" (Front) Page, "New" Page, and "Best" Page, as well as its raw rank given the Hacker News ranking formula.

This chart shows the history of this story's upvotes compared to the expected upvotes for stories shown at the same ranks and times.

This chart shows the history of this story's estimated true upvote rate: the predicted long-term ratio of upvotes to expected upvotes.