AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference

Scaling Large Language Models (LLMs) with extended context lengths has increased the need for efficient low-bit quantization to manage their substantial computational demands. However, reducing precision to 4 bits frequently degrades performance due to activation outliers. To address this, we propose Asymmetric Microscaling 4-bit Floating-Point (AMXFP4) for efficient LLM inference. This novel data format leverages asymmetric shared scales to mitigate outliers while naturally capturing the asymmetry introduced by group-wise quantization. Unlike conventional 4-bit quantization methods that rely on data rotation and costly calibration, AMXFP4 uses asymmetric shared scales for direct 4-bit casting, achieving near-ideal quantization accuracy across various LLM tasks, including multi-turn conversations, long-context reasoning, and visual question answering. Our AMXFP4 format significantly outperforms MXFP4 and other leading quantization techniques, enabling robust, calibration-free 4-bit inference.
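The abstract stops short of implementation details, but the mechanism it describes — each quantization group sharing asymmetric scales so that an outlier on one side of zero does not consume the dynamic range of the other side — can be sketched in a few lines. The NumPy example below is a minimal illustration under assumed details, not the authors' reference implementation: the E2M1-style FP4 level set, the group size of 32, and the specific choice of separate per-group scales for positive and negative values are all assumptions made for the example.

import numpy as np

# Representable magnitudes of an E2M1-style FP4 format (assumed grid for this sketch).
FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_group(x: np.ndarray) -> np.ndarray:
    """Quantize one group with asymmetric shared scales (illustrative only)."""
    pos_scale = max(float(x.max()), 0.0) / FP4_LEVELS[-1]   # shared scale for positive values
    neg_scale = max(float(-x.min()), 0.0) / FP4_LEVELS[-1]  # shared scale for negative values
    out = np.zeros_like(x)
    for i, v in enumerate(x):
        scale = pos_scale if v >= 0 else neg_scale
        if scale == 0.0:
            continue  # this half of the group is all zeros; nothing to encode
        # Snap the scaled magnitude onto the nearest representable FP4 level.
        idx = int(np.abs(FP4_LEVELS - abs(v) / scale).argmin())
        out[i] = np.sign(v) * FP4_LEVELS[idx] * scale
    return out

def quantize_groupwise(x: np.ndarray, group_size: int = 32) -> np.ndarray:
    """Apply the asymmetric group-wise scheme along the last dimension."""
    groups = x.reshape(-1, group_size)
    return np.stack([quantize_group(g) for g in groups]).reshape(x.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(4, 64)).astype(np.float32)
    acts[0, 3] = 12.0  # inject an activation outlier
    dequant = quantize_groupwise(acts)
    print("max abs reconstruction error:", float(np.abs(acts - dequant).max()))

Running the sketch quantizes and immediately dequantizes a random activation tensor containing one injected outlier and prints the worst-case reconstruction error; a real AMXFP4 implementation would additionally pack the 4-bit codes and the shared scales into a microscaling block layout and perform the matrix arithmetic directly in that format.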

Bibliographic Details
Main Authors: Lee, Janghwan, Park, Jiwoong, Kim, Jinseok, Kim, Yongjik, Oh, Jungju, Oh, Jinwook, Choi, Jungwook
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence
Online Access: Order full text
creator Lee, Janghwan
Park, Jiwoong
Kim, Jinseok
Kim, Yongjik
Oh, Jungju
Oh, Jinwook
Choi, Jungwook
description Scaling Large Language Models (LLMs) with extended context lengths has increased the need for efficient low-bit quantization to manage their substantial computational demands. However, reducing precision to 4 bits frequently degrades performance due to activation outliers. To address this, we propose Asymmetric Microscaling 4-bit Floating-Point (AMXFP4) for efficient LLM inference. This novel data format leverages asymmetric shared scales to mitigate outliers while naturally capturing the asymmetry introduced by group-wise quantization. Unlike conventional 4-bit quantization methods that rely on data rotation and costly calibration, AMXFP4 uses asymmetric shared scales for direct 4-bit casting, achieving near-ideal quantization accuracy across various LLM tasks, including multi-turn conversations, long-context reasoning, and visual question answering. Our AMXFP4 format significantly outperforms MXFP4 and other leading quantization techniques, enabling robust, calibration-free 4-bit inference.
doi_str_mv 10.48550/arxiv.2411.09909
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2411.09909
language eng
recordid cdi_arxiv_primary_2411_09909
source arXiv.org
subjects Computer Science - Artificial Intelligence
title AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T02%3A04%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AMXFP4:%20Taming%20Activation%20Outliers%20with%20Asymmetric%20Microscaling%20Floating-Point%20for%204-bit%20LLM%20Inference&rft.au=Lee,%20Janghwan&rft.date=2024-11-14&rft_id=info:doi/10.48550/arxiv.2411.09909&rft_dat=%3Carxiv_GOX%3E2411_09909%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true