AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference

Scaling Large Language Models (LLMs) with extended context lengths has increased the need for efficient low-bit quantization to manage their substantial computational demands. However, reducing precision to 4 bits frequently degrades performance due to activation outliers. To address this, we propose Asymmetric Microscaling 4-bit Floating-Point (AMXFP4) for efficient LLM inference. This novel data format leverages asymmetric shared scales to mitigate outliers while naturally capturing the asymmetry introduced by group-wise quantization. Unlike conventional 4-bit quantization methods that rely on data rotation and costly calibration, AMXFP4 uses asymmetric shared scales for direct 4-bit casting, achieving near-ideal quantization accuracy across various LLM tasks, including multi-turn conversations, long-context reasoning, and visual question answering. Our AMXFP4 format significantly outperforms MXFP4 and other leading quantization techniques, enabling robust, calibration-free 4-bit inference.
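The abstract stops short of implementation details, but the mechanism it describes — each quantization group sharing asymmetric scales so that an outlier on one side of zero does not consume the dynamic range of the other side — can be sketched in a few lines. The NumPy example below is a minimal illustration under assumed details, not the authors' reference implementation: the E2M1-style FP4 level set, the group size of 32, and the specific choice of separate per-group scales for positive and negative values are all assumptions made for the example.

import numpy as np

# Representable magnitudes of an E2M1-style FP4 format (assumed grid for this sketch).
FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_group(x: np.ndarray) -> np.ndarray:
    """Quantize one group with asymmetric shared scales (illustrative only)."""
    pos_scale = max(float(x.max()), 0.0) / FP4_LEVELS[-1]   # shared scale for positive values
    neg_scale = max(float(-x.min()), 0.0) / FP4_LEVELS[-1]  # shared scale for negative values
    out = np.zeros_like(x)
    for i, v in enumerate(x):
        scale = pos_scale if v >= 0 else neg_scale
        if scale == 0.0:
            continue  # this half of the group is all zeros; nothing to encode
        # Snap the scaled magnitude onto the nearest representable FP4 level.
        idx = int(np.abs(FP4_LEVELS - abs(v) / scale).argmin())
        out[i] = np.sign(v) * FP4_LEVELS[idx] * scale
    return out

def quantize_groupwise(x: np.ndarray, group_size: int = 32) -> np.ndarray:
    """Apply the asymmetric group-wise scheme along the last dimension."""
    groups = x.reshape(-1, group_size)
    return np.stack([quantize_group(g) for g in groups]).reshape(x.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(4, 64)).astype(np.float32)
    acts[0, 3] = 12.0  # inject an activation outlier
    dequant = quantize_groupwise(acts)
    print("max abs reconstruction error:", float(np.abs(acts - dequant).max()))

Running the sketch quantizes and immediately dequantizes a random activation tensor containing one injected outlier and prints the worst-case reconstruction error; a real AMXFP4 implementation would additionally pack the 4-bit codes and the shared scales into a microscaling block layout and perform the matrix arithmetic directly in that format.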

Bibliographic Details
Main Authors: Lee, Janghwan, Park, Jiwoong, Kim, Jinseok, Kim, Yongjik, Oh, Jungju, Oh, Jinwook, Choi, Jungwook
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence
Online Access: Order full text
creator Lee, Janghwan
Park, Jiwoong
Kim, Jinseok
Kim, Yongjik
Oh, Jungju
Oh, Jinwook
Choi, Jungwook
description Scaling Large Language Models (LLMs) with extended context lengths has increased the need for efficient low-bit quantization to manage their substantial computational demands. However, reducing precision to 4 bits frequently degrades performance due to activation outliers. To address this, we propose Asymmetric Microscaling 4-bit Floating-Point (AMXFP4) for efficient LLM inference. This novel data format leverages asymmetric shared scales to mitigate outliers while naturally capturing the asymmetry introduced by group-wise quantization. Unlike conventional 4-bit quantization methods that rely on data rotation and costly calibration, AMXFP4 uses asymmetric shared scales for direct 4-bit casting, achieving near-ideal quantization accuracy across various LLM tasks, including multi-turn conversations, long-context reasoning, and visual question answering. Our AMXFP4 format significantly outperforms MXFP4 and other leading quantization techniques, enabling robust, calibration-free 4-bit inference.
doi_str_mv 10.48550/arxiv.2411.09909
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2411.09909
language eng
recordid cdi_arxiv_primary_2411_09909
source arXiv.org
subjects Computer Science - Artificial Intelligence
title AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T02%3A04%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AMXFP4:%20Taming%20Activation%20Outliers%20with%20Asymmetric%20Microscaling%20Floating-Point%20for%204-bit%20LLM%20Inference&rft.au=Lee,%20Janghwan&rft.date=2024-11-14&rft_id=info:doi/10.48550/arxiv.2411.09909&rft_dat=%3Carxiv_GOX%3E2411_09909%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true