Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs
While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. ...
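The full description in the record below explains that DGKD reformulates the classical logits-based graph KD loss into a target-class graph distillation (TCGD) term and a non-target-class graph distillation (NCGD) term, and then decouples their weights instead of leaving the NCGD term tied to the teacher's confidence. As a rough illustration only, here is a minimal PyTorch sketch of such a decoupled logits loss for distilling a teacher GNN's node logits into a student MLP. The function name, the temperature, and the fixed alpha/beta weights are assumptions following the generic decoupled-KD formulation, not the paper's exact method, which adjusts the weights per sample; see https://github.com/xsk160/DGKD for the authors' implementation.

```python
# Minimal sketch of a decoupled logits distillation loss (alpha * TCGD + beta * NCGD),
# assuming the generic decoupled-KD split; not the paper's exact formulation.
import torch
import torch.nn.functional as F


def decoupled_logits_kd(student_logits, teacher_logits, targets,
                        temperature=4.0, alpha=1.0, beta=8.0):
    """student_logits, teacher_logits: [num_nodes, num_classes]; targets: [num_nodes]."""
    t = temperature
    num_classes = student_logits.size(1)
    gt_mask = F.one_hot(targets, num_classes).float()  # 1 at each node's target class

    # Target-class term (TCGD): binary KL over (target vs. all-other) probability mass.
    p_s = F.softmax(student_logits / t, dim=1)
    p_t = F.softmax(teacher_logits / t, dim=1)
    pt_s = (p_s * gt_mask).sum(dim=1, keepdim=True)
    pt_t = (p_t * gt_mask).sum(dim=1, keepdim=True)
    bin_s = torch.cat([pt_s, 1.0 - pt_s], dim=1).clamp_min(1e-8)
    bin_t = torch.cat([pt_t, 1.0 - pt_t], dim=1)
    tcgd = F.kl_div(bin_s.log(), bin_t, reduction="batchmean")

    # Non-target term (NCGD): KL over the remaining classes, renormalized;
    # the large negative offset effectively removes the target class from the softmax.
    log_q_s = F.log_softmax(student_logits / t - 1000.0 * gt_mask, dim=1)
    q_t = F.softmax(teacher_logits / t - 1000.0 * gt_mask, dim=1)
    ncgd = F.kl_div(log_q_s, q_t, reduction="batchmean")

    # Decoupling: alpha and beta replace the implicit (1 - teacher confidence)
    # coupling that plain KD imposes on the non-target term.
    return (alpha * tcgd + beta * ncgd) * (t ** 2)
```

In plain KD the non-target term is implicitly scaled by one minus the teacher's target-class confidence, which suppresses it on nodes the teacher is confident about; giving TCGD and NCGD independent weights is what "decoupling" refers to here. A hypothetical training-step sketch showing how such a loss could be plugged into GNN-to-MLP distillation appears after the record fields below.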
Saved in:
Published in: | Neural networks 2024-11, Vol.179, p.106567, Article 106567 |
---|---|
Main authors: | Tian, Yingjie; Xu, Shaokai; Li, Muyang |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Decoupling; Graph knowledge distillation; Graph neural networks; Machine learning; Multi-layer perceptrons |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 106567 |
container_title | Neural networks |
container_volume | 179 |
creator | Tian, Yingjie; Xu, Shaokai; Li, Muyang |
description | While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. However, state-of-the-art graph knowledge distillation methods are mainly based on distilling deep features from intermediate hidden layers, which leads to the significance of logit-layer distillation being greatly overlooked. To provide a novel viewpoint for studying logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts, i.e., the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN’s prediction confidence and the NCGD loss, as well as eliminate the fixed weight between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments conducted on public benchmark datasets show the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD. |
doi_str_mv | 10.1016/j.neunet.2024.106567 |
format | Article |
publisher | Elsevier Ltd |
pmid | 39089155 |
rights | Copyright © 2024 Elsevier Ltd. All rights reserved. |
fulltext | fulltext |
identifier | ISSN: 0893-6080 |
ispartof | Neural networks, 2024-11, Vol.179, p.106567, Article 106567 |
issn | 0893-6080; 1879-2782
language | eng |
recordid | cdi_proquest_miscellaneous_3087353080 |
source | MEDLINE; Access via ScienceDirect (Elsevier) |
subjects | Algorithms; Decoupling; Graph knowledge distillation; Graph neural networks; Knowledge; Logistic Models; Machine Learning; Multi-layer perceptrons; Neural Networks, Computer
title | Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T22%3A27%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Decoupled%20graph%20knowledge%20distillation:%20A%20general%20logits-based%20method%20for%20learning%20MLPs%20on%20graphs&rft.jtitle=Neural%20networks&rft.au=Tian,%20Yingjie&rft.date=2024-11&rft.volume=179&rft.spage=106567&rft.pages=106567-&rft.artnum=106567&rft.issn=0893-6080&rft.eissn=1879-2782&rft_id=info:doi/10.1016/j.neunet.2024.106567&rft_dat=%3Cproquest_cross%3E3087353080%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3087353080&rft_id=info:pmid/39089155&rft_els_id=S089360802400491X&rfr_iscdi=true |
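As noted in the description field, DGKD trains a student MLP on node features alone so that inference needs no neighborhood fetching, and the loss can be dropped into existing graph KD frameworks as a plug-and-play term. The sketch below shows one hypothetical GLNN-style training step built on that idea; StudentMLP, train_step, the lam weight, and the use of the teacher's argmax as the target class on unlabeled nodes are all illustrative assumptions, not details taken from the paper or its released code.

```python
# Hypothetical GNN-to-MLP distillation step: teacher GNN logits are precomputed once,
# and the student MLP sees only raw node features, so deployment avoids graph access.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentMLP(nn.Module):
    """Plain MLP over node features; no adjacency needed at inference time."""

    def __init__(self, in_dim, hidden_dim, num_classes, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def train_step(student, optimizer, kd_loss_fn, features, labels,
               teacher_logits, train_idx, lam=0.5):
    """Cross-entropy on labeled nodes plus a plug-and-play distillation term.

    kd_loss_fn can be the decoupled_logits_kd sketched earlier in this record,
    or any other logits-based distillation loss with the same signature.
    """
    student.train()
    optimizer.zero_grad()
    logits = student(features)
    ce = F.cross_entropy(logits[train_idx], labels[train_idx])
    # Assumption: use the teacher's predicted class as each node's target class
    # for the decoupled split, since ground truth exists only on labeled nodes.
    pseudo_targets = teacher_logits.argmax(dim=1)
    kd = kd_loss_fn(logits, teacher_logits, pseudo_targets)
    loss = ce + lam * kd
    loss.backward()
    optimizer.step()
    return loss.item()
```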