Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs

While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. However, state-of-the-art graph knowledge distillation methods are mainly based on distilling deep features from intermediate hidden layers, which leads to the significance of logit-layer distillation being greatly overlooked. To provide a novel viewpoint for studying logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts, i.e., the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN's prediction confidence and the NCGD loss, and eliminate the fixed weight between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments conducted on public benchmark datasets show the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD.
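
A rough formalization of the decomposition described above (the notation p_t, b, p-hat, alpha, beta is assumed here, not taken from the paper): let p_t be the teacher GNN's predicted probability of a node's target class, b = (p_t, 1 - p_t) the binary target/non-target split, and p-hat the prediction renormalized over the non-target classes. In LaTeX,

    \mathcal{L}_{\mathrm{KD}}
      = \underbrace{\mathrm{KL}\big(b^{\mathrm{GNN}} \,\|\, b^{\mathrm{MLP}}\big)}_{\mathrm{TCGD}}
      + \big(1 - p_t^{\mathrm{GNN}}\big)\,
        \underbrace{\mathrm{KL}\big(\hat{p}^{\mathrm{GNN}} \,\|\, \hat{p}^{\mathrm{MLP}}\big)}_{\mathrm{NCGD}},
    \qquad
    \mathcal{L}_{\mathrm{DGKD}} = \alpha \cdot \mathrm{TCGD} + \beta \cdot \mathrm{NCGD}.

The first identity shows why classical logits-based distillation suppresses the non-target term on nodes the teacher is confident about; the second line is the decoupled form, where the exact per-sample choice of alpha and beta is the paper's contribution and is not reproduced here.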

Bibliographic details
Published in: Neural networks 2024-11, Vol.179, p.106567, Article 106567
Main authors: Tian, Yingjie, Xu, Shaokai, Li, Muyang
Format: Article
Language: English
Keywords: Algorithms; Decoupling; Graph knowledge distillation; Graph neural networks; Knowledge; Logistic Models; Machine Learning; Multi-layer perceptrons; Neural Networks, Computer
Online access: Full text
container_end_page
container_issue
container_start_page 106567
container_title Neural networks
container_volume 179
creator Tian, Yingjie
Xu, Shaokai
Li, Muyang
description While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. However, state-of-the-art graph knowledge distillation methods are mainly based on distilling deep features from intermediate hidden layers, which leads to the significance of logit-layer distillation being greatly overlooked. To provide a novel viewpoint for studying logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts, i.e., the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN's prediction confidence and the NCGD loss, and eliminate the fixed weight between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments conducted on public benchmark datasets show the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD.
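
As a concrete illustration of the loss described in the description field above, here is a minimal PyTorch sketch, assuming temperature-scaled softmax and scalar weights; the function name, arguments, and defaults are illustrative and do not reproduce the authors' implementation (see the linked repository for that).

    # Minimal sketch (not the authors' code): classical logits-based KD is split into
    # a target-class term (TCGD) and a non-target-class term (NCGD), and the two are
    # weighted independently instead of tying NCGD to the teacher's confidence.
    import torch
    import torch.nn.functional as F

    def decoupled_graph_kd_loss(student_logits, teacher_logits, labels,
                                alpha=1.0, beta=1.0, tau=1.0, eps=1e-8):
        """student_logits, teacher_logits: [N, C] node logits; labels: [N] class ids."""
        num_classes = student_logits.size(1)
        target_mask = F.one_hot(labels, num_classes).bool()

        p_s = F.softmax(student_logits / tau, dim=1)
        p_t = F.softmax(teacher_logits / tau, dim=1)

        # TCGD: binary KL between the (target, non-target) probability pairs.
        pt_s = p_s.gather(1, labels.unsqueeze(1))
        pt_t = p_t.gather(1, labels.unsqueeze(1))
        b_s = torch.cat([pt_s, 1.0 - pt_s], dim=1).clamp_min(eps)
        b_t = torch.cat([pt_t, 1.0 - pt_t], dim=1).clamp_min(eps)
        tcgd = (b_t * (b_t.log() - b_s.log())).sum(dim=1)

        # NCGD: KL between predictions renormalized over the non-target classes.
        q_s = p_s.masked_fill(target_mask, 0.0)
        q_s = (q_s / q_s.sum(dim=1, keepdim=True).clamp_min(eps)).clamp_min(eps)
        q_t = p_t.masked_fill(target_mask, 0.0)
        q_t = (q_t / q_t.sum(dim=1, keepdim=True).clamp_min(eps)).clamp_min(eps)
        ncgd = (q_t * (q_t.log() - q_s.log())).sum(dim=1)

        # Decoupled weighting: no fixed (1 - pt_t) factor in front of NCGD.
        return (alpha * tcgd + beta * ncgd).mean() * tau ** 2

In a distillation loop this term would typically be added to the cross-entropy on labeled nodes, with teacher_logits precomputed by the frozen GNN and student_logits produced by the student MLP from node features alone; alpha and beta may also be per-node tensors to mimic the sample-wise adjustment described in the abstract.
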
doi_str_mv 10.1016/j.neunet.2024.106567
format Article
fulltext fulltext
identifier ISSN: 0893-6080
ispartof Neural networks, 2024-11, Vol.179, p.106567, Article 106567
issn 0893-6080
1879-2782
language eng
recordid cdi_proquest_miscellaneous_3087353080
source MEDLINE; Access via ScienceDirect (Elsevier)
subjects Algorithms
Decoupling
Graph knowledge distillation
Graph neural networks
Knowledge
Logistic Models
Machine Learning
Multi-layer perceptrons
Neural Networks, Computer
title Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T22%3A27%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Decoupled%20graph%20knowledge%20distillation:%20A%20general%20logits-based%20method%20for%20learning%20MLPs%20on%20graphs&rft.jtitle=Neural%20networks&rft.au=Tian,%20Yingjie&rft.date=2024-11&rft.volume=179&rft.spage=106567&rft.pages=106567-&rft.artnum=106567&rft.issn=0893-6080&rft.eissn=1879-2782&rft_id=info:doi/10.1016/j.neunet.2024.106567&rft_dat=%3Cproquest_cross%3E3087353080%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3087353080&rft_id=info:pmid/39089155&rft_els_id=S089360802400491X&rfr_iscdi=true