Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs
While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. ...
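The full description in the record below explains that DGKD reformulates the classical logits-based graph KD loss into a target-class graph distillation (TCGD) term and a non-target-class graph distillation (NCGD) term, and then decouples their weights instead of leaving the NCGD term tied to the teacher's confidence. As a rough illustration only, here is a minimal PyTorch sketch of such a decoupled logits loss for distilling a teacher GNN's node logits into a student MLP. The function name, the temperature, and the fixed alpha/beta weights are assumptions following the generic decoupled-KD formulation, not the paper's exact method, which adjusts the weights per sample; see https://github.com/xsk160/DGKD for the authors' implementation.

```python
# Minimal sketch of a decoupled logits distillation loss (alpha * TCGD + beta * NCGD),
# assuming the generic decoupled-KD split; not the paper's exact formulation.
import torch
import torch.nn.functional as F


def decoupled_logits_kd(student_logits, teacher_logits, targets,
                        temperature=4.0, alpha=1.0, beta=8.0):
    """student_logits, teacher_logits: [num_nodes, num_classes]; targets: [num_nodes]."""
    t = temperature
    num_classes = student_logits.size(1)
    gt_mask = F.one_hot(targets, num_classes).float()  # 1 at each node's target class

    # Target-class term (TCGD): binary KL over (target vs. all-other) probability mass.
    p_s = F.softmax(student_logits / t, dim=1)
    p_t = F.softmax(teacher_logits / t, dim=1)
    pt_s = (p_s * gt_mask).sum(dim=1, keepdim=True)
    pt_t = (p_t * gt_mask).sum(dim=1, keepdim=True)
    bin_s = torch.cat([pt_s, 1.0 - pt_s], dim=1).clamp_min(1e-8)
    bin_t = torch.cat([pt_t, 1.0 - pt_t], dim=1)
    tcgd = F.kl_div(bin_s.log(), bin_t, reduction="batchmean")

    # Non-target term (NCGD): KL over the remaining classes, renormalized;
    # the large negative offset effectively removes the target class from the softmax.
    log_q_s = F.log_softmax(student_logits / t - 1000.0 * gt_mask, dim=1)
    q_t = F.softmax(teacher_logits / t - 1000.0 * gt_mask, dim=1)
    ncgd = F.kl_div(log_q_s, q_t, reduction="batchmean")

    # Decoupling: alpha and beta replace the implicit (1 - teacher confidence)
    # coupling that plain KD imposes on the non-target term.
    return (alpha * tcgd + beta * ncgd) * (t ** 2)
```

In plain KD the non-target term is implicitly scaled by one minus the teacher's target-class confidence, which suppresses it on nodes the teacher is confident about; giving TCGD and NCGD independent weights is what "decoupling" refers to here. A hypothetical training-step sketch showing how such a loss could be plugged into GNN-to-MLP distillation appears after the record fields below.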
Saved in:
Published in: | Neural networks 2024-11, Vol.179, p.106567, Article 106567 |
---|---|
Main authors: | Tian, Yingjie; Xu, Shaokai; Li, Muyang |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Decoupling; Graph knowledge distillation; Graph neural networks; Machine learning; Multi-layer perceptrons |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 106567 |
container_title | Neural networks |
container_volume | 179 |
creator | Tian, Yingjie; Xu, Shaokai; Li, Muyang |
description | While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. However, state-of-the-art graph knowledge distillation methods are mainly based on distilling deep features from intermediate hidden layers, which leads to the significance of logit-layer distillation being greatly overlooked. To provide a novel viewpoint for studying logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts, i.e., the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN’s prediction confidence and the NCGD loss, as well as eliminate the fixed weight between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments conducted on public benchmark datasets show the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD. |
doi_str_mv | 10.1016/j.neunet.2024.106567 |
format | Article |
publisher | Elsevier Ltd |
pmid | 39089155 |
rights | Copyright © 2024 Elsevier Ltd. All rights reserved. |
fulltext | fulltext |
identifier | ISSN: 0893-6080 |
ispartof | Neural networks, 2024-11, Vol.179, p.106567, Article 106567 |
issn | 0893-6080; 1879-2782
language | eng |
recordid | cdi_proquest_miscellaneous_3087353080 |
source | MEDLINE; Access via ScienceDirect (Elsevier) |
subjects | Algorithms; Decoupling; Graph knowledge distillation; Graph neural networks; Knowledge; Logistic Models; Machine Learning; Multi-layer perceptrons; Neural Networks, Computer
title | Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T22%3A27%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Decoupled%20graph%20knowledge%20distillation:%20A%20general%20logits-based%20method%20for%20learning%20MLPs%20on%20graphs&rft.jtitle=Neural%20networks&rft.au=Tian,%20Yingjie&rft.date=2024-11&rft.volume=179&rft.spage=106567&rft.pages=106567-&rft.artnum=106567&rft.issn=0893-6080&rft.eissn=1879-2782&rft_id=info:doi/10.1016/j.neunet.2024.106567&rft_dat=%3Cproquest_cross%3E3087353080%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3087353080&rft_id=info:pmid/39089155&rft_els_id=S089360802400491X&rfr_iscdi=true |
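As noted in the description field, DGKD trains a student MLP on node features alone so that inference needs no neighborhood fetching, and the loss can be dropped into existing graph KD frameworks as a plug-and-play term. The sketch below shows one hypothetical GLNN-style training step built on that idea; StudentMLP, train_step, the lam weight, and the use of the teacher's argmax as the target class on unlabeled nodes are all illustrative assumptions, not details taken from the paper or its released code.

```python
# Hypothetical GNN-to-MLP distillation step: teacher GNN logits are precomputed once,
# and the student MLP sees only raw node features, so deployment avoids graph access.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentMLP(nn.Module):
    """Plain MLP over node features; no adjacency needed at inference time."""

    def __init__(self, in_dim, hidden_dim, num_classes, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def train_step(student, optimizer, kd_loss_fn, features, labels,
               teacher_logits, train_idx, lam=0.5):
    """Cross-entropy on labeled nodes plus a plug-and-play distillation term.

    kd_loss_fn can be the decoupled_logits_kd sketched earlier in this record,
    or any other logits-based distillation loss with the same signature.
    """
    student.train()
    optimizer.zero_grad()
    logits = student(features)
    ce = F.cross_entropy(logits[train_idx], labels[train_idx])
    # Assumption: use the teacher's predicted class as each node's target class
    # for the decoupled split, since ground truth exists only on labeled nodes.
    pseudo_targets = teacher_logits.argmax(dim=1)
    kd = kd_loss_fn(logits, teacher_logits, pseudo_targets)
    loss = ce + lam * kd
    loss.backward()
    optimizer.step()
    return loss.item()
```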