Mixture of Weak & Strong Experts on Graphs

Realistic graphs contain both (1) rich self-features of nodes and (2) informative structures of neighborhoods, jointly handled by a Graph Neural Network (GNN) in the typical setup. We propose to decouple the two modalities with a Mixture of weak and strong experts (Mowst), where the weak expert is a lightweight Multi-layer Perceptron (MLP) and the strong expert is an off-the-shelf GNN. To adapt the experts' collaboration to different target nodes, we propose a "confidence" mechanism based on the dispersion of the weak expert's prediction logits. The strong expert is conditionally activated in the low-confidence region, when either the node's classification relies on neighborhood information or the weak expert has low model quality. We reveal interesting training dynamics by analyzing the influence of the confidence function on loss: our training algorithm encourages the specialization of each expert by effectively generating a soft splitting of the graph. In addition, our "confidence" design imposes a desirable bias toward the strong expert, which benefits from the GNN's better generalization capability. Mowst is easy to optimize and achieves strong expressive power, with a computation cost comparable to a single GNN. Empirically, Mowst on 4 backbone GNN architectures shows significant accuracy improvement on 6 standard node classification benchmarks, including both homophilous and heterophilous graphs (https://github.com/facebookresearch/mowst-gnn).
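The abstract describes a per-node gating rule: the weak MLP predicts first, and a confidence score derived from the dispersion of its logits decides how much the strong GNN needs to be consulted. The following PyTorch sketch illustrates that mechanism under stated assumptions: the variance of the softmax distribution as the dispersion measure, the max-normalization, the TinyGCN stand-in expert, and all layer sizes are illustrative choices, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGCN(nn.Module):
    """Stand-in strong expert: one mean-aggregation graph layer (illustrative)."""

    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, num_classes)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=-1, keepdim=True).clamp_min(1.0)
        return self.lin(adj @ x / deg)  # average neighbor features, then project


class MowstSketch(nn.Module):
    """Confidence-gated mixture of a weak MLP and a strong GNN."""

    def __init__(self, in_dim: int, hidden: int, num_classes: int, strong: nn.Module):
        super().__init__()
        # Weak expert: a lightweight MLP over node self-features only.
        self.weak = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )
        # Strong expert: any off-the-shelf GNN mapping (features, adjacency) to logits.
        self.strong = strong

    def confidence(self, logits: torch.Tensor) -> torch.Tensor:
        # Dispersion of the weak expert's prediction: a peaked distribution
        # suggests self-features alone suffice. Variance of the softmax output
        # is one plausible dispersion measure (an assumption made here).
        p = F.softmax(logits, dim=-1)
        c = p.var(dim=-1, unbiased=False)
        return c / c.max().clamp_min(1e-12)  # normalize to [0, 1] over the batch

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        weak_logits = self.weak(x)
        c = self.confidence(weak_logits).unsqueeze(-1)  # [num_nodes, 1]
        # Low-confidence nodes lean on the strong expert; at inference the
        # GNN could be skipped wherever c is close to 1.
        strong_logits = self.strong(x, adj)
        return c * weak_logits + (1.0 - c) * strong_logits


if __name__ == "__main__":
    x = torch.randn(8, 16)                  # 8 nodes, 16 self-features each
    adj = (torch.rand(8, 8) < 0.3).float()  # random dense adjacency (toy graph)
    model = MowstSketch(16, 32, 4, TinyGCN(16, 4))
    print(model(x, adj).shape)              # torch.Size([8, 4]) per-node logits
```

Gating on the weak expert's own output, rather than on a separately trained router, is what lets the system fall back to the cheap MLP wherever self-features are decisive, keeping the overall cost comparable to a single GNN.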

Bibliographic Details
Main Authors: Zeng, Hanqing; Lyu, Hanjia; Hu, Diyi; Xia, Yinglong; Luo, Jiebo
Format: Article
Language: English (eng)
Published: 2023-11-09
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
DOI: 10.48550/arxiv.2311.05185
Source: arXiv.org
Online Access: https://arxiv.org/abs/2311.05185