Mixture of Weak & Strong Experts on Graphs

Realistic graphs contain both (1) rich self-features of nodes and (2) informative structures of neighborhoods, jointly handled by a Graph Neural Network (GNN) in the typical setup. We propose to decouple the two modalities with a Mixture of weak and strong experts (Mowst), where the weak expert is a lightweight Multi-layer Perceptron (MLP) and the strong expert is an off-the-shelf GNN. To adapt the experts' collaboration to different target nodes, we propose a "confidence" mechanism based on the dispersion of the weak expert's prediction logits. The strong expert is conditionally activated in the low-confidence region, when either the node's classification relies on neighborhood information or the weak expert has low model quality. We reveal interesting training dynamics by analyzing the influence of the confidence function on loss: our training algorithm encourages the specialization of each expert by effectively generating a soft splitting of the graph. In addition, our "confidence" design imposes a desirable bias toward the strong expert, which benefits from the GNN's better generalization capability. Mowst is easy to optimize and achieves strong expressive power, with a computation cost comparable to a single GNN. Empirically, Mowst on 4 backbone GNN architectures shows significant accuracy improvement on 6 standard node classification benchmarks, including both homophilous and heterophilous graphs (https://github.com/facebookresearch/mowst-gnn).
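The abstract describes a per-node gating rule: the weak MLP predicts first, and a confidence score derived from the dispersion of its logits decides how much the strong GNN needs to be consulted. The following PyTorch sketch illustrates that mechanism under stated assumptions: the variance of the softmax distribution as the dispersion measure, the max-normalization, the TinyGCN stand-in expert, and all layer sizes are illustrative choices, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGCN(nn.Module):
    """Stand-in strong expert: one mean-aggregation graph layer (illustrative)."""

    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, num_classes)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=-1, keepdim=True).clamp_min(1.0)
        return self.lin(adj @ x / deg)  # average neighbor features, then project


class MowstSketch(nn.Module):
    """Confidence-gated mixture of a weak MLP and a strong GNN."""

    def __init__(self, in_dim: int, hidden: int, num_classes: int, strong: nn.Module):
        super().__init__()
        # Weak expert: a lightweight MLP over node self-features only.
        self.weak = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )
        # Strong expert: any off-the-shelf GNN mapping (features, adjacency) to logits.
        self.strong = strong

    def confidence(self, logits: torch.Tensor) -> torch.Tensor:
        # Dispersion of the weak expert's prediction: a peaked distribution
        # suggests self-features alone suffice. Variance of the softmax output
        # is one plausible dispersion measure (an assumption made here).
        p = F.softmax(logits, dim=-1)
        c = p.var(dim=-1, unbiased=False)
        return c / c.max().clamp_min(1e-12)  # normalize to [0, 1] over the batch

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        weak_logits = self.weak(x)
        c = self.confidence(weak_logits).unsqueeze(-1)  # [num_nodes, 1]
        # Low-confidence nodes lean on the strong expert; at inference the
        # GNN could be skipped wherever c is close to 1.
        strong_logits = self.strong(x, adj)
        return c * weak_logits + (1.0 - c) * strong_logits


if __name__ == "__main__":
    x = torch.randn(8, 16)                  # 8 nodes, 16 self-features each
    adj = (torch.rand(8, 8) < 0.3).float()  # random dense adjacency (toy graph)
    model = MowstSketch(16, 32, 4, TinyGCN(16, 4))
    print(model(x, adj).shape)              # torch.Size([8, 4]) per-node logits
```

Gating on the weak expert's own output, rather than on a separately trained router, is what lets the system fall back to the cheap MLP wherever self-features are decisive, keeping the overall cost comparable to a single GNN.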

Bibliographic Details
Main Authors: Zeng, Hanqing; Lyu, Hanjia; Hu, Diyi; Xia, Yinglong; Luo, Jiebo
Format: Article
Language: English (eng)
Published: 2023-11-09
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
DOI: 10.48550/arxiv.2311.05185
Source: arXiv.org
Online Access: https://arxiv.org/abs/2311.05185