Effects of additional data on Bayesian clustering
Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity.
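The setup the abstract describes — observable data points whose latent cluster assignments are estimated under a mixture model — can be sketched with a minimal EM routine. This is a plain-Python illustration under assumed conditions (two 1-D Gaussian components), not code from the paper; all function and variable names are hypothetical:

```python
import math
import random

def em_gmm_1d(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture.

    The cluster assignment of each point is the latent variable; EM
    returns its posterior ("responsibilities"), which is the quantity
    that clustering estimates.
    """
    # Initialise the two components from the data range.
    mu = [min(xs), max(xs)]
    sigma = [1.0, 1.0]
    pi = [0.5, 0.5]  # mixing weights

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(iters):
        # E-step: posterior probability of each latent assignment.
        resp = []
        for x in xs:
            w = [pi[k] * pdf(x, mu[k], sigma[k]) for k in range(2)]
            z = sum(w)
            resp.append([wk / z for wk in w])
        # M-step: re-estimate parameters from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)
            pi[k] = nk / len(xs)
    return mu, resp

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)] + \
       [random.gauss(5.0, 1.0) for _ in range(100)]
means, resp = em_gmm_1d(data)
```

The paper's question is what happens to the accuracy of the estimated `resp` when extra observed variables are modelled jointly: the richer model can exploit the additional data, but its higher-dimensional parameter can also degrade estimation.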
Saved in:
Published in: | Neural networks 2017-10, Vol.94, p.86-95 |
---|---|
Author: | Yamazaki, Keisuke |
Format: | Article |
Language: | eng |
Subjects: | Bayes Theorem; Cluster Analysis; Hierarchical parametric models; Latent variable estimation; Models, Statistical; Semi-supervised learning; Supervised Machine Learning; Unsupervised learning |
Online access: | Full text |
container_end_page | 95 |
---|---|
container_issue | |
container_start_page | 86 |
container_title | Neural networks |
container_volume | 94 |
creator | Yamazaki, Keisuke |
description | Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. |
doi_str_mv | 10.1016/j.neunet.2017.06.015 |
format | Article |
fullrecord | Raw ProQuest/Primo XML record (duplicates the fields above); additional fields it carries: EISSN: 1879-2782; PMID: 28755617; publisher: Elsevier Ltd, United States; peer reviewed |
fulltext | fulltext |
identifier | ISSN: 0893-6080 |
ispartof | Neural networks, 2017-10, Vol.94, p.86-95 |
issn | 0893-6080; 1879-2782 |
language | eng |
recordid | cdi_proquest_miscellaneous_1924589733 |
source | MEDLINE; Elsevier ScienceDirect Journals |
subjects | Bayes Theorem; Cluster Analysis; Hierarchical parametric models; Latent variable estimation; Models, Statistical; Semi-supervised learning; Supervised Machine Learning; Unsupervised learning |
title | Effects of additional data on Bayesian clustering |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T05%3A35%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effects%20of%20additional%20data%20on%20Bayesian%20clustering&rft.jtitle=Neural%20networks&rft.au=Yamazaki,%20Keisuke&rft.date=2017-10&rft.volume=94&rft.spage=86&rft.epage=95&rft.pages=86-95&rft.issn=0893-6080&rft.eissn=1879-2782&rft_id=info:doi/10.1016/j.neunet.2017.06.015&rft_dat=%3Cproquest_cross%3E1924589733%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1924589733&rft_id=info:pmid/28755617&rft_els_id=S089360801730151X&rfr_iscdi=true |