Correlation-assisted nearest shrunken centroid classifier with applications for high dimensional spectral data

High throughput data are frequently observed in contemporary chemical studies. Classification through spectral information is an important issue in chemometrics. Linear discriminant analysis (LDA) fails in the large‐p‐small‐n situation for two main reasons: (1) the sample covariance matrix is singular when p > n and (2) there is an accumulation of noise in the estimation of the class centroid in high dimensional feature space.

Detailed Description

Bibliographic Details
Published in: Journal of chemometrics 2016-01, Vol.30 (1), p.37-45
Main Authors: Xu, Jian, Xu, Qingsong, Yi, Lunzhao, Chan, Chi-On, Mok, Daniel Kam-Wah
Format: Article
Language: English
Subjects:
Online Access: Full text
container_end_page 45
container_issue 1
container_start_page 37
container_title Journal of chemometrics
container_volume 30
creator Xu, Jian
Xu, Qingsong
Yi, Lunzhao
Chan, Chi-On
Mok, Daniel Kam-Wah
description High throughput data are frequently observed in contemporary chemical studies. Classification through spectral information is an important issue in chemometrics. Linear discriminant analysis (LDA) fails in the large‐p‐small‐n situation for two main reasons: (1) the sample covariance matrix is singular when p > n and (2) there is an accumulation of noise in the estimation of the class centroid in high dimensional feature space. The Independence Rule is a class of methods used to overcome these drawbacks by ignoring the correlation information between spectral variables. However, a strong correlation is an essential characteristic of spectral data. We proposed a new correlation‐assisted nearest shrunken centroid classifier (CA‐NSC) to incorporate correlation information into the classification. CA‐NSC combines two sources of information [class centroid (mean) and correlation structure (variance)] to generate the classification. We used two real data analyses and a simulation study to verify our CA‐NSC method. In addition to NSC, we also performed a comparison with the soft independent modeling of class analogy (SIMCA) approach, which uses only correlation structure information for classification. The results show that CA‐NSC consistently improves on NSC and SIMCA. The misclassification rate of CA‐NSC is reduced by almost half compared with NSC in one of the real data analyses. Generally, correlation among variables will worsen the performance of NSC, even though the discriminatory information contained in the class centroid remains unchanged. If only correlation structure information is used (as in the case of SIMCA), the result will be satisfactory only when the correlation structure alone can provide sufficient information for classification. Copyright © 2015 John Wiley & Sons, Ltd. CA‐NSC combines class centroid and correlation structure information to generate the classification. 
The method constructs PCA models on different subsets of variables to depict different classes. It is able to calculate the probabilities of a sample being assigned to every class.
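The nearest shrunken centroid (NSC) step that CA-NSC builds on can be sketched as follows. This is a minimal illustration of the standard NSC idea (soft-thresholding standardized class-centroid deviations toward the overall centroid, then classifying by standardized distance), not the authors' CA-NSC implementation; the function names and the `delta` threshold are our own, and the scaling factor `m_k` follows the common formulation of Tibshirani et al.

```python
import numpy as np

def shrunken_centroids(X, y, delta):
    """Shrink each class centroid toward the overall centroid by
    soft-thresholding the standardized centroid differences (NSC sketch)."""
    classes = np.unique(y)
    n, p = X.shape
    overall = X.mean(axis=0)
    # Pooled within-class standard deviation per variable.
    ss = sum(((X[y == k] - X[y == k].mean(axis=0)) ** 2).sum(axis=0)
             for k in classes)
    s = np.sqrt(ss / (n - len(classes)))
    s0 = np.median(s)  # fudge value guarding against near-zero variances
    centroids = {}
    for k in classes:
        nk = (y == k).sum()
        mk = np.sqrt(1.0 / nk - 1.0 / n)  # standard error scaling (assumed form)
        d = (X[y == k].mean(axis=0) - overall) / (mk * (s + s0))
        # Soft threshold: small (noisy) centroid components shrink to zero.
        d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
        centroids[k] = overall + mk * (s + s0) * d_shrunk
    return centroids, s + s0

def predict_nsc(x, centroids, scale):
    # Assign to the class with the nearest shrunken centroid
    # in standardized (variance-scaled) distance.
    scores = {k: np.sum(((x - c) / scale) ** 2) for k, c in centroids.items()}
    return min(scores, key=scores.get)
```

Because each variable is treated independently in both the shrinkage and the distance, this baseline discards the between-variable correlation structure; CA-NSC's contribution, per the abstract, is to combine this centroid information with a correlation-structure component.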
doi_str_mv 10.1002/cem.2768
format Article
publisher Blackwell Publishing Ltd
fulltext fulltext
identifier ISSN: 0886-9383
ispartof Journal of chemometrics, 2016-01, Vol.30 (1), p.37-45
issn 0886-9383
1099-128X
language eng
recordid cdi_proquest_miscellaneous_1800492731
source Access via Wiley Online Library
subjects Centroids
Chemometrics
Classification
Classifiers
Correlation
Data processing
Mathematical models
principal component analysis
soft independent modeling of class analogy
Spectra
title Correlation-assisted nearest shrunken centroid classifier with applications for high dimensional spectral data