Improving land cover classification in an urbanized coastal area by random forests: The role of variable selection
Land cover mapping in complex environments can be challenging due to their landscape heterogeneity. With the increasing availability of various open-access remotely sensed datasets, more images acquired by different sensors and on different dates tend to be used to improve land cover classification...
Published in: | Remote sensing of environment 2020-12, Vol.251, p.112105, Article 112105 |
---|---|
Main authors: | Zhang, Fang ; Yang, Xiaojun |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 112105 |
container_title | Remote sensing of environment |
container_volume | 251 |
creator | Zhang, Fang ; Yang, Xiaojun |
description | Land cover mapping in complex environments can be challenging due to their landscape heterogeneity. With the increasing availability of various open-access remotely sensed datasets, more images acquired by different sensors and on different dates tend to be used to improve land cover classification accuracy. Selecting an appropriate feature domain with the best landscape separability is therefore crucial in meeting the requirement of computational efficiency and model interpretability. Variable selection is widely used in pattern recognition to enhance model parsimony. This study focused on the variable selection process and proposed a series of methods to select the optimal feature domain to improve land cover classification in a complex urbanized coastal area. Two decision tree models (CART-Classification and Regression Tree and CIT-Conditional Inference Tree) and five variable importance measures (GINI, PVIM-Permutated Variable Importance Measure, MD-Minimum Depth, IPM-Intervention of Prediction Measure, and CPVIM-Conditional Permutation Variable Importance Measure) based on random forests were considered. Variable importance measures were applied to a set of spectral, spatial and temporal features derived from medium-resolution satellite images. Backward elimination methods were used to select the optimal feature subset. It was found that, compared to the traditional band-only model, the variable selection process can significantly improve the model parsimony and computational efficiency. The CPVIM based on the CIT decision tree model was more reliable in selecting relevant features regardless of their correlations, but CART tended to generate higher classification accuracy. Therefore, the combination of the CART model and the ranking from the CPVIM variable measure is recommended to achieve higher classification accuracy and better data interpretability.
The novelty of our work lies in its insight into the merits of integrating variable selection in the land cover classification process over complex environments.
• Variable selection can significantly improve coastal land cover classification.
• The selection of variable importance measures may vary by data types.
• Conditional permutated variable importance measure was reliable for correlated data.
• Conditional Inference Tree took more time but did not necessarily improve accuracy. |
doi_str_mv | 10.1016/j.rse.2020.112105 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2478111032</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0034425720304788</els_id><sourcerecordid>2478111032</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-7d2f46bf68074e67d00405f90debf6beee61f4230795c8b1e6f279fc48320d0b3</originalsourceid><addsrcrecordid>eNp9UMtOwzAQtBBIlMcHcLPEOWXtOHECJ1TxqITEBc6W46zBVRrDuq1Uvh5X4cxpH5rZnRnGrgTMBYj6ZjWnhHMJMs9CCqiO2Ew0ui1AgzpmM4BSFUpW-pSdpbQCEFWjxYzRcv1FcRfGDz7Ysecu7pC4G2xKwQdnNyGOPIzcjnxLnR3DDx5ANm3swC2h5d2eU2bGNfeRMG3SLX_7RE5xQB4931kKtst9wgHd4dwFO_F2SHj5V8_Z--PD2-K5eHl9Wi7uXwpXtmJT6F56VXe-bkArrHUPoKDyLfSYlx0i1sIrWYJuK9d0AmsvdeudakoJPXTlObue7maD39uszKzilsb80kilGyEElDKjxIRyFFMi9OaLwtrS3ggwh2jNyuRozSFaM0WbOXcTB7P8XUAyyQUcHfaBskfTx_AP-xdsrII1</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2478111032</pqid></control><display><type>article</type><title>Improving land cover classification in an urbanized coastal area by random forests: The role of variable selection</title><source>Access via ScienceDirect (Elsevier)</source><creator>Zhang, Fang ; Yang, Xiaojun</creator><creatorcontrib>Zhang, Fang ; Yang, Xiaojun</creatorcontrib><description>Land cover mapping in complex environments can be challenging due to their landscape heterogeneity. With the increasing availability of various open-access remotely sensed datasets, more images acquired by different sensors and on different dates tend to be used to improve land cover classification accuracy. Selecting an appropriate feature domain with the best landscape separability is therefore crucial in meeting the requirement of computational efficiency and model interpretability. Variable selection is widely used in pattern recognition to enhance model parsimony. 
This study focused on the variable selection process and proposed a series of methods to select the optimal feature domain to improve land cover classification in a complex urbanized coastal area. Two decision tree models (CART-Classification and Regression Tree and CIT-Conditional Inference Tree) and five variable importance measures (GINI, PVIM-Permutated Variable Importance Measure, MD-Minimum Depth, IPM-Intervention of Prediction Measure, and CPVIM-Conditional Permutation Variable Importance Measure) based on random forests were considered. Variable importance measures were applied to a set of spectral, spatial and temporal features derived from medium-resolution satellite images. Backward elimination methods were used to select the optimal feature subset. It was found that, compared to the traditional band-only model, the variable selection process can significantly improve the model parsimony and computational efficiency. The CPVIM based on the CIT decision tree model was more reliable in selecting relevant features regardless of their correlations, but CART tended to generate higher classification accuracy. Therefore, the combination of the CART model and the ranking from the CPVIM variable measure is recommended to achieve higher classification accuracy and better data interpretability. The novelty of our work lies in its insight into the merits of integrating variable selection in the land cover classification process over complex environments.
•Variable selection can significantly improve coastal land cover classification.•The selection of variable importance measures may vary by data types.•Conditional permutated variable importance measure was reliable for correlated data.•Conditional Inference Tree took more time but did not necessarily improve accuracy.</description><identifier>ISSN: 0034-4257</identifier><identifier>EISSN: 1879-0704</identifier><identifier>DOI: 10.1016/j.rse.2020.112105</identifier><language>eng</language><publisher>New York: Elsevier Inc</publisher><subject>Accuracy ; Classification ; Coastal zone ; Coasts ; Complex environments ; Computational efficiency ; Computer applications ; Computing time ; Decision trees ; Domains ; Feature selection ; Heterogeneity ; Image acquisition ; Land cover ; Land cover classification ; Land use ; Landscape ; Pattern recognition ; Permutations ; Random forests ; Regression analysis ; Remote sensing ; Remote sensors ; Satellite imagery ; Temporal variations ; Variable selection</subject><ispartof>Remote sensing of environment, 2020-12, Vol.251, p.112105, Article 112105</ispartof><rights>2020 Elsevier Inc.</rights><rights>Copyright Elsevier BV Dec 15, 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-7d2f46bf68074e67d00405f90debf6beee61f4230795c8b1e6f279fc48320d0b3</citedby><cites>FETCH-LOGICAL-c391t-7d2f46bf68074e67d00405f90debf6beee61f4230795c8b1e6f279fc48320d0b3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.rse.2020.112105$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Zhang, Fang</creatorcontrib><creatorcontrib>Yang, Xiaojun</creatorcontrib><title>Improving land cover classification in an urbanized coastal area by 
random forests: The role of variable selection</title><title>Remote sensing of environment</title><description>Land cover mapping in complex environments can be challenging due to their landscape heterogeneity. With the increasing availability of various open-access remotely sensed datasets, more images acquired by different sensors and on different dates tend to be used to improve land cover classification accuracy. Selecting an appropriate feature domain with the best landscape separability is therefore crucial in meeting the requirement of computational efficiency and model interpretability. Variable selection is widely used in pattern recognition to enhance model parsimony. This study focused on the variable selection process and proposed a series of methods to select the optimal feature domain to improve land cover classification in a complex urbanized coastal area. Two decision tree models (CART-Classification and Regression Tree and CIT-Conditional Inference Tree) and five variable importance measures (GINI, PVIM-Permutated Variable Importance Measure, MD-Minimum Depth, IPM-Intervention of Prediction Measure, and CPVIM-Conditional Permutation Variable Importance Measure) based on random forests were considered. Variable importance measures were applied to a set of spectral, spatial and temporal features derived from medium-resolution satellite images. Backward elimination methods were used to select the optimal feature subset. It was found that, compared to the traditional band-only model, the variable selection process can significantly improve the model parsimony and computational efficiency. The CPVIM based on the CIT decision tree model was more reliable in selecting relevant features regardless of their correlations, but CART tended to generate higher classification accuracy. Therefore, the combination of the CART model and the ranking from the CPVIM variable measure is recommended to achieve higher classification accuracy and better data interpretability. 
The novelty of our work lies in its insight into the merits of integrating variable selection in the land cover classification process over complex environments.
•Variable selection can significantly improve coastal land cover classification.•The selection of variable importance measures may vary by data types.•Conditional permutated variable importance measure was reliable for correlated data.•Conditional Inference Tree took more time but did not necessarily improve accuracy.</description><subject>Accuracy</subject><subject>Classification</subject><subject>Coastal zone</subject><subject>Coasts</subject><subject>Complex environments</subject><subject>Computational efficiency</subject><subject>Computer applications</subject><subject>Computing time</subject><subject>Decision trees</subject><subject>Domains</subject><subject>Feature selection</subject><subject>Heterogeneity</subject><subject>Image acquisition</subject><subject>Land cover</subject><subject>Land cover classification</subject><subject>Land use</subject><subject>Landscape</subject><subject>Pattern recognition</subject><subject>Permutations</subject><subject>Random forests</subject><subject>Regression analysis</subject><subject>Remote sensing</subject><subject>Remote sensors</subject><subject>Satellite imagery</subject><subject>Temporal variations</subject><subject>Variable selection</subject><issn>0034-4257</issn><issn>1879-0704</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp9UMtOwzAQtBBIlMcHcLPEOWXtOHECJ1TxqITEBc6W46zBVRrDuq1Uvh5X4cxpH5rZnRnGrgTMBYj6ZjWnhHMJMs9CCqiO2Ew0ui1AgzpmM4BSFUpW-pSdpbQCEFWjxYzRcv1FcRfGDz7Ysecu7pC4G2xKwQdnNyGOPIzcjnxLnR3DDx5ANm3swC2h5d2eU2bGNfeRMG3SLX_7RE5xQB4931kKtst9wgHd4dwFO_F2SHj5V8_Z--PD2-K5eHl9Wi7uXwpXtmJT6F56VXe-bkArrHUPoKDyLfSYlx0i1sIrWYJuK9d0AmsvdeudakoJPXTlObue7maD39uszKzilsb80kilGyEElDKjxIRyFFMi9OaLwtrS3ggwh2jNyuRozSFaM0WbOXcTB7P8XUAyyQUcHfaBskfTx_AP-xdsrII1</recordid><startdate>20201215</startdate><enddate>20201215</enddate><creator>Zhang, Fang</creator><creator>Yang, Xiaojun</creator><general>Elsevier Inc</general><general>Elsevier 
BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SN</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TG</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>C1K</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>JG9</scope><scope>JQ2</scope><scope>KL.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope></search><sort><creationdate>20201215</creationdate><title>Improving land cover classification in an urbanized coastal area by random forests: The role of variable selection</title><author>Zhang, Fang ; Yang, Xiaojun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-7d2f46bf68074e67d00405f90debf6beee61f4230795c8b1e6f279fc48320d0b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Classification</topic><topic>Coastal zone</topic><topic>Coasts</topic><topic>Complex environments</topic><topic>Computational efficiency</topic><topic>Computer applications</topic><topic>Computing time</topic><topic>Decision trees</topic><topic>Domains</topic><topic>Feature selection</topic><topic>Heterogeneity</topic><topic>Image acquisition</topic><topic>Land cover</topic><topic>Land cover classification</topic><topic>Land use</topic><topic>Landscape</topic><topic>Pattern recognition</topic><topic>Permutations</topic><topic>Random forests</topic><topic>Regression analysis</topic><topic>Remote sensing</topic><topic>Remote sensors</topic><topic>Satellite imagery</topic><topic>Temporal variations</topic><topic>Variable selection</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Fang</creatorcontrib><creatorcontrib>Yang, 
Xiaojun</creatorcontrib><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Ecology Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Meteorological & Geoastrophysical Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Meteorological & Geoastrophysical Abstracts - Academic</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><jtitle>Remote sensing of environment</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Fang</au><au>Yang, Xiaojun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving land cover classification in an urbanized 
coastal area by random forests: The role of variable selection</atitle><jtitle>Remote sensing of environment</jtitle><date>2020-12-15</date><risdate>2020</risdate><volume>251</volume><spage>112105</spage><pages>112105-</pages><artnum>112105</artnum><issn>0034-4257</issn><eissn>1879-0704</eissn><abstract>Land cover mapping in complex environments can be challenging due to their landscape heterogeneity. With the increasing availability of various open-access remotely sensed datasets, more images acquired by different sensors and on different dates tend to be used to improve land cover classification accuracy. Selecting an appropriate feature domain with the best landscape separability is therefore crucial in meeting the requirement of computational efficiency and model interpretability. Variable selection is widely used in pattern recognition to enhance model parsimony. This study focused on the variable selection process and proposed a series of methods to select the optimal feature domain to improve land cover classification in a complex urbanized coastal area. Two decision tree models (CART-Classification and Regression Tree and CIT-Conditional Inference Tree) and five variable importance measures (GINI, PVIM-Permutated Variable Importance Measure, MD-Minimum Depth, IPM-Intervention of Prediction Measure, and CPVIM-Conditional Permutation Variable Importance Measure) based on random forests were considered. Variable importance measures were applied to a set of spectral, spatial and temporal features derived from medium-resolution satellite images. Backward elimination methods were used to select the optimal feature subset. It was found that, compared to the traditional band-only model, the variable selection process can significantly improve the model parsimony and computational efficiency. 
The CPVIM based on the CIT decision tree model was more reliable in selecting relevant features regardless of their correlations, but CART tended to generate higher classification accuracy. Therefore, the combination of the CART model and the ranking from the CPVIM variable measure is recommended to achieve higher classification accuracy and better data interpretability. The novelty of our work lies in its insight into the merits of integrating variable selection in the land cover classification process over complex environments.
•Variable selection can significantly improve coastal land cover classification.•The selection of variable importance measures may vary by data types.•Conditional permutated variable importance measure was reliable for correlated data.•Conditional Inference Tree took more time but did not necessarily improve accuracy.</abstract><cop>New York</cop><pub>Elsevier Inc</pub><doi>10.1016/j.rse.2020.112105</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0034-4257 |
ispartof | Remote sensing of environment, 2020-12, Vol.251, p.112105, Article 112105 |
issn | 0034-4257 ; 1879-0704 |
language | eng |
recordid | cdi_proquest_journals_2478111032 |
source | Access via ScienceDirect (Elsevier) |
subjects | Accuracy ; Classification ; Coastal zone ; Coasts ; Complex environments ; Computational efficiency ; Computer applications ; Computing time ; Decision trees ; Domains ; Feature selection ; Heterogeneity ; Image acquisition ; Land cover ; Land cover classification ; Land use ; Landscape ; Pattern recognition ; Permutations ; Random forests ; Regression analysis ; Remote sensing ; Remote sensors ; Satellite imagery ; Temporal variations ; Variable selection |
title | Improving land cover classification in an urbanized coastal area by random forests: The role of variable selection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T07%3A49%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20land%20cover%20classification%20in%20an%20urbanized%20coastal%20area%20by%20random%20forests:%20The%20role%20of%20variable%20selection&rft.jtitle=Remote%20sensing%20of%20environment&rft.au=Zhang,%20Fang&rft.date=2020-12-15&rft.volume=251&rft.spage=112105&rft.pages=112105-&rft.artnum=112105&rft.issn=0034-4257&rft.eissn=1879-0704&rft_id=info:doi/10.1016/j.rse.2020.112105&rft_dat=%3Cproquest_cross%3E2478111032%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2478111032&rft_id=info:pmid/&rft_els_id=S0034425720304788&rfr_iscdi=true |
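
The abstract in this record describes ranking features by a variable importance measure and then applying backward elimination to find a compact feature subset. The sketch below illustrates that general idea only and is not the authors' implementation: it assumes a simple nearest-centroid classifier as a stand-in for the random forest, uses plain (unconditional) permutation importance rather than the conditional CPVIM, and runs on synthetic data. All function names (`fit_centroids`, `permutation_importance`, `backward_elimination`, `make_data`) are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier (stand-in model): one mean vector per class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {c: [sum(col) / len(col) for col in zip(*rows)] for c, rows in groups.items()}

def accuracy(model, X, y):
    """Share of rows whose nearest centroid carries the true label."""
    def predict(row):
        return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, rng, repeats=5):
    """Mean accuracy drop when one column is shuffled, computed per column."""
    base = accuracy(model, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, shuffled, y))
        scores.append(sum(drops) / repeats)
    return scores

def backward_elimination(X_tr, y_tr, X_va, y_va, rng):
    """Drop the least important feature each round; return [(features, accuracy), ...]."""
    features = list(range(len(X_tr[0])))
    history = []
    while features:
        sub_tr = [[row[f] for f in features] for row in X_tr]
        sub_va = [[row[f] for f in features] for row in X_va]
        model = fit_centroids(sub_tr, y_tr)
        history.append((list(features), accuracy(model, sub_va, y_va)))
        if len(features) == 1:
            break
        scores = permutation_importance(model, sub_va, y_va, rng)
        features.pop(scores.index(min(scores)))
    return history

def make_data(n, rng):
    """Synthetic data (assumption): features 0 and 1 carry the class signal, 2 and 3 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([3 * label + rng.gauss(0, 1), -3 * label + rng.gauss(0, 1),
                  rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_valid, y_valid = make_data(200, rng)
history = backward_elimination(X_train, y_train, X_valid, y_valid, rng)
best_features, best_accuracy = max(history, key=lambda item: item[1])
for feats, acc in history:
    print(feats, round(acc, 3))
```

Each entry in `history` pairs a candidate feature subset with its validation accuracy, so a parsimonious subset can be chosen by trading accuracy against subset size. With correlated features, plain permutation importance can be misleading, which is the situation the abstract reports the conditional measure handling more reliably.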