Data mining applied to feature selection methods for aboveground carbon stock modelling

Abstract The objective of this work was to apply the random forest (RF) algorithm to the modelling of the aboveground carbon (AGC) stock of a tropical forest by testing three feature selection procedures – recursive removal and the uniobjective and multiobjective genetic algorithms (GAs). The used d...

Detailed description

Bibliographic details
Main authors: Carvalho, Mônica Canaan, Gomide, Lucas Rezende, Scolforo, José Roberto Soares, Páscoa, Kalill José Viana da, Araújo, Laís Almeida, Lopes, Isáira Leite e
Format: Dataset
Language: eng
Subjects:
Online access: Order full text
creator Carvalho, Mônica Canaan
Gomide, Lucas Rezende
Scolforo, José Roberto Soares
Páscoa, Kalill José Viana da
Araújo, Laís Almeida
Lopes, Isáira Leite e
description Abstract The objective of this work was to apply the random forest (RF) algorithm to the modelling of the aboveground carbon (AGC) stock of a tropical forest by testing three feature selection procedures: recursive removal and the uniobjective and multiobjective genetic algorithms (GAs). The database used covered 1,007 plots sampled in the Rio Grande watershed, in the state of Minas Gerais, Brazil, and 114 environmental variables (climatic, edaphic, geographic, terrain, and spectral). The best feature selection strategy, RF with multiobjective GA, reached the lowest root-mean-square error, 17.75 Mg ha⁻¹, with only four spectral variables (normalized difference moisture index, normalized burn ratio 2 correlation texture, tree cover, and latent heat flux), which represents a 96.5% reduction in the size of the database. Feature selection strategies help RF perform better by improving accuracy and reducing the volume of data. Although recursive removal and the multiobjective GA performed similarly as feature selection strategies, the latter yielded the smallest subset of variables with the highest accuracy. The findings of this study highlight the importance of using near infrared, short wavelengths, and derived vegetation indices for the remote-sensing-based estimation of AGC. The MODIS products show a significant relationship with the AGC stock and should be further explored by the scientific community for the modelling of this stock.
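The 96.5% figure quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes the reduction is measured as the share of the 114 candidate variables that were dropped when four were retained; the function name `reduction_percent` is hypothetical and is not part of the dataset or the study's code.

```python
# Figures quoted in the abstract: 114 candidate variables, 4 retained.
total_variables = 114
selected_variables = 4


def reduction_percent(total: int, kept: int) -> float:
    """Percentage of variables removed by feature selection (hypothetical helper)."""
    return 100.0 * (total - kept) / total


# 110 of 114 variables are dropped.
print(f"{reduction_percent(total_variables, selected_variables):.1f}%")  # prints "96.5%"
```

Under that assumption the result agrees with the value reported in the abstract.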
doi_str_mv 10.6084/m9.figshare.21679161
format Dataset
publisher SciELO journals
date 2022-12-06
oa free_for_read
fulltext fulltext_linktorsrc
identifier DOI: 10.6084/m9.figshare.21679161
language eng
recordid cdi_datacite_primary_10_6084_m9_figshare_21679161
source DataCite
subjects Agricultural Biotechnology not elsewhere classified
FOS: Agricultural biotechnology
title Data mining applied to feature selection methods for aboveground carbon stock modelling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T11%3A04%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Carvalho,%20M%C3%B4nica%20Canaan&rft.date=2022-12-06&rft_id=info:doi/10.6084/m9.figshare.21679161&rft_dat=%3Cdatacite_PQ8%3E10_6084_m9_figshare_21679161%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true