Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning

Semi- and nonparametric models are popular in the area-based approach (ABA) using airborne laser scanning. It is unclear, however, how many predictors and training plots are needed to provide accurate predictions without overfitting. This work aims to explore these limits for various approaches: ord...

Detailed description

Bibliographic details
Published in: Canadian journal of forest research 2022-03, Vol.52 (3), p.385-395
Main authors: Cosenza, Diogo N, Packalen, Petteri, Maltamo, Matti, Varvia, Petri, Räty, Janne, Soares, Paula, Tomé, Margarida, Strunk, Jacob L, Korhonen, Lauri
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 395
container_issue 3
container_start_page 385
container_title Canadian journal of forest research
container_volume 52
creator Cosenza, Diogo N
Packalen, Petteri
Maltamo, Matti
Varvia, Petri
Räty, Janne
Soares, Paula
Tomé, Margarida
Strunk, Jacob L
Korhonen, Lauri
description Semi- and nonparametric models are popular in the area-based approach (ABA) using airborne laser scanning. It is unclear, however, how many predictors and training plots are needed to provide accurate predictions without overfitting. This work aims to explore these limits for various approaches: ordinary least squares regression (OLS), generalized additive models (GAM), least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). We modeled timber volume (m³·ha⁻¹) for four boreal sites using ABA with 2–39 predictors and 20–500 training plots. OLS, GAM, LASSO, and SVM overfitted as the number of predictors approached the number of training plots. They required ≥15 plots per predictor to provide accurate predictions (RMSE ≤30%). GAM required ≥250 plots regardless of the number of predictors. The number of predictors only mildly affected RF and GPR, but they required ≥200 and ≥250 training plots, respectively. RF did not overfit in any circumstances, whereas GPR overfit even with 500 training plots. Overall, using up to 39 predictors did not generally result in overfit, and for most model types, it resulted in better accuracy for sufficiently large datasets (≥250 plots).
doi_str_mv 10.1139/cjfr-2021-0192
format Article
fullrecord <record><control><sourceid>gale_nrcre</sourceid><recordid>TN_cdi_gale_infotracmisc_A696745401</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A696745401</galeid><sourcerecordid>A696745401</sourcerecordid><originalsourceid>FETCH-LOGICAL-c509t-fd7a12878f03a0f1efcb4fa8f638ca71c411edcff6a72ba6d414126bdaef446a3</originalsourceid><addsrcrecordid>eNqVkk-LFDEQxRtRcFy9eg568tBr0p1J9xyXZdWFRcE_51Cdrsxk6E56K5nR-RD7nTftLujAgEgOKSq_96ogryheC34uRL16b7aWyopXouRiVT0pFqLibal43TwtFpzLZbnkqnlevIhxyzmvVc0Xxd2VtWhSZMEyvxs7pN9l6CLSHpILPjLwPZsIe2dSyM82ENsDubCLbAw9DiwdJswyz9IG2YSUiRG8wdkp1xgTc36PPssP7KdLGwaOukAe2QB5EIsGvHd-_bJ4ZmGI-OrxPit-fLj6fvmpvPny8fry4qY0S75Kpe0bEFXbtJbXwK1AazppobWqbg00wkghsDfWKmiqDlQvhRSV6npAK6WC-qx4--A7Ubjd5f30NuzI55G6UrVqlWzk8g-1hgG18zYkAjO6aPSFWqmMSC4yVZ6g1uiRYAgercvtI_7NCd5M7lb_DZ2fgPLpcXTmpOu7I0FmEv5Ka9jFqK-_ff0P9vMx-7iIoRAjodUTuRHooAXXc-z0HDs9x07PscsC8SDwZPLPI5DZ_EtzD1XI2xI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2636864745</pqid></control><display><type>article</type><title>Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning</title><source>Alma/SFX Local Collection</source><creator>Cosenza, Diogo N ; Packalen, Petteri ; Maltamo, Matti ; Varvia, Petri ; Räty, Janne ; Soares, Paula ; Tomé, Margarida ; Strunk, Jacob L ; Korhonen, Lauri</creator><creatorcontrib>Cosenza, Diogo N ; Packalen, Petteri ; Maltamo, Matti ; Varvia, Petri ; Räty, Janne ; Soares, Paula ; Tomé, Margarida ; Strunk, Jacob L ; Korhonen, Lauri</creatorcontrib><description>Semi- and nonparametric models are popular in the area-based approach (ABA) using airborne laser scanning. It is unclear, however, how many predictors and training plots are needed to provide accurate predictions without overfitting. 
This work aims to explore these limits for various approaches: ordinary least squares regression (OLS), generalized additive models (GAM), least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). We modeled timber volume (m 3 ·ha –1 ) for four boreal sites using ABA with 2–39 predictors and 20–500 training plots. OLS, GAM, LASSO, and SVM overfitted as the number of predictors approached the number of training plots. They required ≥15 plots per predictor to provide accurate predictions (RMSE ≤30%). GAM required ≥250 plots regardless of the number of predictors. The number of predictors only mildly affected RF and GPR, but they required ≥200 and ≥250 training plots, respectively. RF did not overfit in any circumstances, whereas GPR overfit even with 500 training plots. Overall, using up to 39 predictors did not generally result in overfit, and for most model types, it resulted in better accuracy for sufficiently large datasets (≥250 plots).</description><identifier>ISSN: 0045-5067</identifier><identifier>EISSN: 1208-6037</identifier><identifier>DOI: 10.1139/cjfr-2021-0192</identifier><language>eng</language><publisher>1840 Woodward Drive, Suite 1, Ottawa, ON K2C 0P7: Canadian Science Publishing</publisher><subject>Airborne lasers ; apprentissage machine ; approche territoriale ; area-based approach ; Forest management ; Gaussian process ; Laser applications ; Lasers ; Least squares method ; LiDAR ; Machine learning ; Regression analysis ; Remote sensing ; sampling size ; Scanning ; Support vector machines ; taille de l’échantillon ; Technology application ; Training ; télédétection</subject><ispartof>Canadian journal of forest research, 2022-03, Vol.52 (3), p.385-395</ispartof><rights>COPYRIGHT 2022 NRC Research Press</rights><rights>2021 Published by NRC Research 
Press</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c509t-fd7a12878f03a0f1efcb4fa8f638ca71c411edcff6a72ba6d414126bdaef446a3</citedby><cites>FETCH-LOGICAL-c509t-fd7a12878f03a0f1efcb4fa8f638ca71c411edcff6a72ba6d414126bdaef446a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Cosenza, Diogo N</creatorcontrib><creatorcontrib>Packalen, Petteri</creatorcontrib><creatorcontrib>Maltamo, Matti</creatorcontrib><creatorcontrib>Varvia, Petri</creatorcontrib><creatorcontrib>Räty, Janne</creatorcontrib><creatorcontrib>Soares, Paula</creatorcontrib><creatorcontrib>Tomé, Margarida</creatorcontrib><creatorcontrib>Strunk, Jacob L</creatorcontrib><creatorcontrib>Korhonen, Lauri</creatorcontrib><title>Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning</title><title>Canadian journal of forest research</title><description>Semi- and nonparametric models are popular in the area-based approach (ABA) using airborne laser scanning. It is unclear, however, how many predictors and training plots are needed to provide accurate predictions without overfitting. This work aims to explore these limits for various approaches: ordinary least squares regression (OLS), generalized additive models (GAM), least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). We modeled timber volume (m 3 ·ha –1 ) for four boreal sites using ABA with 2–39 predictors and 20–500 training plots. OLS, GAM, LASSO, and SVM overfitted as the number of predictors approached the number of training plots. 
They required ≥15 plots per predictor to provide accurate predictions (RMSE ≤30%). GAM required ≥250 plots regardless of the number of predictors. The number of predictors only mildly affected RF and GPR, but they required ≥200 and ≥250 training plots, respectively. RF did not overfit in any circumstances, whereas GPR overfit even with 500 training plots. Overall, using up to 39 predictors did not generally result in overfit, and for most model types, it resulted in better accuracy for sufficiently large datasets (≥250 plots).</description><subject>Airborne lasers</subject><subject>apprentissage machine</subject><subject>approche territoriale</subject><subject>area-based approach</subject><subject>Forest management</subject><subject>Gaussian process</subject><subject>Laser applications</subject><subject>Lasers</subject><subject>Least squares method</subject><subject>LiDAR</subject><subject>Machine learning</subject><subject>Regression analysis</subject><subject>Remote sensing</subject><subject>sampling size</subject><subject>Scanning</subject><subject>Support vector machines</subject><subject>taille de l’échantillon</subject><subject>Technology 
application</subject><subject>Training</subject><subject>télédétection</subject><issn>0045-5067</issn><issn>1208-6037</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqVkk-LFDEQxRtRcFy9eg568tBr0p1J9xyXZdWFRcE_51Cdrsxk6E56K5nR-RD7nTftLujAgEgOKSq_96ogryheC34uRL16b7aWyopXouRiVT0pFqLibal43TwtFpzLZbnkqnlevIhxyzmvVc0Xxd2VtWhSZMEyvxs7pN9l6CLSHpILPjLwPZsIe2dSyM82ENsDubCLbAw9DiwdJswyz9IG2YSUiRG8wdkp1xgTc36PPssP7KdLGwaOukAe2QB5EIsGvHd-_bJ4ZmGI-OrxPit-fLj6fvmpvPny8fry4qY0S75Kpe0bEFXbtJbXwK1AazppobWqbg00wkghsDfWKmiqDlQvhRSV6npAK6WC-qx4--A7Ubjd5f30NuzI55G6UrVqlWzk8g-1hgG18zYkAjO6aPSFWqmMSC4yVZ6g1uiRYAgercvtI_7NCd5M7lb_DZ2fgPLpcXTmpOu7I0FmEv5Ka9jFqK-_ff0P9vMx-7iIoRAjodUTuRHooAXXc-z0HDs9x07PscsC8SDwZPLPI5DZ_EtzD1XI2xI</recordid><startdate>20220301</startdate><enddate>20220301</enddate><creator>Cosenza, Diogo N</creator><creator>Packalen, Petteri</creator><creator>Maltamo, Matti</creator><creator>Varvia, Petri</creator><creator>Räty, Janne</creator><creator>Soares, Paula</creator><creator>Tomé, Margarida</creator><creator>Strunk, Jacob L</creator><creator>Korhonen, Lauri</creator><general>Canadian Science Publishing</general><general>NRC Research Press</general><general>Canadian Science Publishing NRC Research Press</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>7SN</scope><scope>7SS</scope><scope>7T7</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>U9A</scope></search><sort><creationdate>20220301</creationdate><title>Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning</title><author>Cosenza, Diogo N ; Packalen, Petteri ; Maltamo, Matti ; Varvia, Petri ; Räty, Janne ; Soares, Paula ; Tomé, Margarida ; Strunk, Jacob L ; Korhonen, 
Lauri</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c509t-fd7a12878f03a0f1efcb4fa8f638ca71c411edcff6a72ba6d414126bdaef446a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Airborne lasers</topic><topic>apprentissage machine</topic><topic>approche territoriale</topic><topic>area-based approach</topic><topic>Forest management</topic><topic>Gaussian process</topic><topic>Laser applications</topic><topic>Lasers</topic><topic>Least squares method</topic><topic>LiDAR</topic><topic>Machine learning</topic><topic>Regression analysis</topic><topic>Remote sensing</topic><topic>sampling size</topic><topic>Scanning</topic><topic>Support vector machines</topic><topic>taille de l’échantillon</topic><topic>Technology application</topic><topic>Training</topic><topic>télédétection</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cosenza, Diogo N</creatorcontrib><creatorcontrib>Packalen, Petteri</creatorcontrib><creatorcontrib>Maltamo, Matti</creatorcontrib><creatorcontrib>Varvia, Petri</creatorcontrib><creatorcontrib>Räty, Janne</creatorcontrib><creatorcontrib>Soares, Paula</creatorcontrib><creatorcontrib>Tomé, Margarida</creatorcontrib><creatorcontrib>Strunk, Jacob L</creatorcontrib><creatorcontrib>Korhonen, Lauri</creatorcontrib><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Industrial and Applied Microbiology Abstracts (Microbiology A)</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics 
Abstracts</collection><jtitle>Canadian journal of forest research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cosenza, Diogo N</au><au>Packalen, Petteri</au><au>Maltamo, Matti</au><au>Varvia, Petri</au><au>Räty, Janne</au><au>Soares, Paula</au><au>Tomé, Margarida</au><au>Strunk, Jacob L</au><au>Korhonen, Lauri</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning</atitle><jtitle>Canadian journal of forest research</jtitle><date>2022-03-01</date><risdate>2022</risdate><volume>52</volume><issue>3</issue><spage>385</spage><epage>395</epage><pages>385-395</pages><issn>0045-5067</issn><eissn>1208-6037</eissn><abstract>Semi- and nonparametric models are popular in the area-based approach (ABA) using airborne laser scanning. It is unclear, however, how many predictors and training plots are needed to provide accurate predictions without overfitting. This work aims to explore these limits for various approaches: ordinary least squares regression (OLS), generalized additive models (GAM), least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). We modeled timber volume (m 3 ·ha –1 ) for four boreal sites using ABA with 2–39 predictors and 20–500 training plots. OLS, GAM, LASSO, and SVM overfitted as the number of predictors approached the number of training plots. They required ≥15 plots per predictor to provide accurate predictions (RMSE ≤30%). GAM required ≥250 plots regardless of the number of predictors. The number of predictors only mildly affected RF and GPR, but they required ≥200 and ≥250 training plots, respectively. RF did not overfit in any circumstances, whereas GPR overfit even with 500 training plots. 
Overall, using up to 39 predictors did not generally result in overfit, and for most model types, it resulted in better accuracy for sufficiently large datasets (≥250 plots).</abstract><cop>1840 Woodward Drive, Suite 1, Ottawa, ON K2C 0P7</cop><pub>Canadian Science Publishing</pub><doi>10.1139/cjfr-2021-0192</doi><tpages>11</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0045-5067
ispartof Canadian journal of forest research, 2022-03, Vol.52 (3), p.385-395
issn 0045-5067
1208-6037
language eng
recordid cdi_gale_infotracmisc_A696745401
source Alma/SFX Local Collection
subjects Airborne lasers
apprentissage machine
approche territoriale
area-based approach
Forest management
Gaussian process
Laser applications
Lasers
Least squares method
LiDAR
Machine learning
Regression analysis
Remote sensing
sampling size
Scanning
Support vector machines
taille de l’échantillon
Technology application
Training
télédétection
title Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T09%3A02%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_nrcre&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effects%20of%20numbers%20of%20observations%20and%20predictors%20for%20various%20model%20types%20on%20the%20performance%20of%20forest%20inventory%20with%20airborne%20laser%20scanning&rft.jtitle=Canadian%20journal%20of%20forest%20research&rft.au=Cosenza,%20Diogo%20N&rft.date=2022-03-01&rft.volume=52&rft.issue=3&rft.spage=385&rft.epage=395&rft.pages=385-395&rft.issn=0045-5067&rft.eissn=1208-6037&rft_id=info:doi/10.1139/cjfr-2021-0192&rft_dat=%3Cgale_nrcre%3EA696745401%3C/gale_nrcre%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2636864745&rft_id=info:pmid/&rft_galeid=A696745401&rfr_iscdi=true
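
The sample-size thresholds reported in the description field above can be written down as a simple check. The following is only a sketch under stated assumptions: the function name, the dictionary layout, and the choice to combine the per-predictor ratio with the minimum plot count for GAM are illustrative and do not come from the article. The numbers themselves are taken from the abstract, which reports them for four boreal sites and an accuracy target of RMSE ≤30%, so they may not transfer to other conditions.

```python
# Thresholds as stated in the abstract; all names here are hypothetical.
PLOTS_PER_PREDICTOR = 15                          # reported for OLS, GAM, LASSO, SVM
RATIO_MODELS = {"OLS", "GAM", "LASSO", "SVM"}     # models needing >= 15 plots per predictor
MIN_PLOTS = {"GAM": 250, "RF": 200, "GPR": 250}   # models with a reported minimum plot count


def meets_reported_thresholds(model: str, n_plots: int, n_predictors: int) -> bool:
    """Return True if a design satisfies the thresholds quoted in the abstract.

    Assumption: for GAM both conditions (ratio and minimum count) are applied.
    """
    model = model.upper()
    if model not in RATIO_MODELS and model not in MIN_PLOTS:
        raise ValueError(f"no threshold reported for model type {model!r}")
    ok = True
    if model in RATIO_MODELS:
        ok = ok and n_plots >= PLOTS_PER_PREDICTOR * n_predictors
    if model in MIN_PLOTS:
        ok = ok and n_plots >= MIN_PLOTS[model]
    return ok


if __name__ == "__main__":
    # 10 predictors with OLS: 150 plots meets the 15-per-predictor ratio, 100 does not.
    print(meets_reported_thresholds("OLS", 150, 10))
    print(meets_reported_thresholds("OLS", 100, 10))
```

Passing this check does not rule out overfitting: the abstract notes that GPR overfit even with 500 training plots.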