Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data
When managers and researchers encounter a data set, they typically ask two key questions: (1) Which model (from a candidate set) should I use? And (2) if I use a particular model, when is it likely to work well for my business goal? This research addresses those two questions and provides a rule, i.e., a decision tree, for data analysts to portend the "winning model" before having to fit any of them.
Saved in:
Published in: | Marketing science (Providence, R.I.), 2014-03, Vol.33 (2), p.188-205 |
---|---|
Main authors: | Schwartz, Eric M.; Bradlow, Eric T.; Fader, Peter S. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 205 |
---|---|
container_issue | 2 |
container_start_page | 188 |
container_title | Marketing science (Providence, R.I.) |
container_volume | 33 |
creator | Schwartz, Eric M.; Bradlow, Eric T.; Fader, Peter S. |
description | When managers and researchers encounter a data set, they typically ask two key questions: (1) Which model (from a candidate set) should I use? And (2) if I use a particular model, when is it likely to work well for my business goal? This research addresses those two questions and provides a rule, i.e., a decision tree, for data analysts to portend the "winning model" before having to fit any of them for longitudinal incidence data. We characterize data sets based on managerially relevant (and easy-to-compute) summary statistics, and we use classification techniques from machine learning to provide a decision tree that recommends when to use which model. By doing the "legwork" of obtaining this decision tree for model selection, we provide a time-saving tool to analysts. We illustrate this method for a common marketing problem (i.e., forecasting repeat purchasing incidence for a cohort of new customers) and demonstrate the method's ability to discriminate among an integrated family of a hidden Markov model (HMM) and its constrained variants. We observe a strong ability for data set characteristics to guide the choice of the most appropriate model, and we observe that some model features (e.g., the "back-and-forth" migration between latent states) are more important to accommodate than are others (e.g., the inclusion of an "off" state with no activity). We also demonstrate the method's broad potential by providing a general "recipe" for researchers to replicate this kind of model classification task in other managerial contexts (outside of repeat purchasing incidence data and the HMM framework). |
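The core idea in the abstract can be sketched in code: train a classification tree that maps easy-to-compute summary statistics of a data set to the model expected to win. The sketch below is not the authors' code; the summary statistics, the three model labels, and the way labels are generated are all hypothetical, standing in for the simulated data sets and constrained HMM variants described above.

```python
# Minimal illustrative sketch (not the paper's implementation): learn a rule
# that recommends a model from data set summary statistics alone.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical summary statistics per simulated data set, e.g. mean purchases
# per customer, share of zero-activity customers, period-to-period correlation.
X = rng.random((200, 3))

# Hypothetical "winning model" labels among an HMM and constrained variants,
# here driven by the first statistic purely for illustration.
models = np.array(["full HMM", "no off state", "no back-and-forth"])
y = models[(X[:, 0] * 3).astype(int).clip(0, 2)]

# Fit a shallow classification tree so the resulting rule stays interpretable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Recommend a model for a new data set before fitting any candidate model.
new_stats = [[0.9, 0.1, 0.4]]
print(tree.predict(new_stats)[0])
```

In the paper's setting the labels would come from actually fitting every candidate model to many simulated data sets and recording which one wins on the forecasting criterion; the tree then lets an analyst skip that fitting step on new data.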
doi_str_mv | 10.1287/mksc.2013.0825 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0732-2399 |
ispartof | Marketing science (Providence, R.I.), 2014-03, Vol.33 (2), p.188-205 |
issn | 0732-2399 1526-548X |
language | eng |
recordid | cdi_proquest_journals_1511818040 |
source | INFORMS PubsOnLine; Business Source Complete; JSTOR Archive Collection A-Z Listing |
subjects | Ability; Analysis; Artificial intelligence; Business intelligence; Classification; classification tree; Customer relationship management; Customers; Data models; data science; Database models; Datasets; Decision; Decision making; Decision trees; Descriptive statistics; Forecasting; Forecasting models; hidden Markov models; hierarchical Bayesian methods; Information classification; Learning; Longitudinal studies; Machine learning; Marketing; Markov analysis; Markov processes; Markovian processes; model selection; posterior predictive model checking; Predictive modeling; Purchasing; random forest; Statistics; Studies |
title | Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T01%3A07%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Model%20Selection%20Using%20Database%20Characteristics:%20Developing%20a%20Classification%20Tree%20for%20Longitudinal%20Incidence%20Data&rft.jtitle=Marketing%20science%20(Providence,%20R.I.)&rft.au=Schwartz,%20Eric%20M.&rft.date=2014-03-01&rft.volume=33&rft.issue=2&rft.spage=188&rft.epage=205&rft.pages=188-205&rft.issn=0732-2399&rft.eissn=1526-548X&rft.coden=MARSE5&rft_id=info:doi/10.1287/mksc.2013.0825&rft_dat=%3Cgale_proqu%3EA363973690%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1511818040&rft_id=info:pmid/&rft_galeid=A363973690&rft_jstor_id=24544824&rfr_iscdi=true |