A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data

•Multi-parameter models were introduced recently to overcome the NB model limitations.
•We developed the negative binomial-Dirichlet process (NB-DP) model.
•The NB-DP was compared to the NB and NB-Lindley (NB-L) models.
•The NB-DP offers a better performance than the NB-L for heavy-tailed datasets.
•The...

Detailed description

Bibliographic details
Published in: Accident analysis and prevention 2016-06, Vol.91, p.10-18
Main authors: Shirazi, Mohammadali; Lord, Dominique; Dhavala, Soma Sekhar; Geedipally, Srinivas Reddy
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 18
container_issue
container_start_page 10
container_title Accident analysis and prevention
container_volume 91
creator Shirazi, Mohammadali
Lord, Dominique
Dhavala, Soma Sekhar
Geedipally, Srinivas Reddy
description •Multi-parameter models were introduced recently to overcome the NB model limitations. •We developed the negative binomial-Dirichlet process (NB-DP) model. •The NB-DP was compared to the NB and NB-Lindley (NB-L) models. •The NB-DP offers a better performance than the NB-L for heavy-tailed datasets. •The NB-DP can provide useful information about the characteristics of the data. Crash data can often be characterized by over-dispersion, a heavy (long) tail, and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel multi-parameter models to analyze such data. These multi-parameter models have been proposed to overcome the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work on multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) to model crash data. The objective was accomplished using two datasets. The new model was compared to the NB model and to the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the study shows that the NB-DP model performs better than the NB model when data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset has a heavy tail but a smaller percentage of zeros. However, both models performed similarly when the dataset contained a large number of zeros. In addition to greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion.
doi_str_mv 10.1016/j.aap.2016.02.020
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1808086443</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0001457516300537</els_id><sourcerecordid>1785243349</sourcerecordid><originalsourceid>FETCH-LOGICAL-c447t-cac3c5d11d46a88f4abc258a00a9ade409dfea23b3a809a7560d8214f40dd6113</originalsourceid><addsrcrecordid>eNqNkc-OEzEMxiMEYpeFB-CCcuQyJckkMxk4rSr-SStxgXPkJp421UwyJGnR8jw8KCktHAHFUmzp58-WP0Kec7bijHev9iuAZSVqumKiBntArrnuh0Yw1T8k14wx3kjVqyvyJOd9LXvdq8fkSnSDVLIX1-THLc04-wUSzFiStzTgFoo_It34EGcPE91iwAST_46OTj4gJDpHhxMd4yXzYUvjEVPjfF4w5QraeAiFOihAv_myo0B3CMd7WsBPr-l6Vwfagsnn4m2mEByFZZm8rbNjyLREahPk3S-Fp-TRCFPGZ5f_hnx59_bz-kNz9-n9x_XtXWOl7EtjwbZWOc6d7EDrUcLGCqWBMRjAoWSDGxFEu2lBswF61TGnBZejZM51nLc35OVZd0nx6wFzMbPPFqcJAsZDNlyz-jop23-jvVaicnL4H5QpNrT6tAA_ozbFnBOOZkl-hnRvODMny83eVMvNyXLDRA1We15c5A-bGd2fjt8eV-DNGcB6uqPHZLL1GCw6n9AW46L_i_xPviu--Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1780509381</pqid></control><display><type>article</type><title>A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Shirazi, Mohammadali ; Lord, Dominique ; Dhavala, Soma Sekhar ; Geedipally, Srinivas Reddy</creator><creatorcontrib>Shirazi, Mohammadali ; Lord, Dominique ; Dhavala, Soma Sekhar ; Geedipally, Srinivas Reddy</creatorcontrib><description>•Multi-parameter models were introduced recently to overcome the NB model limitations.•We developed the negative binomial-Dirichlet process (NB-DP) model.•The NB-DP was compared to the NB and NB-Lindley (NB-L) models.•The NB-DP offers a better performance than the NB-L for heavy-tailed datasets.•The NB-DP can provide useful information about the characteristics of the data. 
Crash data can often be characterized by over-dispersion, heavy (long) tail and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel and innovative multi-parameter models to analyze such data. These multi-parameter models have been proposed for overcoming the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work related to multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) to model crash data. The objective of the study was accomplished using two datasets. The new model was compared to the NB and the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the research study shows that the NB-DP model offers a better performance than the NB model once data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset has a heavy tail, but a smaller percentage of zeros. However, both models performed similarly when the dataset contained a large amount of zeros. 
In addition to a greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion.</description><identifier>ISSN: 0001-4575</identifier><identifier>EISSN: 1879-2057</identifier><identifier>DOI: 10.1016/j.aap.2016.02.020</identifier><identifier>PMID: 26945472</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Accidents, Traffic - statistics &amp; numerical data ; Binomials ; Byproducts ; Counting ; Crash data ; Crashes ; Dirichlet problem ; Dirichlet process ; Dispersions ; Flexibility ; Generalized linear model ; Humans ; Linear Models ; Models, Statistical ; Negative binomial ; Niobium ; Safety</subject><ispartof>Accident analysis and prevention, 2016-06, Vol.91, p.10-18</ispartof><rights>2016 Elsevier Ltd</rights><rights>Copyright © 2016 Elsevier Ltd. All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c447t-cac3c5d11d46a88f4abc258a00a9ade409dfea23b3a809a7560d8214f40dd6113</citedby><cites>FETCH-LOGICAL-c447t-cac3c5d11d46a88f4abc258a00a9ade409dfea23b3a809a7560d8214f40dd6113</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.aap.2016.02.020$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,777,781,3537,27905,27906,45976</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26945472$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Shirazi, Mohammadali</creatorcontrib><creatorcontrib>Lord, Dominique</creatorcontrib><creatorcontrib>Dhavala, Soma Sekhar</creatorcontrib><creatorcontrib>Geedipally, Srinivas Reddy</creatorcontrib><title>A semiparametric negative binomial 
generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data</title><title>Accident analysis and prevention</title><addtitle>Accid Anal Prev</addtitle><description>•Multi-parameter models were introduced recently to overcome the NB model limitations.•We developed the negative binomial-Dirichlet process (NB-DP) model.•The NB-DP was compared to the NB and NB-Lindley (NB-L) models.•The NB-DP offers a better performance than the NB-L for heavy-tailed datasets.•The NB-DP can provide useful information about the characteristics of the data. Crash data can often be characterized by over-dispersion, heavy (long) tail and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel and innovative multi-parameter models to analyze such data. These multi-parameter models have been proposed for overcoming the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work related to multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) to model crash data. The objective of the study was accomplished using two datasets. The new model was compared to the NB and the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the research study shows that the NB-DP model offers a better performance than the NB model once data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset has a heavy tail, but a smaller percentage of zeros. However, both models performed similarly when the dataset contained a large amount of zeros. 
In addition to a greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion.</description><subject>Accidents, Traffic - statistics &amp; numerical data</subject><subject>Binomials</subject><subject>Byproducts</subject><subject>Counting</subject><subject>Crash data</subject><subject>Crashes</subject><subject>Dirichlet problem</subject><subject>Dirichlet process</subject><subject>Dispersions</subject><subject>Flexibility</subject><subject>Generalized linear model</subject><subject>Humans</subject><subject>Linear Models</subject><subject>Models, Statistical</subject><subject>Negative binomial</subject><subject>Niobium</subject><subject>Safety</subject><issn>0001-4575</issn><issn>1879-2057</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkc-OEzEMxiMEYpeFB-CCcuQyJckkMxk4rSr-SStxgXPkJp421UwyJGnR8jw8KCktHAHFUmzp58-WP0Kec7bijHev9iuAZSVqumKiBntArrnuh0Yw1T8k14wx3kjVqyvyJOd9LXvdq8fkSnSDVLIX1-THLc04-wUSzFiStzTgFoo_It34EGcPE91iwAST_46OTj4gJDpHhxMd4yXzYUvjEVPjfF4w5QraeAiFOihAv_myo0B3CMd7WsBPr-l6Vwfagsnn4m2mEByFZZm8rbNjyLREahPk3S-Fp-TRCFPGZ5f_hnx59_bz-kNz9-n9x_XtXWOl7EtjwbZWOc6d7EDrUcLGCqWBMRjAoWSDGxFEu2lBswF61TGnBZejZM51nLc35OVZd0nx6wFzMbPPFqcJAsZDNlyz-jop23-jvVaicnL4H5QpNrT6tAA_ozbFnBOOZkl-hnRvODMny83eVMvNyXLDRA1We15c5A-bGd2fjt8eV-DNGcB6uqPHZLL1GCw6n9AW46L_i_xPviu--Q</recordid><startdate>20160601</startdate><enddate>20160601</enddate><creator>Shirazi, Mohammadali</creator><creator>Lord, Dominique</creator><creator>Dhavala, Soma Sekhar</creator><creator>Geedipally, Srinivas Reddy</creator><general>Elsevier 
Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7T2</scope><scope>7U2</scope><scope>C1K</scope><scope>7TB</scope><scope>8FD</scope><scope>FR3</scope><scope>KR7</scope></search><sort><creationdate>20160601</creationdate><title>A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data</title><author>Shirazi, Mohammadali ; Lord, Dominique ; Dhavala, Soma Sekhar ; Geedipally, Srinivas Reddy</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c447t-cac3c5d11d46a88f4abc258a00a9ade409dfea23b3a809a7560d8214f40dd6113</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Accidents, Traffic - statistics &amp; numerical data</topic><topic>Binomials</topic><topic>Byproducts</topic><topic>Counting</topic><topic>Crash data</topic><topic>Crashes</topic><topic>Dirichlet problem</topic><topic>Dirichlet process</topic><topic>Dispersions</topic><topic>Flexibility</topic><topic>Generalized linear model</topic><topic>Humans</topic><topic>Linear Models</topic><topic>Models, Statistical</topic><topic>Negative binomial</topic><topic>Niobium</topic><topic>Safety</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shirazi, Mohammadali</creatorcontrib><creatorcontrib>Lord, Dominique</creatorcontrib><creatorcontrib>Dhavala, Soma Sekhar</creatorcontrib><creatorcontrib>Geedipally, Srinivas Reddy</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Health and 
Safety Science Abstracts (Full archive)</collection><collection>Safety Science and Risk</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Civil Engineering Abstracts</collection><jtitle>Accident analysis and prevention</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shirazi, Mohammadali</au><au>Lord, Dominique</au><au>Dhavala, Soma Sekhar</au><au>Geedipally, Srinivas Reddy</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data</atitle><jtitle>Accident analysis and prevention</jtitle><addtitle>Accid Anal Prev</addtitle><date>2016-06-01</date><risdate>2016</risdate><volume>91</volume><spage>10</spage><epage>18</epage><pages>10-18</pages><issn>0001-4575</issn><eissn>1879-2057</eissn><abstract>•Multi-parameter models were introduced recently to overcome the NB model limitations.•We developed the negative binomial-Dirichlet process (NB-DP) model.•The NB-DP was compared to the NB and NB-Lindley (NB-L) models.•The NB-DP offers a better performance than the NB-L for heavy-tailed datasets.•The NB-DP can provide useful information about the characteristics of the data. Crash data can often be characterized by over-dispersion, heavy (long) tail and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel and innovative multi-parameter models to analyze such data. 
These multi-parameter models have been proposed for overcoming the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work related to multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) to model crash data. The objective of the study was accomplished using two datasets. The new model was compared to the NB and the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the research study shows that the NB-DP model offers a better performance than the NB model once data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset has a heavy tail, but a smaller percentage of zeros. However, both models performed similarly when the dataset contained a large amount of zeros. In addition to a greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>26945472</pmid><doi>10.1016/j.aap.2016.02.020</doi><tpages>9</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0001-4575
ispartof Accident analysis and prevention, 2016-06, Vol.91, p.10-18
issn 0001-4575
1879-2057
language eng
recordid cdi_proquest_miscellaneous_1808086443
source MEDLINE; Elsevier ScienceDirect Journals
subjects Accidents, Traffic - statistics & numerical data
Binomials
Byproducts
Counting
Crash data
Crashes
Dirichlet problem
Dirichlet process
Dispersions
Flexibility
Generalized linear model
Humans
Linear Models
Models, Statistical
Negative binomial
Niobium
Safety
title A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T02%3A46%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20semiparametric%20negative%20binomial%20generalized%20linear%20model%20for%20modeling%20over-dispersed%20count%20data%20with%20a%20heavy%20tail:%20Characteristics%20and%20applications%20to%20crash%20data&rft.jtitle=Accident%20analysis%20and%20prevention&rft.au=Shirazi,%20Mohammadali&rft.date=2016-06-01&rft.volume=91&rft.spage=10&rft.epage=18&rft.pages=10-18&rft.issn=0001-4575&rft.eissn=1879-2057&rft_id=info:doi/10.1016/j.aap.2016.02.020&rft_dat=%3Cproquest_cross%3E1785243349%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1780509381&rft_id=info:pmid/26945472&rft_els_id=S0001457516300537&rfr_iscdi=true