A Fast Lightweight 3D Separable Convolutional Neural Network With Multi-Input Multi-Output for Moving Object Detection

Advances in moving object detection have been driven by the active application of deep learning methods. However, many existing models render superior detection accuracy at the cost of high computational complexity and slow inference speed. This fact has hindered the development of such models in mo...

Bibliographic details
Published in: IEEE access 2021, Vol. 9, p. 148433-148448
Main authors: Hou, Bingxin; Liu, Ying; Ling, Nam; Liu, Lingzhi; Ren, Yongxiong
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 148448
container_issue
container_start_page 148433
container_title IEEE access
container_volume 9
creator Hou, Bingxin
Liu, Ying
Ling, Nam
Liu, Lingzhi
Ren, Yongxiong
description Advances in moving object detection have been driven by the active application of deep learning methods. However, many existing models achieve superior detection accuracy at the cost of high computational complexity and slow inference speed. This has hindered the development of such models for mobile and embedded vision tasks, which must be carried out in a timely fashion on a computationally limited platform. In this paper, we propose a super-fast (inference speed: 154 fps) and lightweight (model size: 1.45 MB) end-to-end 3D separable convolutional neural network with a multi-input multi-output (MIMO) strategy, named "3DS_MM", for moving object detection. To improve detection accuracy, the proposed model adopts 3D convolution, which is better suited than 2D convolution to extracting both spatial and temporal information from video data. To reduce model size and computational complexity, the standard 3D convolution is decomposed into depthwise and pointwise convolutions. In addition, we propose a MIMO strategy to increase inference speed: the network takes multiple frames as input and outputs detection results for multiple frames. Further, we conduct scene dependent evaluation (SDE) and scene independent evaluation (SIE) on the benchmark CDnet2014 and DAVIS2016 datasets. Compared to state-of-the-art approaches, our proposed method significantly increases inference speed and reduces model size, while achieving the highest detection accuracy in the SDE setup and maintaining competitive detection accuracy in the SIE setup.
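The decomposition mentioned in the description can be illustrated with a rough weight count. The sketch below is not taken from the paper: the function names and the layer sizes (32 input channels, 64 output channels, a 3 x 3 x 3 kernel) are assumptions chosen only for illustration, and bias terms are ignored. It compares a standard 3D convolution with a depthwise convolution followed by a pointwise (1 x 1 x 1) convolution.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (bias ignored)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise + pointwise pair (bias ignored).

    Depthwise: one k x k x k filter per input channel.
    Pointwise: a 1 x 1 x 1 convolution that mixes channels.
    """
    depthwise = c_in * k ** 3
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes for illustration only; not the 3DS_MM configuration.
C_IN, C_OUT, K = 32, 64, 3
standard = conv3d_params(C_IN, C_OUT, K)
separable = separable_conv3d_params(C_IN, C_OUT, K)
print(f"standard: {standard} weights, separable: {separable} weights")
print(f"reduction factor: {standard / separable:.1f}")
```

For these assumed sizes the separable pair needs roughly 19 times fewer weights, which is the kind of saving the description attributes to the decomposition. The actual ratio in 3DS_MM depends on its layer configuration, which this record does not give.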
doi_str_mv 10.1109/ACCESS.2021.3123975
format Article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_9592757</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9592757</ieee_id><doaj_id>oai_doaj_org_article_95b6bf8fc40a4a729d71a7693329d1e2</doaj_id><sourcerecordid>2595723280</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-168af8f33980c6da62712f844448ac6f09ddee3c33860f3e958970e5741528f33</originalsourceid><addsrcrecordid>eNpNUctOwzAQjBBIIOALuFjinOJH_TpW4VWp0ENBHC032RSXUBfHacXf45AKsQfveLUzI-1k2RXBI0KwvpkUxd1iMaKYkhEjlGnJj7IzSoTOGWfi-B8-zS7bdo1TqTTi8izbTdC9bSOaudV73EP_InaLFrC1wS4bQIXf7HzTRec3tkHP0IXfFvc-fKA3F9_RU9dEl0832y4e8LyL_af2AT35ndus0Hy5hjKiW4ipJamL7KS2TQuXh36evd7fvRSP-Wz-MC0ms7wcYxVzIpStVc2YVrgUlRVUElqrcSplS1FjXVUArGRMCVwz0FxpiYHLMeG0551n00G38nZttsF92vBtvHXmd-DDytgQXdmA0XwplsksOduxlVRXklgpNGMJEqBJ63rQ2gb_1UEbzdp3IV2lNZSnY1JGFU5bbNgqg2_bAPWfK8Gmz8sMeZk-L3PIK7GuBpYDgD-G5ppKLtkPupGQZA</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2595723280</pqid></control><display><type>article</type><title>A Fast Lightweight 3D Separable Convolutional Neural Network With Multi-Input Multi-Output for Moving Object Detection</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Hou, Bingxin ; Liu, Ying ; Ling, Nam ; Liu, Lingzhi ; Ren, Yongxiong</creator><creatorcontrib>Hou, Bingxin ; Liu, Ying ; Ling, Nam ; Liu, Lingzhi ; Ren, Yongxiong</creatorcontrib><description>Advances in moving object detection have been driven by the active application of deep learning methods. However, many existing models render superior detection accuracy at the cost of high computational complexity and slow inference speed. 
This fact has hindered the development of such models in mobile and embedded vision tasks, which need to be carried out in a timely fashion on a computationally limited platform. In this paper, we propose a super-fast (inference speed-154 fps) and lightweight (model size-1.45 MB) end-to-end 3D separable convolutional neural network with a multi-input multi-output (MIMO) strategy named "3DS_MM" for moving object detection. To improve detection accuracy, the proposed model adopts 3D convolution which is more suitable to extract both spatial and temporal information in video data than 2D convolution. To reduce model size and computational complexity, the standard 3D convolution is decomposed into depthwise and pointwise convolutions. Besides, we proposed a MIMO strategy to increase inference speed, which can take multiple frames as the network input and output multiple frames of detection results. Further, we conducted the scene dependent evaluation (SDE) and scene independent evaluation (SIE) on the benchmark CDnet2014 and DAVIS2016 datasets. 
Compared to state-of-the-art approaches, our proposed method significantly increases the inference speed, reduces the model size, meanwhile achieving the highest detection accuracy in the SDE setup and maintaining a competitive detection accuracy in the SIE setup.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3123975</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>3D separable convolution ; Accuracy ; Artificial neural networks ; Complexity ; Computational modeling ; Convolution ; Convolutional neural network ; depthwise convolution ; Frames (data processing) ; Inference ; Lightweight ; MIMO communication ; Model accuracy ; moving object detection ; Moving object recognition ; multi-input multi-output ; Neural networks ; Object detection ; pointwise convolution ; scene independent evaluation ; Solid modeling ; Strategy ; Task analysis ; Three dimensional models ; Three-dimensional displays ; Two dimensional models ; unseen videos ; video analytics ; Video data ; video surveillance</subject><ispartof>IEEE access, 2021, Vol.9, p.148433-148448</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-168af8f33980c6da62712f844448ac6f09ddee3c33860f3e958970e5741528f33</citedby><cites>FETCH-LOGICAL-c408t-168af8f33980c6da62712f844448ac6f09ddee3c33860f3e958970e5741528f33</cites><orcidid>0000-0002-9681-656X ; 0000-0002-5741-7937 ; 0000-0002-8596-5199 ; 0000-0003-3380-4243</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9592757$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Hou, Bingxin</creatorcontrib><creatorcontrib>Liu, Ying</creatorcontrib><creatorcontrib>Ling, Nam</creatorcontrib><creatorcontrib>Liu, Lingzhi</creatorcontrib><creatorcontrib>Ren, Yongxiong</creatorcontrib><title>A Fast Lightweight 3D Separable Convolutional Neural Network With Multi-Input Multi-Output for Moving Object Detection</title><title>IEEE access</title><addtitle>Access</addtitle><description>Advances in moving object detection have been driven by the active application of deep learning methods. However, many existing models render superior detection accuracy at the cost of high computational complexity and slow inference speed. This fact has hindered the development of such models in mobile and embedded vision tasks, which need to be carried out in a timely fashion on a computationally limited platform. In this paper, we propose a super-fast (inference speed-154 fps) and lightweight (model size-1.45 MB) end-to-end 3D separable convolutional neural network with a multi-input multi-output (MIMO) strategy named "3DS_MM" for moving object detection. 
To improve detection accuracy, the proposed model adopts 3D convolution which is more suitable to extract both spatial and temporal information in video data than 2D convolution. To reduce model size and computational complexity, the standard 3D convolution is decomposed into depthwise and pointwise convolutions. Besides, we proposed a MIMO strategy to increase inference speed, which can take multiple frames as the network input and output multiple frames of detection results. Further, we conducted the scene dependent evaluation (SDE) and scene independent evaluation (SIE) on the benchmark CDnet2014 and DAVIS2016 datasets. Compared to state-of-the-art approaches, our proposed method significantly increases the inference speed, reduces the model size, meanwhile achieving the highest detection accuracy in the SDE setup and maintaining a competitive detection accuracy in the SIE setup.</description><subject>3D separable convolution</subject><subject>Accuracy</subject><subject>Artificial neural networks</subject><subject>Complexity</subject><subject>Computational modeling</subject><subject>Convolution</subject><subject>Convolutional neural network</subject><subject>depthwise convolution</subject><subject>Frames (data processing)</subject><subject>Inference</subject><subject>Lightweight</subject><subject>MIMO communication</subject><subject>Model accuracy</subject><subject>moving object detection</subject><subject>Moving object recognition</subject><subject>multi-input multi-output</subject><subject>Neural networks</subject><subject>Object detection</subject><subject>pointwise convolution</subject><subject>scene independent evaluation</subject><subject>Solid modeling</subject><subject>Strategy</subject><subject>Task analysis</subject><subject>Three dimensional models</subject><subject>Three-dimensional displays</subject><subject>Two dimensional models</subject><subject>unseen videos</subject><subject>video analytics</subject><subject>Video data</subject><subject>video 
surveillance</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUctOwzAQjBBIIOALuFjinOJH_TpW4VWp0ENBHC032RSXUBfHacXf45AKsQfveLUzI-1k2RXBI0KwvpkUxd1iMaKYkhEjlGnJj7IzSoTOGWfi-B8-zS7bdo1TqTTi8izbTdC9bSOaudV73EP_InaLFrC1wS4bQIXf7HzTRec3tkHP0IXfFvc-fKA3F9_RU9dEl0832y4e8LyL_af2AT35ndus0Hy5hjKiW4ipJamL7KS2TQuXh36evd7fvRSP-Wz-MC0ms7wcYxVzIpStVc2YVrgUlRVUElqrcSplS1FjXVUArGRMCVwz0FxpiYHLMeG0551n00G38nZttsF92vBtvHXmd-DDytgQXdmA0XwplsksOduxlVRXklgpNGMJEqBJ63rQ2gb_1UEbzdp3IV2lNZSnY1JGFU5bbNgqg2_bAPWfK8Gmz8sMeZk-L3PIK7GuBpYDgD-G5ppKLtkPupGQZA</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Hou, Bingxin</creator><creator>Liu, Ying</creator><creator>Ling, Nam</creator><creator>Liu, Lingzhi</creator><creator>Ren, Yongxiong</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-9681-656X</orcidid><orcidid>https://orcid.org/0000-0002-5741-7937</orcidid><orcidid>https://orcid.org/0000-0002-8596-5199</orcidid><orcidid>https://orcid.org/0000-0003-3380-4243</orcidid></search><sort><creationdate>2021</creationdate><title>A Fast Lightweight 3D Separable Convolutional Neural Network With Multi-Input Multi-Output for Moving Object Detection</title><author>Hou, Bingxin ; Liu, Ying ; Ling, Nam ; Liu, Lingzhi ; Ren, Yongxiong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-168af8f33980c6da62712f844448ac6f09ddee3c33860f3e958970e5741528f33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>3D separable convolution</topic><topic>Accuracy</topic><topic>Artificial neural networks</topic><topic>Complexity</topic><topic>Computational modeling</topic><topic>Convolution</topic><topic>Convolutional neural network</topic><topic>depthwise convolution</topic><topic>Frames (data processing)</topic><topic>Inference</topic><topic>Lightweight</topic><topic>MIMO communication</topic><topic>Model accuracy</topic><topic>moving object detection</topic><topic>Moving object recognition</topic><topic>multi-input multi-output</topic><topic>Neural networks</topic><topic>Object detection</topic><topic>pointwise convolution</topic><topic>scene independent evaluation</topic><topic>Solid modeling</topic><topic>Strategy</topic><topic>Task analysis</topic><topic>Three dimensional models</topic><topic>Three-dimensional displays</topic><topic>Two dimensional models</topic><topic>unseen 
videos</topic><topic>video analytics</topic><topic>Video data</topic><topic>video surveillance</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hou, Bingxin</creatorcontrib><creatorcontrib>Liu, Ying</creatorcontrib><creatorcontrib>Ling, Nam</creatorcontrib><creatorcontrib>Liu, Lingzhi</creatorcontrib><creatorcontrib>Ren, Yongxiong</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hou, Bingxin</au><au>Liu, Ying</au><au>Ling, Nam</au><au>Liu, Lingzhi</au><au>Ren, Yongxiong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Fast Lightweight 3D Separable Convolutional Neural Network With Multi-Input Multi-Output for Moving Object Detection</atitle><jtitle>IEEE 
access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>148433</spage><epage>148448</epage><pages>148433-148448</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Advances in moving object detection have been driven by the active application of deep learning methods. However, many existing models render superior detection accuracy at the cost of high computational complexity and slow inference speed. This fact has hindered the development of such models in mobile and embedded vision tasks, which need to be carried out in a timely fashion on a computationally limited platform. In this paper, we propose a super-fast (inference speed-154 fps) and lightweight (model size-1.45 MB) end-to-end 3D separable convolutional neural network with a multi-input multi-output (MIMO) strategy named "3DS_MM" for moving object detection. To improve detection accuracy, the proposed model adopts 3D convolution which is more suitable to extract both spatial and temporal information in video data than 2D convolution. To reduce model size and computational complexity, the standard 3D convolution is decomposed into depthwise and pointwise convolutions. Besides, we proposed a MIMO strategy to increase inference speed, which can take multiple frames as the network input and output multiple frames of detection results. Further, we conducted the scene dependent evaluation (SDE) and scene independent evaluation (SIE) on the benchmark CDnet2014 and DAVIS2016 datasets. 
Compared to state-of-the-art approaches, our proposed method significantly increases the inference speed, reduces the model size, meanwhile achieving the highest detection accuracy in the SDE setup and maintaining a competitive detection accuracy in the SIE setup.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3123975</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-9681-656X</orcidid><orcidid>https://orcid.org/0000-0002-5741-7937</orcidid><orcidid>https://orcid.org/0000-0002-8596-5199</orcidid><orcidid>https://orcid.org/0000-0003-3380-4243</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2021, Vol.9, p.148433-148448
issn 2169-3536
2169-3536
language eng
recordid cdi_ieee_primary_9592757
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects 3D separable convolution
Accuracy
Artificial neural networks
Complexity
Computational modeling
Convolution
Convolutional neural network
depthwise convolution
Frames (data processing)
Inference
Lightweight
MIMO communication
Model accuracy
moving object detection
Moving object recognition
multi-input multi-output
Neural networks
Object detection
pointwise convolution
scene independent evaluation
Solid modeling
Strategy
Task analysis
Three dimensional models
Three-dimensional displays
Two dimensional models
unseen videos
video analytics
Video data
video surveillance
title A Fast Lightweight 3D Separable Convolutional Neural Network With Multi-Input Multi-Output for Moving Object Detection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T12%3A27%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Fast%20Lightweight%203D%20Separable%20Convolutional%20Neural%20Network%20With%20Multi-Input%20Multi-Output%20for%20Moving%20Object%20Detection&rft.jtitle=IEEE%20access&rft.au=Hou,%20Bingxin&rft.date=2021&rft.volume=9&rft.spage=148433&rft.epage=148448&rft.pages=148433-148448&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3123975&rft_dat=%3Cproquest_ieee_%3E2595723280%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2595723280&rft_id=info:pmid/&rft_ieee_id=9592757&rft_doaj_id=oai_doaj_org_article_95b6bf8fc40a4a729d71a7693329d1e2&rfr_iscdi=true