Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data
In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is very important for visual context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as user-generated venue photographs. Our goal is fine-grained venue discovery from heterogeneous social multimodal data.
Saved in:
Published in: | IEEE Transactions on Neural Networks and Learning Systems, 2019-04, Vol. 30 (4), p. 1250-1258 |
Main authors: | Yu, Yi; Tang, Suhua; Aizawa, Kiyoharu; Aizawa, Akiko |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 1258 |
container_issue | 4 |
container_start_page | 1250 |
container_title | IEEE Transactions on Neural Networks and Learning Systems |
container_volume | 30 |
creator | Yu, Yi; Tang, Suhua; Aizawa, Kiyoharu; Aizawa, Akiko |
description | In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is very important for visual context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as user-generated venue photographs. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, category-based deep canonical correlation analysis (C-DCCA). Given a photograph as input, this model performs: 1) exact venue search (find the venue where the photograph was taken) and 2) group venue search (find relevant venues that have the same category as the photograph), by the cross-modal correlation between the input photograph and the textual description of venues. In this model, data in different modalities are projected into the same space via deep networks. Pairwise correlation (between different-modality data from the same venue) for exact venue search and category-based correlation (between different-modality data from different venues with the same category) for group venue search are jointly optimized. Because a photograph cannot fully reflect the rich textual description of a venue, the number of photographs per venue in the training phase is increased to capture more aspects of a venue. We build a new venue-aware multimodal data set by integrating Wikipedia featured articles and Foursquare venue photographs. Experimental results on this data set confirm the feasibility of the proposed method. Moreover, the evaluation over another publicly available data set confirms that the proposed method outperforms the state of the art for cross-modal retrieval between image and text. |
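The paper's own training code is not reproduced here, but the abstract's central idea, maximizing the correlation between two modality projections in a shared space, rests on the classical CCA objective that deep CCA variants optimize over network outputs. The following sketch (function name, `eps` regularization, and the use of NumPy rather than a deep-learning framework are our own illustrative choices) computes the sum of canonical correlations between two already-projected feature matrices:

```python
import numpy as np

def total_canonical_correlation(X, Y, eps=1e-8):
    """Sum of canonical correlations between two views X, Y (n_samples x dim).

    In a deep CCA setting, X and Y would be the outputs of the image and
    text networks for paired samples; maximizing this quantity pulls the
    two modality projections into correlated directions of a shared space.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each view
    Yc = Y - Y.mean(axis=0)
    # Regularized covariance matrices (eps keeps the inverses well-defined)
    Sxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

    # Canonical correlations are the singular values of the whitened
    # cross-covariance T = Sxx^{-1/2} Sxy Syy^{-1/2}
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False).sum()
```

The C-DCCA model described above would combine two such terms, one over photo-text pairs from the same venue (pairwise correlation) and one over pairs from different venues sharing a category (category-based correlation), into a joint objective.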
doi_str_mv | 10.1109/TNNLS.2018.2856253 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2162-237X |
ispartof | IEEE Transactions on Neural Networks and Learning Systems, 2019-04, Vol.30 (4), p.1250-1258 |
issn | 2162-237X; 2162-2388 |
language | eng |
recordid | cdi_ieee_primary_8432497 |
source | IEEE Electronic Library (IEL) |
subjects | Business; Category-based deep canonical correlation analysis (C-DCCA); Correlation; Correlation analysis; Cross-modal; cross-modal retrieval; Data models; Datasets; Feasibility studies; Feature extraction; fine-grained venue discovery; Internet; Machine learning; multimodal data; Pairwise error probability; Searching; Sensory integration; Venue; Visualization |
title | Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T08%3A21%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Category-Based%20Deep%20CCA%20for%20Fine-Grained%20Venue%20Discovery%20From%20Multimodal%20Data&rft.jtitle=IEEE%20transaction%20on%20neural%20networks%20and%20learning%20systems&rft.au=Yu,%20Yi&rft.date=2019-04-01&rft.volume=30&rft.issue=4&rft.spage=1250&rft.epage=1258&rft.pages=1250-1258&rft.issn=2162-237X&rft.eissn=2162-2388&rft.coden=ITNNAL&rft_id=info:doi/10.1109/TNNLS.2018.2856253&rft_dat=%3Cproquest_RIE%3E2194171803%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2194171803&rft_id=info:pmid/30106743&rft_ieee_id=8432497&rfr_iscdi=true |