Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data

In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is important for visual context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as user-generated venue photographs. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, category-based deep canonical correlation analysis (C-DCCA). Given a photograph as input, this model performs: 1) exact venue search (finding the venue where the photograph was taken) and 2) group venue search (finding relevant venues that have the same category as the photograph), by exploiting the cross-modal correlation between the input photograph and the textual descriptions of venues. In this model, data in different modalities are projected into the same space via deep networks. Pairwise correlation (between data of different modalities from the same venue), used for exact venue search, and category-based correlation (between data of different modalities from different venues with the same category), used for group venue search, are jointly optimized. Because a single photograph cannot fully reflect the rich textual description of a venue, the number of photographs per venue in the training phase is increased to capture more aspects of each venue. We build a new venue-aware multimodal data set by integrating Wikipedia featured articles and Foursquare venue photographs. Experimental results on this data set confirm the feasibility of the proposed method. Moreover, an evaluation on another publicly available data set confirms that the proposed method outperforms the state of the art in cross-modal retrieval between images and text.
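To make the jointly optimized objective concrete, below is a minimal PyTorch sketch, not the authors' implementation: the paper optimizes CCA-style canonical correlations, which this sketch approximates with cosine similarities on unit-normalized embeddings, and all names (Encoder, c_dcca_loss, alpha) are illustrative assumptions.

```python
# Hypothetical sketch of the C-DCCA idea (not the authors' code).
# Two deep networks project image and text features into a shared space;
# a pairwise term aligns photo/description pairs from the same venue, and
# a category term aligns cross-modal pairs that share a venue category.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Projects one modality's features into the shared space."""
    def __init__(self, in_dim: int, shared_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, shared_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def c_dcca_loss(img_z, txt_z, categories, alpha=0.5):
    """img_z, txt_z: (B, d) embeddings of paired photo/description from
    the same venue; categories: (B,) integer venue category labels.
    Cosine similarity stands in for the CCA correlation objective."""
    # Pairwise term: correlation of same-venue cross-modal pairs.
    pairwise = (img_z * txt_z).sum(dim=-1).mean()
    # Category term: correlation of same-category, different-venue
    # cross-modal pairs (the diagonal holds the exact pairs, so mask it).
    sim = img_z @ txt_z.t()                              # (B, B) similarities
    same_cat = categories.unsqueeze(0) == categories.unsqueeze(1)
    same_cat.fill_diagonal_(False)
    category = sim[same_cat].mean() if same_cat.any() else sim.new_tensor(0.0)
    # Negate so that minimizing the loss maximizes both correlations jointly.
    return -(alpha * pairwise + (1 - alpha) * category)
```

Training would minimize this loss over minibatches of (photo, description, category) triplets; the paper's trick of using several photographs per venue during training is not modeled here.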

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2019-04, Vol. 30 (4), p. 1250-1258
Authors: Yu, Yi; Tang, Suhua; Aizawa, Kiyoharu; Aizawa, Akiko
Format: Article
Language: English
DOI: 10.1109/TNNLS.2018.2856253
PMID: 30106743
CODEN: ITNNAL
Publisher: IEEE (United States)
ISSN: 2162-237X
eISSN: 2162-2388
Source: IEEE Electronic Library (IEL)
Subjects:
Business
Category-based deep canonical correlation analysis (C-DCCA)
Correlation
Correlation analysis
Cross-modal
cross-modal retrieval
Data models
Datasets
Feasibility studies
Feature extraction
fine-grained venue discovery
Internet
Machine learning
multimodal data
Pairwise error probability
Searching
Sensory integration
Venue
Visualization