Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data
In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is very important for visual context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as user-generated venue photographs. Our goal is fine-grained venue discovery from heterogeneous social multimodal data.
Saved in:
Published in: | IEEE Transactions on Neural Networks and Learning Systems, 2019-04, Vol. 30 (4), p. 1250-1258 |
Main authors: | Yu, Yi; Tang, Suhua; Aizawa, Kiyoharu; Aizawa, Akiko |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 1258 |
container_issue | 4 |
container_start_page | 1250 |
container_title | IEEE Transactions on Neural Networks and Learning Systems |
container_volume | 30 |
creator | Yu, Yi; Tang, Suhua; Aizawa, Kiyoharu; Aizawa, Akiko |
description | In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is very important for visual context-aware applications. Unfortunately, few efforts have paid attention to complicated real images such as user-generated venue photographs. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, category-based deep canonical correlation analysis (C-DCCA). Given a photograph as input, this model performs: 1) exact venue search (find the venue where the photograph was taken) and 2) group venue search (find relevant venues that have the same category as the photograph), by the cross-modal correlation between the input photograph and the textual description of venues. In this model, data in different modalities are projected into the same space via deep networks. Pairwise correlation (between different-modality data from the same venue) for exact venue search and category-based correlation (between different-modality data from different venues with the same category) for group venue search are jointly optimized. Because a photograph cannot fully reflect the rich textual description of a venue, the number of photographs per venue in the training phase is increased to capture more aspects of a venue. We build a new venue-aware multimodal data set by integrating Wikipedia featured articles and Foursquare venue photographs. Experimental results on this data set confirm the feasibility of the proposed method. Moreover, the evaluation over another publicly available data set confirms that the proposed method outperforms the state of the art for cross-modal retrieval between image and text. |
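The paper's own training code is not reproduced here, but the abstract's central idea, maximizing the correlation between two modality projections in a shared space, rests on the classical CCA objective that deep CCA variants optimize over network outputs. The following sketch (function name, `eps` regularization, and the use of NumPy rather than a deep-learning framework are our own illustrative choices) computes the sum of canonical correlations between two already-projected feature matrices:

```python
import numpy as np

def total_canonical_correlation(X, Y, eps=1e-8):
    """Sum of canonical correlations between two views X, Y (n_samples x dim).

    In a deep CCA setting, X and Y would be the outputs of the image and
    text networks for paired samples; maximizing this quantity pulls the
    two modality projections into correlated directions of a shared space.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each view
    Yc = Y - Y.mean(axis=0)
    # Regularized covariance matrices (eps keeps the inverses well-defined)
    Sxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

    # Canonical correlations are the singular values of the whitened
    # cross-covariance T = Sxx^{-1/2} Sxy Syy^{-1/2}
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False).sum()
```

The C-DCCA model described above would combine two such terms, one over photo-text pairs from the same venue (pairwise correlation) and one over pairs from different venues sharing a category (category-based correlation), into a joint objective.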
doi_str_mv | 10.1109/TNNLS.2018.2856253 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2162-237X |
ispartof | IEEE Transactions on Neural Networks and Learning Systems, 2019-04, Vol.30 (4), p.1250-1258 |
issn | 2162-237X; 2162-2388 |
language | eng |
recordid | cdi_ieee_primary_8432497 |
source | IEEE Electronic Library (IEL) |
subjects | Business; Category-based deep canonical correlation analysis (C-DCCA); Correlation; Correlation analysis; Cross-modal; cross-modal retrieval; Data models; Datasets; Feasibility studies; Feature extraction; fine-grained venue discovery; Internet; Machine learning; multimodal data; Pairwise error probability; Searching; Sensory integration; Venue; Visualization |
title | Category-Based Deep CCA for Fine-Grained Venue Discovery From Multimodal Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T08%3A21%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Category-Based%20Deep%20CCA%20for%20Fine-Grained%20Venue%20Discovery%20From%20Multimodal%20Data&rft.jtitle=IEEE%20transaction%20on%20neural%20networks%20and%20learning%20systems&rft.au=Yu,%20Yi&rft.date=2019-04-01&rft.volume=30&rft.issue=4&rft.spage=1250&rft.epage=1258&rft.pages=1250-1258&rft.issn=2162-237X&rft.eissn=2162-2388&rft.coden=ITNNAL&rft_id=info:doi/10.1109/TNNLS.2018.2856253&rft_dat=%3Cproquest_RIE%3E2194171803%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2194171803&rft_id=info:pmid/30106743&rft_ieee_id=8432497&rfr_iscdi=true |