A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective
Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), is a measure for examining the similarity (or dissimilarity) between two sample data objects. It is defined as the proportion of the intersection size to the union size of the two data samples. It provides a...
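The verbal definition above can be written compactly as follows. The symbols $A$ and $B$ (the two data samples, viewed as sets) and $d_J$ are notation assumed here for illustration, not taken from the record.

```latex
J(A, B) \;=\; \frac{|A \cap B|}{|A \cup B|}
        \;=\; \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad 0 \le J(A, B) \le 1,
\qquad d_J(A, B) \;=\; 1 - J(A, B)
```

The ratio is undefined when both samples are empty; implementations commonly fix a convention for that case.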
Published in: | Social network analysis and mining 2020-12, Vol.10 (1), p.43, Article 43 |
---|---|
Main authors: | Verma, Vijay ; Aggarwal, Rajesh Kumar |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 1 |
container_start_page | 43 |
container_title | Social network analysis and mining |
container_volume | 10 |
creator | Verma, Vijay ; Aggarwal, Rajesh Kumar |
description | Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), is a measure for examining the similarity (or dissimilarity) between two sample data objects. It is defined as the proportion of the intersection size to the union size of the two data samples. It provides a very simple and intuitive measure of similarity between data samples. This research examines the measures that are akin to the Jaccard index and may be used for modelling affinity between users (or items) in collaborative recommendations. Particularly, the measures such as simple matching coefficient (SMC), Sorensen–Dice coefficient (SDC), Salton’s cosine index (SCI), and overlap coefficient (OLC) are compared and analysed in both theoretical and empirical perspectives with respect to the Jaccard index. Since these measures apprehend only the structural similarity information (overlapping information) between the data samples, these are very useful in situations where only the associations between users and items are available such as browsing or buying behaviours of the users on an e-commerce portal (i.e. unary rating data, a special case of ratings). Furthermore, a theoretical relation among these measures has been established. We have also derived an equivalent expression for each of these measures so that it can be directly applied for binary data samples in data mining/machine learning jargon. In order to compare and validate the effectiveness of these structural similarity measures, several experiments have been conducted using standardized benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). Empirically obtained results demonstrate that the Salton’s cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) for large datasets, whereas the overlap coefficient (OLC) results in more accurate recommendations for small datasets. |
doi_str_mv | 10.1007/s13278-020-00660-9 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2920667773</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2920667773</sourcerecordid><originalsourceid>FETCH-LOGICAL-c443t-21716d624901da34e035c791651ae968dfd49029605049dd083c0907561aa91a3</originalsourceid><addsrcrecordid>eNp9kMtOwzAQRSMEEhX0B1hZYh0Yx4lTs6sqnqrEBtbWYE_BJYmDnSL6Ffwy7kOwY2N75HuP5ZNlZxwuOEB9Gbko6kkOBeQAUkKuDrIRn0iVV6VUh7_nCo6zcYxLAOAghAI5yr6nzPi2x4CD-ySGHTbr6CLzCxZd6xoMblizljCuAkWG765jg2fDG7EHNAaDZa6z9JXWBGoafPF7VKAEbqmzafRdvGLU9i44g016xW4IPtCwnXsKsSezqZ1mRwtsIo33-0n2fHP9NLvL54-397PpPDdlKYa84DWXVhalAm5RlASiMrXisuJISk7swqarQkmooFTWwkQYUFBXkiMqjuIkO99x--A_VhQHvfSrkH4fdaGKZLGua5FSxS5lgo8x0EL3wbUY1pqD3rjXO_c6uddb91qlktiVYgp3rxT-0P-0fgCXMoil</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2920667773</pqid></control><display><type>article</type><title>A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective</title><source>Springer Nature - Complete Springer Journals</source><source>ProQuest Central</source><creator>Verma, Vijay ; Aggarwal, Rajesh Kumar</creator><creatorcontrib>Verma, Vijay ; Aggarwal, Rajesh Kumar</creatorcontrib><description>Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), is a measure for examining the similarity (or dissimilarity) between two sample data objects. It is defined as the proportion of the intersection size to the union size of the two data samples. It provides a very simple and intuitive measure of similarity between data samples. This research examines the measures that are akin to the Jaccard index and may be used for modelling affinity between users (or items) in collaborative recommendations. 
Particularly, the measures such as simple matching coefficient (SMC), Sorensen–Dice coefficient (SDC), Salton’s cosine index (SCI), and overlap coefficient (OLC) are compared and analysed in both theoretical and empirical perspectives with respect to the Jaccard index. Since these measures apprehend only the structural similarity information (overlapping information) between the data samples, these are very useful in situations where only the associations between users and items are available such as browsing or buying behaviours of the users on an e-commerce portal (i.e. unary rating data, a special case of ratings). Furthermore, a theoretical relation among these measures has been established. We have also derived an equivalent expression for each of these measures so that it can be directly applied for binary data samples in data mining/machine learning jargon. In order to compare and validate the effectiveness of these structural similarity measures, several experiments have been conducted using standardized benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). 
Empirically obtained results demonstrate that the Salton’s cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) for large datasets, whereas the overlap coefficient (OLC) results in more accurate recommendations for small datasets.</description><identifier>ISSN: 1869-5450</identifier><identifier>EISSN: 1869-5469</identifier><identifier>DOI: 10.1007/s13278-020-00660-9</identifier><language>eng</language><publisher>Vienna: Springer Vienna</publisher><subject>Affinity ; Algorithms ; Applications of Graph Theory and Complex Networks ; Binary data ; Coefficients ; Collaboration ; Comparative analysis ; Computer Science ; Data mining ; Data Mining and Knowledge Discovery ; Datasets ; Economics ; Empirical analysis ; Game Theory ; Humanities ; Information overload ; Law ; Linear algebra ; Machine learning ; Methodology of the Social Sciences ; Music ; Neighborhoods ; Original Article ; Ratings & rankings ; Recommender systems ; Similarity ; Similarity measures ; Social and Behav. 
Sciences ; Statistics for Social Sciences</subject><ispartof>Social network analysis and mining, 2020-12, Vol.10 (1), p.43, Article 43</ispartof><rights>Springer-Verlag GmbH Austria, part of Springer Nature 2020</rights><rights>Springer-Verlag GmbH Austria, part of Springer Nature 2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c443t-21716d624901da34e035c791651ae968dfd49029605049dd083c0907561aa91a3</citedby><cites>FETCH-LOGICAL-c443t-21716d624901da34e035c791651ae968dfd49029605049dd083c0907561aa91a3</cites><orcidid>0000-0002-1186-3974</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s13278-020-00660-9$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2920667773?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,21367,27901,27902,33721,41464,42533,43781,51294</link.rule.ids></links><search><creatorcontrib>Verma, Vijay</creatorcontrib><creatorcontrib>Aggarwal, Rajesh Kumar</creatorcontrib><title>A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective</title><title>Social network analysis and mining</title><addtitle>Soc. Netw. Anal. Min</addtitle><description>Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), is a measure for examining the similarity (or dissimilarity) between two sample data objects. It is defined as the proportion of the intersection size to the union size of the two data samples. It provides a very simple and intuitive measure of similarity between data samples. 
This research examines the measures that are akin to the Jaccard index and may be used for modelling affinity between users (or items) in collaborative recommendations. Particularly, the measures such as simple matching coefficient (SMC), Sorensen–Dice coefficient (SDC), Salton’s cosine index (SCI), and overlap coefficient (OLC) are compared and analysed in both theoretical and empirical perspectives with respect to the Jaccard index. Since these measures apprehend only the structural similarity information (overlapping information) between the data samples, these are very useful in situations where only the associations between users and items are available such as browsing or buying behaviours of the users on an e-commerce portal (i.e. unary rating data, a special case of ratings). Furthermore, a theoretical relation among these measures has been established. We have also derived an equivalent expression for each of these measures so that it can be directly applied for binary data samples in data mining/machine learning jargon. In order to compare and validate the effectiveness of these structural similarity measures, several experiments have been conducted using standardized benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). 
Empirically obtained results demonstrate that the Salton’s cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) for large datasets, whereas the overlap coefficient (OLC) results in more accurate recommendations for small datasets.</description><subject>Affinity</subject><subject>Algorithms</subject><subject>Applications of Graph Theory and Complex Networks</subject><subject>Binary data</subject><subject>Coefficients</subject><subject>Collaboration</subject><subject>Comparative analysis</subject><subject>Computer Science</subject><subject>Data mining</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Datasets</subject><subject>Economics</subject><subject>Empirical analysis</subject><subject>Game Theory</subject><subject>Humanities</subject><subject>Information overload</subject><subject>Law</subject><subject>Linear algebra</subject><subject>Machine learning</subject><subject>Methodology of the Social Sciences</subject><subject>Music</subject><subject>Neighborhoods</subject><subject>Original Article</subject><subject>Ratings & rankings</subject><subject>Recommender systems</subject><subject>Similarity</subject><subject>Similarity measures</subject><subject>Social and Behav. 
Sciences</subject><subject>Statistics for Social Sciences</subject><issn>1869-5450</issn><issn>1869-5469</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kMtOwzAQRSMEEhX0B1hZYh0Yx4lTs6sqnqrEBtbWYE_BJYmDnSL6Ffwy7kOwY2N75HuP5ZNlZxwuOEB9Gbko6kkOBeQAUkKuDrIRn0iVV6VUh7_nCo6zcYxLAOAghAI5yr6nzPi2x4CD-ySGHTbr6CLzCxZd6xoMblizljCuAkWG765jg2fDG7EHNAaDZa6z9JXWBGoafPF7VKAEbqmzafRdvGLU9i44g016xW4IPtCwnXsKsSezqZ1mRwtsIo33-0n2fHP9NLvL54-397PpPDdlKYa84DWXVhalAm5RlASiMrXisuJISk7swqarQkmooFTWwkQYUFBXkiMqjuIkO99x--A_VhQHvfSrkH4fdaGKZLGua5FSxS5lgo8x0EL3wbUY1pqD3rjXO_c6uddb91qlktiVYgp3rxT-0P-0fgCXMoil</recordid><startdate>20201201</startdate><enddate>20201201</enddate><creator>Verma, Vijay</creator><creator>Aggarwal, Rajesh Kumar</creator><general>Springer Vienna</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>3V.</scope><scope>7XB</scope><scope>88J</scope><scope>8BJ</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FQK</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JBE</scope><scope>JQ2</scope><scope>K7-</scope><scope>M2R</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0002-1186-3974</orcidid></search><sort><creationdate>20201201</creationdate><title>A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective</title><author>Verma, Vijay ; Aggarwal, Rajesh 
Kumar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c443t-21716d624901da34e035c791651ae968dfd49029605049dd083c0907561aa91a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Affinity</topic><topic>Algorithms</topic><topic>Applications of Graph Theory and Complex Networks</topic><topic>Binary data</topic><topic>Coefficients</topic><topic>Collaboration</topic><topic>Comparative analysis</topic><topic>Computer Science</topic><topic>Data mining</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Datasets</topic><topic>Economics</topic><topic>Empirical analysis</topic><topic>Game Theory</topic><topic>Humanities</topic><topic>Information overload</topic><topic>Law</topic><topic>Linear algebra</topic><topic>Machine learning</topic><topic>Methodology of the Social Sciences</topic><topic>Music</topic><topic>Neighborhoods</topic><topic>Original Article</topic><topic>Ratings & rankings</topic><topic>Recommender systems</topic><topic>Similarity</topic><topic>Similarity measures</topic><topic>Social and Behav. 
Sciences</topic><topic>Statistics for Social Sciences</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Verma, Vijay</creatorcontrib><creatorcontrib>Aggarwal, Rajesh Kumar</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Social Science Database (Alumni Edition)</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Social Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI 
Edition</collection><collection>ProQuest Central Basic</collection><jtitle>Social network analysis and mining</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Verma, Vijay</au><au>Aggarwal, Rajesh Kumar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective</atitle><jtitle>Social network analysis and mining</jtitle><stitle>Soc. Netw. Anal. Min</stitle><date>2020-12-01</date><risdate>2020</risdate><volume>10</volume><issue>1</issue><spage>43</spage><pages>43-</pages><artnum>43</artnum><issn>1869-5450</issn><eissn>1869-5469</eissn><abstract>Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), is a measure for examining the similarity (or dissimilarity) between two sample data objects. It is defined as the proportion of the intersection size to the union size of the two data samples. It provides a very simple and intuitive measure of similarity between data samples. This research examines the measures that are akin to the Jaccard index and may be used for modelling affinity between users (or items) in collaborative recommendations. Particularly, the measures such as simple matching coefficient (SMC), Sorensen–Dice coefficient (SDC), Salton’s cosine index (SCI), and overlap coefficient (OLC) are compared and analysed in both theoretical and empirical perspectives with respect to the Jaccard index. Since these measures apprehend only the structural similarity information (overlapping information) between the data samples, these are very useful in situations where only the associations between users and items are available such as browsing or buying behaviours of the users on an e-commerce portal (i.e. unary rating data, a special case of ratings). 
Furthermore, a theoretical relation among these measures has been established. We have also derived an equivalent expression for each of these measures so that it can be directly applied for binary data samples in data mining/machine learning jargon. In order to compare and validate the effectiveness of these structural similarity measures, several experiments have been conducted using standardized benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). Empirically obtained results demonstrate that the Salton’s cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) for large datasets, whereas the overlap coefficient (OLC) results in more accurate recommendations for small datasets.</abstract><cop>Vienna</cop><pub>Springer Vienna</pub><doi>10.1007/s13278-020-00660-9</doi><orcidid>https://orcid.org/0000-0002-1186-3974</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1869-5450 |
ispartof | Social network analysis and mining, 2020-12, Vol.10 (1), p.43, Article 43 |
issn | 1869-5450 ; 1869-5469 |
language | eng |
recordid | cdi_proquest_journals_2920667773 |
source | Springer Nature - Complete Springer Journals; ProQuest Central |
subjects | Affinity ; Algorithms ; Applications of Graph Theory and Complex Networks ; Binary data ; Coefficients ; Collaboration ; Comparative analysis ; Computer Science ; Data mining ; Data Mining and Knowledge Discovery ; Datasets ; Economics ; Empirical analysis ; Game Theory ; Humanities ; Information overload ; Law ; Linear algebra ; Machine learning ; Methodology of the Social Sciences ; Music ; Neighborhoods ; Original Article ; Ratings & rankings ; Recommender systems ; Similarity ; Similarity measures ; Social and Behav. Sciences ; Statistics for Social Sciences |
title | A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T11%3A50%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20comparative%20analysis%20of%20similarity%20measures%20akin%20to%20the%20Jaccard%20index%20in%20collaborative%20recommendations:%20empirical%20and%20theoretical%20perspective&rft.jtitle=Social%20network%20analysis%20and%20mining&rft.au=Verma,%20Vijay&rft.date=2020-12-01&rft.volume=10&rft.issue=1&rft.spage=43&rft.pages=43-&rft.artnum=43&rft.issn=1869-5450&rft.eissn=1869-5469&rft_id=info:doi/10.1007/s13278-020-00660-9&rft_dat=%3Cproquest_cross%3E2920667773%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2920667773&rft_id=info:pmid/&rfr_iscdi=true |
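
The five structural similarity measures named in the description can be sketched for unary (set-valued) data as shown here. This is a minimal illustration using the standard textbook set forms, not the authors' code and not necessarily the equivalent expressions derived in the article. The function names, the item identifiers, and the `catalog` universe are hypothetical, and returning 0.0 for empty inputs is an assumed convention.

```python
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sorensen_dice(a: set, b: set) -> float:
    """Sorensen-Dice coefficient (SDC): 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def salton_cosine(a: set, b: set) -> float:
    """Salton's cosine index (SCI): |A ∩ B| / sqrt(|A| * |B|)."""
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def overlap(a: set, b: set) -> float:
    """Overlap coefficient (OLC): |A ∩ B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def simple_matching(a: set, b: set, universe: set) -> float:
    """Simple matching coefficient (SMC): share of items in the universe
    on which both samples agree (both contain it, or both lack it)."""
    if not universe:
        return 0.0
    both = len(a & b)
    neither = len(universe - (a | b))
    return (both + neither) / len(universe)


# Hypothetical unary data: the items each user has browsed or bought.
catalog = {"i1", "i2", "i3", "i4", "i5", "i6"}
alice = {"i1", "i2", "i3"}
bob = {"i2", "i3", "i4", "i5"}

for name, value in [
    ("Jaccard", jaccard(alice, bob)),
    ("SDC", sorensen_dice(alice, bob)),
    ("SCI", salton_cosine(alice, bob)),
    ("OLC", overlap(alice, bob)),
    ("SMC", simple_matching(alice, bob, catalog)),
]:
    print(f"{name}: {value:.4f}")
```

Only SMC needs the full item universe, because it also counts items that neither user has interacted with; the other four depend only on the two sets. For any pair of non-empty sets, the set forms above satisfy Jaccard ≤ SDC ≤ SCI ≤ OLC, a standard property that follows from the inequalities between the minimum, geometric mean, and arithmetic mean of the two set sizes.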