A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective

The Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), is a measure of the similarity (or dissimilarity) between two sample data objects. It is defined as the ratio of the intersection size to the union size of the two data samples. It provides a...

Detailed description

Bibliographic details
Published in:	Social network analysis and mining, 2020-12, Vol.10 (1), p.43, Article 43
Main authors:	Verma, Vijay; Aggarwal, Rajesh Kumar
Format:	Article
Language:	eng
Subjects:	Affinity; Algorithms; Applications of Graph Theory and Complex Networks; Binary data; Coefficients; Collaboration; Comparative analysis; Computer Science; Data mining; Data Mining and Knowledge Discovery; Datasets; Economics; Empirical analysis; Game Theory; Humanities; Information overload; Law; Linear algebra; Machine learning; Methodology of the Social Sciences; Music; Neighborhoods; Original Article; Ratings & rankings; Recommender systems; Similarity; Similarity measures; Social and Behav. Sciences; Statistics for Social Sciences
Online access:	Full text

container_end_page
container_issue	1
container_start_page	43
container_title	Social network analysis and mining
container_volume	10
creator	Verma, Vijay; Aggarwal, Rajesh Kumar
description	The Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), is a measure of the similarity (or dissimilarity) between two sample data objects. It is defined as the ratio of the intersection size to the union size of the two data samples, which makes it a simple and intuitive measure of similarity. This research examines measures that are akin to the Jaccard index and may be used for modelling affinity between users (or items) in collaborative recommendations. In particular, the simple matching coefficient (SMC), Sorensen–Dice coefficient (SDC), Salton’s cosine index (SCI), and overlap coefficient (OLC) are compared with the Jaccard index and analysed from both theoretical and empirical perspectives. Since these measures capture only the structural similarity (overlap) between data samples, they are very useful where only the associations between users and items are available, such as the browsing or buying behaviour of users on an e-commerce portal (i.e. unary rating data, a special case of ratings). Furthermore, a theoretical relation among these measures is established. An equivalent expression is also derived for each measure so that it can be applied directly to binary data samples, as they are described in data mining and machine learning. To compare and validate the effectiveness of these structural similarity measures, several experiments were conducted using standardized benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). The empirical results show that Salton’s cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) for large datasets, whereas the overlap coefficient (OLC) yields more accurate recommendations for small datasets.
doi_str_mv	10.1007/s13278-020-00660-9
format	Article
addtitle	Soc. Netw. Anal. Min
date	2020-12-01
eissn	1869-5469
linktohtml	https://www.proquest.com/docview/2920667773?pq-origsite=primo
linktopdf	https://link.springer.com/content/pdf/10.1007/s13278-020-00660-9
orcid	https://orcid.org/0000-0002-1186-3974
publisher	Vienna: Springer Vienna
rights	Springer-Verlag GmbH Austria, part of Springer Nature 2020
toplevel	peer_reviewed; online_resources
fulltext	fulltext
identifier	ISSN: 1869-5450
ispartof	Social network analysis and mining, 2020-12, Vol.10 (1), p.43, Article 43
issn	1869-5450; 1869-5469
language	eng
recordid	cdi_proquest_journals_2920667773
source	Springer Nature - Complete Springer Journals; ProQuest Central
subjects	Affinity; Algorithms; Applications of Graph Theory and Complex Networks; Binary data; Coefficients; Collaboration; Comparative analysis; Computer Science; Data mining; Data Mining and Knowledge Discovery; Datasets; Economics; Empirical analysis; Game Theory; Humanities; Information overload; Law; Linear algebra; Machine learning; Methodology of the Social Sciences; Music; Neighborhoods; Original Article; Ratings & rankings; Recommender systems; Similarity; Similarity measures; Social and Behav. Sciences; Statistics for Social Sciences
title	A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T11%3A50%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20comparative%20analysis%20of%20similarity%20measures%20akin%20to%20the%20Jaccard%20index%20in%20collaborative%20recommendations:%20empirical%20and%20theoretical%20perspective&rft.jtitle=Social%20network%20analysis%20and%20mining&rft.au=Verma,%20Vijay&rft.date=2020-12-01&rft.volume=10&rft.issue=1&rft.spage=43&rft.pages=43-&rft.artnum=43&rft.issn=1869-5450&rft.eissn=1869-5469&rft_id=info:doi/10.1007/s13278-020-00660-9&rft_dat=%3Cproquest_cross%3E2920667773%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2920667773&rft_id=info:pmid/&rfr_iscdi=true
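
The abstract defines the Jaccard index as intersection size over union size and names four related measures (SMC, SDC, SCI, OLC). As a rough illustration only, the sketch below computes all five from two item sets using their standard textbook definitions; these may differ in notation from the expressions derived in the article. The function name `similarity_measures`, the item identifiers, and the `n_items` parameter (size of the item universe, which SMC needs in order to count items that neither user has) are assumptions made for this example and do not come from the record.

```python
from math import sqrt


def similarity_measures(a: set, b: set, n_items: int) -> dict:
    """Textbook overlap-based similarities between two item sets.

    a, b    -- items associated with each user (hypothetical unary data)
    n_items -- assumed size of the item universe; only SMC uses it
    """
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    inter = len(a & b)          # items both users have
    union = len(a | b)          # items at least one user has
    if union > n_items:
        raise ValueError("n_items is smaller than the observed item count")
    return {
        # Jaccard: shared items over all items seen by either user
        "jaccard": inter / union,
        # SMC: also counts items that neither user has as agreement
        "smc": (inter + (n_items - union)) / n_items,
        # Sorensen-Dice: shared items over the arithmetic mean of the set sizes
        "sdc": 2 * inter / (len(a) + len(b)),
        # Salton's cosine: shared items over the geometric mean of the set sizes
        "sci": inter / sqrt(len(a) * len(b)),
        # Overlap: shared items over the smaller set size
        "olc": inter / min(len(a), len(b)),
    }


# Hypothetical users with unary (bought / browsed) data over 10 items.
u = {"i1", "i2", "i3", "i4"}
v = {"i3", "i4", "i5"}
scores = similarity_measures(u, v, n_items=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

For these two sets the intersection has 2 items and the union has 5, so the Jaccard index is 0.4. With these definitions the values always satisfy Jaccard ≤ SDC ≤ SCI ≤ OLC, because the denominators shrink from the union size to the arithmetic mean, the geometric mean, and finally the minimum of the two set sizes. This ordering is a general property of the textbook formulas and is not necessarily the theoretical relation the article establishes.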