Improving the visibility of library resources via mapping library subject headings to Wikipedia articles

Purpose Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this pa...

Detailed description

Bibliographic details
Published in: Library hi tech 2018-02, Vol.36 (1), p.57-74
Main authors: Joorabchi, Arash; Mahdi, Abdulhussain E
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 74
container_issue 1
container_start_page 57
container_title Library hi tech
container_volume 36
creator Joorabchi, Arash
Mahdi, Abdulhussain E
description Purpose: Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this paper is to describe the design and development of a software system for automatic mapping of FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia. Design/methodology/approach: The proposed system works by first detecting all the candidate Wikipedia concepts (articles) occurring in the titles of the books and other library materials which are indexed with a given FAST subject heading. This is then followed by training and deploying a machine learning (ML) algorithm designed to automatically identify those concepts that correspond to the FAST heading. Specifically, the ML algorithm used is a binary classifier which classifies the candidate concepts into either “corresponding” or “non-corresponding” categories. The classifier is trained to learn the characteristics of those candidates which have the highest probability of belonging to the “corresponding” category based on a set of 14 positional, statistical, and semantic features. Findings: The authors have assessed the performance of the developed system using standard information retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed system is capable of achieving F-scores as high as 0.65 and 0.99 in the corresponding and non-corresponding categories, respectively. Research limitations/implications: The size of the data set used to evaluate the performance of the system is rather small. However, the authors believe that the developed data set is large enough to demonstrate the feasibility and scalability of the proposed approach. Practical implications: The sheer size of English Wikipedia makes the manual mapping of Wikipedia articles to library subject headings a very labor-intensive and time-consuming task. Therefore, the aim is to reduce the cost of such mapping and integration. Social implications: The proposed mapping paves the way for connecting libraries and Wikipedia as two major silos of knowledge, and enables the bi-directional movement of users between the two. Originality/value: To the best of the authors’ kno
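The abstract reports precision, recall, and F-score separately for the "corresponding" and "non-corresponding" categories. As a minimal sketch of how such per-category scores are usually computed for a binary classifier, the Python below works through a tiny example. The function name `precision_recall_f1` and the toy label lists are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' actual evaluation code and data are not shown in this record.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str], predicted: Sequence[str], label: str
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one category of a classifier."""
    pairs = list(zip(gold, predicted))
    # True positives: items of this category that were predicted as such.
    tp = sum(1 for g, p in pairs if g == label and p == label)
    # False positives: items predicted as this category that belong elsewhere.
    fp = sum(1 for g, p in pairs if g != label and p == label)
    # False negatives: items of this category that were predicted as another.
    fn = sum(1 for g, p in pairs if g == label and p != label)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy data: manual (gold) labels and classifier output for six
# candidate Wikipedia concepts. These are NOT results from the paper.
gold = [
    "corresponding", "non-corresponding", "non-corresponding",
    "corresponding", "non-corresponding", "non-corresponding",
]
predicted = [
    "corresponding", "non-corresponding", "corresponding",
    "non-corresponding", "non-corresponding", "non-corresponding",
]

for category in ("corresponding", "non-corresponding"):
    p, r, f = precision_recall_f1(gold, predicted, category)
    print(f"{category}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Scoring each category on its own is relevant here because most candidate concepts are likely to be non-corresponding, so a high score on that category alone says little about how well the true matches are found.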
doi_str_mv 10.1108/LHT-04-2017-0066
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2533362772</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2533362772</sourcerecordid><originalsourceid>FETCH-LOGICAL-c311t-18be936932cd51f3178bcbc06c0029d6c7d25d8787fcdc0f50858743c082ca763</originalsourceid><addsrcrecordid>eNptUcFKAzEQDaJgrd49BjzHTpLdJD1KUVsoeKnoLWSTrE3ddtdkW-jfm6V6EDwNzHtvZt4bhG4p3FMKarKcrwgUhAGVBECIMzRiUBaECvF-jkYguSRKcXqJrlLaAEDJJBuh9WLbxfYQdh-4X3t8CClUoQn9Ebc1bkIVTTzi6FO7j9anjBu8NV038H_RtK823vZ47Y3L_YT7Fr-Fz9B5l9km9sE2Pl2ji9o0yd_81DF6fXpczeZk-fK8mD0sieWU9oSqyk-5mHJmXUlrTqWqbGVBWAA2dcJKx0qnpJK1dRbqElSpZMEtKGaNFHyM7k5zs62vvU-93uTbd3mlZiXnXDApWWbBiWVjm1L0te5i2GY3moIe8tQ5Tw2FHvLUQ55ZMjlJ_NZH07j_FH8-wL8BrKp3zg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2533362772</pqid></control><display><type>article</type><title>Improving the visibility of library resources via mapping library subject headings to Wikipedia articles</title><source>Emerald Journals</source><creator>Joorabchi, Arash ; Mahdi, Abdulhussain E</creator><creatorcontrib>Joorabchi, Arash ; Mahdi, Abdulhussain E</creatorcontrib><description>Purpose Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this paper is to describe the design and development of a software system for automatic mapping of FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia. Design/methodology/approach The proposed system works by first detecting all the candidate Wikipedia concepts (articles) occurring in the titles of the books and other library materials which are indexed with a given FAST subject heading. 
This is then followed by training and deploying a machine learning (ML) algorithm designed to automatically identify those concepts that correspond to the FAST heading. In specific, the ML algorithm used is a binary classifier which classifies the candidate concepts into either “corresponding” or “non-corresponding” categories. The classifier is trained to learn the characteristics of those candidates which have the highest probability of belonging to the “corresponding” category based on a set of 14 positional, statistical, and semantic features. Findings The authors have assessed the performance of the developed system using standard information retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed system is capable of achieving F-scores as high as 0.65 and 0.99 in the corresponding and non-corresponding categories, respectively. Research limitations/implications The size of the data set used to evaluate the performance of the system is rather small. However, the authors believe that the developed data set is large enough to demonstrate the feasibility and scalability of the proposed approach. Practical implications The sheer size of English Wikipedia makes the manual mapping of Wikipedia articles to library subject headings a very labor-intensive and time-consuming task. Therefore, the aim is to reduce the cost of such mapping and integration. Social implications The proposed mapping paves the way for connecting libraries and Wikipedia as two major silos of knowledge, and enables the bi-directional movement of users between the two. 
Originality/value To the best of the authors’ knowledge, the current work is the first attempt at automatic mapping of Wikipedia to a library-controlled vocabulary.</description><identifier>ISSN: 0737-8831</identifier><identifier>EISSN: 2054-166X</identifier><identifier>DOI: 10.1108/LHT-04-2017-0066</identifier><language>eng</language><publisher>Bradford: Emerald Publishing Limited</publisher><subject>Algorithms ; Automatic control ; Classification ; Classifiers ; Datasets ; Digital libraries ; Encyclopedias ; Indexing ; Information literacy ; Information retrieval ; Information seeking behavior ; Libraries ; Library and information science ; Library catalogs ; Library collections ; Library materials ; Library resources ; Machine learning ; Mapping ; Metadata ; Performance evaluation ; Queries ; Query expansion ; Statistical analysis ; Subject heading schemes ; Subject indexing ; Visibility ; Websites</subject><ispartof>Library hi tech, 2018-02, Vol.36 (1), p.57-74</ispartof><rights>Emerald Publishing Limited</rights><rights>Emerald Publishing Limited 2018</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c311t-18be936932cd51f3178bcbc06c0029d6c7d25d8787fcdc0f50858743c082ca763</citedby><cites>FETCH-LOGICAL-c311t-18be936932cd51f3178bcbc06c0029d6c7d25d8787fcdc0f50858743c082ca763</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/LHT-04-2017-0066/full/html$$EHTML$$P50$$Gemerald$$H</linktohtml><link.rule.ids>315,781,785,968,11639,27928,27929,52693</link.rule.ids></links><search><creatorcontrib>Joorabchi, Arash</creatorcontrib><creatorcontrib>Mahdi, Abdulhussain E</creatorcontrib><title>Improving the visibility of library resources via mapping library subject headings to Wikipedia articles</title><title>Library hi 
tech</title><description>Purpose Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this paper is to describe the design and development of a software system for automatic mapping of FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia. Design/methodology/approach The proposed system works by first detecting all the candidate Wikipedia concepts (articles) occurring in the titles of the books and other library materials which are indexed with a given FAST subject heading. This is then followed by training and deploying a machine learning (ML) algorithm designed to automatically identify those concepts that correspond to the FAST heading. In specific, the ML algorithm used is a binary classifier which classifies the candidate concepts into either “corresponding” or “non-corresponding” categories. The classifier is trained to learn the characteristics of those candidates which have the highest probability of belonging to the “corresponding” category based on a set of 14 positional, statistical, and semantic features. Findings The authors have assessed the performance of the developed system using standard information retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed system is capable of achieving F-scores as high as 0.65 and 0.99 in the corresponding and non-corresponding categories, respectively. Research limitations/implications The size of the data set used to evaluate the performance of the system is rather small. 
However, the authors believe that the developed data set is large enough to demonstrate the feasibility and scalability of the proposed approach. Practical implications The sheer size of English Wikipedia makes the manual mapping of Wikipedia articles to library subject headings a very labor-intensive and time-consuming task. Therefore, the aim is to reduce the cost of such mapping and integration. Social implications The proposed mapping paves the way for connecting libraries and Wikipedia as two major silos of knowledge, and enables the bi-directional movement of users between the two. Originality/value To the best of the authors’ knowledge, the current work is the first attempt at automatic mapping of Wikipedia to a library-controlled vocabulary.</description><subject>Algorithms</subject><subject>Automatic control</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Datasets</subject><subject>Digital libraries</subject><subject>Encyclopedias</subject><subject>Indexing</subject><subject>Information literacy</subject><subject>Information retrieval</subject><subject>Information seeking behavior</subject><subject>Libraries</subject><subject>Library and information science</subject><subject>Library catalogs</subject><subject>Library collections</subject><subject>Library materials</subject><subject>Library resources</subject><subject>Machine learning</subject><subject>Mapping</subject><subject>Metadata</subject><subject>Performance evaluation</subject><subject>Queries</subject><subject>Query expansion</subject><subject>Statistical analysis</subject><subject>Subject heading schemes</subject><subject>Subject 
indexing</subject><subject>Visibility</subject><subject>Websites</subject><issn>0737-8831</issn><issn>2054-166X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNptUcFKAzEQDaJgrd49BjzHTpLdJD1KUVsoeKnoLWSTrE3ddtdkW-jfm6V6EDwNzHtvZt4bhG4p3FMKarKcrwgUhAGVBECIMzRiUBaECvF-jkYguSRKcXqJrlLaAEDJJBuh9WLbxfYQdh-4X3t8CClUoQn9Ebc1bkIVTTzi6FO7j9anjBu8NV038H_RtK823vZ47Y3L_YT7Fr-Fz9B5l9km9sE2Pl2ji9o0yd_81DF6fXpczeZk-fK8mD0sieWU9oSqyk-5mHJmXUlrTqWqbGVBWAA2dcJKx0qnpJK1dRbqElSpZMEtKGaNFHyM7k5zs62vvU-93uTbd3mlZiXnXDApWWbBiWVjm1L0te5i2GY3moIe8tQ5Tw2FHvLUQ55ZMjlJ_NZH07j_FH8-wL8BrKp3zg</recordid><startdate>20180207</startdate><enddate>20180207</enddate><creator>Joorabchi, Arash</creator><creator>Mahdi, Abdulhussain E</creator><general>Emerald Publishing Limited</general><general>Emerald Group Publishing 
Limited</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0U~</scope><scope>1-H</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CNYFK</scope><scope>DWQXO</scope><scope>E3H</scope><scope>F2A</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L.0</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M1O</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope></search><sort><creationdate>20180207</creationdate><title>Improving the visibility of library resources via mapping library subject headings to Wikipedia articles</title><author>Joorabchi, Arash ; Mahdi, Abdulhussain E</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c311t-18be936932cd51f3178bcbc06c0029d6c7d25d8787fcdc0f50858743c082ca763</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Algorithms</topic><topic>Automatic control</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Datasets</topic><topic>Digital libraries</topic><topic>Encyclopedias</topic><topic>Indexing</topic><topic>Information literacy</topic><topic>Information retrieval</topic><topic>Information seeking behavior</topic><topic>Libraries</topic><topic>Library and information science</topic><topic>Library catalogs</topic><topic>Library collections</topic><topic>Library materials</topic><topic>Library resources</topic><topic>Machine 
learning</topic><topic>Mapping</topic><topic>Metadata</topic><topic>Performance evaluation</topic><topic>Queries</topic><topic>Query expansion</topic><topic>Statistical analysis</topic><topic>Subject heading schemes</topic><topic>Subject indexing</topic><topic>Visibility</topic><topic>Websites</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Joorabchi, Arash</creatorcontrib><creatorcontrib>Mahdi, Abdulhussain E</creatorcontrib><collection>CrossRef</collection><collection>Global News &amp; ABI/Inform Professional</collection><collection>Trade PRO</collection><collection>Computer and Information Systems Abstracts</collection><collection>Access via ABI/INFORM (ProQuest)</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Library &amp; Information Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer 
Science Collection</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Professional Standard</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Library Science Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><jtitle>Library hi tech</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Joorabchi, Arash</au><au>Mahdi, Abdulhussain E</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving the visibility of library resources via mapping library subject headings to Wikipedia articles</atitle><jtitle>Library hi tech</jtitle><date>2018-02-07</date><risdate>2018</risdate><volume>36</volume><issue>1</issue><spage>57</spage><epage>74</epage><pages>57-74</pages><issn>0737-8831</issn><eissn>2054-166X</eissn><abstract>Purpose Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. 
Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this paper is to describe the design and development of a software system for automatic mapping of FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia. Design/methodology/approach The proposed system works by first detecting all the candidate Wikipedia concepts (articles) occurring in the titles of the books and other library materials which are indexed with a given FAST subject heading. This is then followed by training and deploying a machine learning (ML) algorithm designed to automatically identify those concepts that correspond to the FAST heading. In specific, the ML algorithm used is a binary classifier which classifies the candidate concepts into either “corresponding” or “non-corresponding” categories. The classifier is trained to learn the characteristics of those candidates which have the highest probability of belonging to the “corresponding” category based on a set of 14 positional, statistical, and semantic features. Findings The authors have assessed the performance of the developed system using standard information retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed system is capable of achieving F-scores as high as 0.65 and 0.99 in the corresponding and non-corresponding categories, respectively. Research limitations/implications The size of the data set used to evaluate the performance of the system is rather small. However, the authors believe that the developed data set is large enough to demonstrate the feasibility and scalability of the proposed approach. 
Practical implications The sheer size of English Wikipedia makes the manual mapping of Wikipedia articles to library subject headings a very labor-intensive and time-consuming task. Therefore, the aim is to reduce the cost of such mapping and integration. Social implications The proposed mapping paves the way for connecting libraries and Wikipedia as two major silos of knowledge, and enables the bi-directional movement of users between the two. Originality/value To the best of the authors’ knowledge, the current work is the first attempt at automatic mapping of Wikipedia to a library-controlled vocabulary.</abstract><cop>Bradford</cop><pub>Emerald Publishing Limited</pub><doi>10.1108/LHT-04-2017-0066</doi><tpages>18</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0737-8831
ispartof Library hi tech, 2018-02, Vol.36 (1), p.57-74
issn 0737-8831
2054-166X
language eng
recordid cdi_proquest_journals_2533362772
source Emerald Journals
subjects Algorithms
Automatic control
Classification
Classifiers
Datasets
Digital libraries
Encyclopedias
Indexing
Information literacy
Information retrieval
Information seeking behavior
Libraries
Library and information science
Library catalogs
Library collections
Library materials
Library resources
Machine learning
Mapping
Metadata
Performance evaluation
Queries
Query expansion
Statistical analysis
Subject heading schemes
Subject indexing
Visibility
Websites
title Improving the visibility of library resources via mapping library subject headings to Wikipedia articles
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T23%3A57%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20the%20visibility%20of%20library%20resources%20via%20mapping%20library%20subject%20headings%20to%20Wikipedia%20articles&rft.jtitle=Library%20hi%20tech&rft.au=Joorabchi,%20Arash&rft.date=2018-02-07&rft.volume=36&rft.issue=1&rft.spage=57&rft.epage=74&rft.pages=57-74&rft.issn=0737-8831&rft.eissn=2054-166X&rft_id=info:doi/10.1108/LHT-04-2017-0066&rft_dat=%3Cproquest_cross%3E2533362772%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2533362772&rft_id=info:pmid/&rfr_iscdi=true