What Is in a ? Cross-lingual Topic Detection & Information Retrieval in Archives Portal Europe
Archives Portal Europe (APE, www.archivesportaleurope.net) is the portal of European archives, an aggregator that connects on a single research point the catalogues and digitised archival material of all archives in and about Europe. It currently hosts material from more than 30 countries and from a...
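Published in: | Journal on computing and cultural heritage 2024-03, Vol.17 (2), p.1-23, Article 25 |
---|---|
Main authors: | Musso, Marta ; Arnold, Kerstin ; Nanni, Federico ; Cannelli, Beatrice |
Format: | Article |
Language: | eng |
Subjects: | Computing methodologies ; Information extraction ; Language resources ; Lexical semantics ; Natural language processing ; Topic modeling |
Online access: | Full text |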
container_end_page | 23 |
---|---|
container_issue | 2 |
container_start_page | 1 |
container_title | Journal on computing and cultural heritage |
container_volume | 17 |
creator | Musso, Marta ; Arnold, Kerstin ; Nanni, Federico ; Cannelli, Beatrice |
description | Archives Portal Europe (APE, www.archivesportaleurope.net) is the portal of European archives, an aggregator that connects on a single research point the catalogues and digitised archival material of all archives in and about Europe. It currently hosts material from more than 30 countries and from a variety of archival institutions (such as State archives, city archives, university and parish archives, private institutions, and more). It is maintained by the Archives Portal Europe Foundation, an international consortium of State archives and other archival institutions that aim to connect the archival material of single institutions into one digital repository to allow universal access to the archival heritage of Europe, promoting new forms of archival research beyond national or local boundaries. One of the research tools made available by Archives Portal Europe is by topics; however, these are currently maintained manually by the archivists, and the vast amount of archival material ingested in the portal makes it impossible to have a comprehensive body of topics that describe the whole of the APE repository. Archives are traditionally not organised by their subject content, but around the entity (person, organization, body) that created and/or collected the documents in the course of their activities. While this is an undisputed pillar of archival management, the availability of online digital repositories for archival research requires new tools for digital archival research, particularly when different archival traditions from different countries and different types of institutions are merged into a unique research portal. Topic detection becomes a fundamental tool to guide archival research and to allow archives to be accessible to potentially world-wide users in a situation where national and linguistic barriers blur or are re-defined.
This article presents the preliminary results and plan for future iterations of an AI tool for automated topic detection in a multilingual environment, where human-created taxonomies act as bases for the algorithms to aggregate relevant material around a specific topic. The development is based on supervised machine learning, with a combination of human inputs in different languages, and of the usage of Wikipedia pages to model the relevant vocabulary and entities. |
doi_str_mv | 10.1145/3494572 |
format | Article |
publisher | New York, NY: ACM |
date | 2024-03-26 |
eissn | 1556-4711 |
stitle | ACM JOCCH |
lds50 | peer_reviewed |
oa | free_for_read |
linktopdf | https://dl.acm.org/doi/pdf/10.1145/3494572 |
rights | Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from |
fulltext | fulltext |
identifier | ISSN: 1556-4673 |
ispartof | Journal on computing and cultural heritage, 2024-03, Vol.17 (2), p.1-23, Article 25 |
issn | 1556-4673 ; 1556-4711 |
language | eng |
recordid | cdi_crossref_primary_10_1145_3494572 |
source | Access via ACM Digital Library |
subjects | Computing methodologies ; Information extraction ; Language resources ; Lexical semantics ; Natural language processing ; Topic modeling |
title | What Is in a ? Cross-lingual Topic Detection & Information Retrieval in Archives Portal Europe |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T05%3A24%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=What%20Is%20in%20a%20?%20Cross-lingual%20Topic%20Detection%20&%20Information%20Retrieval%20in%20Archives%20Portal%20Europe&rft.jtitle=Journal%20on%20computing%20and%20cultural%20heritage&rft.au=Musso,%20Marta&rft.date=2024-03-26&rft.volume=17&rft.issue=2&rft.spage=1&rft.epage=23&rft.pages=1-23&rft.artnum=25&rft.issn=1556-4673&rft.eissn=1556-4711&rft_id=info:doi/10.1145/3494572&rft_dat=%3Cacm_cross%3E3494572%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |