AnnoFin–A hybrid algorithm to annotate financial text
Published in: | Expert systems with applications 2017-12, Vol.88, p.270-275 |
---|---|
Main authors: | Swarup Das, Ananda ; Mehta, Sameep ; Subramaniam, L.V. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 275 |
---|---|
container_issue | |
container_start_page | 270 |
container_title | Expert systems with applications |
container_volume | 88 |
creator | Swarup Das, Ananda ; Mehta, Sameep ; Subramaniam, L.V. |
description | •AnnoFin helps a user classify financial text data into ten categories. •When trained on 30% of the data, AnnoFin has an accuracy of 73.56%. •The accuracy increases by about 2 percentage points for every additional 10% of training data.
In this work, we study the problem of annotating a large volume of financial text by learning from a small set of human-annotated training data. The training data is prepared by randomly selecting sentences from the large corpus of financial text. Conventionally, bootstrapping algorithms are used to annotate large volumes of unlabeled data by learning from a small set of annotated data; however, that small annotated set has to be carefully chosen as seed data. Our approach departs from conventional bootstrapping in that we let users select the seed data randomly. We show that our proposed algorithm achieves an accuracy of 73.56% in classifying financial texts into the different categories (“Accounting”, “Cost”, “Employee”, “Financing”, “Sales”, “Investments”, “Operations”, “Profit”, “Regulations” and “Irrelevant”) even when the training data is just 30% of the total data set. Additionally, the accuracy improves by approximately 2 percentage points on average for every 10% increase in training data, reaching 77.91% when the training data is about 50% of the total data set. Because dictionaries of hand-chosen keywords prepared by domain experts are often used for financial text extraction, we assumed that the different classes are almost linearly separable by hyperplanes; we therefore used a linear Support Vector Machine together with a modified version of the Label Propagation Algorithm, which exploits the notion of neighborhood (in Euclidean space) for classification. We believe that our proposed techniques will help Early Warning Systems used in banks, where large volumes of unstructured text need to be processed to gain better insights about a company. |
doi_str_mv | 10.1016/j.eswa.2017.07.016 |
format | Article |
publisher | New York: Elsevier Ltd |
rights | 2017 Elsevier Ltd ; Copyright Elsevier BV Dec 1, 2017 |
fulltext | fulltext |
identifier | ISSN: 0957-4174 |
ispartof | Expert systems with applications, 2017-12, Vol.88, p.270-275 |
issn | 0957-4174 (ISSN) ; 1873-6793 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_1956485974 |
source | ScienceDirect Journals (5 years ago - present) |
subjects | Accounting ; Algorithms ; Clustering ; Early warning systems ; Euclidean geometry ; Financial sentences ; Hyperplanes ; Label propagation algorithm ; Machine learning ; Sentences ; SVM ; Text classification ; Texts ; Training |
title | AnnoFin–A hybrid algorithm to annotate financial text |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T06%3A11%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AnnoFin%E2%80%93A%20hybrid%20algorithm%20to%20annotate%20financial%20text&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Swarup%20Das,%20Ananda&rft.date=2017-12-01&rft.volume=88&rft.spage=270&rft.epage=275&rft.pages=270-275&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2017.07.016&rft_dat=%3Cproquest_cross%3E1956485974%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1956485974&rft_id=info:pmid/&rft_els_id=S0957417417304852&rfr_iscdi=true |
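The abstract says AnnoFin combines a linear Support Vector Machine with a modified Label Propagation Algorithm that works on neighbourhoods in Euclidean space, but this record does not describe the modification or how the two stages are combined. As a rough illustration only, the Python sketch below implements plain, unmodified label propagation over a k-nearest-neighbour graph with Gaussian edge weights: the randomly chosen, human-annotated seeds keep their labels, and every other point repeatedly takes the weighted average of its neighbours' label distributions. The function name `label_propagation`, its parameters `k`, `sigma` and `iters`, and the toy 2-D points are assumptions for illustration; sentence vectorisation and the SVM stage are not shown, and this is not the authors' implementation.

```python
import math


def label_propagation(points, seed_labels, classes, k=3, sigma=1.0, iters=100):
    """Hypothetical sketch: spread labels from a few annotated seeds to every point.

    points      -- list of feature vectors (e.g. already-vectorised sentences)
    seed_labels -- {point index: class name} for the randomly chosen seeds
    classes     -- list of class names, e.g. ["Cost", "Sales", ...]
    Returns one predicted class name per point.
    """
    n = len(points)
    col = {c: m for m, c in enumerate(classes)}

    # Symmetric k-nearest-neighbour graph in Euclidean space, Gaussian edge weights.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in neighbours[:k]:
            W[i][j] = W[j][i] = math.exp(-(d * d) / (2 * sigma * sigma))

    # Label distributions: seeds are one-hot, unlabeled points start with no mass.
    Y = [[0.0] * len(classes) for _ in range(n)]
    for i, c in seed_labels.items():
        Y[i][col[c]] = 1.0

    for _ in range(iters):
        new_Y = []
        for i in range(n):
            if i in seed_labels:
                new_Y.append(Y[i])  # clamp: seed labels never change
                continue
            total = sum(W[i])
            if total == 0.0:
                new_Y.append(Y[i])  # no usable edges: keep current distribution
                continue
            # Weighted average of the neighbours' current label distributions.
            new_Y.append([sum(W[i][j] * Y[j][m] for j in range(n)) / total
                          for m in range(len(classes))])
        Y = new_Y

    # Each point takes the class with the most mass; points unreachable
    # from any seed have all-zero mass and default to classes[0].
    return [classes[max(range(len(classes)), key=lambda m: Y[i][m])] for i in range(n)]


if __name__ == "__main__":
    # Two well-separated toy clusters, one randomly chosen seed in each.
    pts = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    print(label_propagation(pts, {0: "Cost", 4: "Sales"}, ["Cost", "Sales"]))
    # prints ['Cost', 'Cost', 'Cost', 'Cost', 'Sales', 'Sales', 'Sales', 'Sales']
```

Gaussian weights make closer neighbours count more, in the spirit of the abstract's emphasis on Euclidean neighbourhoods; on real financial sentences `k` and `sigma` would need tuning, and the labels produced here could then serve as extra training data for a linear classifier.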