AnnoFin–A hybrid algorithm to annotate financial text

• AnnoFin helps a user classify financial text data into ten categories.
• AnnoFin, when trained with 30% of the data, has an accuracy of 73.56%.
• The accuracy increases by 2% if the training data is increased by 10%.

In this work, we study the problem of annotating a large volume of Financial text by l...

Bibliographic details
Published in:	Expert systems with applications 2017-12, Vol.88, p.270-275
Main authors:	Swarup Das, Ananda; Mehta, Sameep; Subramaniam, L.V.
Format:	Article
Language:	eng
Subjects:	Accounting; Algorithms; Clustering; Early warning systems; Euclidean geometry; Financial sentences; Hyperplanes; Label propagation algorithm; Machine learning; Sentences; SVM; Text classification; Texts; Training
Online access:	Full text

container_end_page	275
container_issue
container_start_page	270
container_title	Expert systems with applications
container_volume	88
creator	Swarup Das, Ananda ; Mehta, Sameep ; Subramaniam, L.V.
description	• AnnoFin helps a user classify financial text data into ten categories. • AnnoFin, when trained with 30% of the data, has an accuracy of 73.56%. • The accuracy increases by 2% if the training data is increased by 10%. In this work, we study the problem of annotating a large volume of financial text by learning from a small set of human-annotated training data. The training data is prepared by randomly selecting sentences from the large corpus of financial text. Conventionally, a bootstrapping algorithm is used to annotate a large volume of unlabeled data by learning from a small set of annotated data. However, the small set of annotated data has to be carefully chosen as seed data. Our approach therefore departs from conventional bootstrapping, as we let users select the seed data randomly. We show that our proposed algorithm has an accuracy of 73.56% in classifying the financial texts into the different categories (“Accounting”, “Cost”, “Employee”, “Financing”, “Sales”, “Investments”, “Operations”, “Profit”, “Regulations” and “Irrelevant”) even when the training data is just 30% of the total data set. Additionally, the accuracy improves by an average of approximately 2% for each 10% increase in the training data, and the accuracy of our system is 77.91% when the training data is about 50% of the total data set. Because a dictionary of hand-chosen keywords prepared by domain experts is often used for financial text extraction, we assumed the existence of hyperplanes that almost linearly separate the different classes, and we therefore used a linear Support Vector Machine along with a modified version of the Label Propagation Algorithm, which exploits the notion of neighborhood (in Euclidean space) for classification. We believe that our proposed techniques will help Early Warning Systems used in banks, where large volumes of unstructured text need to be processed for better insights about a company.
doi_str_mv	10.1016/j.eswa.2017.07.016
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1956485974</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417417304852</els_id><sourcerecordid>1956485974</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-876ebb3dfddce7ad3c152d06969d88d8b216947849c0c03f2f77d979846de82d3</originalsourceid><addsrcrecordid>eNp9kM1KAzEUhYMoWKsv4GrA9Yz5m_yAm1KsCgU3ug6ZJGMztJOapP7sfAff0Ccxpa6FA3dxv3Pv4QBwiWCDIGLXQ-PSu24wRLyBRYgdgQkSnNSMS3IMJlC2vKaI01NwltIACwghnwA-G8ew8OPP1_esWn120dtKr19C9Hm1qXKodNlnnV3V-1GPxut1ld1HPgcnvV4nd_E3p-B5cfs0v6-Xj3cP89myNgSLXAvOXNcR21trHNeWGNRiC5lk0gphRYcRk5QLKg00kPS459xKLgVl1glsyRRcHe5uY3jduZTVEHZxLC8Vki2jopWcFgofKBNDStH1ahv9RsdPhaDaF6QGtS9I7QtSsAixYro5mFzJ_-ZdVMl4NxpnfXQmKxv8f_Zf9gxvCQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1956485974</pqid></control><display><type>article</type><title>AnnoFin–A hybrid algorithm to annotate financial text</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Swarup Das, Ananda ; Mehta, Sameep ; Subramaniam, L.V.</creator><creatorcontrib>Swarup Das, Ananda ; Mehta, Sameep ; Subramaniam, L.V.</creatorcontrib><description>•AnnoFin helps a user to classify financial text data to ten categories.•AnnoFin, when trained with 30% of data and has an accuracy of 73.56%.•The accuracy increases by 2% if the training data is increased by 10%. In this work, we study the problem of annotating a large volume of Financial text by learning from a small set of human-annotated training data. The training data is prepared by randomly selecting some text sentences from the large corpus of financial text. Conventionally, bootstrapping algorithm is used to annotate large volume of unlabeled data by learning from a small set of annotated data. However, the small set of annotated data have to be carefully chosen as seed data. 
Our approach therefore departs from conventional bootstrapping, as we let users select the seed data randomly. We show that our proposed algorithm has an accuracy of 73.56% in classifying the financial texts into the different categories (“Accounting”, “Cost”, “Employee”, “Financing”, “Sales”, “Investments”, “Operations”, “Profit”, “Regulations” and “Irrelevant”) even when the training data is just 30% of the total data set. Additionally, the accuracy improves by an average of approximately 2% for each 10% increase in the training data, and the accuracy of our system is 77.91% when the training data is about 50% of the total data set. Because a dictionary of hand-chosen keywords prepared by domain experts is often used for financial text extraction, we assumed the existence of hyperplanes that almost linearly separate the different classes, and we therefore used a linear Support Vector Machine along with a modified version of the Label Propagation Algorithm, which exploits the notion of neighborhood (in Euclidean space) for classification. 
We believe that our proposed techniques will be of help to Early Warning Systems used in banks where large volumes of unstructured texts need to be processed for better insights about a company.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2017.07.016</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Accounting ; Algorithms ; Clustering ; Early warning systems ; Euclidean geometry ; Financial sentences ; Hyperplanes ; Label propagation algorithm ; Machine learning ; Sentences ; SVM ; Text classification ; Texts ; Training</subject><ispartof>Expert systems with applications, 2017-12, Vol.88, p.270-275</ispartof><rights>2017 Elsevier Ltd</rights><rights>Copyright Elsevier BV Dec 1, 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-876ebb3dfddce7ad3c152d06969d88d8b216947849c0c03f2f77d979846de82d3</citedby><cites>FETCH-LOGICAL-c328t-876ebb3dfddce7ad3c152d06969d88d8b216947849c0c03f2f77d979846de82d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.eswa.2017.07.016$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Swarup Das, Ananda</creatorcontrib><creatorcontrib>Mehta, Sameep</creatorcontrib><creatorcontrib>Subramaniam, L.V.</creatorcontrib><title>AnnoFin–A hybrid algorithm to annotate financial text</title><title>Expert systems with applications</title><description>•AnnoFin helps a user to classify financial text data to ten categories.•AnnoFin, when trained with 30% of data and has an accuracy of 73.56%.•The accuracy increases by 2% if the training data is increased by 10%. 
In this work, we study the problem of annotating a large volume of financial text by learning from a small set of human-annotated training data. The training data is prepared by randomly selecting sentences from the large corpus of financial text. Conventionally, a bootstrapping algorithm is used to annotate a large volume of unlabeled data by learning from a small set of annotated data. However, the small set of annotated data has to be carefully chosen as seed data. Our approach therefore departs from conventional bootstrapping, as we let users select the seed data randomly. We show that our proposed algorithm has an accuracy of 73.56% in classifying the financial texts into the different categories (“Accounting”, “Cost”, “Employee”, “Financing”, “Sales”, “Investments”, “Operations”, “Profit”, “Regulations” and “Irrelevant”) even when the training data is just 30% of the total data set. Additionally, the accuracy improves by an average of approximately 2% for each 10% increase in the training data, and the accuracy of our system is 77.91% when the training data is about 50% of the total data set. Because a dictionary of hand-chosen keywords prepared by domain experts is often used for financial text extraction, we assumed the existence of hyperplanes that almost linearly separate the different classes, and we therefore used a linear Support Vector Machine along with a modified version of the Label Propagation Algorithm, which exploits the notion of neighborhood (in Euclidean space) for classification. 
We believe that our proposed techniques will be of help to Early Warning Systems used in banks where large volumes of unstructured texts need to be processed for better insights about a company.</description><subject>Accounting</subject><subject>Algorithms</subject><subject>Clustering</subject><subject>Early warning systems</subject><subject>Euclidean geometry</subject><subject>Financial sentences</subject><subject>Hyperplanes</subject><subject>Label propagation algorithm</subject><subject>Machine learning</subject><subject>Sentences</subject><subject>SVM</subject><subject>Text classification</subject><subject>Texts</subject><subject>Training</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNp9kM1KAzEUhYMoWKsv4GrA9Yz5m_yAm1KsCgU3ug6ZJGMztJOapP7sfAff0Ccxpa6FA3dxv3Pv4QBwiWCDIGLXQ-PSu24wRLyBRYgdgQkSnNSMS3IMJlC2vKaI01NwltIACwghnwA-G8ew8OPP1_esWn120dtKr19C9Hm1qXKodNlnnV3V-1GPxut1ld1HPgcnvV4nd_E3p-B5cfs0v6-Xj3cP89myNgSLXAvOXNcR21trHNeWGNRiC5lk0gphRYcRk5QLKg00kPS459xKLgVl1glsyRRcHe5uY3jduZTVEHZxLC8Vki2jopWcFgofKBNDStH1ahv9RsdPhaDaF6QGtS9I7QtSsAixYro5mFzJ_-ZdVMl4NxpnfXQmKxv8f_Zf9gxvCQ</recordid><startdate>20171201</startdate><enddate>20171201</enddate><creator>Swarup Das, Ananda</creator><creator>Mehta, Sameep</creator><creator>Subramaniam, L.V.</creator><general>Elsevier Ltd</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20171201</creationdate><title>AnnoFin–A hybrid algorithm to annotate financial text</title><author>Swarup Das, Ananda ; Mehta, Sameep ; Subramaniam, 
L.V.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-876ebb3dfddce7ad3c152d06969d88d8b216947849c0c03f2f77d979846de82d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Accounting</topic><topic>Algorithms</topic><topic>Clustering</topic><topic>Early warning systems</topic><topic>Euclidean geometry</topic><topic>Financial sentences</topic><topic>Hyperplanes</topic><topic>Label propagation algorithm</topic><topic>Machine learning</topic><topic>Sentences</topic><topic>SVM</topic><topic>Text classification</topic><topic>Texts</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Swarup Das, Ananda</creatorcontrib><creatorcontrib>Mehta, Sameep</creatorcontrib><creatorcontrib>Subramaniam, L.V.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Swarup Das, Ananda</au><au>Mehta, Sameep</au><au>Subramaniam, L.V.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AnnoFin–A hybrid algorithm to annotate financial text</atitle><jtitle>Expert systems with applications</jtitle><date>2017-12-01</date><risdate>2017</risdate><volume>88</volume><spage>270</spage><epage>275</epage><pages>270-275</pages><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>•AnnoFin helps a user to classify financial text data 
into ten categories. • AnnoFin, when trained with 30% of the data, has an accuracy of 73.56%. • The accuracy increases by 2% if the training data is increased by 10%. In this work, we study the problem of annotating a large volume of financial text by learning from a small set of human-annotated training data. The training data is prepared by randomly selecting sentences from the large corpus of financial text. Conventionally, a bootstrapping algorithm is used to annotate a large volume of unlabeled data by learning from a small set of annotated data. However, the small set of annotated data has to be carefully chosen as seed data. Our approach therefore departs from conventional bootstrapping, as we let users select the seed data randomly. We show that our proposed algorithm has an accuracy of 73.56% in classifying the financial texts into the different categories (“Accounting”, “Cost”, “Employee”, “Financing”, “Sales”, “Investments”, “Operations”, “Profit”, “Regulations” and “Irrelevant”) even when the training data is just 30% of the total data set. Additionally, the accuracy improves by an average of approximately 2% for each 10% increase in the training data, and the accuracy of our system is 77.91% when the training data is about 50% of the total data set. Because a dictionary of hand-chosen keywords prepared by domain experts is often used for financial text extraction, we assumed the existence of hyperplanes that almost linearly separate the different classes, and we therefore used a linear Support Vector Machine along with a modified version of the Label Propagation Algorithm, which exploits the notion of neighborhood (in Euclidean space) for classification. 
We believe that our proposed techniques will be of help to Early Warning Systems used in banks where large volumes of unstructured texts need to be processed for better insights about a company.</abstract><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2017.07.016</doi><tpages>6</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0957-4174
ispartof	Expert systems with applications, 2017-12, Vol.88, p.270-275
issn	0957-4174 1873-6793
language	eng
recordid	cdi_proquest_journals_1956485974
source	ScienceDirect Journals (5 years ago - present)
subjects	Accounting; Algorithms; Clustering; Early warning systems; Euclidean geometry; Financial sentences; Hyperplanes; Label propagation algorithm; Machine learning; Sentences; SVM; Text classification; Texts; Training
title	AnnoFin–A hybrid algorithm to annotate financial text
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T06%3A11%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AnnoFin%E2%80%93A%20hybrid%20algorithm%20to%20annotate%20financial%20text&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Swarup%20Das,%20Ananda&rft.date=2017-12-01&rft.volume=88&rft.spage=270&rft.epage=275&rft.pages=270-275&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2017.07.016&rft_dat=%3Cproquest_cross%3E1956485974%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1956485974&rft_id=info:pmid/&rft_els_id=S0957417417304852&rfr_iscdi=true
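
The description field above says that AnnoFin combines a linear Support Vector Machine with a modified Label Propagation Algorithm that exploits neighborhood in Euclidean space. The record gives no details of that modification, so the Python sketch below is only a generic illustration of neighborhood-based label propagation, not the authors' algorithm. It leaves out the SVM step entirely. The function name `propagate_labels`, the parameter `k`, and the two-dimensional feature vectors are assumptions made for this example; only the category names "Cost" and "Sales" come from the record.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def propagate_labels(labeled, unlabeled, k=3):
    """Spread labels from seed points to unlabeled points (illustrative only).

    labeled:   list of (vector, label) seed pairs
    unlabeled: list of vectors without labels
    Returns a dict mapping each unlabeled index to its assigned label.
    """
    known = list(labeled)                 # grows as points receive labels
    pending = dict(enumerate(unlabeled))  # index -> vector still to label
    assigned = {}
    while pending:
        # Take the pending point that lies closest to any labeled point.
        idx, vec = min(
            pending.items(),
            key=lambda item: min(euclidean(item[1], v) for v, _ in known),
        )
        # Majority vote among its k nearest labeled neighbors.
        neighbors = sorted(known, key=lambda pair: euclidean(vec, pair[0]))[:k]
        label = Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]
        assigned[idx] = label
        known.append((vec, label))        # the new label can now propagate
        del pending[idx]
    return assigned


# Hypothetical 2-D features; real sentences would need a text vectorizer.
seeds = [
    ((0.0, 0.0), "Cost"),
    ((0.0, 1.0), "Cost"),
    ((10.0, 10.0), "Sales"),
    ((10.0, 11.0), "Sales"),
]
points = [(1.0, 0.5), (9.0, 10.5), (2.0, 1.0)]
result = propagate_labels(seeds, points, k=3)
print(result)  # {0: 'Cost', 1: 'Sales', 2: 'Cost'}
```

Labeling the nearest pending point first lets labels spread outward from the seeds, so the third point is decided partly by the label just given to the first. Whether randomly chosen seeds work as well here as the description reports for AnnoFin is not something this sketch can show.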