Financial Knowledge Graph Based Financial Report Query System
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3077916, IEEE Access
Corresponding author: Syed Farhan Mohsin, Dr. Shaukat Wasi (e-mail: farhan.mohsin@duhs.edu.pk, shaukat.wasi@jinnah.edu).
ABSTRACT Annual financial reports are the core means by which the banking sector publishes its financial statistics.
Extracting useful information from these complex and lengthy reports involves a manual process to resolve
financial queries, resulting in delays and ambiguity in investment decisions. One of the major reasons is
the lack of standardization in the format and vocabulary used in the reports. An automated system for
resolving intelligent financial queries is therefore difficult to design.
Several works have been proposed to overcome these problems using Information Extraction; however,
they do not address the semantic interoperability of the reports across different institutions. This work
proposes an automated querying engine that answers financial queries using Ontology-Based Information
Extraction. For semantic modeling of financial reports, a Financial Knowledge Graph, assisted by a Financial
Ontology, is proposed. The nodes are populated with entities and the links with relationships, using
Information Extraction applied to annual reports. Through automation, the system provides two benefits to
stakeholders: decision making through queries and generation of custom financial stories. The work can be
further extended to other domains, including healthcare and academia, where physical reports are used for
communication.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
information is easily adjustable in the existing format without disturbing the structure of the ontology [47].
We have employed a knowledge graph (KG) for several reasons in this research. Firstly, it does not necessarily require a semantic layer to describe the entity model, as relational databases do [4]. Secondly, the schema is flexible and easily adjustable, so it is always easy to add new properties to an existing record or modify the schema without affecting other graph entities [5]. Highly variable, incomplete, or dynamic data can be represented by Property Graph stores, which consume less space and support attribute/link discovery. Finally, graph databases can answer queries that span multiple entities by graph traversal. Only those nodes that are reachable by the query are explored by the graph database engine. Because every record is handled individually, this drastically boosts query performance and helps reduce the resource cost of the query results [6] [7].
Our knowledge graph will assist an investor by generating financial stories that can aid decision making. The further plan is to generate text-based financial stories and then extend them to visualizations/graphics as proposed by [8]; the whole concept is shown in Fig. 1.

FIGURE 1. Sweet spot of Data, Story and Visual [51].

Following are the major contributions of this work: (A) integration of Information Extraction with the Semantic Web; (B) a proposed Financial Knowledge Graph to model the domain of Financial Systems; (C) mapping of Financial Reports in the domain using an Ontology; (D) extending manual financial reports to be machine readable using Information Extraction.

II. RELATED WORK

A. INFORMATION EXTRACTION
In recent years, most financial information extraction work has been done on a specific reporting standard known as XBRL (eXtensible Business Reporting Language), which is a freely available, global framework for exchanging business information. In 2011, the Securities and Exchange Commission (SEC) mandated XBRL as the filing standard for all US public companies. A rule-based information extraction methodology was introduced in [1] for the extraction of highly accurate financial information to aid investment decisions. They trained two different rule-based symbolic learning models, using the Tabu Search and Greedy Search algorithms, and evaluated their performance on financial filings. In another study [13], the authors implemented a software agent that extracts fundamental company data from the Electronic Data Gathering, Analysis and Retrieval (EDGAR) database of the United States Securities and Exchange Commission (SEC) and outputs this data in a format that is useful for supporting stock market trading decisions. EDGAR is a specialized database that stores information as provided by companies in the 10-K or XBRL formats [14]. A two-step approach was proposed in [15] to perform rule-based text extraction and acquisition of structured data from unstructured text. In this work, we are working with annual financial disclosures, which require data extraction from PDF files that include tables, graphics, and structured and unstructured text [2], [16], [17], [18].

B. ONTOLOGY BASED INFORMATION EXTRACTION
In Ontology-Based Information Extraction (OBIE), the information extraction process is assisted by ontologies [19]. An ontology is defined as a formal and explicit specification of a shared conceptualization and is usually specific to a knowledge domain [20]. As the task of information extraction also deals with retrieving information for a particular domain, ontology is one of the candidate solutions in information extraction [21]. Researchers have been using ontology-based mechanisms for extracting required information from unstructured or semi-structured natural language text [21] [22]. An ontology model for the mobile payment data risk control domain is developed in [43]. The model takes the user as the entity and the operation/transaction as the relationship, and gathers the data at separate timestamps to fulfill the requirements of the financial risk control domain.
To overcome the problem of heterogeneous data, an ontology for the poverty alleviation domain is constructed in [46]. This ontology is further used to create the nodes and edges of the knowledge graph. Visualization techniques are applied to the knowledge graph, which helps in providing the results of different queries related to poverty alleviation. A Bankruptcy Prediction Computational Model (BPCM) is presented in [47], which is used to predict the bankruptcy of financial institutes or companies. An Ontology of Bankruptcy Prediction (OBP) is constructed to uniformly extract data from different data sources and to utilize the financial data of the companies. A Semantic Analysis Graph Database (SAGRADA) is created, which consumes the OBP ontology, while the graph database is used for storing and visualizing the data.

C. KNOWLEDGE GRAPH CONSTRUCTION
A knowledge graph (KG) is a semantic graph consisting of vertices (or nodes) and edges. The vertices represent concepts or entities. The edges represent the semantic relationships between concepts or entities [6]. By exploiting KG, partially observed entities and concepts can be
connected together to form a complete and structured knowledge repository [4].
For a knowledge graph, an ontology usually defines the architecture and constraints for the data residing in it. Ontology-assisted financial knowledge graph (KG) population deals with attaching each detected named entity to the correct label/category. When a named entity is detected in unstructured text and has no ontological mapping defined, the right node category is sought in the KG to attach the entity; this task is known as fine-grained named entity classification [23]. Otherwise, if the desired entity mapping exists in the ontology, the aim is to link the detected entity mention with its corresponding real-world entity in the KG, which is known as the entity linking task.
An extensive comparison survey of well-known graph databases was conducted by the authors in [9]. The results are summarized in Table I. Owing to the powerful features and flexibility of Neo4j¹, we selected it for our knowledge graph implementation.

TABLE I
GRAPH DATABASE DATA STRUCTURES [9]
Column groups: Graph Type (Hypergraph, Nested, Attributed); Nodes (Node labeled, Node attribution); Edges (Edge labeled, Edge attribution)
Graph databases compared: AllegroGraph, DEX, Filament, G-Store, HyperGraphDB, InfiniteGraph, Neo4j, Sones, vertexDB

Knowledge graph identification (KGI) is a technique for knowledge graph construction that jointly reasons about entities, attributes, and relations in the presence of uncertain inputs and ontological constraints [24]. Candidate facts from an information extraction system can be represented as an extraction graph, where entities are nodes, categories are labels associated with each node, and relations are directed edges between the nodes [25]. In another work [22], the authors studied financial documents for populating knowledge graphs with financial entities and their interrelationships. They presented experimental results and discussed knowledge graph (KG) construction techniques on financial filings along with their challenges and possible solutions.
Reference [43] utilizes the knowledge graph to represent transaction data visually, which helps in reducing financial fraud in the domain of financial risk control. Reference [44] claims that an Anti-TrustRank algorithm based on the knowledge graph data of financial institutions can also be used for anti-money-laundering purposes. The algorithm treats the web as a graph, with pages as nodes and the links between pages as edges, and it assists financial institutions in finding money launderers and protecting themselves from money laundering. Reference [45] proposed a financial news recommendation framework, based on the NNR and INNR models, which uses the knowledge graph for financial news recommendations. The edges of the graph are updated during stock market trading through INNR and after the stock market closes through NNR. Both models are then combined to attain better, more accurate, and more efficient financial news recommendations. The idea of extracting financial news specifically related to the Chinese stock market from different Chinese encyclopedias and financial news websites is introduced in [49]. According to the author, the news will set the stock market sentiments. The ontology is created, and the financial knowledge graph is used to construct the relationships between the stock entities from the financial news, which is further used to analyze and identify the impact of the news on different stock prices and

D. QUESTION ANSWERING
A question answering system based on a knowledge graph of classical Chinese poetry is proposed in [48]. The poetry-related information is extracted from a classical Chinese poetry website, and the knowledge graph is constructed on the basis of this data and stored in the database. The Rasa framework is adapted for natural language processing to answer user queries. A graph data-driven framework is proposed in [50], which answers natural language questions using an RDF graph repository. In the first step, the authors translate the natural language questions into SPARQL; in the next step, the translated SPARQL queries are evaluated by the system, which provides the answer to the question. IBM researchers proposed a question answering system [26] that sequentially performs linguistic analysis of the query, named entity extraction, entity/graph search, and fusion and ranking of possible answers. Our research follows a similar approach.

E. GAP ANALYSIS
The previous subsections identified the research background in different related areas, which shows that related work has been performed in parts by the Semantic Web, Information Extraction, and Visualization communities. However, the existing research literature has not provided any complete system that extends flat financial reports to machine readability so that they can be queried. With the complex

¹ https://neo4j.com/product/
Paper | Year | Information extraction from Unstructured / Structured Financial Text | Entity Resolution / Entity linking / Relation identification / extraction using Ontology / Rule-based / LOD | Construction of Financial KG | Ontology enrichment / Knowledge discovery using KG | Story telling and KG | Q/A using KG
[12] | 2018 | H | H | H | H | NH | H
[42] | 2018 | H | H | H | H | NH | NH
[22] | 2017 | H | NH | H | H | NH | NH
[28] | 2018 | NH | Ontology for top down + LOD bottom-up | H | H | H | H
[24] | 2013 | H | Ontology | H | H | NH | NH
[1] | 2012 | H | NH | NH | NH | NH | NH
[21] | 2012 | H | Rule based for ER / Ontology for event detection | H | NH | NH | NH
[37] | 2013 | NH | Ontology | H | H | NH | NH
[26] | 2016 | NH | LOD (WikiData) | NH | H | H | H
[32] | 2018 | NH | NH | NH | NH | H | NH
[3] | 2017 | NH | NH | NH | NH | NH | NH
[23] | 2012 | H | Ontology | H | NH | NH | NH
[7] | 2017 | NH | NH | H | H | NH | H
FIGURE 2. Sample Competency Questions

3) ONTOLOGY-BASED INFORMATION EXTRACTION
The steps involved in ontology-based information extraction are mentioned below:
• Study and analyze data set
• Define stop words
• Document preprocessing
• Extract all the terms from ontology that are known instances/entities and can be directly mentioned in the text along with its direct and indirect super classes. This will serve as a gazetteer list.
• Extract relationship names between two entities using object property of an ontological concept.
• Apply rule-based information extraction techniques with supporting information found in previous two steps. A set of rules were manually crafted and implemented to extract each target.

4) KNOWLEDGE GRAPH CREATION
This phase is overlapped with the previous phase as the information extracted from text and ontology will help in knowledge graph (KG) creation that is Knowledge graph population with appropriate nodes, relationships, and labels.

B. DATA SOURCES
Primarily, we ingest information from financial filings; however, further data sources can add value to knowledge graph (KG) enrichment. We have identified following three types of potential data sources [52].

1) SEMI-STRUCTURED
• Annual Report.
• Longer company profiles.
• Imprint information on company web pages.
• Running tickers on company information.

2) STRUCTURED SOURCES
• Publicly available balance sheets in structured format.
• Short company profiles (e.g. from Business Registers, Stock Exchange, web pages, etc.)
• Wikipedia Infoboxes.

3) UNSTRUCTURED
• Annexes to balance sheets in annual reports of companies.
• Newspapers.
• Specialized web pages etc.
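As a rough illustration of the ontology-based information extraction and knowledge graph creation steps described in this section, the following Python sketch builds a tiny gazetteer, maps relationship phrases to relationship names, applies one hand-written rule, and populates an in-memory graph. The gazetteer entries, the relationship names (HAS_PROFIT_AFTER_TAX, HAS_PROFIT_BEFORE_TAX), and the helper functions are hypothetical and chosen only for this example; the actual system derives its terms from the ontology and uses a graph database, neither of which is reproduced here.

```python
import re

# Hypothetical gazetteer: surface form -> (canonical entity, ontology class).
# In the described system these terms would come from the ontology.
GAZETTEER = {
    "UBL": ("United Bank Limited", "Bank"),
    "United Bank Limited": ("United Bank Limited", "Bank"),
    "MCB": ("MCB Bank Limited", "Bank"),
}
CLASS_OF = {canonical: cls for canonical, cls in GAZETTEER.values()}

# Hypothetical relationship names keyed by the phrase that signals them.
RELATION_PHRASES = {
    "profit after tax": "HAS_PROFIT_AFTER_TAX",
    "profit before tax": "HAS_PROFIT_BEFORE_TAX",
}

# Rule: a monetary amount written as "Rs. <number> billion|million".
AMOUNT = re.compile(r"Rs\.\s*(\d+(?:\.\d+)?)\s*(billion|million)", re.IGNORECASE)


def extract_facts(sentence):
    """Return (entity, relation, value) triples found in one sentence."""
    entity = None
    for surface, (canonical, _cls) in GAZETTEER.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", sentence):
            entity = canonical
            break
    if entity is None:
        return []

    facts = []
    lowered = sentence.lower()
    for phrase, relation in RELATION_PHRASES.items():
        position = lowered.find(phrase)
        if position == -1:
            continue
        # Take the first amount that follows the relationship phrase.
        match = AMOUNT.search(sentence, position)
        if match:
            value = f"Rs. {match.group(1)} {match.group(2).lower()}"
            facts.append((entity, relation, value))
    return facts


def populate_graph(facts):
    """Build a node dictionary: entity -> label and extracted properties."""
    nodes = {}
    for entity, relation, value in facts:
        node = nodes.setdefault(
            entity, {"label": CLASS_OF[entity], "properties": {}}
        )
        node["properties"][relation] = value
    return nodes


sentence = (
    "UBL posted profit after tax amounting to Rs. 25.4 billion "
    "during the year ended December 31, 2017."
)
graph = populate_graph(extract_facts(sentence))
```

A real rule set would need one rule per extraction target and would have to resolve which year each amount belongs to; this sketch only shows how the gazetteer, the relationship names, and a rule fit together.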
C. PROPOSED SYSTEM ARCHITECTURE
The proposed system architecture, which presents how user queries are processed and how the system generates results, is shown in Fig. 4.

D. POTENTIAL USERS
The proposed system will benefit investors, creditors, external agencies, regulators and account holders in decision making, as shown in Fig. 5.

ontology, and SPARQL query language for retrieving data from the ontology. We have used OWLAPI to import and manipulate our ontology in Eclipse³ (Java IDE). The different steps involved in the ontology engineering process are shown in Fig. 6.

² Protégé https://protege.stanford.edu/
³ Eclipse https://www.eclipse.org/
⁴ MCB https://www.mcb.com.pk/
B. DATA SET
For domain modeling, annual reports published in English by banks across the globe are collected from an online repository⁵. The concepts generated from these reports are modeled as tuples to help in guided information extraction of meaningful patterns. The ontological dataset is then applied to two major commercial banks of Pakistan as a proof of concept of adaptability.
The related financial information is extracted from the MCB Bank Limited Annual Report 2017, which contains brief information about the bank's finances, as shown in Fig. 7.
The related financial information is extracted from the United Bank Limited Annual Report 2017, which contains brief information about the bank's finances, as shown in Fig. 8.

UBL⁶ posted profit after tax (PAT) amounting to Rs. 25.4 billion during the year ended December 31, 2017 compared to Rs. 27.7 billion in 2016. Earnings per share were reported at Rs. 20.77 in 2017 against Rs. 22.65 per share last year. Profit before tax (PBT) closed at Rs. 40.2 billion in 2017 compared to Rs. 46.0 billion in 2016. The consolidated PAT stood at Rs. 26.2 billion in 2017 (2016: Rs. 28.0 billion) with earnings per share recorded at Rs. 21.39 (2016: Rs. 22.70). Gross revenues stood at Rs. 78.6 billion (Dec’16: Rs. 80.7 billion). Despite the low interest rate regime, growth in the balance sheet maintained net markup income in line with the 2016 level to close at Rs. 56.4 billion. Non-markup Income decreased by 6% year on year to reach Rs. 22.2 billion in 2017 but mainly due to lower capital gains and dividends. The cost to income ratio increased from 39.6% in 2016 to 45.0%.

FIGURE 8. Financial Information Extracted from United Bank Limited Annual Report [54]

C. DEFINE DOMAIN AND SCOPE
Keeping in view the data available in the annual reports, competency questions from the dataset are used to define ontological concepts/properties and limit the system scope.
• Name the bank having most after-tax profitability in year YYYY?
• Banks whose Market Price per Share @Year Start > @YearEnd OR whose stock price has increased this year
• Which bank pays the highest dividend to his stockholders?
• List banks whose stock/share market price increased over last three years?
• Name the bank having most before-tax profitability in YYYY?
• Which bank paid the highest tax amount in year YYYY?
• Which bank had the highest dividend payout ratio?
• Name the bank having highest capital reserve ratio in YYYY.
• What is Profitability After Tax of XXXX(bank name) in YYYY?
• What was the long term rating of ABL in YYYY?
• How many banks have rating “Extremely Strong” in a long term?
• Will UBL default in next year?
• XXX audited which banks this year?
• How many banks were given Extremely Strong long term rating this year by PACRA?
• Which banks have stable profitability ratios over a period of time?
• Who won Best Local bank award in YYYY?
• Who gave Best Investment Bank Award in YYYY?
• Which banks’ stock are safer to invest in?
    - EPS > 0 in last year
    - Long term rating Extremely Strong or Strong
    - Short term rating High
    - Current year Stock price > Last year stock price
• Which banks stocks are risky but yield is higher
    - Long term rating NOT(Extremely Strong or Strong)
    - Short term rating NOT High
    - (Current year Stock price - Last year stock price)/ Last year stock price > some threshold percentage
• Which bank Net Interest Income has increased this year?
• Does MCB has Islamic Window?
• What is the currency of USD bond?
• Which bonds are foreign government bonds?
• What type of dividend did bank XXX gave to its stock holders in YYYY?
• Did UBL invested in Manufacturing Sector in YYYY?
• MCB invested in which Government bonds this year?
• What was UBL total investment in Bonds in YYYY?
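The criteria attached to the "safer to invest in" and "risky but yield is higher" competency questions can be read as simple predicates over a bank's yearly record. The Python sketch below is one possible encoding under stated assumptions: the BankYear class, its field names, the sample banks, and the 10% default threshold are all hypothetical, since the text only specifies "some threshold percentage".

```python
from dataclasses import dataclass

# Ratings treated as strong by the competency questions.
STRONG_RATINGS = {"Extremely Strong", "Strong"}


@dataclass
class BankYear:
    """Hypothetical record of the attributes the criteria refer to."""
    name: str
    eps_last_year: float
    long_term_rating: str
    short_term_rating: str
    price_current: float
    price_last_year: float


def is_safer(bank):
    """All four 'safer to invest in' criteria must hold."""
    return (
        bank.eps_last_year > 0
        and bank.long_term_rating in STRONG_RATINGS
        and bank.short_term_rating == "High"
        and bank.price_current > bank.price_last_year
    )


def is_risky_high_yield(bank, threshold=0.10):
    """Weak ratings combined with price growth above an assumed threshold."""
    growth = (bank.price_current - bank.price_last_year) / bank.price_last_year
    return (
        bank.long_term_rating not in STRONG_RATINGS
        and bank.short_term_rating != "High"
        and growth > threshold
    )


# Invented sample data, used only to exercise the predicates.
banks = [
    BankYear("Bank A", 5.0, "Strong", "High", 120.0, 100.0),
    BankYear("Bank B", 1.0, "Moderate", "Medium", 150.0, 100.0),
]
safer = [b.name for b in banks if is_safer(b)]
risky = [b.name for b in banks if is_risky_high_yield(b)]
```

In a graph database the same conditions would be expressed as filters on node properties; the predicates here only make the logic of the two questions explicit.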
⁵ https://www.annualreports.com/
⁶ UBL https://www.ubldigital.com/
E. DATA AND OBJECT PROPERTIES
Object properties serve as relationships in the KG, and data properties serve as entity attributes. The details are shown in Fig. 9.

F. OWL API FOR ONTOLOGY IMPORT AND QUERYING
SPARQL is used for querying the ontology. All queries were first validated in Protégé and then used in Java for information extraction from the ontology. Fig. 10 shows the SPARQL query for retrieving valid bank names, which are instances of the concept “Bank”, and Fig. 11 shows the ontology.

FIGURE 10. SPARQL for retrieving Bank names in Ontology

A. BASIC PROCESSING
In this phase, some basic processing is required to convert the character stream into a sequence of lexical items (words, phrases, and syntactic markers) so that the information can be consumed by later stages. Each document is passed through a sentence splitter and a tokenizer. The sentence splitter splits the text into sentences using a delimiter, and the tokenizer chops the input character stream into tokens, which can be words, numbers, identifiers, or punctuation (depending on the problem).
Entity Extraction: “UBL posted profit after tax amounting
Dates (Year only): Regular expression
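Year-only dates of this kind can be picked out with a regular expression. The Python sketch below is an assumed illustration rather than the pattern used in the described system: it matches four-digit years from 1900 to 2099 and, as an additional assumption, abbreviated forms such as "Dec’16", which it maps to the 2000s.

```python
import re

# Four-digit years between 1900 and 2099.
FULL_YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

# Abbreviated month plus two-digit year, e.g. Dec'16 or Dec’16.
SHORT_YEAR = re.compile(r"\b[A-Z][a-z]{2}['’](\d{2})\b")


def extract_years(text):
    """Return the years mentioned in the text, in order of appearance."""
    found = [(m.start(), int(m.group())) for m in FULL_YEAR.finditer(text)]
    # Assumption: two-digit years refer to the 2000s.
    found += [(m.start(), 2000 + int(m.group(1))) for m in SHORT_YEAR.finditer(text)]
    return [year for _position, year in sorted(found)]


first = extract_years(
    "UBL posted profit after tax (PAT) amounting to Rs. 25.4 billion during "
    "the year ended December 31, 2017 compared to Rs. 27.7 billion in 2016."
)
second = extract_years(
    "Gross revenues stood at Rs. 78.6 billion (Dec’16: Rs. 80.7 billion)."
)
```

Amounts such as "25.4" are not mistaken for years because the pattern requires exactly four digits beginning with 19 or 20, bounded on both sides.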
TABLE V.
STAKEHOLDER TYPE AND QUESTIONS
14 | Creditors/Rating Firms | What is Profitability After Tax of XXXX(bank name) in YYYY?
15 | Creditors/Rating Firms/Investor/StockHolders | What was the long term rating of ABL in YYYY?
16 | Creditors/Rating Firms/Investor/StockHolders | How many banks have rating “Extremely Strong” in a long term?
17 | Creditors/Rating Firms/General Public | Will UBL default in next year?
18 | Creditors/Rating Firms/Investor/StockHolders | Which banks have stable profitability ratios over a period of time?
19 | Creditors/Borrowers | Which bank Net Interest Income has increased this year?
20 | Depositor | Does MCB has Islamic Window?
21 | General public | Which bonds are foreign government bonds?
22 | Investor/StockHolders/Creditors | Did UBL invested in Manufacturing Sector in YYYY?
23 | Investor/StockHolders | MCB invested in which Govt bonds this year?
24 | Investor/StockHolders | What was UBL total investment in Bonds in YYYY?
25 | General public | HBL is located in which countries?

ACKNOWLEDGMENT
We are extremely obliged and thankful to Mohammed Ali Jinnah University, Karachi, and Islamic University of Madinah, for partially funding this work. We would like to thank the financial banks of local and foreign regions for providing access to their annual reports.

REFERENCES
[1] M. Sheikh and S. Conlon, "A rule-based system to extract financial information," Journal of Computer Information Systems, vol. 52, no. 4, pp. 10-19, 2012.
[2] A. S. Corrêa and P.-O. Zander, "Unleashing Tabular Content to Open Data: A Survey on PDF Table Extraction Methods and Tools," in Proceedings of the 18th Annual International Conference on Digital Government Research, 2017.
[3] R. Rastan, "Automatic Tabular Data Extraction and Understanding," 2017.
[4] J. Yan, C. Wang, W. Cheng, M. Gao and A. Zhou, "A retrospective of knowledge graphs," Frontiers of Computer Science, vol. 12, no. 1, pp. 55-74, 2018.
[5] Q. Wang, Z. Mao, B. Wang and L. Guo, Knowledge graph embedding: A survey of approaches and applications, 2017.
[6] L. Ehrlinger and W. Wöß, "Towards a Definition of Knowledge Graphs," in SEMANTiCS (Posters, Demos, SuCCESS), 2016.
[7] S. Choudhury, K. Agarwal, S. Purohit, B. Zhang, M. Pirrung, W. Smith and M. Thomas, "Nous: Construction and querying of dynamic knowledge graphs," in Data Engineering (ICDE), 2017 IEEE 33rd International Conference on, 2017.
[8] K. Veel, "Make data sing: The automation of storytelling," Big Data & Society, vol. 5, no. 1, p. 2053951718756686, 2018.
[9] R. Angles, "A comparison of current graph database models," in 2012 IEEE 28th International Conference on Data Engineering Workshops, 2012.
[10] Utpal Bhatt, "Addressing Key Challenges in Financial Services with Neo4j," [Online]. Available: https://neo4j.com/resources-old/neo4j-financial-services-white-paper/.
[11] Jesús Barrasa, "RDF Triple Stores vs. Labeled Property Graphs: What’s the Difference?," 2017. [Online]. Available: https://neo4j.com/blog/rdf-triple-store-vs-labeled-property-graph-difference/.
[12] M. P. Muñoz, A. Llaves and T. Kume, "Populating the FLE Financial Knowledge Graph," in International Semantic Web Conference (P&D/Industry/BlueSky), 2018.
[13] C. Leinemann, F. Schlottmann, D. Seese and T. Stuempert, "Automatic extraction and analysis of financial data from the EDGAR database," South African Journal of Information Management, vol. 3, no. 2, 2001.
[14] T. Stümpert, "Extracting Financial Data from SEC Filings for US GAAP Accountants," in Handbook on Information Technology in Finance, Springer, 2008, pp. 357-375.
[15] M. Atzmueller, P. Kluegl and F. Puppe, "Rule-Based Information Extraction for Structured Data Acquisition using TextMarker," in LWA, 2008.
[16] M. Göbel, T. Hassan, E. Oro and G. Orsi, "A methodology for evaluating algorithms for table understanding in PDF documents," in Proceedings of the 2012 ACM symposium on Document engineering, 2012.
[17] T. Loughran and B. McDonald, "Textual analysis in accounting and finance: A survey," Journal of Accounting Research, vol. 54, no. 4, pp. 1187-1230, 2016.
[18] A. C. e Silva, A. Jorge and L. Torgo, "Automatic selection of table areas in documents for information extraction," in Portuguese Conference on Artificial Intelligence, 2003.
[19] D. C. Wimalasuriya and D. Dou, Ontology-based information extraction: An introduction and a survey of current approaches, Sage Publications Sage UK: London, England.
[20] N. F. Noy, D. L. McGuinness and others, Ontology development 101: A guide to creating your first ontology.
[21] E. Arendarenko and T. Kakkonen, "Ontology-based information and event extraction for business intelligence," in International Conference on Artificial Intelligence: Methodology, Systems, and Applications, 2012.
[22] J. Pujara, "Extracting Knowledge Graphs from Financial Filings," ACM Reference, vol. 2, 2017.
[23] W. Shen, J. Wang, P. Luo and M. Wang, "A graph-based approach for ontology population with named entities," in Proceedings of the 21st ACM international conference on Information and knowledge management - CIK.
[24] J. Pujara, H. Miao, L. Getoor and W. W. Cohen, "Ontology-aware partitioning for knowledge graph identification," in Proceedings of the 2013 workshop on Automated knowledge base construction, 2013.
[25] J. Pujara, H. Miao, L. Getoor and W. W. Cohen, "Large-scale knowledge graph identification using psl," in 2013 AAAI Fall Symposium Series, 2013.
[26] V. Lopez, P. Tommasi, S. Kotoulas and J. Wu, "QuerioDALI: question answering over dynamic and linked knowledge graphs," in International Semantic Web Conference, 2016.
[27] Paul Nelson, "Natural Language Processing (NLP) Techniques for Extracting Information," [Online]. Available: https://www.searchtechnologies.com/blog/natural-language-processing-techniques.
[28] Z. Zhao, S.-K. Han and I.-M. So, "Architecture of Knowledge Graph Construction Techniques," 2018.
[29] H. Paulheim, "Knowledge graph refinement: A survey of approaches and evaluation methods," Semantic web, vol. 8, no. 3, pp. 489-508, 2017.
[30] J. R. Trigueros, "Extracting earnings information from financial statements via genetic algorithms," in Computational Intelligence for Financial Engineering, 1999.(CIFEr) Proceedings of the IEEE/IAFE 1999 Conference on, 1999.
[31] Jo Stichbury, "WTF is a knowledge graph?," 2017. [Online]. Available: https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f.
[32] Kristin Veel, "Make data sing: The automation of storytelling," [Online]. Available: https://journals.sagepub.com/doi/abs/10.1177/2.
[33] B. Yildiz, K. Kaiser and S. Miksch, "pdf2table: A method to extract table information from pdf files," in IICAI, 2005.
[34] A. Shigarov, A. Mikhailov and A. Altaev, "Configurable Table Structure Recognition in Untagged PDF Documents," in Proceedings of the 2016 ACM Symposium on Document Engineering, New York, NY, USA, 2016.
[35] M. Rospocher, M. Van Erp, P. Vossen, A. Fokkens, I. Aldabe, G. Rigau, A. Soroa, T. Ploeger and T. Bogaard, "Building event-centric knowledge graphs from news," Journal of Web Semantics, 2016.
[36] R. Rastan, H.-Y. Paik and J. Shepherd, "TEXUS: a task-based approach for table extraction and understanding," in Proceedings of the 2015 ACM Symposium on Document Engineering, 2015.
[37] J. Pujara, H. Miao, L. Getoor and W. Cohen, "Knowledge graph identification," in International Semantic Web Conference, 2013.
[38] W. Liu, J. Liu, M. Wu, S. Abbas, W. Hu, B. Wei and Q. Zheng, "Representation learning over multiple knowledge graphs for knowledge graphs alignment," Neurocomputing, vol. 320, pp. 12-24, 2018.
[39] G. Ji, S. He, L. Xu, K. Liu and J. Zhao, "Knowledge Graph Embedding via Dynamic Mapping Matrix," in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
[40] S. Zwicklbauer, C. Seifert and M. Granitzer, "DoSeR - A knowledge-base-agnostic framework for entity disambiguation using semantic embeddings," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016.
[41] Neo4j, "White Paper: The Power of Graph-Based Search," 2015.
[42] Y. Jia, Y. Qi, H. Shang, R. Jiang and A. Li, "A Practical Approach to Constructing a Knowledge Graph for Cybersecurity," Engineering, vol. 4, no. 1, pp. 53-60, 2018.
[43] Wang Z., Guo M., Li Z., Tang M., Yu J. (2020), “Knowledge Graph Construction for Payment Data Risk Control,” In: Yang CT., Pei Y., Chang JW. (eds) Innovative Computing. Lecture Notes in Electrical Engineering, vol 675. Springer, Singapore, pp. 1901-1907.
[44] Astrova I. (2020), “How the Anti-TrustRank Algorithm Can Help to Protect the Reputation of Financial Institutions,” In: Dalpiaz F., Zdravkovic J., Loucopoulos P. (eds) Research Challenges in Information Science. RCIS 2020. Lecture Notes in Business Information Processing, vol 385. Springer, Cham, pp. 503–508.
[45] Jiangtao Ren, Jiawei Long, Zhikang Xu, “Financial news recommendation based on graph embeddings,” Decision Support Systems, vol 125, October 2019, 113115.
[46] Yun H., He Y., Lin L., Pan Z., Zhang X. (2019), “Construction Research and Application of Poverty Alleviation Knowledge Graph,” In: Ni W., Wang X., Song W., Li Y. (eds) Web Information Systems and Applications, WISA 2019. Lecture Notes in Computer Science, vol 11817. Springer, Cham, pp. 430-442.
[47] N. Yerashenia and A. Bolotov, "Computational Modelling for Bankruptcy Prediction: Semantic Data Analysis Integrating Graph Database and Financial Ontology," 2019 IEEE 21st Conference on Business Informatics (CBI), Moscow, Russia, 2019, pp. 84-93, doi: 10.1109/CBI.2019.00017.
[48] Z. Chen, S. Yin and X. Zhu, "Research and Implementation of QA System Based on the Knowledge Graph of Chinese Classic Poetry," 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 2020, pp. 495-499, doi: 10.1109/ICCCBDA49378.2020.9095587.
[49] Wang, Shaokai and Wang, Ziao and Zhang, Xiaofeng, “基于知识图谱的智能投资决策 (Intelligent Investment Decision Based on Knowledge Graph),” 2019. Available at: https://ssrn.com/abstract=3449419 (Accessed: 09 November, 2020).
[50] S. Hu, L. Zou, J. X. Yu, H. Wang and D. Zhao, "Answering Natural Language Questions by Subgraph Matching over Knowledge Graphs," in IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 5, pp. 824-837, 1 May 2018, doi: 10.1109/TKDE.2017.2766634.
[51] Verhoef, P., Kooge, E., Walk, N. (2015). Creating Value with Big Data Analytics. London: Routledge, https://doi.org/10.4324/9781315734750
[52] lemon: An Ontology-Lexicon model for the Multilingual Semantic Web [Online]. Available: https://www.w3.org/International/multilingualweb/madrid/slides/declerck.pdf Accessed on: Jan 19, 2021.
[53] MCB Bank Limited, Financial Statements For the year ended December 31, 2017. [Online]. Available: https://www.mcb.com.pk/assets/Accounts%20Dec%202017.pdf Accessed on: Jan 19, 2021.
[54] United Bank Limited, 2017 Annual Report. [Online]. Available: https://www.ubldirect.com/corporate/resources/ubl/aboutus/financial_report/report_2017/UBLAnnualReport2017.pdf Accessed on: Jan 19, 2021.
MS. SAMREEN ZEHRA received the B.S. degree in computer science from the University of Karachi, Karachi, Pakistan, in 2005 and the M.S. degree in computer science from Mohammad Ali Jinnah University, Karachi, Pakistan, in 2019.
Since 2008, she has been a Senior Software Engineer with the National Bank of Pakistan, Pakistan. She has vast experience in application development using the best technical practices, and her research interests include the efficient use of knowledge management and retrieval techniques to facilitate decision making for stakeholders in the financial industry.
Ms. Zehra's previous conference paper was about Ontology-based Sentiment Analysis. She has also been a member of the Australian Computer Society since July 2018.

DR. SYED IMRAN JAMI received his BS in Computer Science from the University of Karachi in 2000, MS in Computer Science from Lahore University of Management Sciences in 2004 and PhD in Computer Science from the National University of Computer & Emerging Sciences in 2011.
He is one of the founding members of the Centre for Research in Ubiquitous Computing and has been associated with it since 2006. He also worked with the Haptics Research Lab and the Pervasive and Networked Systems Research Group at Deakin University, Australia.
Dr. Jami has authored sixteen journal papers and ten conference papers. He has worked on several funded research projects and supervised 12 Graduate and Doctoral students.