
Financial Knowledge Graph Based Financial Report Query System


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3077916, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.Doi Number

Financial Knowledge Graph based Financial Report Query System (January 2021)

MS Samreen Zehra¹, Syed Farhan Mohsin¹, Shaukat Wasi¹, Syed Imran Jami¹, Muhammad Shoaib Siddiqui², and SM Khaleeq ur Rehman Raazi¹
¹ Dept. of Computer Science, Mohammad Ali Jinnah University, Karachi, Pakistan
² Faculty of Computer and Information Systems, Islamic University of Madinah, KSA

Corresponding author: Syed Farhan Mohsin, Dr. Shaukat Wasi (e-mail: farhan.mohsin@duhs.edu.pk, shaukat.wasi@jinnah.edu).

ABSTRACT Annual financial reports are the principal means by which the banking sector publishes its financial statistics. Extracting useful information from these complex and lengthy reports involves a manual process to resolve financial queries, resulting in delays and ambiguity in investment decisions. One of the major reasons is the lack of any standardization in the format and vocabulary used in the reports. An automated system for the resolution of intelligent financial queries is therefore difficult to design.

Several works have been proposed to overcome these problems using Information Extraction; however, they do not address the semantic interoperability of the reports across different institutions. This work proposes an automated querying engine that answers financial queries using Ontology-Based Information Extraction. For semantic modeling of financial reports, a Financial Knowledge Graph, assisted by a Financial Ontology, is proposed. The nodes are populated with entities, while links are populated with relationships, using Information Extraction applied to annual reports. The system provides two benefits to stakeholders through automation: decision making through queries and generation of custom financial stories. The work can further be extended to other domains, including healthcare and academia, where physical reports are used for communication.

INDEX TERMS Ontology, Financial Knowledge Graph, Information Extraction

I. INTRODUCTION

Stakeholders seek information regarding a company's profile and its general financial standing through various channels before taking any decision. They usually restrict their research to the financial indicators mentioned in its financial filings, such as revenues, net profit, earnings per share (EPS), price to earnings ratio (PE ratio), and credit ratings. The required information is widely dispersed in the quarterly and annual reports declared by companies; therefore, it is difficult for investors to read and interpret the financial implications mentioned therein. The other problem is that almost every company produces a bulky report, and it is quite cumbersome for the stakeholders to go through it. Yet experts argue that investors should study the management discussion and analysis along with the directors' report to get a clear understanding of the current state of affairs of a business.

Automatic information extraction from these financial disclosures is hard, owing to the lack of boundaries between the items to be extracted, context dependence of the target entities, language pattern variations, and limitations of statistical methods [1]. Another problem in information extraction from these financial datasets is that they are usually available as unstructured text or as PDF, which involves meticulous manual preprocessing or the application of sophisticated ETL (extract, transform, load) tools in order to ingest data automatically [2] [3]. This step is done manually in our research work, and the resulting data is stored in separate text files for each entity. The system scope is defined after analyzing the dataset and competency questions.

The integration of information extraction and the semantic web helps in extracting related information from heterogeneous data formats and from multiple sources in the desired format. The addition of a new node or relation, or the deletion of an existing node, has no effect on the consistency of the information schema. It has the flexibility to gather, exchange, and update the information from different sources; the new nodes and the extracted


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

information is easily adjustable in the existing format without disturbing the structure of the ontology [47].

We employ a knowledge graph (KG) in this research for several reasons. Firstly, it does not necessarily require a semantic layer to describe the entity model, as relational databases do [4]. Secondly, the schema is flexible and easily adjustable, so it is always easy to add new properties to an existing record or modify the schema without affecting other graph entities [5]. Highly variable, incomplete, or dynamic data can be represented by property graph stores, which consume less space and support attribute/link discovery. Finally, graph databases can answer queries that span multiple entities by graph traversal. Only those nodes that are reachable for the query are explored by the graph database engine. Because every record is handled individually, this drastically boosts query performance and helps reduce the resource cost of the query results [6] [7].

Our knowledge graph will assist an investor by generating financial stories that can aid decision making. The further plan is to generate text-based financial stories and then extend them to visualizations/graphics as proposed by [8]; the whole concept is shown in Fig. 1.

FIGURE 1. Sweet spot of Data, Story and Visual [51].

Following are the major contributions of this work: (A) Integration of Information Extraction with the Semantic Web; (B) A proposed Financial Knowledge Graph to model the domain of Financial Systems; (C) Mapping Financial Reports in the domain using an Ontology; (D) Extending manual financial reports to be machine readable using Information Extraction.

II. RELATED WORK

A. INFORMATION EXTRACTION
In recent years, most financial information extraction work has been done on a specific reporting standard known as XBRL (eXtensible Business Reporting Language), which is a freely available and global framework for exchanging business information. In 2011, the Securities and Exchange Commission (SEC) mandated XBRL as the filing standard for all US public companies. A rule-based information extraction methodology was introduced in [1] for the extraction of highly accurate financial information to aid investment decisions. The authors trained two different rule-based symbolic learning models, using the Tabu Search and Greedy Search algorithms, and evaluated their performance using financial filings. In another study [13], the authors implemented a software agent that extracts fundamental company data from the Electronic Data Gathering, Analysis and Retrieval (EDGAR) database of the United States Securities and Exchange Commission (SEC) and outputs this data in a format that is useful to support stock market trading decisions. EDGAR is a specialized database that stores information as provided by companies in the 10-K or XBRL formats [14]. A two-step approach was proposed in [15] to perform rule-based text extraction and acquisition of structured data from unstructured text. In this work, we are working with annual financial disclosures, which require data extraction from PDF files that include tables, graphics, and structured and unstructured text [2], [16], [17], [18].

B. ONTOLOGY BASED INFORMATION EXTRACTION
In Ontology-Based Information Extraction (OBIE), the information extraction process is assisted by ontologies [19]. An ontology is defined as a formal and explicit specification of a shared conceptualization and is usually specific to a knowledge domain [20]. As the task of information extraction also deals with retrieving information for a particular domain, an ontology is one of the candidate solutions in information extraction [21]. Researchers have been using ontology-based mechanisms for extracting required information from unstructured or semi-structured natural language text [21] [22]. An ontology model is developed for the mobile payment data risk control domain in [43]. The model takes the user as an entity and the operation/transaction as a relationship, and gathers the data at separate timestamps to fulfill the requirements of the financial risk control domain.

To overcome the problem of heterogeneous data, an ontology related to the poverty alleviation domain is constructed in [46]. This ontology is further used to create the nodes and edges of the knowledge graph. Visualization techniques are applied to the knowledge graph, which helps in providing the results of different queries related to poverty alleviation. A Bankruptcy Prediction Computational Model (BPCM) is presented in [47], which is used to perform bankruptcy predictions for financial institutions or companies. An Ontology of Bankruptcy Prediction (OBP) is constructed to uniformly extract the data from different data sources and to utilize the financial data of the companies. A Semantic Analysis Graph Database (SAGRADA) is created, which consumes the OBP ontology, while the graph database is used for storing and visualizing the data.

C. KNOWLEDGE GRAPH CONSTRUCTION
A knowledge graph (KG) is a semantic graph consisting of vertices (or nodes) and edges. The vertices represent concepts or entities. The edges represent the semantic relationships between concepts or entities [6]. By exploiting a KG, partially observed entities and concepts can be


connected together to form a complete and structured knowledge repository [4].

For a knowledge graph, an ontology usually defines the architecture and constraints for the data residing in it. Ontology-assisted financial knowledge graph (KG) population deals with attaching each detected named entity to the correct label/category. When a named entity that has no ontological mapping defined is detected in the unstructured text, the right node category is sought in the KG to attach the entity; this task is known as fine-grained named entity classification [23]. Otherwise, if the desired entity mapping exists in the ontology, the aim of this task is to link the detected entity mention with its corresponding real-world entity in the knowledge graph (KG), which is known as the entity linking task.

An extensive comparison survey of well-known graph databases was conducted by the authors in [9]. The results are summarized in Table I. Owing to the powerful features and flexibility of Neo4j¹, we selected it for our knowledge graph implementation.

¹ https://neo4j.com/product/

TABLE I
GRAPH DATABASE DATA STRUCTURES [9]
Column groups: Graph Type (Simple, Hypergraph, Nested, Attributed); Nodes (Node labeled, Node attribution); Edges (Directed, Edge labeled, Edge attribution).
Graph databases compared: AllegroGraph, DEX, Filament, G-Store, HyperGraphDB, InfiniteGraph, Neo4j, Sones, vertexDB.

Knowledge graph identification (KGI) is a technique for knowledge graph construction that jointly reasons about entities, attributes, and relations in the presence of uncertain inputs and ontological constraints [24]. Candidate facts from an information extraction system can be represented as an extraction graph, where entities are nodes, categories are labels associated with each node, and relations are directed edges between the nodes [25]. In another work [22], the authors studied financial documents for populating knowledge graphs with financial entities and their interrelationships. They presented experimental results and discussed knowledge graph (KG) construction techniques on financial filings along with their challenges and possible solutions.

Reference [43] utilizes the knowledge graph to represent transaction data visually, which helps in reducing financial fraud in the domain of financial risk control. Reference [44] claims that an Anti-TrustRank algorithm based on the knowledge graph data of financial institutions can also be used for anti-money-laundering purposes. The algorithm considers the web as a graph, with pages and the links between pages as nodes and edges, respectively; it assists financial institutions in finding money launderers and helps to protect them from money laundering. Reference [45] proposed a financial news recommendation framework, based on the NNR and INNR models, which uses the knowledge graph for financial news recommendations. The edges of the graph are updated during stock market trading through INNR and after the stock market closes through NNR. Both models are then combined to attain better, more accurate, and more efficient financial news recommendations. The idea of extracting financial news specifically related to the Chinese stock market from different Chinese encyclopedias and financial news websites is introduced in [49]. According to the author, the news sets the stock market sentiment. An ontology is created, and the financial knowledge graph is used to construct the relationships between the stock entities from the financial news, which is further used to analyze and identify, in a timely manner, the impact of the news on different stock prices and the different possible stock risks involved.

D. QUESTION ANSWERING
A question answering system based on a knowledge graph of classical Chinese poetry is proposed in [48]. The poetry-related information is extracted from a classical Chinese poetry website, and the knowledge graph is constructed on the basis of this data and stored in the database. The Rasa framework, a natural language processing framework, is adapted to answer the queries of the user. A graph data-driven framework is proposed in [50], which answers natural language questions using an RDF graph repository. In the first step, the authors translate the natural language questions into SPARQL; in the next step, all the translated SPARQL queries are evaluated by the system, which provides the answer to the question. IBM researchers proposed a question answering system [26] that sequentially performs linguistic analysis of the query, named entity extraction, entity/graph search, and fusion and ranking of possible answers. Our research follows a similar approach.

E. GAP ANALYSIS
The previous subsections identified the research background in different related areas, which shows that related work has been performed in parts by the Semantic Web, Information Extraction, and Visualization communities. However, the existing research literature has not provided any complete system that extends flat financial reports to machine readability to make them queryable. With the complex


nature of these reports, the manual process of finding relevant information for investment is a quite tedious and cumbersome task. This work is an attempt to integrate information extraction work with the semantic web on financial reports using a knowledge graph.

TABLE II
GAP ANALYSIS
Paper | Year | Information extraction from unstructured / structured financial text | Entity resolution / entity linking / relation extraction using ontology / rule-based / LOD | Construction of financial KG and KG identification | Ontology enrichment / knowledge discovery using KG | Q/A using KG | Story telling
[12] | 2018 | H | H | H | H | NH | H
[42] | 2018 | H | H | H | H | NH | NH
[22] | 2017 | H | NH | H | H | NH | NH
[28] | 2018 | NH | Ontology for top down + LOD bottom-up | H | H | H | H
[24] | 2013 | H | Ontology | H | H | NH | NH
[1] | 2012 | H | NH | NH | NH | NH | NH
[21] | 2012 | H | Rule based for ER / Ontology for event detection | H | NH | NH | NH
[37] | 2013 | NH | Ontology | H | H | NH | NH
[26] | 2016 | NH | LOD (WikiData) | NH | H | H | H
[32] | 2018 | NH | NH | NH | NH | H | NH
[3] | 2017 | NH | NH | NH | NH | NH | NH
[23] | 2012 | H | Ontology | H | NH | NH | NH
[7] | 2017 | NH | NH | H | H | NH | H

Table II shows the limitations of existing research on financial knowledge graph based financial query systems. It shows that only a very limited amount of research has been done in this domain. In Table II, H stands for Handled and NH stands for Not Handled. In most of the research papers, the information extraction part is missing because extracting the related information from curated, heterogeneous, and complex data is still a challenging task. The construction of a financial knowledge graph for the purpose of querying, visualization, and producing results and stories is also not common among researchers. The facility to get answers to users' questions and queries in natural language is also an important factor. This factor is also not commonly provided by researchers, as shown in Table II.

Having found the above deficiencies, we propose a novel approach in this domain for extracting financial information from the annual disclosures of different banks using an ontology and storing it in an efficient Financial Knowledge Graph (FKG) for future refinements, agility, and fact discovery. The graph can be queried to answer user queries and will be able to generate user stories according to the needs of different users.

III. METHODOLOGY

A. PROJECT PHASES BREAKDOWN
We have divided the research work broadly in three phases. All the phases are discussed separately in this paper.

1) DEFINING COMPETENCY QUESTIONS
For our knowledge graph (KG) to answer the competency questions, the following steps were needed:
• Find information resources that can provide valuable information
• Gather information
• Translate information into machine readable form
• Extract the directors' report
• Study and analyze the data set

2) ONTOLOGY ENGINEERING
Keeping the competency questions from the previous step in mind, enumerate all the terms that should be available in the ontology:
• Define concepts/classes
• Define object properties and data properties
• Populate static instances into the ontology, such as bank names, that will help in the information extraction phase
• Define axioms/constraints
The sample competency questions are shown in Fig. 2.
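The ontology engineering steps listed under phase 2 can be illustrated with a small, hedged sketch. It is a plain-Python stand-in, not the Protégé/OWL ontology built in this work: the `Ontology` class, the `Bank` and `RatingFirm` subclasses, and the property names `hasLongTermRating` and `profitAfterTax` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy stand-in for an OWL ontology; every name used here is illustrative."""
    classes: dict = field(default_factory=dict)            # class -> parent class (or None)
    object_properties: dict = field(default_factory=dict)  # property -> (domain, range)
    data_properties: dict = field(default_factory=dict)    # property -> (domain, Python type)
    instances: dict = field(default_factory=dict)          # instance -> class

    def add_class(self, name, parent=None):
        self.classes[name] = parent

    def add_instance(self, name, cls):
        # Axiom-style constraint: an instance must belong to a declared class.
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.instances[name] = cls

    def ancestors(self, cls):
        """Return the direct and indirect super classes of cls."""
        chain = []
        parent = self.classes.get(cls)
        while parent is not None:
            chain.append(parent)
            parent = self.classes.get(parent)
        return chain


onto = Ontology()
# Step 1: define concepts/classes (Bank and RatingFirm are assumed subclasses).
onto.add_class("Organization")
onto.add_class("Bank", parent="Organization")
onto.add_class("RatingFirm", parent="Organization")
onto.add_class("Rating")
# Step 2: define object and data properties (hypothetical names).
onto.object_properties["hasLongTermRating"] = ("Bank", "Rating")
onto.data_properties["profitAfterTax"] = ("Bank", float)
# Step 3: populate static instances such as bank names.
for bank in ("MCB", "UBL", "ABL"):
    onto.add_instance(bank, "Bank")
onto.add_instance("PACRA", "RatingFirm")
```

Keeping the class-to-parent map explicit makes it cheap to collect the super classes of every known instance, which is the information a gazetteer for the extraction phase would need.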


SAMPLE COMPETENCY QUESTIONS
► Name the bank having the most after-tax profitability in year YYYY.
► Name the bank having the most before-tax profitability in YYYY.
► Which bank paid the highest tax amount?
► Name the banks whose profitability has declined in YYYY compared with the previous year.
► Which bank had the highest dividend payout ratio in XXXX?
► Name the bank having the highest capital reserve ratio in XXXX.
► Name the banks that satisfy the Capital Reserve requirement as set by SBP.
► Rate the bank in terms of the profitability and sustainability factors.
► Which banks have long term rating AAB or higher?
► Which banks have short term rating A1+ or higher?
► Who did Bank X audit in YYYY?

FIGURE 2. Sample Competency Questions

3) ONTOLOGY-BASED INFORMATION EXTRACTION
The steps involved in ontology-based information extraction are mentioned below:
• Study and analyze the data set
• Define stop words
• Document preprocessing
• Extract all the terms from the ontology that are known instances/entities and can be directly mentioned in the text, along with their direct and indirect super classes. This will serve as a gazetteer list.
• Extract relationship names between two entities using the object properties of an ontological concept.
• Apply rule-based information extraction techniques with the supporting information found in the previous two steps. A set of rules was manually crafted and implemented to extract each target.

4) KNOWLEDGE GRAPH CREATION
This phase overlaps with the previous phase, as the information extracted from the text and the ontology helps in knowledge graph (KG) creation, that is, knowledge graph population with appropriate nodes, relationships, and labels.

5) QUESTION/ANSWERING AND STORY TELLING
This phase involves validating whether the knowledge graph can answer the queries well. It involves extracting target entities from the query, converting it into a graph querying statement, extracting results from the graph, and displaying the results.
The different phases involved in the construction of the Financial Knowledge Graph from banks' annual disclosures are shown in Fig. 3.

FIGURE 3. Phases of Construction of Financial Knowledge Graph from Banks' Annual Disclosures

6) KNOWLEDGE GRAPH ENRICHMENT
This occurs when a stakeholder asks for a piece of information that is not mapped to an entity. The ontology will be updated in that case, and the documents will be rescanned to enrich the knowledge graph (KG) with newer nodes and relationships. In case the information is new, its attributes will be analyzed and it will be added to the existing financial knowledge graph (KG).

B. DATA SOURCES
Primarily, we ingest information from financial filings; however, further data sources can add value to knowledge graph (KG) enrichment. We have identified the following three types of potential data sources [52].

1) SEMI-STRUCTURED
• Annual reports.
• Longer company profiles.
• Imprint information on company web pages.
• Running tickers on company information.

2) STRUCTURED SOURCES
• Publicly available balance sheets in structured format.
• Short company profiles (e.g. from business registers, stock exchanges, web pages, etc.).
• Wikipedia Infoboxes.

3) UNSTRUCTURED
• Annexes to balance sheets in annual reports of companies.
• Newspapers.
• Specialized web pages etc.
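Knowledge graph creation (phase 4) and question answering by graph traversal (phase 5) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the Neo4j implementation used in this work: the `KnowledgeGraph` class, the `REPORTED` relationship, the `FinancialYear` label, and the function `most_profitable_bank` are hypothetical. The figures are the 2017 values quoted in the annual-report excerpts of Figs. 7 and 8.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory property graph (a stand-in for a graph database)."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., properties}
        self.edges = defaultdict(list)  # source id -> [(relationship, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbours(self, source, relationship):
        return [t for rel, t in self.edges[source] if rel == relationship]


kg = KnowledgeGraph()
# Population: nodes, relationships, and labels (amounts in Rs. billion).
kg.add_node("MCB", "Bank")
kg.add_node("UBL", "Bank")
kg.add_node("MCB-2017", "FinancialYear", year=2017, pat=22.46, pbt=31.01)
kg.add_node("UBL-2017", "FinancialYear", year=2017, pat=25.4, pbt=40.2)
kg.add_edge("MCB", "REPORTED", "MCB-2017")
kg.add_edge("UBL", "REPORTED", "UBL-2017")


def most_profitable_bank(graph, year, metric="pat"):
    """Answer 'Name the bank having most after-tax profitability in year YYYY'."""
    best_bank, best_value = None, float("-inf")
    for node_id, props in graph.nodes.items():
        if props["label"] != "Bank":
            continue
        # Traverse only the reports reachable from this bank node.
        for target in graph.neighbours(node_id, "REPORTED"):
            report = graph.nodes[target]
            if report["year"] == year and report[metric] > best_value:
                best_bank, best_value = node_id, report[metric]
    return best_bank


print(most_profitable_bank(kg, 2017))
```

Passing `metric="pbt"` answers the before-tax variant of the same competency question; in a graph database the same traversal would be expressed as a graph query statement rather than a Python loop.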


C. PROPOSED SYSTEM ARCHITECTURE
The proposed system architecture, which presents how user queries are processed and how the system generates the results, is shown in Fig. 4.

FIGURE 4. Proposed System Architecture

D. POTENTIAL USERS
The proposed system will benefit investors, creditors, external agencies, regulators, and account holders in decision making, as shown in Fig. 5.

FIGURE 5. Stakeholders of the proposed System

IV. ONTOLOGY DEVELOPMENT

A. INTRODUCTION
Ontology-Based Information Extraction (OBIE) is exploited in this research. An ontology develops a shared understanding of a domain by building a common vocabulary. An ontology comprises concepts, instances, object/data properties, and sometimes other existing ontologies [20]. There are several tools available for ontology engineering. In this work, we have used Protégé², a user-friendly and widespread tool for editing and developing ontologies. It hides the underlying complexity of domain modeling and enables users to focus on the domain knowledge in terms of real-world entities, inter-entity relationships, and constraints. Protégé ontologies can be effortlessly exported into multiple formats, such as Resource Description Framework (RDF), Turtle, and Web Ontology Language (OWL). In addition to this, we have used the VOWL and OntoGraf plugins to visualize the ontology, and the SPARQL query language for retrieving data from the ontology. We have used OWLAPI to import and manipulate our ontology in Eclipse³ (Java IDE). The different steps involved in the ontology engineering process are shown in Fig. 6.

FIGURE 6. Ontology Engineering Process

MCB⁴ Bank Limited reported Profit Before Tax (PBT) of Rs.31.01 billion and Profit After Tax (PAT) of Rs. 22.46 billion. In comparison with the last year, Profit Before Tax has decreased by 14.03% whereas Profit After Tax has increased by 2.59% on account of reversal of prior year tax charges. Net markup income of the Bank was reported at Rs. 42.41 billion, down by 3.21% over last year owing to the maturity of high yielding bonds and comparative low-interest rate environment. On the gross markup income side, the Bank reported an increase of Rs. 6.69 billion whereas on the interest expense side, the Bank registered an increase of Rs. 8.09 billion over last year. To supplement its net interest margins, the Bank remained focused on increasing its low-cost deposit base and venture in higher-yielding assets. The Board of Directors declared a final cash dividend of Rs.4 per share for the year ended December 31, 2017, which is in addition to Rs. 12.0 per share interim dividends already paid to shareholders, taking the dividend payout ratio to 83.14%. The effect of the recommendation is not reflected in the above appropriations.

FIGURE 7. Financial Information Extracted from MCB Bank Limited Annual Report [53]

² Protégé https://protege.stanford.edu/
³ Eclipse https://www.eclipse.org/
⁴ MCB https://www.mcb.com.pk/
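As an illustration of how gazetteer terms and a hand-crafted rule can pull indicators out of text such as the Fig. 7 excerpt, the following sketch applies one regular-expression rule to its first sentence. It is an assumption-laden example rather than the rule set used in this work: the `GAZETTEER` mapping, the property names `profitBeforeTax` and `profitAfterTax`, and the function `extract_indicators` are hypothetical.

```python
import re

# Gazetteer: surface forms mapped to hypothetical ontology property names.
GAZETTEER = {
    "Profit Before Tax": "profitBeforeTax",
    "Profit After Tax": "profitAfterTax",
}

# Rule: <indicator> (<ABBREVIATION>) of Rs. <amount> billion
AMOUNT_RULE = r"\s*\([A-Z]+\)\s+of\s+Rs\.\s*(\d+(?:\.\d+)?)\s+billion"


def extract_indicators(text):
    """Return {property name: amount in Rs. billion} for gazetteer hits."""
    # Collapse line breaks so the rule can match across wrapped lines.
    text = " ".join(text.split())
    found = {}
    for surface, prop in GAZETTEER.items():
        match = re.search(re.escape(surface) + AMOUNT_RULE, text)
        if match:
            found[prop] = float(match.group(1))
    return found


sentence = (
    "MCB Bank Limited reported Profit Before Tax (PBT) of Rs.31.01 billion "
    "and Profit After Tax (PAT) of Rs. 22.46 billion."
)
print(extract_indicators(sentence))
```

A single rule like this only covers one phrasing; reports from other banks word the same facts differently, which is why a set of rules per target and an ontology-derived gazetteer are needed.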


B. DATA SET
For domain modeling, annual reports published in English by banks across the globe are collected from an online repository⁵. The concepts generated from these reports are modeled as tuples to help in guided information extraction of meaningful patterns. The ontological dataset is then applied to two major commercial banks of Pakistan as a proof of concept of its adaptability.

Related financial information extracted from the MCB Bank Limited Annual Report 2017, which contains brief information about the bank's finances, is shown in Fig. 7.

Related financial information extracted from the United Bank Limited Annual Report 2017, which contains brief information about the bank's finances, is shown in Fig. 8.

UBL⁶ posted profit after tax (PAT) amounting to Rs. 25.4 billion during the year ended December 31, 2017 compared to Rs. 27.7 billion in 2016. Earnings per share were reported at Rs. 20.77 in 2017 against Rs. 22.65 per share last year. Profit before tax (PBT) closed at Rs. 40.2 billion in 2017 compared to Rs. 46.0 billion in 2016. The consolidated PAT stood at Rs. 26.2 billion in 2017 (2016: Rs. 28.0 billion) with earnings per share recorded at Rs. 21.39 (2016: Rs. 22.70). Gross revenues stood at Rs. 78.6 billion (Dec'16: Rs. 80.7 billion). Despite the low interest rate regime, growth in the balance sheet maintained net markup income in line with the 2016 level to close at Rs. 56.4 billion. Non-markup Income decreased by 6% year on year to reach Rs. 22.2 billion in 2017 but mainly due to lower capital gains and dividends. The cost to income ratio increased from 39.6% in 2016 to 45.0%.

FIGURE 8. Financial Information Extracted from United Bank Limited Annual Report [54]

C. DEFINE DOMAIN AND SCOPE
Keeping in view the data available in the annual reports, competency questions from the dataset are used to define ontological concepts/properties and to limit the system scope.
• Name the bank having the most after-tax profitability in year YYYY.
• Banks whose Market Price per Share @YearStart > @YearEnd OR whose stock price has increased this year
• Which bank pays the highest dividend to its stockholders?
• List banks whose stock/share market price increased over the last three years.
• Name the bank having the most before-tax profitability in YYYY.
• Which bank paid the highest tax amount in year YYYY?
• Which bank had the highest dividend payout ratio?
• Name the bank having the highest capital reserve ratio in YYYY.
• What is the Profitability After Tax of XXXX (bank name) in YYYY?
• What was the long term rating of ABL in YYYY?
• How many banks have the long term rating “Extremely Strong”?
• Will UBL default in the next year?
• XXX audited which banks this year?
• How many banks were given an Extremely Strong long term rating this year by PACRA?
• Which banks have stable profitability ratios over a period of time?
• Who won the Best Local Bank award in YYYY?
• Who gave the Best Investment Bank Award in YYYY?
• Which banks' stocks are safer to invest in?
  ◦ EPS > 0 in last year
  ◦ Long term rating Extremely Strong or Strong
  ◦ Short term rating High
  ◦ Current year stock price > Last year stock price
• Which banks' stocks are risky but have a higher yield?
  ◦ Long term rating NOT (Extremely Strong or Strong)
  ◦ Short term rating NOT High
  ◦ (Current year stock price - Last year stock price) / Last year stock price > some threshold percentage
• Which bank's Net Interest Income has increased this year?
• Does MCB have an Islamic Window?
• What is the currency of the USD bond?
• Which bonds are foreign government bonds?
• What type of dividend did bank XXX give to its stockholders in YYYY?
• Did UBL invest in the Manufacturing Sector in YYYY?
• MCB invested in which Government bonds this year?
• What was UBL's total investment in Bonds in YYYY?

⁵ https://www.annualreports.com/
⁶ UBL https://www.ubldigital.com/
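The two composite competency questions on safer and on risky but higher-yielding stocks combine several conditions, which can be written out as predicates. The sketch below is illustrative only: the `BankSnapshot` record and its field names are hypothetical, the two sample banks are made-up data, and the 10% default threshold is an assumption, since the criteria only say "some threshold percentage".

```python
from dataclasses import dataclass

STRONG_LONG_TERM = {"Extremely Strong", "Strong"}


@dataclass
class BankSnapshot:
    """Hypothetical record holding the attributes the criteria refer to."""
    name: str
    eps_last_year: float
    long_term_rating: str
    short_term_rating: str
    price_last_year: float
    price_current_year: float


def price_change(bank):
    """(Current year stock price - last year stock price) / last year stock price."""
    return (bank.price_current_year - bank.price_last_year) / bank.price_last_year


def is_safer(bank):
    return (
        bank.eps_last_year > 0
        and bank.long_term_rating in STRONG_LONG_TERM
        and bank.short_term_rating == "High"
        and bank.price_current_year > bank.price_last_year
    )


def is_risky_high_yield(bank, threshold=0.10):
    # threshold is an assumed value, not one given in the text.
    return (
        bank.long_term_rating not in STRONG_LONG_TERM
        and bank.short_term_rating != "High"
        and price_change(bank) > threshold
    )


# Made-up sample data for illustration only.
bank_a = BankSnapshot("Bank A", 5.0, "Strong", "High", 100.0, 110.0)
bank_b = BankSnapshot("Bank B", 1.0, "Adequate", "Medium", 50.0, 60.0)
print(is_safer(bank_a), is_risky_high_yield(bank_b))
```

In the knowledge graph these conditions would become filters on node properties and rating relationships; writing them as predicates first makes the intended semantics of each competency question explicit before it is translated into a graph query.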


D. CONCEPTS/CLASS HIERARCHY
The classes, their descriptions, parent classes and instances are shown in Table III.

TABLE III.
CLASS HIERARCHY
Class | Parent Class | Description | Instances
Award | Award | Award given to a bank by an award giving organization | PBA
BalanceSheetItem | BalanceSheetItem | Items present in a Balance Sheet | Cash, Interest payable, Property, Tax payable
Bond | Bond | Types of Bonds | ASKARI BANK LTD. - TFC, ASPIN PHARMA (PVT) LTD - SUKUK, BANK AL-HABIB LTD. - TFC, JS BANK LTD. - TFC, K-ELECTRIC LTD. - SUKUK, MASOOD TEXTILE MILLS LTD. - SUKUK, National Saving Bonds, Wapda Bond
Category | Category | Categories of Banks | Bank Islami, Best Bank Award, DIB, HBL, Khushali Microfinance Bank, Pakistan Banking Awards
Currency | Currency | Currency of different countries | AUD, GBP, PKR, USD
Dividend | Dividend | Profit distribution of the companies to its shareholders | Bi Yearly Dividend, Bonus Shares, Quarterly Dividend, Right Shares, Yearly Dividend
Financial Term | Financial Term | Different vocabularies and terminologies commonly used in the financial market | -
Investment | Investment | Spending money for generating income | Amazon, Apple, Gold, MCB-DCF Income Fund, PIB, Pakistan Income Fund, Real Estate, Wapda Bonds
Location | Location | Locations of the Financial Institutions / Banks | France, Germany, Pakistan, UK, USA
Rating | Rating | Ratings awarded by the Rating Firms | A, A-, A-1, A-1, A-2, A-3, A1, A2, A3, AA, AA-, AAA, Aa1, Aa2, Aa3, Aaa, BBB, BBB-, Baa1, Baa2, Baa3, F1, F2, F3, P-1, P-2, P-3
Organization | Organization | Financial Institutions / Organizations | ABL, BankIslami, DIB, Deloitte, Dubai_Islamic_Bank, EY, Fitch, HBL, IBP, JCR-VIS, KPMG, MCB, Moody’s, PACRA, PwC, S&P, UBL
Sector | Sector | Group of companies that operate in the same segment of the economy or share a similar business type | Abbott Laboratories Pak Ltd., Allied Bank Ltd., Buxly Paints Ltd., Fauji Cement Co Ltd., Fauji Fertilizer Co. Ltd., Habib Bank Limited, Highnoon Laboratories Ltd., KAPCO, PIA, Sanofi-Aventis Pakistan Ltd.
Stock | Stock | Ownership certificates of any company | -
Asset | BalanceSheetItem | Resources owned by the company | Cash, Property
Liabilities | BalanceSheetItem | Debt or obligations of the company | Interest payable, Tax payable
CorporateBond | Bond | Corporate Bond is a debt security which is issued by company and sold to investors to meet its financial requirements. | ASPIN PHARMA (PVT) LTD - SUKUK, K-ELECTRIC LTD. - SUKUK, MASOOD TEXTILE MILLS LTD. - SUKUK


GovtBond | Bond | Government bond is a debt security loaned by a government to assist government spending, most often issued in the country’s local interest. | National Saving Bonds, Wapda Bond
TermFinanceCertificate | Bond | Certificates issued by the companies for the generation of short and medium term funds. | ASKARI BANK LTD. - TFC, BANK AL-HABIB LTD. - TFC, JS BANK LTD. - TFC
AwardCategory | Category | The different categories awarded on the basis of the performance and efforts in different sectors | Best Bank Award, Pakistan Banking Awards
BankCategory | Category | The different categories of the bank | Bank Islami, DIB, HBL, Khushali Microfinance Bank
Cash_Dividend | Dividend | Funds paid to shareholders in the form of cash | Bi Yearly Dividend, Quarterly Dividend, Yearly Dividend
Stock_Dividend | Dividend | Funds paid to shareholders in the form of stock | Bonus Shares, Right Shares
Earnings_Per_Share | Financial Term | Earnings per share indicates the company’s financial position in the market | -
EPS | Financial Term | Earnings per share indicates the company’s financial position in the market | -
Gross_Markup_Income | Financial Term | The revenue generated after eliminating the cost | -
Interest_Expense | Financial Term | The cost of borrowing money from financial institutions | -
InterestIncomeEarned | Financial Term | Amount earned by an investor's money that he places in an investment or project. | -
Net_Interest_Income | Financial Term | Difference between the interest income a bank earns from its lending activities and the interest it pays to depositors. | -
NII | Financial Term | Difference between the interest income a bank earns from its lending activities and the interest it pays to depositors. | -
PAT | Financial Term | Profits after payment of tax | -
PBT | Financial Term | Profits before payment of tax | -
Profit_Before_Tax | Financial Term | Profits before payment of tax | -
ProfitAfterTax | Financial Term | Profits after payment of tax | -
BondInvestment | Investment | Investment in the form of Bonds | PIB, Wapda Bonds
MutualFundInvestment | Investment | Investment in the Mutual Funds | MCB-DCF Income Fund, Pakistan Income Fund
SectorInvestment | Investment | Investment in the different sectors of the economy | Gold, Real Estate
StockInvestment | Investment | Investment in the different stocks of the companies | Amazon, Apple
AuditFirm | Organization | Audit firm investigates frauds and deficiencies in the organization | Deloitte, KPMG, PwC
AwardFirm | Organization | Award firm rewards and recognizes the company’s efforts | IBP
Bank | Organization | Financial institute which accepts deposits and provides loans to the customers | ABL, BankIslami, DIB, HBL, MCB, UBL
RatingFirm | Organization | Credit rating agency is an independent enterprise that evaluates the financial standing of issuers of debt instruments and then assigns a rating that exhibits its assessment of the issuer’s aptitude to make the debt payments. | Fitch, JCR-VIS, Moody’s, PACRA, S&P
LongTerm | Rating | Long term ratings have a maturity of one year or more | A, A-, A1, A2, A3, AA, AA, AA-, AAA, Aa1, Aa2, Aa3, Aaa, BBB, BBB-, Baa1, Baa2, Baa3
ShortTerm | Rating | Short term ratings have a maturity of one year or less | A-1, A-2, A-3, F1, F2, F3, P-1, P-2, P-3
Banking | Sector | - | Allied Bank Ltd., Habib Bank Limited
Infrastructure | Sector | - | KAPCO, PIA
Manufacturing | Sector | - | Buxly Paints Ltd., Fauji Cement Co Ltd., Fauji Fertilizer Co. Ltd.
Pharmaceutical | Sector | - | Abbott Laboratories Pak Ltd., Highnoon Laboratories Ltd., Sanofi-Aventis Pakistan Ltd.

FIGURE 9. Data Properties

E. DATA AND OBJECT PROPERTIES
Object properties serve as relationships in the KG and data properties serve as entity attributes. The details are shown in Fig. 9.

F. OWL API FOR ONTOLOGY IMPORT AND QUERYING
SPARQL is used for querying the ontology. All queries were first validated in Protégé and then used from Java for information extraction from the ontology. Fig. 10 shows the SPARQL query for retrieving valid bank names, which are instances of the concept “Bank”, and Fig. 11 shows the ontology.

FIGURE 10. SPARQL for retrieving Bank names in Ontology

V. INFORMATION EXTRACTION FROM ANNUAL REPORTS
Information Extraction (IE) deals with automatic retrieval of certain types of information from natural language text. It aims to retrieve occurrences of a particular class of objects and identify relationships among them [19]. Once the text corpus has been developed manually, the next phase is entity extraction/recognition and relationship extraction/prediction [27]. We are using the ontology for entity recognition, entity alignment and relation extraction. Entities will be stored into an RDF/OWL based knowledge graph for the sake of efficiency, scalability and flexibility [22].
In order to construct a knowledge graph, information extraction is critical for its correctness. Relevant information is extracted from the unstructured text corpus and mapped to some pre-defined knowledge graph concept; consequently, the structural relationships are defined between the extracted entities. The following sections discuss the detailed approach used for information extraction.

A. BASIC PROCESSING
In this phase, some basic processing is required to convert the character stream into a sequence of lexical items (words, phrases, and syntactic markers) for further processing. Each document is passed through a sentence splitter and a tokenizer. The sentence splitter splits the text into sentences using some delimiter, and the tokenizer chops the input character stream into tokens that can be words, numbers, identifiers or punctuation (depending on the problem).

B. TEXT PRE-PROCESSING
The second phase includes Named Entity Recognition and co-reference resolution. Rule-based processing is employed to recognize the named entities, alongside a gazetteer to find the general kinds of entities (relation terms, stop words etc.) and the ontology to find domain-specific entities like bank names, Rating Agencies, Award Names and Financial terms [21].
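The basic processing and gazetteer lookup described in these two phases can be illustrated with a small, self-contained Java sketch. Everything named here is an assumption for illustration: the class `GazetteerTagger`, the delimiter-based splitting regex, the token pattern, and the five gazetteer entries are hypothetical, and the paper does not state which tooling it used for these steps. The alias entries also show how surface variants such as "United Bank Ltd" can be mapped to one canonical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GazetteerTagger {
    // A token is a number, a word, or any single non-space symbol.
    private static final Pattern TOKEN =
            Pattern.compile("\\d+(?:\\.\\d+)?|[A-Za-z]+|[^\\sA-Za-z\\d]");
    private static final int MAX_PHRASE_TOKENS = 3;

    // Hypothetical gazetteer: lower-cased surface form -> "Canonical/Type".
    private static final Map<String, String> GAZETTEER = new HashMap<>();
    static {
        GAZETTEER.put("united bank limited", "UBL/Bank");
        GAZETTEER.put("united bank ltd", "UBL/Bank");
        GAZETTEER.put("ubl", "UBL/Bank");
        GAZETTEER.put("pacra", "PACRA/RatingFirm");
        GAZETTEER.put("profit after tax", "ProfitAfterTax/FinancialTerm");
    }

    /** Naive delimiter-based splitter: break after . ! ? when a capital letter follows. */
    static List<String> splitSentences(String text) {
        return Arrays.asList(text.trim().split("(?<=[.!?])\\s+(?=[A-Z])"));
    }

    /** Chops a sentence into word, number and punctuation tokens. */
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Longest-match gazetteer lookup over token n-grams of up to three tokens. */
    static List<String> tag(String sentence) {
        List<String> tokens = tokenize(sentence);
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            for (int len = Math.min(MAX_PHRASE_TOKENS, tokens.size() - i); len >= 1; len--) {
                String phrase = String.join(" ", tokens.subList(i, i + len))
                        .toLowerCase(Locale.ROOT);
                String entry = GAZETTEER.get(phrase);
                if (entry != null) {
                    found.add(entry);
                    matched = len;
                    break;
                }
            }
            i += Math.max(matched, 1); // skip past a matched phrase, else advance one token
        }
        return found;
    }
}
```

The naive splitter will wrongly break after abbreviations that are followed by a capitalised word (for example "Ltd. United"), which is one reason real pipelines rely on dedicated sentence splitters.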


FIGURE 11. Ontology for Knowledge Graph



C. ENTITY RESOLUTION
We have created different classes with a “Same as” attribute for detecting bank name and term variations in the text; for example, United Bank Limited may be written as UBL or United Bank Ltd. Therefore, in our knowledge graph all of these are considered the same.

D. RELATION EXTRACTION
Relation extraction deals with extracting the relation between two entities detected by the NE recognizers with the help of additional annotation.
The core of entity and relation extraction is a hand-crafted collection of rules. The patterns are generated through text analysis and represent the unique language constructions which are used to describe a particular Entity/Relation. These patterns are then matched with the processed text to discover and extract the required pieces of information [21].

“UBL posted profit after tax amounting to Rs. 25.4 billion during the year ended December 31, 2017”
Rule: Bank_ profit after tax _Amount
({Organization}): Bank
({Token, !Split})*
{Token.root == "profit after tax"}
({Token, !Split, !Money})*
({Money}):Amount

FIGURE 12. SPARQL for retrieving Bank names in Ontology

Our system knows the bank names and financial terms from the ontology; therefore, they are defined as entities of the type Bank and Financial Term in the named entity recognition step. The currency amount “Rs. 25.4 billion” is classified in the same step and is given the Money annotation. The rule “Bank_ profit after tax _Amount” is triggered as the relation extraction phase begins, as its pattern is a perfect match for the input sentence, as shown in Fig. 12. The time of the mentioned sentence will be stored in the knowledge graph (KG) as a relation attribute [21]. The output of this phase is a set of semantic triplets with additional information on the direct and indirect super classes of the extracted entities and the attributes related to each triplet (like YEAR in the above example) to be added to the knowledge graph (KG). This project is different in terms of entity types; therefore, standard IE libraries may not be applied directly. Table IV lists the category-wise items and the respective technique employed.

TABLE IV.
INFORMATION EXTRACTION TECHNIQUES USED
Extraction Type | Information Type | Technique applied
Entity Extraction | Amount | Regular expression
Entity Extraction | Percentages | Regular expression
Entity Extraction | Dates (Year only) | Regular expression
Entity Extraction | Bank | Ontology based extraction
Entity Extraction | Financial Term | Ontology based extraction
Entity Extraction | Ratings | Ontology based extraction
Entity Extraction | Audit Firm | Ontology based extraction
Entity Extraction | Award | Ontology based extraction
Entity Extraction | Balance Sheet Terms | Ontology based extraction
Entity Extraction | Currency | Ontology based extraction
Entity Extraction | Location | Ontology based extraction
Relation Extraction | Verbs stored in ontology as relationships | Hand Crafted Rules/Rule based

VI. INFORMATION EXTRACTION FROM ANNUAL REPORTS

A. OVERVIEW
Relational databases are powerful and well understood, yet they still carry many limitations when it comes to efficient storage, scalability and efficient query processing where several joins are needed to get a specific piece of information. These limitations pushed researchers to develop alternative database technologies known as NoSQL databases [9]. These databases can be categorized on the basis of their underlying data models: 1) wide-column stores, which use Google’s BigTable model (e.g., Cassandra); 2) document stores, which are designed to store semi-structured data (e.g., MongoDB); 3) key-value stores, which maintain a persistent key-to-value map for data indexing and retrieval (e.g., BerkeleyDB); and 4) graph databases, which store information in a graph-like data structure.

B. KNOWLEDGE GRAPH CONSTRUCTION
A knowledge graph can be constructed using 1) a top-down approach based on some knowledge base/schema such as domain ontologies or 2) a bottom-up approach focusing on knowledge instances such as Linked Open Data (LOD) datasets [28]. As we are using the top-down approach, we developed the ontology in advance. Information extraction and knowledge graph population overlap in our case. As the information is extracted from the text, it is added into the knowledge graph (KG) along with additional annotations like the super classes of the extracted entities.
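The rule shown with Fig. 12 and the population step just described (a triple plus a year attribute and super-class annotations) can be approximated in plain Java. This is a sketch under stated assumptions, not the authors' rule engine: the class `ProfitAfterTaxRule`, the regular expressions, the four-bank name list and the hard-coded super classes are hypothetical, and plain regular expressions are swapped in for the annotation-based rule syntax of the figure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProfitAfterTaxRule {
    // Assumed patterns; a real system would take bank names from the ontology.
    private static final Pattern BANK = Pattern.compile("\\b(UBL|HBL|MCB|ABL)\\b");
    private static final Pattern MONEY =
            Pattern.compile("Rs\\.\\s*\\d+(?:\\.\\d+)?\\s*(?:million|billion)");
    private static final Pattern YEAR = Pattern.compile("\\b(?:19|20)\\d{2}\\b");
    private static final String TERM = "profit after tax";

    /** A triple with the year as relation attribute and the subject's super classes. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final int year;                     // -1 when no year follows the amount
        final List<String> subjectClasses;  // direct and indirect classes

        Triple(String subject, String predicate, String object, int year,
               List<String> subjectClasses) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.year = year;
            this.subjectClasses = subjectClasses;
        }
    }

    /** Matches Bank ... "profit after tax" ... Money, in that order, within one sentence. */
    static Optional<Triple> apply(String sentence) {
        Matcher bank = BANK.matcher(sentence);
        if (!bank.find()) {
            return Optional.empty();
        }
        int termAt = sentence.toLowerCase(Locale.ROOT).indexOf(TERM, bank.end());
        if (termAt < 0) {
            return Optional.empty();
        }
        Matcher money = MONEY.matcher(sentence);
        if (!money.find(termAt + TERM.length())) {
            return Optional.empty();
        }
        Matcher year = YEAR.matcher(sentence);
        int y = year.find(money.end()) ? Integer.parseInt(year.group()) : -1;
        return Optional.of(new Triple(bank.group(), "ProfitAfterTax", money.group(), y,
                Arrays.asList("Bank", "Organization")));
    }
}
```

Applied to the example sentence, the sketch yields the subject UBL, the predicate ProfitAfterTax, the object "Rs. 25.4 billion" and the year 2017, which corresponds to the kind of annotated triple the text describes as input to graph population.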


We have used Neo4j for implementing our Graph Database. Although it is not a pure knowledge graph (KG) in a real sense, it provides the structure and API for the proof of our concept.

C. INSTALLING AND CONFIGURING NEO4J
Neo4j is a browser based Graph Database which has API support for many languages. We have used the Neo4j API for manipulating the database from the Eclipse Java IDE. Like any database server, the Neo4j server needs to be started before running any commands, either in the browser or from a program.

Location
CREATE (l:Location{locationName:'Pakistan',locationType:"Country",isLocationGlobal:false})
CREATE CONSTRAINT ON (l:Location) ASSERT l.locationName IS UNIQUE
Rating
CREATE(r:Rating:LongTerm:ExtremelyStrong{ratingName:'AAA'})
Bank
CREATE(r:Organization:Bank{orgName:'HBL',isBankIslamic:true,isBankConventional:true})
CREATE CONSTRAINT ON (a:Organization) ASSERT a.orgName IS UNIQUE
FinancialTerm
CREATE(r:FinancialTerm:Profit_Before_Tax{financialTermName:'ProfitAfterTax',financialTermAmount:1015000000,percentChange:4}) return id(r)

FIGURE 13. Sample Script for Node creation based on Ontology

hasLocation
MATCH (b:Bank{orgName:'HBL'}), (l:Location{locationName:'Pakistan'}) CREATE (b)-[h:hasLocation]->(l) return h
hasLongTermRating
CREATE (r:Rating:LongTerm:ExtremelyStrong{ratingName:'AAA'})
CREATE CONSTRAINT ON (l:Rating) ASSERT l.ratingName IS UNIQUE
hasTerm
MATCH(i:FinancialTerm) where ID(i)=3224 with(i) match(b:Bank{orgName:'HBL'}) MERGE (b)-[:hasTerm{year:2017}]->(i)

FIGURE 14. Sample Script for Relationship creation based on Ontology

Fig. 15 shows the labels generated from the concept taxonomy.

FIGURE 15. Generating Labels from the Concept Taxonomy
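The node-creation statements of Fig. 13 carry one label per class on the path from the root concept to the instance's class (for example Organization:Bank). A minimal Java sketch of deriving such a label chain from a parent-class table is given below. The class `NodeLabelBuilder`, its method names and the four parent entries (copied from rows of Table III) are assumptions for illustration; the paper does not show how its labels were generated in code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class NodeLabelBuilder {
    // child class -> parent class, a small excerpt of the class hierarchy
    private static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("Bank", "Organization");
        PARENT.put("RatingFirm", "Organization");
        PARENT.put("LongTerm", "Rating");
        PARENT.put("GovtBond", "Bond");
    }

    /** Root-first label chain, joined with ':' as in a Cypher node pattern. */
    static String labelsFor(String cls) {
        Deque<String> chain = new ArrayDeque<>();
        for (String c = cls; c != null; c = PARENT.get(c)) {
            chain.addFirst(c); // walk up to the root, prepending each ancestor
        }
        return String.join(":", chain);
    }

    /** Builds a CREATE statement with a single string property. */
    static String createStatement(String cls, String key, String value) {
        // Escape backslashes and single quotes so the value stays inside the literal.
        String escaped = value.replace("\\", "\\\\").replace("'", "\\'");
        return "CREATE (n:" + labelsFor(cls) + " {" + key + ": '" + escaped + "'})";
    }
}
```

For example, `NodeLabelBuilder.createStatement("Bank", "orgName", "HBL")` returns `CREATE (n:Organization:Bank {orgName: 'HBL'})`. When statements are sent to a live database, passing values as query parameters is generally preferable to building them into the statement text.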


Fig. 16 shows the Enumeration defined as Slots for Knowledge graph attribute values.

FIGURE 16. Enumeration defined as Slots for KG attribute values

D. CREATING NODES
For node creation, ontological information is extracted for creating node labels (Class/Category). Similarly, the data properties of a concept in the ontology become properties of the nodes. For example, HBL is a Bank that is also an Organization; therefore, HBL will have two node labels, Organization and Bank. A bank may be situated in multiple locations; therefore, the hasLocation relationship connects the Bank to multiple locations, which are entities as well. The node creation examples are given in Fig. 13.

E. CREATING EDGES
Object property names from the ontology become relationship names in the Financial knowledge graph (KG), with the domains and ranges of the concept as target entity types. The link attributes are populated using the information extracted from the text. Fig. 14 shows queries to establish the relationships between the entities created in the last step.

Fig. 17 shows the Node with attributes having multi-labels. The highlighted area with the arrow sign in Fig. 18 shows the Relationship of the Node with the Attribute. The relationship name is mentioned within the arrow sign.

F. Abbreviations and Acronyms
Knowledge graph can never be complete as real world’s formalized knowledge cannot reasonably reach full


coverage, i.e., contain information about each and every entity in the universe. Furthermore, it is nearly impossible to construct a knowledge graph which is fully correct, especially when heuristic methods are applied. The trade-off between coverage and correctness is handled differently in each knowledge graph [29]. Knowledge graph refinement improves an existing knowledge graph, for example by adding missing knowledge or by identifying and removing errors. Logical reasoning is applied on some knowledge graphs for validating the consistency of statements in the graph and removing the inconsistent statements.

FIGURE 17. Node with attributes having multi-labels

FIGURE 18. Relationship with Attribute

In this research work, when new resources are added for Information Extraction in future, the existing ontology works with an extension if new taxonomy/property/relationship values are defined. If only the data is of a newer format/type, then it can be directly ingested into the knowledge graph (KG).
We need to apply rule-based extraction or statistical methods to learn the information pattern based on the relationship between the target information and some other entity. If we can infer the entity type through its pattern (Information categorization/Entity Type recognition), the information will be appended to the Financial ontology. All the uploaded documents will be rescanned, and the knowledge graph (KG) will be populated with the term related data values. In case of some new information category or entity property values, for now, the ontology will be updated manually.

VII. QUERYING OVER KNOWLEDGE GRAPH
It is difficult to extract who won the BEST INVESTMENT BANK AWARD in 2018 from the ontology, as the winAward relationship does not specify the year in which an award was won by a bank; otherwise we need to define a separate class in Protégé to maintain the relationship between AWARD and BANK for keeping track of the year as well. In contrast, it is very simple to query this directly in the knowledge graph. Similarly, the information about the final dividend and the already paid dividend amount of all the banks or a particular bank, or the list of banks that paid the highest dividends, can also be obtained through a simple knowledge graph query. The information related to the profit before tax and after tax, or the list of the banks that paid the highest tax during a particular year, can also be obtained easily. Examples of different Cypher queries with their results are shown in Fig. 19.

FIGURE 19. Example Cypher Queries

The multi-labeling and edge properties are utilized in the above queries. In this work, an approach similar to that in [26] is followed. Presently, we have developed the interface in Visual Studio 2012 for querying the Financial knowledge graph (KG), as displayed in Fig. 20.
1) NAMED ENTITY (NE) AND RELATION EXTRACTION
We follow this step for user queries as done for financial knowledge graph (KG) generation.


2) ANCHORING
User vocabulary may differ from the terminologies used in the KGs; this gap is filled by semantic expansion. Here, we do this by using the ontology's “Equivalent Classes”.
3) GRAPH PATTERN SEARCH
This deals with mapping the information extracted from the query to a graph pattern search query for getting results. We have written rules to translate the user requirement into a formal query that can be executed against the knowledge graphs.
4) MERGING OF RESULTS
Some queries are complex enough to need merging of the results of multiple Cypher queries to give an answer to the user.

FIGURE 20. Sample Interface for QA

A. CATEGORIZATION OF QUERIES WITH STAKEHOLDERS’ PERSPECTIVE
The stakeholder types and the related questions are summarized in Table V.

VIII. CONCLUSIONS AND FUTURE WORK
This research proposed a novel approach for data extraction from banks' annual reports for the population of a Financial Knowledge graph. We discussed the techniques used for information extraction, the ontology engineering procedure, Financial Knowledge graph creation, and the Question Answering mechanism for the graph.
In most countries, the financial regulatory body requires companies to publish their annual reports online. In our research, although the required data was available in these reports, extracting it required extensive effort because of the PDF format. The reports had multiple sections, and automatic extraction needed much effort and time. Additionally, entity identifiers and formats for the same type of data sometimes differ between organizations. Rule-based extraction is powerful, but to handle multiple unforeseen patterns our model should use some other approach like statistical learning, FIBO, etc. Also, to limit the research scope and generate a working prototype, we have focused on the director's statement.
The result shows that our proposed system, based on a financial knowledge graph, successfully and efficiently provides the desired information in response to queries from the general public or investors related to investment in different banks or in a particular bank.
In this research only banks are considered; however, this model can be extended to cater for information about other companies in future. This research can be applied in a variety of domains such as healthcare, travel, fuel companies, production companies, etc., where annual reports are published but customers and end users are unable to query or gather the desired information easily.
Additionally, to make the graph more powerful and useful, our aim is to design a web crawler to collect several other financial factors, such as stock listings, from authentic web pages; generalize the model for companies other than banks; incorporate an automatic entity linking and disambiguation mechanism for knowledge graph (KG) enrichment; generate financial stories for different types of stakeholders that can aid decision making; exploit the flexibility of open Information Extraction systems using unsupervised or semi-supervised learning; and use readily available financial ontologies.

TABLE V.
STAKEHOLDER TYPE AND QUESTIONS
S.No | Stakeholder Type | Question
1 | Investor/Prospective Stock holder | Which banks’ stocks are safer to invest in?
2 | Investor/Prospective Stock holder | Which bank received the Best Investment Bank Award in YYYY?
3 | Investor/Prospective Stock holder | Which bank pays the highest dividend to its stockholders?
4 | Investor/Prospective Stock holder | What is the EarningsPerShare of Bank XXX?
5 | Creditors/Rating Firms | Name the bank having the highest after-tax profitability in year YYYY?
6 | Investor/StockHolders | Banks whose Market Price per Share @Year Start > @YearEnd OR whose stock price has increased this year
7 | Investor/StockHolders | Which bank pays the highest dividend to its stockholders?
8 | Investor/StockHolders | List banks whose stock/share market price increased over the last three years?
9 | Creditors/Rating Firms | Name the bank having the highest before-tax profitability in YYYY?
10 | Creditors/Rating Firms | Which bank paid the highest tax amount in year YYYY?
11 | Creditors/Rating Firms | Name the banks whose profitability has declined in YYYY compared to the previous year.
12 | Investor/StockHolders | Which bank had the highest dividend payout ratio?
13 | Creditors/Rating Firms | Name the bank having the highest capital reserve ratio in YYYY.


14 | Creditors/Rating Firms | What is the Profitability After Tax of XXXX (bank name) in YYYY?
15 | Creditors/Rating Firms/Investor/StockHolders | What was the long term rating of ABL in YYYY?
16 | Creditors/Rating Firms/Investor/StockHolders | How many banks have the rating “Extremely Strong” in the long term?
17 | Creditors/Rating Firms/General Public | Will UBL default in the next year?
18 | Creditors/Rating Firms/Investor/StockHolders | Which banks have stable profitability ratios over a period of time?
19 | Creditors/Borrowers | Which bank's Net Interest Income has increased this year?
20 | Depositor | Does MCB have an Islamic Window?
21 | General public | Which bonds are foreign government bonds?
22 | Investor/StockHolders/Creditors | Did UBL invest in the Manufacturing Sector in YYYY?
23 | Investor/StockHolders | Which Govt bonds did MCB invest in this year?
24 | Investor/StockHolders | What was UBL's total investment in Bonds in YYYY?
25 | General public | In which countries is HBL located?

ACKNOWLEDGMENT
We are extremely obliged and thankful to Mohammad Ali Jinnah University, Karachi, and Islamic University of Madinah, for partially funding this work. We would like to thank the financial banks of local and foreign regions for providing access to their annual reports.

REFERENCES
[1] M. Sheikh and S. Conlon, "A rule-based system to extract financial information," Journal of Computer Information Systems, vol. 52, no. 4, pp. 10-19, 2012.
[2] A. S. Corrêa and P.-O. Zander, "Unleashing Tabular Content to Open Data: A Survey on PDF Table Extraction Methods and Tools," in Proceedings of the 18th Annual International Conference on Digital Government Research, 2017.
[3] R. Rastan, "Automatic Tabular Data Extraction and Understanding," 2017.
[4] J. Yan, C. Wang, W. Cheng, M. Gao and A. Zhou, "A retrospective of knowledge graphs," Frontiers of Computer Science, vol. 12, no. 1, pp. 55-74, 2018.
[5] Q. Wang, Z. Mao, B. Wang and L. Guo, Knowledge graph embedding: A survey of approaches and applications, 2017.
[6] L. Ehrlinger and W. Wöß, "Towards a Definition of Knowledge Graphs," in SEMANTiCS (Posters, Demos, SuCCESS), 2016.
[7] S. Choudhury, K. Agarwal, S. Purohit, B. Zhang, M. Pirrung, W. Smith and M. Thomas, "Nous: Construction and querying of dynamic knowledge graphs," in Data Engineering (ICDE), 2017 IEEE 33rd International Conference on, 2017.
[8] K. Veel, "Make data sing: The automation of storytelling," Big Data & Society, vol. 5, no. 1, p. 2053951718756686, 2018.
[9] R. Angles, "A comparison of current graph database models," in 2012 IEEE 28th International Conference on Data Engineering Workshops, 2012.
[10] Utpal Bhatt, "Addressing Key Challenges in Financial Services with Neo4j," [Online]. Available: https://neo4j.com/resources-old/neo4j-financial-services-white-paper/.
[11] Jesús Barrasa, "RDF Triple Stores vs. Labeled Property Graphs: What’s the Difference?," 2017. [Online]. Available: https://neo4j.com/blog/rdf-triple-store-vs-labeled-property-graph-difference/.
[12] M. P. Muñoz, A. Llaves and T. Kume, "Populating the FLE Financial Knowledge Graph," in International Semantic Web Conference (P&D/Industry/BlueSky), 2018.
[13] C. Leinemann, F. Schlottmann, D. Seese and T. Stuempert, "Automatic extraction and analysis of financial data from the EDGAR database," South African Journal of Information Management, vol. 3, no. 2, 2001.
[14] T. Stümpert, "Extracting Financial Data from SEC Filings for US GAAP Accountants," in Handbook on Information Technology in Finance, Springer, 2008, pp. 357-375.
[15] M. Atzmueller, P. Kluegl and F. Puppe, "Rule-Based Information Extraction for Structured Data Acquisition using TextMarker," in LWA, 2008.
[16] M. Göbel, T. Hassan, E. Oro and G. Orsi, "A methodology for evaluating algorithms for table understanding in PDF documents," in Proceedings of the 2012 ACM symposium on Document engineering, 2012.
[17] T. Loughran and B. McDonald, "Textual analysis in accounting and finance: A survey," Journal of Accounting Research, vol. 54, no. 4, pp. 1187-1230, 2016.
[18] A. C. e Silva, A. Jorge and L. Torgo, "Automatic selection of table areas in documents for information extraction," in Portuguese Conference on Artificial Intelligence, 2003.
[19] D. C. Wimalasuriya and D. Dou, Ontology-based information extraction: An introduction and a survey of current approaches, Sage Publications Sage UK: London, England.
[20] N. F. Noy, D. L. McGuinness and others, Ontology development 101: A guide to creating your first ontology.
[21] E. Arendarenko and T. Kakkonen, "Ontology-based information and event extraction for business intelligence," in International Conference on Artificial Intelligence: Methodology, Systems, and Applications, 2012.
[22] J. Pujara, "Extracting Knowledge Graphs from Financial Filings," ACM Reference, vol. 2, 2017.
[23] W. Shen, J. Wang, P. Luo and M. Wang, "A graph-based approach for ontology population with named entities," in Proceedings of the 21st ACM international conference on Information and knowledge management - CIK.
[24] J. Pujara, H. Miao, L. Getoor and W. W. Cohen, "Ontology-aware partitioning for knowledge graph identification," in Proceedings of the 2013 workshop on Automated knowledge base construction, 2013.
[25] J. Pujara, H. Miao, L. Getoor and W. W. Cohen, "Large-scale knowledge graph identification using psl," in 2013 AAAI Fall Symposium Series, 2013.
[26] V. Lopez, P. Tommasi, S. Kotoulas and J. Wu, "QuerioDALI: question answering over dynamic and linked knowledge graphs," in International Semantic Web Conference, 2016.
[27] Paul Nelson, "Natural Language Processing (NLP) Techniques for Extracting Information," [Online]. Available: https://www.searchtechnologies.com/blog/natural-language-processing-techniques.
[28] Z. Zhao, S.-K. Han and I.-M. So, "Architecture of Knowledge Graph Construction Techniques," 2018.
[29] H. Paulheim, "Knowledge graph refinement: A survey of approaches and evaluation methods," Semantic web, vol. 8, no. 3, pp. 489-508, 2017.
[30] J. R. Trigueros, "Extracting earnings information from financial statements via genetic algorithms," in Computational Intelligence for Financial Engineering, 1999 (CIFEr), Proceedings of the IEEE/IAFE 1999 Conference on, 1999.
[31] Jo Stichbury, "WTF is a knowledge graph?," 2017. [Online]. Available: https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f.


[32] Kristin Veel, "Make data sing: The automation of storytelling," [Online]. Available: https://journals.sagepub.com/doi/abs/10.1177/2.
[33] B. Yildiz, K. Kaiser and S. Miksch, "pdf2table: A method to extract table information from pdf files," in IICAI, 2005.
[34] A. Shigarov, A. Mikhailov and A. Altaev, "Configurable Table Structure Recognition in Untagged PDF Documents," in Proceedings of the 2016 ACM Symposium on Document Engineering, New York, NY, USA, 2016.
[35] M. Rospocher, M. Van Erp, P. Vossen, A. Fokkens, I. Aldabe, G. Rigau, A. Soroa, T. Ploeger and T. Bogaard, "Building event-centric knowledge graphs from news," Journal of Web Semantics, 2016.
[36] R. Rastan, H.-Y. Paik and J. Shepherd, "TEXUS: a task-based approach for table extraction and understanding," in Proceedings of the 2015 ACM Symposium on Document Engineering, 2015.
[37] J. Pujara, H. Miao, L. Getoor and W. Cohen, "Knowledge graph identification," in International Semantic Web Conference, 2013.
[38] W. Liu, J. Liu, M. Wu, S. Abbas, W. Hu, B. Wei and Q. Zheng, "Representation learning over multiple knowledge graphs for knowledge graphs alignment," Neurocomputing, vol. 320, pp. 12-24, 2018.
[39] G. Ji, S. He, L. Xu, K. Liu and J. Zhao, "Knowledge Graph Embedding via Dynamic Mapping Matrix," in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
[40] S. Zwicklbauer, C. Seifert and M. Granitzer, "DoSeR - A knowledge-base-agnostic framework for entity disambiguation using semantic embeddings," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016.
[41] Neo4j, "White Paper: The Power of Graph-Based Search," 2015.
[42] Y. Jia, Y. Qi, H. Shang, R. Jiang and A. Li, "A Practical Approach to Constructing a Knowledge Graph for Cybersecurity," Engineering, vol. 4, no. 1, pp. 53-60, 2018.
[43] Wang Z., Guo M., Li Z., Tang M., Yu J. (2020), "Knowledge Graph Construction for Payment Data Risk Control," In: Yang CT., Pei Y., Chang JW. (eds) Innovative Computing. Lecture Notes in Electrical Engineering, vol 675. Springer, Singapore, pp. 1901-1907, 2020.
[44] Astrova I. (2020), "How the Anti-TrustRank Algorithm Can Help to Protect the Reputation of Financial Institutions," In: Dalpiaz F., Zdravkovic J., Loucopoulos P. (eds) Research Challenges in Information Science. RCIS 2020. Lecture Notes in Business Information Processing, vol 385. Springer, Cham, pp. 503-508.
[45] Jiangtao Ren, Jiawei Long, Zhikang Xu, "Financial news recommendation based on graph embeddings," Decision Support Systems, vol 125, October 2019, 113115.
[46] Yun H., He Y., Lin L., Pan Z., Zhang X. (2019), "Construction Research and Application of Poverty Alleviation Knowledge Graph," In: Ni W., Wang X., Song W., Li Y. (eds) Web Information Systems and Applications, WISA 2019. Lecture Notes in Computer Science, vol 11817. Springer, Cham, pp. 430-442.
[47] N. Yerashenia and A. Bolotov, "Computational Modelling for Bankruptcy Prediction: Semantic Data Analysis Integrating Graph Database and Financial Ontology," 2019 IEEE 21st Conference on Business Informatics (CBI), Moscow, Russia, 2019, pp. 84-93, doi: 10.1109/CBI.2019.00017.
[48] Z. Chen, S. Yin and X. Zhu, "Research and Implementation of QA System Based on the Knowledge Graph of Chinese Classic Poetry," 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 2020, pp. 495-499, doi: 10.1109/ICCCBDA49378.2020.9095587.
[49] Wang, Shaokai and Wang, Ziao and Zhang, Xiaofeng, "基于知识图谱的智能投资决策 (Intelligent Investment Decision Based on Knowledge Graph)," 2019. Available at: https://ssrn.com/abstract=3449419 (Accessed: 09 November, 2020).
[50] S. Hu, L. Zou, J. X. Yu, H. Wang and D. Zhao, "Answering Natural Language Questions by Subgraph Matching over Knowledge Graphs," in IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 5, pp. 824-837, 1 May 2018, doi: 10.1109/TKDE.2017.2766634.
[51] Verhoef, P., Kooge, E., Walk, N. (2015). Creating Value with Big Data Analytics. London: Routledge, https://doi.org/10.4324/9781315734750
[52] lemon: An Ontology-Lexicon model for the Multilingual Semantic Web [Online]. Available: https://www.w3.org/International/multilingualweb/madrid/slides/declerck.pdf Accessed on: Jan 19, 2021.
[53] MCB Bank Limited, Financial Statements For the year ended December 31, 2017. [Online]. Available: https://www.mcb.com.pk/assets/Accounts%20Dec%202017.pdf Accessed on: Jan 19, 2021.
[54] United Bank Limited, 2017 Annual Report. [Online]. Available: https://www.ubldirect.com/corporate/resources/ubl/aboutus/financial_report/report_2017/UBLAnnualReport2017.pdf Accessed on: Jan 19, 2021.


MS. SAMREEN ZEHRA received the B.S. degree in computer science from the University of Karachi, Karachi, Pakistan, in 2005 and the M.S. degree in computer science from Mohammad Ali Jinnah University, Karachi, Pakistan, in 2019.
Since 2008, she has been a Senior Software Engineer with the National Bank of Pakistan. She has extensive experience in application development using best technical practices, and her research interests include the efficient use of knowledge management and retrieval techniques to facilitate decision making for stakeholders in the financial industry.
Ms. Zehra's earlier conference paper was on ontology-based sentiment analysis. She has also been a member of the Australian Computer Society since July 2018.

SYED FARHAN MOHSIN was born in Karachi, Pakistan, in 1976. He received the M.C.S. degree from Al-Khair University (AJK) and the M.S. degree in Computer Science from PAF KIET, Karachi, in 2013, and has been enrolled in the Ph.D. program in Computer Science at Mohammad Ali Jinnah University (MAJU), Karachi, Pakistan, since 2017.
Since 2011, he has been working as a Web Developer at Dow University of Health Sciences. His recent research paper, "A Survey on Distributed Information Systems using Semantic Web Techniques," was published in the NUST Journal of Engineering Sciences in 2019. His first publication, "Web based multimedia recommendation system for e-learning website," was published in 2010.

DR. SHAUKAT WASI completed his intermediate education at Cadet College Petaro in 1998. He graduated in computer science from the University of Karachi. He then joined FAST-National University of Computer and Emerging Sciences (NUCES) for higher studies, where he completed his Master's and PhD in Computer Science. He started his professional career at FAST. He was one of the founding faculty members of the Computer Science Department at DHA Suffa University, Karachi. Currently, he is an Associate Professor and Associate Dean, Faculty of Computing (FOC), at Mohammad Ali Jinnah University (MAJU), Karachi.
His expertise is in Text Classification and Mining, Information Retrieval and Extraction, and Human Computer Interaction. He heads the Interactive and Intelligent Natural Language Processing (IINLP) research group at FOC, MAJU. He has 19 publications in local and international conferences and journals. He serves as a Program Evaluator for the National Computing Education Accreditation Council (NCEAC), Pakistan.
Under the guidance of Dr. Wasi, FOC at MAJU, Karachi has planned an international computing conference in collaboration with the IEEE Karachi section in 2021.

DR. SYED IMRAN JAMI received his BS in Computer Science from the University of Karachi in 2000, his MS in Computer Science from Lahore University of Management Sciences in 2004, and his PhD in Computer Science from the National University of Computer & Emerging Sciences in 2011.
He is one of the founding members of the Centre for Research in Ubiquitous Computing and has been associated with it since 2006. He also worked with the Haptics Research Lab and the Pervasive and Networked Systems Research Group at Deakin University, Australia.
Dr. Jami has authored sixteen journal papers and ten conference papers. He has worked on several funded research projects and supervised 12 graduate and doctoral students.

DR. MUHAMMAD SHOAIB SIDDIQUI received the B.S. degree from the Department of Computer Sciences, University of Karachi, in 2004, and the M.S. and Ph.D. degrees in Computer Engineering from Kyung Hee University, South Korea, in 2008 and 2012, respectively.
His research interests include routing, security, and management in wireless networks, sensor networks, IP traceback, secure provenance, blockchain technologies, and remote monitoring using the IoT.
Dr. Siddiqui is a member of IEEE and ACM. Currently, he is an Associate Professor at the Islamic University of Madinah, Kingdom of Saudi Arabia.

DR. MUHAMMAD KHALIQ-UR-RAHMAN RAAZI SYED (M'15) has been an Associate Professor and Head of the Department of Computer Science at Mohammad Ali Jinnah University since 2016. He has been a member (M) of IEEE since 2015. Previously, he worked as an Assistant Professor at Karachi Institute of Economics and Technology from 2011 to 2016. He completed his PhD in Computer Engineering at Kyung Hee University, South Korea, in 2010. Before that, he received his MS in Computer Science from Lahore University of Management Sciences (LUMS) in 2006 and his BE in Computer Software from the National University of Sciences and Technology (NUST) in 2004.
His research interests include Information Security, Internet of Things, Content Based Communications, Ad Hoc Networks, and Architectures for Engineering Applications.


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
