Article
A Gated Recurrent Unit Approach to Bitcoin Price Prediction
Aniruddha Dutta 1, *, Saket Kumar 1,2 and Meheli Basu 3
1 Haas School of Business, University of California, Berkeley, CA 94720, USA; saket_kumar@berkeley.edu
2 Reserve Bank of India, Mumbai, Maharashtra 400001, India
3 Joseph M. Katz Graduate School of Business, University of Pittsburgh, Pittsburgh, PA 15260, USA;
meheli.basu@pitt.edu
* Correspondence: aniruddha_dutta@berkeley.edu
Received: 21 December 2019; Accepted: 27 January 2020; Published: 3 February 2020
Abstract: In today’s era of big data, deep learning and artificial intelligence have formed the backbone
for cryptocurrency portfolio optimization. Researchers have investigated various state of the art
machine learning models to predict Bitcoin price and volatility. Machine learning models like
recurrent neural network (RNN) and long short-term memory (LSTM) have been shown to perform
better than traditional time series models in cryptocurrency price prediction. However, very few
studies have applied sequence models with robust feature engineering to predict future pricing. In
this study, we investigate a framework with a set of advanced machine learning forecasting methods
with a fixed set of exogenous and endogenous factors to predict daily Bitcoin prices. We study and
compare different approaches using the root mean squared error (RMSE). Experimental results show
that the gated recurrent unit (GRU) model with recurrent dropout performs better than popular
existing models. We also show that simple trading strategies, when implemented with our proposed
GRU model and with proper learning, can lead to financial gain.
Keywords: Bitcoin; trading strategy; artificial intelligence; cryptocurrency; neural networks; time
series analysis; deep learning; predictive model; risk management
1. Introduction
Bitcoin was first launched in 2008 to serve as a transaction medium between participants
without the need for any intermediary (Nakamoto 2008; Barrdear and Kumhof 2016). Since 2017,
cryptocurrencies have been gaining immense popularity, thanks to the rapid growth of their market
capitalization (ElBahrawy et al. 2017), resulting in a revenue of more than $850 billion in 2019. The
digital currency market is diverse and provides investors with a wide variety of different products. A
recent survey (Hileman and Rauchs 2017) revealed that more than 1500 cryptocurrencies are actively
traded by individual and institutional investors worldwide across different exchanges. Over 170
hedge funds, specialized in cryptocurrencies, have emerged since 2017 and in response to institutional
demand for trading and hedging, Bitcoin’s futures have been rapidly launched (Corbet et al. 2018).
The growth of virtual currencies (Baronchelli 2018) has fueled interest from the scientific community
(Barrdear and Kumhof 2016; Dwyer 2015; Bohme et al. 2015; Casey and Vigna 2015; Cusumano 2014;
Krafft et al. 2018; Rogojanu and Badea 2014; White 2015; Baek and Elbeck 2015; Bech and Garratt
2017; Blau 2017; Dow 2019; Fama et al. 2019; Fantacci 2019; Malherbe et al. 2019). Cryptocurrencies
have faced periodic rises and sudden dips in specific time periods, and therefore the cryptocurrency
trading community has a need for a standardized method to accurately predict the fluctuating price
trends. Cryptocurrency price fluctuations and forecasts studied in the past (Poyser 2017) focused on the
analysis and forecasting of price fluctuations, using mostly traditional approaches for financial markets
analysis and prediction (Ciaian et al. 2016; Guo and Antulov-Fantulin 2018; Gajardo et al. 2018; Gandal
and Halaburda 2016). Sovbetov (2018) observed that crypto market-related factors such as market beta,
trading volume, and volatility are significant predictors of both short-term and long-term prices of
cryptocurrencies. Constructing robust predictive models to accurately forecast cryptocurrency prices
is an important business challenge for potential investors and government agencies. Cryptocurrency
trading is actually a time series forecasting problem, and due to high volatility, it is different from price
forecasting in traditional financial markets (Muzammal et al. 2019). Briere et al. (2015) found that
Bitcoin shows extremely high returns, but is characterized by high volatility and low correlation to
traditional assets. The high volatility of Bitcoin is well-documented (Blundell-Wignall 2014; Lo and
Wang 2014). Several econometric methods have been applied to estimate Bitcoin volatility (Katsiampa 2017; Kim et al. 2016; Kristoufek 2015).
Traditional time series prediction methods include univariate autoregressive (AR), univariate
moving average (MA), simple exponential smoothing (SES), and autoregressive integrated moving
average (ARIMA) (Siami-Namini and Namin 2018). Kaiser (2019) used time series models to investigate seasonality patterns in Bitcoin trading. While seasonal ARIMA or SARIMA models
are suitable to investigate seasonality, time series models fail to capture long term dependencies in
the presence of high volatility, which is an inherent characteristic of a cryptocurrency market. On
the contrary, machine learning methods like neural networks use iterative optimization algorithms such as gradient descent, together with hyperparameter tuning, to find the best-fitting model (Siami-Namini and Namin 2018). Thus, machine learning methods have been applied for asset
price/return prediction in recent years by incorporating non-linearity (Enke and Thawornwong 2005;
Huang et al. 2005; Sheta et al. 2015; Chang et al. 2009) with prediction accuracy higher than traditional
time series models (McNally et al. 2018; Siami-Namini and Namin 2018). However, there is a dearth of machine learning applications in the cryptocurrency price prediction literature. In contrast to traditional
linear statistical models such as ARMA, the artificial intelligence approach enables us to capture the
non-linear dynamics of highly volatile cryptocurrency prices.
Examples of machine learning studies to predict Bitcoin prices include random forests
(Madan et al. 2015), Bayesian neural networks (Jang and Lee 2017), and neural networks
(McNally et al. 2018). Deep learning techniques developed by Hinton et al. (2006) have been used
in literature to approximate non-linear functions with high accuracy (Cybenko 1989). There are a
number of previous works that have applied artificial neural networks to financial investment problems
(Chong et al. 2017; Huck 2010). However, Pichl and Kaizoji (2017) concluded that although neural
networks are successful in approximating Bitcoin log return distribution, more complex deep learning
methods such as recurrent neural networks (RNNs) and long short-term memory (LSTM) techniques
should yield substantially higher prediction accuracy. Some studies have used RNNs and LSTM to
forecast Bitcoin pricing in comparison with traditional ARIMA models (McNally et al. 2018; Guo and
Antulov-Fantulin 2018). McNally et al. (2018) showed that RNN and LSTM neural networks predict
prices better than traditional multilayer perceptron (MLP) due to the temporal nature of the more
advanced algorithms. Karakoyun and Çıbıkdiken (2018), in comparing the ARIMA time series model
to the LSTM deep learning algorithm in estimating the future price of Bitcoin, found significantly lower
mean absolute error in LSTM prediction.
In this paper, we focus on two aspects to predict Bitcoin price. We consider a set of exogenous
and endogenous variables to predict Bitcoin price. Some of these variables have not been investigated
in previous research studies on Bitcoin price prediction. This holistic approach should explain whether
Bitcoin is a financial asset. Additionally, we also study and compare RNN models with traditional
machine learning models and propose a GRU architecture to predict Bitcoin price. GRUs train faster than traditional RNNs or LSTMs and have not been investigated in the past for cryptocurrency price prediction. In particular, we developed a gated recurrent unit (GRU) architecture that can learn the
Bitcoin price fluctuations more efficiently than the traditional LSTM. We compare our model with
a traditional neural network and LSTM to check the robustness of the architecture. For application
purposes in algorithmic trading, we implemented our proposed architecture to test two simple trading strategies for profitability.

J. Risk Financial Manag. 2020, 13, 23

2. Methodology

A survey of the current literature on neural networks reveals that traditional neural networks have shortcomings in effectively using prior information for future predictions (Wang et al. 2015). RNNs are a class of neural networks that use their internal state memory for processing sequences. However, RNNs on their own are not capable of learning long-term dependencies and they often suffer from short-term memory. With long sequences, especially in time series modelling and textual analysis, RNNs suffer from vanishing gradient problems during back propagation (Hochreiter 1998; Pascanu et al. 2013). If the gradient value shrinks to a very small value, then the RNNs fail to learn longer past sequences, thus having short-term memory. Long short-term memory (Hochreiter and Schmidhuber 1997) is an RNN architecture with feedback connections, designed to regulate the flow of information. LSTMs are a variant of the RNN that are explicitly designed to learn long-term dependencies. A single LSTM unit is composed of an input gate, a cell, a forget gate (sigmoid layer and a tanh layer), and an output gate (Figure 1). The gates control the flow of information in and out of the LSTM cell. LSTMs are well suited for time-series forecasting. In the forget gate, the input from the previous hidden state is passed through a sigmoid function along with the input from the current state to generate the forget gate output f_t. The sigmoid activation function regulates values between 0 and 1; values closer to 0 are discarded and only values closer to 1 are considered. The input gate is used to update the cell state. Values from the previous hidden state and current state are simultaneously passed through a sigmoid function and a tanh function, and the outputs (i_t and c̃_t) from the two activation functions are multiplied. In this process, the sigmoid function decides which information is important to keep from the tanh output.

Figure 1. Architecture of a long short-term memory (LSTM) cell. +: "plus" operation; ◉: "Hadamard product" operation; σ: "sigmoid" function; tanh: "tanh" function.

The previous cell state value is multiplied with the forget gate output and then added pointwise with the output from the input gate to generate the new cell state c_t, as shown in Equation (1). The output gate operation consists of two steps: first, the previous hidden state and current input values are passed through a sigmoid function; and secondly, the last obtained cell state values are passed through a tanh function. Finally, the tanh output and the sigmoid output are multiplied to produce the new hidden state, which is carried over to the next step. Thus, the forget gate, input gate, and output gate decide what information to forget, what information to add from the current step, and what information to carry forward, respectively.

c_t = f_t · c_{t-1} + i_t · c̃_t    (1)

GRU, introduced by Cho et al. (2014), solves the problem of the vanishing gradient with a standard RNN. GRU is similar to LSTM, but it combines the forget and the input gates of the LSTM into a single update gate. The GRU further merges the cell state and the hidden state. A GRU unit consists of a cell containing multiple operations which are repeated, and each of the operations could be a neural network. Figure 2 below shows the structure of a GRU unit consisting of an update gate, a reset gate, and a current memory content. These gates enable a GRU unit to store values in the memory for a certain amount of time and use these values to carry information forward, when required, to the current state to update at a future date. In Figure 2 below, the update gate is represented by z_t, where at each step, the input x_t and the output from the previous unit h_{t-1} are multiplied by the weight W_z and added together, and a sigmoid function is applied to get an output between 0 and 1. The update gate addresses the vanishing gradient problem as the model learns how much information to pass forward. The reset gate is represented by r_t in Equation (2), where a similar operation as in the input gate is carried out, but this gate is used to determine how much of the past information to forget. The current memory content is denoted by h̃_t, where x_t is multiplied by W and r_t is multiplied by h_{t-1} element-wise (Hadamard product operation) to pass only the relevant information. Finally, a tanh activation function is applied to the summation. The final memory in the GRU unit is denoted by h_t, which holds the information for the current unit and passes it on to the network. The computation in the final step is given in Equation (2) below. As shown in Equation (2), if z_t is close to 0 ((1 - z_t) close to 1), then most of the current content will be irrelevant and the network will pass the majority of the past information, and vice versa.

z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t · h_{t-1}, x_t])
h_t = (1 - z_t) · h_{t-1} + z_t · h̃_t    (2)

Figure 2. Architecture of a gated recurrent unit (GRU) unit. +: "plus" operation; ◉: "Hadamard product" operation; σ: "sigmoid" function; tanh: "tanh" function.

Both LSTM and GRU are efficient at addressing the problem of vanishing gradient that occurs in long sequence models. GRUs have fewer tensor operations and are speedier to train than LSTMs (Chung et al. 2014). The neural network models considered for the Bitcoin price prediction are simple neural network (NN), LSTM, and GRU. The neural networks were trained with optimized hyperparameters and tested on the test set. Finally, the best performing model with the lowest root mean squared error (RMSE) value was considered for portfolio strategy execution.
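The gate computations in Equation (2) can be written out directly. The following NumPy single-step GRU is our own illustration: it follows the bias-free form of Equation (2) (practical implementations add bias terms), and the weight shapes are chosen only for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, W):
    """One GRU step following Equation (2); [h, x] denotes concatenation."""
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ hx)                                        # update gate
    r_t = sigmoid(Wr @ hx)                                        # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))   # candidate memory
    return (1.0 - z_t) * h_prev + z_t * h_tilde                  # new hidden state
```

Because the new hidden state is a convex combination of the previous state and a tanh output, its entries stay bounded in (-1, 1), which is part of why the update gate stabilizes gradients over long sequences.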
Figure 3. Time series plot of Bitcoin price in USD.
3.2. Feature Selection
One of the most important aspects of the data mining process is feature selection. Feature selection is basically concerned with extracting useful features/patterns from data to make it easier for machine learning models to perform their predictions. To check the behavior of the features with respect to Bitcoin prices, we plotted the data for all the 20 features for the entire time period, as shown in Figure 4 below. A closer look at the plot reveals that the endogenous features are more correlated with Bitcoin prices than the exogenous features. Among the exogenous features, Google Trends, interest rates, and Ripple price seem to be the most correlated.
Figure 4. Plot showing the behavior of independent variables with Bitcoin price. The blue line plots the different features used for Bitcoin price prediction and the orange line plots the Bitcoin price over time. Abbreviations: MACD, moving average convergence divergence.
Multicollinearity is often an issue in statistical learning when the features are highly correlated among themselves, and thus, the final prediction output is based on a much smaller number of features, which may lead to biased inferences (Nawata and Nagase 1996). To find the most appropriate features for Bitcoin price prediction, the variance inflation factor (VIF) was calculated for the predictor variables (see Table 1). VIF provides a measure of how much the variance of an estimated regression coefficient is increased due to multicollinearity. Features with VIF values greater than 10 (Hair et al. 1992; Kennedy 1992; Marquardt 1970; Neter et al. 1989) were not considered for analysis. A set of 15 features was finally selected after dropping Bitcoin miner revenue, Metcalf-UTXO, interest rates, block size, and the U.S. bond yields 2-year and 10-year difference.
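The VIF screening step can be sketched as follows. This NumPy implementation is our own illustration, not the authors' code: it computes VIF_j = 1/(1 - R²_j), where R²_j comes from regressing feature j on all remaining features with an intercept.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X (n_samples, n_features)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])     # regressors with intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = 1.0 / max(1.0 - r2, 1e-12)           # guard against a perfect fit
    return out
```

Features whose VIF exceeds the conventional threshold of 10 would then be dropped, as done in the text.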
in the hidden units resulting in overfitting (Srivastava et al. 2014), dropout was introduced in the
LSTM and GRU layers. Thus, for each training sample the network was re-adjusted and a new set of
neurons were dropped out. For both LSTM and GRU architecture, a recurrent dropout rate (Gal and
Ghahramani 2016) of 0.1 was used. For the two hidden layers GRU, a dropout of 0.1 was additionally
used along with the recurrent dropout of 0.1. The dropout and recurrent dropout rates were optimized
to ensure that the training data was large enough to not be memorized in spite of the noise, and to
avoid overfitting (Srivastava et al. 2014). For the simple NN, two dense layers were used, with 25 and 1 hidden nodes. The LSTM model comprised one LSTM layer (50 nodes) and one dense layer (1 node). The simple GRU and the GRU with recurrent dropout architectures each comprised one GRU layer (50 nodes) and one dense layer with 1 node. The final GRU architecture was tuned with two GRU layers (50 nodes and 10 nodes) with a dropout and recurrent dropout of 0.1. The optimized batch sizes for the neural network and the RNN models were determined to be 125 and 100, respectively. A higher
batch size led to a higher training and validation loss during the learning process.
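Recurrent dropout differs from ordinary dropout in that one mask is sampled per sequence and reused at every timestep (Gal and Ghahramani 2016). A minimal NumPy illustration of this masking, our own sketch rather than the authors' code:

```python
import numpy as np

def apply_recurrent_dropout(h_seq: np.ndarray, rate: float = 0.1, seed: int = 0):
    """Apply one shared dropout mask to a recurrent state sequence.

    h_seq: array of shape (timesteps, units). The mask is sampled once and
    broadcast over all timesteps, which is what makes the dropout 'recurrent'.
    Inverted scaling (divide by keep probability) preserves expectations.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - rate
    mask = rng.binomial(1, keep, size=h_seq.shape[1]) / keep
    return h_seq * mask  # broadcasting applies the same mask at each timestep
```

Sampling the mask per timestep instead would inject fresh noise into the recurrence at every step, which is exactly the behavior the per-sequence mask avoids.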
Figure 5 shows the training and validation loss for the neural network models. The difference
between training loss and validation loss reduces with a dropout and a recurrent dropout for the one
GRU layer model (Figure 5, bottom middle). However, with the addition of an extra GRU layer, the
difference between the training and validation loss increased. After training, all the neural network
models were tested on the test data. The RMSE for all the models on the train and test data are shown in
Table 2. As seen from Table 2, the LSTM architecture performed better than the simple NN architecture due to its memory retention capabilities (Hochreiter and Schmidhuber 1997). The
GRU model with a recurrent dropout generates an RMSE of 0.014 on the training set and 0.017 on the
test set. RNN-GRU performs better than LSTM, and a plausible explanation is the fact that GRUs
are computationally faster with a lesser number of gates and tensor operations. The GRU controls
the flow of information like the LSTM unit; however, the GRU has no memory unit and it exposes
the full hidden content without any control (Chung et al. 2014). GRUs also tend to perform better
than LSTM on less training data (Kaiser and Sutskever 2016) as in the present case, while LSTMs are
more efficient in remembering longer sequences (Yin et al. 2017). We also found that the recurrent
dropout in the GRU layer helped reduce the RMSE on the test data, and the difference of RMSE
between training and test data was the minimum for the GRU model with recurrent dropout. These
results indicate that the GRU with recurrent dropout is the best performing model for our problem.
Recurrent dropouts help to mask some of the output from the first GRU layer, which can be thought as
a variational inference in RNN (Gal and Ghahramani 2016; Merity et al. 2017). The Diebold-Mariano statistical test (Diebold and Mariano 1995) was conducted to analyze whether the difference in prediction accuracy between each pair of models, taken in decreasing order of RMSE, is statistically significant. The p-values reported in Tables 2 and 3 indicate that each model in this ordering has a significantly lower RMSE than the preceding one in predicting Bitcoin prices. We also trained
the GRU recurrent dropout model with a lookback period of 15, 45, and 60 days and the results are
reported in Table 3. It can be concluded from Table 3 that the lookback period for 30 days is the optimal
period for the best RMSE results. Figure 6 shows the GRU model with recurrent dropout predicted
Bitcoin price in the test data, as compared to the original data. The model predicted price is higher
than the original price in the first few months of 2019; however, when the Bitcoin price shot up in
June–July 2019, the model was able to learn this trend effectively.
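A lookback period of k days means each input sample consists of the previous k daily observations, with the following day's value as the prediction target. A minimal sketch of how such supervised windows might be constructed (the function name and array shapes are our own, not the authors' code):

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 30):
    """Build (samples, lookback) inputs and next-day targets from a 1-D series."""
    X = np.stack([series[i : i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]  # target is the observation immediately after each window
    return X, y
```

Re-running the same training with lookback set to 15, 30, 45, or 60 reproduces the comparison reported in Table 3, where 30 days proved optimal.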
Table 2. Train and test root mean squared error (RMSE) with a 30-day lookback period for different models.
Figure 5. Training and validation loss for simple neural network (NN) (top left), LSTM with dropout (top right), GRU (bottom left), GRU with a recurrent dropout (bottom middle), and GRU with dropout and recurrent dropout (bottom right).

Table 3. Train test RMSE for GRU recurrent model.

Lookback Period (Days)   Train RMSE   Test RMSE   p-Value
15                       0.012        0.016
45                       0.011        0.019       0.0010
60                       0.011        0.017       0.0006

Figure 6. Bitcoin price as predicted by the GRU one-layer model with dropout and recurrent dropout.
Most Bitcoin exchanges, unlike stock exchanges, do now allow short selling of Bitcoin, yet this
results in higher volatility and regulatory risks (Filippi 2014). Additionally, volatility depends on how
5. Portfolio
close Strategy
the model predictions are to the actual market price of Bitcoin at every point of time. As can be
seen We
from Figure 7, two
implement Bitcoin pricesstrategies
trading went down during early
to evaluate our June 2019,
results and the buy-sell
in portfolio managementstrategy
of
correctly predicted the fall, with the trader selling the Bitcoins holding
cryptocurrencies. For simplicity, we considered only Bitcoin trading and we assumed thatto keep the cash before
the
investing
trader again
only buyswhen the price
and sells basedstarts
on therising from
signals mid-June.
derived fromIn comparison,
quantitative due to Based
models. short selling
on our and
test
taking long positions simultaneously, the long-short strategy suffered during the
set evaluation, we have considered the GRU one layer with recurrent dropout as our best model same period of time
for
with very slowtrading
implementing increase in portfolio
strategies. Twovalue.
typesHowever,
of tradinglong-short strategies
strategies were might be more
implemented, powerful
as discussed in
when we consider a portfolio consisting of multiple cryptocurrencies where investors
this section. The first strategy was a long-short strategy, wherein the buy signal predicted from the can take
simultaneous
model will leadlong and short
to buying positions
the Bitcoin andina currencies, which
sell signal will have significant
essentially growth potential
lead to short-selling and
the Bitcoin
overvalued currencies.
at the beginning of the day based on the model predictions for that day. If the model predicted price
on a given day is lower than the previous day, then the trader will short sell the Bitcoin and cover them
at the end of the day. An initial portfolio value of 1 is considered and the transaction fees is taken to be
0.8% of the invested or sold amount. Due to daily settlement, the long-short strategy is expected to
incur significant transaction costs which may reduce the portfolio value. The second strategy was a
buy-sell strategy where the trader goes long when a buy signal is triggered and sell all the Bitcoins
when a sell signal is generated. Once the trader sells all the coins in the portfolio, he/she waits for
the next positive signal to invest again. When a buy signal occurs, the trader invests in Bitcoin and
remains invested till the next sell signal is generated.
Most Bitcoin exchanges, unlike stock exchanges, do now allow short selling of Bitcoin, yet this
results in higher volatility and regulatory risks (Filippi 2014). Additionally, volatility depends on how
close the model predictions are to the actual market price of Bitcoin at every point of time. As can
be seen from Figure 7, Bitcoin prices went down during early June 2019, and the buy-sell strategy
correctly predicted the fall, with the trader selling the Bitcoin holdings to keep cash before investing
again when the price started rising from mid-June. In comparison, due to short selling and
positions simultaneously, the long-short strategy suffered during the same period of time with very
slow increase in portfolio value. However, long-short strategies might be more powerful when we
consider a portfolio consisting of multiple cryptocurrencies, where investors can take simultaneous
long positions in currencies with significant growth potential and short positions in overvalued currencies.
Figure 6. Bitcoin price as predicted by the GRU one-layer model with dropout and recurrent dropout.
Figure 7. Above shows the change in portfolio value over time when the long-short (Left) and
buy-sell (Right) strategies are implemented on the test data. Due to short selling and daily settlement,
the long-short portfolio incurs transaction fees, which reduce growth and increase volatility in the portfolio.
6. Conclusions
There have been a considerable number of studies on Bitcoin price prediction using machine
learning and time-series analysis (Wang et al. 2015; Guo et al. 2018; Karakoyun and Çıbıkdiken 2018;
Jang and Lee 2017; McNally et al. 2018). However, most of these studies have been based on
predicting Bitcoin prices with pre-decided models and a limited number of features, such as price
J. Risk Financial Manag. 2020, 13, 23 11 of 16
volatility, order book, technical indicators, price of gold, and the VIX. The present study explores
Bitcoin price prediction based on a collective and exhaustive list of features with financial linkages, as
shown in Appendix A. The basis of any investment has always been wealth creation either through
fundamental investment, or technical speculation, and cryptocurrencies are no exception to this. In
this study, feature engineering is performed taking into account whether Bitcoin could be used as an
alternative investment that offers investors diversification benefits and a different investment avenue
when the traditional means of investment are not doing well. This study considers a holistic approach
to select the predictor variables that might be helpful in learning future Bitcoin price trends. The
U.S. treasury two-year and ten-year yields are the benchmark indicators for short-term and long-term
investment in bond markets, hence a change in these benchmarks could very well propel investors
towards alternative investment avenues such as Bitcoin. A similar rationale applies to gold, S&P
returns, and the dollar index. Whether the driver is good or bad news, increasing attention, or
momentum-based speculation, Google Trends and VIX price data are well suited to studying this
aspect of the influence on prices.
We also conclude that recurrent neural network models such as LSTM and GRU outperform
traditional machine learning models. With limited data, neural networks like LSTM and GRU can
regulate past information to learn effectively from non-linear patterns. Deep models require careful
training and hyperparameter tuning to yield results, which might be computationally expensive for
large datasets, unlike conventional time-series approaches. However, for stock price prediction or
cryptocurrency price prediction, market data are always limited and computational complexity is not
a concern, and thus shallow learning models can be effectively used in practice. These benefits will
likely contribute significantly to quantitative finance in the coming years.
In the deep learning literature, LSTM has traditionally been used to analyze time series. The GRU
architecture, on the other hand, seems to perform better than the LSTM model in our analysis.
The simplicity of the GRU model, in which forgetting and updating occur simultaneously, was
found to work well in Bitcoin price prediction. Adding a recurrent dropout improves the
performance of the GRU architecture; however, further studies need to be undertaken to explore the
dropout phenomenon in GRU architectures. Two types of investment strategies have been implemented
with our trained GRU architecture. Results show that when machine learning models are implemented
with full understanding, they can be beneficial to the investment industry for financial gains and
portfolio management. In the present case, recurrent machine learning models performed much better
than traditional ones in price prediction, thus making the investment strategies valuable. With proper
back testing, each of these models can contribute to managing portfolio risk and reducing financial losses.
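The simultaneous forgetting and updating mentioned above can be made concrete with a minimal NumPy sketch of a single GRU step: one update gate z interpolates between the previous hidden state and the candidate state, so (1 - z) forgets and z updates in the same operation. The dimensions and stacked weight layout below are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def gru_step(x, h_prev, W, U, b):
    """One GRU step. W, U, b each stack three parameter blocks: index 0
    for the update gate z, 1 for the reset gate r, 2 for the candidate."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(W[0] @ x + U[0] @ h_prev + b[0])          # update gate
    r = sig(W[1] @ x + U[1] @ h_prev + b[1])          # reset gate
    h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h_prev) + b[2])
    # forgetting (1 - z) and updating (z) happen in one interpolation
    return (1 - z) * h_prev + z * h_tilde
```

In Keras, the regularization discussed here is exposed through the `dropout` and `recurrent_dropout` arguments of the `GRU` layer, the latter applying the same dropout mask to the recurrent connections at every time step.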
Nonetheless, a significant improvement over the current study could be achieved if a bigger data set
were available. Convolutional neural networks (CNNs) have also been used to predict financial returns,
for example in forecasting daily oil futures prices (Luo et al. 2019). To that end, a potential future
research study could explore the performance of CNN architectures in predicting Bitcoin prices.
Author Contributions: Conceptualization, data curation, validation, and draft writing, S.K.; methodology, formal
analysis, draft preparation, and editing, A.D.; draft writing, plots, M.B. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Acknowledgments: The authors would like to thank the staff at the Haas School of Business, University of
California Berkeley and Katz Graduate School of Business, University of Pittsburgh for their support. A.D.
Conflicts of Interest: The authors declare no conflicts of interest. The views expressed are personal.
Appendix A
References
Baek, Chung, and Matt Elbeck. 2015. Bitcoin as an Investment or Speculative Vehicle? A First Look. Applied
Economics Letters 22: 30–34. [CrossRef]
Barrdear, John, and Michael Kumhof. 2016. The Macroeconomics of Central Bank Issued Digital Currencies. SSRN
Electronic Journal. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2811208 (accessed
on 2 February 2020).
Baronchelli, Andrea. 2018. The emergence of consensus: A primer. Royal Society Open Science 5: 172189. [CrossRef]
Bech, Morten L., and Rodney Garratt. 2017. Central Bank Cryptocurrencies. BIS Quarterly Review 2017: 55–70.
Blau, Benjamin M. 2017. Price Dynamics and Speculative Trading in Bitcoin. Research in International Business and
Finance 41: 493–99. [CrossRef]
Blundell-Wignall, Adrian. 2014. The Bitcoin Question: Currency versus Trust-less Transfer Technology. OECD
Working Papers on Finance, Insurance and Private Pensions 37: 1.
Bohme, Rainer, Nicolas Christin, Benjamin Edelman, and Tyler Moore. 2015. Bitcoin: Economics, technology, and
governance. Journal of Economic Perspectives (JEP) 29: 213–38. [CrossRef]
Bouri, Elie, Peter Molnár, Georges Azzi, and David Roubaud. 2017. On the hedge and safe haven properties of
Bitcoin: Is it really more than a diversifier? Finance Research Letters 20: 192–98. [CrossRef]
Briere, Marie, Kim Oosterlinck, and Ariane Szafarz. 2015. Virtual currency, tangible return: Portfolio diversification
with bitcoin. Journal Asset Management 16: 365–73. [CrossRef]
Cagli, Efe C. 2019. Explosive behavior in the prices of Bitcoin and altcoins. Finance Research Letters 29: 398–403.
[CrossRef]
Casey, Michael J., and Paul Vigna. 2015. Bitcoin and the digital-currency revolution. The Wall Street Journal.
Available online: https://www.wsj.com/articles/the-revolutionary-power-of-digital-currency-1422035061
(accessed on 2 February 2020).
Chang, Pei-Chann, Chen-Hao Liu, Chin-Yuan Fan, Jun-Lin Lin, and Chih-Ming Lai. 2009. An Ensemble of Neural
Networks for Stock Trading Decision Making. In Emerging Intelligent Computing Technology and Applications.
With Aspects of Artificial Intelligence 5755 of Lecture Notes in Computer Science. Berlin/Heidelberg: Springer,
pp. 1–10. Available online: https://doi.org/10.1007/978-3-642-04020-7_1 (accessed on 2 February 2020).
Cheah, Eng-Tuck, and John Fry. 2015. Speculative bubbles in Bitcoin markets? An empirical investigation into the
fundamental value of Bitcoin. Economics Letters 130: 32–36. [CrossRef]
Chen, Zheshi, Chunhong Li, and Wenjun Sun. 2020. Bitcoin price prediction using machine learning: An approach
to sample dimension engineering. Journal of Computational and Applied Mathematics 365: 112395. [CrossRef]
Cho, Kyunghyun, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk,
and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical
Machine Translation. Available online: https://arxiv.org/abs/1406.1078.pdf (accessed on 2 February 2020).
Chollet, Francois. 2015. Keras: Deep Learning for humans. Available online: https://github.com/keras-team/keras
(accessed on 2 February 2020).
Chong, Eunsuk, Chulwoo Han, and Frank C. Park. 2017. Deep learning networks for stock market analysis and
prediction: methodology, data representations, and case studies. Expert System with Applications 83: 187–205.
[CrossRef]
Chung, Junyoung, Caglar Gulcehre, Kyung H. Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated
Recurrent Neural Networks on Sequence Modeling. Available online: https://arxiv.org/pdf/1412.3555.pdf
(accessed on 2 February 2020).
Ciaian, Pavel, Miroslava Rajcaniova, and d’Artis Kancs. 2016. The economics of Bitcoin price formation. Applied
Economics 48: 1799–815. [CrossRef]
Corbet, Shaen, Brian Lucey, Maurice Peat, and Samuel Vigne. 2018. Bitcoin Futures—What use are they? Economics
Letters 172: 23–27. [CrossRef]
Cusumano, Michael A. 2014. The Bitcoin ecosystem. Communications of the ACM 57: 22–24. [CrossRef]
Cybenko, George. 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems
2: 303–14. [CrossRef]
Diebold, Francis X., and Roberto S. Mariano. 1995. Comparing Predictive Accuracy. Journal of Business and Economic
Statistics 13: 253–63.
Dow, Sheila. 2019. Monetary Reform, Central Banks and Digital Currencies. International Journal of Political
Economy 48: 153–73. [CrossRef]
Dyhrberg, Anne H. 2016. Bitcoin, gold and the dollar-A GARCH volatility analysis. Finance Research Letters 16: 85–92.
[CrossRef]
Dwyer, Gerald P. 2015. The economics of Bitcoin and similar private digital currencies. Journal of Financial Stability
17: 81–91. [CrossRef]
ElBahrawy, Abeer, Laura Alessandretti, Anne Kandler, Romualdo Pastor-Satorras, and Andrea Baronchelli. 2017.
Evolutionary dynamics of the cryptocurrency market. Royal Society Open Science 4: 170623. [CrossRef]
[PubMed]
Enke, David, and Suraphan Thawornwong. 2005. The use of data mining and neural networks for forecasting
stock market returns. Expert Systems with Applications 29: 927–40. [CrossRef]
Fama, Marco, Andrea Fumagalli, and Stefano Lucarelli. 2019. Cryptocurrencies, Monetary Policy, and New Forms
of Monetary Sovereignty. International Journal of Political Economy 48: 174–94. [CrossRef]
Fantacci, Luca. 2019. Cryptocurrencies and the Denationalization of Money. International Journal of Political
Economy 48: 105–26. [CrossRef]
Filippi, Primavera De. 2014. Bitcoin: A Regulatory Nightmare to a Libertarian Dream. Internet Policy Review 3.
[CrossRef]
Gajardo, Gabriel, Werner D. Kristjanpoller, and Marcel Minutolo. 2018. Does Bitcoin exhibit the same asymmetric
multifractal cross-correlations with crude oil, gold and DJIA as the Euro, Great British Pound and Yen? Chaos,
Solitons & Fractals 109: 195–205.
Gal, Yarin, and Zoubin Ghahramani. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty
in Deep Learning. Available online: https://arxiv.org/pdf/1506.02142.pdf (accessed on 2 February 2020).
Gal, Yarin, and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural
networks. Advances in Neural Information Processing Systems 2016: 1019–27.
Gandal, Neil, and Hanna Halaburda. 2016. Can we predict the winner in a market with network effects?
Competition in cryptocurrency market. Games 7: 16. [CrossRef]
Guo, Tian, Albert Bifet, and Nino Antulov-Fantulin. 2018. Bitcoin volatility forecasting with a glimpse into buy
and sell orders. Paper presented at 2018 IEEE International Conference on Data Mining (ICDM), Singapore,
November 17–20.
Guo, Tian, and Nino Antulov-Fantulin. 2018. Predicting Short-Term Bitcoin Price Fluctuations from Buy and Sell
Orders. Available online: https://arxiv.org/pdf/1802.04065v1.pdf (accessed on 2 February 2020).
Hair, Joseph F., Rolph E. Anderson, and Ronald L. Tatham. 1992. Multivariate Data Analysis, 3rd ed. New York:
Macmillan.
Hileman, Garrick, and Michel Rauchs. 2017. Global Cryptocurrency Benchmarking Study. Cambridge Centre for Alternative
Finance. Available online: https://www.jbs.cam.ac.uk/fileadmin/user_upload/research/centres/alternative-
finance/downloads/2017-04-20-global-cryptocurrency-benchmarking-study.pdf (accessed on 2 February
2020).
Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets.
Neural Computation 18: 1527–54. [CrossRef]
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9: 1735–80.
[CrossRef]
Hochreiter, Sepp. 1998. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem
Solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6: 107–16. [CrossRef]
Huang, Wei, Yoshiteru Nakamori, and Shou-Yang Wang. 2005. Forecasting stock market movement direction with
support vector machine. Computers & Operations Research 32: 2513–22.
Huck, Nicolas. 2010. Pairs trading and outranking: The multi-step-ahead forecasting case. European Journal of
Operational Research 207: 1702–16. [CrossRef]
Jang, Huisu, and Jaewook Lee. 2017. An Empirical Study on Modeling and Prediction of Bitcoin Prices with
Bayesian Neural Networks Based on Blockchain Information. IEEE Access 6: 5427–37. [CrossRef]
Kaiser, Lukasz, and Ilya Sutskever. 2016. Neural GPUs Learn Algorithms. Available online: https://arxiv.org/pdf/
1511.08228.pdf (accessed on 2 February 2020).
Kaiser, Lars. 2019. Seasonality in cryptocurrencies. Finance Research Letters 31: 232–38. [CrossRef]
Karakoyun, Ebru Şeyma, and Ali Osman Çıbıkdiken. 2018. Comparison of ARIMA Time Series Model and
LSTM Deep Learning Algorithm for Bitcoin Price Forecasting. Paper presented at the 13th Multidisciplinary
Academic Conference in Prague 2018 (The 13th MAC 2018), Prague, Czech Republic, May 25–27.
Karasu, Seçkin, Aytaç Altan, Zehra Saraç, and Rifat Hacioğlu. 2018. Prediction of Bitcoin prices with machine
learning methods using time series data. Paper presented at 26th Signal Processing and Communications
Applications Conference (SIU), Izmir, Turkey, May 2–5.
Katsiampa, Paraskevi. 2017. Volatility estimation for Bitcoin: A comparison of GARCH models. Economics Letters
158: 3–6. [CrossRef]
Kennedy, Peter E. 1992. A Guide to Econometrics. Oxford: Blackwell.
Kim, Young B., Jun G. Kim, Wook Kim, Jae H. Im, Tae H. Kim, Shin J. Kang, and Chang H. Kim. 2016. Predicting
Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies. PLoS ONE 11: e0161197.
[CrossRef]
Kingma, Diederik P., and Jimmy Ba. 2015. Adam: A method for stochastic optimization. arXiv 2015: 9.
Krafft, Peter M., Nicolas D. Penna, and Alex S. Pentland. 2018. An Experimental Study of Cryptocurrency Market
Dynamics. Paper presented at CHI Conference, Montreal, QC, Canada, April 21–26.
Kristoufek, Ladislav. 2015. What Are the Main Drivers of the Bitcoin Price? Evidence from Wavelet Coherence
Analysis. PLoS ONE 10: e0123923. [CrossRef]
Lawrence, Steve, Giles C. Lee, and Ah C. Tsoi. 1997. Lessons in Neural Network Training: Overfitting May be
Harder than Expected. In Proceedings of the Fourteenth National Conference on Artificial Intelligence. Menlo Park:
AAAI Press, pp. 540–45.
Lo, Stephanie, and J. Christina Wang. 2014. Bitcoin as Money? Working Paper 14. Boston, MA, USA: Federal
Reserve Bank of Boston.
Luo, Zhaojie, Xiaojing Cai, Katsuyuki Tanaka, Tetsuya Takiguchi, Takuji Kinkyo, and Shigeyuki Hamori. 2019.
Can we forecast daily oil futures prices? Experimental evidence from convolutional neural networks. Journal
of Risk and Financial Management 12: 9. [CrossRef]
Madan, Isaac, Shaurya Saluja, and Aojia Zhao. 2015. Automated Bitcoin Trading via Machine Learning Algorithms.
Available online: https://pdfs.semanticscholar.org/e065/3631b4a476abf5276a264f6bbff40b132061.pdf (accessed
on 2 February 2020).
Malherbe, Leo, Matthieu Montalban, Nicolas Bedu, and Caroline Granier. 2019. Cryptocurrencies and Blockchain:
Opportunities and Limits of a New Monetary Regime. International Journal of Political Economy 48: 127–52.
[CrossRef]
Marquardt, Donald W. 1970. Generalized inverses, ridge regression, biased linear estimation, and nonlinear
estimation. Technometrics 12: 591–612. [CrossRef]
McNally, Sean, Jason Roche, and Simon Caton. 2018. Predicting the Price of Bitcoin Using Machine Learning.
Paper presented at 26th Euromicro International Conference on Parallel, Distributed and Network-based
Processing (PDP), Cambridge, UK, March 21–23.
Merity, Stephen, Nitish S. Keskar, and Richard Socher. 2017. Regularizing and Optimizing LSTM Language
Models. Available online: https://arxiv.org/abs/1708.02182 (accessed on 2 February 2020).
Muzammal, Muhammad, Qiang Qu, and Bulat Nasrulin. 2019. Renovating blockchain with distributed databases:
An open source system. Future Generation Computer Systems 90: 105–17. [CrossRef]
Nakamoto, Satoshi. 2008. Bitcoin: A Peer-to-Peer Electronic Cash System. Available online: https://bitcoin.org/
bitcoin.pdf (accessed on 2 February 2020).
Nawata, Kazumitsu, and Nobuko Nagase. Estimation of sample selection bias models. Econometric Reviews 15: 4.
[CrossRef]
Neter, John, William Wasserman, and Michael H. Kutner. 1989. Applied Linear Regression Models. Homewood:
Irwin.
Pichl, Lukas, and Taisei Kaizoji. 2017. Volatility Analysis of Bitcoin Price Time Series. Quantitative Finance and
Economics 1: 474–85. [CrossRef]
Poyser, Obryan. 2017. Exploring the Determinants of Bitcoin’s Price: An Application of Bayesian Structural Time
Series. Available online: https://arxiv.org/abs/1706.01437 (accessed on 2 February 2020).
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. 2013. On the Difficulty of Training Recurrent Neural
Networks. Available online: https://arxiv.org/pdf/1211.5063.pdf (accessed on 2 February 2020).
Rogojanu, Angela, and Liana Badea. 2014. The issue of competing currencies. Case study: Bitcoin. Theoretical
and Applied Economics 21: 103–14.
Selmi, Refk, Walid Mensi, Shawkat Hammoudeh, and Jamal Bouoiyour. 2018. Is Bitcoin a hedge, a safe haven or a
diversifier for oil price movements? A comparison with gold. Energy Economics 74: 787–801. [CrossRef]
Sheta, Alaa F., Sara Elsir M. Ahmed, and Hossam Faris. 2015. A comparison between regression, artificial neural
networks and support vector machines for predicting stock market index. Soft Computing 7: 8.
Siami-Namini, Sima, and Akbar S. Namin. 2018. Forecasting Economics and Financial Time Series: ARIMA vs.
LSTM. Available online: https://arxiv.org/abs/1803.06386v1 (accessed on 2 February 2020).
Sovbetov, Yhlas. 2018. Factors influencing cryptocurrency prices: Evidence from bitcoin, ethereum, dash, litecoin,
and monero. Available online: https://mpra.ub.uni-muenchen.de/85036/1/MPRA_paper_85036.pdf (accessed
on 2 February 2020).
Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout:
A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15: 1929–58.
Wang, Lin, Yi Zeng, and Tao Chen. 2015. Back propagation neural network with adaptive differential evolution
algorithm for time series forecasting. Expert Systems with Applications 42: 855–63. [CrossRef]
White, Lawrence H. 2015. The market for cryptocurrencies. The Cato Journal 35: 383–402. [CrossRef]
Yin, Wenpeng, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. Comparative Study of CNN and RNN
for Natural Language Processing. Available online: https://arxiv.org/pdf/1702.01923.pdf (accessed on 2
February 2020).
Yelowitz, Aaron, and Matthew Wilson. 2015. Characteristics of Bitcoin users: an analysis of Google search data.
Applied Economics Letters 22: 1030–36. [CrossRef]
Yu, Lean, Kin K. Lai, Shouyang Wang, and Wei Huang. 2006. A Bias-Variance-Complexity Trade-Off Framework
for Complex System Modeling. In Computational Science and Its Applications-ICCSA 2006. Lecture Notes in
Computer Science. Berlin/Heidelberg: Springer, Volume 3980.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).