Article
A Gated Recurrent Unit Approach to Bitcoin Price Prediction
Aniruddha Dutta 1, *, Saket Kumar 1,2 and Meheli Basu 3
1 Haas School of Business, University of California, Berkeley, CA 94720, USA; saket_kumar@berkeley.edu
2 Reserve Bank of India, Mumbai, Maharashtra 400001, India
3 Joseph M. Katz Graduate School of Business, University of Pittsburgh, Pittsburgh, PA 15260, USA;
meheli.basu@pitt.edu
* Correspondence: aniruddha_dutta@berkeley.edu
Received: 21 December 2019; Accepted: 27 January 2020; Published: 3 February 2020
Abstract: In today’s era of big data, deep learning and artificial intelligence have formed the backbone
for cryptocurrency portfolio optimization. Researchers have investigated various state of the art
machine learning models to predict Bitcoin price and volatility. Machine learning models like
recurrent neural network (RNN) and long short-term memory (LSTM) have been shown to perform
better than traditional time series models in cryptocurrency price prediction. However, very few
studies have applied sequence models with robust feature engineering to predict future pricing. In
this study, we investigate a framework with a set of advanced machine learning forecasting methods
with a fixed set of exogenous and endogenous factors to predict daily Bitcoin prices. We study and
compare different approaches using the root mean squared error (RMSE). Experimental results show
that the gated recurrent unit (GRU) model with recurrent dropout performs better than popular
existing models. We also show that simple trading strategies, when implemented with our proposed
GRU model and with proper learning, can lead to financial gain.
Keywords: Bitcoin; trading strategy; artificial intelligence; cryptocurrency; neural networks; time
series analysis; deep learning; predictive model; risk management
1. Introduction
Bitcoin was first launched in 2008 to serve as a transaction medium between participants
without the need for any intermediary (Nakamoto 2008; Barrdear and Kumhof 2016). Since 2017,
cryptocurrencies have been gaining immense popularity, thanks to the rapid growth of their market
capitalization (ElBahrawy et al. 2017), resulting in a revenue of more than $850 billion in 2019. The
digital currency market is diverse and provides investors with a wide variety of different products. A
recent survey (Hileman and Rauchs 2017) revealed that more than 1500 cryptocurrencies are actively
traded by individual and institutional investors worldwide across different exchanges. Over 170
hedge funds, specialized in cryptocurrencies, have emerged since 2017 and in response to institutional
demand for trading and hedging, Bitcoin’s futures have been rapidly launched (Corbet et al. 2018).
The growth of virtual currencies (Baronchelli 2018) has fueled interest from the scientific community
(Barrdear and Kumhof 2016; Dwyer 2015; Bohme et al. 2015; Casey and Vigna 2015; Cusumano 2014;
Krafft et al. 2018; Rogojanu and Badea 2014; White 2015; Baek and Elbeck 2015; Bech and Garratt
2017; Blau 2017; Dow 2019; Fama et al. 2019; Fantacci 2019; Malherbe et al. 2019). Cryptocurrencies
have faced periodic rises and sudden dips in specific time periods, and therefore the cryptocurrency
trading community has a need for a standardized method to accurately predict the fluctuating price
trends. Cryptocurrency price fluctuations and forecasts studied in the past (Poyser 2017) focused on the
analysis and forecasting of price fluctuations, using mostly traditional approaches for financial markets
analysis and prediction (Ciaian et al. 2016; Guo and Antulov-Fantulin 2018; Gajardo et al. 2018; Gandal
and Halaburda 2016). Sovbetov (2018) observed that crypto market-related factors such as market beta,
trading volume, and volatility are significant predictors of both short-term and long-term prices of
cryptocurrencies. Constructing robust predictive models to accurately forecast cryptocurrency prices
is an important business challenge for potential investors and government agencies. Cryptocurrency
trading is actually a time series forecasting problem, and due to high volatility, it is different from price
forecasting in traditional financial markets (Muzammal et al. 2019). Briere et al. (2015) found that
Bitcoin shows extremely high returns, but is characterized by high volatility and low correlation to
traditional assets. The high volatility of Bitcoin is well-documented (Blundell-Wignall 2014; Lo and
Wang 2014). Several econometric methods have been applied to estimate Bitcoin volatility (Katsiampa 2017; Kim et al. 2016; Kristoufek 2015).
Traditional time series prediction methods include univariate autoregressive (AR), univariate
moving average (MA), simple exponential smoothing (SES), and autoregressive integrated moving
average (ARIMA) (Siami-Namini and Namin 2018). Kaiser (2019) used time series models to investigate seasonality patterns in Bitcoin trading. While seasonal ARIMA or SARIMA models
are suitable to investigate seasonality, time series models fail to capture long term dependencies in
the presence of high volatility, which is an inherent characteristic of a cryptocurrency market. On
the contrary, machine learning methods like neural networks use iterative optimization algorithms such as gradient descent, together with hyperparameter tuning, to find the best-fitting model (Siami-Namini and Namin 2018). Thus, machine learning methods have been applied for asset
price/return prediction in recent years by incorporating non-linearity (Enke and Thawornwong 2005;
Huang et al. 2005; Sheta et al. 2015; Chang et al. 2009) with prediction accuracy higher than traditional
time series models (McNally et al. 2018; Siami-Namini and Namin 2018). However, there is a dearth of machine learning applications in the cryptocurrency price prediction literature. In contrast to traditional
linear statistical models such as ARMA, the artificial intelligence approach enables us to capture the
non-linear dynamics of highly volatile cryptocurrency prices.
Examples of machine learning studies to predict Bitcoin prices include random forests
(Madan et al. 2015), Bayesian neural networks (Jang and Lee 2017), and neural networks
(McNally et al. 2018). Deep learning techniques developed by Hinton et al. (2006) have been used
in literature to approximate non-linear functions with high accuracy (Cybenko 1989). There are a
number of previous works that have applied artificial neural networks to financial investment problems
(Chong et al. 2017; Huck 2010). However, Pichl and Kaizoji (2017) concluded that although neural
networks are successful in approximating Bitcoin log return distribution, more complex deep learning
methods such as recurrent neural networks (RNNs) and long short-term memory (LSTM) techniques
should yield substantially higher prediction accuracy. Some studies have used RNNs and LSTM to
forecast Bitcoin pricing in comparison with traditional ARIMA models (McNally et al. 2018; Guo and
Antulov-Fantulin 2018). McNally et al. (2018) showed that RNN and LSTM neural networks predict
prices better than traditional multilayer perceptron (MLP) due to the temporal nature of the more
advanced algorithms. Karakoyun and Çıbıkdiken (2018), in comparing the ARIMA time series model
to the LSTM deep learning algorithm in estimating the future price of Bitcoin, found significantly lower
mean absolute error in LSTM prediction.
In this paper, we focus on two aspects to predict Bitcoin price. We consider a set of exogenous
and endogenous variables to predict Bitcoin price. Some of these variables have not been investigated
in previous research studies on Bitcoin price prediction. This holistic approach should explain whether
Bitcoin is a financial asset. Additionally, we also study and compare RNN models with traditional
machine learning models and propose a GRU architecture to predict Bitcoin price. GRUs train faster than traditional RNNs or LSTMs and have not been investigated in the past for cryptocurrency price prediction. In particular, we developed a gated recurrent unit (GRU) architecture that can learn the
Bitcoin price fluctuations more efficiently than the traditional LSTM. We compare our model with
a traditional neural network and LSTM to check the robustness of the architecture. For application
purposes in algorithmic trading, we implemented our proposed architecture to test two simple trading strategies for profitability.

J. Risk Financial Manag. 2020, 13, 23

2. Methodology

A survey of the current literature on neural networks reveals that traditional neural networks have shortcomings in effectively using prior information for future predictions (Wang et al. 2015). RNNs are a class of neural networks that use their internal state memory for processing sequences. However, RNNs on their own are not capable of learning long-term dependencies and they often suffer from short-term memory. With long sequences, especially in time series modelling and textual analysis, RNNs suffer from vanishing gradient problems during back propagation (Hochreiter 1998; Pascanu et al. 2013). If the gradient value shrinks to a very small value, then the RNNs fail to learn longer past sequences, thus having short-term memory. Long short-term memory (Hochreiter and Schmidhuber 1997) is an RNN architecture with feedback connections, designed to regulate the flow of information. LSTMs are a variant of the RNN that are explicitly designed to learn long-term dependencies. A single LSTM unit is composed of an input gate, a cell, a forget gate (sigmoid layer and a tanh layer), and an output gate (Figure 1). The gates control the flow of information in and out of the LSTM cell. LSTMs are well suited for time-series forecasting. In the forget gate, the input from the previous hidden state is passed through a sigmoid function along with the input from the current state to generate the forget gate output f_t. The sigmoid activation function regulates values between 0 and 1; values closer to 0 are discarded and only values closer to 1 are considered. The input gate is used to update the cell state. Values from the previous hidden state and current state are simultaneously passed through a sigmoid function and a tanh function, and the outputs (i_t and c̃_t) from the two activation functions are multiplied. In this process, the sigmoid function decides which information is important to keep from the tanh output.

Figure 1. Architecture of a long short-term memory (LSTM) cell. +: "plus" operation; ◉: "Hadamard product" operation; σ: "sigmoid" function; tanh: "tanh" function.

The previous cell state value is multiplied with the forget gate output and then added pointwise with the output from the input gate to generate the new cell state c_t, as shown in Equation (1). The output gate operation consists of two steps: first, the previous hidden state and current input values are passed through a sigmoid function; and secondly, the last obtained cell state values are passed through a tanh function. Finally, the tanh output and the sigmoid output are multiplied to produce the new hidden state, which is carried over to the next step. Thus, the forget gate, input gate, and output gate decide what information to forget, what information to add from the current step, and what information to carry forward, respectively.

c_t = f_t · c_{t-1} + i_t · c̃_t    (1)

GRU, introduced by Cho et al. (2014), solves the problem of the vanishing gradient with a standard RNN. GRU is similar to LSTM, but it combines the forget and the input gates of the LSTM into a single update gate. The GRU further merges the cell state and the hidden state. A GRU unit consists of a cell containing multiple operations which are repeated, and each of the operations could be a neural network. Figure 2 below shows the structure of a GRU unit consisting of an update gate, a reset gate, and a current memory content. These gates enable a GRU unit to store values in the memory for a certain amount of time and use these values to carry information forward, when required, to the current state to update at a future date. In Figure 2 below, the update gate is represented by z_t, where at each step, the input x_t and the output from the previous unit h_{t-1} are multiplied by the weight W_z and added together, and a sigmoid function is applied to get an output between 0 and 1. The update gate addresses the vanishing gradient problem as the model learns how much information to pass forward. The reset gate is represented by r_t in Equation (2), where a similar operation as in the input gate is carried out, but this gate is used to determine how much of the past information to forget. The current memory content is denoted by h̃_t, where x_t is multiplied by W and r_t is multiplied by h_{t-1} element-wise (Hadamard product operation) to pass only the relevant information. Finally, a tanh activation function is applied to the summation. The final memory in the GRU unit is denoted by h_t, which holds the information for the current unit and passes it on to the network. The computation in the final step is given in Equation (2) below. As shown in Equation (2), if z_t is close to 0 ((1 - z_t) close to 1), then most of the current content will be irrelevant and the network will pass the majority of the past information, and vice versa.

z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t · h_{t-1}, x_t])
h_t = (1 - z_t) · h_{t-1} + z_t · h̃_t    (2)

Figure 2. Architecture of a gated recurrent unit (GRU) unit. +: "plus" operation; ◉: "Hadamard product" operation; σ: "sigmoid" function; tanh: "tanh" function.

Both LSTM and GRU are efficient at addressing the problem of vanishing gradient that occurs in long sequence models. GRUs have fewer tensor operations and are speedier to train than LSTMs (Chung et al. 2014). The neural network models considered for the Bitcoin price prediction are simple neural network (NN), LSTM, and GRU. The neural networks were trained with optimized hyperparameters and tested on the test set. Finally, the best performing model with the lowest root mean squared error (RMSE) value was considered for portfolio strategy execution.
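The gate computations in Equation (2) can be written out directly. The following NumPy single-step GRU is our own illustration: it follows the bias-free form of Equation (2) (practical implementations add bias terms), and the weight shapes are chosen only for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, W):
    """One GRU step following Equation (2); [h, x] denotes concatenation."""
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ hx)                                        # update gate
    r_t = sigmoid(Wr @ hx)                                        # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))   # candidate memory
    return (1.0 - z_t) * h_prev + z_t * h_tilde                  # new hidden state
```

Because the new hidden state is a convex combination of the previous state and a tanh output, its entries stay bounded in (-1, 1), which is part of why the update gate stabilizes gradients over long sequences.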
Figure 3. Time series plot of Bitcoin price in USD.
3.2. Feature Selection
One of the most important aspects of the data mining process is feature selection. Feature selection is basically concerned with extracting useful features/patterns from data to make it easier for machine learning models to perform their predictions. To check the behavior of the features with respect to Bitcoin prices, we plotted the data for all the 20 features for the entire time period, as shown in Figure 4 below. A closer look at the plot reveals that the endogenous features are more correlated with Bitcoin prices than the exogenous features. Among the exogenous features, Google Trends, interest rates, and Ripple price seem to be the most correlated.
Figure 4. Plot showing the behavior of independent variables with Bitcoin price. The blue line plots the different features used for Bitcoin price prediction and the orange line plots the Bitcoin price over time. Abbreviations: MACD, moving average convergence divergence.
Multicollinearity is often an issue in statistical learning when the features are highly correlated among themselves, and thus, the final prediction output is based on a much smaller number of features, which may lead to biased inferences (Nawata and Nagase 1996). To find the most appropriate features for Bitcoin price prediction, the variance inflation factor (VIF) was calculated for the predictor variables (see Table 1). VIF provides a measure of how much the variance of an estimated regression coefficient is increased due to multicollinearity. Features with VIF values greater than 10 (Hair et al. 1992; Kennedy 1992; Marquardt 1970; Neter et al. 1989) were not considered for analysis. A set of 15 features was finally selected after dropping Bitcoin miner revenue, Metcalf-UTXO, interest rates, block size, and the U.S. bond yields 2-year and 10-year difference.
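The VIF screening step can be sketched as follows. This NumPy implementation is our own illustration, not the authors' code: it computes VIF_j = 1/(1 - R²_j), where R²_j comes from regressing feature j on all remaining features with an intercept.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X (n_samples, n_features)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])     # regressors with intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = 1.0 / max(1.0 - r2, 1e-12)           # guard against a perfect fit
    return out
```

Features whose VIF exceeds the conventional threshold of 10 would then be dropped, as done in the text.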
in the hidden units resulting in overfitting (Srivastava et al. 2014), dropout was introduced in the
LSTM and GRU layers. Thus, for each training sample the network was re-adjusted and a new set of
neurons were dropped out. For both LSTM and GRU architecture, a recurrent dropout rate (Gal and
Ghahramani 2016) of 0.1 was used. For the two hidden layers GRU, a dropout of 0.1 was additionally
used along with the recurrent dropout of 0.1. The dropout and recurrent dropout rates were optimized
to ensure that the training data was large enough to not be memorized in spite of the noise, and to
avoid overfitting (Srivastava et al. 2014). For the simple NN, two dense layers were used, with 25 and 1 hidden nodes. The LSTM model comprised one LSTM layer (50 nodes) and one dense layer (1 node). The simple GRU and the GRU with recurrent dropout architectures each comprised one GRU layer (50 nodes) and one dense layer with 1 node. The final GRU architecture was tuned with two GRU layers (50 nodes and 10 nodes) with a dropout and recurrent dropout of 0.1. The optimized batch sizes for the neural network and the RNN models were determined to be 125 and 100, respectively. A higher
batch size led to a higher training and validation loss during the learning process.
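Recurrent dropout differs from ordinary dropout in that one mask is sampled per sequence and reused at every timestep (Gal and Ghahramani 2016). A minimal NumPy illustration of this masking, our own sketch rather than the authors' code:

```python
import numpy as np

def apply_recurrent_dropout(h_seq: np.ndarray, rate: float = 0.1, seed: int = 0):
    """Apply one shared dropout mask to a recurrent state sequence.

    h_seq: array of shape (timesteps, units). The mask is sampled once and
    broadcast over all timesteps, which is what makes the dropout 'recurrent'.
    Inverted scaling (divide by keep probability) preserves expectations.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - rate
    mask = rng.binomial(1, keep, size=h_seq.shape[1]) / keep
    return h_seq * mask  # broadcasting applies the same mask at each timestep
```

Sampling the mask per timestep instead would inject fresh noise into the recurrence at every step, which is exactly the behavior the per-sequence mask avoids.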
Figure 5 shows the training and validation loss for the neural network models. The difference
between training loss and validation loss reduces with a dropout and a recurrent dropout for the one
GRU layer model (Figure 5, bottom middle). However, with the addition of an extra GRU layer, the
difference between the training and validation loss increased. After training, all the neural network
models were tested on the test data. The RMSE for all the models on the train and test data are shown in
Table 2. As seen from Table 2, the LSTM architecture performed better than the simple NN architecture due to its memory retention capabilities (Hochreiter and Schmidhuber 1997). The
GRU model with a recurrent dropout generates an RMSE of 0.014 on the training set and 0.017 on the
test set. RNN-GRU performs better than LSTM, and a plausible explanation is the fact that GRUs
are computationally faster with a lesser number of gates and tensor operations. The GRU controls
the flow of information like the LSTM unit; however, the GRU has no memory unit and it exposes
the full hidden content without any control (Chung et al. 2014). GRUs also tend to perform better
than LSTM on less training data (Kaiser and Sutskever 2016) as in the present case, while LSTMs are
more efficient in remembering longer sequences (Yin et al. 2017). We also found that the recurrent
dropout in the GRU layer helped reduce the RMSE on the test data, and the difference of RMSE
between training and test data was the minimum for the GRU model with recurrent dropout. These
results indicate that the GRU with recurrent dropout is the best performing model for our problem.
Recurrent dropouts help to mask some of the output from the first GRU layer, which can be thought as
a variational inference in RNN (Gal and Ghahramani 2016; Merity et al. 2017). The Diebold-Mariano statistical test (Diebold and Mariano 1995) was conducted to analyze whether the difference in prediction accuracy between each pair of models, taken in decreasing order of RMSE, is statistically significant. The p-values reported in Tables 2 and 3 indicate that each model in this ordering has a significantly lower RMSE than the preceding one in predicting Bitcoin prices. We also trained
the GRU recurrent dropout model with a lookback period of 15, 45, and 60 days and the results are
reported in Table 3. It can be concluded from Table 3 that the lookback period for 30 days is the optimal
period for the best RMSE results. Figure 6 shows the GRU model with recurrent dropout predicted
Bitcoin price in the test data, as compared to the original data. The model predicted price is higher
than the original price in the first few months of 2019; however, when the Bitcoin price shot up in
June–July 2019, the model was able to learn this trend effectively.
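A lookback period of k days means each input sample consists of the previous k daily observations, with the following day's value as the prediction target. A minimal sketch of how such supervised windows might be constructed (the function name and array shapes are our own, not the authors' code):

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 30):
    """Build (samples, lookback) inputs and next-day targets from a 1-D series."""
    X = np.stack([series[i : i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]  # target is the observation immediately after each window
    return X, y
```

Re-running the same training with lookback set to 15, 30, 45, or 60 reproduces the comparison reported in Table 3, where 30 days proved optimal.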
Table 2. Train and test root mean squared error (RMSE) with a 30-day lookback period for different models.
Figure 5. Training and validation loss for simple neural network (NN) (top left), LSTM with dropout (top right), GRU (bottom left), GRU with a recurrent dropout (bottom middle), and GRU with dropout and recurrent dropout (bottom right).

Table 3. Train test RMSE for GRU recurrent model.

Lookback Period (Days)   Train RMSE   Test RMSE   p-Value
15                       0.012        0.016
45                       0.011        0.019       0.0010
60                       0.011        0.017       0.0006

Figure 6. Bitcoin price as predicted by the GRU one-layer model with dropout and recurrent dropout.
Most Bitcoin exchanges, unlike stock exchanges, do now allow short selling of Bitcoin, yet this
results in higher volatility and regulatory risks (Filippi 2014). Additionally, volatility depends on how
5. Portfolio
close Strategy
the model predictions are to the actual market price of Bitcoin at every point of time. As can be
seen We
from Figure 7, two
implement Bitcoin pricesstrategies
trading went down during early
to evaluate our June 2019,
results and the buy-sell
in portfolio managementstrategy
of
correctly predicted the fall, with the trader selling the Bitcoins holding
cryptocurrencies. For simplicity, we considered only Bitcoin trading and we assumed thatto keep the cash before
the
investing
trader again
only buyswhen the price
and sells basedstarts
on therising from
signals mid-June.
derived fromIn comparison,
quantitative due to Based
models. short selling
on our and
test
taking long positions simultaneously, the long-short strategy suffered during the
set evaluation, we have considered the GRU one layer with recurrent dropout as our best model same period of time
for
with very slowtrading
implementing increase in portfolio
strategies. Twovalue.
typesHowever,
of tradinglong-short strategies
strategies were might be more
implemented, powerful
as discussed in
when we consider a portfolio consisting of multiple cryptocurrencies where investors
this section. The first strategy was a long-short strategy, wherein the buy signal predicted from the can take
simultaneous
model will leadlong and short
to buying positions
the Bitcoin andina currencies, which
sell signal will have significant
essentially growth potential
lead to short-selling and
the Bitcoin
overvalued currencies.
at the beginning of the day based on the model predictions for that day. If the model predicted price
on a given day is lower than the previous day, then the trader will short sell the Bitcoin and cover them
at the end of the day. An initial portfolio value of 1 is considered and the transaction fees is taken to be
0.8% of the invested or sold amount. Due to daily settlement, the long-short strategy is expected to
incur significant transaction costs which may reduce the portfolio value. The second strategy was a
buy-sell strategy where the trader goes long when a buy signal is triggered and sell all the Bitcoins
when a sell signal is generated. Once the trader sells all the coins in the portfolio, he/she waits for
the next positive signal to invest again. When a buy signal occurs, the trader invests in Bitcoin and
remains invested till the next sell signal is generated.
Most Bitcoin exchanges, unlike stock exchanges, do now allow short selling of Bitcoin, yet this
results in higher volatility and regulatory risks (Filippi 2014). Additionally, volatility depends on how
close the model predictions are to the actual market price of Bitcoin at every point of time. As can
be seen from Figure 7, Bitcoin prices went down during early June 2019, and the buy-sell strategy
correctly predicted the fall, with the trader selling the Bitcoin holdings to keep cash before investing
again when the price started rising from mid-June. In comparison, due to short selling and
positions simultaneously, the long-short strategy suffered during the same period of time with very
slow increase in portfolio value. However, long-short strategies might be more powerful when we
consider a portfolio consisting of multiple cryptocurrencies, where investors can take simultaneous
long positions in currencies with significant growth potential and short positions in overvalued currencies.
Figure 6. Bitcoin price as predicted by the GRU one-layer model with dropout and recurrent dropout.
Figure 7. Above shows the change in portfolio value over time when the long-short (Left) and
buy-sell (Right) strategies are implemented on the test data. Due to short selling and daily settlement,
the long-short portfolio incurs transaction fees, which reduce growth and increase volatility in the portfolio.
6. Conclusions
There have been a considerable number of studies on Bitcoin price prediction using machine
learning and time-series analysis (Wang et al. 2015; Guo et al. 2018; Karakoyun and Çıbıkdiken 2018;
Jang and Lee 2017; McNally et al. 2018). However, most of these studies have been based on
predicting Bitcoin prices with pre-decided models and a limited number of features, such as price
J. Risk Financial Manag. 2020, 13, 23 11 of 16
volatility, order book, technical indicators, price of gold, and the VIX. The present study explores
Bitcoin price prediction based on a collective and exhaustive list of features with financial linkages, as
shown in Appendix A. The basis of any investment has always been wealth creation either through
fundamental investment, or technical speculation, and cryptocurrencies are no exception to this. In
this study, feature engineering is performed taking into account whether Bitcoin could be used as an
alternative investment that offers investors diversification benefits and a different investment avenue
when the traditional means of investment are not doing well. This study considers a holistic approach
to select the predictor variables that might be helpful in learning future Bitcoin price trends. The
U.S. treasury two-year and ten-year yields are the benchmark indicators for short-term and long-term
investment in bond markets, hence a change in these benchmarks could very well propel investors
towards alternative investment avenues such as Bitcoin. A similar rationale applies to gold, S&P
returns, and the dollar index. Whether the driver is good or bad news, increasing attention, or
momentum-based speculation, Google Trends and VIX price data are well suited to studying this
aspect of the influence on prices.
We also conclude that recurrent neural network models such as LSTM and GRU outperform
traditional machine learning models. With limited data, neural networks like LSTM and GRU can
regulate past information to learn effectively from non-linear patterns. Deep models require careful
training and hyperparameter tuning to yield results, which might be computationally expensive for
large datasets, unlike conventional time-series approaches. However, for stock price prediction or
cryptocurrency price prediction, market data are always limited and computational complexity is not
a concern, and thus shallow learning models can be effectively used in practice. These benefits will
likely contribute significantly to quantitative finance in the coming years.
In the deep learning literature, LSTM has traditionally been used to analyze time series. The GRU
architecture, on the other hand, seems to perform better than the LSTM model in our analysis.
The simplicity of the GRU model, in which forgetting and updating occur simultaneously, was
found to work well in Bitcoin price prediction. Adding a recurrent dropout improves the
performance of the GRU architecture; however, further studies need to be undertaken to explore the
dropout phenomenon in GRU architectures. Two types of investment strategies have been implemented
with our trained GRU architecture. Results show that when machine learning models are implemented
with full understanding, they can be beneficial to the investment industry for financial gains and
portfolio management. In the present case, recurrent machine learning models performed much better
than traditional ones in price prediction, thus making the investment strategies valuable. With proper
back testing, each of these models can contribute to managing portfolio risk and reducing financial losses.
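The simultaneous forgetting and updating mentioned above can be made concrete with a minimal NumPy sketch of a single GRU step: one update gate z interpolates between the previous hidden state and the candidate state, so (1 - z) forgets and z updates in the same operation. The dimensions and stacked weight layout below are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def gru_step(x, h_prev, W, U, b):
    """One GRU step. W, U, b each stack three parameter blocks: index 0
    for the update gate z, 1 for the reset gate r, 2 for the candidate."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(W[0] @ x + U[0] @ h_prev + b[0])          # update gate
    r = sig(W[1] @ x + U[1] @ h_prev + b[1])          # reset gate
    h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h_prev) + b[2])
    # forgetting (1 - z) and updating (z) happen in one interpolation
    return (1 - z) * h_prev + z * h_tilde
```

In Keras, the regularization discussed here is exposed through the `dropout` and `recurrent_dropout` arguments of the `GRU` layer, the latter applying the same dropout mask to the recurrent connections at every time step.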
Nonetheless, a significant improvement over the current study could be achieved if a bigger data set
were available. Convolutional neural networks (CNNs) have also been used to predict financial returns,
for example in forecasting daily oil futures prices (Luo et al. 2019). To that end, a potential future
research study could explore the performance of CNN architectures in predicting Bitcoin prices.
Author Contributions: Conceptualization, data curation, validation, and draft writing, S.K.; methodology, formal
analysis, draft preparation, and editing, A.D.; draft writing, plots, M.B. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Acknowledgments: The authors would like to thank the staff at the Haas School of Business, University of
California Berkeley and Katz Graduate School of Business, University of Pittsburgh for their support. A.D.
Conflicts of Interest: The authors declare no conflicts of interest. The views expressed are personal.
Appendix A
References
Baek, Chung, and Matt Elbeck. 2015. Bitcoin as an Investment or Speculative Vehicle? A First Look. Applied
Economics Letters 22: 30–34. [CrossRef]
Barrdear, John, and Michael Kumhof. 2016. The Macroeconomics of Central Bank Issued Digital Currencies. SSRN
Electronic Journal. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2811208 (accessed
on 2 February 2020).
Baronchelli, Andrea. 2018. The emergence of consensus: A primer. Royal Society Open Science 5: 172189. [CrossRef]
Bech, Morten L., and Rodney Garratt. 2017. Central Bank Cryptocurrencies. BIS Quarterly Review 2017: 55–70.
Blau, Benjamin M. 2017. Price Dynamics and Speculative Trading in Bitcoin. Research in International Business and
Finance 41: 493–99. [CrossRef]
Blundell-Wignall, Adrian. 2014. The Bitcoin Question: Currency versus Trust-less Transfer Technology. OECD
Working Papers on Finance, Insurance and Private Pensions 37: 1.
Bohme, Rainer, Nicolas Christin, Benjamin Edelman, and Tyler Moore. 2015. Bitcoin: Economics, technology, and
governance. Journal of Economic Perspectives (JEP) 29: 213–38. [CrossRef]
Bouri, Elie, Peter Molnár, Georges Azzi, and David Roubaud. 2017. On the hedge and safe haven properties of
Bitcoin: Is it really more than a diversifier? Finance Research Letters 20: 192–98. [CrossRef]
Briere, Marie, Kim Oosterlinck, and Ariane Szafarz. 2015. Virtual currency, tangible return: Portfolio diversification
with bitcoin. Journal Asset Management 16: 365–73. [CrossRef]
Cagli, Efe C. 2019. Explosive behavior in the prices of Bitcoin and altcoins. Finance Research Letters 29: 398–403.
[CrossRef]
Casey, Michael J., and Paul Vigna. 2015. Bitcoin and the digital-currency revolution. The Wall Street Journal.
Available online: https://www.wsj.com/articles/the-revolutionary-power-of-digital-currency-1422035061
(accessed on 2 February 2020).
Chang, Pei-Chann, Chen-Hao Liu, Chin-Yuan Fan, Jun-Lin Lin, and Chih-Ming Lai. 2009. An Ensemble of Neural
Networks for Stock Trading Decision Making. In Emerging Intelligent Computing Technology and Applications.
With Aspects of Artificial Intelligence 5755 of Lecture Notes in Computer Science. Berlin/Heidelberg: Springer,
pp. 1–10. Available online: https://doi.org/10.1007/978-3-642-04020-7_1 (accessed on 2 February 2020).
Cheah, Eng-Tuck, and John Fry. 2015. Speculative bubbles in Bitcoin markets? An empirical investigation into the
fundamental value of Bitcoin. Economics Letters 130: 32–36. [CrossRef]
Chen, Zheshi, Chunhong Li, and Wenjun Sun. 2020. Bitcoin price prediction using machine learning: An approach
to sample dimension engineering. Journal of Computational and Applied Mathematics 365: 112395. [CrossRef]
Cho, Kyunghyun, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk,
and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical
Machine Translation. Available online: https://arxiv.org/abs/1406.1078.pdf (accessed on 2 February 2020).
Chollet, Francois. 2015. Keras: Deep Learning for humans. Available online: https://github.com/keras-team/keras
(accessed on 2 February 2020).
Chong, Eunsuk, Chulwoo Han, and Frank C. Park. 2017. Deep learning networks for stock market analysis and
prediction: methodology, data representations, and case studies. Expert System with Applications 83: 187–205.
[CrossRef]
Chung, Junyoung, Caglar Gulcehre, Kyung H. Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated
Recurrent Neural Networks on Sequence Modeling. Available online: https://arxiv.org/pdf/1412.3555.pdf
(accessed on 2 February 2020).
Ciaian, Pavel, Miroslava Rajcaniova, and d’Artis Kancs. 2016. The economics of Bitcoin price formation. Applied
Economics 48: 1799–815. [CrossRef]
Corbet, Shaen, Brian Lucey, Maurice Peat, and Samuel Vigne. 2018. Bitcoin Futures—What use are they? Economics
Letters 172: 23–27. [CrossRef]
Cusumano, Michael A. 2014. The Bitcoin ecosystem. Communications of the ACM 57: 22–24. [CrossRef]
Cybenko, George. 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems
2: 303–14. [CrossRef]
Diebold, Francis X., and Roberto S. Mariano. 1995. Comparing Predictive Accuracy. Journal of Business and Economic
Statistics 13: 253–63.
Dow, Sheila. 2019. Monetary Reform, Central Banks and Digital Currencies. International Journal of Political
Economy 48: 153–73. [CrossRef]
Dyhrberg, Anne H. 2016. Bitcoin, gold and the dollar-A GARCH volatility analysis. Finance Research Letters 16: 85–92.
[CrossRef]
Dwyer, Gerald P. 2015. The economics of Bitcoin and similar private digital currencies. Journal of Financial Stability
17: 81–91. [CrossRef]
ElBahrawy, Abeer, Laura Alessandretti, Anne Kandler, Romualdo Pastor-Satorras, and Andrea Baronchelli. 2017.
Evolutionary dynamics of the cryptocurrency market. Royal Society Open Science 4: 170623. [CrossRef]
[PubMed]
Enke, David, and Suraphan Thawornwong. 2005. The use of data mining and neural networks for forecasting
stock market returns. Expert Systems with Applications 29: 927–40. [CrossRef]
Fama, Marco, Andrea Fumagalli, and Stefano Lucarelli. 2019. Cryptocurrencies, Monetary Policy, and New Forms
of Monetary Sovereignty. International Journal of Political Economy 48: 174–94. [CrossRef]
Fantacci, Luca. 2019. Cryptocurrencies and the Denationalization of Money. International Journal of Political
Economy 48: 105–26. [CrossRef]
Filippi, Primavera De. 2014. Bitcoin: A Regulatory Nightmare to a Libertarian Dream. Internet Policy Review 3.
[CrossRef]
Gajardo, Gabriel, Werner D. Kristjanpoller, and Marcel Minutolo. 2018. Does Bitcoin exhibit the same asymmetric
multifractal cross-correlations with crude oil, gold and DJIA as the Euro, Great British Pound and Yen? Chaos,
Solitons & Fractals 109: 195–205.
Gal, Yarin, and Zoubin Ghahramani. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty
in Deep Learning. Available online: https://arxiv.org/pdf/1506.02142.pdf (accessed on 2 February 2020).
Gal, Yarin, and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural
networks. Advances in Neural Information Processing Systems 2016: 1019–27.
Gandal, Neil, and Hanna Halaburda. 2016. Can we predict the winner in a market with network effects?
Competition in cryptocurrency market. Games 7: 16. [CrossRef]
Guo, Tian, Albert Bifet, and Nino Antulov-Fantulin. 2018. Bitcoin volatility forecasting with a glimpse into buy
and sell orders. Paper presented at 2018 IEEE International Conference on Data Mining (ICDM), Singapore,
November 17–20.
Guo, Tian, and Nino Antulov-Fantulin. 2018. Predicting Short-Term Bitcoin Price Fluctuations from Buy and Sell
Orders. Available online: https://arxiv.org/pdf/1802.04065v1.pdf (accessed on 2 February 2020).
Hair, Joseph F., Rolph E. Anderson, and Ronald L. Tatham. 1992. Multivariate Data Analysis, 3rd ed. New York:
Macmillan.
Hileman, Garrick, and Michel Rauchs. 2017. Global Cryptocurrency Benchmarking Study. Cambridge Centre for Alternative
Finance. Available online: https://www.jbs.cam.ac.uk/fileadmin/user_upload/research/centres/alternative-
finance/downloads/2017-04-20-global-cryptocurrency-benchmarking-study.pdf (accessed on 2 February
2020).
Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets.
Neural Computation 18: 1527–54. [CrossRef]
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9: 1735–80.
[CrossRef]
Hochreiter, Sepp. 1998. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem
Solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6: 107–16. [CrossRef]
Huang, Wei, Yoshiteru Nakamori, and Shou-Yang Wang. 2005. Forecasting stock market movement direction with
support vector machine. Computers & Operations Research 32: 2513–22.
Huck, Nicolas. 2010. Pairs trading and outranking: The multi-step-ahead forecasting case. European Journal of
Operational Research 207: 1702–16. [CrossRef]
Jang, Huisu, and Jaewook Lee. 2017. An Empirical Study on Modeling and Prediction of Bitcoin Prices with
Bayesian Neural Networks Based on Blockchain Information. IEEE Access 6: 5427–37. [CrossRef]
Kaiser, Lukasz, and Ilya Sutskever. 2016. Neural GPUs Learn Algorithms. Available online: https://arxiv.org/pdf/
1511.08228.pdf (accessed on 2 February 2020).
Kaiser, Lars. 2019. Seasonality in cryptocurrencies. Finance Research Letters 31: 232–38. [CrossRef]
Karakoyun, Ebru Şeyma, and Ali Osman Çıbıkdiken. 2018. Comparison of ARIMA Time Series Model and
LSTM Deep Learning Algorithm for Bitcoin Price Forecasting. Paper presented at the 13th Multidisciplinary
Academic Conference in Prague 2018 (The 13th MAC 2018), Prague, Czech Republic, May 25–27.
Karasu, Seçkin, Aytaç Altan, Zehra Saraç, and Rifat Hacioğlu. 2018. Prediction of Bitcoin prices with machine
learning methods using time series data. Paper presented at 26th Signal Processing and Communications
Applications Conference (SIU), Izmir, Turkey, May 2–5.
Katsiampa, Paraskevi. 2017. Volatility estimation for Bitcoin: A comparison of GARCH models. Economics Letters
158: 3–6. [CrossRef]
Kennedy, Peter E. 1992. A Guide to Econometrics. Oxford: Blackwell.
Kim, Young B., Jun G. Kim, Wook Kim, Jae H. Im, Tae H. Kim, Shin J. Kang, and Chang H. Kim. 2016. Predicting
Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies. PLoS ONE 11: e0161197.
[CrossRef]
Kingma, Diederik P., and Jimmy Ba. 2015. Adam: A method for stochastic optimization. arXiv 2015: 9.
Krafft, Peter M., Nicolas D. Penna, and Alex S. Pentland. 2018. An Experimental Study of Cryptocurrency Market
Dynamics. Paper presented at CHI Conference, Montreal, QC, Canada, April 21–26.
Kristoufek, Ladislav. 2015. What Are the Main Drivers of the Bitcoin Price? Evidence from Wavelet Coherence
Analysis. PLoS ONE 10: e0123923. [CrossRef]
Lawrence, Steve, Giles C. Lee, and Ah C. Tsoi. 1997. Lessons in Neural Network Training: Overfitting May be
Harder than Expected. In Proceedings of the Fourteenth National Conference on Artificial Intelligence. Menlo Park:
AAAI Press, pp. 540–45.
Lo, Stephanie, and J. Christina Wang. 2014. Bitcoin as Money? Working Paper 14. Boston, MA, USA: Federal
Reserve Bank of Boston.
Luo, Zhaojie, Xiaojing Cai, Katsuyuki Tanaka, Tetsuya Takiguchi, Takuji Kinkyo, and Shigeyuki Hamori. 2019.
Can we forecast daily oil futures prices? Experimental evidence from convolutional neural networks. Journal
of Risk and Financial Management 12: 9. [CrossRef]
Madan, Isaac, Shaurya Saluja, and Aojia Zhao. 2015. Automated Bitcoin Trading via Machine Learning Algorithms.
Available online: https://pdfs.semanticscholar.org/e065/3631b4a476abf5276a264f6bbff40b132061.pdf (accessed
on 2 February 2020).
Malherbe, Leo, Matthieu Montalban, Nicolas Bedu, and Caroline Granier. 2019. Cryptocurrencies and Blockchain:
Opportunities and Limits of a New Monetary Regime. International Journal of Political Economy 48: 127–52.
[CrossRef]
Marquardt, Donald W. 1970. Generalized inverses, ridge regression, biased linear estimation, and nonlinear
estimation. Technometrics 12: 591–612. [CrossRef]
McNally, Sean, Jason Roche, and Simon Caton. 2018. Predicting the Price of Bitcoin Using Machine Learning.
Paper presented at 26th Euromicro International Conference on Parallel, Distributed and Network-based
Processing (PDP), Cambridge, UK, March 21–23.
Merity, Stephen, Nitish S. Keskar, and Richard Socher. 2017. Regularizing and Optimizing LSTM Language
Models. Available online: https://arxiv.org/abs/1708.02182 (accessed on 2 February 2020).
Muzammal, Muhammad, Qiang Qu, and Bulat Nasrulin. 2019. Renovating blockchain with distributed databases:
An open source system. Future Generation Computer Systems 90: 105–17. [CrossRef]
Nakamoto, Satoshi. 2008. Bitcoin: A Peer-to-Peer Electronic Cash System. Available online: https://bitcoin.org/
bitcoin.pdf (accessed on 2 February 2020).
Nawata, Kazumitsu, and Nobuko Nagase. Estimation of sample selection bias models. Econometric Reviews 15: 4.
[CrossRef]
Neter, John, William Wasserman, and Michael H. Kutner. 1989. Applied Linear Regression Models. Homewood:
Irwin.
Pichl, Lukas, and Taisei Kaizoji. 2017. Volatility Analysis of Bitcoin Price Time Series. Quantitative Finance and
Economics 1: 474–85. [CrossRef]
Poyser, Obryan. 2017. Exploring the Determinants of Bitcoin’s Price: An Application of Bayesian Structural Time
Series. Available online: https://arxiv.org/abs/1706.01437 (accessed on 2 February 2020).
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. 2013. On the Difficulty of Training Recurrent Neural
Networks. Available online: https://arxiv.org/pdf/1211.5063.pdf (accessed on 2 February 2020).
Rogojanu, Angela, and Liana Badea. 2014. The issue of competing currencies. Case study: Bitcoin. Theoretical
and Applied Economics 21: 103–14.
Selmi, Refk, Walid Mensi, Shawkat Hammoudeh, and Jamal Bouoiyour. 2018. Is Bitcoin a hedge, a safe haven or a
diversifier for oil price movements? A comparison with gold. Energy Economics 74: 787–801. [CrossRef]
Sheta, Alaa F., Sara Elsir M. Ahmed, and Hossam Faris. 2015. A comparison between regression, artificial neural
networks and support vector machines for predicting stock market index. Soft Computing 7: 8.
Siami-Namini, Sima, and Akbar S. Namin. 2018. Forecasting Economics and Financial Time Series: ARIMA vs.
LSTM. Available online: https://arxiv.org/abs/1803.06386v1 (accessed on 2 February 2020).
Sovbetov, Yhlas. 2018. Factors influencing cryptocurrency prices: Evidence from bitcoin, ethereum, dash, litecoin,
and monero. Available online: https://mpra.ub.uni-muenchen.de/85036/1/MPRA_paper_85036.pdf (accessed
on 2 February 2020).
Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout:
A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15: 1929–58.
Wang, Lin, Yi Zeng, and Tao Chen. 2015. Back propagation neural network with adaptive differential evolution
algorithm for time series forecasting. Expert Systems with Applications 42: 855–63. [CrossRef]
White, Lawrence H. 2015. The market for cryptocurrencies. The Cato Journal 35: 383–402. [CrossRef]
Yin, Wenpeng, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. Comparative Study of CNN and RNN
for Natural Language Processing. Available online: https://arxiv.org/pdf/1702.01923.pdf (accessed on 2
February 2020).
Yelowitz, Aaron, and Matthew Wilson. 2015. Characteristics of Bitcoin users: an analysis of Google search data.
Applied Economics Letters 22: 1030–36. [CrossRef]
Yu, Lean, Kin K. Lai, Shouyang Wang, and Wei Huang. 2006. A Bias-Variance-Complexity Trade-Off Framework
for Complex System Modeling. In Computational Science and Its Applications-ICCSA 2006. Lecture Notes in
Computer Science. Berlin/Heidelberg: Springer, Volume 3980.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).