Deep Learning in Investing: Opportunity in Unstructured Data
July 2020
Introduction
Deep learning is a machine learning technique utilizing complex, multi-layered statistical models, often with tens of millions or billions of parameters. Its recent ascent has been fueled by the rise of vast datasets and cheap computing.

Deep learning is widely used in the fields of computer vision, natural language processing, and speech recognition, which are characterized by large, complex, unstructured datasets. However, it has seen limited adoption in investment management. We believe this is because most investors are still trying to use it on traditional structured data to directly predict asset prices. However, structured financial data is not fertile ground for deep learning.

Investing is a niche industry with specialized documents only accessible to highly trained domain experts. Transfer learning helps us transcend this limitation by bringing in knowledge gained from bigger, broader domains. Transfer learning lowers barriers to entry, so that deep learning is no longer the plaything of the big tech oligopoly. Multimillion-dollar datasets and hardware not required!

This paper revolves around two practical investment case studies. First, we show how transfer learning can be used to produce state-of-the-art results in earnings call sentiment analysis. Second, we use a proprietary dataset of 1,000 alphas to show the limitations of using deep learning directly to predict asset prices.
Exhibit 1
Powered By Deep Learning
Source: Sparkline, Waymo, Apple

Part 1: Unstructured Data

Warning: Natural language processing (NLP) is an extremely fast-moving field and it is possible that some of the ideas here may become outdated or even contradicted in the near future.
In general, artificial intelligence begins its wave of disruption by first automating the most routine parts of our jobs. A significant portion of the financial analyst's day is spent reading textual documents ranging from financial news to broker research. In the age of big data, this has become an increasingly overwhelming task. Fortunately, deep learning can greatly streamline the way we consume this data.

From Word Vectors to Language Models

Our June 2019 paper, Investment Management in the Machine Learning Age, discussed word embeddings (word2vec). Introduced in 2013, word embeddings are matrices that encode the relationships between words. We showed the graphic below, which illustrates how the words used in 10-Ks cluster based on common meaning.
Exhibit 2
10-K Word Embeddings
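As a rough illustration of how embeddings like those behind Exhibit 2 can be trained, the sketch below fits a small word2vec model on tokenized 10-K sentences. The toy corpus, gensim 4.x API, and query word are assumptions for illustration only, not the pipeline used for the exhibit.

# Minimal word2vec sketch (assumes gensim >= 4.0 and a pre-tokenized 10-K corpus).
from gensim.models import Word2Vec

# Hypothetical corpus: each filing sentence is a list of lowercase tokens.
corpus = [
    ["net", "revenue", "increased", "due", "to", "higher", "subscription", "sales"],
    ["goodwill", "impairment", "charges", "reduced", "operating", "income"],
    # ... millions more sentences parsed from 10-K filings ...
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,   # dimensionality of each word vector
    window=5,          # context window around each word
    min_count=1,       # keep rare words for this toy example
    workers=4,
)

# Words used in similar contexts end up near each other in the embedding space.
print(model.wv.most_similar("revenue", topn=5))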
Exhibit 4
Guess What's Behind the [MASK]?

A statistical language model is a probability [MASK] over sequences of [MASK]. Given such a [MASK], it assigns a probability to the [MASK] sequence.

Source: Sparkline, Wikipedia

After training on millions of documents, language models can do some cool stuff. The most obvious application is autocompletion, where we guess the word (or sequence of words) given a prompt.

Exhibit 5
Autocompletion
Source: Google

One important feature of language models is that they do not require humans to manually create the training data. Text can be automatically parsed into training examples, such as by randomly masking words. This enables us to cheaply create massive training corpuses from millions of websites, books, articles, and other written media.

However, moving from word embeddings to language models has its drawbacks. More complex models are more powerful but require more data and compute to train. Given its simple architecture, we showed that word2vec produced impressive results when trained on a relatively small sample of 100,000 10-Ks. And the training process took only a few minutes on standard hardware.

By comparison, the language model GPT-2 has 1.5 billion parameters and was trained on 8 million web pages. It has been estimated that training GPT-2 cost $20-50K in computing budget spent over 1-10 months. Even putting aside time and money, there simply aren't enough 10-Ks in existence to train a model of this size. We could of course use a smaller model, but then we would have to sacrifice performance.

Transfer Learning

The big breakthrough came in early 2018 when language modeling was combined with transfer learning. The idea behind transfer learning is to first "pre-train" a model on a large general-purpose dataset, then "fine tune" it on a smaller domain-specific dataset for a specialized task. In our example above, we could pre-train GPT-2 on 8 million web pages then fine tune it on our 100,000 10-Ks. This avoids having to train the model from scratch on a small dataset.

Exhibit 6
Transfer Learning
Source: Sparkline

Language models are extremely useful for the pre-training stage of transfer learning. It turns out the ability to predict words requires a significant level of semantic awareness. This broad linguistic understanding is foundational for many other NLP tasks. For example, tasks as disparate as translation, question answering, and named entity recognition all benefit from starting with a pre-trained language model.

In practice, fine tuning involves starting with a pre-trained language model and swapping out the final layer, exchanging it for the specific building block that meets your needs. For example, if we want to do classification, we replace the final layer of the language model with a classifier head. We then retrain the model for the new task, adjusting the existing model's weights to incorporate learnings from the fine-tuning dataset.
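To make the head-swapping idea concrete, here is a minimal sketch using the open-source Hugging Face transformers library: it loads a pre-trained BERT, attaches a freshly initialized two-class head, and fine tunes the full stack on a handful of labeled examples. The model name, example texts, and training settings are illustrative assumptions, not the paper's actual configuration.

# Minimal fine-tuning sketch (assumes: pip install torch transformers).
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained language model and replace its final layer with a
# randomly initialized 2-class classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tiny illustrative fine-tuning set (positive = 1, negative = 0).
texts = ["Margins expanded and guidance was raised.",
         "We expect continued weakness in demand next quarter."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the small labeled dataset
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()  # gradients flow through the head AND the pre-trained layers
    optimizer.step()
    optimizer.zero_grad()

# Predict sentiment for a new snippet.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("Revenue beat expectations.", return_tensors="pt")).logits
print(logits.softmax(dim=-1))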
Exhibit 7
One Model, Many Uses
Source: Sparkline

One way to better understand fine tuning is by analogy to computer vision, where transfer learning had been widely applied prior to its crossover to NLP. In these models, the lower layers capture basic features such as edges and textures, while the higher layers depict more complete objects such as eyes, faces, legs, and dogs.

In our context, the lower layers of the neural network capture the fundamental building blocks of language (e.g. words), while the higher layers contain higher-level linguistic concepts. The final layer is dedicated to our specific task. Fine tuning allows our model to utilize the fundamental knowledge from earlier layers, while adjusting the end output to our specific task.

Exhibit 8
Lower Layers Encode Lower-Level Features

We are extremely blessed that the NLP research community has embraced the open source philosophy. Anyone can freely download massive language models that have been pre-trained on millions of documents. This saves hundreds of thousands of dollars, weeks of training time, and the redundancy of researchers constantly having to reinvent the wheel. With the heavy lifting out of the way, the fine tuning process is quite cheap and tractable even for less-resourced teams.

The NLP 🚀

The combination of language modeling and transfer learning opened the floodgates for a wave of innovation. Over the past couple years, Google, Facebook, Microsoft, OpenAI and others have introduced a succession of models building on this foundational concept.

These models have gotten bigger and bigger as datasets, computing resources and modeling techniques have improved (Exhibit 9). In Feb 2018, the state-of-the-art ELMo model had 94 million parameters. By Oct 2019, the T5 transformer had pushed the frontier to 11 billion parameters. Last month, GPT-3 was released with 175 billion parameters. The exponential trendline shows that we have experienced a 10x increase in model size every 8.5 months since pre-trained language models were introduced in 2018.
Exhibit 9
NLP 🚀
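As a sanity check on the trendline in Exhibit 9, the short calculation below recovers the growth cadence directly from the two endpoints quoted in the text (ELMo's 94 million parameters in Feb 2018 and GPT-3's 175 billion in mid-2020). The exact month count is an assumption.

# Back-of-the-envelope check of the "10x every 8.5 months" trendline.
import math

elmo_params = 94e6      # ELMo, Feb 2018
gpt3_params = 175e9     # GPT-3, roughly May/June 2020
months_elapsed = 28     # assumed gap between the two releases

orders_of_magnitude = math.log10(gpt3_params / elmo_params)  # ~3.3
months_per_10x = months_elapsed / orders_of_magnitude        # ~8.6

print(f"Roughly one 10x increase every {months_per_10x:.1f} months")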
These breakthroughs have made their way into the real world. In Oct 2019, Google converted its search engine to BERT. The results are so good that researchers are being forced to confront their ethical implications. For example, OpenAI decided to release GPT-2 in multiple phases to give [...]

"Understood. I'd say that we probably lost $0.5 million to $0.75 million in the fourth quarter of the year due to some of those headwinds as an approximation for the combination of outages, weathers and the like."
[...] yourself. Given its specificity, it shouldn't be a surprise that no open-source dataset of earnings calls with binary sentiment labels exists.

This leads us to two more challenges faced by those in niche domains such as investing. First, it is a general principle that cost per label increases with domain specificity. While pretty much anyone can identify images of stop signs, it requires years of training to recognize signs of financial fraud. Second, even if money were no object, large datasets in niche industries may simply not exist. For instance, there are only a finite number of observations on which to train a model to find the next Enron or Wirecard.

While the media are obsessed with hyping "big data", in many cases it is unrealistic to simply throw more data at the problem. We may be better served working to extract the most insight from the limited data we do have.

Cross-Training for Computers

With this in mind, we ran an experiment to see how well we could do in an extremely data-constrained environment. We labeled 100 earnings call transcript snippets by hand, classifying each as positive or negative. We used 50 to train the model and 50 to evaluate its out-of-sample performance. Compared to the 25,000 training samples in IMDb, a 50-observation training set is extremely small.

We used BERT as our representative deep learning model. BERT has 340 million parameters, so it should be no surprise that training on 50 observations did not work. We achieved testing accuracy of 54%, indistinguishable from random chance. For comparison, we also trained a simpler model -- logistic regression. This also did not work. Natural language is very complex.

As a benchmark, we tested the old-school dictionary approach. We used the Loughran-McDonald lexicon, which was created by two finance professors and is widely used in the industry. We classified texts based on the net occurrence of positive and negative words. Loughran-McDonald achieved a respectable accuracy of 68%. In a sense, dictionary methods are a form of transfer learning. Instead of artificial neural networks, we rely on Profs. Loughran and McDonald's actual neurons, pre-trained over their many years of experience in the field.

So far, machine learning has failed. In order to get reasonable results, we would need a significantly larger training sample. But now let's see if transfer learning can help. We use the training progression below.

Exhibit 13
Transfer Learning for Earnings Call Sentiment
Source: Sparkline

BERT was originally pre-trained to perform language modeling on a large corpus of books and Wikipedia articles. Instead of initializing our model with random values, we can use these pre-trained weights. But books and Wikipedia articles differ greatly from earnings calls in structure, tone, and vocabulary. Thus, we continue BERT's education. This time we have it read earnings call transcripts. Fortunately, language model training does not require us to manually label any data. Thus, we can give BERT tens of thousands of unlabeled transcripts to study without our supervision.

BERT now understands both general English language and financial jargon. However, it has never done sentiment analysis. We correct this with one final transfer learning step. We train BERT on the IMDb dataset from earlier. While movie reviews are quite different from earnings calls, the sentiment analysis task is highly relevant. Think of all these steps as cross-training for computers. Putting in thousands of reps in the pre-season allows BERT to perform on game day.
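The domain-adaptation step in this progression, continuing BERT's language-model training on unlabeled transcripts, might look roughly like the sketch below, again using the Hugging Face transformers library. The file path, model name, and training settings are illustrative assumptions rather than the paper's actual setup.

# Continued masked-language-model pre-training on unlabeled transcripts
# (assumes: pip install torch transformers).
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical corpus: one transcript snippet per line in a text file.
with open("earnings_call_snippets.txt") as f:
    lines = [line.strip() for line in f if line.strip()]

encodings = tokenizer(lines, truncation=True, max_length=128)
dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

# The collator randomly masks 15% of tokens, so no human labels are needed --
# the original words themselves are the prediction targets.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-earnings-lm", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
trainer.save_model("bert-earnings-lm")  # reload later with a classification head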
Exhibit 14
The Power of Transfer Learning
Source: Sparkline
We find that each transfer learning step increases the performance of the model. With all three, we achieve 89% accuracy. This is a full 21 percentage points better than Loughran-McDonald. This result is kind of incredible. We spent an hour labeling and now have a model that can extract transcript sentiment automatically with much greater accuracy than the current industry standard.

BERT and its successors are extremely large models. Thus, one might assume they are only useful for huge companies like Google or Facebook with their billions of search records and social interactions. The beauty of transfer learning is that it allows us to take advantage of the vast resources baked into pre-trained language models for use with small, specialized datasets.

The fundamental techniques demonstrated here can be used for many other NLP tasks besides sentiment analysis. Pre-trained language models are an incredibly powerful tool, and we encourage you to think about other ways they can be applied to improve the way we utilize unstructured data in our industry.

Exhibit 15
Transfer Learning in the Matrix
Source: Sparkline, The Matrix
Optimal Complexity

Every dataset has an optimal level of model complexity. Overly simple models underfit, failing to capture all the nuances of the data. Overly complex models overfit, failing to work out of sample.

Exhibit 17
If Goldilocks Were a Statistician
Source: edpresso

The point at which optimal complexity is achieved depends on the size of the dataset. Bigger datasets can sustain more complex models. The extremely stylized chart below illustrates this point.

Exhibit 18
Model Complexity Should Match Data Size

We illustrate this point empirically using our own data. Sparkline has a library of thousands of alphas. These range from standard quant factors like price-to-book ratios to proprietary signals derived from crawling the public internet. We use a random subset of 1,000 of these signals for the experiment below.

Neural networks can be viewed as linear regression with more layers. Conversely, linear regression can be viewed as a neural network with only one layer. Thus, we begin with linear regression and successively build more complex architectures. We use feedforward neural networks with batch normalization, ReLU, and dropout. Don't worry about the details -- the main takeaway is that these networks get more complex as we add depth.

Exhibit 19
Neural Networks of Increasing Complexity
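For the curious, a network along these lines can be written in a few lines of PyTorch. The sketch below is an assumption about the architecture, not Sparkline's actual code; hidden widths of 100 and 50 are chosen because, with 1,000 input signals, they happen to reproduce the 105,501-parameter count quoted later for the 3-layer model.

# Sketch of a feedforward network with batch normalization, ReLU, and dropout.
import torch.nn as nn

def build_model(n_features=1000, hidden=(100, 50), dropout=0.5):
    """Stack Linear -> BatchNorm -> ReLU -> Dropout blocks, ending in one output."""
    layers, width = [], n_features
    for h in hidden:
        layers += [nn.Linear(width, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(dropout)]
        width = h
    layers.append(nn.Linear(width, 1))  # predicted forward return
    return nn.Sequential(*layers)

linear_regression = build_model(hidden=())        # 1 layer,  1,001 parameters
three_layer_net = build_model(hidden=(100, 50))   # 3 layers, 105,501 parameters

n_params = sum(p.numel() for p in three_layer_net.parameters())
print(n_params)  # 105501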
Exhibit 20
Simulated Strategy Returns
Source: Sparkline

The left panel contains the validation period. The more complex the model, the better the performance. The right panel contains the test period. The 3-layer model does the best out of sample, especially over the past couple years including the ongoing COVID-19 crisis.

The next exhibit summarizes the results using Sharpe Ratio (i.e., signal-to-noise ratio). The chart looks as if it were taken straight out of a machine learning textbook!

Exhibit 21
Sharpe Ratio and Model Complexity
Source: Sparkline

Sharpe Ratio is lower in the test period than the validation period. This is expected, as the validation period has the benefit of hindsight and alphas should naturally decay as they are discovered in the latter period.

Optimal complexity in the test period is achieved at 3 layers. This implies that linear regression is too simple. It does not capture the full intricacies of the data. On the other hand, the 5-layer neural network is too complex. It overfits the data so badly that, despite an incredible backtest, it performs only a bit better than linear regression out of sample.

Our optimal model produced a Sharpe Ratio of 1.6. This is a meaningful improvement over linear regression, which delivered a Sharpe Ratio of 1.0. We can conclude there is room for improvement moving beyond the "simple" tier of model complexity, but venturing too far into the "complex" zone leads to overfitting.

Shallow Deep Learning

Our optimal model has 3 layers and 105,501 parameters. This is a lot more than linear regression, with its measly 1 layer and 1,001 parameters. However, it pales in comparison to the deep learning architectures used on unstructured data. For example, here is ResNet-50, a popular computer vision model with 50 layers and 25 million parameters.

Exhibit 22
ResNet-50
Source: Deep Residual Learning for Image Recognition

We added our optimal model to the chart of modern NLP models from the prior section. We overfit our data at just 100,000 parameters. Yet this is 1,000 times smaller than ELMo and over 1M times smaller than GPT-3.
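Stepping back to the metric in Exhibit 21, the snippet below shows how an annualized Sharpe Ratio is typically computed from a series of simulated strategy returns. The monthly frequency and the returns themselves are assumptions for illustration, not Sparkline data.

# Annualized Sharpe Ratio of a return series (assumed monthly frequency).
import numpy as np

def sharpe_ratio(returns, periods_per_year=12):
    """Mean return divided by volatility, annualized."""
    returns = np.asarray(returns)
    return returns.mean() / returns.std(ddof=1) * np.sqrt(periods_per_year)

# Hypothetical monthly strategy returns.
simulated = np.array([0.02, -0.01, 0.03, 0.01, -0.02, 0.04, 0.00, 0.02])
print(round(sharpe_ratio(simulated), 2))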
Exhibit 23
NLP 🚀 ++
Source: Sparkline (Adapted from HuggingFace)

One might argue that our results are specific to our dataset and model setup. Of course, we could further optimize the hyperparameters and architecture. We could also go to daily frequency data and further expand the number of signals. However, this would not qualitatively change our conclusion.

Deep learning models can offer an improvement upon linear regression. However, due to inherent limitations in financial data, the models quickly start overfitting with even simple architectures. The whole point of deep learning models is that they are deep -- consisting of dozens of layers and millions of parameters. Being forced to resort to "shallow deep learning" means sacrificing most of the benefit of these models.

Explainability

In addition, using deep learning is not without its tradeoffs. One significant weakness of deep learning models is that they are "black boxes". Unlike linear regression, there is no intuitive interpretation of their coefficients. With great power comes great opacity. 🕷

Fortunately, this is an active branch of AI research. We will utilize a simple technique called a "global surrogate". The idea is to train an interpretable model (in our case, linear regression) to predict the predictions of the deep learning model. To be clear, we are not trying to predict the market, only the output of the deep learning model.

The main advantage of the surrogate model is that its regression coefficients are interpretable. Weights (i.e., betas) range from -2.5% to +2.5%. We spot checked a few standard quant factors to ensure they lined up with intuition. Value, momentum, reversal, quality and size work as expected. Phew!

Exhibit 24
Deep Learning Surrogate Coefficients
Source: Sparkline

One side benefit of this approach is that we can evaluate the R-squared, or the percent of variance explained by the linear regression. If the deep learning model were completely linear -- which might happen if the underlying features were truly linear -- the surrogate would capture 100% of its variance. The less variance explained, the more nonlinearities and interactions the deep learning model is picking up.

Exhibit 25
Financial Factors Are Mostly Linear
Source: Sparkline
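Here is a minimal sketch of the global-surrogate procedure described above: fit a linear regression to the deep model's own predictions, read off the coefficients, and use R-squared to measure how much of the model's behavior is linear. The variable names, random placeholder data, and scikit-learn implementation are assumptions, not Sparkline's actual code.

# Global surrogate: explain a deep model with a linear regression fit to its outputs
# (assumes: pip install numpy scikit-learn torch, plus a trained PyTorch `deep_model`).
import numpy as np
import torch
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# X: rows are stocks, columns are the 1,000 alpha signals (placeholder data here).
X = np.random.randn(5000, 1000).astype(np.float32)

# Predictions of the trained deep model -- the surrogate's target is the model,
# not the market.
deep_model.eval()
with torch.no_grad():
    y_deep = deep_model(torch.from_numpy(X)).numpy().ravel()

surrogate = LinearRegression().fit(X, y_deep)

# Interpretable coefficients: one beta per alpha signal.
betas = surrogate.coef_

# Variance of the deep model's output explained by the linear surrogate.
r2 = r2_score(y_deep, surrogate.predict(X))
print(f"Linear surrogate explains {r2:.0%} of the deep model's variance")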
Exhibit 25 shows the variance explained for each of our models. The 1-layer model is linear regression, so the surrogate explains 100% of the variance. As we add layers, the model begins finding interesting nonlinearities and interactions in the data. The R-squared falls gradually to 62% as we increase model complexity up to 5 layers.

We found the 5-layer model overfits, so let's focus instead on the optimal 3-layer model. The linear surrogate captures 70% of the deep learning model's variance, while 30% can be explained only by nonlinearities and interactions.

This 70/30 split is quite interesting. It implies that our data are mostly linear. While complex models can add value, the gains are limited. Furthermore, there are significant drawbacks to utilizing deep learning. These include opacity, complexity and cost. There are plenty of machine learning algorithms occupying the "medium" complexity region between linear regression and deep learning that might be worth considering first.

Conclusion

Deep learning is extremely powerful but requires very large datasets to be effective. Traditional structured financial data is too small and linear to truly benefit from deep learning. While "shallow deep learning" can be useful, researchers may be better served to first consider simpler techniques.

On the other hand, deep learning is highly effective on unstructured data. Transfer learning provides the key to unlocking its potential in niche domains such as investing. Transfer learning enables us to leverage the creations of large technology companies without having to gather the data or train the models ourselves.

Unstructured data is a critical input to the investment process. However, its unmitigated growth presents a significant challenge for the industry. Fortunately, the advances in natural language processing presented here can greatly improve how we consume this data. Given that these innovations are less than a few years old, we believe there is opportunity for entrepreneurial individuals and firms to profit from the impending transformation.

Kai Wu
Founder & CIO, Sparkline Capital LP

Kai Wu is the founder and Chief Investment Officer of Sparkline Capital, an investment management firm applying state-of-the-art machine learning and computing to uncover alpha in large, unstructured data sets.

Prior to Sparkline, Kai co-founded and co-managed Kaleidoscope Capital, a quantitative hedge fund in Boston. With one other partner, he grew Kaleidoscope to $350 million in assets from institutional investors. Kai jointly managed all aspects of the company, including technology, investments, operations, trading, investor relations, and recruiting.

Previously, Kai worked at GMO, where he was a member of Jeremy Grantham's $40 billion asset allocation team. He also worked closely with the firm's equity and macro investment teams in Boston, San Francisco, London, and Sydney.

Kai graduated from Harvard College Magna Cum Laude and Phi Beta Kappa.

Disclaimer

This paper is solely for informational purposes and is not an offer or solicitation for the purchase or sale of any security, nor is it to be construed as legal or tax advice. References to securities and strategies are for illustrative purposes only and do not constitute buy or sell recommendations. The information in this report should not be used as the basis for any investment decisions.

We make no representation or warranty as to the accuracy or completeness of the information contained in this report, including third-party data sources. The views expressed are as of the publication date and subject to change at any time.

Hypothetical performance has many significant limitations and no representation is being made that such performance is achievable in the future. Past performance is no guarantee of future performance.