In my first set of articles on machine learning, I investigated whether deep neural networks, when trained on financial ratios, could predict future stock prices. The conclusion I drew in the second article was that no, financial ratios didn’t predict future stock prices. This finding was consistent with other people’s observations that algorithm-driven value investing hasn’t worked in the past decade or so.
But while value investing hasn’t worked, other relatively simple strategies have. Some of these strategies solely act on signals drawn from the behaviour of stock prices in recent months and years. Such strategies, as a group, are called “technical analysis”.
One well-documented technical analysis strategy is momentum investing, which involves investing in the stock market’s best-performing stocks. Momentum investing has posted great results over the past few years. The iShares Edge MSCI USA Momentum Factor ETF, which is the biggest momentum ETF in the U.S., has outperformed the broader stock market by around 5% per year since inception four years ago.
Another technical analysis strategy, called “trend following”, has enjoyed success as well. You can read the details of this strategy in my interview with a hedge fund manager who based his fund around this strategy.
The success of these strategies led me to think that machine learning algorithms could be trained on stock price history to predict future prices. At the very least, the algorithms should be able to “reinvent” the momentum and trend following strategies.
I have thus decided to train neural networks using several different architectures to see how well they can predict future stock prices. In this first instalment of the series, I investigate the performance of convolutional neural networks (CNNs), which are a well-known neural network architecture.
CNNs are particularly known for their application in image processing, and work roughly as follows: a convolutional layer consists of a set of filters that try to detect “features” in the data. In image processing, features may include the edges of an object. So if you pass an image of a coffee mug through some filters, they may output the contours of the mug. Note that we don’t supply the filters; the machine learning algorithm learns them on its own during training.
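To make the idea of a filter concrete, here is a minimal sketch in plain Python of a single one-dimensional filter sliding over a short series. The function name `convolve_1d` and the hand-picked filter are illustrative assumptions; in a real CNN the filter weights are learned rather than chosen by hand.

```python
def convolve_1d(series, kernel):
    """Slide `kernel` over `series` (no padding, stride 1) and return the responses."""
    width = len(kernel)
    return [
        sum(series[start + i] * kernel[i] for i in range(width))
        for start in range(len(series) - width + 1)
    ]


# Hand-picked "step up" detector; a trained CNN would learn its own weights.
step_filter = [-1.0, 1.0]
series = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

response = convolve_1d(series, step_filter)
print(response)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

The filter responds only at the position where the series jumps, which is the one-dimensional analogue of an edge detector in image processing.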
I believe there are some similarities between image processing and technical analysis, since technical analysis involves analyzing the shape of a stock’s price history. I therefore figured that CNNs could recreate some of these technical analysis strategies.
The data I fed into the CNNs consisted of the price and volume history of every TSX-listed stock. Specifically, each input sample consisted of the daily trading volume and percentage price change over a period of 250 trading days (roughly a calendar year). The corresponding output was the change in the stock’s price a week after the last input date. There were about 160,000 data points available, but since I had a limited amount of memory on my GPU, I only used the first 40,000.
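The construction of one data point can be sketched as follows. This is an illustration under stated assumptions, not the code used for the article: the function name `build_samples` is hypothetical, "a week" is taken to mean 5 trading days, the output is expressed as a percentage change, and volume is left unscaled (the article does not say whether it was normalized).

```python
def build_samples(closes, volumes, window=250, horizon=5):
    """Turn one stock's daily history into (features, target) pairs.

    features: `window` rows of [daily % price change, daily volume]
    target:   % price change from the last input day to `horizon`
              trading days later (assumed meaning of "a week")
    """
    samples = []
    # Day t's % change needs the close of day t - 1, so windows start at day 1.
    for start in range(1, len(closes) - window - horizon + 1):
        end = start + window  # exclusive; the last input day is end - 1
        features = [
            [(closes[t] / closes[t - 1] - 1.0) * 100.0, volumes[t]]
            for t in range(start, end)
        ]
        last = end - 1
        target = (closes[last + horizon] / closes[last] - 1.0) * 100.0
        samples.append((features, target))
    return samples


# Tiny example: a price that rises 10% a day, with a 2-day window and 1-day horizon.
closes = [100.0, 110.0, 121.0, 133.1, 146.41]
volumes = [1000, 1100, 1200, 1300, 1400]
samples = build_samples(closes, volumes, window=2, horizon=1)
print(len(samples))
```

With the default arguments, each sample has 250 rows of two values, which matches the input described above.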
I split these data points into three groups: training, validation and testing. I explained the purpose of training and testing data sets in my previous article on deep neural networks. But I’ve had to further divide my data to create the validation set as well.
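A three-way split might look like the sketch below. The 70/15/15 proportions and the choice to split in the order the data arrives are assumptions made for illustration; the article does not state the actual proportions or ordering.

```python
def split_three_ways(samples, train_frac=0.70, val_frac=0.15):
    """Split `samples` into non-overlapping training, validation and testing sets."""
    n = len(samples)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


data_points = list(range(40_000))  # stand-ins for the 40,000 data points
train, validation, test = split_three_ways(data_points)
print(len(train), len(validation), len(test))  # 28000 6000 6000
```

The important property is that no data point appears in more than one of the three sets.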
I had to do this because experimenting with CNNs brought fresh challenges to me as a relative newcomer to the machine learning world. There are several ways to configure each convolutional layer, and you typically use multiple convolutional layers within a CNN model. Furthermore, it takes a long time, typically more than an hour, to fully train a CNN.
Initially, I tried to make intuitive guesses as to what a good CNN configuration would look like. However, I quickly found that it was hard to make good guesses, so I implemented a more methodical approach instead.
I started with a more or less random CNN model. Then, I generated a set of CNNs that each had just one configuration setting tweaked. For example, if the original CNN contained a convolutional layer with a filter size of 10, the generated set would contain one CNN with a filter size of 5, and another with a filter size of 15.
Once I evaluated every CNN in the generated set, I selected the CNN that best predicted future returns, and generated a new set of CNNs based on it. When determining which CNN was the best, I tested how well each CNN was able to predict the output contained in the validation set.
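The search procedure described above can be sketched as a simple loop. Everything named here is hypothetical: `validation_score` stands in for "train a CNN with this configuration and score it on the validation set", and the toy scoring function exists only so the sketch runs quickly. Stopping when no tweak improves on the current best is also an assumption, since the article does not say how that case was handled.

```python
def neighbours(config, steps):
    """Yield copies of `config` with exactly one setting nudged down or up."""
    for name, step in steps.items():
        for delta in (-step, step):
            candidate = dict(config)
            candidate[name] = config[name] + delta
            if candidate[name] > 0:  # skip invalid, non-positive settings
                yield candidate


def search(start, steps, validation_score, iterations=24):
    """Repeatedly move to the best one-setting tweak of the current best config."""
    best, best_score = start, validation_score(start)
    for _ in range(iterations):
        scored = [(validation_score(c), c) for c in neighbours(best, steps)]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # assumption: stop once no tweak helps
        best, best_score = top, top_score
    return best, best_score


# Toy stand-in for "train the CNN and evaluate it on the validation set".
def toy_score(config):
    return -((config["kernel_size"] - 15) ** 2) - (config["channels"] - 30) ** 2


start = {"kernel_size": 10, "channels": 20}
steps = {"kernel_size": 5, "channels": 10}
best, best_score = search(start, steps, toy_score)
print(best, best_score)
```

Because each real evaluation means training a CNN for an hour or more, the number of settings tweaked per round directly determines how long one iteration takes.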
It’s important that the validation data doesn’t overlap with the testing data. In a previous article, I said that training a machine learning model is like training a robot to fly a plane. The training data is like flight simulations, while testing data is like a test flight.
Now, imagine that you have a number of robots trained on the same flight simulations. Then, you choose the best robot based on its performance in the test flight. Now, how would you determine whether the robot could deal well with every real-life situation? Would you have the robot fly the same route it had flown before in the test flight? Of course not: you’d give the robot a new route to fly, which it had never encountered before.
We give the robot a new route because we already know the robot does well in the test route, but there’s a chance the robot just happens to do well on that particular route. This is the same reason why we need to separate the validation data, which is like the test route, from the testing data, which is like the routes that the robot has never flown before.
In order to find the CNN that best fits the data, I repeated the process of selecting the best CNN from a set of CNNs, and generating a new set of CNNs based on this best CNN. I did this for a couple of dozen iterations. I’ve detailed the CNN’s configuration in the appendix of this article.
I then fed the best CNN with the testing data. The following plot shows the predicted stock price returns (Y-axis) vs. the actual returns (X-axis) of this test data set.
If the CNN can predict future prices very accurately, then we should see the shape of the green dots match that of the blue ellipse. Although we clearly see that this is not the case, we can nonetheless see that the green dots generally skew upward. This means that although the CNN is not a very good predictor of returns, it can nonetheless outperform random guesses.
One way to quantify the predictive power of any statistical model is to measure its R-Squared (R2). The R2 measured on the CNN’s predictions indicates that the CNN is able to explain 6% of the variability of the actual stock market returns. While 6% is rather low for an R2 value, it nonetheless beats the R2 of 0% we would expect if the CNN had no predictive capability.
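For reference, one common definition of R2 is one minus the ratio of the squared prediction errors to the squared deviations of the actual values from their mean. The sketch below uses that definition; the article does not say which variant was used (another common choice is the squared correlation between predictions and actual values), so treat this as an assumption.

```python
def r_squared(actual, predicted):
    """R2 = 1 - (sum of squared errors) / (sum of squared deviations from the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual_returns = [1.0, -2.0, 3.0, -2.0]          # made-up numbers for illustration
mean_return = sum(actual_returns) / len(actual_returns)

print(r_squared(actual_returns, actual_returns))                        # 1.0
print(r_squared(actual_returns, [mean_return] * len(actual_returns)))   # 0.0
```

Under this definition, a model that always predicts the average return scores exactly 0, and a model that does worse than that on unseen data can score below 0.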
Because of the low R2 value, I don’t believe that using my CNN architecture will yield very useful trading algorithms. But that’s okay, as I never expected CNNs (used alone, at least) to produce great results. In the next instalment of this series, I’ll use the same data set to investigate the potential of recurrent neural networks.
Appendix: CNN configuration
Layer 1: Convolutional layer with kernel size of 5, 30 output channels, ReLU activation function, and maxpool of 3 with stride of 2.
Layer 2: Convolutional layer with kernel size of 10, 70 output channels, ReLU activation function, and maxpool of 3 with stride of 2.
Layer 3: Fully connected layer with output of 550 nodes, ReLU activation function. Dropout of 50% applied for regularization purposes.
Final layer: 1 output node. Dropout of 50% applied.
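To see how the data shrinks as it passes through these layers, the sketch below traces the sequence length through the two convolutional layers. It rests on assumptions the configuration above does not state: the convolutions run along the 250-day time axis with stride 1 and no padding, the input has 2 channels (price change and volume), and the maxpool uses no padding. The function names are hypothetical.

```python
def conv_out(length, kernel_size):
    """Output length of a 1D convolution, assuming stride 1 and no padding."""
    return length - kernel_size + 1


def pool_out(length, size=3, stride=2):
    """Output length of a 1D max pool, assuming no padding."""
    return (length - size) // stride + 1


def trace_shapes(input_length=250):
    """Return (name, sequence length, channels) after each convolutional layer."""
    shapes = [("input", input_length, 2)]
    length = pool_out(conv_out(input_length, 5))
    shapes.append(("layer 1", length, 30))
    length = pool_out(conv_out(length, 10))
    shapes.append(("layer 2", length, 70))
    return shapes


shapes = trace_shapes()
for name, length, channels in shapes:
    print(name, length, channels)

# Values fed into the 550-node fully connected layer after flattening.
flattened = shapes[-1][1] * shapes[-1][2]
print(flattened)
```

Under these assumptions, the 250-day input is reduced to 122 time steps after layer 1 and 56 after layer 2, so the fully connected layer receives 56 × 70 = 3,920 values.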