Image Credit: Boo-Tique / Shutterstock.com

What is the best machine learning algorithm? The answer is, unsurprisingly, “it depends.” There are many different types of machine learning algorithms, each with its own way of modelling the real world.

#### Tradeoffs on Model Complexity

The best algorithm is generally the one that most closely mirrors the real world. Think of each machine learning algorithm as a Lego box containing bricks of particular shapes. If you’re going to build an airplane, you’ll do better with a box that contains wing-shaped bricks.

However, the complexity of the algorithm matters as well. Some people may think that more complexity is always better because it can more accurately model real-world phenomena, but this would be a mistake. Complex algorithms also have a greater capacity to learn the “wrong” lessons. Let me show you an example.

Let’s say we want to create a model that predicts the probability of a coin coming up heads. We toss a coin a few times, record the outcomes, and train an algorithm on that data. The outcomes are as follows (‘H’ denotes heads and ‘T’ denotes tails):

T T H T H H

A very simple algorithm may only count the number of Hs relative to Ts and conclude that the probability of H is 50%. But a more complex model that remembers past outcomes may conclude that H is more likely to come up after T T or H T, since in this sequence both of those pairs happened to be followed by H. Because coin tosses are independent, that pattern is pure noise: the complex algorithm, though more powerful, has yielded an inferior model.
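To make the coin example concrete, here is a minimal Python sketch. It assumes the “complex” model is one that estimates the probability of H conditional on the previous two tosses; the function names are hypothetical.

```python
from collections import Counter, defaultdict

tosses = "T T H T H H".split()


def simple_model(seq):
    """Share of heads overall, ignoring the order of the tosses."""
    return Counter(seq)["H"] / len(seq)


def order2_model(seq):
    """Estimate P(H | previous two tosses) by counting what followed each pair."""
    stats = defaultdict(Counter)
    for i in range(2, len(seq)):
        context = " ".join(seq[i - 2:i])
        stats[context][seq[i]] += 1
    return {ctx: c["H"] / sum(c.values()) for ctx, c in stats.items()}


print(simple_model(tosses))  # 0.5
print(order2_model(tosses))  # {'T T': 1.0, 'T H': 0.5, 'H T': 1.0}
```

On these six tosses the simple model returns 0.5, while the order-2 model assigns a 100% chance of H after T T or H T, each based on a single observation. That is exactly the kind of wrong lesson a model with more capacity can learn from too little data.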

Another problem with using complex algorithms is that they are generally more difficult to interpret. One reason linear regression is still so popular in the investment community today is that linear regressions, being mathematically simple, are also simple to interpret. For instance, the Fama-French three-factor model says, through its size factor, that the smaller the stock, the higher its expected return, all else being equal. By contrast, it’s rarely possible to explain the inner workings of a machine learning model in just one or two sentences.

There is therefore a tension in choosing how complex a machine learning algorithm should be. Too simple, and the algorithm can’t accurately model real-world phenomena. Too complex, and the algorithm learns the wrong lessons, its inner workings become too hard to explain, or both. The best machine learning algorithm is one that brings the benefits of complexity while paying as little as possible for them.

In the domain of investment selection, I find that algorithms that employ the ‘decision tree’ model often do a good job of achieving this balance. Let me show you why.

#### Strengths and Weaknesses of Decision Trees

First, let me give you an overview of how decision trees work. Let’s say that we start with some input data and the corresponding output data. For instance, the inputs may include a stock’s return on equity, P/E ratio, and 6-month momentum, and the output may be the percentage change in the stock’s price.

To form a decision tree, we first identify the input variable that is most important in determining the output. For example, the tree algorithm may decide that P/E is the most important variable to consider.

We then analyze how the values of that variable are associated with the output, and divide the data into separate sets at a critical value. For instance, we may see big differences in outputs between stocks with P/E less than 15 and those with P/E equal to or greater than 15. If so, we would divide all stocks into two sets based on this criterion.
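The threshold search described above can be sketched in a few lines of Python. The text does not say how “big differences in outputs” are measured, so this sketch assumes one common choice for regression trees: pick the threshold that minimizes the total sum of squared errors (SSE) of the two resulting sets, trying midpoints between consecutive distinct values. The function names and the P/E and return figures are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty set)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(x, y):
    """Return (threshold, cost) minimizing SSE(left) + SSE(right),
    where left holds rows with x < threshold and right holds x >= threshold."""
    xs = sorted(set(x))
    best = (None, float("inf"))
    # Midpoints between consecutive distinct values keep both sides non-empty.
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical data: P/E ratios and subsequent returns.
pe = [8, 10, 12, 14, 16, 20, 30, 45]
returns = [0.12, 0.10, 0.11, 0.09, 0.02, 0.01, 0.03, -0.02]

threshold, cost = best_threshold(pe, returns)
print(threshold)  # 15.0
```

On this made-up sample, the low-P/E stocks cluster around 10% returns and the rest around 1%, so the split that separates them (P/E below 15 versus 15 or more) leaves the least unexplained variation.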

After the data has been split, we have the option of splitting each set further using either the same or another input variable. For example, in the set containing stocks with P/E less than 15, we may find that 6-month momentum matters most, and split that set further based on whether 6-month momentum is positive or negative. On the other hand, for stocks with P/E of 15 or more, we may find that P/E still matters most, but choose to split on a P/E value of 40 this time.
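Repeating that search on each resulting set gives a recursive tree builder. The minimal sketch below assumes an SSE criterion and a simple stopping rule: stop at a maximum depth, when no split leaves at least `min_leaf` stocks on each side, or when no split reduces the error. Each leaf stores the mean outcome of its stocks as its score. The names `build_tree`, `max_depth`, and `min_leaf` are illustrative, and the data is hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty set)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y, features, min_leaf):
    """Search every feature and midpoint threshold; return the lowest-SSE
    (cost, feature, threshold) whose sides each have >= min_leaf rows."""
    best = None
    for f in features:
        xs = sorted(set(r[f] for r in rows))
        for a, b in zip(xs, xs[1:]):
            t = (a + b) / 2
            left = [yi for r, yi in zip(rows, y) if r[f] < t]
            right = [yi for r, yi in zip(rows, y) if r[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    return best


def build_tree(rows, y, features, depth=0, max_depth=2, min_leaf=2):
    """Recursively split the data; leaves hold the mean outcome as a score."""
    leaf = {"value": sum(y) / len(y), "n": len(y)}
    if depth >= max_depth:
        return leaf
    split = best_split(rows, y, features, min_leaf)
    if split is None or split[0] >= sse(y):  # no split improves the fit
        return leaf
    _, f, t = split
    left = [(r, yi) for r, yi in zip(rows, y) if r[f] < t]
    right = [(r, yi) for r, yi in zip(rows, y) if r[f] >= t]
    # Every feature stays available, so the same one may be reused lower down.
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [yi for _, yi in left],
                           features, depth + 1, max_depth, min_leaf),
        "right": build_tree([r for r, _ in right], [yi for _, yi in right],
                            features, depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its score."""
    while "feature" in tree:
        go_left = row[tree["feature"]] < tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]


# Hypothetical stocks: P/E, 6-month momentum, and subsequent return.
rows = [
    {"pe": 8, "momentum": -0.05}, {"pe": 10, "momentum": 0.05},
    {"pe": 12, "momentum": 0.10}, {"pe": 14, "momentum": -0.10},
    {"pe": 16, "momentum": 0.04}, {"pe": 30, "momentum": -0.04},
    {"pe": 50, "momentum": 0.06}, {"pe": 60, "momentum": -0.06},
]
y = [0.00, 0.10, 0.10, 0.00, -0.25, -0.25, -0.25, -0.25]

tree = build_tree(rows, y, ["pe", "momentum"])
print(tree["feature"], tree["threshold"])                  # pe 15.0
print(tree["left"]["feature"], tree["left"]["threshold"])  # momentum 0.0
```

On this made-up data the builder first splits on P/E at 15.0, then splits the low-P/E set on momentum at 0.0, and leaves the high-P/E set unsplit because its returns are identical. With real data, the chosen variables and thresholds depend entirely on the sample.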

At some point, it no longer makes sense to split the sets further. Instead, we assign a score to all data that belongs to each final set. Such scores may indicate that we should buy a stock, or that we should avoid it. The chart below depicts the example we’ve followed thus far.

In this example, we may conclude that we should buy stocks with P/E less than 15 that have positive 6-month momentum. Perhaps these are GARP (growth at a reasonable price) stocks, and the data indicates that the strategy works. On the other hand, the model may tell us to avoid stocks with P/E less than 15 that have negative 6-month momentum. Perhaps these are value traps, which the data says often lead to bad outcomes.
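The example tree can also be written out by hand as nested conditions. This is a hypothetical transcription: the text gives scores only for the two P/E < 15 leaves, so the P/E ≥ 15 leaves return a placeholder, and zero momentum is arbitrarily treated as non-positive. Returning the path alongside the score shows how each decision can be traced back to the conditions that produced it.

```python
def example_tree(pe, momentum_6m):
    """Hand-coded version of the example tree; returns (score, path)."""
    path = []
    if pe < 15:
        path.append("P/E < 15")
        if momentum_6m > 0:
            path.append("6-month momentum > 0")
            return "buy", path  # perhaps GARP stocks
        # Assumption: zero momentum is grouped with negative momentum.
        path.append("6-month momentum <= 0")
        return "avoid", path  # perhaps value traps
    path.append("P/E >= 15")
    path.append("P/E < 40" if pe < 40 else "P/E >= 40")
    return "unspecified", path  # the text gives no score for these leaves


score, path = example_tree(12, 0.08)
print(score, path)  # buy ['P/E < 15', '6-month momentum > 0']
```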

One of the major strengths of decision trees is that they can model many different contexts. We’ve already seen an example of this, where stocks with P/E less than 15 were treated very differently depending on whether their momentum was positive or negative. Not all machine learning algorithms are as flexible.

Another strength of decision tree algorithms is that they can handle missing data gracefully. In finance, missing data is common. Sometimes this is due to an error by the data vendor, but at other times the data simply doesn’t exist. For example, profit margins don’t exist for companies that have yet to make a sale.

In decision trees, missing data can be treated as its own class of data, and the algorithm only needs to decide whether to go left or right down the tree when it encounters a missing value. Other machine learning models typically can’t handle missing data out of the box, and while there are good workarounds such as imputation, I feel that they don’t handle missing data as gracefully as decision trees can.
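One way to let a tree decide which way missing values should go is to try both directions at a split and keep the one with the lower error. This is similar in spirit to the default direction that gradient-boosting libraries such as XGBoost learn for missing values, but the sketch below is a simplified, hypothetical illustration using a sum-of-squared-errors (SSE) criterion, not any library’s implementation. `None` marks a missing profit margin, and the data is made up.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty set)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_with_missing(x, y, threshold):
    """Split on x < threshold and send missing values (None) to whichever
    side gives the lower total SSE. Returns (default_direction, cost)."""
    left = [yi for xi, yi in zip(x, y) if xi is not None and xi < threshold]
    right = [yi for xi, yi in zip(x, y) if xi is not None and xi >= threshold]
    missing = [yi for xi, yi in zip(x, y) if xi is None]
    cost_left = sse(left + missing) + sse(right)
    cost_right = sse(left) + sse(right + missing)
    if cost_left <= cost_right:
        return "left", cost_left
    return "right", cost_right


def route(value, threshold, default_direction):
    """Send one observation down the split, using the learned default if missing."""
    if value is None:
        return default_direction
    return "left" if value < threshold else "right"


# Hypothetical profit margins (None = no sales yet) and subsequent returns.
margins = [0.30, 0.25, 0.05, 0.02, None, None]
returns = [0.10, 0.12, -0.05, -0.04, -0.06, -0.05]

direction, cost = split_with_missing(margins, returns, threshold=0.15)
print(direction)                     # left
print(route(None, 0.15, direction))  # left
```

Here the companies with missing margins have returns resembling the low-margin group, so the split learns to send missing values left, alongside them.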

Decision trees are particularly well suited to situations where there are relatively few data points to work with. Because the outcome is modeled to be the same for all stocks that belong to the same context, decision trees tend to be more resistant to ‘overfitting’ than more complex algorithms. In other words, a decision tree algorithm tends not to learn the “wrong lessons” we talked about earlier. However, this is a double-edged sword: by the same token, decision trees can’t model differences between stocks that belong to the same context.
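The second edge of that sword is easy to see in code. The hypothetical sketch below fits a single-split tree (a “stump”) at P/E 15 and, for contrast, an ordinary least-squares line. The stump gives every stock in the same set the same prediction, so a stock at P/E 5 and one at P/E 14 are indistinguishable, while the line tells them apart. The function names and data are made up for illustration.

```python
def fit_stump(x, y, threshold):
    """One-split tree: predict the mean outcome of the side a value falls on."""
    left = [yi for xi, yi in zip(x, y) if xi < threshold]
    right = [yi for xi, yi in zip(x, y) if xi >= threshold]
    left_mean = sum(left) / len(left)
    right_mean = sum(right) / len(right)
    return lambda v: left_mean if v < threshold else right_mean


def fit_line(x, y):
    """Ordinary least-squares line: y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


# Hypothetical P/E ratios and subsequent returns.
pe = [5, 8, 11, 14, 20, 30]
returns = [0.09, 0.12, 0.08, 0.11, 0.01, -0.02]

stump = fit_stump(pe, returns, threshold=15)
line = fit_line(pe, returns)
print(stump(5) == stump(14))  # True: same set, same prediction
print(line(5) > line(14))     # True: the line distinguishes them
```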

Finally, decision trees are easier to interpret than more complex algorithms. Modern software packages let us visualize the trees that have been constructed, and through them, humans can at least trace the decision path that led to a particular outcome.

Now, does this mean that decision trees are always the best algorithm for handling investment data? Certainly not. Time series, for instance, may be better handled by some types of neural networks. However, in situations where there are relatively few data points and a high proportion of missing data, decision trees can shine, and in finance, those situations occur frequently.