Can Financial Ratios Predict Next Year's Stock Performance? (Part 1)

Last update on April 24, 2017.

Image Credit: Zapp2Photo /


Machine learning is arguably the hottest topic in technology today. It’s what has enabled us to shoot videos of ourselves spewing rainbows, and it’s what will enable cars to drive themselves in the future.

As I’ve read more about the topic, I’ve begun to wonder if machine learning could be applied to investing. To deepen my knowledge, I’ve spent the last few months reading books on the subject and practicing on data sets.

Fortunately, my educational background has made the topic easier to pick up. Machine learning, at its core, is automated statistics. Understanding it requires a solid foundation in linear algebra, calculus and statistics, all of which I was exposed to during my ten years at university.

After some study, I believe I now understand enough to make machine learning useful. I’ve therefore decided to start a new blog series that examines the application of machine learning to investing. This article is the first in the series, and I hope to publish a new one every month or two.

As a first stab, I decided to investigate whether various financial ratios (e.g. price to earnings) could predict whether a stock would outperform the market over a one-year period.

There are many machine learning models I could have used to investigate the problem. But since this is my first attempt, I decided to use the classic deep neural network (DNN). Let me explain what a DNN is, and why it’s useful.


What Deep Neural Networks Can Achieve That Linear Models Can’t

When researchers investigate the relationship between a potential cause and an outcome, they generally rely on a “linear” statistical model. For example, suppose a medical researcher wants to find a relationship between smoking and lung cancer. To investigate, she would collect data on a large number of smokers and non-smokers, and see whether the smokers tend to develop lung cancer more frequently. There’s a direct relationship between cause (smoking) and effect (cancer), and that’s what a linear model is good at capturing.

As in most other disciplines, finance practitioners have used linear models extensively. For example, the famous Fama-French three-factor model is a linear model that decomposes the expected return of a stock into the risk-free rate of return plus market (beta), size and value components.
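In its standard textbook form, the three-factor model can be written as below. Here R_i is the stock’s return, R_f the risk-free rate, R_M the market return, SMB (“small minus big”) the size factor and HML (“high minus low” book-to-market) the value factor; β_i, s_i and h_i are the stock’s loadings on each factor, estimated by linear regression. Every term enters additively, which is exactly what makes the model linear.

```latex
E(R_i) - R_f = \beta_i \left[ E(R_M) - R_f \right] + s_i \, E(\mathrm{SMB}) + h_i \, E(\mathrm{HML})
```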

Unfortunately, a linear model is not good at capturing more complex relationships between cause and effect. Let me give you an example.

Suppose we want to know whether two magnets placed side by side will attract or repel. If we put a magnet’s positive end next to another magnet’s negative end, we know that they will attract. Therefore, we have two configurations where the magnets will attract: +-, and -+. In other cases (-- and ++), we know the magnets will repel.

Now, let’s suppose we want to create a linear model that tells us whether the magnets will attract or repel. To construct the model, we ask ourselves: given that the first magnet is showing +, will the magnets attract or repel?

Of course, as human beings, we would say that it depends on how the other magnet is oriented. A linear model, however, can’t take that extra piece of information into account. All it can do is observe the direct relationship between the first magnet showing + and whether the magnets attract or repel. Since the magnets would attract half the time (when the other magnet shows -) and repel half the time (when the other magnet shows +), the linear model would conclude that there’s no relationship between the first magnet showing + and the outcome.
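To make the magnet example concrete, here is a minimal Python sketch (Python is my choice; the article itself contains no code). It assumes + is encoded as 1 and - as -1, labels the outcome 1 for attract and 0 for repel, and fits an ordinary least-squares linear model to the four configurations using a small hypothetical helper, `fit_linear`. Both magnet coefficients come out as zero and every prediction is 0.5, which is the “no relationship” conclusion the linear model reaches.

```python
# Each magnet's exposed end: +1 for "+", -1 for "-".
CASES = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]

def attracts(a, b):
    """1.0 if the magnets attract (ends differ), 0.0 if they repel."""
    return 1.0 if a != b else 0.0

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Solved with Gauss-Jordan elimination; assumes a full-rank design matrix.
    """
    n_cols = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_cols)] for i in range(n_cols)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n_cols)]
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(n_cols):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n_cols), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n_cols):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n_cols] / aug[i][i] for i in range(n_cols)]

rows = [[1.0, a, b] for a, b in CASES]  # leading 1.0 is the intercept column
targets = [attracts(a, b) for a, b in CASES]
intercept, w_first, w_second = fit_linear(rows, targets)
print(intercept, w_first, w_second)  # 0.5 0.0 0.0
```

Because the fitted weights on both magnets are zero, the model predicts 0.5 (a coin flip) for every configuration.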

DNNs, by contrast, can solve this problem using what are known as “hidden layers”. You can think of hidden layers as intermediate steps that connect cause and effect. To oversimplify, a hidden layer might encode whether the two magnets are showing the same sign. Because of hidden layers, DNNs can discern relationships between cause and effect that linear models can’t.
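The same four magnet configurations can be handled by a network with a single hidden layer. In this sketch the weights are hand-picked for illustration rather than learned (a real DNN would find its own weights through training, e.g. by gradient descent), and `tiny_network` is a hypothetical name. Its two hidden ReLU units each fire only when the magnets show different signs, which is the kind of intermediate quantity a hidden layer can capture.

```python
def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)

def tiny_network(a, b):
    """One hidden layer with two ReLU units; weights are hand-picked, not trained.

    a, b: +1 or -1 for each magnet's exposed end.
    Returns 1.0 for attract, 0.0 for repel.
    """
    # Hidden layer: each unit is positive only when the signs differ.
    h1 = relu(1.0 * a - 1.0 * b)   # positive only for (+, -)
    h2 = relu(-1.0 * a + 1.0 * b)  # positive only for (-, +)
    # Output layer: a weighted sum of the hidden units.
    return 0.5 * h1 + 0.5 * h2

CASES = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]
for a, b in CASES:
    print(a, b, tiny_network(a, b))
# 1 1 0.0
# 1 -1 1.0
# -1 1 1.0
# -1 -1 0.0
```

Unlike the linear fit, the output now depends on how the two magnets combine, not on either magnet alone.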

I suspect that many investment phenomena require DNNs to explain. For example, history shows that value (i.e. cheap) stocks and momentum stocks tend to outperform the rest of the stock market. However, stocks that exhibit both value and momentum characteristics have not tended to outperform stocks that exhibit only one of the two. This suggests a complex interplay between value and momentum that linear models can’t tease out.

This relationship between value and momentum is something I may investigate in the future. This time around, however, I decided to focus on a different problem: whether financial ratios can predict if a stock will outperform the rest of the stock market.


The Input and Output of DNNs

Financial ratios are metrics that one can calculate using a company’s financial statements. For example, the popular price to earnings (P/E) ratio is the price of a stock today divided by its earnings per share. Stocks with low P/E ratios are considered to be cheap.

There are hundreds of different ratios. Some, such as return on assets (ROA), measure how efficiently a company operates. Others, such as the current ratio, measure a company’s financial stability. Each ratio captures a different aspect of the company, and some investors rely heavily on these ratios to make investment decisions.
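For concreteness, the ratios named in the last two paragraphs can be written as small functions. These are the common textbook definitions; data providers such as Intrinio may use variants (for example, ROA computed on average rather than year-end total assets), and the inputs in the usage lines are made-up numbers.

```python
def price_to_earnings(price, eps):
    """P/E: share price divided by earnings per share; None when EPS is zero."""
    return None if eps == 0 else price / eps

def earnings_yield(price, eps):
    """Earnings yield: EPS divided by price, i.e. the reciprocal of P/E."""
    return eps / price

def return_on_assets(net_income, total_assets):
    """ROA: net income divided by total assets."""
    return net_income / total_assets

def current_ratio(current_assets, current_liabilities):
    """Current ratio: current assets divided by current liabilities."""
    return current_assets / current_liabilities

print(price_to_earnings(50.0, 2.5))   # 20.0
print(earnings_yield(50.0, 2.5))      # 0.05
print(return_on_assets(8.0, 100.0))   # 0.08
print(current_ratio(150.0, 100.0))    # 1.5
print(price_to_earnings(50.0, 0.0))   # None
```

The zero-EPS case returning `None` is a reminder that some ratios simply cannot be computed for some companies.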

I decided to investigate the influence of 82 such financial ratios on stock returns. I didn’t choose these ratios for any carefully thought-out reason; I chose them because they were convenient. I use a data provider called Intrinio to fetch each stock’s fundamental data, and it already calculates these 82 ratios. The list of ratios is as follows:

Dividend yield

Earnings yield



EV to free cash flow

EV to invested capital


EV to operating cash flow

EV to revenue

Altman Z score

Total debt to EBITDA

Long term debt to EBITDA

Long term debt to NOPAT

Net debt to EBITDA

Net debt to NOPAT

EBIT margin

EBITDA margin

Effective tax rate

Gross margin

Interest burden

NOPAT margin

Normalized NOPAT margin

Operating expenses to revenue

Operating margin

Pretax income margin

Profit margin

R&D to revenue

Selling, general & administrative to revenue

Tax burden

Current ratio

Debt-free net working capital to revenue

Debt-free, cash-free net working capital

Net working capital to revenue

Quick ratio

Compound leverage factor

Debt to equity

Leverage ratio

Long term debt to equity

EBIT growth

EBITDA growth

Earnings per share growth

Free cash flow growth

Invested capital growth

Net income growth

NOPAT growth

Operating cash flow growth

Revenue growth

Accounts payable turnover

Accounts receivable turnover

Asset turnover

Cash conversion cycle

Days inventory outstanding

Days payable outstanding

Days sales outstanding

Fixed asset turnover

Inventory turnover

Invested capital turnover

Augmented payout ratio

Cash return on invested capital

Dividend payout ratio

Net nonoperating expense percent

Noncontrolling interest sharing ratio

Operating cash flow to capital expenditures

Operating return on assets

Return on assets

Return on common equity

Return on equity

Return on invested capital

Return on net nonoperating assets

ROIC less NNEP spread

EBIT less capital expenditures to interest expense

EBIT to interest expense

Free cash flow to interest expense

NOPAT less capital expenditures to interest expense

NOPAT to interest expense

Operating cash flow less capital expenditures to interest expense

Operating cash flow to interest expense

Common equity to total capital

Debt to total capital

Long term debt to capital

Noncontrolling interests to total capital

Short term debt to capital


The list of ratios provided by Intrinio is pretty comprehensive. It includes not only ratios favoured by novices (e.g. dividend yield), but also ratios favoured by seasoned professionals (e.g. return on invested capital). Some ratios, such as the EV to EBIT ratio, have been found by academics to predict stock performance. The input to the DNN model consists of these ratios for each stock, for each year going back to 2007.

The output of the model, the measure we’re trying to predict, is each stock’s return in excess of the S&P 500 index’s return over the next 12 months. The S&P 500 measures the performance of the U.S. stock market as a whole. If a stock returned 20% and the S&P 500 returned 15%, then the output is 20% - 15% = 5%.
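The output label can be sketched as the difference between two simple 12-month returns, expressed in percent. The function names and the start/end prices are hypothetical and chosen to reproduce the 20% vs. 15% example.

```python
def pct_return(start_price, end_price):
    """Simple return over the holding period, in percent."""
    return (end_price / start_price - 1.0) * 100.0

def excess_return(stock_start, stock_end, index_start, index_end):
    """Stock return minus S&P 500 return over the same window, in percentage points."""
    return pct_return(stock_start, stock_end) - pct_return(index_start, index_end)

# Stock: $100 -> $120 (20%); index: 2000 -> 2300 (15%).
print(round(excess_return(100.0, 120.0, 2000.0, 2300.0), 6))  # 5.0
```

The rounding only hides floating-point noise; a positive label means the stock beat the index, a negative one means it lagged.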

I used Yahoo Finance for the stock performance data, which made it easy to account for dividends and stock splits. I also had to be careful about choosing the timeframe over which to measure performance.
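Split- and dividend-adjusted prices do this bookkeeping automatically; the sketch below shows by hand why it matters. It assumes one 2-for-1 split, a dividend paid on the post-split shares and held as cash (not reinvested), and made-up prices; `total_return` is a hypothetical helper.

```python
def total_return(buy_price, end_price, split_ratio, dividend_per_share):
    """Return on one share bought at buy_price and held through one split.

    split_ratio: shares received per original share (2.0 for a 2-for-1 split).
    dividend_per_share: cash paid per post-split share, held as cash.
    """
    shares = 1.0 * split_ratio
    ending_value = shares * end_price + shares * dividend_per_share
    return ending_value / buy_price - 1.0

# Naive price change: $100 -> $55 looks like a 45% loss.
# With the split and dividend: 2 * $55 + 2 * $1 = $112, a 12% gain.
print(round(total_return(100.0, 55.0, 2.0, 1.0), 6))  # 0.12
```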

Companies release their year-end financial statements up to 3 months after the end of the fiscal year. If I had measured stock performance starting from the end of the fiscal year, my model would have been trained as if it had foreknowledge of the numbers the company would later report.

For example, suppose a company’s fiscal year ended in Dec 2014. The company probably wouldn’t have released its financial statements until sometime in Feb 2015. But if I had trained my model on the stock performance from Jan 2015 to Jan 2016 based on financial statements up to Dec 2014, I would have been training it on knowledge that didn’t exist in Jan 2015. To get around this problem, I measured stock performance starting 3 months after the end of the fiscal year (e.g. from Mar 2015 to Mar 2016).
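The timing rule can be expressed as a small date calculation. This sketch assumes the 3-month lag applies uniformly to every company and that the window is exactly 12 months; `add_months` and `performance_window` are hypothetical helpers, and clamping the day of the month (so Nov 30 plus 3 months gives Feb 28) is my own choice.

```python
import calendar
from datetime import date

LAG_MONTHS = 3  # assumed reporting lag after fiscal year end

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def performance_window(fiscal_year_end):
    """Start measuring LAG_MONTHS after fiscal year end, then hold for 12 months."""
    start = add_months(fiscal_year_end, LAG_MONTHS)
    end = add_months(start, 12)
    return start, end

print(performance_window(date(2014, 12, 31)))
# (datetime.date(2015, 3, 31), datetime.date(2016, 3, 31))
```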

Once I had gathered the input and output data, I preprocessed it. In this step, I added a new metric for each financial ratio that flagged whether that ratio was missing, which doubled the number of metrics to 164. Some ratios can’t be calculated for mathematical reasons; for example, the EV to EBIT ratio is undefined when EBIT is 0. Other ratios were absent because of data issues: while I think Intrinio’s data is good, it’s not perfect.

After this, I filled each missing financial ratio with the median value of that ratio across the rest of the stocks. For example, if company XYZ’s profit margin was missing and the median profit margin of the other stocks was 10%, I replaced XYZ’s missing profit margin with 10%.

I had to preprocess the data this way because the DNN can’t process inputs with missing values. At the same time, I didn’t want to lose the information carried by the fact that some ratios were missing, which is why I created the new metrics. This way, if stocks that are missing certain financial ratios share some common behaviour, the DNN should be able to learn it.
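The two preprocessing steps, missing-value flags followed by median filling, can be sketched in plain Python. The function name and the one-dict-per-stock layout are assumptions; the median is taken over all stocks passed in (the text doesn’t say whether medians were computed per year), and falling back to 0.0 when a ratio is missing for every stock is my own choice.

```python
from statistics import median

def add_missing_flags_and_impute(rows, ratio_names):
    """rows: list of dicts mapping ratio name -> float, or None when missing.

    For every ratio, adds a 0/1 '<name>_missing' flag, then fills the gap
    with the median of that ratio's non-missing values across all rows.
    """
    medians = {}
    for name in ratio_names:
        present = [r[name] for r in rows if r[name] is not None]
        medians[name] = median(present) if present else 0.0  # assumed fallback
    filled = []
    for r in rows:
        new = {}
        for name in ratio_names:
            missing = r[name] is None
            new[name] = medians[name] if missing else r[name]
            new[name + "_missing"] = 1.0 if missing else 0.0
        filled.append(new)
    return filled

stocks = [
    {"profit_margin": 0.08},
    {"profit_margin": 0.10},
    {"profit_margin": 0.12},
    {"profit_margin": None},  # company XYZ
]
filled = add_missing_flags_and_impute(stocks, ["profit_margin"])
print(filled[3])  # {'profit_margin': 0.1, 'profit_margin_missing': 1.0}
```

With 82 ratios, this produces the 164 metrics described: each original ratio plus its flag.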

After preprocessing, I was left with roughly 25,000 input/output pairs, which I fed through various DNN models for training. There are countless ways to configure a DNN, from the number of hidden layers to the choice of training algorithm. I tried many different combinations of these configurations to find the best fit.
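To give a sense of how quickly the options multiply, here is a sketch that enumerates a small search grid. The hidden-layer layouts, learning rates and optimizer names are illustrative assumptions; the text doesn’t say which settings were tried. Each configuration would then be trained and scored separately.

```python
from itertools import product

# Hypothetical search space, not the settings actually used.
HIDDEN_LAYER_SIZES = [(64,), (128, 64), (256, 128, 64)]
LEARNING_RATES = [0.01, 0.001]
OPTIMIZERS = ["sgd", "adam"]

def configurations():
    """Yield every combination of the settings above as a dict."""
    for layers, lr, opt in product(HIDDEN_LAYER_SIZES, LEARNING_RATES, OPTIMIZERS):
        yield {"hidden_layers": layers, "learning_rate": lr, "optimizer": opt}

configs = list(configurations())
print(len(configs))  # 12
print(configs[0])    # {'hidden_layers': (64,), 'learning_rate': 0.01, 'optimizer': 'sgd'}
```

Adding just one more option to each list would raise the count from 12 to 36, which is why configuration search quickly becomes the expensive part of the work.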

Training a model involves the following steps. First, we split the data into ‘train’ and ‘test’ data sets so that we can detect whether we are “overfitting” the model (I’ll explain overfitting later). Then, we feed the ‘train’ data into the model so that it learns relationships between the inputs and outputs within the ‘train’ data set.
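A minimal sketch of the split step, assuming a random 80/20 split with a fixed seed (the article doesn’t give the proportions). `train_test_split` here is a hypothetical plain-Python helper, not scikit-learn’s function of the same name, and the stand-in pairs are placeholders rather than real data.

```python
import random

def train_test_split(pairs, test_fraction=0.2, seed=42):
    """Shuffle (input, output) pairs and hold out test_fraction of them for testing."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

pairs = [(i, i * 0.1) for i in range(25000)]  # placeholders for ~25,000 pairs
train, test = train_test_split(pairs)
print(len(train), len(test))  # 20000 5000
```

Because the data span several years, splitting by year (earlier years for training, later ones for testing) is a common alternative to a random shuffle; the text doesn’t say which approach was used.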

Once training is done, we plot the actual stock performance against the performance predicted by the trained model, for both the ‘train’ and ‘test’ data sets.
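As a numeric companion to those plots, one could summarize each scatter with the correlation between actual and predicted values, computed separately for the ‘train’ and ‘test’ sets; a much weaker correlation on ‘test’ than on ‘train’ is one symptom of overfitting. This summary statistic is my assumption, not necessarily what was used here, and the numbers below are toy values, not results.

```python
from math import sqrt

def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted excess returns."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)

# Toy numbers for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(actual, predicted))  # 1.0
```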

Before I began training any models, I expected to find a weak relationship between financial ratios and stock performance. But is there such a relationship? I will discuss the results from my DNN models in Part 2 next week.
