The Dangers of Linear Correlation


Correlations are everywhere. Correlation is one of the key constructs of statistics, modelling and simulation. It is used to design portfolios, estimate risks, perform Value at Risk (VaR) analysis, compute probabilities of default, and so on. A correlation expresses how strongly two variables are interdependent. It is therefore of paramount importance to measure correlations correctly.

Instead of computing correlations with traditional variance-based approaches, we use a proprietary entropy-based measure of correlation, called generalized correlation. Unlike variance, which only measures how data concentrate around the mean, the entropy-based generalized correlation takes into account the actual distribution of the data over the entire domain they span. This allows us to provide a more realistic and correct measure of correlation.
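The generalized correlation itself is proprietary and is not described here, so what follows is only a rough sketch of the general idea behind an entropy-based dependence measure, not the method used in this post. It assumes a plain histogram estimate of mutual information and, as a swapped-in normalization, Linfoot's informational coefficient sqrt(1 - exp(-2 I)). The function names, the bin count and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Plug-in (histogram) estimate of mutual information, in nats."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()              # joint cell probabilities
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    independent = px @ py                    # joint distribution under independence
    mask = pxy > 0                           # empty cells contribute nothing
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / independent[mask])))


def informational_correlation(x, y, bins=16):
    """Map mutual information to [0, 1] with Linfoot's coefficient (a stand-in)."""
    mi = max(mutual_information(x, y, bins), 0.0)  # guard against rounding below zero
    return float(np.sqrt(1.0 - np.exp(-2.0 * mi)))


# Synthetic data (an assumption, not the stocks discussed in this post).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 5000)
y_dependent = x ** 2 + rng.normal(0.0, 0.05, 5000)   # strong but non-linear link
y_unrelated = rng.uniform(-1.0, 1.0, 5000)           # no link at all

score_dependent = informational_correlation(x, y_dependent)
score_unrelated = informational_correlation(x, y_unrelated)
print("entropy-based score, dependent pair:", round(score_dependent, 3))
print("entropy-based score, unrelated pair:", round(score_unrelated, 3))
print("Pearson, dependent pair:", round(float(np.corrcoef(x, y_dependent)[0, 1]), 3))
```

A histogram estimate of this kind is biased upward on finite samples, so the score for the unrelated pair will not be exactly zero. It is best read relative to such a baseline rather than as an absolute figure.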

A popular and conventional measure is Pearson’s linear correlation. Pearson’s correlation works properly when the relationship between the variables is linear in character. When the data include concentrations, clustering, bifurcations or other forms of discontinuity, applying linear correlation is outright wrong. The results of a linear correlation analysis may in fact induce unjustified optimism and significantly distort any risk-type calculation.
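A minimal sketch of this failure mode, using made-up data rather than real prices: Pearson’s coefficient is computed directly from its definition for a perfectly linear relation and for a perfectly deterministic but symmetric non-linear one. The helper name `pearson` and the sample points are assumptions chosen for illustration.

```python
import numpy as np


def pearson(x, y):
    """Pearson linear correlation coefficient of two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                        # centre both samples
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


x = np.linspace(-1.0, 1.0, 201)              # points symmetric around zero
y_linear = 2.0 * x + 1.0                     # y is a linear function of x
y_parabola = x ** 2                          # y is fully determined by x, but not linearly

r_linear = pearson(x, y_linear)
r_parabola = pearson(x, y_parabola)
print("Pearson, linear relation:   ", round(r_linear, 6))
print("Pearson, parabolic relation:", round(r_parabola, 6))
```

In the second case y is completely determined by x, yet the linear coefficient is essentially zero because the positive and negative halves of the parabola cancel out.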

The surprising fact is that this shortcoming of linear correlations is widely known and yet neglected by the mainstream of fund managers and analysts. Traditional models used to compute risk or the degree of asset diversification in investment portfolios can easily be shown to be incorrect. Two simple examples are illustrated below.



Stock 1 vs. Stock 2: generalized correlation is 76%, while linear is 89%.


Stock 3 vs. Stock 4: generalized correlation is 63%, while linear is 24%. The plot shows a strongly non-linear situation, in which linear correlation is not applicable: it would suggest that the variables are nearly independent. The difference is 39 percentage points!


Even in moderate-size investment portfolios there are thousands of interdependencies. If these are analyzed using conventional correlations to estimate exposure, expected returns or degree of diversification, there is a concrete possibility that the analyses are simply wrong. Imagine what a 10% error in correlation can do in a covariance-matrix-based portfolio design.
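To get a feel for the size of the effect, here is a small sketch for a hypothetical two-asset portfolio. Everything in it is an assumption made for illustration: the volatilities (20% and 30%), the equal weights, and the reading of a "10% error" as a shift in correlation from 0.5 to 0.6. It uses the standard portfolio variance w'Σw and the textbook minimum-variance weights, which are proportional to Σ⁻¹1.

```python
import numpy as np


def covariance(vols, rho):
    """2x2 covariance matrix from two volatilities and one correlation."""
    s = np.asarray(vols, dtype=float)
    corr = np.array([[1.0, rho], [rho, 1.0]])
    return corr * np.outer(s, s)


def portfolio_volatility(weights, vols, rho):
    """Volatility sqrt(w' C w) of a two-asset portfolio."""
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(w @ covariance(vols, rho) @ w))


def min_variance_weights(vols, rho):
    """Fully invested minimum-variance weights, proportional to C^-1 1."""
    raw = np.linalg.solve(covariance(vols, rho), np.ones(2))
    return raw / raw.sum()


vols = [0.20, 0.30]        # hypothetical annual volatilities
weights = [0.5, 0.5]       # hypothetical equal-weight portfolio

vol_assumed = portfolio_volatility(weights, vols, rho=0.5)
vol_shifted = portfolio_volatility(weights, vols, rho=0.6)
w_assumed = min_variance_weights(vols, rho=0.5)
w_shifted = min_variance_weights(vols, rho=0.6)

print("equal-weight volatility, rho=0.5:", round(vol_assumed, 4))
print("equal-weight volatility, rho=0.6:", round(vol_shifted, 4))
print("min-variance weights, rho=0.5:", np.round(w_assumed, 3))
print("min-variance weights, rho=0.6:", np.round(w_shifted, 3))
```

Under these assumptions the volatility of the fixed portfolio moves only modestly, but the minimum-variance allocation to the second asset is roughly halved. Optimized weights tend to be far more sensitive to correlation errors than the risk figure itself.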


“The difficulty lies not so much in developing new ideas as in escaping from old ones” – John Maynard Keynes

