Analyzing Daily Returns of Equities

This tutorial will look at specific equities and analyze their performance and answer few questions like

  1. What is the value at risk (VAR) ?
  2. What is the probability of making a profit of certain percentage if invested in specific stocks?
  3. What is the probability of making a loss of certain percentage if invested in specific stocks?
  4. Volatility of stocks?
  5. Correlation between returns of different stocks?

For this purpose we will analyze the following stocks

  • Unitech
  • BEML
  • INFY
  • GLAXO
In [1]:
import pandas as pd
import numpy as np

Load the datasets

In [2]:
unitech_df = pd.read_csv( 'UNITECH.csv' )
beml_df = pd.read_csv( 'BEML.csv' )
infy_df = pd.read_csv( 'INFY.csv' )
glaxo_df = pd.read_csv( 'GLAXO.csv' )
In [3]:
unitech_df.head( 5 )
Out[3]:
Date Open High Low Last Close Total Trade Quantity Turnover (Lacs)
0 2010-01-04 82.9 83.80 82.45 82.80 82.80 19904383.0 16526.18
1 2010-01-05 83.5 84.90 83.40 84.00 84.10 34197626.0 28820.01
2 2010-01-06 84.5 85.45 83.20 84.70 84.85 34704003.0 29360.29
3 2010-01-07 85.2 85.70 83.85 84.15 84.15 28855332.0 24461.35
4 2010-01-08 84.0 87.50 84.00 86.60 86.60 49805151.0 43006.54
In [4]:
beml_df.head( 5 )
Out[4]:
Date Open High Low Last Close Total Trade Quantity Turnover (Lacs)
0 2010-01-04 1121.0 1151.00 1121.00 1134.0 1135.60 101651.0 1157.18
1 2010-01-05 1146.8 1149.00 1128.75 1135.0 1134.60 59504.0 676.47
2 2010-01-06 1140.0 1164.25 1130.05 1137.0 1139.60 128908.0 1482.84
3 2010-01-07 1142.0 1159.40 1119.20 1141.0 1144.15 117871.0 1352.98
4 2010-01-08 1156.0 1172.00 1140.00 1141.2 1144.05 170063.0 1971.42
In [5]:
unitech_df["gain"] = unitech_df.apply( lambda rec: round( ( rec.Close - rec.Open ) / rec.Open, 2 ), axis = 1 )
In [6]:
unitech_df.head( 5 )
Out[6]:
Date Open High Low Last Close Total Trade Quantity Turnover (Lacs) gain
0 2010-01-04 82.9 83.80 82.45 82.80 82.80 19904383.0 16526.18 -0.00
1 2010-01-05 83.5 84.90 83.40 84.00 84.10 34197626.0 28820.01 0.01
2 2010-01-06 84.5 85.45 83.20 84.70 84.85 34704003.0 29360.29 0.00
3 2010-01-07 85.2 85.70 83.85 84.15 84.15 28855332.0 24461.35 -0.01
4 2010-01-08 84.0 87.50 84.00 86.60 86.60 49805151.0 43006.54 0.03

Calculate gains

In [7]:
beml_df["gain"] = beml_df.apply( lambda rec: round( ( rec.Close - rec.Open ) / rec.Open, 2 ), axis = 1 )
infy_df["gain"] = infy_df.apply( lambda rec: round( ( rec.Close - rec.Open ) / rec.Open, 2 ), axis = 1 )
glaxo_df["gain"] = glaxo_df.apply( lambda rec: round( ( rec.Close - rec.Open ) / rec.Open, 2 ), axis = 1 )
In [8]:
import matplotlib.pyplot as plt
import seaborn as sn
%matplotlib inline

Unitech

In [9]:
plt.plot( unitech_df.Close, 'r' )
Out[9]:
[<matplotlib.lines.Line2D at 0x119437518>]
In [10]:
unitech_df = unitech_df.set_index(pd.DatetimeIndex(unitech_df['Date']) )
In [12]:
unitech_df.head(5)
Out[12]:
Date Open High Low Last Close Total Trade Quantity Turnover (Lacs) gain
2010-01-04 2010-01-04 82.9 83.80 82.45 82.80 82.80 19904383.0 16526.18 -0.00
2010-01-05 2010-01-05 83.5 84.90 83.40 84.00 84.10 34197626.0 28820.01 0.01
2010-01-06 2010-01-06 84.5 85.45 83.20 84.70 84.85 34704003.0 29360.29 0.00
2010-01-07 2010-01-07 85.2 85.70 83.85 84.15 84.15 28855332.0 24461.35 -0.01
2010-01-08 2010-01-08 84.0 87.50 84.00 86.60 86.60 49805151.0 43006.54 0.03
In [13]:
plt.plot( unitech_df.Close, 'r' )
Out[13]:
[<matplotlib.lines.Line2D at 0x1144eed68>]

Plot distribution of daily returns

In [14]:
sn.distplot( unitech_df.gain, color = 'b' )
/Users/manaranjan/anaconda/lib/python3.5/site-packages/statsmodels/nonparametric/kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x11450c7b8>

Calculate summary statisitcs : Mean, Standard Deviation and Confidence Intervals

In [15]:
unitech_df.gain.describe()
Out[15]:
count    1739.000000
mean       -0.003191
std         0.038511
min        -0.360000
25%        -0.020000
50%         0.000000
75%         0.020000
max         0.220000
Name: gain, dtype: float64
In [16]:
unitech_df.gain.mean()
Out[16]:
-0.0031914893617021197
In [17]:
unitech_df.gain.std()
Out[17]:
0.038510734339961376
In [18]:
sn.boxplot( unitech_df.gain )
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x119a96240>
In [19]:
from scipy import stats
In [20]:
unitech_95_ci = stats.norm.interval( 0.95,
                loc=unitech_df.gain.mean(),
                scale=unitech_df.gain.std())
In [21]:
unitech_95_ci
Out[21]:
(-0.078671141686216306, 0.072288162962812069)

VAR - Value At Risk

  • What is the value at risk if invested one lakh?
In [22]:
invest_amt = 100000
In [23]:
invest_amt
Out[23]:
100000
In [24]:
unitech_var = invest_amt * unitech_95_ci[0]
In [25]:
unitech_var
Out[25]:
-7867.1141686216306

Note: Value at risk for 95% CI is 7867.11 rupees, if invested one lakh rupees for day trading in Unitech.

GLAXO

In [27]:
glaxo_df = glaxo_df.set_index(pd.DatetimeIndex(glaxo_df['Date']) )
plt.plot( glaxo_df.Close, 'r' )
Out[27]:
[<matplotlib.lines.Line2D at 0x119e71b70>]

Distribution of daily returns

In [28]:
sn.distplot( glaxo_df.gain, color = 'r' )
/Users/manaranjan/anaconda/lib/python3.5/site-packages/statsmodels/nonparametric/kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x119e6c8d0>

Summary Statistics

In [29]:
glaxo_df.gain.describe()
Out[29]:
count    1739.000000
mean        0.000127
std         0.014491
min        -0.060000
25%        -0.010000
50%         0.000000
75%         0.010000
max         0.090000
Name: gain, dtype: float64
In [30]:
glaxo_df.gain.mean()
Out[30]:
0.00012650948821161574
In [31]:
glaxo_df.gain.std()
Out[31]:
0.014491221260494113
In [32]:
sn.boxplot( glaxo_df.gain )
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x11a0c77b8>
In [33]:
glaxo_95_ci = stats.norm.interval( 0.95,
                loc=glaxo_df.gain.mean(),
                scale=glaxo_df.gain.std())
In [34]:
glaxo_95_ci
Out[34]:
(-0.02827576227435797, 0.028528781250781204)

VAR - Value At Risk

  • What is the value at risk if invested one lakh?
In [35]:
glaxo_var = invest_amt * glaxo_95_ci[0]
In [36]:
glaxo_var
Out[36]:
-2827.5762274357971

Note: Value at risk for 95% CI is 2827.57 rupees, if invested one lakh rupees for day trading in GLAXO.

BEML

In [38]:
beml_df = beml_df.set_index(pd.DatetimeIndex(beml_df['Date']) )
plt.plot( beml_df.Close, 'r' )
Out[38]:
[<matplotlib.lines.Line2D at 0x11a5f7a58>]

Distribution of daily returns

In [39]:
sn.distplot( beml_df.gain, color = 'r' )
/Users/manaranjan/anaconda/lib/python3.5/site-packages/statsmodels/nonparametric/kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x11a5faa58>

Summary Statistics

In [40]:
beml_df.gain.describe()
Out[40]:
count    1739.000000
mean       -0.002024
std         0.026170
min        -0.130000
25%        -0.020000
50%         0.000000
75%         0.010000
max         0.190000
Name: gain, dtype: float64
In [41]:
beml_df.gain.mean()
Out[41]:
-0.002024151811385843
In [42]:
beml_df.gain.std()
Out[42]:
0.02616970139990387
In [43]:
sn.boxplot( beml_df.gain )
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x11aabd748>
In [44]:
beml_95_ci = stats.norm.interval( 0.95,
                loc=beml_df.gain.mean(),
                scale=beml_df.gain.std())
In [45]:
beml_95_ci
Out[45]:
(-0.053315824041364861, 0.049267520418593176)

VAR - Value At Risk

  • What is the value at risk if invested one lakh?
In [46]:
beml_var = invest_amt * beml_95_ci[0]
In [47]:
beml_var
Out[47]:
-5331.5824041364858

Note: Value at risk for 95% CI is 5331.58 rupees, if invested one lakh rupees for day trading in BEML.

Comparing the distribution of daily returns for all stocks

In [48]:
sn.set(rc={"figure.figsize": (10, 6)});
sn.distplot( unitech_df.gain, hist = False, color = 'g' )
sn.distplot( glaxo_df.gain, hist = False, color = 'b' )
sn.distplot( infy_df.gain, hist = False, color = 'r' )
sn.distplot( beml_df.gain, hist = False, color = 'y' )
/Users/manaranjan/anaconda/lib/python3.5/site-packages/statsmodels/nonparametric/kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[48]:
<matplotlib.axes._subplots.AxesSubplot at 0x11aa122e8>

Note:

  • Unitech stocks have highest variance compared to other stocks, which implies it has highest volatility.
  • Infosys Infosys and Glaxo have very similary behavior as far as daily returns are concerned

What is the probability that we will make a loss of 2% or more?

Unitech

In [57]:
stats.norm.cdf( -0.02,
             loc=unitech_df.gain.mean(),
             scale=unitech_df.gain.std())
Out[57]:
0.33125041931711174

INFY

In [58]:
stats.norm.cdf( -0.02,
             loc=infy_df.gain.mean(),
             scale=infy_df.gain.std())
Out[58]:
0.085747369808852225

GLAXO

In [59]:
stats.norm.cdf( -0.02,
             loc=glaxo_df.gain.mean(),
             scale=glaxo_df.gain.std())
Out[59]:
0.0824352291231088

BEML

In [60]:
stats.norm.cdf( -0.02,
             loc=beml_df.gain.mean(),
             scale=beml_df.gain.std())
Out[60]:
0.24607431479217257

Note:

  • Unitech has 33% probability ( almost one third probability ) of making a loss of 2%.

What is the probability that we will make a gain of 5% or more?

Unitech

In [61]:
1 - stats.norm.cdf( 0.05,
             loc=unitech_df.gain.mean(),
             scale=unitech_df.gain.std())
Out[61]:
0.083606889246355176

GLAXO

In [62]:
1 - stats.norm.cdf( 0.05,
             loc=glaxo_df.gain.mean(),
             scale=glaxo_df.gain.std())
Out[62]:
0.00028910521329439387

INFY

In [63]:
1 - stats.norm.cdf( 0.05,
             loc=infy_df.gain.mean(),
             scale=infy_df.gain.std())
Out[63]:
0.00028367972889942195

BEML

In [64]:
1 - stats.norm.cdf( 0.05,
             loc=beml_df.gain.mean(),
             scale=beml_df.gain.std())
Out[64]:
0.023408411581174793

Note:

  • Unitech has the highest probability of 8%, where as INFY and GLAXO has abosolutely no chance of making a profit o 5% in daily returns.

Correlation Plots

In [65]:
infy_unitech_df = infy_df[['Date', 'gain']].merge(
  unitech_df[['Date', 'gain']],
  on = ['Date'],
  suffixes=('_infy', '_unitech'), how = 'inner' )
In [66]:
infy_unitech_df.head()
Out[66]:
Date gain_infy gain_unitech
0 2011-06-29 0.00 0.00
1 2011-06-30 0.01 0.03
2 2011-07-01 -0.00 0.01
3 2011-07-04 -0.01 0.02
4 2011-07-05 -0.00 -0.02
In [67]:
infy_unitech_df.corr()
Out[67]:
gain_infy gain_unitech
gain_infy 1.00000 0.05476
gain_unitech 0.05476 1.00000
In [68]:
infy_unitech_beml_df = infy_unitech_df.merge(
  beml_df[['Date', 'gain']],
  on = ['Date'], how = 'inner' )
In [69]:
infy_unitech_beml_df[0:5]
Out[69]:
Date gain_infy gain_unitech gain
0 2011-06-29 0.00 0.00 0.03
1 2011-06-30 0.01 0.03 -0.03
2 2011-07-01 -0.00 0.01 0.02
3 2011-07-04 -0.01 0.02 0.00
4 2011-07-05 -0.00 -0.02 0.03
In [70]:
all_df = infy_unitech_beml_df.merge(
  glaxo_df[['Date', 'gain']], suffixes=('_beml', '_glaxo'),
  on = ['Date'], how = 'inner' )
In [71]:
all_df[0:5]
Out[71]:
Date gain_infy gain_unitech gain_beml gain_glaxo
0 2011-06-29 0.00 0.00 0.03 0.02
1 2011-06-30 0.01 0.03 -0.03 0.00
2 2011-07-01 -0.00 0.01 0.02 0.01
3 2011-07-04 -0.01 0.02 0.00 -0.00
4 2011-07-05 -0.00 -0.02 0.03 0.00
In [72]:
all_df.corr()
Out[72]:
gain_infy gain_unitech gain_beml gain_glaxo
gain_infy 1.000000 0.054760 0.035622 0.053890
gain_unitech 0.054760 1.000000 0.267022 0.111034
gain_beml 0.035622 0.267022 1.000000 0.088627
gain_glaxo 0.053890 0.111034 0.088627 1.000000
In [73]:
sn.set(rc={"figure.figsize": (6, 4)});
sn.heatmap( all_df.corr() )
Out[73]:
<matplotlib.axes._subplots.AxesSubplot at 0x11afaea20>

Note:

  • BEML and Unitech are highly correlated when compared to other stocks. That means there is a higher chance that if BEML stocks gain, the unitech stocks will also gain. Similiarily if one stock looses there is a higher chance that the other stock will also loose.