Overview

  • This data is taken from moneycontrol.com and covers the returns of mutual funds that invest primarily in large caps. It contains returns for 1 year, 2 years, 3 years and 5 years.
  • We will do some basic statistical analysis to understand how to use the scipy APIs provided for these analyses.
  • Some of the questions we will answer while analyzing this data are:
    • What is the probability of zero or negative returns when remaining invested for 1 year in large cap MFs?
    • What is the probability that returns will be more than 6% (the interest rate for fixed deposits)?
    • What is the range of returns for a 95% confidence interval?
    • What is the probability of zero or negative returns when remaining invested for 3 or 5 years?
    • Are the average returns of investing for 3 years and 5 years different?
    • Are returns on large caps higher than returns on fixed deposits? Assume the average return for fixed deposits is 8%.
    • What is the difference in average returns between large cap funds with AUM less than 500 crores and more than 500 crores?
    • At a 95% confidence interval, what are the returns for large caps with AUM > 500 crores and AUM < 500 crores?
  • Basic statistical functions, such as the cumulative distribution function, confidence intervals, and one-sample and two-sample t-tests, are provided by the scipy.stats package.
In [2]:
import pandas as pd
import numpy as np
In [3]:
mf_large = pd.read_csv( 'mf-large.csv' )
In [4]:
mf_large.head( 10 )
Out[4]:
mfname aum ret_1yr ret_2yrs ret_3yrs ret_5yrs
0 LIC NOMURA Index Sen Adv-Direct (G) 0.09 15.5 25.8 -- --
1 LIC NOMURA Index - Sensex Adv (G) 1.46 14.8 25.1 18.6 10
2 Indiabulls Blue Chip Fund - Dir (G) 10.89 11.6 11.5 13.9 --
3 Escorts Leading Sectors -Direct (G) 0.52 11 26.2 27.5 --
4 Escorts Leading Sectors (G) 2.24 10.9 25.9 27.1 17
5 SBI Blue Chip Fund - Direct (G) 1256.49 10 19.4 22.5 --
6 Indiabulls Blue Chip Fund (G) 397.05 9.4 9.6 12.2 --
7 Escorts Growth Plan - Direct (G) 0.41 8.7 17.5 25.9 --
8 Escorts Growth Plan (G) 6.12 8.7 17.3 25.5 12.9
9 SBI Blue Chip Fund (G) 5981.62 8.7 18.3 21.5 16.2

Analyzing the large caps

Basic Statistics about large caps

In [5]:
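# "--" marks a missing return. convert_objects() has been removed from pandas,
# so coerce every column except the fund name with pd.to_numeric ("--" becomes NaN).
num_cols = mf_large.columns.drop( 'mfname' )
mf_large[num_cols] = mf_large[num_cols].apply( pd.to_numeric, errors='coerce' )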
In [6]:
mf_large.head( 10 )
Out[6]:
mfname aum ret_1yr ret_2yrs ret_3yrs ret_5yrs
0 LIC NOMURA Index Sen Adv-Direct (G) 0.09 15.5 25.8 NaN NaN
1 LIC NOMURA Index - Sensex Adv (G) 1.46 14.8 25.1 18.6 10.0
2 Indiabulls Blue Chip Fund - Dir (G) 10.89 11.6 11.5 13.9 NaN
3 Escorts Leading Sectors -Direct (G) 0.52 11.0 26.2 27.5 NaN
4 Escorts Leading Sectors (G) 2.24 10.9 25.9 27.1 17.0
5 SBI Blue Chip Fund - Direct (G) 1256.49 10.0 19.4 22.5 NaN
6 Indiabulls Blue Chip Fund (G) 397.05 9.4 9.6 12.2 NaN
7 Escorts Growth Plan - Direct (G) 0.41 8.7 17.5 25.9 NaN
8 Escorts Growth Plan (G) 6.12 8.7 17.3 25.5 12.9
9 SBI Blue Chip Fund (G) 5981.62 8.7 18.3 21.5 16.2
In [7]:
mf_large.describe()
Out[7]:
aum ret_1yr ret_2yrs ret_3yrs ret_5yrs
count 132.000000 118.000000 116.000000 115.000000 61.000000
mean 786.754773 2.326271 9.831897 15.845217 10.349180
std 2185.711721 4.449670 4.963214 3.858489 2.652774
min 0.000000 -16.700000 -0.900000 7.400000 4.400000
25% 3.487500 0.325000 6.800000 13.400000 8.500000
50% 73.090000 2.350000 9.500000 15.900000 10.500000
75% 422.302500 4.100000 12.025000 17.650000 11.600000
max 18402.000000 15.500000 26.200000 27.500000 17.000000
In [8]:
import matplotlib.pyplot as plt
import seaborn as sn
%matplotlib inline
In [9]:
plt.figure( figsize = ( 10, 8 ) )
sn.distplot( mf_large['ret_1yr'].dropna() )
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x91db978>
In [12]:
plt.figure( figsize = ( 10, 8 ) )
sn.distplot( mf_large['ret_3yrs'].dropna() )
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x93d9278>
In [108]:
plt.figure( figsize = ( 10, 8 ) )
sn.distplot( mf_large['ret_1yr'].dropna(), color = 'b' )
sn.distplot( mf_large['ret_3yrs'].dropna(), color = 'r' )
sn.distplot( mf_large['ret_5yrs'].dropna(), color = 'g' )
Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0xab09e80>

Exploration

  • What is the probability of making a loss on a 1-year investment in large cap MFs?
In [86]:
sn.distplot( mf_large['ret_1yr'].dropna() )
Out[86]:
<matplotlib.axes._subplots.AxesSubplot at 0xa776ba8>
In [96]:
import scipy.stats as stats

Summary Statistics of all Columns

In [91]:
lcap_stats = mf_large.describe()
In [93]:
lcap_stats
Out[93]:
aum ret_1yr ret_2yrs ret_3yrs ret_5yrs
count 132.000000 118.000000 116.000000 115.000000 61.000000
mean 786.754773 2.326271 9.831897 15.845217 10.349180
std 2185.711721 4.449670 4.963214 3.858489 2.652774
min 0.000000 -16.700000 -0.900000 7.400000 4.400000
25% 3.487500 0.325000 6.800000 13.400000 8.500000
50% 73.090000 2.350000 9.500000 15.900000 10.500000
75% 422.302500 4.100000 12.025000 17.650000 11.600000
max 18402.000000 15.500000 26.200000 27.500000 17.000000
In [99]:
## How to access a particular value in the above DataFrame (column first, then row label)
lcap_stats['ret_1yr']['mean']
Out[99]:
2.3262711864406778

What is the probability of zero or negative returns when remaining invested for 1 year in large cap MFs?

In [101]:
stats.norm.cdf( 0,
             loc=lcap_stats['ret_1yr']['mean'],
             scale=lcap_stats['ret_1yr']['std'] )
Out[101]:
0.30055798098625741

What is the probability that returns will be more than 6% (the interest rate for fixed deposits)?

In [102]:
1 - stats.norm.cdf( 6,
                 loc=lcap_stats['ret_1yr']['mean'],
                 scale=lcap_stats['ret_1yr']['std'] )
Out[102]:
0.20451031969712574
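
The complement `1 - cdf(x)` is also available directly as the survival function `stats.norm.sf`, which avoids the subtraction and tends to be more accurate far out in the tail. A minimal sketch, plugging in the 1-year mean and standard deviation from the summary table above instead of reading the CSV (the variable names are illustrative):

```python
import scipy.stats as stats

# 1-year mean and standard deviation, copied from the describe() output above
mean_1yr = 2.326271
std_1yr = 4.449670

# P(return > 6%) under the normal approximation, via the survival function
p_above_6 = stats.norm.sf(6, loc=mean_1yr, scale=std_1yr)

# The same quantity written as the complement of the CDF
p_complement = 1 - stats.norm.cdf(6, loc=mean_1yr, scale=std_1yr)

print(p_above_6, p_complement)
```

Both expressions should agree with the value computed in the cell above, up to rounding of the inputs.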

What is the range of returns for a 95% confidence interval?

In [104]:
stats.norm.interval( 0.95,
                  loc=lcap_stats['ret_1yr']['mean'],
                  scale=lcap_stats['ret_1yr']['std'] )
Out[104]:
(-6.3949211459713364, 11.047463518852693)
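
Note that `stats.norm.interval` with `scale` set to the sample standard deviation gives the range expected to contain about 95% of individual fund returns, assuming the returns are normally distributed. If the question is instead about the average return, the interval is usually built from the standard error of the mean. A minimal sketch of both, using the 1-year count, mean and standard deviation from the summary table above (the variable names are illustrative):

```python
import numpy as np
import scipy.stats as stats

# 1-year count, mean and standard deviation, copied from the describe() output above
n = 118
mean_1yr = 2.326271
std_1yr = 4.449670

# Standard error of the mean
sem = std_1yr / np.sqrt(n)

# 95% confidence interval for the average 1-year return (t distribution, n - 1 degrees of freedom)
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=mean_1yr, scale=sem)

# Range covering about 95% of individual fund returns under the normal approximation
spread_low, spread_high = stats.norm.interval(0.95, loc=mean_1yr, scale=std_1yr)

print("CI for the mean:", ci_low, ci_high)
print("Spread of individual returns:", spread_low, spread_high)
```

The interval for the mean is much narrower than the spread of individual returns, because the standard error shrinks with the square root of the number of funds.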

What is the probability of zero or negative returns when remaining invested for 3 or 5 years?

In [105]:
stats.norm.cdf( 0,
             loc=lcap_stats['ret_3yrs']['mean'],
             scale=lcap_stats['ret_3yrs']['std'] )
Out[105]:
2.0077490148176205e-05
In [106]:
stats.norm.cdf( 0,
             loc=lcap_stats['ret_5yrs']['mean'],
             scale=lcap_stats['ret_5yrs']['std'] )
Out[106]:
4.7845096012600167e-05

Are the average returns of investing for 3 years and 5 years different?

In [113]:
stats.ttest_ind( mf_large.ret_3yrs.dropna(), mf_large.ret_5yrs.dropna() )
Out[113]:
(9.9419011808405244, 1.0020210583927789e-18)
In [119]:
stats.ttest_ind( mf_large.ret_1yr.dropna(), mf_large.ret_5yrs.dropna() )
Out[119]:
(-12.933595021062192, 2.2946763121299832e-27)
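
`stats.ttest_ind` assumes by default that both groups have equal variance. The standard deviations of the 3-year and 5-year returns differ (about 3.86 vs 2.65), so Welch's variant (`equal_var=False`) may be worth checking as well. The sketch below uses `stats.ttest_ind_from_stats` with the counts, means and standard deviations from the summary table, so it runs without the CSV. Both columns come from the same set of funds, so the independence assumption of this test holds only approximately.

```python
import scipy.stats as stats

# count, mean and standard deviation, copied from the describe() output above
n3, mean3, std3 = 115, 15.845217, 3.858489   # 3-year returns
n5, mean5, std5 = 61, 10.349180, 2.652774    # 5-year returns

# Pooled-variance t-test (the default behaviour of ttest_ind)
pooled = stats.ttest_ind_from_stats(mean3, std3, n3, mean5, std5, n5, equal_var=True)

# Welch's t-test, which does not assume equal variances
welch = stats.ttest_ind_from_stats(mean3, std3, n3, mean5, std5, n5, equal_var=False)

print(pooled.statistic, pooled.pvalue)
print(welch.statistic, welch.pvalue)
```

The pooled statistic should match the first value of the tuple returned by `stats.ttest_ind` for the 3-year vs 5-year comparison, up to rounding of the inputs.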

Are returns on large caps higher than returns on fixed deposits? Assume the average return for fixed deposits is 8%.

In [123]:
stats.ttest_1samp( mf_large.ret_5yrs.dropna(), 8.0 )
Out[123]:
(6.9164157296297644, 3.4728888928401157e-09)
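
`stats.ttest_1samp` reports a two-sided p-value, while the question asks specifically whether returns are higher than 8%. A sketch of the one-sided version, computed from the 5-year summary values above so it does not need the CSV: the t statistic is the sample mean minus 8, divided by the standard error, and the one-sided p-value is the upper tail of the t distribution with n - 1 degrees of freedom. The variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

# 5-year count, mean and standard deviation, copied from the describe() output above
n, mean_5yr, std_5yr = 61, 10.349180, 2.652774
fd_rate = 8.0  # assumed average fixed deposit return, as stated in the question

# t statistic: distance of the sample mean from 8%, in standard errors
t_stat = (mean_5yr - fd_rate) / (std_5yr / np.sqrt(n))

# One-sided p-value for the alternative "mean return > 8%"
p_one_sided = stats.t.sf(t_stat, n - 1)

# Two-sided p-value, comparable to what ttest_1samp reports
p_two_sided = 2 * stats.t.sf(abs(t_stat), n - 1)

print(t_stat, p_one_sided, p_two_sided)
```

When the t statistic is positive, the one-sided p-value is half the two-sided one. Recent SciPy releases also accept `alternative='greater'` in `ttest_1samp` to get the one-sided result directly.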

What is the difference in average returns between large cap funds with AUM less than 500 crores and more than 500 crores?

In [127]:
mf_large_500 = mf_large[mf_large.aum > 500.00]
In [128]:
mf_large_500.describe()
Out[128]:
aum ret_1yr ret_2yrs ret_3yrs ret_5yrs
count 30.000000 30.000000 30.000000 30.000000 21.000000
mean 3172.141667 3.113333 10.626667 17.170000 12.071429
std 3729.145562 3.040047 4.066510 2.817085 2.105740
min 524.450000 -2.700000 4.000000 12.500000 7.500000
25% 1125.095000 0.950000 7.025000 15.200000 10.700000
50% 1581.055000 3.050000 10.500000 16.650000 11.700000
75% 3428.535000 4.900000 13.250000 18.750000 13.700000
max 18402.000000 10.000000 19.400000 23.100000 16.200000
In [126]:
plt.figure( figsize = ( 10, 8 ) )
sn.distplot( mf_large_500['ret_1yr'].dropna(), color = 'b' )
sn.distplot( mf_large_500['ret_3yrs'].dropna(), color = 'r' )
sn.distplot( mf_large_500['ret_5yrs'].dropna(), color = 'g' )
Out[126]:
<matplotlib.axes._subplots.AxesSubplot at 0xaeff390>
In [133]:
stats.ttest_ind( mf_large_500.ret_5yrs.dropna(), mf_large.ret_5yrs.dropna() )
Out[133]:
(2.6936050749970497, 0.0086091288801141228)

At a 95% confidence interval, what are the returns for large caps with AUM > 500 crores and AUM < 500 crores?

In [134]:
stats.norm.interval( 0.95,
                  loc=mf_large_500.ret_5yrs.mean(),
                  scale=mf_large_500.ret_5yrs.std() )
Out[134]:
(7.9442531219487114, 16.19860402090843)
In [135]:
stats.norm.interval( 0.95,
                  loc=mf_large.ret_5yrs.mean(),
                  scale=mf_large.ret_5yrs.std() )
Out[135]:
(5.1498396580148125, 15.548520997722893)
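
The cells above compare the AUM > 500 crores group with the full data set, which also contains those same funds. To compare the two size groups directly, the frame can be split into disjoint subsets. A sketch on a small made-up DataFrame follows. The values are hypothetical and only illustrate the mechanics; with the real data, `mf_large` would take the place of `funds`.

```python
import pandas as pd
import scipy.stats as stats

# Hypothetical data, NOT taken from mf-large.csv
funds = pd.DataFrame({
    'aum':      [1200.0, 850.0, 3400.0, 600.0, 5200.0, 12.0, 45.0, 300.0, 0.5, 150.0],
    'ret_5yrs': [12.1,   11.4,  13.0,   10.9,  12.6,   9.8,  8.7,  None,  10.2, 9.1],
})

# Two disjoint groups: every fund falls into exactly one of them
big = funds.loc[funds.aum > 500, 'ret_5yrs'].dropna()
small = funds.loc[funds.aum <= 500, 'ret_5yrs'].dropna()

# Welch's two-sample t-test on the disjoint groups (no equal-variance assumption)
result = stats.ttest_ind(big, small, equal_var=False)

# Range covering about 95% of fund returns in each group, under a normal approximation
big_range = stats.norm.interval(0.95, loc=big.mean(), scale=big.std())
small_range = stats.norm.interval(0.95, loc=small.mean(), scale=small.std())

print(result.statistic, result.pvalue)
print(big_range, small_range)
```

Splitting on `> 500` and `<= 500` keeps the groups non-overlapping, which the two-sample t-test assumes.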