Chris IJ Hwang

I am a Quantitative Analyst/Developer and Data Scientist with a background in the finance, education, and IT industries. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.





Simple Linear Regression

Assumption

The dependent variable has a linear relationship with just one independent variable.
Example: the CAPM or a single-factor model.
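As a rough illustration of this assumption, the model can be written as y = alpha + beta * x + error, and the ordinary least squares estimates follow from the sample moments: beta = S_xy / S_xx and alpha = mean(y) - beta * mean(x). The sketch below is not from the book; the function name `ols_fit` and the return figures are made up for illustration.

```python
from statistics import fmean

def ols_fit(x, y):
    """Closed-form OLS for y = alpha + beta * x + error."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    beta = s_xy / s_xx                      # slope
    alpha = y_bar - beta * x_bar            # intercept
    r_squared = s_xy ** 2 / (s_xx * s_yy)   # squared correlation
    return alpha, beta, r_squared

# Hypothetical daily returns, made up for illustration only.
market = [0.010, -0.020, 0.015, 0.003, -0.007]
# Constructed to lie exactly on the line 0.001 + 1.2 * market.
stock = [0.001 + 1.2 * m for m in market]

alpha, beta, r_squared = ols_fit(market, stock)
print(f"alpha={alpha:.4f}, beta={beta:.4f}, R^2={r_squared:.4f}")
```

Because the hypothetical stock returns are built from an exact line, the fit should recover the chosen intercept and slope up to floating-point error. With real data the error term is not zero and R^2 falls below one.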

Simple Linear Regression Example from Alexander (2008)

Excel

The Excel implementation is introduced in the book. The result will be similar to the following:

[Figure: Excel regression output]


MATLAB

The book contains only the Excel implementation. An equivalent MATLAB implementation would look similar to the following:

clear;
clc;
[num, txt] = xlsread('file', 'SheetName');

%% clean data
txtday = txt(2:end, 1);
datenumday = datenum(txtday, 'mm/dd/yyyy');
datestrday = datestr(datenumday, 'yyyymmdd');
tday = str2double(cellstr(datestrday));
idx = num(:,end);

%% Since the data is price, change it to return
r = diff(log(num));

%% Separate data
x = r(:,2); %SP
y = r(:,1); %Amex
list = {'R','beta', 'adjrsquare','rsquare', 'tstat'};
stats = regstats(y, x, 'linear', list);

%% Organize the result
cfs = stats.beta;
arsq = stats.adjrsquare;
rsq = stats.rsquare;
t = stats.tstat.t;
pv = stats.tstat.pval;

%% graph
p = polyfit(x,y,1);
f = polyval(p,x);
figure;
plot(x,y,'.');
hold on
plot(x,f,'-r');
xlabel('SP500');
ylabel('AMEX');
text(-0.02,0.6, ['y= ' , num2str(cfs(2)), 'x + ', num2str(cfs(1)), ', R^2 = ', num2str(rsq)]);



R

# Simple linear regression

# Data read from excel file
library(gdata)
data = read.xls("Filename", sheet = 2, header = TRUE) 

# See how it looks
names(data)
attach(data)
layout(1)
plot(SP500, Amex)

# Subsetting data
tickers = c("Amex", "SP500")
prices = data[tickers] 

returns = data.frame(diff(as.matrix(log(prices))))


# Regression
L1 = lm(returns$Amex ~ returns$SP500)
summary(L1)

# graph
plot(returns$Amex ~ returns$SP500)
abline(L1, col="red")



Python

Method 1: scipy stats linregress

import numpy as np
import pandas as pd
from scipy import stats

# Read price levels; the first column (dates) becomes the index
dts = pd.read_excel('file path', sheet_name='Sheet2', index_col=0)

# Log returns: first difference of log prices
log_rts = np.log(dts).diff().dropna()

# Regress Amex returns (y) on SP500 returns (x)
slope, intercept, r_value, p_value, std_err = stats.linregress(log_rts["SP500"], log_rts["Amex"])
print(slope)
print(intercept)
print("r-squared:", r_value**2)
print("std error:", std_err)

# Residual standard error with n - 2 degrees of freedom
degrees_of_freedom = len(log_rts) - 2
predict_y = intercept + slope * log_rts["SP500"]
pred_error = log_rts["Amex"] - predict_y
residual_std_error = np.sqrt(np.sum(pred_error**2) / degrees_of_freedom)

Method 2: statsmodels OLS

import numpy as np
import pandas as pd
import statsmodels.api as sm

# log_rts is the DataFrame of log returns built in Method 1
y = log_rts["Amex"]     # response
X = log_rts["SP500"]    # predictor
X = sm.add_constant(X)  # add a constant column for the intercept
est = sm.OLS(y, X).fit()
print(est.summary())

The summary output will be similar to the following:

[Figure: statsmodels OLS summary output]
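For readers who want to see what the constant column does, the following NumPy-only sketch mirrors the Method 2 setup: a column of ones is placed in front of the predictor, and the least-squares solution then yields the intercept and slope together. It is an illustration under assumed data rather than a replacement for statsmodels; the return figures are made up.

```python
import numpy as np

# Hypothetical returns, made up for illustration only.
sp500 = np.array([0.010, -0.020, 0.015, 0.003, -0.007])
amex = np.array([0.012, -0.018, 0.020, 0.001, -0.010])

# Same idea as sm.add_constant: put a column of ones before the predictor.
X = np.column_stack([np.ones_like(sp500), sp500])

# Least-squares solution of X @ coef ~ amex; coef = [intercept, slope].
coef, *_ = np.linalg.lstsq(X, amex, rcond=None)
intercept, slope = coef

# R-squared from residual and total sums of squares.
fitted = X @ coef
ss_res = np.sum((amex - fitted) ** 2)
ss_tot = np.sum((amex - amex.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"const={intercept:.6f}, SP500={slope:.6f}, R-squared={r_squared:.4f}")
```

Without the column of ones the fit would be forced through the origin, which is why the constant is added explicitly before calling `sm.OLS`.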

References

[1] Alexander, Carol. Market Risk Analysis. Vol. I. Chichester, England: Wiley, 2008.