I am a Quantitative Analyst/Developer and Data Scientist with a background in the finance, education, and IT industries. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.
### Assumption
The dependent variable has a linear relationship with just one independent variable.
Example: CAPM or a single-factor model.
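To make the assumption concrete, here is a minimal sketch of the one-regressor model y = alpha + beta * x + error, estimated with the closed-form OLS formulas beta_hat = cov(x, y) / var(x) and alpha_hat = mean(y) - beta_hat * mean(x). The data are synthetic and the names `true_alpha`, `true_beta`, `x`, and `y` are assumptions made for illustration; they are not the book's Amex and S&P 500 series.

```python
import numpy as np

# Synthetic data, for illustration only (not the book's data set).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, size=250)      # hypothetical factor returns
true_alpha, true_beta = 0.0002, 1.2      # hypothetical parameters
y = true_alpha + true_beta * x + rng.normal(0.0, 0.005, size=250)

# Closed-form OLS estimates for a single regressor.
beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()

# Cross-check against NumPy's degree-1 least-squares polynomial fit.
slope_check, intercept_check = np.polyfit(x, y, 1)

print("beta_hat :", beta_hat)
print("alpha_hat:", alpha_hat)
```

In a CAPM-style reading, `beta_hat` would play the role of the asset's sensitivity to the single factor, and `alpha_hat` the intercept.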
The Excel implementation is introduced in the book. The result will be similar to the following:
The book contains only the Excel implementation. Implemented in MATLAB, it would look similar to the following:
    clear; clc;
    [num, txt] = xlsread('file', 'SheetName');

    %% clean data
    txtday = txt(2:end, 1);
    datenumday = datenum(txtday, 'mm/dd/yyyy');
    datestrday = datestr(datenumday, 'yyyymmdd');
    tday = str2double(cellstr(datestrday));
    idx = num(:,end);

    %% Since the data is price, change it to return
    r = diff(log(num));

    %% Separate data
    x = r(:,2); %SP
    y = r(:,1); %Amex
    list = {'R','beta', 'adjrsquare','rsquare', 'tstat'};
    stats = regstats(y, x, 'linear', list);

    %% Organize the result
    cfs = stats.beta;
    arsq = stats.adjrsquare;
    rsq = stats.rsquare;
    t = stats.tstat.t;
    pv = stats.tstat.pval;

    %% graph
    p = polyfit(x,y,1);
    f = polyval(p,x);
    figure;
    plot(x,y,'.');
    hold on
    plot(x,f,'-r');
    xlabel('SP500');
    ylabel('AMEX');
    text(-0.02,0.6, ['y= ' , num2str(cfs(2)), 'x + ', num2str(cfs(1)), ', R^2 = ', num2str(rsq)]);
    # Simple linear regression
    # Data read from excel file
    library(gdata)
    data = read.xls("Filename", sheet = 2, header = TRUE)

    # See how it looks
    names(data)
    attach(data)
    layout(1)
    plot(SP500, Amex)

    # Subsetting data
    tickers = c("Amex", "SP500")
    prices = data[tickers]
    returns = data.frame(diff(as.matrix(log(prices))))

    # Regression
    L1 = lm(returns$Amex ~ returns$SP500)
    summary(L1)

    # graph
    plot(returns$Amex ~ returns$SP500)
    abline(L1, col="red")
    import numpy as np
    from scipy import stats
    import pandas as pd

    # Read prices from the Excel sheet; the first column is used as the index
    dts = pd.read_excel('file path', sheet_name='Sheet2', index_col=0)

    # Convert prices to log returns
    log_dts = np.log(dts)
    log_rts = log_dts[1:].values - log_dts[:-1]

    # Regress Amex returns (y) on SP500 returns (x)
    slope, intercept, r_value, p_value, std_err = stats.linregress(
        log_rts["SP500"], log_rts["Amex"])
    print(slope)
    print(intercept)
    print("r-squared:", r_value**2)
    print("std error:", std_err)

    # Residual standard error with n - 2 degrees of freedom
    degrees_of_freedom = len(log_rts) - 2
    predict_y = intercept + slope * log_rts["SP500"]   # fitted Amex returns
    pred_error = log_rts["Amex"] - predict_y           # residuals
    residual_std_error = np.sqrt(np.sum(pred_error**2) / degrees_of_freedom)
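As a hedged sketch of how the residual standard error relates to the `std_err` value returned by `stats.linregress`: for a single regressor, the slope's standard error equals the residual standard error divided by the square root of the sum of squared deviations of x. The example below checks this on synthetic data. The series and parameter values are assumptions for illustration, not the Amex and SP500 returns.

```python
import numpy as np
from scipy import stats

# Synthetic returns, for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.01, size=200)                       # hypothetical predictor
y = 0.0001 + 0.9 * x + rng.normal(0.0, 0.004, size=200)   # hypothetical response

res = stats.linregress(x, y)

# Residual standard error with n - 2 degrees of freedom
# (two estimated parameters: intercept and slope).
degrees_of_freedom = len(x) - 2
residuals = y - (res.intercept + res.slope * x)
residual_std_error = np.sqrt(np.sum(residuals**2) / degrees_of_freedom)

# Standard error of the slope, rebuilt from the residual standard error.
slope_std_err = residual_std_error / np.sqrt(np.sum((x - x.mean())**2))

# t-statistic for the null hypothesis that the slope is zero.
t_stat = res.slope / res.stderr

print("slope std error (manual):", slope_std_err)
print("slope std error (scipy) :", res.stderr)
print("t-statistic for slope   :", t_stat)
```

The two standard errors should agree up to floating-point rounding, which is a quick sanity check that the residuals were computed with the response and predictor in the right roles.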
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # log_rts is the DataFrame of log returns built in the previous snippet
    y = log_rts.Amex         # response
    X = log_rts.SP500        # predictor
    X = sm.add_constant(X)   # Adds a constant term to the predictor

    est = sm.OLS(y, X)
    est = est.fit()
    print(est.summary())
The summary result shows:
[References]
[1] Alexander, Carol. Market Risk Analysis. Vol. I. Chichester, England: Wiley, 2008. Print.