I am a Quantitative Analyst/Developer and Data Scientist with a background in the Finance, Education, and IT industries. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.
Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.
ex) $\Pr(\text{default} = \text{Yes} \mid \text{balance})$, the probability of default given a credit-card balance.
p(X) must fall between 0 and 1. So, we need to model p(X) using a function that gives outputs between 0 and 1.
Logistic Function

$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

Equivalently, the log odds (logit) are linear in X:

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$

That is, increasing X by one unit changes the log odds by $\beta_1$.
If $\beta_1 > 0$: increasing X increases $p(X)$.
If $\beta_1 < 0$: increasing X decreases $p(X)$.
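As a quick numerical illustration (the coefficients and the helper function below are my own, not from the text), the sketch evaluates the logistic function and shows $p(X)$ rising or falling with X according to the sign of $\beta_1$:

```python
# A minimal sketch: evaluate p(X) = exp(b0 + b1*X) / (1 + exp(b0 + b1*X))
# for illustrative coefficients and check the direction of the effect.
import numpy as np

def logistic(x, b0, b1):
    """Logistic function; its output always lies between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

x = np.linspace(-4, 4, 9)
print(logistic(x, b0=0.0, b1=1.5))   # b1 > 0: p(X) increases with X
print(logistic(x, b0=0.0, b1=-1.5))  # b1 < 0: p(X) decreases with X
```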
Regression coefficients are estimated using maximum likelihood: find the coefficients such that the predicted probability $\hat{p}(x_i)$ is as close as possible to 1 for observations in the class and as close as possible to 0 for those that are not.
The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to maximize the likelihood function

$$\ell(\beta_0, \beta_1) = \prod_{i:\, y_i = 1} p(x_i) \prod_{i':\, y_{i'} = 0} \bigl(1 - p(x_{i'})\bigr)$$
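A minimal sketch of the fit, assuming Python with statsmodels (the book works in R) and synthetic data; the variable names and "true" coefficients below are illustrative only:

```python
# Fit logistic regression by maximum likelihood on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))   # illustrative true b0 = -1, b1 = 2
y = rng.binomial(1, p)

X = sm.add_constant(x)                        # add the intercept column
fit = sm.Logit(y, X).fit(disp=0)              # maximum likelihood estimation
print(fit.params)                             # estimates of b0 and b1
```

The printed estimates should land close to the coefficients used to generate the data, which is exactly what maximizing the likelihood aims for.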
$\pi_k$: the prior probability that a randomly chosen observation comes from the kth class.
$f_k(x) = \Pr(X = x \mid Y = k)$: the density function of X for an observation that comes from the kth class.
By Bayes' theorem,

$$\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)},$$

which is referred to as the posterior probability: the probability that an observation belongs to the kth class, given its predictor value.
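A small sketch of the posterior calculation, assuming one predictor with Gaussian class densities as in LDA; the priors, means, and variance are made-up illustrative numbers:

```python
# Compute Pr(Y = k | X = x) from priors pi_k and class densities f_k(x).
import numpy as np
from scipy.stats import norm

priors = np.array([0.3, 0.7])      # pi_1, pi_2 (illustrative)
means = np.array([-1.0, 1.0])      # class means mu_1, mu_2 (illustrative)
sigma = 1.0                        # shared standard deviation, as LDA assumes

x = 0.2
f = norm.pdf(x, loc=means, scale=sigma)         # f_k(x) for each class
posterior = priors * f / np.sum(priors * f)     # Bayes' theorem
print(posterior)                                # posterior probabilities, sum to 1
```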
When:
Steps:
KNN is a completely non-parametric approach: no assumptions are made about the shape of the decision boundary, so it can capture boundaries that are highly non-linear.
Steps:
| Choice of K | Effect |
| --- | --- |
| K = 1 | Flexible; low bias, high variance |
| K grows | Less flexible; the decision boundary becomes closer to linear; low variance becomes high bias, low variance |
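A minimal sketch, assuming scikit-learn and a synthetic two-class problem (not from the text), showing how training and test accuracy move as K grows:

```python
# Vary K and compare training vs. test accuracy to see the bias-variance trade-off.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)     # non-linear true boundary
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 100):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # K = 1 fits the training data perfectly (low bias, high variance);
    # large K gives a smoother, more nearly linear boundary (high bias, low variance).
    print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))
```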
p-dimensional hyperplane: in a p-dimensional space, a hyperplane is a flat affine subspace of dimension $p - 1$, defined by $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = 0$.
Steps:
The maximal margin hyperplane is the solution to the optimization problem

$$\underset{\beta_0, \ldots, \beta_p,\, M}{\text{maximize}}\ M \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 = 1, \qquad y_i \bigl(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\bigr) \ge M \ \text{ for all } i = 1, \ldots, n$$
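A minimal sketch, assuming scikit-learn: on (roughly) separable synthetic data, a linear SVC with a very large cost parameter approximates the maximal margin hyperplane. The data and parameters below are illustrative, not from the text:

```python
# Approximate the maximal margin (hard-margin) classifier with a large-C linear SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[3, 3], size=(20, 2)),
               rng.normal(loc=[-3, -3], size=(20, 2))])
y = np.array([1] * 20 + [-1] * 20)

svc = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin
print(svc.coef_, svc.intercept_)              # coefficients of the separating hyperplane
print(svc.support_vectors_)                   # the observations that define the margin
```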
Problems:
Rather than completely separating the observations, it can be worthwhile to misclassify a few training observations in order to do a better job of classifying the remaining observations.
The hyperplane is chosen to separate most of the training observations into the two classes, but may misclassify a few observations.
An observation that lies strictly on the correct side of the margin does not affect the support vector classifier. Changing the position of that observation would not change the classifier at all, as long as it remains on the correct side of the margin.
LDA depends on the mean of all of the observations within each class, as well as the within-class covariance matrix computed using all of the observations. In contrast, the support vector classifier is robust to the behavior of observations that are far away from the hyperplane.
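A minimal sketch, assuming scikit-learn, of how the cost parameter controls margin violations; note that scikit-learn's C is inversely related to the "budget" C used in ISLR (smaller scikit-learn C tolerates more violations):

```python
# Vary the cost parameter and watch the number of support vectors change.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=[1.5, 1.5], size=(50, 2)),
               rng.normal(loc=[-1.5, -1.5], size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    svc = SVC(kernel="linear", C=C).fit(X, y)
    # Smaller C -> wider margin and more support vectors; observations far
    # from the hyperplane never influence the fit.
    print(C, svc.n_support_)
```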
For non-linearity, enlarge the feature space: for example, fit a support vector classifier using 2p features $X_1, X_1^2, X_2, X_2^2, \ldots, X_p, X_p^2$.
The support vector machine is an extension of the support vector classifier using kernels. We want to enlarge our feature space in order to accommodate a non-linear boundary between the classes.
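A minimal sketch, assuming scikit-learn, contrasting an explicit quadratic enlargement of the feature space (in the spirit of the 2p-feature idea above) with a kernel that enlarges it implicitly; the data and parameters are illustrative:

```python
# Explicit feature expansion vs. a kernel on a problem with a circular boundary.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5).astype(int)    # non-linear true boundary

# Linear SVC on explicitly enlarged (quadratic) features.
expanded = make_pipeline(PolynomialFeatures(degree=2), SVC(kernel="linear"))
print(expanded.fit(X, y).score(X, y))                  # training accuracy

# SVM with an RBF kernel handles the non-linearity without explicit expansion.
kernel_svm = SVC(kernel="rbf", gamma=1.0)
print(kernel_svm.fit(X, y).score(X, y))                # training accuracy
```

Both approaches can capture the non-linear boundary; the kernel version avoids ever computing the enlarged feature space explicitly.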
[References]
[1] James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. Print.