Chris IJ Hwang

I am a Quantitative Analyst/Developer and Data Scientist with a background in Finance, Education, and the IT industry. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.

View My GitHub Profile



Contents

Classification

Logistic Regression

Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.

ex) For a binary response coded 0/1, $p(X) = \Pr(Y = 1 \mid X)$.

p(X) must fall between 0 and 1. So, we need to model p(X) using a function that gives outputs between 0 and 1.

Logistic Function

$$ p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} $$

Equivalently, the log odds (logit) is linear in X:

$$ \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X $$

That is, increasing X by one unit changes the log odds by $\beta_1$.

If $\beta_1 > 0$: increasing X increases $p(X)$.

If $\beta_1 < 0$: increasing X decreases $p(X)$.

Regression coefficients are estimated using maximum likelihood: we seek estimates such that the predicted probability $\hat{p}(x_i)$ is as close as possible to 1 for observations in the class and as close as possible to 0 for observations that are not.

The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to maximize the likelihood function

$$ \ell(\beta_0, \beta_1) = \prod_{i:\, y_i = 1} p(x_i) \prod_{i':\, y_{i'} = 0} \big(1 - p(x_{i'})\big) $$
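A minimal scikit-learn sketch of the above (not part of the original notes; the simulated data and coefficient values are purely illustrative):

```python
# Minimal sketch: fitting a logistic regression with scikit-learn.
# The data below are simulated for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))                      # single predictor
p = 1 / (1 + np.exp(-(-0.5 + 2.0 * X[:, 0])))      # true p(X) with beta_0 = -0.5, beta_1 = 2
y = rng.binomial(1, p)                             # binary response

model = LogisticRegression()                       # coefficients estimated by maximum likelihood
model.fit(X, y)

print("intercept (beta_0 hat):", model.intercept_[0])
print("slope (beta_1 hat):", model.coef_[0, 0])
print("P(Y=1 | X=1.0):", model.predict_proba([[1.0]])[0, 1])
```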

Linear Discriminant Analysis

$\pi_k$: the prior probability that a randomly chosen observation comes from the kth class.

$f_k(x) = \Pr(X = x \mid Y = k)$: the density function of X for an observation that comes from the kth class.

Applying Bayes' theorem,

$$ \Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)} $$

which is referred to as the posterior probability that an observation belongs to the kth class.

When to prefer LDA over logistic regression:

  1. When the classes are well-separated, the parameter estimates of logistic regression are unstable.
  2. When n is small and the distribution of the predictors is approximately normal in each class.
  3. When there are more than two response classes.

Steps:

  1. Model the distribution of the predictors X separately in each response class.
  2. Use Bayes' theorem to flip these into estimates of $\Pr(Y = k \mid X = x)$.
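A minimal scikit-learn sketch of these steps (not part of the original notes; the two-class Gaussian data are simulated for illustration only):

```python
# Minimal sketch: LDA estimates the class priors pi_k, the class means, and a shared
# covariance matrix, then applies Bayes' theorem to obtain posterior probabilities.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X0 = rng.normal(loc=[-1, -1], scale=1.0, size=(100, 2))   # class 0
X1 = rng.normal(loc=[+1, +1], scale=1.0, size=(100, 2))   # class 1
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

print("estimated priors pi_k:", lda.priors_)               # class fractions in the training data
print("class means:", lda.means_)
print("posterior P(Y=k | X=[0,0]):", lda.predict_proba([[0.0, 0.0]])[0])
```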

K-Nearest Neighbors

KNN is a completely non-parametric approach: no assumptions are made about the shape of the decision boundary, so the boundary can be highly non-linear.

Steps:

  1. Identify the K points in the training data that are closest to $x_0$, represented by $\mathcal{N}_0$.
  2. Estimate the conditional probability for class j as the fraction of points in $\mathcal{N}_0$ whose response values equal j: $$ \Pr(Y = j \mid X = x_0) = \frac{1}{K}\sum_{i \in \mathcal{N}_0} I(y_i = j) $$
  3. Apply Bayes' rule and classify the test observation $x_0$ to the class with the largest probability.
K = 1: most flexible; low bias, high variance.
As K grows: less flexible; the decision boundary becomes closer to linear; low variance, high bias.
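A minimal scikit-learn sketch of the K trade-off (not part of the original notes; the data and the values of K are illustrative assumptions):

```python
# Minimal sketch: KNN classifies a point by majority vote among its K nearest training
# points; small K gives a flexible (high-variance) boundary, large K a smoother one.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # simple labeling rule for illustration

x0 = [[0.25, -0.10]]                           # a test observation
for k in (1, 15, 100):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    # predict_proba returns the fraction of the K neighbours belonging to each class
    print(f"K={k:3d}  P(Y=1 | x0) = {knn.predict_proba(x0)[0, 1]:.2f}")
```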

Maximal Margin Classifier

A hyperplane in p-dimensional space is a flat affine subspace of dimension $p - 1$, defined by $\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p = 0$.

Steps:

The maximal margin hyperplane is the separating hyperplane that is farthest from the training observations. It is the solution to the optimization problem: maximize the margin M over $\beta_0, \dots, \beta_p$ subject to $\sum_{j=1}^{p} \beta_j^2 = 1$ and $y_i(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \ge M$ for every training observation.

Problems:

A separating hyperplane may not exist when the classes overlap, and even when one exists, the maximal margin hyperplane can be extremely sensitive to a change in a single observation, suggesting it may overfit the training data.
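A minimal sketch (not part of the original notes): scikit-learn has no dedicated hard-margin classifier, but a linear-kernel SVC with a very large cost parameter C approximates the maximal margin classifier on separable simulated data.

```python
# Minimal sketch: the maximal margin (hard-margin) classifier, approximated by a
# linear-kernel SVC with a very large C on linearly separable simulated data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X0 = rng.normal(loc=[-3, -3], size=(50, 2))     # well-separated classes
X1 = rng.normal(loc=[+3, +3], size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([-1] * 50 + [+1] * 50)

clf = SVC(kernel="linear", C=1e6)               # huge C -> essentially no margin violations
clf.fit(X, y)

print("hyperplane coefficients (beta):", clf.coef_[0])
print("intercept (beta_0):", clf.intercept_[0])
print("number of support vectors:", clf.support_vectors_.shape[0])
```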

Support Vector Classifier

Rather than completely separating the observations, it can be worthwhile to misclassify a few training observations in order to do a better job of classifying the remaining observations.

The hyperplane is chosen to separate most of the training observations into the two classes, but may misclassify a few observations.

The difference from the maximal margin classifier:

An observation that lies strictly on the correct side of the margin does not affect the support vector classifier; changing the position of that observation would not change the classifier at all, as long as it remains on the correct side of the margin.

The difference from LDA:

LDA depends on the mean of all of the observations within each class, as well as the within-class covariance matrix computed using all of the observations. The support vector classifier, in contrast, is robust to the behavior of observations that are far away from the hyperplane.
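A minimal scikit-learn sketch of the soft margin (not part of the original notes; the overlapping classes are simulated). Note that scikit-learn's C is the cost of margin violations, which acts inversely to the violation "budget" C used in ISLR: smaller scikit-learn C means a wider margin and more support vectors.

```python
# Minimal sketch: the support vector classifier (soft margin). Only the support vectors
# (points on or inside the margin) determine the fit; C controls how many margin
# violations are tolerated.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X0 = rng.normal(loc=[-1, -1], size=(100, 2))    # overlapping classes: not separable
X1 = rng.normal(loc=[+1, +1], size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([-1] * 100 + [+1] * 100)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C)
    clf.fit(X, y)
    # smaller C -> wider margin, more support vectors; larger C -> narrower margin, fewer
    print(f"C={C:6.2f}  support vectors: {clf.support_vectors_.shape[0]:3d}  "
          f"training accuracy: {clf.score(X, y):.2f}")
```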

Support Vector Machine

For non-linearity, one option is to enlarge the feature space, e.g., fitting a support vector classifier using 2p features $X_1, X_1^2, X_2, X_2^2, \dots, X_p, X_p^2$.

The support vector machine is an extension of the support vector classifier using kernels. We want to enlarge our feature space in order to accommodate a non-linear boundary between the classes.
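A minimal scikit-learn sketch comparing a linear support vector classifier with a radial-kernel SVM (not part of the original notes; the ring-shaped data and the gamma value are illustrative assumptions):

```python
# Minimal sketch: an SVM with a radial (RBF) kernel implicitly enlarges the feature
# space to accommodate a non-linear boundary between the classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5).astype(int)   # non-linear (circular) boundary

linear_svc = SVC(kernel="linear").fit(X, y)            # support vector classifier
radial_svm = SVC(kernel="rbf", gamma=1.0).fit(X, y)    # SVM with radial kernel

print("linear kernel training accuracy:", round(linear_svc.score(X, y), 2))
print("radial kernel training accuracy:", round(radial_svm.score(X, y), 2))
```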



References

[1] James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. New York: Springer. Print.
