I am a Quantitative Analyst/Developer and Data Scientist with a background in the Finance, Education, and IT industries. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.
Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.
ex) $\Pr(\text{default} = \text{Yes} \mid \text{balance})$, the probability of default given a credit-card balance.
p(X) must fall between 0 and 1. So, we need to model p(X) using a function that gives outputs between 0 and 1.
Logistic Function

$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

Equivalently, the log odds (logit) are linear in X:

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$

That is, increasing X by one unit changes the log odds by $\beta_1$.
If $\beta_1 > 0$: increasing X increases $p(X)$.
If $\beta_1 < 0$: increasing X decreases $p(X)$.
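As a quick numerical illustration (the coefficients and the helper function below are my own, not from the text), the sketch evaluates the logistic function and shows $p(X)$ rising or falling with X according to the sign of $\beta_1$:

```python
# A minimal sketch: evaluate p(X) = exp(b0 + b1*X) / (1 + exp(b0 + b1*X))
# for illustrative coefficients and check the direction of the effect.
import numpy as np

def logistic(x, b0, b1):
    """Logistic function; its output always lies between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

x = np.linspace(-4, 4, 9)
print(logistic(x, b0=0.0, b1=1.5))   # b1 > 0: p(X) increases with X
print(logistic(x, b0=0.0, b1=-1.5))  # b1 < 0: p(X) decreases with X
```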
Regression coefficients are estimated using maximum likelihood: find the coefficients such that the predicted probability $\hat{p}(x_i)$ is as close as possible to 1 for observations in the class and as close as possible to 0 for those that are not.
The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to maximize the likelihood function

$$\ell(\beta_0, \beta_1) = \prod_{i:\, y_i = 1} p(x_i) \prod_{i':\, y_{i'} = 0} \bigl(1 - p(x_{i'})\bigr)$$
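A minimal sketch of the fit, assuming Python with statsmodels (the book works in R) and synthetic data; the variable names and "true" coefficients below are illustrative only:

```python
# Fit logistic regression by maximum likelihood on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))   # illustrative true b0 = -1, b1 = 2
y = rng.binomial(1, p)

X = sm.add_constant(x)                        # add the intercept column
fit = sm.Logit(y, X).fit(disp=0)              # maximum likelihood estimation
print(fit.params)                             # estimates of b0 and b1
```

The printed estimates should land close to the coefficients used to generate the data, which is exactly what maximizing the likelihood aims for.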
$\pi_k$: the prior probability that a randomly chosen observation comes from the kth class.
$f_k(x) = \Pr(X = x \mid Y = k)$: the density function of X for an observation that comes from the kth class.
By Bayes' theorem,

$$\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)},$$

which is referred to as the posterior probability: the probability that an observation belongs to the kth class, given its predictor value.
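A small sketch of the posterior calculation, assuming one predictor with Gaussian class densities as in LDA; the priors, means, and variance are made-up illustrative numbers:

```python
# Compute Pr(Y = k | X = x) from priors pi_k and class densities f_k(x).
import numpy as np
from scipy.stats import norm

priors = np.array([0.3, 0.7])      # pi_1, pi_2 (illustrative)
means = np.array([-1.0, 1.0])      # class means mu_1, mu_2 (illustrative)
sigma = 1.0                        # shared standard deviation, as LDA assumes

x = 0.2
f = norm.pdf(x, loc=means, scale=sigma)         # f_k(x) for each class
posterior = priors * f / np.sum(priors * f)     # Bayes' theorem
print(posterior)                                # posterior probabilities, sum to 1
```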
When:
Steps:
KNN is a completely non-parametric approach: no assumptions are made about the shape of the decision boundary, so it can capture boundaries that are highly non-linear.
Steps:
| Choice of K | Effect |
| --- | --- |
| K = 1 | Flexible; low bias, high variance |
| K grows | Less flexible; the decision boundary becomes closer to linear; low variance becomes high bias, low variance |
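A minimal sketch, assuming scikit-learn and a synthetic two-class problem (not from the text), showing how training and test accuracy move as K grows:

```python
# Vary K and compare training vs. test accuracy to see the bias-variance trade-off.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)     # non-linear true boundary
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 100):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # K = 1 fits the training data perfectly (low bias, high variance);
    # large K gives a smoother, more nearly linear boundary (high bias, low variance).
    print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))
```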
p-dimensional hyperplane: in a p-dimensional space, a hyperplane is a flat affine subspace of dimension $p - 1$, defined by $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = 0$.
Steps:
The maximal margin hyperplane is the solution to the optimization problem

$$\underset{\beta_0, \ldots, \beta_p,\, M}{\text{maximize}}\ M \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 = 1, \qquad y_i \bigl(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\bigr) \ge M \ \text{ for all } i = 1, \ldots, n$$
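A minimal sketch, assuming scikit-learn: on (roughly) separable synthetic data, a linear SVC with a very large cost parameter approximates the maximal margin hyperplane. The data and parameters below are illustrative, not from the text:

```python
# Approximate the maximal margin (hard-margin) classifier with a large-C linear SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[3, 3], size=(20, 2)),
               rng.normal(loc=[-3, -3], size=(20, 2))])
y = np.array([1] * 20 + [-1] * 20)

svc = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin
print(svc.coef_, svc.intercept_)              # coefficients of the separating hyperplane
print(svc.support_vectors_)                   # the observations that define the margin
```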
Problems:
Rather than completely separating the observations, it can be worthwhile to misclassify a few training observations in order to do a better job of classifying the remaining observations.
The hyperplane is chosen to separate most of the training observations into the two classes, but may misclassify a few observations.
An observation that lies strictly on the correct side of the margin does not affect the support vector classifier. Changing the position of that observation would not change the classifier at all, as long as it remains on the correct side of the margin.
LDA depends on the mean of all of the observations within each class, as well as the within-class covariance matrix computed using all of the observations. In contrast, the support vector classifier is robust to the behavior of observations that are far away from the hyperplane.
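A minimal sketch, assuming scikit-learn, of how the cost parameter controls margin violations; note that scikit-learn's C is inversely related to the "budget" C used in ISLR (smaller scikit-learn C tolerates more violations):

```python
# Vary the cost parameter and watch the number of support vectors change.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=[1.5, 1.5], size=(50, 2)),
               rng.normal(loc=[-1.5, -1.5], size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    svc = SVC(kernel="linear", C=C).fit(X, y)
    # Smaller C -> wider margin and more support vectors; observations far
    # from the hyperplane never influence the fit.
    print(C, svc.n_support_)
```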
For non-linearity, enlarge the feature space: for example, fit a support vector classifier using 2p features $X_1, X_1^2, X_2, X_2^2, \ldots, X_p, X_p^2$.
The support vector machine is an extension of the support vector classifier using kernels. We want to enlarge our feature space in order to accommodate a non-linear boundary between the classes.
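A minimal sketch, assuming scikit-learn, contrasting an explicit quadratic enlargement of the feature space (in the spirit of the 2p-feature idea above) with a kernel that enlarges it implicitly; the data and parameters are illustrative:

```python
# Explicit feature expansion vs. a kernel on a problem with a circular boundary.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5).astype(int)    # non-linear true boundary

# Linear SVC on explicitly enlarged (quadratic) features.
expanded = make_pipeline(PolynomialFeatures(degree=2), SVC(kernel="linear"))
print(expanded.fit(X, y).score(X, y))                  # training accuracy

# SVM with an RBF kernel handles the non-linearity without explicit expansion.
kernel_svm = SVC(kernel="rbf", gamma=1.0)
print(kernel_svm.fit(X, y).score(X, y))                # training accuracy
```

Both approaches can capture the non-linear boundary; the kernel version avoids ever computing the enlarged feature space explicitly.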
[References]
[1] James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. Print.