I am a Quantitative Analyst/Developer and Data Scientist with backgroud of Finance, Education, and IT industry. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.
Performed Principal Component Analysis of High School students performances of 3 major subjects in Maryland State 2014.
Variables:
School Name
Algebra Adv Pct
Algebra Proficient Pct
Algebra Basic Pct
English Adv Pct
English Proficient Pct
English Basic Pct
Biology Adv Pct
Biology Pforiceint Pct
Biology Basic Pct
###Data Sources
[Maryland State][1]
[1]: http://www.mdreportcard.org
Files:
Algebra_2014.csv
Biology_2014.csv
English2_2014.csv
At /etc/Rserv.conf
remote enable
> Rserve(args ="--no-save")
alg = read.csv('files/algebra_2014.csv', header=T)
names(alg) # Check all names
keeps = c("School.Name","Advanced.Pct", "Proficient.Pct", "Basic.Pct" ) # The colomuns to keep
alg = alg[keeps]
names(alg)[2]= "alg.Advanced.Pct"
names(alg)[3] = "alg.Proficient.Pct"
names(alg)[4] = "alg.Basic.Pct"
alg$School.Name = as.character(alg$School.Name)
alg$School.Number = as.numeric(as.character(alg$School.Number))
alg$School.Name = paste(alg$School.Name, alg$School.Number, sep=".")
alg$School.Name = as.factor(alg$School.Name)
alg=alg[,-1]
alg_df = subset(alg, alg.Advanced.Pct!="*" )
eng_df = subset(eng, eng.Advanced.Pct!="*" )
bio_df = subset(bio, bio.Advanced.Pct!="*" )
# English
for (i in 2:dim(eng_df)[2]) {
eng_df[,i] = as.character(eng_df[,i])
eng_df[,i][eng_df[,i]=="<= 5.0"] = "5.0"
eng_df[,i] = as.factor(eng_df[,i])
}
total = merge(alg_df, eng_df, by="School.Name")
total = merge(total, bio_df, by="School.Name")
write.table(total, file='total.csv', sep=',')
pc.cr <- princomp(df, cor=TRUE) # PCA
pc.cr # print PCA object
summary(pc.cr) # summarize results
loadings(pc.cr) # list loadings
res =pc.cr$sdev^2/sum(pc.cr$sdev^2)
scores = pc.cr$score
as.numeric(scores[,1])
Tableau will call R with SCRIPT_*() at “calculated Fields” modifying the above R code as below:
SCRIPT_REAL(
"n <- max(.arg10)
df <- data.frame(.arg1,.arg2,.arg3,.arg4, .arg5, .arg6, .arg7, .arg8, .arg9)
pc <- princomp(df, cor = TRUE)
as.numeric(pc$scores[,n])
",
SUM([alg.Advanced.Pct]),SUM([alg.Proficient.Pct]),SUM([alg.Basic.Pct]),
SUM([bio.Advanced.Pct]), SUM([bio.Proficient.Pct]), SUM([bio.Basic.Pct]),
SUM([eng.Advanced.Pct]), SUM([eng.Proficient.Pct]), SUM([eng.Basic.Pct]),[Component x]
)
Overall look is as below:
You can donwload the workbook at my github page or from direct link in the upper-left side.
http://kb.tableausoftware.com/articles/knowledgebase/r-implementation-notes http://www.tableausoftware.com/about/blog/2013/10/tableau-81-and-r-25327 http://www.r-bloggers.com/dream-team-combining-tableau-and-r/ http://www.tableausoftware.com/learn/whitepapers/tableau-and-r-faq