I am a Quantitative Analyst/Developer and Data Scientist with a background in the finance, education, and IT industries. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.
Value-at-Risk (VaR) is an important and effective risk measure, and there are several ways to compute it. One well-known
method uses Monte Carlo simulation; however, it requires heavy computation. In this project, Apache Spark (with Python) is
used to compute intraday VaR, using Yahoo historical 5-minute data as an example.
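To recall what the Monte Carlo approach computes: simulate many hypothetical returns, then read off the loss at the chosen quantile. Below is a minimal single-machine sketch in plain Python (this is illustrative only, not the project's code; the function name, the normal-return assumption, and all parameters are my own choices):

```python
import random


def monte_carlo_var(mu, sigma, value, confidence=0.95, n_sims=100_000, seed=42):
    """Estimate one-interval VaR by simulating normally distributed returns.

    mu, sigma : assumed mean and std dev of the per-interval return
    value     : current portfolio value
    Returns the loss (positive number) not exceeded with the given confidence.
    """
    rng = random.Random(seed)
    # Simulate n_sims hypothetical per-interval returns and sort them.
    simulated = sorted(rng.gauss(mu, sigma) for _ in range(n_sims))
    # The (1 - confidence) empirical quantile of returns is the VaR cutoff.
    cutoff = simulated[int((1 - confidence) * n_sims)]
    return -cutoff * value


# Example: $1M portfolio, zero drift, 0.2% per-interval volatility.
var_95 = monte_carlo_var(mu=0.0, sigma=0.002, value=1_000_000)
```

Each evaluation draws 100,000 samples, which is exactly why this becomes expensive across many tickers and intervals, and why distributing the work with Spark is attractive.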
This project implements risk management by finding Value-at-Risk (VaR) from 5-minute tick data
with Apache Spark, and consists of two jobs. One is the data-collection job, written in Java.
The other is the computation job that finds VaR, written in Python with the Spark Python API.
The project gives an example of implementing VaR by utilizing not only the Hadoop distributed file system
and the Spark cluster-computing platform but also IPython's interactive analytic functionality.
We used Google Cloud Platform.
Directory Structure of Hadoop File System
|-- bdproject/
    |-- data/      : stores the 10-day 5-minute tick data
    |-- dailyData/ : stores the 1-day 1-minute tick data
    |-- sp500      : file containing all tickers
Directory Structure of the Project
|-- bdproject/
|-- TickDataReadWrite.java
|-- ReadTickData.jar
|-- GetVar.py
|-- GetVar_least_spark.py
|-- sp500
For the 10-day 5-minute data:
hadoop jar ReadTickData.jar com.cijhwang.hadoop.TickDataReadWrite 0
For the 1-day 1-minute data:
hadoop jar ReadTickData.jar com.cijhwang.hadoop.TickDataReadWrite 1
To compute VaR, run the Python job from the IPython shell:
%run GetVar.py
The basic algorithm is as follows:
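A hedged single-ticker sketch of that algorithm, using historical simulation on 5-minute returns (the function name, price data, and structure here are illustrative assumptions, not taken from GetVar.py):

```python
def historical_var(prices, confidence=0.95):
    """Historical-simulation VaR for one ticker's 5-minute price series.

    Returns the loss (as a positive fraction of value) that is not
    exceeded over one interval with the given confidence.
    """
    # Step 1: convert the price series into simple per-interval returns.
    returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]
    # Step 2: the empirical (1 - confidence) quantile of the sorted
    # returns is the VaR cutoff; negate it to report a positive loss.
    returns.sort()
    idx = int((1 - confidence) * len(returns))
    return -returns[idx]


# Toy 5-minute price path for one ticker (made-up numbers).
prices = [100, 101, 99, 98, 100, 102, 101, 97, 99, 100,
          101, 103, 102, 100, 98, 99, 101, 100, 102, 101]
var_frac = historical_var(prices)
```

The sketch only shows the per-ticker math; in the Spark job this computation would be mapped over the full set of tickers in parallel, with the tick data read from HDFS.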
Here is the link to the GitHub repository for this project.