Chris IJ Hwang

I am a Quantitative Analyst/Developer and Data Scientist with a background in the finance, education, and IT industries. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.

Value at Risk (VaR) is an important and widely used risk measure, and there are several ways to compute it. One well-known method is Monte Carlo simulation, which is computationally expensive. In this project, Apache Spark (with Python) is used to compute intraday VaR, using Yahoo historical 5-minute data as an example.
This project consists of two jobs that together implement risk management by computing VaR from 5-minute tick data with Apache Spark. The first is a data-collection job written in Java. The second is a computation job that finds the VaR, written in Python with the Spark Python API. The project serves as an example of implementing VaR using the Hadoop Distributed File System and the Spark cluster-computing platform, together with a computation algorithm developed with IPython’s interactive analysis features. We ran it on Google Cloud.
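
To make the Monte Carlo idea concrete, here is a minimal single-machine sketch in plain Python. It is not the project's code. It assumes that 5-minute log returns are independent and normally distributed with the sample mean and standard deviation of the history, which is a simplification that the project itself may not make. The function names and the price series are hypothetical.

```python
import math
import random
import statistics


def log_returns(prices):
    """Convert a price series into consecutive log returns."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def monte_carlo_var(prices, confidence=0.99, trials=10_000, seed=42):
    """Estimate one-bar VaR as a positive log-return loss.

    Assumption: returns are i.i.d. normal with the sample mean and
    standard deviation of the historical 5-minute log returns.
    """
    returns = log_returns(prices)
    mu = statistics.mean(returns)
    sigma = statistics.stdev(returns)

    # Draw simulated one-bar returns and sort them from worst to best.
    rng = random.Random(seed)
    simulated = sorted(rng.gauss(mu, sigma) for _ in range(trials))

    # The (1 - confidence) quantile of the simulated returns is the loss threshold.
    index = int((1.0 - confidence) * trials)
    return -simulated[index]


# Hypothetical 5-minute closing prices, for illustration only.
prices = [100.0, 100.2, 99.9, 100.1, 100.4, 100.0, 99.7, 100.3, 100.5, 100.2]
var_99 = monte_carlo_var(prices)
print(f"99% one-bar VaR: {var_99:.4%} of position value")
```

The cost grows with the number of trials and the number of instruments, which is the part a cluster can spread across workers.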


Software Packages

|-- bdproject/
      |-- ReadTickData.jar
      |-- sp500


  1. Replace the Hadoop File System paths to match your own directory structure. The Yahoo Finance URL should not be changed.
  2. Create the JAR file and run it. Use cron or another time-based job scheduler to collect data automatically. The 10-day data is collected once a day before the market opens; the 1-minute data is collected every 5 minutes while the market is open.
    • example:

    For 10-day 5-min data

    hadoop jar ReadTickData.jar com.cijhwang.hadoop.TickDataReadWrite 0

    For 1-day 1-min data

    hadoop jar ReadTickData.jar com.cijhwang.hadoop.TickDataReadWrite 1

  3. Start IPython with PySpark loaded.
  4. Connect to the IPython server and run the computation job.
    • example:

    % run


The basic algorithm is as follows:
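
As a rough illustration only, and not necessarily the project's exact algorithm (the GitHub repository has that), a Monte Carlo VaR job is commonly split into independent batches of trials that are simulated separately and then merged before taking the loss quantile. The sketch below shows that shape in plain Python so that it runs without a cluster. The function names, the seeds, and the `mu` and `sigma` values are hypothetical, and normally distributed returns are an assumption.

```python
import random


def simulate_partition(task):
    """Simulate one batch of returns; each batch gets its own seed."""
    seed, trials, mu, sigma = task
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(trials)]


def partitioned_var(mu, sigma, confidence=0.99, partitions=8, trials_per_partition=2_500):
    """Merge independently simulated batches and read off the loss quantile."""
    # One task per partition; distinct seeds keep the batches from repeating.
    tasks = [(seed, trials_per_partition, mu, sigma) for seed in range(partitions)]

    # Locally this is a plain map. On a Spark cluster the same function could
    # be applied with sc.parallelize(tasks).flatMap(simulate_partition).
    simulated = []
    for batch in map(simulate_partition, tasks):
        simulated.extend(batch)

    simulated.sort()
    index = int((1.0 - confidence) * len(simulated))
    return -simulated[index]


# Hypothetical per-bar mean and volatility of 5-minute returns.
var_partitioned = partitioned_var(mu=0.0, sigma=0.001)
print(f"99% VaR from 20,000 partitioned trials: {var_partitioned:.5f}")
```

Giving each batch its own seed matters: if every worker reused the same seed, all batches would contain identical draws and the extra trials would add no information.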


Here is the link to the GitHub repository for this project.