Homework2: Stock Price Analysis
In this homework, you will work on stored procedures and user defined aggregate (extra credit). Total Credit: 10+2 (extra credit).
Dataset. You will download history datasets of three stocks: NVIDIA (nvda.csv), Johnson & Johnson (jnj.csv), and Carnival Corporation (ccl.csv) between 02/26/2020 and 02/25/2025 (inclusive).
Task 1 (2 points). Create a table cse532.stock (stockname, date, close, volume, open, high, low) with proper indexes based on the tasks below if needed. StockName represents the name of the stock, e.g., NVDA, JNJ and CCL. Create a load script to have both datasets loaded. (You can use a staging table or alter a table for adding stock name.)
SQL scripts: createtable.sql (both tables and indexes); load.sql (loading script)
Task 2 (4 points). Discover Anomalies in Stock Prices using a stored procedure. Anomalies in stock prices refer to sudden, unexpected, and statistically significant deviations from the typical price patterns. Here we use Z-Score for such detection. (A discussion on similar topic can be found here.)
Z = (X-μ)/σ
where
X = Current closing price
μ = Mean closing price over the period
σ = Standard deviation of closing prices over the period
Create a stored procedure to return abnormal stock based on a Z-score threshold. Uses window based approach to compute 30-day (30 days PRECEDING) rolling mean and standard deviation. Compute Z-Score for each stock price and filter for anomalies where |Z-Score| > threshold (e.g., 3). Return the result into a table CSE532.stockanomaly (stockname, date, close, rolling_avg, rolling_std, z_score). Test it with an example of Z-score=3 for NVDA in past year: 02/26/2024 and 02/25/2025. Store all SQL scripts in sp.sql. Store the test result in result1.txt.
CREATE PROCEDURE CSE532.DetectStockanomalies (
IN stockname VARCHAR(5),
IN startdate DATE,
IN enddate DATE,
IN threshold FLOAT
)
Task 3 (4 points). Implement the same procedure with JDBC (not Java Stored Procedure). Please consider minimizing querying the table for aggregation to maximize performance. Your Java code can run as: java Stockanomalies (String dbname, String login, String passwd, String stockname, String startDate, String endDate, Double threshold).
Java program: Stockanomalies.java
Note, you will need to include ...IBM/SQLLIB/java/db2jcc4.jar in your Java build path.
Extra Credit (2 point): Implement a user defined aggregate function cse532.PearsonCCuda , which takes two stock names, stock1 and stock2 and return the correlation coefficient as cc. It is like implementing the CORRELATION function by yourself. SQL script: PearsonCCudf.sql. Test your codes with stocks JNJ and CCL.
Pearson correlation coefficient (or Pearson R statistical test) measures the strength between different variables and their relationships. It can be expressed as r in the formula below, where
r = Pearson Coefficient
n= number of the pairs of the stock
∑xy = sum of products of the paired stocks
∑x = sum of the x scores
∑y= sum of the y scores
∑x2 = sum of the squared x scores
∑y2 = sum of the squared y scores
Homework Submission
Please zip all your SQL scripts, Java program and results:
createtable.sql
load.sql
sp.sql
result1.txt
Stockanomalies.java
PearsonCCuda.sql
A readme.txt file explaining how to run each script/program.
Please go to BrightSpace and submit the zip file under Homework 2.