Portfolio

Course Work

Data Structures, Algorithms, Calculus, Statistical Theory, Statistical Methods, Probability, Machine Learning, Deep Learning, Reinforcement Learning, Artificial Intelligence, Operating Systems, Databases, Computer Architecture, Programming Languages.

Projects

1. Linear Regression for Ames Housing Dataset.
Built an explanatory linear model for the data.
Performed model selection using backward elimination and lasso regression.
Applied a Box-Cox transformation during model checking.
Used Q-Q plots and residuals-vs-fitted plots to check for violations of the model assumptions. Code link
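
For readers unfamiliar with the Box-Cox transformation, here is a minimal sketch in plain Python. It is an illustration only, not the project code: the function name `box_cox`, the example prices, and the lambda values are all assumptions chosen for the example.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox power transform to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The limit of (v**lam - 1) / lam as lam -> 0 is the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

# Hypothetical, made-up sale prices (not taken from the Ames data).
prices = [100.0, 150.0, 200.0, 400.0]
log_prices = box_cox(prices, 0)     # lambda = 0 gives the log transform
root_prices = box_cox(prices, 0.5)  # lambda = 0.5 gives a shifted, scaled square root
```

In practice the lambda value is usually estimated from the data rather than fixed by hand, and the residual plots are re-checked after transforming the response.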

2. Crime Data integration for different metropolitan cities.
Collected 20 million crime records for 12 different metropolitan cities.
Performed ETL operations to sanitize the data.
Created a common SQL schema with useful attributes to build a MySQL database of 20 million arrest records.
Performed queries on the MySQL database to generate a summary of crimes in particular cities.

2. Sentiment Analysis of Yelp Reviews:
Performed data cleaning.
Implemented a TF-IDF model using PySpark to obtain an AUROC score of 0.9. Created a custom dataset and dataloader using PyTorch for data cleaning and validation.
Built a neural network model that obtained a test accuracy of 93%. Code link
Framework and platform used: PyTorch, Apache Spark, a Google Cloud Dataproc cluster for Spark jobs, and the OSU HPC cluster.

3. Recommendation System for MovieLens Dataset
Created a recommendation system for 10 million ratings of the MovieLens dataset by implementing collaborative filtering and matrix factorization.
Used a nearest-neighbor algorithm for collaborative filtering and the alternating least squares method for matrix factorization.
Framework and platform used: Apache Spark and Google Cloud Platform. You may find the code here.
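
To illustrate the nearest-neighbor side of this project, here is a small user-based collaborative filtering sketch in plain Python. It is not the Spark implementation: the cosine similarity measure, the function names, and the toy ratings are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity over the items that both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item, k=2):
    """Predict a rating as the similarity-weighted mean over the k nearest neighbors."""
    neighbors = [
        (cosine(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = [(sim, r) for sim, r in neighbors[:k] if sim > 0]
    if not top:
        return None  # nobody similar has rated this item
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)

# Hypothetical toy ratings (user -> {movie: rating}), assumed positive.
ratings = {
    "alice": {"m1": 5.0, "m2": 4.0},
    "bob": {"m1": 5.0, "m2": 4.0, "m3": 2.0},
    "carol": {"m1": 1.0, "m2": 5.0, "m3": 5.0},
}
predicted = predict(ratings, "alice", "m3")
```

Here bob's ratings match alice's exactly on the shared movies, so his low rating of `m3` pulls the prediction down more than carol's high rating pushes it up.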

4. NYPD Crime Data Analysis
Reverse geocoded 4.8 million addresses using open source GIS database Pelias.
Performed web scraping to map ZIP codes to city and county names.
Analyzed the crimes that occurred in New York between 2014 and 2017 and generated summary statistics and visualizations.
Framework and platform used: Google Cloud Platform, Google Maps Platform, BeautifulSoup, Pandas, Seaborn, Tableau, Pelias. Code link

5. Monte Carlo Tree Search For Atari Game Pong
My teammate Ravi and I created a Monte Carlo Tree Search and deep learning model to generate an action-selection policy for the Atari game Pong. We implemented ideas from the well-known deep learning paper "Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning". The resulting policy beat a random policy by a significant margin. Framework and platform used: PyTorch, OpenAI Gym, OSU pelican cloud. Code link

6. Exploration Map Inpainting
For the deep learning class, my friend Manish Saroya and I worked on the DARPA Subterranean Challenge to develop a model that could rapidly map complex environments. We created a synthetic map database of 60K images consisting of grid-based maps. Our model could learn loop closures and T-points and fill image holes. For this project, we used ideas from the partial convolution paper to implement our own U-Net structure for image inpainting. Framework and platform used: PyTorch, OSU cloud cluster. You may find a complete report on the project here. Code link

7. Lottery Scheduling
Implemented lottery scheduling to schedule processes in the xv6 environment. Each process is given a ticket when it is created. Whenever a process in the ready queue is to be scheduled, the scheduler draws a random ticket number and sums the tickets of the queued processes one by one; the process at which the running sum first exceeds the drawn number is scheduled. Framework and platform: C, xv6, OSU server. Here is the code link.
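
The selection step described above can be sketched in a few lines of Python. This is an illustration of the idea, not the xv6 C code: the function name `pick_winner`, the `(pid, tickets)` queue layout, and the per-process ticket counts are assumptions.

```python
import random

def pick_winner(ready_queue, rng=random):
    """Pick a process from ready_queue, a list of (pid, tickets) pairs.

    A winning ticket is drawn uniformly at random; the queue is then walked
    while summing tickets, and the first process whose running sum exceeds
    the winning ticket is scheduled.
    """
    total = sum(tickets for _, tickets in ready_queue)
    if total <= 0:
        raise ValueError("no tickets in the ready queue")
    winning_ticket = rng.randrange(total)  # uniform integer in [0, total)
    running_sum = 0
    for pid, tickets in ready_queue:
        running_sum += tickets
        if running_sum > winning_ticket:
            return pid

# Hypothetical ready queue: process B holds three times as many tickets as A.
queue = [("A", 1), ("B", 3)]
rng = random.Random(0)  # seeded only to make the demo repeatable
wins = {"A": 0, "B": 0}
for _ in range(4000):
    wins[pick_winner(queue, rng)] += 1
```

Over many draws each process is chosen roughly in proportion to its share of the tickets, which is what makes the scheme probabilistically fair.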

8. Parallel Reinforcement Learning
My teammate Aashish and I implemented reinforcement learning algorithms such as value iteration, policy iteration, Q-learning, SARSA and Deep Q-Network using the Ray library to leverage the computing power of multiple CPUs. The parallel implementation significantly reduced the overall computation time for larger environments without affecting the resulting policy. All of the implementation was done on Intel Dev Cloud. The code can be found here.
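
As a reference point for what was parallelized, here is a minimal sequential value iteration sketch in plain Python. It does not use Ray and is not the project code: the transition-table layout, the function name, and the three-state toy environment are assumptions for illustration.

```python
def q_value(outcomes, values, gamma):
    """Expected return of one action given its (prob, next_state, reward) outcomes."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """transitions[state][action] is a list of (prob, next_state, reward) tuples.

    States with no actions are treated as terminal. Returns (values, policy).
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps value 0
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy

# Hypothetical deterministic chain: start -> mid -> goal, reward 1 on reaching goal.
transitions = {
    "start": {"stay": [(1.0, "start", 0.0)], "go": [(1.0, "mid", 0.0)]},
    "mid": {"back": [(1.0, "start", 0.0)], "go": [(1.0, "goal", 1.0)]},
    "goal": {},
}
values, policy = value_iteration(transitions)
```

The per-state backup inside the sweep is the natural unit of work to distribute, since each backup only reads the current value table.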

9. Gesture Recognition
Trained a deep neural network with softmax cross-entropy loss and the Adam optimizer on images of hand gestures. The trained model attained an accuracy of over 91%. Framework and platform: TensorFlow and the pelican cluster of Oregon State University. The Jupyter notebook containing the code can be found here.

10. Apparel Classification
Trained a CNN model on NVIDIA GPUs to classify 10 types of apparel and other clothing. Used the Adam optimizer and cross-entropy loss to reach a final training loss of 0.0035. Jupyter notebook link

11. Web crawling
Used the BeautifulSoup library to implement a Python script that scrapes reviews of San Francisco restaurants from the Yelp website. Code link

Graduate Research

Deep Learning and Reinforcement Learning

My advisor, Prof. Prasad Tadepalli, and I are developing an AI player for the game of Klondike Solitaire using Monte Carlo Tree Search and graph neural networks.