Data Science Projects
Call Center Optimization Capstone (Python, R) Spring 2020
• Designed a simulation model for Bushnell’s call center to minimize costs while reducing wait times by 20%.
• Liaised with senior management for requirement gathering and workforce configuration solutions.
• Handled data mining, data warehousing, and data cleaning in collaboration with the CSR regional manager.
• STL and ETS forecasting of call arrivals on a weekly and seasonal basis to support recommendations.
Experience ATX Web Analytics (Facebook Analytics, Google Analytics) Spring 2020
• Developed analytics dashboards and reporting packages with actionable insights for the content creation team.
• Revised planned content to include keywords “coronavirus”, “COVID-19”, and “quarantine” boosted views by 14%
• Discovered that Dynamic Ads on Facebook generated nearly 4x more clicks than organic users.
• Created QR codes and filters for user segmentation and tracking of physical advertising.
Tweet Text Analysis (Python) Fall 2019
• Text analysis of #impeachment tweets using sentiment analysis and Latent Dirichlet allocation (LDA).
• Attributed sentiment for public opinion in swing states across different demographics.
• Scraped 83,560 tweets from Twitter using Tweepy and Twitter API keys.
• Data processed with tokenization, stop words removal, and lemmatization.
Art Classifier Using Neural Nets (Python) Fall 2019
• Used CNN, AlexNet, VGG16, and ResNet for classification of artworks by genre and artist
• Preprocessed data with the ImageDataGenerator class from Keras for better feature extraction.
• Achieved a validation accuracy of 71% for genres and 82% for artists with ResNet.
Google Play Store Apps Project (Python) Summer 2019
• Employed a random forest predictive model to predict whether a Google Play Store app would be popular.
• Applied sentiment analysis on reviews of apps to reveal the key phrases identifying positive and negative traits.
• Tuned hyperparameters and implemented 3-fold cross-validation to increase our model accuracy to 81%.
NYC Yellow Cabs Project (R) Summer 2019
• Trained a predictive model utilizing XGBoost to predict trip durations of taxi cabs in New York City with a RMSE of less than 5 minutes and a prediction power of 82%.
• Performed data cleaning on the Kaggle dataset to remove several outliers which improved RMSE by 3 minutes.
• Applied 8 different predictive models in total to compare the efficiency and accuracy of each model.
Reddit r/worldnews Post (Python, PySpark,) Summer 2019
• LDA
• Logistic Regression
• Sentiment within the subreddit. Flag inciteful phrases, words. Evaluate moderators.
Social Media Team (Digital Marketing)
Instacart Recommendation System (Kaggle Dataset, Python)
Tools used: Rule Association Mining, Clustering, Regression, and PCA