Data Science Projects

Call Center Optimization Capstone (Python, R) Spring 2020Permalink

• Designed a simulation model for Bushnell’s call center to minimize costs while reducing wait times by 20%.

• Liaised with senior management for requirement gathering and workforce configuration solutions.

• Handled data mining, data warehousing, and data cleaning in collaboration with the CSR regional manager.

• STL and ETS forecasting of call arrivals on a weekly and seasonal basis to support recommendations.

Experience ATX Web Analytics (Facebook Analytics, Google Analytics) Spring 2020Permalink

• Developed analytics dashboards and reporting packages with actionable insights for the content creation team.

• Revised planned content to include keywords “coronavirus”, “COVID-19”, and “quarantine” boosted views by 14%

• Discovered that Dynamic Ads on Facebook generated nearly 4x more clicks than organic users.

• Created QR codes and filters for user segmentation and tracking of physical advertising.

Tweet Text Analysis (Python) Fall 2019Permalink

• Text analysis of #impeachment tweets using sentiment analysis and Latent Dirichlet allocation (LDA).

• Attributed sentiment for public opinion in swing states across different demographics.

• Scraped 83,560 tweets from Twitter using Tweepy and Twitter API keys.

• Data processed with tokenization, stop words removal, and lemmatization.

Art Classifier Using Neural Nets (Python) Fall 2019Permalink

• Used CNN, AlexNet, VGG16, and ResNet for classification of artworks by genre and artist

• Preprocessed data with the ImageDataGenerator class from Keras for better feature extraction.

• Achieved a validation accuracy of 71% for genres and 82% for artists with ResNet.

Google Play Store Apps Project (Python) Summer 2019Permalink

• Employed a random forest predictive model to predict whether a Google Play Store app would be popular.

• Applied sentiment analysis on reviews of apps to reveal the key phrases identifying positive and negative traits.

• Tuned hyperparameters and implemented 3-fold cross-validation to increase our model accuracy to 81%.

NYC Yellow Cabs Project (R) Summer 2019Permalink

• Trained a predictive model utilizing XGBoost to predict trip durations of taxi cabs in New York City with a RMSE of less than 5 minutes and a prediction power of 82%.

• Performed data cleaning on the Kaggle dataset to remove several outliers which improved RMSE by 3 minutes.

• Applied 8 different predictive models in total to compare the efficiency and accuracy of each model.

Reddit r/worldnews Post (Python, PySpark,) Summer 2019Permalink

• LDA

• Logistic Regression

• Sentiment within the subreddit. Flag inciteful phrases, words. Evaluate moderators.

Instacart Recommendation System (Kaggle Dataset, Python)

Tools used: Rule Association Mining, Clustering, Regression, and PCA

Victor Nguyen