Hi there!
My name is Selina Wang. I recently graduated from Brown University with an MS in Data Science and completed my undergrad at Tufts, where I studied Math and Quantitative Economics.
Currently, I am working as a Data Science Volunteer Intern at The Nature Conservancy, volunteering at the Cultivate Hope Urban Farm in Cedar Rapids, and working on my own data science projects related to agriculture, food, and sustainability. Check out my featured projects below! My most recent projects will be updated soon.
I'm passionate about working at the intersection of data and sustainability, and
I hope to use the power of data in a way that benefits the greater good.
When I'm not nerding out about math or data, I spend my time taking long walks in nature, staring at squirrels, writing short fiction and singing.
Featured Projects
Iowa Corn Yield Prediction Using Machine Learning
I built an end-to-end pipeline in Python, covering data loading, cleaning, preprocessing, and model training, to predict county-level corn yield in Iowa. Skills demonstrated in this project include object-oriented programming, machine learning, and geospatial data processing.
This is a rather complex project that I split into three stages. Stage 1 is complete while stages 2 and 3 are still in progress.
Visualizing US Beef Consumption and Deforestation
Using data visualizations created in ggplot and Adobe Illustrator, I crafted a data narrative untangling the complex relationship between US beef consumption and deforestation across the globe.
Other Projects
Groundwater Time Series Modeling
This project is based on our team's submission for the groundwater time series modeling challenge hosted by the European Geosciences Union. The goal of this challenge is to predict the groundwater level of a well in Germany using meteorological data. We tried three different models: linear regression with regularization (lasso), Support Vector Regression (SVR), and a Random Forest Regressor. I recently updated our original code to include an LSTM model.
Injury Prediction for Distance Running Athletes
The goal is to predict whether an athlete will get injured on a given day based on their recent training history. I trained four different models: Logistic Regression, Random Forest Classifier, SVC, and KNN. The most predictive model is Logistic Regression, which gives a mean F_2 score of 0.506 (0.5 standard deviations above baseline).
The three biggest challenges I faced while working on this project were working with imbalanced data, dealing with overfitting, and setting up train-val-test sets for time-series data.
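For readers unfamiliar with the F_2 metric and the time-series splitting problem mentioned above, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the project's code: the function names `fbeta_score` and `chronological_split` and the 60/20/20 split fractions are hypothetical.

```python
def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta for binary labels (1 = injured, 0 = not injured).

    With beta = 2, recall counts more than precision, which suits
    imbalanced data where missing an injury is the costlier error.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def chronological_split(rows, train_frac=0.6, val_frac=0.2):
    """Split date-ordered rows into train/val/test WITHOUT shuffling.

    Assumes `rows` is already sorted from oldest to newest, so every
    validation and test row is later in time than every training row.
    """
    n = len(rows)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


# Toy usage with made-up labels and ten consecutive "days".
score = fbeta_score([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
train, val, test = chronological_split(list(range(10)))
```

Keeping the split chronological avoids training on days that come after the days being predicted, which would leak future information into the model.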
Analyze Gender Bias Using Language Models
Our team used deep learning to analyze gender bias in children's fairy tales. We trained an LSTM model and a transformer model to perform a masked language modeling task, and analyzed the gender bias in these texts by examining the embedding matrices and eigenvalues.
Plant Coverage Rate Estimation Using Classification
This project was part of my internship at Shanghai Roots & Shoots: Million Tree Project, a non-profit organization that plants trees to fight deforestation. I trained a classification algorithm to classify pixels in drone images of tree plantations to estimate plant coverage rates.
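As a rough illustration of how per-pixel labels turn into a coverage rate, here is a minimal sketch in plain Python. It is not the classifier trained in the project: a simple excess-green threshold stands in for the trained model, and the names `excess_green` and `coverage_rate`, the threshold value, and the toy image are all hypothetical.

```python
def excess_green(pixel):
    """Excess-green index 2G - R - B for one (R, G, B) pixel."""
    r, g, b = pixel
    return 2 * g - r - b


def coverage_rate(image, threshold=20):
    """Fraction of pixels labeled as plant.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    A pixel counts as plant when its excess-green index exceeds
    `threshold`; this rule is a stand-in for a trained classifier.
    """
    total = 0
    plant = 0
    for row in image:
        for pixel in row:
            total += 1
            if excess_green(pixel) > threshold:
                plant += 1
    return plant / total if total else 0.0


# Toy 2x2 "drone image": two green pixels, two grayish-brown pixels.
toy_image = [
    [(30, 120, 40), (120, 110, 100)],
    [(20, 200, 30), (90, 80, 70)],
]
rate = coverage_rate(toy_image)
```

Whatever model produces the labels, the coverage rate itself is just the share of pixels assigned to the plant class.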
Image Segmentation for Leaf Disease Detection
I used spectral clustering to segment leaf images and identify tar spots on maple leaves. I also explored the theory behind constructing a normalized Laplacian, as well as the impact of different parameter choices on the segmentation.
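For context, one common form of the normalized Laplacian is L = I - D^(-1/2) W D^(-1/2), where W is a pixel affinity matrix and D its diagonal degree matrix. The sketch below builds it in plain Python for a handful of grayscale intensities. It is an illustration under assumptions rather than the project's code: the Gaussian affinity, the `sigma` parameter, and the function names are hypothetical choices.

```python
import math


def gaussian_affinity(values, sigma=1.0):
    """Affinity W[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2)), zero on the diagonal."""
    n = len(values)
    return [
        [
            0.0 if i == j
            else math.exp(-((values[i] - values[j]) ** 2) / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2)."""
    n = len(W)
    degrees = [sum(row) for row in W]
    # Nodes with zero degree get a scaling factor of 0 to avoid division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    return [
        [
            (1.0 if i == j else 0.0) - inv_sqrt[i] * W[i][j] * inv_sqrt[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy intensities: two dark pixels and two bright pixels.
intensities = [0.10, 0.15, 0.90, 0.95]
W = gaussian_affinity(intensities, sigma=0.2)
L = normalized_laplacian(W)
```

In this sketch, a smaller `sigma` makes affinities between dissimilar pixels drop toward zero faster, which is one example of a parameter choice that changes the resulting segmentation.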