This summer, I had the opportunity to intern with Zirous, researching and building a machine learning project. Having just finished my first year at Iowa State University, where I major in mathematics and data science, I was thrilled to be part of this project.
When I started my internship, I had a basic understanding of Python and R. I had done some very basic machine learning in Python using Scikit-Learn, but my knowledge of the subject was limited, and I had no idea what to expect from an internship focused specifically on ML. Of course, I knew about the big, flashy examples of ML: Google’s ability to classify spam emails, Netflix’s movie suggestions, and Facebook’s facial recognition. What I didn’t understand was why it might matter to an ordinary business.
Why care about machine learning?
This summer, I learned a great deal about the value machine learning can bring to a business. ML allows companies to do better-targeted marketing and customer segmentation, driving revenue and business growth. In today’s world, people are busy; they don’t have time to search through a list of 50 coupons to find one they might use. With machine learning, a company can personalize the ads each customer sees.
ML also has applications in record linkage and database management. Identifying duplicate records within one database, or matching records across two databases, is often done manually. ML tools can automate much of this work and make the process more efficient.
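To make the record-matching idea concrete, here is a minimal sketch of flagging likely duplicates. It is a simple rule-based baseline using string similarity from Python’s standard `difflib`, not a trained ML model (an ML approach would instead learn the match decision from labeled pairs). The record fields, the `likely_duplicates` helper, and the 0.9 threshold are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record: dict) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences don't count
    return " ".join(f"{record['name']} {record['city']}".lower().split())


def likely_duplicates(records: list[dict], threshold: float = 0.9) -> list[tuple[int, int, float]]:
    """Return index pairs whose normalized text similarity is at least `threshold`."""
    pairs = []
    # Compare every pair of records once (quadratic, fine for a small example)
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical customer records; the first two differ only in formatting
customers = [
    {"name": "Jane Smith", "city": "Des Moines"},
    {"name": "jane  smith", "city": "Des Moines"},
    {"name": "John Doe", "city": "Ames"},
]
print(likely_duplicates(customers))
```

Comparing every pair grows quadratically with the number of records, so tools built for large databases usually narrow the candidate pairs first (for example, only comparing records that share a city) before scoring them.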
These are only two of the many use cases for machine learning. A company that isn’t prepared to keep up with the evolving technology, or to consider how ML could benefit it, risks falling behind.
How do you do machine learning?
Doing machine learning well requires background knowledge, so I spent the first portion of my internship doing research. Machine learning is a broad subject, but the internet is full of tutorials and sample datasets for gaining hands-on experience.
The next part was a little more difficult. We were ready to create a machine learning demo, but first, we needed data. After scouring the internet for available data that could represent a business use case, we discovered New York’s Citi Bike data. This bike-share program publishes a record for each time a bike is checked out. Our goal with this data was to predict the number of riders on a given day.
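Because the raw data has one row per trip while our goal was a count per day, the trips first have to be rolled up by date. The sketch below shows one way to do that with Python’s standard library. It assumes a CSV with a `started_at` column in `YYYY-MM-DD HH:MM:SS` format; the column name and timestamp format are assumptions, so check them against the actual files, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def daily_ride_counts(csv_file, time_column: str = "started_at") -> dict[str, int]:
    """Count trips per calendar day from a trip-level CSV."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Assumed timestamp format; adjust it to match the real data
        start = datetime.strptime(row[time_column], "%Y-%m-%d %H:%M:%S")
        counts[start.date().isoformat()] += 1
    # Return days in chronological order
    return dict(sorted(counts.items()))


# Made-up sample rows standing in for a downloaded trip file
sample = io.StringIO(
    "ride_id,started_at\n"
    "a1,2019-07-01 08:15:00\n"
    "a2,2019-07-01 17:40:12\n"
    "a3,2019-07-02 09:05:33\n"
)
print(daily_ride_counts(sample))
```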
How did we create our model?
One of the biggest things I took away from my internship this summer is the importance of data. Quality machine learning cannot be done without quality data. Machine learning is not magic; it’s math. The algorithms need to be able to find relationships, patterns, structure, and correlations within the data.
This means there is some prep work involved in creating a meaningful model. The most time-consuming part of creating our prediction model was not the machine learning itself; it was preparing our data. We needed to clean the data and reshape it to fit our needs, using techniques such as feature engineering.
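Feature engineering means turning raw values into inputs a model can use. As an illustration only (these are not necessarily the features our team used), the sketch below turns a date and a weather reading into numbers such as the day of the week and a weekend flag. The `make_features` helper and the rain flag are hypothetical.

```python
from datetime import date


def make_features(day: date, temp_c: float, precip_mm: float) -> dict[str, float]:
    """Turn a date and weather reading into numeric model inputs (illustrative choices)."""
    return {
        "day_of_week": float(day.weekday()),          # Monday=0 ... Sunday=6
        "is_weekend": float(day.weekday() >= 5),      # commuting patterns differ on weekends
        "month": float(day.month),                    # rough proxy for season
        "temp_c": temp_c,
        "precip_mm": precip_mm,
        "is_rainy": float(precip_mm > 0.0),           # hypothetical yes/no rain flag
    }


print(make_features(date(2019, 7, 6), temp_c=29.5, precip_mm=0.0))
```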
With our data prepared, we were ready to create our ML model. After some trial and error, our final model worked: given the date and weather conditions, it could predict the number of riders.
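A plain linear regression is one common starting point for this kind of prediction. The sketch below fits one with NumPy’s least-squares solver; scikit-learn’s `LinearRegression` would do the same job. The numbers are made up, not Citi Bike data, and were constructed to follow an exact linear rule so the fit is easy to check. Real ridership will not be exactly linear, which is part of why trial and error is needed.

```python
import numpy as np


def fit_linear_model(features: np.ndarray, rides: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept column; returns [intercept, *weights]."""
    design = np.column_stack([np.ones(len(features)), features])
    coefs, *_ = np.linalg.lstsq(design, rides, rcond=None)
    return coefs


def predict(coefs: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply the fitted intercept and weights to new feature rows."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coefs


# Made-up illustrative rows, NOT real data: columns are is_weekend, temp_c, precip_mm
X = np.array([
    [0.0, 25.0, 0.0],
    [0.0, 18.0, 5.0],
    [1.0, 27.0, 0.0],
    [1.0, 15.0, 10.0],
    [0.0, 30.0, 1.0],
])
# Daily rides for each row above (also made up)
y = np.array([40000.0, 31400.0, 37600.0, 22000.0, 43400.0])

coefs = fit_linear_model(X, y)
tomorrow = np.array([[1.0, 24.0, 2.0]])  # a warm, slightly rainy weekend day
print(round(float(predict(coefs, tomorrow)[0])))
```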
As I complete my internship with Zirous, I am excited to continue my education. I want to keep developing the skills I gained this summer to prepare myself for a career in this industry.