At the start of summer, I was told I’d be working on a machine learning project for the length of my internship. I was a little nervous, since all I had to draw on was what I had learned in my classes. I’d just finished my junior year at Drake University, where I am majoring in Data Analytics, Information Systems, and Computer Science. At that point in my education, I had taken a class on artificial intelligence and machine learning.
Getting Started with Machine Learning
It turned out that I didn’t need to worry about being unprepared or not knowledgeable enough. I spent much of the first month of my internship researching the many different ways to apply machine learning. I learned more precisely what it is and how it could be used. Machine learning is such a broad topic that I could have spent the whole summer researching, and I still would have been learning new things about it every day.
When researching, I looked not only at how machine learning works and how it can be implemented, but also at the different ways you can develop your models. For example, did we have to code everything ourselves, or was there a program that could do that for us? It turned out that both were options. We spent weeks doing tutorials for systems like Azure Machine Learning Studio and Amazon Web Services Machine Learning, and testing the different ways to make models in Python and R.
After researching how machine learning works, we started focusing on different ways the company could implement it. We found many applications that would be a good demo for clients, but from there, we ran into a common problem with machine learning.
Where could we find our data?
This was a problem that stumped us for a while. The internet is full of sources for data, but the ones we found never quite met our needs. We also learned some valuable lessons. While you can use any data set for machine learning, you need to understand what the data is saying in order to know whether the results are right. The data set should also be fairly large: in general, the more data there is, the better the model will be able to learn the patterns. After some searching and some trial and error, we found the Citi Bike data set.
Creating Our Model
After finding a data set, we started to prepare the data for use. This is the most time-consuming part of making a model. Not only do you have to clean the data, as you would when using data for anything, but you also have to do feature engineering. Figuring out which features are the most important is a long process that you will likely have to repeat further down the line. For our problem, the data provided was a lot more granular than we needed. Even once we had transformed the data, there was still a large amount of it.
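As a rough illustration of reducing granularity, the sketch below rolls individual trip records up into hourly counts per station and derives two simple features (hour of day and a weekend flag). The record layout, station names, dates, and the choice of hourly counts are all assumptions made for this example; they are not the features or target we actually used.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (start time, start station). The real data set
# has more columns; these two are enough for the illustration.
trips = [
    ("2018-06-01 08:05:12", "Station A"),
    ("2018-06-01 08:47:30", "Station A"),
    ("2018-06-01 09:10:00", "Station A"),
    ("2018-06-02 08:15:45", "Station B"),
]

def hourly_counts(records):
    """Count trips per (station, date, hour) bucket."""
    counts = Counter()
    for started_at, station in records:
        ts = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S")
        counts[(station, ts.date(), ts.hour)] += 1
    return counts

def to_feature_rows(counts):
    """Turn each bucket into a row with simple engineered features."""
    rows = []
    for (station, day, hour), n in sorted(counts.items()):
        rows.append({
            "station": station,
            "hour": hour,
            "is_weekend": day.weekday() >= 5,  # Saturday=5, Sunday=6
            "trip_count": n,
        })
    return rows

rows = to_feature_rows(hourly_counts(trips))
for row in rows:
    print(row)
```

Four trip records collapse into three rows here. On a full data set, the same kind of roll-up can shrink millions of trips into a much smaller table, although the result can still be large.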
Once the data was all prepared, we started training models on it with different algorithms to see which performed best. There are many popular algorithms for machine learning. In Python and R, there are packages that come with the algorithms pre-coded. The two systems we used also had preset algorithms that you could choose from when making your model.
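To give a feel for what comparing algorithms looks like, here is a minimal sketch that fits two very simple candidates on the same training rows and scores each one on held-out rows. The data is synthetic, and the two candidates (a mean baseline and a least-squares line) are stand-ins chosen so the example runs with only the standard library; they are not the algorithms we tried. In practice you would more likely call the pre-coded algorithms from a package such as scikit-learn.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the split and the noise are repeatable

# Synthetic stand-in data: y depends roughly linearly on x, plus noise.
xs = [float(i) for i in range(100)]
ys = [3.0 * x + 10.0 + random.gauss(0, 5) for x in xs]

# Hold out 20% of the rows so each candidate is scored on unseen data.
indices = list(range(len(xs)))
random.shuffle(indices)
test_idx, train_idx = indices[:20], indices[20:]
x_train = [xs[i] for i in train_idx]
y_train = [ys[i] for i in train_idx]
x_test = [xs[i] for i in test_idx]
y_test = [ys[i] for i in test_idx]

def fit_mean_baseline(x, y):
    """Candidate 1: always predict the training mean."""
    avg = mean(y)
    return lambda value: avg

def fit_line(x, y):
    """Candidate 2: ordinary least-squares straight line."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return lambda value: slope * value + intercept

def mean_absolute_error(model, x, y):
    """Average absolute gap between predictions and actual values."""
    return mean(abs(model(xi) - yi) for xi, yi in zip(x, y))

candidates = {"mean baseline": fit_mean_baseline, "linear regression": fit_line}
scores = {
    name: mean_absolute_error(fit(x_train, y_train), x_test, y_test)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)  # lower error is better
print(scores, "->", best)
```

The important part is the shape of the loop: every candidate sees the same training rows and is judged on the same held-out rows, so the scores are directly comparable.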
Launching Our Project
When we first started making models with our data, they didn’t do a good job of predicting. We had to go back and do more feature engineering. We also tuned the models to find the best parameters for the data. When it came to picking the model to move forward with, we looked at a few metrics to judge the overall accuracy. One metric we used for the algorithm we ended up choosing was R². Our final model had an R² of 0.95, which was a good value. We were wary of a value much higher than that, since a near-perfect score can be a sign that the model is overfit and will not hold up on future data. Once we had picked our model, the summer was about over. The full-time team members I worked with will continue refining the model to use for demos and exploring the different uses of machine learning in business.
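For readers who have not met the R-squared metric mentioned above, the sketch below computes it from its usual definition, one minus the ratio of the residual sum of squares to the total sum of squares. The numbers are made up purely for illustration and have nothing to do with our actual results.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up values, only to show the calculation.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

score = r_squared(actual, predicted)
print(round(score, 3))  # 0.964
```

A score of 1.0 means the predictions match the actual values exactly, and 0.0 means the model does no better than always guessing the average. One common way to check for overfitting is to compute the score on both the training data and data the model has never seen; a large gap between the two is the warning sign.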
As I finish up my internship, I am looking forward to my senior year at Drake University. The skills that I learned have made me more prepared to start working in this industry after graduation. The time I spent at Zirous this summer has been full of great experiences, learning and otherwise. Everyone knows that internships are a great way to get experience in the industry and figure out the future you want for yourself, but they are also an excellent way to build relationships for when you start your career.