I haven't given an overview on what the next few months will look like. I've broken my self study into 3 concurrent paths:
This path takes a bottom-up approach to machine learning. I'll be spending ~3 hours here daily:
Sequence of courses on Algebra, Linear Algebra, Calculus and Probability & Stats (~2 months).
Mathematics for Machine Learning (~1 month).
Stanford Machine Learning (~1 month).
deeplearning.ai's Deep Learning (~1 month).
deeplearning.ai's Natural Language Processing (~1 month).
This path takes a top-down approach to machine learning. I'll be spending ~4-5 hours here daily:
DataQuest's data science course (~2 months).
DataQuest's data engineering course (~1 month).
The rest of my time, ~4-5 hours, will be spent on capstone projects. I'm currently working on Nia, though I actually think this may take me the full 5-6 months. It's also my Master's thesis project so there's some admin that goes with it (as well as other taught modules).
With all that said, here's what I've been up to:
Finished all quizzes for Become an Algebra Master. This was a great recap, covering systems of equations, manipulation of functions, and more.
In this project, I visualised how the gender balance in college degrees changed over time. I investigated the following categories:
STEM degrees, like Physics and Computer Science.
Arts degrees, like English and Foreign Languages.
Other degrees, like Business and Public Admin.
This project gave me some more practice with matplotlib.
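As a rough illustration of the kind of matplotlib work this project involved, here's a minimal sketch that plots gender balance across degree categories over time. The data values and category picks here are hypothetical placeholders, not the project's actual dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headlessly
import matplotlib.pyplot as plt

# Hypothetical data: share of degrees awarded to women, by year.
years = [1970, 1980, 1990, 2000, 2010]
degrees = {
    "Computer Science": [13.6, 28.7, 29.9, 27.7, 17.6],
    "English": [65.6, 65.9, 66.9, 68.4, 67.9],
    "Business": [9.1, 36.8, 47.2, 49.8, 48.8],
}

def plot_gender_balance(years, degrees):
    """Plot the share of degrees awarded to women for each category."""
    fig, ax = plt.subplots(figsize=(8, 4))
    for name, pct_women in degrees.items():
        ax.plot(years, pct_women, label=name)
    ax.axhline(50, linestyle="--", linewidth=0.8)  # 50% parity line
    ax.set_ylabel("% of degrees awarded to women")
    ax.set_ylim(0, 100)
    ax.legend()
    return fig

fig = plot_gender_balance(years, degrees)
```

A dashed line at 50% makes it easy to see at a glance which fields sit above or below gender parity.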
In this project, I cleaned and analyzed exit surveys to learn how different factors affect employee resignation. I specifically looked at the following:
Length of service at the workplace.
Employment status at the time of resignation.
Gender of the employee.
This project involved a lot of data cleaning, so it was good practice on that front.
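To give a flavour of that cleaning work, here's a minimal pandas sketch with made-up column names and values (the real survey columns differ): filtering to resignations, extracting years from mixed date formats, deriving length of service, and normalising inconsistent labels.

```python
import pandas as pd

# Hypothetical exit-survey rows; real column names in the project differ.
raw = pd.DataFrame({
    "separationtype": ["Resignation-Other", "Retirement", "Resignation-Move"],
    "cease_date": ["2013", "05/2012", "2014"],
    "start_date": [2010.0, 1990.0, None],
    "gender": ["Female", "MALE ", "male"],
})

def clean_exit_surveys(df):
    """Keep only resignations, normalise text columns, derive service length."""
    df = df.copy()
    # Keep only rows where the employee resigned.
    df = df[df["separationtype"].str.startswith("Resignation")]
    # Pull a four-digit year out of mixed date formats, e.g. "05/2012".
    df["cease_year"] = df["cease_date"].str.extract(r"(\d{4})", expand=False).astype(float)
    # Length of service at the workplace, in years.
    df["service_years"] = df["cease_year"] - df["start_date"]
    # Normalise inconsistent gender labels like "MALE " vs "male".
    df["gender"] = df["gender"].str.strip().str.capitalize()
    return df

cleaned = clean_exit_surveys(raw)
```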
In this project, I investigated how different student demographics correlated with SAT scores. I looked at the following:
The perception of school safety.
The percentage of students from a certain race.
The gender balance in a school.
The percentage of students who took the Advanced Placement test.

I had to work with 6 different datasets on this project, so it was good practice with cleaning datasets and merging them on common columns. It also made me rethink how I structure my notebooks.
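The merging step looks roughly like the sketch below: two toy dataframes sharing a key column, combined with a left join so every school with an SAT score survives even if another dataset is missing it. The column names and values here are illustrative, not the project's actual data.

```python
import pandas as pd

# Hypothetical slices of two datasets sharing a school identifier column.
sat = pd.DataFrame({
    "school_id": ["01M292", "01M448"],
    "sat_score": [1122, 1172],
})
demographics = pd.DataFrame({
    "school_id": ["01M292", "01M448", "01M450"],
    "female_per": [52.0, 48.5, 61.2],
})

# A left join keeps every school that has an SAT score; with six datasets
# you would chain further .merge() calls on the same key column.
combined = sat.merge(demographics, on="school_id", how="left")
```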
I also implemented text search with Postgres. I opted for similarity-based search instead of match-based search. Here's how the approaches would differ if a user typed "hmburger" and the only accompanying entry in the database was "double bacon hamburger":
Similarity-based: tries to find the similarity between "hmburger" and "double bacon hamburger" using a trigram approach. Would match with a low confidence score because the strings aren't very similar.
Match-based: specifically checks whether the word "hamburger" appears in the search query. Would fail completely because "hamburger" has been misspelled.
In the end, I opted for similarity-based search because it behaves like a fuzzy search and I get to decide what the success threshold looks like.
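To make the trigram idea concrete, here's a small pure-Python sketch of how pg_trgm-style similarity works: lowercase each word, pad it with two leading spaces and one trailing space, slice out every three-character window, and score two strings by shared trigrams over the union. The actual project used Postgres's pg_trgm extension rather than Python, and the threshold below is a hypothetical choice.

```python
def trigrams(text):
    """Extract the set of 3-character substrings, pg_trgm style:
    lowercase, pad each word with two leading and one trailing space."""
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def similarity(a, b):
    """Shared trigrams divided by the size of the trigram union."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# The misspelled query still gets a nonzero score, so we can accept it
# against a chosen cutoff instead of requiring an exact word match.
score = similarity("hmburger", "double bacon hamburger")
matches = score > 0.1  # hypothetical success threshold
```

A match-based check would return nothing here, but the trigram score is positive because "hmburger" and "hamburger" share most of their three-character windows.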