Isaac Aderogba


Changelog #2 | ML study path

November 29th, 2020

I haven't yet given an overview of what the next few months will look like. I've broken my self-study into 3 concurrent paths:

Maths

This path takes a bottom-up approach to machine learning. I'll be spending ~3 hours here daily:

  1. Sequence of courses on Algebra, Linear Algebra, Calculus and Probability & Stats (~2 months).

  2. Mathematics for Machine Learning (~1 month).

  3. Stanford Machine Learning (~1 month).

  4. deeplearning.ai's Deep Learning (~1 month).

  5. deeplearning.ai's Natural Language Processing (~1 month).

Data Science

This path takes a top-down approach to machine learning. I'll be spending ~4-5 hours here daily:

  1. DataQuest's data science course (~2 months).

  2. DataQuest's data engineering course (~1 month).

  3. fast.ai's deep learning course, mixed in with weekly Kaggle projects (~3 months).

Shifting to the deep learning section will mean more interactive apps with Streamlit instead of static notebooks with Deepnote.

Capstone

The rest of my time, ~4-5 hours daily, will be spent on capstone projects. I'm currently working on Nia, though I think it may take me the full 5-6 months. It's also my Master's thesis project, so there's some admin that goes with it (as well as other taught modules).

With all that said, here's what I've been up to:

Maths

Finished all the quizzes for Become an Algebra Master. This was a great recap, covering systems of equations, manipulation of functions and more.

Data Science

Visualising the gender gap by college degree

In this project, I visualised how the gender balance in college degrees changed over time. I investigated the following categories:

  • STEM degrees, like Physics and Computer Science.

  • Arts degrees, like English and Foreign Languages.

  • Other degrees, like Business and Public Admin.

This project gave me some more practice with matplotlib.
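
For context, the core of the visualisation is just one line per degree category over time. Here's a minimal sketch of the idea, assuming a hypothetical CSV with a Year column and one column per degree (the file and column names below are placeholders, not the project's actual dataset):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical file: one row per year, one column per degree category,
    # values are the percentage of degrees awarded to women.
    degrees = pd.read_csv("percent_degrees_women.csv")

    fig, ax = plt.subplots()
    for category in ["Physics", "Computer Science", "English", "Business"]:
        ax.plot(degrees["Year"], degrees[category], label=category)

    ax.set_ylabel("% of degrees awarded to women")
    ax.legend()
    plt.show()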

Analyzing the resignations of dissatisfied employees

In this project, I cleaned and analyzed exit surveys to learn how different factors affect employee resignation. I specifically looked at the following:

  • Length of service at the workplace.

  • Employment status at the time of resignation.

  • Gender of the employee.

This project involved a lot of data cleaning, so it was good practice on that front.
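
To give a flavour of the cleaning involved, here's a rough sketch of the kind of steps I mean, using made-up column names rather than the real survey fields:

    import pandas as pd

    # Hypothetical column names; the real exit surveys are messier than this.
    survey = pd.read_csv("exit_survey.csv")

    # Normalise column names and trim stray whitespace in text fields.
    survey.columns = survey.columns.str.strip().str.lower().str.replace(" ", "_")
    survey["separation_type"] = survey["separation_type"].str.strip()

    # Keep only resignations and coerce length of service to a number.
    resignations = survey[survey["separation_type"].str.contains("Resignation", na=False)].copy()
    resignations["service_years"] = pd.to_numeric(resignations["service_years"], errors="coerce")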

Investigating the fairness of the Scholastic Aptitude Test (SAT)

In this project, I investigated how different student demographics correlated with SAT scores. I looked at the following:

  • The perception of school safety.

  • The percentage of students of each race.

  • The gender balance in a school.

  • The percentage of students in each school who took Advanced Placement exams.

I had to work with six different datasets on this project, so it was good practice with cleaning and merging datasets on common columns. It also made me rethink how I structure my notebooks.
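
The merging pattern itself is simple once every dataset shares a key. A minimal sketch, with hypothetical file names and a placeholder school_id join column standing in for the real identifier:

    import pandas as pd

    # Hypothetical files describing the same schools, merged on a shared key.
    sat = pd.read_csv("sat_results.csv")
    demographics = pd.read_csv("demographics.csv")
    surveys = pd.read_csv("school_surveys.csv")

    combined = (
        sat.merge(demographics, on="school_id", how="left")
           .merge(surveys, on="school_id", how="left")
    )

    # Correlate every numeric column with the overall SAT score.
    correlations = combined.corr(numeric_only=True)["sat_score"].sort_values()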

Capstone

With Nia, I integrated my custom spaCy food model with Rasa. Rasa takes the entities found in a sentence into account when trying to discern the intent of a user's message.
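
For a rough idea of the spaCy side of this (the model name and entity label below are placeholders, and the Rasa wiring isn't shown): the custom model tags food entities, and Rasa then uses those entities as extra signal when classifying intent.

    import spacy

    # "en_food_ner" and the "FOOD" label are placeholders for my custom model.
    nlp = spacy.load("en_food_ner")

    doc = nlp("Can I get a double bacon hamburger and a large fries?")
    foods = [ent.text for ent in doc.ents if ent.label_ == "FOOD"]
    print(foods)  # e.g. ['double bacon hamburger', 'large fries']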

I also implemented text search with Postgres, opting for similarity-based search instead of match-based search. Here's how the two approaches would differ if a user typed hmburger and the only entry in the database was double bacon hamburger:

  • Similarity-based: Tries to find the similarity between the word hmburger and double bacon hamburger using a trigram approach. It would match, but with a low confidence score because the two strings are only partly similar.

  • Match-based: Specifically checks if the words double, bacon or hamburger are in the search query. Would fail completely because "hamburger" has been misspelled.

In short, I opted for similarity-based search because it behaves more like a fuzzy search and I get to decide what the success threshold looks like.
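
Here's a rough sketch of what that looks like with Postgres's pg_trgm extension (the table name, column and threshold below are illustrative, not Nia's actual schema):

    import psycopg2

    # Hypothetical table/column names; requires the pg_trgm extension
    # (CREATE EXTENSION pg_trgm) for the similarity() function.
    conn = psycopg2.connect("dbname=nia")

    query = """
        SELECT name, similarity(name, %(term)s) AS score
        FROM menu_items
        WHERE similarity(name, %(term)s) > %(threshold)s
        ORDER BY score DESC;
    """

    with conn.cursor() as cur:
        cur.execute(query, {"term": "hmburger", "threshold": 0.2})
        matches = cur.fetchall()  # low but non-zero score for "double bacon hamburger"

The threshold in the WHERE clause is the tunable success cutoff I mentioned above.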