Machine Learning Hello World in Less than 50 lines

Supervised learning has proved useful for learning from labeled data. This article aims to help beginners and intermediate students implement their own machine learning models. Machine learning methods can be grouped into three main classes: supervised, unsupervised, and reinforcement learning.

What is Supervised Learning?

Imagine having a pile of oranges and apples mixed together. You want to sort them, but there are so many that doing it by hand would be tedious. You do, however, have a machine that could do the sorting, except it needs to know what distinguishes an apple from an orange. So all you need to do is build a classifier and embed it in your machine. Let's say you succeed in building the classifier using some features of both fruits, along with examples whose labels you already know, and your machine now separates the apples from the oranges. It's not by magic that your machine sorts the fruit but through a process called supervised learning: the classifier learns from labeled examples how to label new ones.
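
To make the fruit example concrete, here is a minimal sketch of such a classifier. The measurements (weight and texture) and the choice of a decision tree are assumptions made up for illustration, not something a real sorting machine would necessarily use.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up training examples: [weight in grams, texture]
# texture: 0 = bumpy, 1 = smooth (hypothetical encoding)
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = ["apple", "apple", "orange", "orange"]

# Learn a rule that separates the two fruits from the labeled examples
clf = DecisionTreeClassifier(random_state=0)
clf.fit(features, labels)

# Ask the classifier about a fruit it has not seen before
prediction = clf.predict([[160, 0]])[0]
print(prediction)
```

The labels are what make this supervised: the classifier is told the right answer for every training example and works out the rule itself.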

First Machine Learning Project

Now that you have a general idea of how supervised learning works, we are going to implement a model that predicts class labels from a dataset.

Things you will need installed:
  • Python (2 or 3)
  • pandas
  • NumPy
  • scikit-learn
If you have Anaconda installed, you don't need to install these, as Anaconda comes with Python and the libraries above already installed. Having problems? Feel free to send me a message describing the problem and I will be glad to help.
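
One quick way to check that everything is in place is to import each library and print its version. This is only a sanity check I am suggesting here. If any import fails, that library still needs to be installed.

```python
import sys

import numpy
import pandas
import sklearn

# Print the version of Python and of each required library
print("Python: {}".format(sys.version.split()[0]))
print("NumPy: {}".format(numpy.__version__))
print("pandas: {}".format(pandas.__version__))
print("scikit-learn: {}".format(sklearn.__version__))
```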

I will be using Google's online version of the open-source Jupyter Notebook for this tutorial. Our dataset will be the classic Iris dataset. The first thing we need to do is import the data. Fortunately, scikit-learn already ships with this dataset, so there is no need to download it. The rest of the project is pure Python code, a snippet of which can be found below. The full code is in my GitHub repository.


# Import libraries needed
from sklearn import datasets
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Get the iris Dataset
ds = datasets.load_iris()
print('label_names: {}'.format(ds.target_names))
print('feature_names: {}'.format(ds.feature_names))
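
For readers who want to see where the remaining imports are heading, the next steps might look roughly like the sketch below: split the data, fit a model, and measure accuracy. The split size, the random seed, and the choice of RandomForestClassifier over LogisticRegression are assumptions for illustration and may differ from the full version of the code.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the iris data again so this sketch runs on its own
ds = datasets.load_iris()

# Hold out 30% of the samples for testing (illustrative choice)
X_train, X_test, y_train, y_test = train_test_split(
    ds.data, ds.target, test_size=0.3, random_state=42, stratify=ds.target
)

# Fit the classifier on the training portion only
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Compare predictions on the held-out samples with the true labels
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print('accuracy: {:.2f}'.format(accuracy))
```

Holding out a test set matters because accuracy measured on the training data would only tell you how well the model memorized examples it has already seen.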


The full version of the code, with outputs, can be found here.
