Machine learning and data analytics begin with exploring the data. In this hands-on tutorial, we use Python with Pandas to filter data. Just as SQL filters rows with a WHERE clause, Pandas provides several ways to filter a DataFrame so that analysis can focus on a specific subset of the data.
For this hands-on machine learning tutorial, we use the Iris dataset from the UCI Machine Learning Repository.
Let's fetch the data and load it into a Pandas DataFrame:
import pandas as pd

df = pd.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None,
    sep=',')
df.columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)  # drops the empty line at file-end
You can print the DataFrame to check that the data loaded correctly.
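A minimal sketch of one way to inspect the data, assuming a quick look at the first rows, the shape, and the class counts is enough. So that it runs offline, the frame here is built from six hand-copied Iris rows (an assumption for illustration; the real `df` has 150 rows):

```python
import pandas as pd

# Six hand-copied Iris rows (two per class) so the example runs offline.
df = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
        [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
        [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
    ],
    columns=["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"],
)

print(df.head())                   # first rows (up to five)
print(df.shape)                    # (number of rows, number of columns)
print(df["class"].value_counts())  # how many rows each class has
```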
FILTER WITH A COMPARISON OPERATOR in PANDAS
# Select rows where df.petal_len is greater than 4.5
df[df['petal_len'] > 4.5]
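It may help to see what the comparison itself produces. The sketch below, using a small hand-copied sample of six Iris rows (an assumption so the code runs offline), shows that the condition yields a boolean Series and that indexing with it keeps only the rows marked True. The names `mask` and `selected` are illustrative:

```python
import pandas as pd

# Six hand-copied Iris rows (two per class) so the example runs offline.
df = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
        [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
        [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
    ],
    columns=["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"],
)

# The comparison is evaluated element-wise: one True/False per row.
mask = df["petal_len"] > 4.5
print(mask.tolist())

# Indexing the frame with the mask keeps the rows where it is True.
selected = df[mask]
print(selected)
```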
FILTER WITH ‘AND’ LOGICAL OPERATOR in PANDAS
# Select rows where df.petal_len is greater than 4.5 AND less than 5.5
df[(df['petal_len'] > 4.5) & (df['petal_len'] < 5.5)]
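For readers coming from SQL, the same AND filter can also be sketched with `DataFrame.query`, which takes the condition as a string. This is an alternative spelling, not the approach used above, and the six-row sample frame is an assumption so the code runs offline:

```python
import pandas as pd

# Six hand-copied Iris rows (two per class) so the example runs offline.
df = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
        [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
        [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
    ],
    columns=["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"],
)

# Boolean-mask form: each condition needs its own parentheses,
# because & binds more tightly than the comparison operators.
via_mask = df[(df["petal_len"] > 4.5) & (df["petal_len"] < 5.5)]

# query form: the condition is written as a string, closer to a SQL WHERE clause.
via_query = df.query("petal_len > 4.5 and petal_len < 5.5")

print(via_mask.equals(via_query))
```

The mask form works with any column name, while the query string is easiest with names that are valid Python identifiers and not keywords.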
FILTER WITH ‘OR’ LOGICAL OPERATOR in PANDAS
# Select rows where df.petal_len is greater than 5.5 OR less than 2.0
df[(df['petal_len'] > 5.5) | (df['petal_len'] < 2.0)]
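A common stumbling block is reaching for Python's `or` keyword instead of `|`. The sketch below, on a six-row hand-copied sample (an assumption so the code runs offline), shows that `or` raises a `ValueError` on a Series while `|` combines the conditions row by row. The names `low`, `high`, `raised`, and `either` are illustrative:

```python
import pandas as pd

# Six hand-copied Iris rows (two per class) so the example runs offline.
df = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
        [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
        [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
    ],
    columns=["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"],
)

low = df["petal_len"] < 2.0
high = df["petal_len"] > 5.5

# Python's `or` asks for a single truth value of a whole Series,
# which pandas refuses to guess.
try:
    df[high or low]
    raised = False
except ValueError:
    raised = True
print("`or` raised ValueError:", raised)

# The element-wise operator | evaluates the OR for every row.
either = df[high | low]
print(either)
```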
FILTER WITH ‘NOT’ OPERATOR in PANDAS
# Select all the classes (Iris flower types) except Iris-virginica
df[~(df['class'] == 'Iris-virginica')]
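Two related spellings may be useful here, sketched on a six-row hand-copied sample (an assumption so the code runs offline): `!=` gives the same result as negating `==`, and `~` combined with `Series.isin` excludes several classes at once. The variable names are illustrative:

```python
import pandas as pd

# Six hand-copied Iris rows (two per class) so the example runs offline.
df = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
        [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
        [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
    ],
    columns=["sepal_len", "sepal_wid", "petal_len", "petal_wid", "class"],
)

# Negating an equality test and using != select the same rows.
negated = df[~(df["class"] == "Iris-virginica")]
not_equal = df[df["class"] != "Iris-virginica"]
print(negated.equals(not_equal))

# ~ with isin drops every row whose class appears in the list.
excluded = ["Iris-virginica", "Iris-versicolor"]
remaining = df[~df["class"].isin(excluded)]
print(remaining)
```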
Ali Raza received his Master's degree in Electronics Engineering, which involved research focused on machine learning. He is currently Chief Technical Officer at BitWits (Pvt) Limited, CEO & Founder at DataLysis.io, and CEO & Founder at LearningByDoing.io.