Data Analytics in Python – Data Filtering with Pandas – Learning By Doing

Machine learning and data analytics both start with exploring the data. In this hands-on tutorial, we will use Python with Pandas for data filtering. Just as a WHERE clause filters rows in SQL, Pandas provides several ways to filter a DataFrame so that analysis can focus on a specific subset of the data.

For this hands-on tutorial, we will use the Iris dataset from the UCI Machine Learning Repository.

Let's fetch the data and load it into a Pandas DataFrame:

import pandas as pd

df = pd.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None,
    sep=',')
df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']

df.dropna(how="all", inplace=True) # drops the empty line at file-end

You can inspect the first and last few rows of the data with:

df.head()
df.tail()

Filter Results:

# Select rows where df.petal_len is greater than 4.5
df[df['petal_len'] > 4.5]
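
The expression inside the brackets is evaluated first and produces a boolean Series (a mask) with one True/False value per row; indexing the DataFrame with that mask keeps only the True rows. The sketch below illustrates this on a small hand-made DataFrame. The name `sample` and its rows are made up for illustration and are not taken from the Iris file.

```python
import pandas as pd

# Hypothetical sample rows that reuse two of the Iris column names
sample = pd.DataFrame({
    'petal_len': [1.4, 4.7, 5.1, 6.0],
    'class': ['Iris-setosa', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

# The comparison alone returns a boolean Series, one value per row
mask = sample['petal_len'] > 4.5
print(mask.tolist())

# Indexing with the mask keeps only the rows where the mask is True
filtered = sample[mask]
print(filtered)
```

Storing the mask in a variable is optional, but it can make longer filters easier to read and reuse.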

FILTER WITH ‘AND’ LOGICAL OPERATOR in PANDAS

# Select rows where df.petal_len is greater than 4.5 AND less than 5.5
df[(df['petal_len'] > 4.5) & (df['petal_len'] < 5.5)]
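
The parentheses around each condition matter: in Python, `&` binds more tightly than `>` and `<`, so leaving them out changes how the expression is parsed and typically raises an error. If the parentheses feel noisy, `DataFrame.query` accepts the same condition as a string. The sketch below compares both forms on a small hand-made DataFrame (`sample` is a made-up name with made-up values, not the Iris data).

```python
import pandas as pd

# Hypothetical petal lengths, including the boundary values 4.5 and 5.5
sample = pd.DataFrame({'petal_len': [1.4, 4.5, 4.7, 5.1, 5.5, 6.0]})

# Boolean-mask form: each condition needs its own parentheses
with_mask = sample[(sample['petal_len'] > 4.5) & (sample['petal_len'] < 5.5)]

# query() form: the same condition written as a string expression
with_query = sample.query('petal_len > 4.5 and petal_len < 5.5')

print(with_mask)
print(with_query)
```

Both comparisons are strict, so the boundary rows 4.5 and 5.5 are excluded in either form.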

FILTER WITH ‘OR’ LOGICAL OPERATOR in PANDAS

# Select rows where df.petal_len is greater than 5.5 OR less than 2.0
df[(df['petal_len'] > 5.5) | (df['petal_len'] < 2.0)]
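
When the OR conditions are all equality checks on the same column, `Series.isin` is a common shorthand. The sketch below is one possible way to write it, using a small hand-made DataFrame (`sample` and its rows are made up for illustration).

```python
import pandas as pd

# Hypothetical sample rows that reuse two of the Iris column names
sample = pd.DataFrame({
    'petal_len': [1.4, 4.7, 6.0],
    'class': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

# Two equality checks joined with the OR operator
with_or = sample[(sample['class'] == 'Iris-setosa') |
                 (sample['class'] == 'Iris-virginica')]

# The same selection expressed with isin()
with_isin = sample[sample['class'].isin(['Iris-setosa', 'Iris-virginica'])]

print(with_isin)
```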

FILTER WITH ‘NOT’ OPERATOR in PANDAS

# Select all the classes (Iris flower types) except Iris-virginica
df[~(df['class'] == 'Iris-virginica')]
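
For a single equality check, `!=` gives the same result as negating with `~`. The `~` operator becomes more useful when it inverts a whole compound condition. The sketch below shows both cases on a small hand-made DataFrame (`sample` and its rows are made up for illustration).

```python
import pandas as pd

# Hypothetical sample rows that reuse two of the Iris column names
sample = pd.DataFrame({
    'petal_len': [1.4, 4.7, 5.1, 6.0],
    'class': ['Iris-setosa', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

# Negating an equality check with ~ ...
with_not = sample[~(sample['class'] == 'Iris-virginica')]

# ... selects the same rows as the != comparison
with_ne = sample[sample['class'] != 'Iris-virginica']

# ~ can also invert a compound condition: rows NOT strictly between 4.5 and 5.5
in_range = (sample['petal_len'] > 4.5) & (sample['petal_len'] < 5.5)
outside = sample[~in_range]

print(with_not)
print(outside)
```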