Data Analytics in Python – Data Filtering with Pandas – Learning By Doing


 

Machine learning and data analytics both start with exploring the data. In this hands-on tutorial, we will use Python with Pandas to filter data. Just as SQL filters rows with a WHERE clause, Pandas provides several ways to select a specific subset of rows for analysis.

For this hands-on tutorial, we will use the Iris dataset from the UCI Machine Learning Repository.

Let's fetch the data and load it into a Pandas DataFrame:

 

import pandas as pd

df = pd.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None,
    sep=',')
df.columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']

df.dropna(how="all", inplace=True) # drops the empty line at file-end
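
To try the loading step without a network connection, the same read_csv options can be pointed at an in-memory buffer. The sketch below is only an illustration: the three rows in raw are hand-typed sample values in the Iris file layout, and the names raw, columns and demo are hypothetical. It also passes the column names through the names parameter of read_csv, which is an alternative to assigning df.columns afterwards.

```python
import io

import pandas as pd

# Hand-typed sample rows in the same layout as iris.data (illustrative only),
# followed by an empty line like the one at the end of the real file.
raw = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "\n"
)

columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']

# header=None says the file has no header row; names= supplies the labels.
demo = pd.read_csv(io.StringIO(raw), header=None, names=columns)

# Drop any row in which every column is missing.
demo = demo.dropna(how="all")

print(demo)
```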

You can inspect the first and last rows of the data with:

df.head()
df.tail()

FILTER WITH A SINGLE CONDITION in PANDAS

# Select rows where df.petal_len is greater than 4.5
df[df['petal_len'] > 4.5]
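
It may help to see what happens inside the square brackets. The comparison produces a boolean Series (often called a mask), and indexing the DataFrame with that mask keeps only the rows marked True. The sketch below uses a small hand-made DataFrame named sample (hypothetical values, not the real Iris data) so that the result is easy to verify by eye.

```python
import pandas as pd

# Hypothetical sample rows that reuse the tutorial's column names.
sample = pd.DataFrame({
    'petal_len': [1.4, 4.7, 5.1, 6.0],
    'class': ['Iris-setosa', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
})

# The comparison is evaluated row by row and yields one True/False per row.
mask = sample['petal_len'] > 4.5

# Indexing with the mask keeps only the rows where the mask is True.
filtered = sample[mask]

print(mask.tolist())
print(filtered)
```

Storing the mask in a variable is optional, but it makes longer filters easier to read and reuse.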

FILTER WITH ‘AND’ LOGICAL OPERATOR in PANDAS

# Select rows where df.petal_len is greater than 4.5 AND less than 5.5
df[(df['petal_len'] > 4.5) & (df['petal_len'] < 5.5)]
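
Each condition is wrapped in parentheses because & binds more tightly than comparison operators such as > and <. As a rough sketch, the same AND filter can also be written with DataFrame.query, which takes the condition as a string. The DataFrame sample and its values are hypothetical and chosen to include both boundary values.

```python
import pandas as pd

# Hypothetical values, including the boundaries 4.5 and 5.5.
sample = pd.DataFrame({'petal_len': [1.4, 4.5, 4.7, 5.1, 5.5, 6.0]})

# Boolean-mask form: both conditions must hold for a row to be kept.
with_mask = sample[(sample['petal_len'] > 4.5) & (sample['petal_len'] < 5.5)]

# query() form: the same condition expressed as a string.
with_query = sample.query('petal_len > 4.5 and petal_len < 5.5')

print(with_mask)
print(with_mask.equals(with_query))
```

Both comparisons are strict, so rows equal to 4.5 or 5.5 are excluded in either form.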

FILTER WITH ‘OR’ LOGICAL OPERATOR in PANDAS

# Select rows where df.petal_len is greater than 5.5 OR less than 2.0
df[(df['petal_len'] > 5.5) | (df['petal_len'] < 2.0)]
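
A common stumbling block is reaching for Python's or keyword instead of the | operator. The keyword asks for a single truth value for the whole Series, which Pandas refuses to give, so it raises a ValueError. The sketch below shows both behaviours on a hypothetical DataFrame named sample; the variable keyword_error exists only for this illustration.

```python
import pandas as pd

# Hypothetical sample values for illustration.
sample = pd.DataFrame({'petal_len': [1.4, 4.7, 5.1, 6.0]})

# Element-wise OR: a row is kept if at least one condition is True.
either = sample[(sample['petal_len'] > 5.5) | (sample['petal_len'] < 2.0)]
print(either)

# The plain `or` keyword does not work element-wise on a Series.
try:
    sample[(sample['petal_len'] > 5.5) or (sample['petal_len'] < 2.0)]
    keyword_error = None
except ValueError as exc:
    keyword_error = exc

print(type(keyword_error).__name__)
```

The same applies to and versus &, and to not versus ~.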

FILTER WITH ‘NOT’ OPERATOR in PANDAS

# Select all the classes (Iris flower types) except Iris-virginica
df[~(df['class'] == 'Iris-virginica')]
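
For a single value, negating an equality test with ~ gives the same rows as using != directly. The ~ operator becomes more useful when negating a larger condition, for example one built with Series.isin to exclude several classes at once. A minimal sketch on a hypothetical DataFrame named sample:

```python
import pandas as pd

# Hypothetical rows, one per Iris class.
sample = pd.DataFrame({
    'petal_len': [1.4, 4.7, 6.0],
    'class': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

# Negating an equality test...
negated = sample[~(sample['class'] == 'Iris-virginica')]

# ...selects the same rows as a direct inequality test.
not_equal = sample[sample['class'] != 'Iris-virginica']

# Negating isin() excludes every class in the list.
excluded = ['Iris-virginica', 'Iris-versicolor']
remaining = sample[~sample['class'].isin(excluded)]

print(negated.equals(not_equal))
print(remaining)
```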

Electronics Engineer by book, Software Architect and Technopreneur by passion, Open Source Enthusiast, Problem Hacker, Enabler, Do-Tank, Blogger, Autodidact, Yogi and an avid Reader. Involved in Building Products. Having loads of experience and technical expertise in areas ranging from Full Stack Web Application Development to Big Data Analysis, Modeling, Processing and Visualization, he is currently involved in working on Python, Django, Javascript, SQL, Bootstrap, PostgreSQL, RRD (Round Robin Database), MySQL, MonetDB, LevelDB, BerkeleyDB, Redis, Apache Spark, Pandas, SciPy, NumPy etc.

Ali Raza received his Masters Degree in Electronics Engineering which involved Research focused on Machine Learning. He is currently working as a Chief Technical Officer at BitWits (Pvt) Limited, CEO & Founder at DataLysis.io and CEO & Founder at LearningByDoing.io.
