In Data Science and Machine Learning, the first step after getting access to data is to analyze it. Data analysis is the foundation for extracting any valuable information from the data.
Before applying any Machine Learning model or technique, it is necessary to understand the data's attributes and dimensions so it can be treated accordingly. In this tutorial, we will take a hands-on approach and analyze an actual dataset used for Machine Learning. We will use Python and Pandas, working with the .loc and .iloc indexers (.ix, the older mixed label/position indexer, was deprecated in pandas 0.20 and removed in pandas 1.0). We will start by loading the data and defining its labels and classes as per the data description in the Machine Learning Data Repository.
import pandas as pd

df = pd.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None,
    sep=',')
df.columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)  # drops the empty line at file-end
df.head()
df.tail()
df = df.set_index('class')
SELECTING A COLUMN IN PANDAS:
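A minimal sketch of single-column selection, assuming a small hand-made frame named toy (hypothetical, with made-up values) that reuses the column names defined above and has the class as its index:

```python
import pandas as pd

# Hypothetical toy frame mirroring the Iris column names; values are illustrative only
toy = pd.DataFrame(
    {
        "sepal_len": [5.1, 7.0, 6.3],
        "sepal_wid": [3.5, 3.2, 3.3],
        "petal_len": [1.4, 4.7, 6.0],
        "petal_wid": [0.2, 1.4, 2.5],
    },
    index=pd.Index(["Iris-setosa", "Iris-versicolor", "Iris-virginica"], name="class"),
)

# Bracket selection with a single column name returns a Series
sepal_len = toy["sepal_len"]

# .loc with a full row slice and one column label returns the same Series
sepal_len_loc = toy.loc[:, "sepal_len"]
```

Both forms return a Series that keeps the class index of the frame.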
SELECTING MULTIPLE COLUMNS IN PANDAS:
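A minimal sketch of selecting several columns at once, again assuming a hypothetical toy frame with made-up values and the same column names as the Iris data:

```python
import pandas as pd

# Hypothetical toy frame mirroring the Iris column names; values are illustrative only
toy = pd.DataFrame(
    {
        "sepal_len": [5.1, 7.0, 6.3],
        "sepal_wid": [3.5, 3.2, 3.3],
        "petal_len": [1.4, 4.7, 6.0],
        "petal_wid": [0.2, 1.4, 2.5],
    },
    index=pd.Index(["Iris-setosa", "Iris-versicolor", "Iris-virginica"], name="class"),
)

# A list of column names inside the brackets returns a DataFrame
lengths = toy[["sepal_len", "petal_len"]]

# The same selection written with .loc
lengths_loc = toy.loc[:, ["sepal_len", "petal_len"]]

# A label slice with .loc includes both end labels
sepal_to_petal_len = toy.loc[:, "sepal_len":"petal_len"]
```

Unlike positional slices, the label slice includes its end label, so the last selection contains three columns.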
SELECTING ALL ROWS BY INDEX LABEL:
# Select all rows with class 'Iris-virginica'
df.loc['Iris-virginica']
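Because the class index contains repeated labels, a single label can match many rows. A minimal sketch of that behaviour, assuming a hypothetical toy frame with made-up values in which 'Iris-virginica' appears twice:

```python
import pandas as pd

# Hypothetical toy frame with a repeated index label; values are illustrative only
toy = pd.DataFrame(
    {"petal_len": [1.4, 4.7, 6.0, 5.1], "petal_wid": [0.2, 1.4, 2.5, 1.9]},
    index=pd.Index(
        ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
        name="class",
    ),
)

# The label matches two rows, so .loc returns a DataFrame with both of them
virginica = toy.loc["Iris-virginica"]

# Passing the label inside a list always returns a DataFrame
setosa = toy.loc[["Iris-setosa"]]
```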
SELECTING ROWS IN PANDAS:
# Select the first five rows (positions 0 to 4; the end position is excluded)
df.iloc[:5]
# Select the fourth and fifth rows (positions 3 and 4)
df.iloc[3:5]
# Select every row after the fifth row (position 5 onwards)
df.iloc[5:]
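Positional slices with .iloc are zero-based and exclude the end position, which is easy to get wrong by one. A minimal sketch that makes the positions visible, assuming a hypothetical toy frame whose only column, row_number, counts rows starting from 1:

```python
import pandas as pd

# Hypothetical toy frame: row_number is the 1-based row count, for illustration only
toy = pd.DataFrame({"row_number": [1, 2, 3, 4, 5, 6, 7, 8]})

# Positions 0..4, i.e. the first five rows
first_five = toy.iloc[:5]

# Positions 3 and 4, i.e. the fourth and fifth rows
fourth_and_fifth = toy.iloc[3:5]

# Position 5 onwards, i.e. everything after the fifth row
after_fifth = toy.iloc[5:]

print(first_five["row_number"].tolist())        # [1, 2, 3, 4, 5]
print(fourth_and_fifth["row_number"].tolist())  # [4, 5]
print(after_fifth["row_number"].tolist())       # [6, 7, 8]
```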
SELECTING COLUMNS IN PANDAS:
# Select the first 2 columns
df.iloc[:, :2]
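Row and column positions can be combined in one .iloc call, and a row label can be paired with positionally chosen columns through .loc, which covers the mixed lookups .ix was typically used for. A minimal sketch, assuming a hypothetical toy frame with made-up values and the Iris column names:

```python
import pandas as pd

# Hypothetical toy frame mirroring the Iris column names; values are illustrative only
toy = pd.DataFrame(
    {
        "sepal_len": [5.1, 7.0, 6.3],
        "sepal_wid": [3.5, 3.2, 3.3],
        "petal_len": [1.4, 4.7, 6.0],
        "petal_wid": [0.2, 1.4, 2.5],
    },
    index=pd.Index(["Iris-setosa", "Iris-versicolor", "Iris-virginica"], name="class"),
)

# All rows, first two columns by position
first_two_cols = toy.iloc[:, :2]

# First two rows, first and third columns by position
corner = toy.iloc[:2, [0, 2]]

# One row by label, first two columns looked up by position via toy.columns
mixed = toy.loc["Iris-virginica", toy.columns[:2]]
```

Here mixed is a Series, because the label 'Iris-virginica' is unique in this toy index.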
Ali Raza received his Master's degree in Electronics Engineering, with research focused on Machine Learning. He is currently Chief Technical Officer at BitWits (Pvt) Limited, and CEO & Founder of DataLysis.io and LearningByDoing.io.