Overview of Indexing, Slicing, Filtering, and Sorting

Before learning Indexing, Slicing, Filtering, and Sorting, it is a good idea to review the Pandas library in the Python programming language. Data Scientists and Data Analysts routinely use these four concepts to analyze and manage datasets with the pandas library.

A quick review: what is the Pandas library? Pandas is a Python library that is popular among people who work with data, such as Data Scientists and Data Analysts, who use it to analyze, manage, and manipulate data. It provides data structures and functions that are very useful for working with tabular data. Its two main data structures, DataFrame and Series, are used to clean, combine, and manipulate data.
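As a quick refresher, here is a minimal sketch of the two data structures. The names and values are hypothetical sample data chosen only for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (hypothetical sample values).
ages = pd.Series([28, 32, 25], index=["Stuti", "Saumya", "Aaditya"], name="Age")

# A DataFrame is a two-dimensional table; each of its columns is a Series.
df = pd.DataFrame(
    {"Name": ["Stuti", "Saumya", "Aaditya"], "Age": [28, 32, 25]}
)

print(ages["Saumya"])            # look up a Series value by its label
print(type(df["Age"]).__name__)  # a single DataFrame column is a Series
print(df.shape)                  # (number of rows, number of columns)
```

The Series carries its own labels, while the DataFrame here falls back to the default integer row labels 0, 1, 2.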

For a more complete introduction to the pandas library, you can read the article at the following link: https://medium.com/@myskill.id/intro-to-pandas-45536013a6 and watch the MySkill learning videos at the following link: https://myskill.id/course/intro-to-pandas

Indexing in the Pandas Library in Python

Indexing is a key element of the Pandas library. It allows Data Scientists and Data Analysts to identify, access, and manipulate data in DataFrames and Series efficiently, according to their purpose and the given dataset. In the Pandas library, indexing serves two main purposes: as a row index and as a column index.
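To make the two purposes concrete, the sketch below builds a tiny DataFrame and inspects both indexes. The data is a hypothetical two-row sample, not a real dataset.

```python
import pandas as pd

# Hypothetical sample data: names as row labels, two data columns.
df = pd.DataFrame(
    {"Age": [28, 32], "City": ["Varanasi", "Delhi"]},
    index=["Stuti", "Saumya"],
)

row_labels = list(df.index)       # the row index: one label per row
column_labels = list(df.columns)  # the column index: one label per column

# A row label and a column label together identify a single cell.
value = df.loc["Saumya", "City"]

print(row_labels)
print(column_labels)
print(value)
```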

Row Index

A row index identifies each row in a DataFrame. Row indexes are usually labels or integers that represent specific rows. They allow Data Scientists and Data Analysts to access the data in a DataFrame by row label.

# Import pandas
import pandas as pd

# List of tuples, one per employee
employees = [
    ('Stuti', 28, 'Varanasi', 20000),
    ('Saumya', 32, 'Delhi', 25000),
    ('Aaditya', 25, 'Mumbai', 40000),
    ('Saumya', 32, 'Delhi', 35000),
    ('Saumya', 32, 'Delhi', 30000),
    ('Saumya', 32, 'Mumbai', 20000),
    ('Aaditya', 40, 'Dehradun', 24000),
    ('Seema', 32, 'Delhi', 70000),
]

# Create a DataFrame object from the list
df = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Salary'])

# Use the "Name" column as the row index
df.set_index("Name", inplace=True)

# Use the .loc[] operator to select multiple rows by label
result = df.loc[["Stuti", "Seema"]]

# Show the resulting DataFrame
print(result)


Example of Row Index in Pandas Library. Source: GeeksforGeeks.
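Because row indexes can be labels or integers, it may help to compare label-based access with position-based access. The sketch below uses a shortened, hypothetical version of the employee data above; note that a label that appears more than once returns every matching row.

```python
import pandas as pd

# Shortened, hypothetical version of the employee data.
df = pd.DataFrame(
    {
        "Name": ["Stuti", "Saumya", "Saumya", "Seema"],
        "City": ["Varanasi", "Delhi", "Mumbai", "Delhi"],
        "Salary": [20000, 25000, 20000, 70000],
    }
).set_index("Name")

by_label = df.loc["Stuti"]     # label-based: the row labeled "Stuti"
duplicates = df.loc["Saumya"]  # repeated label: all rows labeled "Saumya"
by_position = df.iloc[0]       # position-based: the first row, whatever its label

print(by_label["City"])
print(len(duplicates))
print(by_position["Salary"])
```

In short, .loc[] looks rows up by their index labels, while .iloc[] ignores labels and counts rows from zero.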

Column Index

A column index identifies each column in a DataFrame. It takes the form of a column name or label that represents a specific column. Column indexes allow Data Scientists and Data Analysts to access columns of data by name.
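As a minimal sketch of accessing columns by name, the example below selects one column, then several, from a small DataFrame. The data is hypothetical and only meant to illustrate the idea.

```python
import pandas as pd

# Hypothetical sample data with three labeled columns.
df = pd.DataFrame(
    {
        "Name": ["Stuti", "Seema"],
        "Age": [28, 32],
        "Salary": [20000, 70000],
    }
)

ages = df["Age"]                 # one column label -> Series
subset = df[["Name", "Salary"]]  # list of column labels -> DataFrame
third = df.iloc[:, 2]            # the same kind of access, by integer position

print(list(ages))
print(list(subset.columns))
print(third.name)
```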

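import pandas as pd

# Hypothetical sample data (values are assumptions); the example only
# needs a DataFrame with "Courses", "Fee", and "Duration" columns.
df = pd.DataFrame({
    'Courses': ['Python', 'Pandas'],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
})

# Get the integer position of a column from its name
# ("Duration" is the third column, so its zero-based position is 2)
idx = df.columns.get_loc("Duration")
print("Column Index : " + str(idx))

# Dictionary mapping each column name to its position
idx_dic = {}
for col in df.columns:
    idx_dic[col] = df.columns.get_loc(col)
print(idx_dic)

# Get the positions of multiple column labels/names
query_cols = ['Fee', 'Courses']
cols_index = [df.columns.get_loc(col) for col in query_cols]
print(cols_index)

# Same lookup using get_indexer(), which returns a NumPy array
cols_index = df.columns.get_indexer(query_cols)
print(cols_index)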

Example of Column Index in Pandas Library. Source: Spark By