Pandas is a Python library for data manipulation and analysis. Today, companies large and small collect huge amounts of data every day, largely because storing data has become cheap, especially in cloud services such as AWS and Microsoft Azure.
The next question for a business is how to use that data to make decisions; otherwise, paying cloud providers to store it brings no return.
Insights drawn from the collected data can be used either to improve business decision making or to improve the products the business offers.
To get there, the collected data first needs to be analysed well, which is why businesses keep hiring people who can analyse data.
Many tools for data analysis are available in the software market, and the Pandas Python library is one of them.
In this article, I've collected some commonly asked Pandas questions from job interviews.
To give some context on why learning Pandas is important: I searched LinkedIn Jobs for the keyword “Pandas”, filtered the results by “Past Month” and “United States”, and got “2701 results”, meaning roughly 2701 matching jobs were posted on LinkedIn in the USA in the last month. That gives a sense of how many businesses are looking for people who know the Pandas Python library.

The picture is similar in other countries such as Canada, Australia and India.
So today it is crucial to learn the Pandas Python library if you want a role such as Data Engineer or Data Scientist.
Q. 1 – What is Pandas Python Library?
Pandas is a Python library that makes it easy to represent data in memory and analyse it. It provides fast, flexible data structures, mainly the Series and the DataFrame, for processing tabular data.
Q. 2 – How does Pandas represent data?
Pandas represents data much like an Excel sheet: as a table made of rows and columns.

Pandas term | Meaning |
---|---|
Series | A single column: a one-dimensional labeled array |
DataFrame | A collection of Series that share the same row index |
Q. 3 – How to create Series in Pandas?

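A Series can be created with the pd.Series constructor from a list, a dictionary, or a scalar value. The sketch below shows these common cases; the values and labels are made up for illustration.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0, 1, 2, ...
s_from_list = pd.Series([10, 20, 30])

# From a list with explicit index labels
s_labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# From a dictionary: keys become the index labels, values become the data
s_from_dict = pd.Series({'India': 1100000, 'China': 45679000, 'USA': 3400000})

# From a scalar: the value is repeated once for every index label
s_scalar = pd.Series(5, index=['x', 'y', 'z'])

print(s_labeled)
print(s_from_dict)
```

Values in a labeled Series can then be looked up by label, for example s_labeled['b'].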
Q. 4 – How to create Data Frame in Pandas?
A DataFrame can be created either directly from a dictionary or by combining several Series. The code below combines two Series that share the same country labels.
import pandas as pd
country_population = {'India': 1100000, 'China': 45679000, 'USA': 3400000}
population = pd.Series(country_population)
#print(population)
country_land = {'India': '2000 hectares', 'China': '4000 hectares', 'USA': '3000 hectares'}
area = pd.Series(country_land)
#print(area)
df = pd.DataFrame({'Population': population, 'SpaceOccupied': area})
print(df)
Output of Above Code

Q. 5 – How are missing values represented in Pandas DataFrame?
In a Pandas DataFrame, missing values are represented as NaN (Not a Number).
import pandas as pd
missing = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
# The first dict has no 'c' key and the second dict has no 'a' key,
# so those two cells are filled with NaN
print(missing)
Output of Above Code

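To find where the missing values are, isna() marks every NaN cell as True, and summing those booleans counts the missing values per column. A short sketch re-creating the same DataFrame:

```python
import pandas as pd

missing = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Boolean DataFrame: True wherever a value is missing
mask = missing.isna()
print(mask)

# True counts as 1, so the column sums are the number of missing values
missing_per_column = mask.sum()
print(missing_per_column)
```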
Q. 6 – Explain the process of creating indexes in pandas?
Indexes can be created with the pd.Index constructor. Index objects support set operations such as intersection and union.
import pandas as pd
index_A = pd.Index([1, 3, 5, 7, 9])
index_B = pd.Index([2, 3, 5, 7, 11])
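The intersection and union mentioned above are available as Index methods. A minimal sketch using the same two indexes:

```python
import pandas as pd

index_A = pd.Index([1, 3, 5, 7, 9])
index_B = pd.Index([2, 3, 5, 7, 11])

# Labels present in both indexes
common = index_A.intersection(index_B)

# Labels present in either index (pandas sorts the result when it can)
combined = index_A.union(index_B)

print(common)
print(combined)
```

Using the named methods is clearer than operators like & or |, whose meaning on Index objects has changed across pandas versions.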
Q. 7 – Explain various attributes associated with Pandas Series.
Series attribute or method | Description |
---|---|
Series.axes | Returns a list containing the row axis labels (the index) |
Series.dtype | Returns the data type of the values |
Series.empty | True if the Series contains no elements |
Series.ndim | Number of dimensions of the data (always 1 for a Series) |
Series.size | Number of elements |
Series.values | Returns the values as a NumPy ndarray (or an array-like for extension dtypes) |
Series.head(n) | Method returning the first n elements (default 5) |
Series.tail(n) | Method returning the last n elements (default 5) |
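A quick way to see these attributes is to print them for a small Series. The values and labels below are arbitrary, chosen only for illustration.

```python
import pandas as pd

s = pd.Series([4, 8, 15, 16, 23, 42], index=['a', 'b', 'c', 'd', 'e', 'f'])

print(s.axes)     # list containing the row index
print(s.dtype)    # data type of the values
print(s.empty)    # False, the Series has elements
print(s.ndim)     # 1
print(s.size)     # 6
print(s.values)   # NumPy array of the data
print(s.head(2))  # first 2 elements
print(s.tail(2))  # last 2 elements
```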
Q. 8 – Explain various statistical measures supported by Pandas.
Statistical measure in Pandas | Description |
---|---|
axes | Returns the row index and the column index (an attribute rather than a statistic) |
sum | Sum of the values in each column |
mean | Mean of each column |
median | Median of each column |
std | Standard deviation of each column (sample standard deviation, ddof=1) |
count | Number of non-missing values in each column |
cumsum | Cumulative (running) sum down each column |
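The measures in the table can be tried on a small DataFrame with one missing value; by default these methods skip NaN. The data below is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, np.nan], 'y': [4, 5, 6, 7]})

print(df.sum())     # per-column sum, NaN skipped
print(df.mean())    # per-column mean
print(df.median())  # per-column median
print(df.std())     # per-column sample standard deviation (ddof=1)
print(df.count())   # non-missing values per column: x has 3, y has 4
print(df.cumsum())  # running total down each column
```

Note that count differs from sum: count only tallies how many values are present, which is why it is a handy first check for missing data.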
Q. 9 – Explain reindexing in Pandas.
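Reindexing conforms a DataFrame to a new set of row (and/or column) labels. With reindex() you pass the new labels explicitly; with reindex_like() another DataFrame serves as the reference for the labels. Labels that did not exist before are filled with NaN.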
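A small sketch of both variants on a hypothetical DataFrame of scores: reindex() with an explicit label list, and reindex_like() with a reference DataFrame whose labels are taken over.

```python
import pandas as pd

df = pd.DataFrame({'score': [90, 85, 77]}, index=['a', 'b', 'c'])

# Existing labels keep their rows, the new label 'd' gets NaN,
# and 'a' is dropped because it is not in the new label list
reindexed = df.reindex(['b', 'c', 'd'])
print(reindexed)

# reindex_like() takes both the row and column labels from another object
reference = pd.DataFrame(index=['c', 'a'], columns=['score'])
like_reference = df.reindex_like(reference)
print(like_reference)
```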
Q. 10 – Explain bfill and ffill.
Reindexing can introduce NaN values for labels that did not exist before. The fill methods bfill and ffill handle them:
- bfill (backward fill) – fills a NaN with the next valid value after it
- ffill (forward fill) – fills a NaN with the last valid value before it
Reindexing without a fill method (bfill or ffill)
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])
print(df2.reindex_like(df1))
Output of Above Code

Reindexing with a fill method (bfill or ffill)
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(4, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])
print(df2.reindex_like(df1, method='ffill')) # Or method='bfill'
Output of above code if method='ffill'

Output of above code if method='bfill'

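Because the example above uses random numbers, its output changes on every run. The sketch below uses fixed, hypothetical values so the difference between ffill and bfill is easy to see: df2 only has rows 1 and 3, and is reindexed to the rows 0 to 3 of df1.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 0, 0, 0]}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'col1': [10, 20]}, index=[1, 3])

# No method: rows 0 and 2 do not exist in df2, so they become NaN
plain = df2.reindex_like(df1)['col1']

# ffill: each missing row takes the last earlier value; row 0 has none, so it stays NaN
forward = df2.reindex_like(df1, method='ffill')['col1']

# bfill: each missing row takes the next later value
backward = df2.reindex_like(df1, method='bfill')['col1']

print(plain.tolist())     # NaN, 10, NaN, 20
print(forward.tolist())   # NaN, 10, 10, 20
print(backward.tolist())  # 10, 10, 20, 20
```

Fill methods like these require the index of the object being reindexed (df2 here) to be sorted.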
Q. 11 – What types of iteration does a Pandas DataFrame support?
Iterator for Pandas Data Frame | Description |
---|---|
items() | Iterate over (column label, Series) pairs; replaces iteritems(), which was removed in pandas 2.0 |
iterrows() | Iterate over the rows as (index, series) pairs |
itertuples() | Iterate over the rows as namedtuples |
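The three iterators differ in what each step yields. A minimal sketch on a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'John'], 'age': [24, 34]})

# items(): (column label, column as a Series) pairs
for label, column in df.items():
    print(label, column.tolist())

# iterrows(): (index label, row as a Series) pairs
for idx, row in df.iterrows():
    print(idx, row['name'], row['age'])

# itertuples(): one namedtuple per row, fields accessed as attributes
for row in df.itertuples(index=False):
    print(row.name, row.age)
```

itertuples() is usually faster than iterrows(), and it keeps each column's dtype instead of packing a whole row into one Series.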
Q. 12 – How do you sort a DataFrame in Pandas?
- sort_index – sorts by the row labels (axis=0, the default) or by the column labels (axis=1)
- sort_values – sorts by the values in one or more columns
Creating a Sample DataFrame
import pandas as pd
d = {'col1': [10, 93, 16, 23, 81, 283, 10], 'col2': [19, 145, 195, 1952, 785, 543, 83782]}
df = pd.DataFrame(data=d)
print(df)
Output of Above Code

Using sort_index for Sorting Sample Data Frame
sorted_df = df.sort_index(ascending=False)
print(sorted_df)
Output of Above Code

Using sort_values for Sorting Sample Data Frame
sorted_df = df.sort_values(by='col1')
print(sorted_df)
Output of Above Code

Q. 13 – How to override default reload option in Pandas?

Q. 14 – What DataFrame slicing options are available in Pandas?
- .loc[] – label-based selection; a slice includes its end label
- .iloc[] – integer position-based selection; a slice excludes its end position
- .ix[] – mixed label and integer selection; deprecated in pandas 0.20 and removed in 1.0, so use .loc[] or .iloc[] instead
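The practical difference between .loc[] and .iloc[] shows up at the end of a slice. A sketch on a hypothetical DataFrame with letter labels:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]},
                  index=['w', 'x', 'y', 'z'])

# Label-based: rows 'x' through 'z', end label INCLUDED, column 'a'
by_label = df.loc['x':'z', 'a']

# Position-based: rows at positions 1 and 2, end position EXCLUDED, first column
by_position = df.iloc[1:3, 0]

print(by_label)
print(by_position)
```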
Q. 15 – How can we handle NaN values in Pandas DataFrame?
NaN values in a Pandas DataFrame are commonly handled in one of three ways:
- dropna – removes the rows (or, with axis=1, the columns) that contain NaN values
- pad / ffill – replaces each NaN with the last valid value above it in the same column
- backfill / bfill – replaces each NaN with the next valid value below it in the same column
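The three approaches, plus fillna() with a constant as a common alternative, applied to a hypothetical column of temperatures with two gaps. Newer pandas versions favor the ffill() and bfill() methods over the older fillna(method='pad') and fillna(method='backfill') spellings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temp': [20.0, np.nan, np.nan, 23.0]})

dropped = df.dropna()     # keeps only rows 0 and 3
forward = df.ffill()      # "pad": gaps take 20.0, the value above them
backward = df.bfill()     # "backfill": gaps take 23.0, the value below them
constant = df.fillna(0)   # alternative: replace NaN with a fixed value

print(dropped, forward, backward, constant, sep='\n\n')
```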
Q. 16 – Explain “group by” function in Pandas?
groupby() groups the rows of a DataFrame by the values of one or more columns, so that an aggregation such as sum or mean can be applied to each group.
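A minimal sketch of grouping on one column and on two columns, using hypothetical sales data:

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West', 'East'],
    'product': ['A', 'A', 'B', 'B', 'A'],
    'units': [10, 4, 7, 3, 5],
})

# Single grouping column: total units per region
units_by_region = sales.groupby('region')['units'].sum()

# Multiple grouping columns: the result has a (region, product) MultiIndex
units_by_region_product = sales.groupby(['region', 'product'])['units'].sum()

print(units_by_region)
print(units_by_region_product)
```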
Q. 17 – Explain the merge function in Pandas.
pd.merge brings related data from two different DataFrames into a single view by matching rows on one or more key columns, much like a SQL join.
There are several ways to merge DataFrames. Below are some of them:
Merging on a key column
Suppose we have two DataFrames, df1 and df2, holding the data in the tables below. Merging on 'Name' creates a new DataFrame that, by default (an inner merge), contains only the rows whose Name appears in both df1 and df2. Smith appears only in df1, so he is left out of the result.
df1
Index | Name | Age |
---|---|---|
0 | Bob | 24 |
1 | John | 34 |
2 | Garry | 18 |
3 | Smith | 26 |
df2
Index | Name | Birth Place |
---|---|---|
0 | Bob | Austin |
1 | John | Miami |
2 | Garry | New York |
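import pandas as pd
# Recreate df1 and df2 from the tables above
df1 = pd.DataFrame({'Name': ['Bob', 'John', 'Garry', 'Smith'], 'Age': [24, 34, 18, 26]})
df2 = pd.DataFrame({'Name': ['Bob', 'John', 'Garry'], 'Birth Place': ['Austin', 'Miami', 'New York']})
merged_dataframe = pd.merge(df1, df2, on='Name')
print(merged_dataframe)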
Output of Above Code
Index | Name | Age | Birth Place |
---|---|---|---|
0 | Bob | 24 | Austin |
1 | John | 34 | Miami |
2 | Garry | 18 | New York |
Doing a Left Merge of DataFrames
In a left merge, every row from the left DataFrame is kept, and values from the right DataFrame are filled in only where the key matches; unmatched rows get NaN.
Below is the code for a left merge of the two DataFrames df1 and df2. (See the picture just below the code to better understand how a left merge works in Pandas.)
import pandas as pd
merged_dataframe = pd.merge(df1, df2, on='Name', how='left')
# df1 being left side
# df2 being right side
print(merged_dataframe)

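With the df1 and df2 from the tables above, a left merge keeps all four names; Smith has no match in df2, so his Birth Place is NaN. A self-contained sketch that re-creates both DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'John', 'Garry', 'Smith'],
                    'Age': [24, 34, 18, 26]})
df2 = pd.DataFrame({'Name': ['Bob', 'John', 'Garry'],
                    'Birth Place': ['Austin', 'Miami', 'New York']})

# Every row of df1 (the left side) is kept, in df1's order
left = pd.merge(df1, df2, on='Name', how='left')
print(left)
```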
Doing a Right Merge of DataFrames
In a right merge, every row from the right DataFrame is kept, and values from the left DataFrame are filled in only where the key matches; otherwise they appear as NaN. Below is the code for a right merge of df1 and df2. (See the picture just below the code to better understand how a right merge works in Pandas.)
import pandas as pd
merged_dataframe = pd.merge(df1, df2, on='Name', how='right')
# df1 being left side
# df2 being right side
print(merged_dataframe)

Doing an Outer Merge of DataFrames
In an outer merge, rows from both the left and the right DataFrames are kept, and any value without a match on the other side is filled with NaN. Below is the code for an outer merge of df1 and df2. (See the picture just below the code to better understand how an outer merge works in Pandas.)

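The outer-merge call follows the same pattern as the left and right examples, with how='outer'. A minimal sketch that re-creates df1 and df2 from the tables above; the optional indicator=True argument adds a _merge column recording whether each row came from both sides or only one.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'John', 'Garry', 'Smith'],
                    'Age': [24, 34, 18, 26]})
df2 = pd.DataFrame({'Name': ['Bob', 'John', 'Garry'],
                    'Birth Place': ['Austin', 'Miami', 'New York']})

# Union of keys from both sides; missing values on either side become NaN
merged_dataframe = pd.merge(df1, df2, on='Name', how='outer', indicator=True)
print(merged_dataframe)
```

Because every name in df2 also appears in df1 here, the outer merge returns the same rows as the left merge; the two only differ when the right DataFrame has keys the left one lacks.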
Q. 18 – Explain Pandas' concat function.
pd.concat combines two or more DataFrames either row-wise or column-wise.
- To combine rows, putting one DataFrame on top of another, use pd.concat([Top DataFrame, Bottom DataFrame]).
- To combine columns, putting one DataFrame to the right of another, use pd.concat([Left DataFrame, Right DataFrame], axis=1).
To better understand concat, let's look at two examples. Suppose we have two DataFrames, df1 and df2, containing the following data.
df1
Index | Name | Age |
---|---|---|
0 | Bob | 24 |
1 | John | 34 |
2 | Garry | 18 |
3 | Smith | 26 |
df2
Index | Name | Birth Place |
---|---|---|
0 | Bob | Austin |
1 | John | Miami |
2 | Garry | New York |
Putting one DataFrame on Top of another – pd.concat([Top DataFrame, Bottom DataFrame])
print(pd.concat([df1, df2]))
Output of Above Code
Index | Name | Age | Birth Place |
---|---|---|---|
0 | Bob | 24.0 | NaN |
1 | John | 34.0 | NaN |
2 | Garry | 18.0 | NaN |
3 | Smith | 26.0 | NaN |
0 | Bob | NaN | Austin |
1 | John | NaN | Miami |
2 | Garry | NaN | New York |
Putting one DataFrame to the right of another – pd.concat([Left DataFrame, Right DataFrame], axis=1)
print(pd.concat([df1, df2], axis=1))
Output of Above Code
Index | Name | Age | Name | Birth Place |
---|---|---|---|---|
0 | Bob | 24 | Bob | Austin |
1 | John | 34 | John | Miami |
2 | Garry | 18 | Garry | New York |
3 | Smith | 26 | NaN | NaN |
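In the row-wise output, the index labels 0 to 2 appear twice because each DataFrame keeps its own index. Passing ignore_index=True renumbers the rows instead. A self-contained sketch with df1 and df2 re-created from the tables:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'John', 'Garry', 'Smith'],
                    'Age': [24, 34, 18, 26]})
df2 = pd.DataFrame({'Name': ['Bob', 'John', 'Garry'],
                    'Birth Place': ['Austin', 'Miami', 'New York']})

# Row-wise with a fresh 0..n-1 index instead of the duplicated labels
stacked = pd.concat([df1, df2], ignore_index=True)

# Column-wise: rows are aligned on the index, so index 3 has NaN for df2's columns
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```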
Q. 19 – Which data file types can Pandas read or write?
Data File Type | Pandas Function For Reading | Pandas Function For Writing |
---|---|---|
CSV | read_csv | to_csv |
JSON | read_json | to_json |
HTML | read_html | to_html |
MS Excel | read_excel | to_excel |
HDF5 Format | read_hdf | to_hdf |
Feather Format | read_feather | to_feather |
Parquet Format | read_parquet | to_parquet |
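Msgpack | read_msgpack (removed in pandas 1.0) | to_msgpack (removed in pandas 1.0) |
Stata | read_stata | to_stata |
SAS | read_sas | Not supported |
Python Pickle Format | read_pickle | to_pickle |
SQL | read_sql | to_sql |
Google BigQuery | read_gbq (deprecated in pandas 2.2; use the pandas-gbq package) | to_gbq (deprecated in pandas 2.2; use the pandas-gbq package) |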
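Each reader/writer pair round-trips data. A minimal sketch with CSV, using an in-memory io.StringIO buffer in place of a file so it runs anywhere; the DataFrame is hypothetical.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John'], 'Age': [24, 34]})

buffer = io.StringIO()
# index=False: do not write the row index as an extra column
df.to_csv(buffer, index=False)

buffer.seek(0)  # rewind before reading back
restored = pd.read_csv(buffer)
print(restored)
```

With a real file, a path such as 'people.csv' (a hypothetical name) can be passed to to_csv and read_csv instead of the buffer.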
Q. 20 – How do common R (dplyr) functions compare with their Pandas equivalents?
Filtering, Sampling, Querying Functions in Pandas versus R
R | Pandas |
---|---|
dim(dataframe) | dataframe.shape |
head(dataframe) | dataframe.head() |
slice(dataframe, 1:100) | dataframe.iloc[:100] |
filter(dataframe, column1 == 1, column2 == 1) | dataframe.query('column1 == 1 & column2 == 1') |
select(dataframe, column1, column2) | dataframe[['column1', 'column2']] |
select(dataframe, column1:column3) | dataframe.loc[:, 'column1':'column3'] |
distinct(select(dataframe, column1)) | dataframe[['column1']].drop_duplicates() |
sample_n(dataframe, 10) | dataframe.sample(n=10) |
sample_frac(dataframe, 0.01) | dataframe.sample(frac=0.01) |
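The Pandas column of the table can be tried on a small hypothetical DataFrame that uses the same column names. Note that R's 1:2 is 1-based and inclusive, while iloc[:2] is 0-based and excludes the end, so both select the first two rows.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'column1': [1, 1, 2, 1],
    'column2': [1, 0, 1, 1],
    'column3': [5, 6, 7, 8],
})

shape = dataframe.shape                                       # dim()
first_rows = dataframe.iloc[:2]                               # slice(dataframe, 1:2)
filtered = dataframe.query('column1 == 1 & column2 == 1')     # filter()
selected = dataframe[['column1', 'column2']]                  # select() by name
col_range = dataframe.loc[:, 'column1':'column3']             # select() by range
distinct = dataframe[['column1']].drop_duplicates()           # distinct()
sampled = dataframe.sample(n=2, random_state=0)               # sample_n(), reproducible

print(filtered)
```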
Sorting Functions in Pandas versus R
R | Pandas |
---|---|
arrange(dataframe, column1, column2) | dataframe.sort_values(['column1', 'column2']) |
arrange(dataframe, desc(column1)) | dataframe.sort_values('column1', ascending=False) |
Transforming Functions in Pandas versus R
R | Pandas |
---|---|
select(dataframe, col_one = column1) | dataframe.rename(columns={'column1': 'col_one'})['col_one'] |
rename(dataframe, col_one = column1) | dataframe.rename(columns={'column1': 'col_one'}) |
mutate(dataframe, c = a - b) | dataframe.assign(c=dataframe.a - dataframe.b) |
Aggregate/Grouping Functions in Pandas versus R
R | Pandas |
---|---|
summary(dataframe) | dataframe.describe() |
gdataframe <- group_by(dataframe, column1) | gdataframe = dataframe.groupby('column1') |
summarise(gdataframe, avg=mean(column1, na.rm=TRUE)) | dataframe.groupby('column1').agg({'column1': 'mean'}) |
summarise(gdataframe, total = sum(column1)) | dataframe.groupby('column1')['column1'].sum() |