Scipy is a Python-based library that is widely used for **Scientific and Technical Computing**, for example, solving complex mathematical problems. Do note that **Scipy** is built on top of **Numpy**, another Python library widely used for numerical computing and Data Analysis.

Moreover, Scipy makes mathematical computing easier by providing fully-featured versions of mathematical functions. For example, Scipy provides ready-to-use functions for doing **Linear Algebra** calculations.
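As a quick illustration of those ready-to-use Linear Algebra functions, here is a minimal sketch that solves a small system of linear equations with `scipy.linalg.solve`. The matrix `A` and vector `b` are made-up values chosen only for this example.

```python
import numpy as np
from scipy import linalg

# Hypothetical system of equations:
#   3x + 1y = 9
#   1x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Solve A @ x = b for x
x = linalg.solve(A, b)
print(x)  # approximately [2. 3.]
```

Substituting back confirms the result: 3·2 + 3 = 9 and 2 + 2·3 = 8.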

Also, Scipy and its Documentation are divided into **Sub-Packages**, which makes the library easier to use and learn, even for beginners.

**Sub-Packages in Scipy**

Name of Scipy SubPackage | Description |
---|---|
cluster | Clustering Algorithms |
constants | Physical and Mathematical Constants |
fftpack | Fast Fourier Transform Routines (legacy; newer code uses scipy.fft) |
integrate | Integration and ordinary differential equation solvers |
interpolate | Interpolation and smoothing splines |
io | Input and Output |
linalg | Linear Algebra |
ndimage | N-Dimensional image processing |
odr | Orthogonal Distance Regression |
optimize | Optimization and root-finding routines |
signal | Signal Processing |
sparse | Sparse matrices and associated routines |
spatial | Spatial Data Structures and Algorithms |
special | Special Functions |
stats | Statistical Distributions and Functions |

That was a little bit of introduction to Scipy. Now let’s see why having it as a skill on a resume can be helpful for getting some job related to Python Development or Data Analysis.

To get an idea of **whether Scipy can really be a helpful skill for landing a job**, I searched LinkedIn for the keyword “Scipy” and found 764 job roles listed in the United States in the last month. I then read through the descriptions of these roles one by one and found that almost every job for a **Python Developer** mentions that Scipy skill or experience is preferable.

So it’s clear that businesses looking for Python Developers or similar roles are also looking for Scipy skills. That’s why, to help you land a Python job, I have put together some **commonly asked interview questions about Scipy**.

Just a side note: I’m working on building a **Comprehensive Interview Preparation** resource for software-related job roles. So far I’ve put together interview questions for Numpy and Pandas. If you want to check those out, see the links below. 😊 😊

- Interview Questions and Answers for Pandas Python Library
- Interview Questions and Answers for Numpy Python Library

Now let’s get into questions/answers about Scipy.

## Q. 1 – Explain Scipy Library?

Scipy is used for scientific computation. It is a collection of components, organized into Sub-Packages, that provides an ecosystem for scientific computing on top of Numpy.

## Q. 2 – Explain **Normality Tests in Scipy**?

Firstly, let’s see **what Normality Tests are**. Normality tests check whether a given DataSet follows a Gaussian (normal) distribution.

Three commonly used Normality Tests are the **Shapiro-Wilk Test, D’Agostino’s K^2 Test, and Anderson-Darling Test**. Scipy has a specific function for each of these: **scipy.stats.shapiro**, **scipy.stats.normaltest**, and **scipy.stats.anderson**. To perform these tests on a DataSet, an input assumption has to be satisfied. **Input Assumption** – “Observations in each sample are independent and identically distributed”.
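A minimal sketch of running the three Normality Tests is shown here. The sample `data` is synthetic (drawn from a normal distribution with a fixed seed purely for illustration), so the exact numbers will depend on your own data and Scipy version.

```python
import numpy as np
from scipy import stats

# Synthetic sample, assumed here only for illustration
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=50.0, scale=5.0, size=200)

# Shapiro-Wilk Test
shapiro_stat, shapiro_p = stats.shapiro(data)
print(f"Shapiro-Wilk: stat={shapiro_stat:.3f}, p={shapiro_p:.3f}")

# D'Agostino's K^2 Test
k2_stat, k2_p = stats.normaltest(data)
print(f"D'Agostino K^2: stat={k2_stat:.3f}, p={k2_p:.3f}")

# Anderson-Darling Test (only the test statistic is printed here)
anderson_result = stats.anderson(data, dist="norm")
print(f"Anderson-Darling: stat={anderson_result.statistic:.3f}")
```

For the first two tests, a p-value above the chosen significance level (commonly 0.05) means there is no evidence against the DataSet being Gaussian.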

## Q. 3 – Explain how to perform Correlation Tests in Scipy?

Correlation Tests are used for studying relationships between different variables of a DataSet or relation between two different DataSets. Scipy offers ready-made functions for calculating –

- Pearson’s Correlation Coefficient
- Spearman’s Rank Correlation Coefficient
- Kendall’s Rank Correlation Coefficient
- Chi-Squared Test

Let’s now see how to perform each of these in Scipy.

### Calculating Pearson’s Correlation Coefficient in Scipy

Pearson’s Correlation Coefficient tests for a linear relationship between two variables. But before using Scipy’s built-in function **scipy.stats.pearsonr(variable1, variable2)** to calculate Pearson’s Coefficient, some things about the DataSet need to be assumed.

Function Input Assumptions | Hypotheses Tested |
---|---|
Observations in each variable are independent and identically distributed | H0 – Two variables are independent |
Both variables need to be normally distributed | H1 – Two variables have a dependency on each other |
Both variables need to have the same variance | |

#### Scipy Code for calculating Pearson’s Correlation Coefficient

```
import scipy.stats
corr, p = scipy.stats.pearsonr(variable1, variable2)
```

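**corr** ranges from -1 to 1 and gives the direction and strength of the linear relationship, while **p** is the probability of seeing a correlation at least this strong if H0 were true. If **p** is below the chosen significance level (commonly 0.05), H0 is rejected and a relationship between variable1 and variable2 can be concluded.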
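To make this concrete, here is a small worked sketch. The two variables (hours studied and exam score) are made-up numbers used only for illustration.

```python
import scipy.stats

# Hypothetical data: hours studied and the exam score obtained
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score = [52, 55, 61, 60, 68, 71, 75, 80]

corr, p = scipy.stats.pearsonr(hours_studied, exam_score)
print(f"corr={corr:.3f}, p={p:.5f}")

# Interpret the result at a 5% significance level
if p < 0.05:
    print("Reject H0: the variables appear to be linearly related")
else:
    print("Fail to reject H0: no evidence of a linear relationship")
```

Here **corr** comes out close to 1 and **p** is far below 0.05, so the sketch reports a strong positive linear relationship.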

### Calculating **Spearman’s Rank Correlation Coefficient** and **Kendall’s Rank Correlation Coefficient** in Scipy

Both Spearman’s Rank Correlation Coefficient and Kendall’s Rank Correlation Coefficient test whether a **monotonic relationship** exists between two variables.

Here a monotonic relationship means that as one variable increases, the other consistently increases (or consistently decreases), though not necessarily at a constant rate.

Function Input Assumptions | Hypotheses Tested |
---|---|
Observations in each variable are independent and identically distributed | H0 – Two variables are independent |
Both variables can be ranked | H1 – Two variables have a dependency on each other |

#### Scipy Code for calculating Spearman’s Rank Correlation Coefficient

```
import scipy.stats
corr, p = scipy.stats.spearmanr(variable1, variable2)
```

Here **corr** is Spearman’s rank correlation (from -1 to 1). If **p** is below the chosen significance level (commonly 0.05), H0 is rejected and a monotonic relationship between variable1 and variable2 can be concluded.

#### Scipy Code for calculating Kendall’s Rank Correlation Coefficient

```
import scipy.stats
corr, p = scipy.stats.kendalltau(variable1, variable2)
```

Here **corr** is Kendall’s tau (from -1 to 1). If **p** is below the chosen significance level (commonly 0.05), H0 is rejected and a monotonic relationship between variable1 and variable2 can be concluded.
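The difference between a linear and a monotonic relationship can be sketched with made-up data where `y` always increases with `x`, but not along a straight line. The arrays below are assumptions chosen only for illustration.

```python
import numpy as np
import scipy.stats

# Hypothetical data: y grows with x, but as a cube rather than a line
x = np.arange(1, 11)
y = x ** 3

pearson_r, _ = scipy.stats.pearsonr(x, y)
spearman_rho, _ = scipy.stats.spearmanr(x, y)
kendall_tau, _ = scipy.stats.kendalltau(x, y)

print(f"Pearson:  {pearson_r:.3f}")     # below 1, the relationship is not linear
print(f"Spearman: {spearman_rho:.3f}")  # 1.000, the ranks agree perfectly
print(f"Kendall:  {kendall_tau:.3f}")   # 1.000, every pair is ordered the same way
```

Because the rank-based coefficients only look at ordering, they report a perfect relationship here, while Pearson’s coefficient is pulled below 1 by the curvature.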

### Doing Chi-Squared Test in Scipy

The Chi-Squared Test is used for checking whether **two categorical variables** are independent or not. Certain things need to be assumed about the data before using Scipy’s built-in function to do the Chi-Squared Test.

Function Input Assumptions | Hypotheses Tested |
---|---|
Observations used to build the contingency table are independent of each other | H0 – Two variables are independent |
There need to be at least 25 data points in each category of both variables | H1 – Two variables have a dependency on each other |

#### Scipy Code for doing Chi-Square Test

```
import scipy.stats
stat, p, dof, expected = scipy.stats.chi2_contingency(table)
```

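Here **stat** is the Chi-Squared statistic, **dof** is the degrees of freedom, and **expected** holds the frequencies expected if the variables were independent. If **p** is below the chosen significance level (commonly 0.05), it can be concluded that the categorical variables in **table** are not independent.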
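The **table** passed to the function is a contingency table of observed counts. Below is a minimal sketch with a made-up 2x2 table (the row and column meanings are hypothetical). Note that for a 2x2 table `chi2_contingency` applies Yates’ continuity correction by default.

```python
import numpy as np
import scipy.stats

# Hypothetical observed counts:
# rows    = group A, group B
# columns = preferred product X, preferred product Y
table = np.array([[30, 10],
                  [15, 25]])

stat, p, dof, expected = scipy.stats.chi2_contingency(table)

print(f"stat={stat:.3f}, p={p:.4f}, dof={dof}")
print("Expected frequencies under independence:")
print(expected)  # [[22.5 17.5], [22.5 17.5]]

if p < 0.05:
    print("Reject H0: the two variables appear to be dependent")
else:
    print("Fail to reject H0: no evidence of dependence")
```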

## Q. 4 – Explain tests pertaining to Parametric Statistical Hypothesis Tests in Scipy?

**Parametric Statistical Hypothesis Tests** draw conclusions about a Population Data Set from a sample taken out of it. They assume the data follows a known distribution (typically the normal distribution), calculate certain parameters from the sample data, and use those parameters for Hypothesis Testing on the Population Data Set.

Most common Parametric Statistical Hypothesis Tests are –

- Student’s t-test
- Paired Student’s t-test
- Analysis of Variance Test (ANOVA)

Let’s now see how to perform each of these tests using Scipy.

### Doing Student’s t-test in Scipy

**Student’s t-test** checks whether the **means** of **two independent samples** are significantly different or not.

But before using Scipy’s built-in function **ttest_ind(dataSample1, dataSample2)**, some things about these dataSamples need to be assumed.

Function Input Assumptions | Hypotheses Tested |
---|---|
Observations in each Data Sample are independent and identically distributed | H0 – Means of both samples are equal |
Both of the Data Samples are normally distributed | H1 – Means of both samples are unequal |
Both of the Data Samples have the same variance | |

#### Scipy Code for doing Student’s t-test

```
import scipy.stats as ss
stat, p = ss.ttest_ind(dataSample1, dataSample2)
```
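As a worked sketch, the two samples below are made-up measurements used only for illustration. If the equal-variance assumption is doubtful, `ttest_ind` also accepts `equal_var=False`, which runs Welch’s t-test instead.

```python
import scipy.stats as ss

# Hypothetical measurements from two independent groups
dataSample1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7]
dataSample2 = [6.4, 6.8, 7.1, 6.5, 6.9, 7.3, 6.7, 7.0]

stat, p = ss.ttest_ind(dataSample1, dataSample2)
print(f"stat={stat:.3f}, p={p:.6f}")

# Welch's variant, which does not assume equal variances
welch_stat, welch_p = ss.ttest_ind(dataSample1, dataSample2, equal_var=False)

if p < 0.05:
    print("Reject H0: the sample means differ significantly")
else:
    print("Fail to reject H0: no evidence that the means differ")
```

The statistic is negative here because the mean of the first sample is lower than the mean of the second.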

### Doing Paired Student’s t-test in Scipy

**Paired Student’s t-test** tests whether the **means of two paired (dependent) samples** are **significantly different or not**, for example measurements taken on the same subjects before and after a treatment.

But before using Scipy’s built-in function **ttest_rel(dataSample1, dataSample2)**, some things about these dataSamples need to be assumed.

Function Input Assumptions | Hypotheses Tested |
---|---|
Data in each sample is independent and identically distributed | H0 – means of the samples are equal |
Data in each sample is normally distributed | H1 – means of the samples are unequal |
Data in each sample has the same variance | |
Data across the samples is paired | |

#### Scipy code for doing Paired Student’s t-test

```
import scipy.stats as ss
stat, p = ss.ttest_rel(dataSample1, dataSample2)
```
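A minimal sketch of the paired case follows. The scores are made-up values for the same eight (hypothetical) people measured before and after a training course, so the i-th entries of both lists belong together.

```python
import scipy.stats as ss

# Hypothetical paired measurements: same people, before and after training
score_before = [72, 80, 65, 90, 78, 85, 70, 88]
score_after = [75, 84, 66, 95, 80, 90, 73, 91]

stat, p = ss.ttest_rel(score_before, score_after)
print(f"stat={stat:.3f}, p={p:.5f}")

if p < 0.05:
    print("Reject H0: the mean score changed significantly")
else:
    print("Fail to reject H0: no evidence of a change in mean score")
```

Both lists must have the same length and the same ordering, since the test works on the per-pair differences.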

### Doing Analysis of Variance Test (ANOVA) in Scipy

**Analysis of Variance Test (ANOVA)** tests whether the **means of two or more independent samples** are significantly different.

But before using Scipy’s built-in function **f_oneway(dataSample1, dataSample2)**, which also accepts more than two samples, some things about these dataSamples need to be assumed.

Function Input Assumptions | Hypotheses Tested |
---|---|
Observations in each dataSample are independent and identically distributed | H0 – means of the samples are equal |
All dataSamples are normally distributed | H1 – means of one or more samples are unequal |
All dataSamples have the same variance | |

#### Scipy code for doing Analysis of Variance Test (ANOVA)

```
import scipy.stats
stat, p = scipy.stats.f_oneway(dataSample1, dataSample2)
```
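Since ANOVA is usually applied to more than two groups, here is a small sketch with three made-up samples (the group names and values are hypothetical).

```python
import scipy.stats

# Hypothetical measurements from three independent groups
group_a = [20, 21, 19, 22, 20]
group_b = [25, 27, 26, 24, 26]
group_c = [30, 29, 31, 32, 30]

# f_oneway takes each sample as a separate argument
stat, p = scipy.stats.f_oneway(group_a, group_b, group_c)
print(f"F={stat:.3f}, p={p:.8f}")

if p < 0.05:
    print("Reject H0: at least one group mean differs")
else:
    print("Fail to reject H0: no evidence that the group means differ")
```

A significant result only says that at least one mean differs; it does not say which one.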

## Q. 5 – Explain how to do **Mann-Whitney U test**, a non-parametric Statistical Hypothesis Test in Scipy?

**Mann-Whitney U Test** tests whether the distributions of **two independent samples** are equal or not.

But before using Scipy’s built-in function **mannwhitneyu(dataSample1, dataSample2)**, some things about these dataSamples need to be assumed.

Function Input Assumptions | Hypotheses Tested |
---|---|
Observations in each dataSample are independent and identically distributed | H0 – Distributions of both dataSamples are equal |
Both of dataSamples can be ranked | H1 – Distributions of both dataSamples are not equal |

### Scipy code for doing Mann-Whitney U Test

```
import scipy.stats
stat, p = scipy.stats.mannwhitneyu(dataSample1, dataSample2)
```
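A worked sketch with made-up samples is shown below. The `alternative="two-sided"` argument is passed explicitly here because the default has differed between Scipy versions.

```python
import scipy.stats

# Hypothetical measurements from two independent groups
dataSample1 = [1.2, 2.3, 1.8, 2.9, 2.1, 1.5, 2.6]
dataSample2 = [3.4, 4.1, 3.8, 4.6, 3.1, 4.9, 3.6]

stat, p = scipy.stats.mannwhitneyu(
    dataSample1, dataSample2, alternative="two-sided"
)
print(f"U={stat}, p={p:.5f}")

if p < 0.05:
    print("Reject H0: the two distributions appear to differ")
else:
    print("Fail to reject H0: no evidence that the distributions differ")
```

In this data every value in the first sample is smaller than every value in the second, so the U statistic is 0, the most extreme value possible.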
