Scipy Interview Questions and Answers

Scipy is a Python-based library widely used for scientific and technical computing, for example solving complex mathematical problems. Note that Scipy is built on top of NumPy, another Python library widely used for numerical computing and data analysis.

Moreover, Scipy makes mathematical computing easier by providing ready-to-use implementations of common mathematical routines. For example, Scipy provides functions for linear algebra calculations.

Linear Algebra Functions in Scipy Python Library
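To make the linear algebra point concrete, here is a minimal sketch using scipy.linalg.solve and scipy.linalg.det. The 2x2 system and its values are made up purely for illustration.

```python
import numpy as np
from scipy import linalg

# Made-up example system:
#   3x + 1y = 9
#   1x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(A, b)  # solves A @ x = b (mathematically x = 2, y = 3)
d = linalg.det(A)       # determinant of A (mathematically 3*2 - 1*1 = 5)

print("solution:", x)
print("determinant:", d)
```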

Scipy is organized into sub-packages, each covering one area of scientific computing, which makes it easier to use and learn even for beginners.

Sub-Packages in Scipy

| Sub-package | Description |
|---|---|
| cluster | Clustering algorithms |
| constants | Physical and mathematical constants |
| fftpack | Fast Fourier Transform routines (legacy; scipy.fft is the newer interface) |
| integrate | Integration and ordinary differential equation solvers |
| interpolate | Interpolation and smoothing splines |
| io | Input and output |
| linalg | Linear algebra |
| ndimage | N-dimensional image processing |
| odr | Orthogonal distance regression |
| optimize | Optimization and root-finding routines |
| signal | Signal processing |
| sparse | Sparse matrices and associated routines |
| spatial | Spatial data structures and algorithms |
| special | Special functions |
| stats | Statistical distributions and functions |
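A quick sketch of what calling into a few of these sub-packages looks like (constants, integrate and optimize). The integrand and the function being minimized are arbitrary examples, chosen only because their exact answers are known.

```python
import numpy as np
from scipy import constants, integrate, optimize

# scipy.constants: physical and mathematical constants
print("speed of light (m/s):", constants.c)
print("pi:", constants.pi)

# scipy.integrate: definite integral of sin(x) from 0 to pi (exact value is 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# scipy.optimize: minimize f(x) = (x - 2)^2 + 1 (exact minimizer is x = 2)
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

print("integral:", area, "estimated error:", abs_err)
print("minimizer:", res.x, "minimum value:", res.fun)
```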

That was a brief introduction to Scipy. Now let’s see why listing it as a skill on your resume can help you get a job in Python development or data analysis.

To find out whether Scipy can really help you land a job, I did a LinkedIn search for the keyword “Scipy” and found 764 job listings in the United States posted on LinkedIn in the last month. I then read the descriptions of these listings one by one and found that almost every Python Developer job mentions that “It’s preferable to have Scipy Skill/Experience”.

So it’s clear that businesses looking for Python developers and similar roles also look for Scipy skills. That’s why, to help you land a Python job, I have put together some commonly asked interview questions about Scipy.

Just a side note: I’m building a comprehensive interview preparation resource for software-related job roles, and so far I’ve put together interview questions for NumPy and pandas. 😊

Now let’s get into questions/answers about Scipy.

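Q. 1 – Explain the Scipy library.

Scipy is an open-source library for scientific computing. Built on NumPy, it is a collection of algorithms and functions (for optimization, integration, interpolation, linear algebra, statistics, signal processing and more), grouped into sub-packages, and it forms a core part of Python’s scientific computing ecosystem.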

Q. 2 – Explain normality tests in Scipy.

First, what are normality tests? Normality tests check whether a dataset is likely to have been drawn from a Gaussian (normal) distribution.

Three commonly used normality tests are the Shapiro-Wilk test, D’Agostino’s K^2 test and the Anderson-Darling test. Scipy provides a function for each of them: scipy.stats.shapiro, scipy.stats.normaltest and scipy.stats.anderson. To apply these tests to a dataset, some input assumptions must be satisfied.

Input assumption – “Observations in each sample are independent and identically distributed”.
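Putting the three normality tests side by side might look like the sketch below. The helper name normality_report, the 0.05 significance level and the generated data are illustrative assumptions, not part of Scipy. Note that scipy.stats.anderson reports critical values rather than a p-value, so it is handled differently from the other two.

```python
import numpy as np
from scipy import stats


def normality_report(data, alpha=0.05):
    """Hypothetical helper: {test name: True if the sample looks Gaussian at level alpha}."""
    report = {}

    # Shapiro-Wilk test, H0: the sample was drawn from a normal distribution
    _, p = stats.shapiro(data)
    report["shapiro"] = p > alpha

    # D'Agostino's K^2 test (scipy.stats.normaltest), same H0,
    # built from the sample's skewness and kurtosis
    _, p = stats.normaltest(data)
    report["dagostino"] = p > alpha

    # Anderson-Darling test: compare the statistic with the critical value
    # tabulated for the significance level closest to alpha (levels are in percent)
    ad = stats.anderson(data, dist="norm")
    levels = np.asarray(ad.significance_level)
    idx = int(np.argmin(np.abs(levels - 100 * alpha)))
    report["anderson"] = ad.statistic < ad.critical_values[idx]

    return report


rng = np.random.default_rng(0)
print(normality_report(rng.normal(loc=50, scale=5, size=200)))
print(normality_report(rng.exponential(scale=1.0, size=200)))  # strongly skewed data
```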

Q. 3 – Explain how to perform correlation tests in Scipy.

Correlation tests are used to study the relationship between two variables of a dataset, or between two datasets. Scipy offers ready-made functions for the following –

  1. Pearson’s Correlation Coefficient
  2. Spearman’s Rank Correlation Coefficient
  3. Kendall’s Rank Correlation Coefficient
  4. Chi-Squared Test

Let’s now see how to run each of these in Scipy.

Calculating Pearson’s Correlation Coefficient in Scipy

Pearson’s correlation coefficient measures the strength of a linear relationship between two variables, and the accompanying test checks whether that relationship is significant. Before using Scipy’s built-in function scipy.stats.pearsonr(variable1, variable2), some assumptions about the dataset must hold.

| Input assumptions | Hypotheses |
|---|---|
| Observations in each variable are independent and identically distributed | H0 – The two variables are independent |
| Both variables are normally distributed | H1 – The two variables depend on each other |
| Both variables have the same variance | |

Scipy Code for calculating Pearson’s Correlation Coefficient

import scipy.stats
corr, p = scipy.stats.pearsonr(variable1, variable2)

Here corr is Pearson’s coefficient, a value between -1 and 1, and p is the p-value for H0. If p is below the chosen significance level (commonly 0.05), H0 is rejected and we conclude that variable1 and variable2 are related.
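A minimal, runnable sketch of that decision rule. The helper pearson_decision, the randomly generated data and the 0.05 threshold are assumptions for illustration, not part of Scipy.

```python
import numpy as np
from scipy import stats


def pearson_decision(x, y, alpha=0.05):
    """Hypothetical helper: Pearson's r, its p-value, and whether H0 (independence) is rejected."""
    corr, p = stats.pearsonr(x, y)
    return corr, p, p < alpha


# Hypothetical data: y depends linearly on x plus some noise
rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

corr, p, dependent = pearson_decision(x, y)
print(f"r = {corr:.3f}, p = {p:.3g}, reject H0: {dependent}")
```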

Calculating Spearman’s Rank Correlation Coefficient and Kendall’s Rank Correlation Coefficient in Scipy

Spearman’s rank correlation coefficient and Kendall’s rank correlation coefficient both test whether a monotonic relationship exists between two variables.

A monotonic relationship means that as one variable increases, the other consistently moves in one direction: it either never decreases or never increases. The relationship does not have to be linear.

| Input assumptions | Hypotheses |
|---|---|
| Observations in each variable are independent and identically distributed | H0 – The two variables are independent |
| Observations in each variable can be ranked | H1 – The two variables depend on each other |

Scipy Code for calculating Spearman’s Rank Correlation Coefficient

import scipy.stats
corr, p = scipy.stats.spearmanr(variable1, variable2)

Here corr is Spearman’s rho (between -1 and 1) and p is the p-value for H0; a p-value below the significance level suggests that variable1 and variable2 are related.

Scipy Code for calculating Kendall’s Rank Correlation Coefficient

import scipy.stats
corr, p = scipy.stats.kendalltau(variable1, variable2)

Here corr is Kendall’s tau (between -1 and 1) and p is the p-value for H0; a p-value below the significance level suggests a monotonic relationship between variable1 and variable2.
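To see the difference between linear and monotonic relationships, the sketch below compares all three coefficients on made-up data where y = x**3: the relationship is perfectly monotonic but not linear.

```python
import numpy as np
from scipy import stats

# Hypothetical data with a perfectly monotonic but non-linear relationship
x = np.arange(1, 11, dtype=float)
y = x ** 3

rho, p_rho = stats.spearmanr(x, y)
tau, p_tau = stats.kendalltau(x, y)
r, p_r = stats.pearsonr(x, y)

# The rank-based coefficients see a perfect monotonic relationship,
# while Pearson's r stays below 1 because the relationship is not linear
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3g})")
print(f"Kendall tau  = {tau:.3f} (p = {p_tau:.3g})")
print(f"Pearson r    = {r:.3f} (p = {p_r:.3g})")
```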

Doing Chi-Squared Test in Scipy

The chi-squared test checks whether two categorical variables are independent. Before using Scipy’s built-in function to run the chi-squared test, certain assumptions about the data must hold.

| Input assumptions | Hypotheses |
|---|---|
| Observations used to build the contingency table are independent | H0 – The two variables are independent |
| There are at least 25 observations for each category of both variables | H1 – The two variables depend on each other |

Scipy Code for doing Chi-Square Test

import scipy.stats
stat, p, dof, expected = scipy.stats.chi2_contingency(table)

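Here table is a contingency table of observed counts; stat is the chi-squared statistic, p the p-value, dof the degrees of freedom and expected the frequencies expected under H0. If p is below the significance level, the categorical variables in table are not independent.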
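A self-contained example of the chi-squared test. The contingency table counts are invented for illustration, and the 0.05 significance level is a common convention rather than a requirement.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table of observed counts:
# rows = two customer groups, columns = preferred product (X, Y, Z)
table = np.array([[60, 30, 10],
                  [20, 40, 40]])

stat, p, dof, expected = chi2_contingency(table)

alpha = 0.05  # conventional significance level (an assumption, not a rule)
print(f"chi2 = {stat:.2f}, p = {p:.3g}, dof = {dof}")
print("expected counts under H0:\n", expected)
print("variables dependent" if p < alpha else "no evidence against independence")
```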

Q. 4 – Explain how to perform parametric statistical hypothesis tests in Scipy.

Parametric statistical hypothesis tests assume that the data follow a known distribution, usually the normal distribution. They let you draw conclusions about a population by taking a sample from it, computing statistics on the sample, and using those statistics to test hypotheses about population parameters such as the mean.

Most common Parametric Statistical Hypothesis Tests are –

  1. Student’s t-test
  2. Paired Student’s t-test
  3. Analysis of Variance Test (ANOVA)

Let’s now see how to perform each of these tests using Scipy.

Doing Student’s t-test in Scipy

Student’s t-test checks whether the means of two independent samples are significantly different, i.e. whether the samples could have come from populations with the same mean.
Before using Scipy’s built-in function ttest_ind(dataSample1, dataSample2), some assumptions about the samples must hold.

| Input assumptions | Hypotheses |
|---|---|
| Observations in each sample are independent and identically distributed | H0 – The means of both samples are equal |
| Both samples are normally distributed | H1 – The means of the samples are unequal |
| Both samples have the same variance | |

Scipy Code for doing Student’s t-test

import scipy.stats as ss
stat, p = ss.ttest_ind(dataSample1, dataSample2)
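A runnable sketch of the independent-samples t-test on generated data (the group names, means and sample sizes are assumptions). It also shows the equal_var=False option of ttest_ind, which runs Welch’s t-test and drops the equal-variance assumption listed above.

```python
import numpy as np
from scipy import stats

# Hypothetical samples, e.g. scores from two independent groups
rng = np.random.default_rng(3)
group_a = rng.normal(loc=70, scale=8, size=40)
group_b = rng.normal(loc=76, scale=8, size=40)

alpha = 0.05  # conventional choice

# Classic Student's t-test (assumes equal variances)
stat, p = stats.ttest_ind(group_a, group_b)

# Welch's t-test: same call with equal_var=False, for when
# the equal-variance assumption is doubtful
stat_w, p_w = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t = {stat:.3f}, p = {p:.3g}")
print(f"Welch:   t = {stat_w:.3f}, p = {p_w:.3g}")
print("means differ" if p < alpha else "no evidence that means differ")
```

With equal sample sizes the two t statistics coincide; only the degrees of freedom, and therefore the p-values, can differ.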

Doing Paired Student’s t-test in Scipy

The paired Student’s t-test checks whether the means of two paired (related) samples are significantly different, for example measurements taken on the same subjects before and after a treatment.
Before using Scipy’s built-in function ttest_rel(dataSample1, dataSample2), some assumptions about the samples must hold.

| Input assumptions | Hypotheses |
|---|---|
| Observations in each sample are independent and identically distributed | H0 – The means of the samples are equal |
| Observations in each sample are normally distributed | H1 – The means of the samples are unequal |
| Observations in each sample have the same variance | |
| Observations across the two samples are paired | |

Scipy code for doing Paired Student’s t-test

import scipy.stats as ss
stat, p = ss.ttest_rel(dataSample1, dataSample2)
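A sketch with hypothetical before/after measurements on the same eight subjects. It also checks a useful way to think about the paired test: it is equivalent to a one-sample t-test (scipy.stats.ttest_1samp) on the per-subject differences.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: the same 8 subjects before and after a treatment
before = np.array([72.0, 75.0, 68.0, 80.0, 77.0, 70.0, 74.0, 79.0])
after = np.array([70.0, 73.0, 67.0, 76.0, 75.0, 69.0, 71.0, 77.0])

stat, p = stats.ttest_rel(before, after)

# A paired t-test is a one-sample t-test of the differences against 0
stat_1s, p_1s = stats.ttest_1samp(before - after, popmean=0.0)

print(f"paired: t = {stat:.3f}, p = {p:.3g}")
print(f"one-sample on differences: t = {stat_1s:.3f}, p = {p_1s:.3g}")
```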

Doing Analysis of Variance (ANOVA) Test in Scipy

Analysis of Variance (ANOVA) tests whether the means of two or more independent samples are significantly different.
Before using Scipy’s built-in function f_oneway(dataSample1, dataSample2, ...), which accepts two or more samples, some assumptions about the samples must hold.

| Input assumptions | Hypotheses |
|---|---|
| Observations in each sample are independent and identically distributed | H0 – The means of all samples are equal |
| All samples are normally distributed | H1 – At least one sample mean is different |
| All samples have the same variance | |

Scipy code for doing Analysis of Variance (ANOVA) Test

import scipy.stats
stat, p = scipy.stats.f_oneway(dataSample1, dataSample2)
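A sketch of one-way ANOVA on three made-up groups. The second part checks a known identity: with exactly two groups, the ANOVA F statistic equals the square of the equal-variance t statistic, and the p-values agree.

```python
import numpy as np
from scipy import stats

# Hypothetical samples from three independent groups
g1 = np.array([4.1, 5.0, 4.8, 5.3, 4.6])
g2 = np.array([5.9, 6.3, 5.7, 6.1, 6.4])
g3 = np.array([5.0, 5.2, 4.9, 5.6, 5.1])

# f_oneway accepts any number (>= 2) of samples
f_stat, p = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.3f}, p = {p:.3g}")

# Sanity check: with exactly two groups, one-way ANOVA is equivalent to
# Student's t-test with equal variances (F = t**2, same p-value)
f2, p_f2 = stats.f_oneway(g1, g2)
t2, p_t2 = stats.ttest_ind(g1, g2)
print(np.isclose(f2, t2 ** 2), np.isclose(p_f2, p_t2))
```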

Q. 5 – Explain how to do the Mann-Whitney U test, a non-parametric statistical hypothesis test, in Scipy.

The Mann-Whitney U test checks whether the distributions of two independent samples are equal. Being non-parametric, it does not assume that the data are normally distributed.
Before using Scipy’s built-in function mannwhitneyu(dataSample1, dataSample2), some assumptions about the samples must hold.

| Input assumptions | Hypotheses |
|---|---|
| Observations in each sample are independent and identically distributed | H0 – The distributions of both samples are equal |
| Observations in each sample can be ranked | H1 – The distributions of both samples are not equal |

Scipy code for doing Mann-Whitney U Test

import scipy.stats
stat, p = scipy.stats.mannwhitneyu(dataSample1, dataSample2)
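A minimal sketch of the Mann-Whitney U test on two deterministic, made-up samples where the second sample is shifted upward. The alternative="two-sided" argument is passed explicitly so the result does not depend on the default of a particular Scipy version.

```python
import numpy as np
from scipy import stats

# Hypothetical samples: sample_b is shifted upward relative to sample_a
sample_a = np.arange(1, 21, dtype=float)   # 1, 2, ..., 20
sample_b = np.arange(15, 35, dtype=float)  # 15, 16, ..., 34

stat, p = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")

# U counts pairs (a, b) with a > b, ties counted as 0.5;
# it ranges from 0 to len(sample_a) * len(sample_b)
print(f"U = {stat}, p = {p:.3g}")
```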

Gagan

Hi there, I’m the founder of ComputerScienceHub (started to bring useful Computer Science information together in one place). I’ve been doing JavaScript and Python development since 2015, worked on a couple of web development projects and did some data science work using Python. Nowadays I work primarily as a freelance JavaScript (web) developer and, on the side, manage a team of Computer Science specialists at ComputerScienceHub.io
