Data visualisation in Python

Ultimate guide for Python Data Visualisation Libraries

Data visualisation means presenting information in visual form. It helps make sense of data that would otherwise be hard to interpret. Many Python libraries can be used to visualise data, including Matplotlib, Pandas, Seaborn, ggplot and Plotly.

The first step in data visualisation is to install the library we will be using and then import it into our workflow.


    Installing Library

    All of these libraries can be installed using pip, Python’s package management system. For example, Matplotlib can be installed with the command 'pip install matplotlib', and the other libraries can be installed in the same way.

    Importing Library into workflow

    After installing the library, the second step is to import it into the workflow, which gives the programmer access to the library’s functions for making visuals.

    This is a general overview of how Python’s data visualisation libraries can be installed and imported into a workflow. For a thorough walkthrough of this process, see Installing Python Modules.
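As a quick check that the installation step worked, the sketch below uses only the standard library to report which of the libraries mentioned in this guide can be imported. The helper name `installed_libraries` is made up for this illustration; the module names are the usual import names for Matplotlib, Pandas, Seaborn and Plotly.

```python
import importlib.util

# Import names of the libraries discussed in this guide.
LIBRARIES = ["matplotlib", "pandas", "seaborn", "plotly"]


def installed_libraries(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in installed_libraries(LIBRARIES).items():
        status = "installed" if found else f"missing (try: pip install {name})"
        print(f"{name}: {status}")
```

Once a library is installed, it is imported at the top of a script. The conventional aliases are `import matplotlib.pyplot as plt`, `import pandas as pd` and `import seaborn as sns`.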

    Importing Data sets

    One convenience of Python is that its libraries work well together: a dataset needs to be loaded only once, with a single library, and can then be passed to many different visualisation libraries. Datasets are typically imported with Pandas, which also provides an API for exploratory data analysis, such as inspecting which rows and columns a dataset contains.

    Importing the famous iris dataset into the workflow:

    import pandas as pd
    iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
    print(iris.head())

    [Figure: iris dataset]

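    Note: the iris dataset is not bundled with Pandas. The code above assumes a headerless file named iris.csv in the working directory, for example the iris data file from the UCI Machine Learning Repository saved under that name.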
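To make the exploratory step more concrete, here is a minimal sketch of the kind of inspection Pandas allows once a dataset is loaded. It builds a tiny DataFrame inline so that it runs without any file; the rows are illustrative values in the iris layout, not the real dataset, and the variable name `sample` is made up for this example.

```python
import pandas as pd

# A few made-up rows in the same layout as the iris data (illustrative values only).
sample = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0, 6.3],
        "sepal_width": [3.5, 3.2, 3.3],
        "petal_length": [1.4, 4.7, 6.0],
        "petal_width": [0.2, 1.4, 2.5],
        "class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    }
)

print(sample.shape)                    # (number of rows, number of columns)
print(list(sample.columns))            # column names
print(sample.dtypes)                   # data type of each column
print(sample.describe())               # summary statistics for the numeric columns
print(sample["class"].value_counts())  # number of rows per class
```

The same calls work unchanged on a DataFrame returned by `pd.read_csv`.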

    Using Libraries for Visualisation

    Matplotlib

    Matplotlib is the most popular Python plotting library. It is a low-level library with a MATLAB-like interface, which offers a lot of freedom at the cost of having to write more code.
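The examples in this section all follow the same figure-and-axes pattern, and none of them displays or saves the result by itself. The sketch below shows that pattern end to end under a few assumptions: the data points are made up, the output file name `example.png` is arbitrary, and the non-interactive Agg backend is selected so the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook or desktop session
import matplotlib.pyplot as plt

# Illustrative values only, not taken from any dataset.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()    # one Figure containing one Axes
ax.plot(x, y, marker="o")   # draw on the Axes
ax.set_title("Example figure")
ax.set_xlabel("x")
ax.set_ylabel("y")

fig.savefig("example.png")  # write the figure to an image file
plt.close(fig)              # release the figure's memory
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of `fig.savefig(...)` to open the plot in a window.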

    Scatter Plot
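    import matplotlib.pyplot as plt
    import pandas as pd

    # load the iris dataset (headerless CSV, so column names are supplied)
    iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
    print(iris.head())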
    
    # create a figure and axis
    fig, ax = plt.subplots()
    
    # scatter the sepal_length against the sepal_width
    ax.scatter(iris['sepal_length'], iris['sepal_width'])
    # set a title and labels
    ax.set_title('Iris Dataset')
    ax.set_xlabel('sepal_length')
    ax.set_ylabel('sepal_width')
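As an illustration of the extra freedom Matplotlib gives in exchange for more code, the scatter plot can be coloured by class by drawing one `scatter` call per group. This is a sketch under stated assumptions: the rows are made-up values in the iris layout so the block runs without a file, and the output file name is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows in the iris layout (illustrative values only).
sample = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "class": [
            "Iris-setosa", "Iris-setosa",
            "Iris-versicolor", "Iris-versicolor",
            "Iris-virginica", "Iris-virginica",
        ],
    }
)

fig, ax = plt.subplots()
# One scatter call per class, so each class gets its own colour and legend entry.
for name, group in sample.groupby("class"):
    ax.scatter(group["sepal_length"], group["sepal_width"], label=name)
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
ax.legend()

fig.savefig("iris_by_class.png")
plt.close(fig)
```

With the real data, replacing `sample` by the `iris` DataFrame loaded earlier should give the same kind of plot.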
    Line Chart
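    import matplotlib.pyplot as plt
    import pandas as pd

    # load the iris dataset (headerless CSV, so column names are supplied)
    iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
    print(iris.head())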
    
    # get columns to plot
    columns = iris.columns.drop(['class'])
    # create x data
    x_data = range(0, iris.shape[0])
    # create figure and axis
    fig, ax = plt.subplots()
    # plot each column
    for column in columns:
        ax.plot(x_data, iris[column], label=column)
    # set title and legend
    ax.set_title('Iris Dataset')
    ax.legend()
    Histogram
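    import matplotlib.pyplot as plt
    import pandas as pd

    # wine_reviews is assumed to be a DataFrame with a numeric 'points' column;
    # the file name below is hypothetical, so point it at your own wine reviews CSV
    wine_reviews = pd.read_csv('wine_reviews.csv')

    # create figure and axis
    fig, ax = plt.subplots()
    # plot histogram
    ax.hist(wine_reviews['points'])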
    # set title and labels
    ax.set_title('Wine Review Scores')
    ax.set_xlabel('Points')
    ax.set_ylabel('Frequency')
    [Figure: Wine Review Scores]
    Bar Chart
    # assumes matplotlib.pyplot is imported as plt and that wine_reviews
    # is a DataFrame with a 'points' column
    # create a figure and axis
    fig, ax = plt.subplots()
    # count the occurrence of each score
    data = wine_reviews['points'].value_counts()
    # get x and y data
    points = data.index
    frequency = data.values
    # create bar chart
    ax.bar(points, frequency)
    # set title and labels
    ax.set_title('Wine Review Scores')
    ax.set_xlabel('Points')
    ax.set_ylabel('Frequency')

    For a more thorough explanation of how Matplotlib can be used for data visualisation, check out this video.

    Pandas

    Scatter Plot
    import pandas as pd
    iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
    print(iris.head())
    
    iris.plot.scatter(x='sepal_length', y='sepal_width', title='Iris Dataset')
    Line Chart
    import pandas as pd
    iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
    print(iris.head())
    
    iris.drop(['class'], axis=1).plot.line(title='Iris Dataset')
    Bar Chart
    # assumes wine_reviews is a DataFrame with a 'points' column
    wine_reviews['points'].value_counts().sort_index().plot.bar()
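The histogram from the Matplotlib section has a Pandas counterpart as well. A possible sketch, using made-up review scores rather than real data and an arbitrary output file name, is shown here; `Series.plot.hist` draws through Matplotlib and returns the Axes it drew on.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up review scores (illustrative values only).
scores = pd.Series([80, 85, 85, 88, 90, 90, 90, 92, 95, 100], name="points")

# Pandas builds the histogram and hands back the Matplotlib Axes.
ax = scores.plot.hist(bins=5, title="Wine Review Scores")
ax.set_xlabel("Points")

ax.figure.savefig("scores_hist.png")
plt.close(ax.figure)
```

Because the return value is an ordinary Matplotlib Axes, any of the `ax.set_...` calls from the Matplotlib section can be applied to it afterwards.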

    Check out this video to learn more about how Pandas can be used to make beautiful data visualisations.

    Seaborn

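    Scatter Plot
    import pandas as pd
    import seaborn as sns

    # load the iris dataset (headerless CSV, so column names are supplied)
    iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
    print(iris.head())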
    
    sns.scatterplot(x='sepal_length', y='sepal_width', data=iris)

    [Figure: Seaborn scatter plot]
    Line Chart
    import pandas as pd
    import seaborn as sns
    iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
    print(iris.head())
    
    sns.lineplot(data=iris.drop(['class'], axis=1))
    [Figure: Seaborn line chart]

    This is how data visualisation can be done easily using Python’s libraries. If you want to go one step further towards becoming an expert data engineer or data scientist, read Data Visualisation With Python: Create An Impact With Meaningful Data Insights Using Interactive And Engaging Visuals by Mario Dobler.

    On my own journey towards becoming an expert data scientist, CS Dojo has helped me a lot, especially his video about data analysis.

