Life is short, let Python automate your EDA

EDA (Exploratory Data Analysis) is one of the first steps performed on a given dataset. It helps us understand our data and gives us an idea of the manipulation and cleaning we might have to do. EDA can take anywhere from a few lines of code to a few hundred. In this tutorial, we will look at libraries that help us perform EDA in a few lines.

Dataset

We will use the Titanic dataset provided by Kaggle. Using pandas' describe() method, we get the output below.
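As a rough illustration of what that call looks like, here is a minimal sketch. The DataFrame is a hypothetical stand-in with a few Titanic-style columns and made-up values, not the Kaggle file itself; with the real data you would first load the CSV with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical rows shaped like the Titanic data; the values are made up
# for illustration and are not taken from the Kaggle dataset.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 1],
    "Age": [22.0, 38.0, 26.0, None, 35.0],  # one missing age
    "Fare": [7.25, 71.28, 7.93, 13.00, 53.10],
})

# describe() reports count, mean, std, min, quartiles and max
# for every numeric column.
summary = df.describe()
print(summary)
```

describe() skips missing values, which is why the count for Age comes out lower than for the other columns. That is often the first hint that a column needs cleaning.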


I don’t use VS Code

Introduction

If you write articles on Medium or your personal blog, you probably have a collection of websites and software that help you produce great content. Below is a list of the websites and software that I use for my articles.

1. PyCharm

Although I love VS Code and would choose it over any other editor for development, copying code from VS Code to Medium is a pain.

  • When you paste your code, it doesn’t automatically create a code block (the gray block)
  • The code doesn’t respect tabs, spaces, or any indentation and format
  • For some reason, it adds a blank line after each line of code

Below is a piece of code copied from VS Code. I have manually created a code block for better…


This article also includes ways to display your confusion matrix

Introduction

Accuracy, Recall, Precision, and F1 Scores are metrics that are used to evaluate the performance of a model. Although the terms might sound complex, their underlying concepts are pretty straightforward. They are based on simple formulae and can be easily calculated.

This article will cover the following for each metric:

  • Explanation
  • Why it is relevant
  • Formula
  • Calculating it without sklearn
  • Using sklearn to calculate it

At the end of the tutorial, we will go over confusion matrices and how to present them. I have provided the link to the Google Colab notebook at the end of the article.
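To make the "without sklearn" idea concrete, here is a minimal sketch that derives all four metrics from the confusion-matrix counts using the standard definitions. The function names and the labels in `y_true` and `y_pred` are hypothetical, chosen only for illustration (1 stands for the positive class).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are 3 true positives, 1 false positive, 5 true negatives and 1 false negative, so accuracy is 0.8 and precision, recall and F1 are all 0.75. The same numbers should come out of `accuracy_score`, `precision_score`, `recall_score` and `f1_score` in `sklearn.metrics`.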

Data 📈

Let’s assume we are classifying whether an email is spam or…


We will be building a Streamlit Web App to showcase a word cloud of Trending Google Keywords and Twitter Hashtags in 2020

The link to the live app and screenshots of some of the word clouds are at the end of the article.

Introduction

We will be getting our data from the following website

The above website stores archives of the trending keywords and hashtags for each day. We will use Beautiful Soup to scrape this website and get the required data. We will be building the following features:

  • A 2020 word cloud
  • The ability for the user to select a date and generate a word cloud for that date
  • The ability for the user to change the image mask
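Once the keywords have been scraped, both the yearly cloud and the per-date cloud come down to counting how often each keyword appears. The sketch below shows one way to do that with the standard library. The `daily_keywords` structure, the function names and the sample keywords are assumptions made for illustration; they are not the scraped data or the app's actual code.

```python
from collections import Counter


def keyword_frequencies(daily_keywords):
    """Count keyword occurrences across all dates (case-insensitive)."""
    counts = Counter()
    for keywords in daily_keywords.values():
        counts.update(k.strip().lower() for k in keywords if k.strip())
    return dict(counts)


def frequencies_for_date(daily_keywords, date):
    """Count keyword occurrences for a single date chosen by the user."""
    return keyword_frequencies({date: daily_keywords.get(date, [])})


# Hypothetical scraped data: date string -> trending keywords for that day
daily_keywords = {
    "2020-03-01": ["Python", "Streamlit"],
    "2020-03-02": ["python", "Word Cloud"],
}

year_counts = keyword_frequencies(daily_keywords)
day_counts = frequencies_for_date(daily_keywords, "2020-03-02")
print(year_counts)
print(day_counts)
```

A frequency dictionary like this can be passed to `WordCloud.generate_from_frequencies` from the wordcloud package, and the image mask can be supplied through the `mask` argument of the `WordCloud` constructor.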

Prerequisites

  • Basic Familiarity with Web Scraping using Beautiful…


We will be using Streamlit and pytube to build our YouTube downloader web app. I will also give a brief overview of the pytube library.

We will be implementing the following features:

  • The ability for the user to give the URL as an input
  • If available, the ability to choose between a video-with-audio download and an audio-only download
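The two features above might map onto pytube roughly as follows. This is a sketch under assumptions, not the app's actual code: `stream_filter_for` and `download_from_youtube` are hypothetical helper names, and pytube is imported inside the download function so the helper can be tried without the library or a network connection.

```python
def stream_filter_for(choice):
    """Map the user's choice to keyword arguments for pytube's streams.filter()."""
    if choice == "audio":
        return {"only_audio": True}
    if choice == "video":
        # Progressive streams carry video and audio in a single file
        return {"progressive": True}
    raise ValueError(f"Unknown choice: {choice!r}")


def download_from_youtube(url, choice, output_path="."):
    """Download the first matching stream; return its file path, or None."""
    # Requires pytube and network access
    from pytube import YouTube

    stream = YouTube(url).streams.filter(**stream_filter_for(choice)).first()
    if stream is None:
        # No stream of the requested kind is available for this video
        return None
    return stream.download(output_path=output_path)


print(stream_filter_for("audio"))
print(stream_filter_for("video"))
```

In a Streamlit app, the URL and the choice would typically come from widgets such as `st.text_input` and `st.radio`.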

Setup Virtual Environment

Before we start, we will need to set up and activate a virtual environment

pip install virtualenv    # Install virtualenv
virtualenv venv           # Create a virtual environment
venv\Scripts\activate     # Activate it on Windows (macOS/Linux: source venv/bin/activate)

Install Required Libraries

We will need to install the following libraries:

Streamlit

pytube

Type the following command to install the…


8 ideas for your first or nth Blog Post

Before we move on, let’s debunk some myths about writing Data Science/Programming articles.

Myth: You Need to be a Subject Matter Expert

Truth: You just need to know a topic well enough to be able to explain it to others. It can be as simple as Linear Regression using scikit-learn or as complex as NLP. My first article published on Towards Data Science was a tutorial on Selenium for building a bot. I didn’t know much about Selenium but I knew how to scrape Google Search results and wrote a tutorial on it.

Myth: You need lots of time to maintain a…


Azure Machine Learning Studio (MLS) is a service provided by Microsoft that lets you deploy your models as web services and consume them as REST endpoints. It is really useful when you are trying to integrate your web app/API with machine learning models.

Introduction

In this tutorial, I will show you how to deploy your model as a web service. First, we will create a simple KNN model on the Iris dataset and then deploy it. I will also show you how to consume the deployed model and mention some points to keep in mind while deploying it.

You can find the repository for this article over here

You will need a Microsoft Azure account for this tutorial.

Setup Virtual Environment

pip install virtualenv    # Install virtualenv
virtualenv venv           # Create a virtual environment
venv\Scripts\activate     # Activate it on Windows (macOS/Linux: source venv/bin/activate)

Create your Model

For this tutorial, we won’t be creating a fancy model. We will be creating a simple KNN model to use on the iris dataset provided by the sklearn…
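For readers unfamiliar with KNN, the idea is simple: a new sample gets the label that is most common among its k closest training samples. The sketch below illustrates that idea in plain Python. It is not the tutorial's model, which is built with sklearn (where `KNeighborsClassifier` and `load_iris` are the usual starting points); the function name and the petal measurements are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up (petal length, petal width) pairs in cm, for illustration only
train_points = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3),
                (4.5, 1.5), (4.1, 1.3), (4.7, 1.4)]
train_labels = ["setosa", "setosa", "setosa",
                "versicolor", "versicolor", "versicolor"]

small_flower = knn_predict(train_points, train_labels, (1.6, 0.3))
large_flower = knn_predict(train_points, train_labels, (4.4, 1.4))
print(small_flower, large_flower)
```

The first query sits next to the three small-petal samples and the second next to the three large-petal ones, so they are labelled setosa and versicolor respectively.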


Some programming humour to get you through your week

0. When you count objects, you unknowingly start counting from 0

Array Index Meme

1. You don’t have trouble naming folders and files since you spend most of your time naming variables


Word clouds are a cool and creative form of visualization. You don’t need to be a tech guru to generate one.

Benefits of having a Word Cloud Image as your cover photo

  • It looks more appealing than the plain old ‘About Me’ section
  • If done right, it is usually the first thing readers look at on your profile, which makes it a great opportunity to show them the topics you are interested in and write about

How to create a Word Cloud

Although a quick Google search for ‘Generate Word Clouds Online’ will return hundreds of results, I personally recommend this website:

https://wordart.com/

The image used for the word cloud is also known as the…


Stuck on a Sudoku Puzzle? Python can solve it for you!

Sudoku is a logic-based number-placement game. Below is an example of a Sudoku puzzle from Wikipedia.
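One common way to solve such a puzzle in Python is a backtracking search: try a digit in an empty cell, recurse, and undo the move if it leads to a dead end. The sketch below is one such solver and may differ from the approach used in the full article. The grid is the well-known example from Wikipedia's Sudoku article, assumed here to be the puzzle referred to above; 0 marks an empty cell.

```python
def is_valid(grid, row, col, digit):
    """Check that `digit` is absent from the cell's row, column and 3x3 box."""
    if digit in grid[row]:
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    return all(
        grid[r][c] != digit
        for r in range(box_row, box_row + 3)
        for c in range(box_col, box_col + 3)
    )


def solve(grid):
    """Fill `grid` in place; return True if a solution was found."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for digit in range(1, 10):
                    if is_valid(grid, row, col, digit):
                        grid[row][col] = digit
                        if solve(grid):
                            return True
                        grid[row][col] = 0  # undo and try the next digit
                return False  # no digit fits here, so backtrack
    return True  # no empty cells left


# Assumed to be the example puzzle from Wikipedia's Sudoku article
puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

solved = solve(puzzle)
for row in puzzle:
    print(row)
```

Backtracking is not the fastest technique, but it is short and handles any valid 9x9 puzzle.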

About

Rahul Banerjee

Just an average computer engineering student 💻 I mostly write ‘How to’ tutorials related to Python. https://www.linkedin.com/in/rahulbanerjee2699/
