EDA (Exploratory Data Analysis) is one of the first steps performed on a given dataset. It helps us understand our data and gives us an idea of the manipulation and cleaning we might have to do. EDA can take anywhere from a few lines to a few hundred lines of code. In this tutorial, we will look at libraries which help us perform EDA in a few lines.
We will use the Titanic dataset provided by Kaggle. Using pandas’ describe() method, we get the output below
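As a rough illustration of what describe() reports, here is a minimal sketch. The rows below are made up to mimic a few Titanic-style columns and are not real passenger records; the column names are assumptions for the example.

```python
import pandas as pd

# Made-up rows that mimic a few Titanic-style columns (not real passenger data).
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Pclass": [3, 1, 3, 2],
        "Age": [22.0, 38.0, None, 35.0],  # one missing value on purpose
        "Fare": [7.25, 71.28, 7.92, 13.0],
    }
)

# describe() summarises each numeric column:
# count (non-missing values), mean, std, min, quartiles, and max.
summary = sample.describe()
print(summary)
```

With the real dataset, the same call would follow something like pd.read_csv("train.csv"), where the file name is an assumption about how the Kaggle download is saved. Note that the count for Age is lower than for the other columns, which is often the first hint that a column needs cleaning.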
If you write articles on Medium or your personal blog, you probably have a collection of websites/software which help you produce great content. Below is a list of websites/software that I use for my articles
Although I love VS Code and would choose it over any other editor for development, copying code from VS Code to Medium is a pain.
Below is a piece of code copied from VS Code; I have manually created a code block for better…
Accuracy, Recall, Precision, and F1 Scores are metrics that are used to evaluate the performance of a model. Although the terms might sound complex, their underlying concepts are pretty straightforward. They are based on simple formulae and can be easily calculated.
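To show how simple the formulae are, here is a minimal sketch that computes all four metrics from the counts of true positives, false positives, false negatives, and true negatives. The function name and the example counts are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much really was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything that really was positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a binary classifier evaluated on 100 samples.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The guards against division by zero return 0.0 when a denominator is empty; that is one common convention, not the only one.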
This article will go over the following with respect to each term
At the end of the tutorial, we will go over confusion matrices and how to present them. I have provided the link to the Google Colab notebook at the end of the article.
Let’s assume we are classifying whether an email is spam or…
The link to the live app and screenshots of some of the word clouds are at the end of the article
We will be getting our data from the following website
The above website stores archives of the trending keywords and hashtags for each day. Beautiful Soup will be used to scrape this website to get the required data. We will be building the following features
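The scraping step might look roughly like the sketch below. The HTML snippet, the "trends" class name, and the link structure are all hypothetical stand-ins, since the real page's markup is not shown here; only the Beautiful Soup calls themselves are real.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one day's archive page.
html = """
<ul class="trends">
  <li><a href="/t/1">#python</a></li>
  <li><a href="/t/2">#datascience</a></li>
  <li><a href="/t/3">streamlit</a></li>
</ul>
"""

# Parse with the built-in parser so no extra dependency is needed.
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every link inside the (assumed) trends list.
trend_list = soup.find("ul", class_="trends")
keywords = [a.get_text(strip=True) for a in trend_list.find_all("a")]

# Separate hashtags from plain keywords.
hashtags = [k for k in keywords if k.startswith("#")]

print(keywords)
print(hashtags)
```

For a live page, the html string would instead come from an HTTP response body, and the selectors would need to match whatever the site actually uses.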
We will be using Streamlit and pytube to build our YouTube downloader web app. I will also give a brief overview of the pytube library.
We will be implementing the following features
Before we start, we will need to set up and activate a virtual environment
pip install virtualenv # Install virtual environment
virtualenv venv # Create a virtual environment
venv/Scripts/activate # Activate the virtual environment
We will need to install the following Libraries
Type the following command to install the…
Before we move on, let’s debunk some myths about writing Data Science/Programming articles
Myth: You Need to be a Subject Matter Expert
Truth: You just need to know a topic well enough to be able to explain it to others. It can be as simple as Linear Regression using scikit-learn or as complex as NLP. My first article published on Towards Data Science was a tutorial on Selenium for building a bot. I didn’t know much about Selenium, but I knew how to scrape Google Search results and wrote a tutorial on it.
Myth: You need lots of time to maintain a…
In this tutorial, I will show you how to deploy your model as a web service. First, we will create a simple KNN model on the Iris dataset and then deploy it. I will also show you how to consume the deployed model and mention some points to keep in mind while deploying the model.
You can find the repository for this article over here
You will need a Microsoft Azure account for this tutorial.
pip install virtualenv # Install virtual environment
virtualenv venv # Create a virtual environment
venv/Scripts/activate # Activate the virtual environment
For this tutorial, we won’t be creating a fancy model; we will be creating a simple KNN model to use on the Iris dataset provided by the sklearn…
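A simple KNN model on the Iris dataset could be sketched as follows. The split ratio, random seed, and n_neighbors value are assumptions picked for the example, not settings taken from the article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris features (X) and species labels (y) bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation (assumed split and seed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit a k-nearest-neighbours classifier; k=5 is an assumed value.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Predict the species index for one unseen flower.
prediction = knn.predict(X_test[:1])
print(prediction)
```

To deploy a model like this, the fitted knn object would typically be serialized to a file and loaded by the web service that answers prediction requests.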
Word Clouds are a cool and creative form of visualization. You don’t need to be a tech guru to generate one
Although a quick Google search for ‘Generate Word Clouds Online’ returns hundreds of results, I personally recommend this website
The image used for the word cloud is also known as the…
Sudoku is a logic-based number-placement game. Below is an example of a sudoku puzzle from Wikipedia