You might have already seen articles explaining the ‘secret’ repo that adds a cool README to your GitHub profile but put off actually creating one. I built an app to do most of the work for you 😃
EDA (Exploratory Data Analysis) is one of the first steps performed on a given dataset. It helps us understand more about our data and gives us an idea of the manipulations and cleaning we might have to do. EDA can take anywhere from a few lines to a few hundred lines. In this tutorial, we will look at libraries which help us perform EDA in a few lines.
We will use the Titanic dataset provided by Kaggle. Using Pandas’ describe() method, we get the output below.
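As a minimal sketch of what describe() does, here is the same call on a small made-up DataFrame with a few Titanic-style columns (the real dataset comes from Kaggle, so these rows are illustrative only):

```python
import pandas as pd

# Hypothetical rows mimicking a few Titanic columns
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Age": [22.0, 38.0, 26.0, 35.0, 28.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

# describe() summarizes every numeric column: count, mean, std, min,
# quartiles, and max — a one-line first look at the data.
summary = df.describe()
print(summary)
```

Individual statistics can then be pulled out with, for example, `summary.loc["mean", "Age"]`.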
NoSQL databases are used to solve challenges faced by an RDBMS (Relational Database Management System), or simply put, relational databases. Some cons of an RDBMS are listed below.
On the other hand, NoSQL databases can handle unstructured data and do not need a schema to be defined.
In this tutorial, we will be working with Amazon DynamoDB, a key-value and document NoSQL database.
When writing code, one must aim to follow the DRY Principle (Don’t Repeat Yourself). One way to avoid a repetition of code is to put chunks of code inside functions and invoke them as required.
The concept of functions in SQL is similar to other programming languages like Python. The major difference is the way they are implemented. There are two main types of user-defined functions in SQL based on the data they return:
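The exact CREATE FUNCTION syntax varies by database engine, so as a quick runnable sketch, here is the same idea in SQLite via Python's sqlite3 module, which lets you register a scalar function and call it from SQL instead of repeating the expression in every query (the table and function names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("pen", 1.5, 4), ("book", 12.0, 2)],
)

# Register a scalar function named line_total taking 2 arguments;
# the price * qty logic now lives in one place (DRY).
conn.create_function("line_total", 2, lambda price, qty: price * qty)

rows = conn.execute(
    "SELECT item, line_total(price, qty) FROM orders"
).fetchall()
print(rows)  # [('pen', 6.0), ('book', 24.0)]
```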
Streamlit is a Python library that helps us develop UIs for our models without writing HTML/CSS/JS. Most models die inside a Jupyter notebook and are not appealing. Using Streamlit, you can create a clean UI for your model and showcase it to others, letting users interact with it in a more user-friendly way.
Seaborn is an open-source Python library built on top of matplotlib. It is used for data visualization and exploratory data analysis. Seaborn works easily with dataframes and the Pandas library. The graphs created can also be customized easily. Below are a few benefits of Data Visualization.
Graphs can help us find data trends that are useful in any machine learning or forecasting project.
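As a minimal sketch of how little code a Seaborn plot takes, the snippet below draws a scatter plot from a small made-up DataFrame (the Agg backend is selected so it runs headlessly; column names and values are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Synthetic dataframe standing in for real data
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 28, 41, 19, 30],
    "fare": [7.2, 71.3, 7.9, 53.1, 8.1, 30.0, 6.5, 12.4],
})

# Seaborn works directly with dataframes: pass the frame and column names
ax = sns.scatterplot(data=df, x="age", y="fare")
ax.set_title("Fare vs Age")
plt.savefig("scatter.png")
```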
If you write articles on Medium or your personal blog, you probably have a collection of websites/software which help you produce great content. Below is a list of websites/software that I use for my articles
Although I love VS Code and would choose it over any other editor for development, copying code from VS Code to Medium is a pain.
Accuracy, Recall, Precision, and F1 Scores are metrics that are used to evaluate the performance of a model. Although the terms might sound complex, their underlying concepts are pretty straightforward. They are based on simple formulae and can be easily calculated.
This article will go over the following with respect to each term:
At the end of the tutorial, we will go over confusion matrices and how to present them. I have provided the link to the Google Colab notebook at the end of the article.
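Since all four metrics come from the same confusion-matrix counts, they can be sketched in a few lines of plain Python (the TP/FP/FN/TN values below are made up for illustration):

```python
# Metrics computed from raw confusion-matrix counts:
# TP = true positives, FP = false positives,
# FN = false negatives, TN = true negatives.

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(tp, fp, fn):
    # F1 is the harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Example counts (hypothetical)
tp, fp, fn, tn = 40, 10, 5, 45
print(accuracy(tp, fp, fn, tn))  # 0.85
print(precision(tp, fp))         # 0.8
```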
The Link to the live app and screenshots of some of the word clouds are at the end of the article
We will be getting our data from the following website
The above website stores archives of the trending keywords and hashtags for each day. Beautiful Soup will be used to scrape this website to get the required data. We will be building the following features
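As a sketch of the scraping step, the snippet below runs Beautiful Soup on a made-up HTML fragment; the real archive page has its own markup, so the tag and class names here are illustrative only:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the archive page's markup
html = """
<ol class="trends">
  <li><a href="/hashtag/Python">#Python</a></li>
  <li><a href="/hashtag/DataScience">#DataScience</a></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")
# CSS selector pulls every link inside the trends list
trends = [a.get_text() for a in soup.select("ol.trends a")]
print(trends)  # ['#Python', '#DataScience']
```

In the real app, the HTML would come from a requests.get() call against the archive URL instead of a literal string.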
We will be using Streamlit and pytube to build our YouTube downloader web app. I will also give a brief overview of the pytube library.
We will be implementing the following features
Before we start, we will need to set up and activate a virtual environment
pip install virtualenv   # Install virtualenv
virtualenv venv          # Create a virtual environment
venv/Scripts/activate    # Activate the virtual environment
We will need to install the following libraries