
## Suppose you are modeling text with an HMM. What is the complexity of finding the most probable sequence of tags or states for a sequence of text using a brute-force algorithm?

Assume a fixed total number of states, and a fixed length for the longest sequence. Think about how we generate text using an HMM: we first have a state sequence, and from each state we emit an output. From each state, any word out of the possible outcomes can be generated. Since there are that many states, at each possible…
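
To make the brute-force cost concrete, here is a minimal sketch that scores every possible state sequence and keeps the best one. The names N (number of states) and T (sequence length) are our own labels, not the author's: there are N^T candidate sequences and scoring each takes O(T) multiplications, so the enumeration costs O(T · N^T). The toy tags, words and probabilities are hypothetical.

```python
import itertools

def brute_force_decode(obs, states, start_p, trans_p, emit_p):
    """Return (best_path, best_prob) by scoring every possible state sequence."""
    if not obs:
        return [], 1.0
    best_path, best_prob = None, -1.0
    # itertools.product yields len(states) ** len(obs) candidate sequences.
    for path in itertools.product(states, repeat=len(obs)):
        # Joint probability P(path, obs): start state, then transitions and emissions.
        p = start_p[path[0]] * emit_p[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans_p[path[t - 1]][path[t]] * emit_p[path[t]][obs[t]]
        if p > best_prob:
            best_path, best_prob = path, p
    return list(best_path), best_prob

# Hypothetical two-tag model, for illustration only.
states = ("NOUN", "VERB")
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1, "cats": 0.4},
          "VERB": {"dogs": 0.1, "bark": 0.8, "cats": 0.1}}

path, prob = brute_force_decode(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(path, prob)  # ['NOUN', 'VERB'] and a probability of about 0.168
```

The Viterbi algorithm finds the same best path in O(T · N²) by reusing partial results, which is why brute force is only practical for tiny inputs.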

## What is the difference between a Bar Chart and a Histogram?

A histogram represents the distribution of a numerical variable. A bar chart is typically used to compare numeric values corresponding to categorical variables. To construct a histogram, start with the x-axis: usually the range of values is binned. In other words, the entire range is divided into a series of intervals, and each interval occupies a slot on the…
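
The binning step described above can be sketched in a few lines. This is a minimal illustration assuming equal-width bins; the function name and sample values are hypothetical.

```python
def histogram_counts(values, num_bins):
    """Bin a numeric variable into equal-width intervals and count each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Map the value to its interval; the maximum would land one past the
        # end, so clamp it into the last bin.
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, num_bins - 1)] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

edges, counts = histogram_counts([1, 2, 2, 3, 5, 7, 8, 9, 10], num_bins=3)
print(edges, counts)  # [1.0, 4.0, 7.0, 10.0] [4, 1, 4]
```

A bar chart needs no such step: for categorical data such as `{"North": 120, "South": 95}`, each category already has its own bar and the value is used directly as the bar height.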

## Learn Data Science and Machine Learning from Scratch

Transitioning to a new field is challenging and not for the faint-hearted… It is not very different from climbing a mountain! To become a data scientist you need to learn some math (statistics, linear algebra, optimization), programming (preferably Python or R), and the art of working with and analyzing data. But…

## What is the difference between a Histogram and a Pareto plot?

A histogram is a bar graph that uses the height of each bar to convey the frequency of an event occurring. Each bar in a histogram corresponds to the frequency of occurrence of a specific event. A Pareto chart sorts the bars by height, in descending order, signifying the order of impact. It follows the Pareto philosophy (the 80/20 rule) through…
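
The data behind a Pareto chart can be sketched as follows: categories sorted by frequency, highest first, plus the running cumulative percentage that Pareto charts commonly draw as a line over the bars. The defect categories are hypothetical.

```python
from collections import Counter

def pareto_table(events):
    """Sort categories by frequency (descending) and add cumulative percentages."""
    counts = Counter(events).most_common()  # (category, count), highest count first
    total = sum(c for _, c in counts)
    rows, running = [], 0
    for category, c in counts:
        running += c
        rows.append((category, c, 100.0 * running / total))
    return rows

# Hypothetical defect log, for illustration only.
defects = ["scratch"] * 6 + ["dent"] * 3 + ["crack"]
for row in pareto_table(defects):
    print(row)
# ('scratch', 6, 60.0)
# ('dent', 3, 90.0)
# ('crack', 1, 100.0)
```

Reading down the cumulative column shows how few categories account for most occurrences, which is the 80/20 idea the chart is built to expose.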

## What are ACID properties in a database? For data analytics tasks, do you need to care about ACID properties?

ACID properties are important in an RDBMS setting where operations are transactional and database updates are involved as part of the task. For instance, a banking or e-commerce application where real-time user data is updated typically needs an RDBMS. A data analyst typically handles structured data using query languages such as SQL. However,…
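
To show why this matters for a banking-style workload, here is a minimal sketch of atomicity (the "A" in ACID) using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the balances are hypothetical; the `with conn:` block commits on success and rolls back if an exception escapes.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates commit, or neither does."""
    with conn:  # commit on success, roll back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)       # succeeds
try:
    transfer(conn, 1, 2, 500)  # would overdraw account 1
except ValueError:
    pass                       # the partial debit was rolled back
print(balances(conn))  # {1: 70, 2: 80}
```

Without the transaction, the failed transfer would leave account 1 debited and account 2 never credited. Read-only analytical queries rarely hit this problem, which is the contrast the question is drawing.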

## What are the different types of Joins while wrangling data?

Here are the different types of JOINs in SQL:

- (INNER) JOIN: Returns records that have matching values in both tables.
- LEFT (OUTER) JOIN: Returns ALL records from the left table, and the matched records from the right table.
- RIGHT (OUTER) JOIN: Returns ALL records from the right table, and the matched records from the…
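
The join types above can be tried with Python's built-in `sqlite3` module. The `customers` and `orders` tables are hypothetical. Order 13 points to a customer that does not exist, and customer Cy has no orders, so each join keeps a different set of rows. Native `RIGHT JOIN` needs SQLite 3.39.0 or later, so this sketch emulates it portably by swapping the tables in a `LEFT JOIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25), (11, 1, 40), (12, 2, 15), (13, 4, 99);
""")

# INNER JOIN: only rows with a match on both sides.
inner = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL (None) when nothing matches.
left = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# RIGHT JOIN emulated: every order, keeping the original column order.
right = conn.execute("""
    SELECT c.name, o.amount FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(inner)  # [('Ann', 25), ('Ann', 40), ('Bob', 15)]
print(left)   # [('Ann', 25), ('Ann', 40), ('Bob', 15), ('Cy', None)]
print(right)  # [('Ann', 25), ('Ann', 40), ('Bob', 15), (None, 99)]
```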

## Name a few problems that data analysts typically encounter?

Some of the problems encountered by a data analyst are:

- Biased data: Data could be biased due to the source from which it is collected. For instance, suppose you collect data to determine the winner of an electoral campaign; collecting from a specific region alone introduces one form of bias, while collecting…

## What is the difference between supervised and unsupervised learning?

In supervised learning, the algorithm learns from labeled training data. In other words, each data point is tagged with the answer, or label, that the algorithm should come up with. Using such labeled data, the goal is to predict labels for new data points. The two common forms of supervised learning are classification and regression…
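
A tiny supervised classifier makes the idea concrete. This sketch uses a 1-nearest-neighbor rule, chosen here only because it is short and not taken from the post: each training point carries a label, and a new point receives the label of its closest training point. The points and labels are hypothetical.

```python
import math

def predict_1nn(train, query):
    """Return the label of the labeled training point closest to `query`."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label

# Hypothetical labeled training data: (features, label) pairs.
train = [((1.0, 1.0), "small"), ((1.2, 0.8), "small"),
         ((8.0, 9.0), "large"), ((9.0, 8.5), "large")]

print(predict_1nn(train, (1.1, 1.0)))  # small
print(predict_1nn(train, (8.5, 8.8)))  # large
```

An unsupervised method would receive only the feature tuples, without the "small"/"large" tags, and would have to discover the two groups on its own.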

## When are deep learning algorithms more appropriate compared to traditional machine learning algorithms?

Deep learning algorithms can approximate arbitrarily complex non-linear functions, given a deep enough and wide enough network with an appropriate non-linear activation function. Traditional ML algorithms often require feature engineering: finding or constructing the subset of meaningful features to use. Deep learning algorithms often avoid the need for the feature engineering step…
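
XOR is a classic small illustration of this trade-off. No linear function of x1 and x2 alone reproduces it, but a linear model succeeds once you hand-engineer the extra feature x1·x2. A network with one ReLU hidden layer can represent XOR from the raw inputs. In this sketch both sets of weights are hand-picked for illustration; a real network would learn its weights through training.

```python
def relu(z):
    return max(0.0, z)

def xor_engineered(x1, x2):
    # Linear model over [x1, x2, x1*x2]; the product term is the engineered feature.
    return x1 + x2 - 2 * (x1 * x2)

def xor_network(x1, x2):
    # One hidden layer of two ReLU units on the raw inputs (weights hand-picked).
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1)
    return h1 - 2 * h2  # linear output layer

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_engineered(x1, x2), xor_network(x1, x2))
```

The hidden unit `h2` fires only when both inputs are on, which plays the same role as the hand-made x1·x2 feature.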