Featured

Suppose you are modeling text with an HMM. What is the complexity of finding the most probable sequence of tags or states from a sequence of text using a brute-force algorithm?

Assume the total number of states is given, and consider the length of the longest sequence. Think about how we generate text using an HMM. We first have a state sequence, and from each state we emit an output. From each state, any word out of the possible outcomes can be generated. Since there are that many states, at each possible…
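The brute-force idea can be sketched as follows: enumerate every possible state sequence, score each one, and keep the best. This is only a minimal sketch, not the post's own code; the two tags, the three-word sentence, and all probabilities are made-up toy values. With S states and a sequence of length T (symbols chosen here purely for illustration), the loop visits S^T candidates and scoring each one costs O(T) work, which is why brute force becomes impractical quickly.

```python
from itertools import product

def brute_force_decode(obs, states, start_p, trans_p, emit_p):
    """Score every possible state sequence and return the most probable one."""
    best_seq, best_prob, n_checked = None, -1.0, 0
    # product(states, repeat=T) yields all len(states) ** T sequences.
    for seq in product(states, repeat=len(obs)):
        n_checked += 1
        # Joint probability of this state sequence and the observations.
        prob = start_p[seq[0]] * emit_p[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            prob *= trans_p[seq[t - 1]][seq[t]] * emit_p[seq[t]][obs[t]]
        if prob > best_prob:
            best_seq, best_prob = seq, prob
    return best_seq, best_prob, n_checked

# Toy model (all numbers are hypothetical, for illustration only).
states = ("NOUN", "VERB")
start_p = {"NOUN": 0.7, "VERB": 0.3}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.7, "VERB": 0.3},
}
emit_p = {
    "NOUN": {"dogs": 0.6, "run": 0.2, "fast": 0.2},
    "VERB": {"dogs": 0.1, "run": 0.6, "fast": 0.3},
}
obs = ("dogs", "run", "fast")

best_seq, best_prob, n_checked = brute_force_decode(
    obs, states, start_p, trans_p, emit_p
)
print(best_seq, best_prob, n_checked)
```

Adding one more word to the sentence doubles the number of candidates in this two-state toy model, whereas a dynamic-programming approach such as Viterbi avoids the exponential enumeration.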

What is One-Class SVM? How to use it for anomaly detection?

One-class SVM is a variation of the SVM that can be used in an unsupervised setting for anomaly detection. Let’s say we are analyzing credit card transactions to identify fraud. We are likely to have many normal transactions and very few fraudulent transactions. Also, the next fraudulent transaction might be completely different from all previous…
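As a rough illustration of the idea, the sketch below fits scikit-learn's `OneClassSVM` on normal-looking data only and then asks it to label new points. It assumes scikit-learn is installed; the two features (amount and hour of day), the synthetic data, and the `nu` value are hypothetical choices made for this example, not something taken from the post.

```python
import random
from sklearn.svm import OneClassSVM

random.seed(0)

# Hypothetical "normal" transactions: [amount, hour of day].
normal = [[random.gauss(50, 10), random.gauss(14, 2)] for _ in range(200)]

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(normal)

# predict() returns +1 for points that look like the training data, -1 otherwise.
new_points = [[52.0, 13.5], [5000.0, 3.0]]
labels = model.predict(new_points)
print(list(labels))
```

No fraud labels are needed for training, which matches the setting where fraudulent examples are rare and future fraud may not resemble past fraud.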

What does the typical day of a data scientist look like?

Being a data scientist is much more than simply churning out models with a lot of math! This video breaks down and explains the tasks in the typical day of a data scientist: communicating with stakeholders, analyzing data, designing the end-to-end data pipeline, building models, tuning models, testing and debugging, evaluating models, measuring…

Can we use the AUC Metric for an SVM Classifier?

What is AUC? AUC is the area under the ROC curve. It is a popular classification metric. Classifiers such as logistic regression and naive Bayes predict class probabilities as the outcome instead of predicting the labels themselves. A new data point is classified as positive if the predicted probability of the positive class…
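AUC depends only on how the scores rank the examples, so in principle any real-valued score can be used, not just a probability. The sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive example gets the higher score. The labels and scores are hypothetical; the scores stand in for signed distances from an SVM's separating hyperplane.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical true labels and real-valued classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [2.1, 0.4, -0.3, -0.1, -1.5, 0.2]

auc = auc_from_scores(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic and only meant to make the definition visible; library implementations use a sort-based computation instead.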

Top 50 Machine Learning Interview Questions

Whether you are kickstarting your interview preparation, or wrapping up your preparation and looking for final touches, here are over 50 must-see questions to prepare for a data science interview. We have put them in five categories for convenience. (Note: There are several more questions along with answers in the main menu “Interview…

How to answer “Explain Linear Regression?”

I interviewed 100+ folks in the last few months while helping with interview prep. Many were stuck on answering a basic ML concept question. Most have an intuition and understand the basic concept, and have probably watched a detailed video on it in a data science course. But when it comes to articulating the concept concisely in…

Semantic Textual Similarity: Automatic Question Answering from FAQs

Semantic Textual Similarity is the task of determining how close two pieces of text are in meaning. It has many applications such as question answering, information retrieval, recommendation systems and so on. Here is a 1-hour NLP code-along video tutorial for beginners on semantic textual similarity. The session covers the task of Automatic Question Answering from…

Finding the Right Data Science Job with Online Networking

When I was graduating from the University of Utah, not many companies turned up for campus placements, since we had a good but very small department with fewer than 20 MS and PhD students at the time. While I had a few companies that interviewed me, I felt…

What is the difference between a Bar Chart and a Histogram?

A histogram represents the distribution of a numerical variable. A bar chart is typically used to compare numeric values corresponding to categorical variables. To construct a histogram: X-axis: Usually the range of values is binned. In other words, the entire range is divided into a series of intervals and each interval occupies a slot on the…
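The binning step can be sketched in a few lines. The example below is an illustration under simple assumptions (equal-width bins, at least two distinct values) and uses made-up numbers; the bar-chart data is shown alongside for contrast, since it needs no binning at all.

```python
def histogram_counts(values, n_bins):
    """Split the range of `values` into equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the last edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Histogram input: one numerical variable (hypothetical measurements).
values = [1, 2, 2, 3, 5, 7, 8, 9, 9, 10]
edges, counts = histogram_counts(values, n_bins=3)
print(edges, counts)

# Bar chart input: one numeric value per category (hypothetical sales by region).
sales_by_region = {"north": 120, "south": 95, "east": 140}
for region, sales in sales_by_region.items():
    print(region, sales)
```

In the histogram the x-axis order is fixed by the numeric bins, while the bars of a bar chart can be arranged in any order because the categories have no inherent scale.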

Learn Data Science and Machine Learning from Scratch

The task of transitioning to a new field is challenging! Not for the faint-hearted… It is not very different from climbing a mountain! To become a data scientist, you need to learn some math (stats, linear algebra, optimization), programming (preferably Python or R), and the art of working with and analyzing data. But…

What is the difference between a Histogram and a Pareto plot?

A histogram is a bar graph that uses the height of the bar to convey the frequency of an event occurring. Each bar in a histogram corresponds to the frequency of occurrence of a specific event. A Pareto chart orders its bars by height, signifying the order of impact. It follows the Pareto philosophy (the 80/20 rule) through…
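The ordering behind a Pareto chart can be sketched as below: sort the categories from largest to smallest and track the running share of the total, which is what the cumulative line on a typical Pareto chart shows. The defect categories and counts are hypothetical.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows sorted by descending count."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0
    for name, count in ordered:
        running += count
        rows.append((name, count, 100.0 * running / total))
    return rows

# Hypothetical defect counts by cause.
defects = {"scratch": 10, "dent": 45, "misalignment": 30, "discoloration": 15}

rows = pareto_table(defects)
for name, count, cumulative in rows:
    print(f"{name:15s} {count:3d} {cumulative:6.1f}%")
```

Reading the cumulative column from the top shows how few categories account for most of the total, which is the point of the 80/20 framing.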