## Top 50 Machine Learning Interview Questions

Whether you are kickstarting your interview preparation or wrapping it up and looking for final touches, here are over 50 must-see questions to prepare for a data science interview. We have put them in five categories for convenience. (Note: There are several more questions along with answers in the main menu “Interview…

## How to answer “Explain Linear Regression?”

Over the last few months I interviewed 100+ folks while helping with interview prep. Many were stuck on answering a basic ML concept question. Most have an intuition and understand the basic concept, and have probably watched a detailed video on it in a data science course. But when it comes to articulating the concept concisely in…

## Semantic Textual Similarity: Automatic Question Answering from FAQs

Semantic Textual Similarity is the task of determining how close two pieces of text are in meaning. It has many applications, such as question answering, information retrieval, and recommendation systems. Here is a 1-hour NLP code-along video tutorial for beginners on semantic textual similarity. The session covers the task of Automatic Question Answering from…

## What is stratified sampling and why is it important?

Stratified sampling is a sampling method where the population is divided into homogeneous subgroups called strata, and the right number of instances is sampled from each stratum. For further explanation visit here. This sampling is important to ensure that the sampled dataset is representative of the entire population. To see why, consider an example of predicting…
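To make the idea concrete, here is a minimal sketch of proportional stratified sampling using only the Python standard library. The function name `stratified_sample`, the toy population, and the choice of sampling the same fraction from every stratum are assumptions for illustration, not something prescribed above.

```python
import random
from collections import defaultdict

def stratified_sample(rows, stratum_of, fraction, seed=0):
    """Sample the same fraction of rows from every stratum (hypothetical helper)."""
    rng = random.Random(seed)
    # Group the population into strata.
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    # Draw from each stratum in proportion to its size (at least one row each).
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Toy imbalanced population: 90 "neg" rows and 10 "pos" rows.
population = [("neg", i) for i in range(90)] + [("pos", i) for i in range(10)]
sample = stratified_sample(population, stratum_of=lambda r: r[0], fraction=0.2)

counts = {label: sum(1 for r in sample if r[0] == label) for label in ("neg", "pos")}
print(counts)  # {'neg': 18, 'pos': 2}
```

The 90/10 split of the population is preserved in the sample, whereas a plain random draw of 20 rows could easily end up with no "pos" rows at all.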

## Suppose you build word vectors (embeddings) with each word vector having dimensions as the vocabulary size (V) and feature values as pPMI between corresponding words: What are the problems with this approach and how can you resolve them?

Problems: As the vocabulary size (V) is large, these vectors will be large. They will also be sparse, as a word may not have co-occurred with all possible words.

Resolution: Dimensionality reduction using approaches like Singular Value Decomposition (SVD) of the term-document matrix to get a K-dimensional approximation. Other matrix factorisation techniques…
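As a rough sketch of both the problem and the fix, the snippet below builds a positive PMI (pPMI) matrix from a toy co-occurrence table and reduces it with a truncated SVD. The counts are made up, and applying the SVD directly to the word-word pPMI matrix with K = 2 is an assumption made here for illustration. It uses NumPy.

```python
import numpy as np

# Hypothetical symmetric co-occurrence counts for a 5-word vocabulary.
counts = np.array([
    [0, 4, 1, 0, 0],
    [4, 0, 2, 0, 0],
    [1, 2, 0, 1, 0],
    [0, 0, 1, 0, 3],
    [0, 0, 0, 3, 0],
], dtype=float)

total = counts.sum()
p_xy = counts / total                              # joint probabilities
p_x = counts.sum(axis=1, keepdims=True) / total    # row marginals
p_y = counts.sum(axis=0, keepdims=True) / total    # column marginals

# PMI is -inf where two words never co-occur; clipping at 0 gives pPMI.
with np.errstate(divide="ignore"):
    pmi = np.log2(p_xy / (p_x * p_y))
ppmi = np.maximum(pmi, 0.0)

# Each row is a V-dimensional vector, and many of its entries are zero.
sparsity = 1.0 - np.count_nonzero(ppmi) / ppmi.size

# Truncated SVD: keep the top K singular directions as dense embeddings.
K = 2
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :K] * S[:K]    # shape (V, K)
```

With a real vocabulary, V is in the tens of thousands while K is typically a few hundred, so the dense `embeddings` matrix is far smaller than the sparse pPMI matrix it approximates.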

## What is negative sampling when training the skip-gram model?

Recap: The skip-gram model tries to represent each word in a large text as a lower-dimensional vector in a space of K dimensions such that similar words are closer to each other. This is achieved by training a feed-forward network where we try to predict the context words given a specific word, i.e., …
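The recap can be sketched in a few lines: generate (center, context) training pairs from a sliding window, then draw a handful of "negative" words that are not the true context for each pair. The helper names, the window size, and the toy sentence are assumptions for illustration. Raising unigram counts to the power 0.75 follows the original word2vec setup, and a real implementation would precompute that distribution once.

```python
import random
from collections import Counter

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def sample_negatives(tokens, exclude, k, rng, power=0.75):
    """Draw k negative words, weighted by unigram count ** power."""
    freqs = Counter(tokens)
    candidates = [w for w in freqs if w not in exclude]
    weights = [freqs[w] ** power for w in candidates]
    return rng.choices(candidates, weights=weights, k=k)

tokens = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(tokens, window=1)

rng = random.Random(0)
center, context = pairs[0]   # ('the', 'quick')
negatives = sample_negatives(tokens, exclude={center, context}, k=3, rng=rng)
```

Each positive pair is then trained against its few sampled negatives as a binary classification problem, instead of computing a softmax over the whole vocabulary.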

## What is PMI?

PMI (Pointwise Mutual Information) is a measure of correlation between two events x and y:

$$\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}$$

As you can see from the above expression, the ratio inside the log is directly proportional to the number of times both events occur together and inversely proportional to the individual counts, which are in the denominator. This expression ensures high…
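As a small worked example, the sketch below computes PMI as log2 of p(x, y) / (p(x) p(y)) from a table of pair counts. The counts and the function name `pmi` are hypothetical, and the base-2 logarithm is just one common choice.

```python
import math
from collections import Counter

def pmi(pair_counts, x, y):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), estimated from pair counts."""
    total = sum(pair_counts.values())
    p_xy = pair_counts[(x, y)] / total
    p_x = sum(c for (a, _), c in pair_counts.items() if a == x) / total
    p_y = sum(c for (_, b), c in pair_counts.items() if b == y) / total
    if p_xy == 0:
        return float("-inf")   # the pair was never observed together
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts of (first word, second word) pairs.
pair_counts = Counter({
    ("new", "york"): 8,
    ("new", "car"): 2,
    ("old", "car"): 8,
    ("old", "york"): 2,
})

pmi_new_york = pmi(pair_counts, "new", "york")   # log2(0.4 / 0.25) > 0
pmi_new_car = pmi(pair_counts, "new", "car")     # log2(0.1 / 0.25) < 0
ppmi_new_car = max(0.0, pmi_new_car)             # positive PMI clips at 0
```

Here "new" and "york" co-occur more often than their individual frequencies would predict, so their PMI is positive, while "new" and "car" co-occur less often than chance and get a negative PMI.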

## Given the following two sentences, how do you determine if Teddy is a person or not? “Teddy bears are on sale!” and “Teddy Roosevelt was a great President!”

This is an example of a Named Entity Recognition (NER) problem. One can build a sequence model such as an LSTM to perform this task. However, as both sentences above show, a forward-only LSTM might fail here. Using a forward-only LSTM might result in a model which recognises Teddy as a product: “bear”, which is on…

## Say you’ve generated a language model using Bag of Words (BoW) with 1-hot encoding, and your training set has a lot of sentences with the word “good” but none with the word “great”. Suppose I see the sentence “Have a great day”: p(great)=0.0 using this language model. How can you solve this problem leveraging the fact that good and great are similar words?

BoW with 1-hot encoding doesn’t capture the meaning of sentences; it only captures co-occurrence statistics. We need to build the language model using features which are representative of the meaning of the words. A simple solution could be to cluster the word embeddings and group synonyms into a unique token. Alternatively, when a word has…