MLInterview

What is the difference between a histogram and a Pareto plot?

Posted on July 15, 2019 by MLInterview

A histogram is a bar graph in which the height of each bar conveys how frequently a specific event occurs. A Pareto chart arranges its bars in order of height, so that the ordering signifies the order of impact. It follows the Pareto philosophy (the 80/20 rule) through…
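The distinction can be sketched with plain counts. This is a minimal illustration, not the post's own example: the `defects` data and the function names are hypothetical, and the cumulative-percentage column is an assumption based on how Pareto charts are commonly drawn.

```python
from collections import Counter

# Hypothetical defect categories (illustrative data, not from the post).
defects = ["scratch", "dent", "scratch", "crack",
           "scratch", "dent", "stain", "scratch"]


def histogram_counts(events):
    """Frequency of each event, in first-seen order (no ranking implied)."""
    return dict(Counter(events))


def pareto_table(events):
    """Rows of (event, count, cumulative_percent), sorted by descending count."""
    counts = Counter(events)
    total = sum(counts.values())
    rows, running = [], 0
    for event, count in counts.most_common():
        running += count
        rows.append((event, count, 100.0 * running / total))
    return rows


print(histogram_counts(defects))
for row in pareto_table(defects):
    print(row)
```

The histogram view only reports how often each event occurs, while the Pareto view ranks the events and shows how quickly the top few account for most of the total.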

What are the ACID properties of a database? For data analytics tasks, do you need to care about ACID properties?

Posted on June 28, 2019 (updated July 15, 2019) by MLInterview

ACID properties are important in an RDBMS setting where operations are transactional and database updates are part of the task. For instance, a banking or an e-commerce application in which real-time user data is updated typically needs an RDBMS. A data analyst typically handles structured data using query languages such as SQL. However,…

What are the different types of Joins while wrangling data?

Posted on June 28, 2019 by MLInterview

Here are the different types of JOINs in SQL:
(INNER) JOIN: Returns records that have matching values in both tables.
LEFT (OUTER) JOIN: Returns ALL records from the left table, and the matched records from the right table.
RIGHT (OUTER) JOIN: Returns ALL records from the right table, and the matched records from the…
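The semantics of the three joins listed above can be sketched in plain Python. This is an illustration under stated assumptions rather than how a database implements joins: the `customers` and `orders` tables, the `id` key, and the function names are all hypothetical.

```python
# Hypothetical tables as lists of dicts; "id" is the join key.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "item": "book"}, {"id": 3, "item": "pen"}, {"id": 4, "item": "lamp"}]


def inner_join(left, right, key):
    """Rows whose key value appears in both tables."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def left_join(left, right, key):
    """All left rows; right-hand columns are None when nothing matches."""
    right_cols = {c for r in right for c in r if c != key}
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, **{c: None for c in right_cols}})
    return out


def right_join(left, right, key):
    """All right rows; a left join with the roles of the two tables swapped."""
    return left_join(right, left, key)


print(inner_join(customers, orders, "id"))
print(left_join(customers, orders, "id"))
print(right_join(customers, orders, "id"))
```

Unmatched rows are padded with `None`, which plays the role that NULL plays in SQL.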

Name a few problems that data analysts typically encounter?

Posted on June 28, 2019 by MLInterview

Some of the problems encountered by a data analyst are: Biased data: Data could be biased because of the source from which it is collected. For instance, suppose you collect data to determine the winner of an electoral campaign. Collecting from a specific region alone introduces one form of bias, while collecting…

What is the difference between supervised and unsupervised learning?

Posted on May 13, 2019 by MLInterview

In supervised learning, the algorithm learns from labeled training data. In other words, each data point is tagged with the answer, or label, that the algorithm should produce. Using such labeled data, the goal is to predict labels for new data points. The two common forms of supervised learning are classification and regression….

When are deep learning algorithms more appropriate compared to traditional machine learning algorithms?

Posted on May 13, 2019 by MLInterview

Deep learning algorithms are capable of learning arbitrarily complex non-linear functions, given a network that is deep enough and wide enough and an appropriate non-linear activation function. Traditional ML algorithms often require feature engineering to find the subset of meaningful features to use. Deep learning algorithms often avoid the need for the feature engineering step….

Why do you typically see overflow and underflow when implementing ML algorithms?

Posted on March 5, 2019 (updated May 13, 2019) by MLInterview

A common pre-processing step is to normalize/rescale inputs so that their values are neither too large nor too small. However, even on normalized inputs, overflows and underflows can occur: Underflow: Computing a joint probability often involves multiplying small individual probabilities. Many probabilistic algorithms multiply the probabilities of individual data points, which leads to underflow. Example: Suppose you…
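The underflow case can be reproduced in a few lines. This is a separate illustration rather than the post's own example: the 1,000 data points and the probability 0.01 are hypothetical, and summing logarithms is shown as one widely used workaround.

```python
import math

# Hypothetical per-point probabilities: 1,000 data points, each with p = 0.01.
probs = [0.01] * 1000

# The true product is 1e-2000, far below the smallest positive float64,
# so the running product underflows to exactly 0.0.
naive_product = math.prod(probs)

# Summing logs keeps the same information in a representable range.
log_likelihood = sum(math.log(p) for p in probs)

print(naive_product)
print(log_likelihood)
```

Working in log space turns the product into a sum, so comparisons between models or parameter settings remain meaningful even when the raw probability is too small to represent.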

How do you manage not to get overwhelmed by data?

Posted on March 1, 2019 by MLInterview

As a data scientist, it is important to get comfortable dealing with data. One might have done a PhD and learnt many statistical techniques. However, given a problem, first think about how you can solve it, data science or no data science. Try to spend time visualizing data in a different…

Is the run-time of an ML algorithm important? How do I evaluate whether the run-time is OK?

Posted on March 1, 2019 (updated March 7, 2019) by MLInterview

Runtime considerations are important for many applications. Typically, you should look at both the training time and the prediction time of an ML algorithm. Some common questions to ask include: Training: Do you want to train the algorithm in batch mode? How often do you need to train? If you need to retrain your algorithm every…

How do you handle missing data in an ML algorithm?

Posted on March 1, 2019 (updated September 4, 2019) by MLInterview

Missing data arises either from issues in data collection or because the data model itself allows for missing values (for instance, the field ‘maximum credit limit on any of your cards’ might not make sense for someone who has no credit cards…). With missing data, typically the ML algorithm implementation might fail with…
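One common way to keep an algorithm from failing on missing entries is to fill them in before training. The sketch below shows mean imputation as an assumed example technique; it is not necessarily the approach the post goes on to recommend, and `impute_mean` and `credit_limits` are hypothetical names.

```python
import math


def impute_mean(values):
    """Replace None/NaN entries with the mean of the observed values."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [mean if is_missing(v) else v for v in values]


# Hypothetical credit-limit column with two missing entries.
credit_limits = [5000.0, None, 7000.0, float("nan"), 6000.0]
filled = impute_mean(credit_limits)
print(filled)
```

Mean imputation suits values that are missing because of collection issues. When a field is missing because it does not apply, as in the no-credit-card case, a separate indicator feature may describe the data more faithfully than a filled-in number.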
