How do you manage not to get overwhelmed by data?

As a data scientist, it is important to get comfortable dealing with data. You may have done a PhD and learnt many statistical techniques. However, given a problem, first think about how you would solve it, with or without data science. Try to spend time visualizing data in a different…

Is the run-time of an ML algorithm important? How do I evaluate whether the run-time is OK?

Runtime considerations are important for many applications. Typically, you should look at both the training time and the prediction time of an ML algorithm. Some common questions to ask include: Training: Do you want to train the algorithm in batch mode? How often do you need to train? If you need to retrain your algorithm every…
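A minimal sketch of measuring training time and prediction time separately, using only the standard library. `MeanModel` and `time_call` are hypothetical names introduced for illustration; the toy model stands in for whatever algorithm you are actually evaluating.

```python
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds of fn(*args) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


class MeanModel:
    """Toy stand-in for a real model (assumption): always predicts the training mean."""

    def fit(self, xs, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


# Synthetic data, made up for this example.
xs = list(range(10_000))
ys = [2.0 * x + 1.0 for x in xs]

model = MeanModel()
train_seconds = time_call(model.fit, xs, ys)        # cost of one (re)training run
predict_seconds = time_call(model.predict, xs)      # cost of scoring the whole batch
per_row_latency = predict_seconds / len(xs)         # rough per-prediction latency

print(f"train: {train_seconds:.6f}s, predict: {predict_seconds:.6f}s, "
      f"per row: {per_row_latency:.2e}s")
```

Comparing `train_seconds` against how often you need to retrain, and `per_row_latency` against your serving budget, gives a first answer to whether the runtime is acceptable.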

How do you handle missing data in an ML algorithm?

Missing data arises either from issues in data collection or because the data model itself allows for missing values (for instance, the field ‘maximum credit limit on any of your cards’ might not make sense for someone who has no credit cards…). With missing data, typically the ML algorithm implementation might fail with…
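One common way to keep an implementation from failing on missing values is mean imputation combined with a "was missing" indicator. This is only a sketch of that one approach, not necessarily the one the full answer goes on to describe; `impute_with_indicator` and the sample values are hypothetical.

```python
def impute_with_indicator(values):
    """Replace None with the mean of the observed values.

    Returns (filled_values, missing_flags). The flags let a downstream model
    learn from the fact that a value was absent, which matters when absence
    is itself informative (e.g. no credit cards at all).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = sum(observed) / len(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


# Made-up credit limits; None marks customers with no value for the field.
credit_limits = [5000.0, None, 12000.0, None, 7000.0]
filled, was_missing = impute_with_indicator(credit_limits)
print(filled)
print(was_missing)
```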

With the maximum likelihood estimate, are we guaranteed to find a global optimum?

The maximum likelihood estimate is the value of the parameters that maximizes the likelihood. If the likelihood is strictly concave (or, equivalently, the negative of the likelihood is strictly convex), any maximum we find is the unique global maximum. In many models this is not the case, and we end up finding a local optimum. Hence, the maximum likelihood estimate usually finds a local…
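To illustrate the concave case, here is a small sketch using a Bernoulli model, whose log-likelihood is strictly concave in the logit parameter. Gradient ascent from very different starting points reaches the same estimate, the sample proportion. The function names, step size, and iteration count are assumptions chosen for this example.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_bernoulli_mle(successes, trials, theta0, step=0.1, iters=500):
    """Gradient ascent on the Bernoulli log-likelihood in the logit theta.

    log L(theta) = successes * theta - trials * log(1 + exp(theta))
    d/dtheta     = successes - trials * sigmoid(theta)
    """
    theta = theta0
    for _ in range(iters):
        gradient = successes - trials * sigmoid(theta)
        theta += step * gradient
    return sigmoid(theta)  # convert the logit back to a probability


# 7 successes in 10 trials, started from three different initial logits.
estimates = [fit_bernoulli_mle(7, 10, theta0) for theta0 in (-5.0, 0.0, 5.0)]
print(estimates)
```

With a non-concave likelihood, the same experiment can return different answers for different starting points, which is the local-optimum situation described above.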

What is the difference between deep learning and machine learning?

Deep learning is a subset of machine learning. Machine learning is the ability to build “models” that learn automatically from data, without explicitly programmed rules. Machine learning models typically have the ability to generalize to new data. Deep learning is a field within machine learning where we build multi-layered artificial neural network models to…

What are the evaluation metrics for a multi-class classification problem (like positive/negative/neutral sentiment analysis)?

For multiclass classification (MCC) problems, metrics can be derived from the confusion matrix. Let $tp_i, tn_i, fp_i, fn_i$ denote the true positives, true negatives, false positives, and false negatives for class $i$, respectively. For MCC problems, macro and micro metrics are usually computed: → Micro metrics (with subscript $\mu$ in table below) are computed by summing up the individual tp, tn, fp and fn to…
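A small sketch of the micro versus macro distinction for precision and recall, using the usual definitions: micro metrics pool the per-class counts before dividing, while macro metrics average the per-class ratios. The labels and predictions below are made up, and the function names are hypothetical.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} computed one-vs-rest for each class."""
    counts = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def safe_div(num, den):
    # Convention chosen here: an undefined ratio counts as 0.0.
    return num / den if den else 0.0


def micro_macro(counts):
    tps = sum(tp for tp, _, _ in counts.values())
    fps = sum(fp for _, fp, _ in counts.values())
    fns = sum(fn for _, _, fn in counts.values())
    n = len(counts)
    return {
        # Micro: sum the counts over classes, then take the ratio.
        "micro_precision": safe_div(tps, tps + fps),
        "micro_recall": safe_div(tps, tps + fns),
        # Macro: take the ratio per class, then average over classes.
        "macro_precision": sum(safe_div(tp, tp + fp) for tp, fp, _ in counts.values()) / n,
        "macro_recall": sum(safe_div(tp, tp + fn) for tp, _, fn in counts.values()) / n,
    }


y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
counts = per_class_counts(y_true, y_pred, ["pos", "neg", "neu"])
metrics = micro_macro(counts)
print(counts)
print(metrics)
```

Micro metrics are dominated by the frequent classes, whereas macro metrics weight every class equally, so the two diverge most on imbalanced data.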

What is the PageRank algorithm?

Quote from Wikipedia: A PageRank results from a mathematical algorithm based on the webgraph, created by all World Wide Web pages as nodes and hyperlinks as edges, taking into consideration authority hubs such as cnn.com or usa.gov. The rank value indicates an importance of a particular page. A hyperlink to a page counts as a…
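The idea of pages as nodes and hyperlinks as votes can be sketched with a simple power iteration. This is an illustrative toy, not a production implementation: the damping factor of 0.85, the iteration count, the uniform handling of pages without outgoing links, and the four-page graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Rank held by pages with no outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Each page splits its damped rank equally among the pages it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web graph: "c" receives links from a, b and d.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this toy graph the page with the most incoming links ends up with the highest rank, and a page nobody links to keeps only the baseline share.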

You want to find food-related topics on Twitter: how do you go about it?

One can use any of the topic models above to get topics. To direct the topics toward food-related information, specialized topic modeling algorithms are available. However, one simple way to steer the topics toward food is: Filter tweets by a limited set of food-related keywords (food, meal, dinner,…
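The keyword filtering step might look like the sketch below. Only the three keywords visible in the text are used; the rest of the list is elided in the source, so extend `FOOD_KEYWORDS` as needed. The sample tweets and the function name are made up for illustration, and the topic model that would consume the filtered tweets is not shown.

```python
import re

# Only the keywords named in the text; the full list is not given here.
FOOD_KEYWORDS = {"food", "meal", "dinner"}


def is_food_related(tweet, keywords=FOOD_KEYWORDS):
    """True if any keyword appears as a whole word (case-insensitive)."""
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    return not tokens.isdisjoint(keywords)


# Hypothetical tweets.
tweets = [
    "Best dinner I've had all year",
    "Traffic on the bridge is terrible",
    "Street food in Bangkok is unreal #food",
    "Meal prep Sunday!",
]

food_tweets = [t for t in tweets if is_food_related(t)]
print(food_tweets)
```

Matching on whole tokens rather than substrings avoids false hits such as "mealy", at the cost of missing plurals like "meals" unless they are added to the keyword set.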