# Machine Learning / Data Science Interview Questions

Updated: 2021-12-10

## 1. "What is X?"

The basic concepts.

- Statistics questions, such as "What is an F-test?"
- Implement logistic regression training for binary classification
- Explain overfitting, underfitting, bias, and variance, and how they relate
- Gradient descent
- L1/L2 regularization
- Bayes' theorem
- Collaborative filtering
- Dimensionality reduction
- What is batch normalization? What benefits does it give?
- Explain naive Bayes. (What is assumed to be independent?) Explain how to use it to build a spam filter.
- Explain the ROC curve. What does the curve look like for random guessing? What is the difference between two points on the ROC curve? Explain the precision-recall curve. Explain the confusion matrix. Explain the F1 score; why do we use it?
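
For the "implement logistic regression training" item above, here is a minimal sketch in pure Python, assuming full-batch gradient descent on the mean log loss with an optional L2 penalty. The function names, the toy data, and the hyperparameter values are illustrative assumptions, not a reference implementation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=500, l2=0.0):
    """Fit weights and bias by full-batch gradient descent on the mean log loss.

    X is a list of feature lists, y a list of 0/1 labels.
    l2 is the strength of an optional L2 penalty on the weights (not the bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) at a 0.5 probability threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy data (an assumption for illustration): label is 1 when x0 + x1 > 1.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [int(x0 + x1 > 1.0) for x0, x1 in X]
w, b = train_logistic_regression(X, y, lr=1.0, epochs=1000)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
```

In an interview it is worth pointing out that `err = p - yi` is the whole gradient signal: the sigmoid and the log loss combine so that no explicit derivative of the sigmoid appears.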

## 2. "Compare X and Y" or "Pros and Cons" of X

One step beyond the basic concepts. These need a better understanding of the topics.

- bias-variance trade-off
- bagging vs boosting
- Difference between convex and non-convex optimization problems
- Why stochastic gradient descent is appropriate for distributed training
- How XGBoost differs from traditional GBDT, e.g. what is special about its loss function and why it needs to compute the second-order derivative
- AdaBoost (adaptive boosting) vs gradient boosting
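
For the XGBoost item above, the second-order derivative comes from a Taylor expansion of the loss around the previous round's prediction. The following is a sketch in the notation commonly used for this derivation (the symbols $g_i$, $h_i$, $\gamma$, $\lambda$, $T$, and $w_j$ are not defined elsewhere in these notes). At boosting round $t$, with per-example loss $l$ and new tree $f_t$:

$$
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(t-1)})}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(t-1)})}{\partial\, (\hat{y}_i^{(t-1)})^2}
$$

$$
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2,
\qquad
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\quad
G_j = \sum_{i \in I_j} g_i,
\quad
H_j = \sum_{i \in I_j} h_i
$$

Here $T$ is the number of leaves, $w_j$ is the weight of leaf $j$, and $I_j$ is the set of examples that fall into leaf $j$. A traditional GBDT, by contrast, fits each tree to the negative first-order gradient only, and typically has no explicit regularization term $\Omega$ in the objective.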

Check out the Versus page

## 3. Practical Questions

These need a deeper understanding of the topic or hands-on experience.

- How do you adjust the cost parameter for the SVM regularizer?
- How do you assess the quality of a clustering, especially to know when you have the right number of clusters?
- How do you pick the features to use?
- model: over calibration issue
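
For the clustering-quality item above, one common answer is the silhouette coefficient. Below is a minimal pure-Python sketch, assuming Euclidean distance and a dataset small enough for pairwise distances. The function name and the toy points are illustrative assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated groups of toy points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # labels mix the groups
```

To pick the number of clusters, one option is to run the clustering for several candidate values of k, compute this score for each result, and prefer the k with the highest mean silhouette. Scores near 1 suggest compact, well-separated clusters; scores near 0 or below suggest overlapping or misassigned points.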

## 4. Design Questions

"How would you approach ..."

Questions about real-world problems:

- How would you approach the Netflix Prize?
- How would you generate related searches on Bing?
- How would you suggest followers on Twitter?

## More Questions

- Describe how a decision tree works from the viewpoint of information gain. Why might pruning help? What benefit do we get from pruning a tree?
- What is a random forest? How is bagging used to build one? Does a random forest need pruning, and why?
- What is the difference between sigmoid and ReLU? What are their advantages and disadvantages? (sparsity, vanishing gradients, activation blow-up, complexity)
- What optimizer did you use in your DL model? Explain Adam, momentum, and SGD.
- Explain transfer learning and fine-tuning. Can you arbitrarily take one layer out of a CNN model? Why? Can you run a CNN on images of different sizes? Why?
- Explain learning rate decay; why use it? Explain L2 regularization; why use it? What is the relation/difference between weight decay and L2 regularization?
- Explain K-fold cross-validation. How do you use it to train your model?
- Explain LR (linear regression), the OLS (ordinary least squares) model, and PCA. What is the difference/relation between them?
- Does PCA keep the directions of largest or smallest variance when we use it to compress data? Explain why. Bonus question: explain Linear Discriminant Analysis and how it differs from PCA.
- If your data is corrupted by noise, how does the noise affect your model: overfitting or underfitting? Why?
- How does the value of K affect a KNN model? Does a larger K overfit or underfit? Does a smaller K overfit or underfit?
- What is the major problem with training an RNN using BPTT (backpropagation through time)? How can the gradient vanish or explode?
- Illustrate the basic ideas of collaborative filtering and matrix factorization.
- Compare K-means with a Gaussian mixture model. What are the relations and differences?
- Why use mini-batches in training? Why not just use SGD (one sample at a time), or use all the training data in a single batch for each gradient update? Why use momentum (which takes the history of gradients into account) with SGD?
- How would you sample uniformly from a continuous stream of data? (Or: randomly pick n elements from a given array of m elements.) Hint: reservoir sampling.
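
For the stream-sampling question above, reservoir sampling can be sketched as follows. This is the basic variant usually called Algorithm R; the function name and the `rng` parameter are illustrative assumptions.

```python
import random


def reservoir_sample(stream, n, rng=random):
    """Return n items chosen uniformly at random from an iterable of unknown length.

    Keep the first n items, then let the i-th item (1-indexed) replace a
    uniformly chosen slot with probability n / i. If the stream has fewer
    than n items, all of them are returned.
    """
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= n:
            reservoir.append(item)
        else:
            j = rng.randrange(i)  # uniform integer in [0, i)
            if j < n:             # happens with probability n / i
                reservoir[j] = item
    return reservoir


# Usage: sample 5 items from a stream without knowing its length up front.
sample = reservoir_sample(iter(range(10_000)), 5, random.Random(42))
```

The memory use is O(n) regardless of the stream length, and each item is read exactly once, which is why this technique suits streams that cannot be stored or replayed.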

### What are the problems with feature importance in random forests and gradient-boosted trees?

- Feature selection based on impurity reduction is biased toward variables with more categories.
- With correlated features, strong features can end up with low scores: once one of them has been used for a split, the others add little further impurity reduction.
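
The first point can be illustrated with a small pure-Python experiment. This is a simplified sketch, not how a real tree library computes importance: it assumes a single multiway split with one child per feature value, whereas CART-style trees use binary splits (the same effect arises there because a feature with more distinct values offers more candidate splits). All names and the synthetic data are illustrative assumptions.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def impurity_reduction(feature, labels):
    """Gini decrease from splitting the data into one child per feature value."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return gini(labels) - weighted


rng = random.Random(0)
n = 200
labels = [rng.randint(0, 1) for _ in range(n)]          # pure-noise target
binary_feature = [rng.randint(0, 1) for _ in range(n)]  # 2 categories, no signal
many_valued = [rng.randrange(50) for _ in range(n)]     # 50 categories, no signal

gain_binary = impurity_reduction(binary_feature, labels)
gain_many = impurity_reduction(many_valued, labels)
```

Neither feature carries any information about the labels, yet the 50-category feature shows a much larger impurity reduction simply because its small child groups look purer by chance. An importance score built from such reductions would rank it above the binary feature.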