Machine Learning Interview Questions

Last Updated: 2023-09-19

1. "What is X?"

The basic concepts.

  • Statistics questions such as what is an F-test
  • implement logistic regression training for binary classification
  • explain overfitting, underfitting, bias, variance and how they relate
  • gradient descent
  • L1/L2 regularization
  • Bayes' theorem
  • collaborative filtering
  • dimensionality reduction
  • What is batch normalization? What benefits does it give?
  • Explain naive Bayes. (What is assumed to be independent?) Explain how to use it to build a spam filter.
  • Explain the ROC curve. What does the curve look like for random guessing? What is the difference between two points on the ROC curve? Explain the precision-recall curve. Explain the confusion matrix. Explain the F1 score; why do we use it?
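
For the "implement logistic regression training" item above, a minimal sketch in plain Python might look like the following. It assumes batch gradient descent on the mean log loss; the function names, the learning rate, the epoch count, and the toy data are all illustrative choices, not a reference implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    # Classify as 1 when the predicted probability reaches the threshold.
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


# Toy, linearly separable data (made up for illustration): label is 1 when x0 + x1 > 1.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(x0 + x1 > 1.0) for x0, x1 in X]

w, b = train_logistic_regression(X, y, lr=1.0, epochs=2000)
accuracy = sum(p == t for p, t in zip(predict(X, w, b), y)) / len(y)
```

In an interview, a natural follow-up is to add an L2 penalty (one extra term in the gradient of w) or to switch from full-batch to mini-batch updates.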

2. "Compare X and Y" or "Pros and Cons" of X

One step beyond basic concepts; these questions need a better understanding of the topics.

  • bias-variance trade-off
  • bagging vs boosting
  • Difference between a convex and non-convex solution
  • why stochastic gradient descent is appropriate for distributed training
  • how XGBoost differs from traditional GBDT, e.g. what is special about its loss function, and why it needs to compute the second-order derivative
  • AdaBoost (adaptive boosting) vs gradient boosting
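
For the XGBoost question above, a brief sketch of where the second-order derivative enters may help. The notation (g_i, h_i, lambda, gamma, T, w_j) follows the common presentation of XGBoost and is not taken from this page; constant terms are dropped, and this is an outline rather than a full derivation.

```latex
% Objective at boosting round t, after a second-order Taylor expansion of the loss l
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

% First and second derivatives of the loss w.r.t. the previous round's prediction
g_i = \partial_{\hat{y}_i^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big),
\qquad
h_i = \partial^2_{\hat{y}_i^{(t-1)}} \, l\big(y_i, \hat{y}_i^{(t-1)}\big)

% Optimal weight of leaf j, with G_j and H_j the sums of g_i and h_i over the samples in that leaf
w_j^{*} = -\frac{G_j}{H_j + \lambda}
```

Traditional GBDT fits each new tree to the negative gradient only; the Hessian term h_i and the explicit regularizer Omega are the parts this question is probing.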

Check out the Versus page

3. Practical Questions

These need a deeper understanding of the topic or hands-on experience.

  • How do you adjust the cost parameter for the SVM regularizer?
  • How do you assess the quality of a clustering, especially to know when you have the right number of clusters?
  • How do you pick the features to use?
  • model: over calibration issue

4. Design Questions

"How would you approach ..."

Questions about real-world problems:

  • How would you approach the Netflix Prize?
  • How would you generate related searches on Bing?
  • How would you suggest followers on Twitter?

More Questions

  • Describe how a decision tree works from the viewpoint of "information gain". Why may pruning help? What benefit do we get from pruning a tree?
  • What is a random forest? How is bagging used to build one? Does a random forest need pruning, and why?
  • What is the difference between sigmoid and ReLU? What are their advantages and disadvantages? (sparsity, vanishing gradients, activation blow-up, complexity)
  • Which optimizer did you use in your DL model? Explain Adam, momentum, and SGD.
  • Explain transfer learning and fine-tuning. Can you arbitrarily take out one layer from a CNN model? Why? Can you run a CNN on images of different sizes? Why?
  • Explain learning rate decay; why use it? Explain L2 regularization; why use it? What is the relation/difference between weight decay and L2 regularization?
  • Explain K-fold cross-validation. How do you use it to train your model?
  • Explain LR (linear regression), the OLS (ordinary least squares) model, and PCA. What is the difference/relation between them?
  • Does PCA keep the directions of largest or smallest variance when we use it to compress data? Explain why. Bonus question: explain Linear Discriminant Analysis and how it differs from PCA.
  • If your data is corrupted by noise, how does the noise affect your model: does it overfit or underfit? Why?
  • How does the value of K affect a KNN model? Does a larger K overfit or underfit? Does a smaller K overfit or underfit?
  • What is the major problem with RNN-BPTT (backpropagation through time)? How can the gradient vanish or explode?
  • Illustrate the basic ideas of collaborative filtering and matrix factorization.
  • Compare K-means with Gaussian mixture models. How are they related, and how do they differ?
  • Why use mini-batches in training? Why not just use SGD (one sample per update), or use all the training data in one batch when computing the gradient update? Why use momentum (taking the history of gradients into account) with SGD?
  • How would you sample uniformly from a continuous stream of data? (Or: randomly pick n elements from a given array of m elements.) Answer: reservoir sampling.
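
The reservoir sampling mentioned in the last question can be sketched as follows. This is the standard single-pass variant (often called Algorithm R); the function name and the optional rng parameter are illustrative choices.

```python
import random


def reservoir_sample(stream, n, rng=random):
    """Return n items chosen uniformly at random from an iterable of unknown length.

    If the stream has fewer than n items, all of them are returned.
    """
    reservoir = []
    for i, item in enumerate(stream):
        if i < n:
            reservoir.append(item)  # fill the reservoir with the first n items
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < n:
                reservoir[j] = item  # new item is kept with probability n / (i + 1)
    return reservoir


# Example: pick 5 items from a stream of 1000 without storing the stream.
sample = reservoir_sample(range(1000), 5, rng=random.Random(42))
```

The key property is that after seeing i + 1 items, every item seen so far is in the reservoir with probability n / (i + 1), which is the point interviewers usually ask you to prove by induction.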

What are the problems with feature importance in Random Forest and Gradient Boosted Tree?

  • Feature selection based on impurity reduction is biased towards variables with more categories
  • With correlated features, strong features can end up with low scores