Machine Learning

Updated: 2019-01-13

Machine Learning vs Programming

• Programming: you implement the logic and/or math yourself. You need to know exactly what to do, step by step.
• Machine Learning: the logic is derived from observations, experiments, and statistics. You provide the inputs and outputs, and the machine fills in the blank, i.e. the rule that maps one to the other.
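To make the contrast concrete, here is a minimal sketch in Python. The temperature-conversion task, the function names, and the observations are assumptions chosen for illustration; the observations are generated from the known rule so that the learned result can be checked against it.

```python
def fahrenheit_by_rule(celsius):
    """Programming: the rule is known and written down explicitly."""
    return celsius * 9.0 / 5.0 + 32.0


def fit_line(xs, ys):
    """Machine learning: estimate slope and intercept from (input, output)
    pairs using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative observations (generated from the rule above, not measured data).
celsius_obs = [-10.0, 0.0, 15.0, 30.0, 100.0]
fahrenheit_obs = [fahrenheit_by_rule(c) for c in celsius_obs]

# The "machine fills in the blank": it recovers the mapping from examples alone.
slope, intercept = fit_line(celsius_obs, fahrenheit_obs)


def fahrenheit_learned(celsius):
    return slope * celsius + intercept
```

With noise-free observations like these, the fitted slope and intercept land very close to 1.8 and 32. Real data is noisy, so a learned rule is generally only an approximation.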

Terminology

Labels

The target, i.e. the ground truth we want to predict: a tag such as positive/negative, good/bad, or car/bike/human.

$y$

Features

The input variables, the traits that describe the instance or the item.

$\mathbf{x} = (x_1, x_2, \ldots, x_N)$

Examples

The instances of the data. Each instance is either a feature vector combined with its label, a "labeled example":

$(\mathbf{x}, y)$

or only the features, as an "unlabeled example":

$(\mathbf{x}, ?)$
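One possible way to represent both kinds of example in code is sketched below. The `Example` class, its field names, and the sample values are hypothetical and exist only to illustrate the definitions above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Example:
    features: Sequence[float]    # the feature vector x
    label: Optional[int] = None  # the label y; None stands for the "?" above

    @property
    def is_labeled(self) -> bool:
        return self.label is not None


# Made-up values, for illustration only.
labeled = Example(features=(5.1, 3.5, 1.4), label=1)  # (x, y)
unlabeled = Example(features=(6.2, 2.9, 4.3))         # (x, ?)
```

Labeled examples are what a model is trained and evaluated on. Unlabeled examples are what it is later asked to make inferences for.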

Model and Inference

A model is a "trained" function $f(\mathbf{x})$ that takes a feature vector as input and outputs an inference (prediction) $y'$.

By comparing the inference $y'$ with the label $y$, we can evaluate the performance of the model.
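As a minimal sketch of that comparison, the snippet below computes accuracy, the fraction of labeled examples for which $y' = y$. The `threshold_model` function is a hand-written stand-in for a trained $f(\mathbf{x})$, and the examples are made up; accuracy is only one of many possible metrics.

```python
def threshold_model(x):
    """Stand-in for a trained f(x): predicts 1 when the first feature exceeds 0.5."""
    return 1 if x[0] > 0.5 else 0


def accuracy(model, examples):
    """Fraction of labeled examples (x, y) where the inference y' equals y."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


# Made-up labeled examples (x, y), for illustration only.
labeled_examples = [((0.9,), 1), ((0.2,), 0), ((0.7,), 0), ((0.1,), 0)]

score = accuracy(threshold_model, labeled_examples)
print(score)  # 3 of 4 inferences match the label -> 0.75
```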

A Typical End-to-end Machine Learning System

• Data acquisition/collection: acquire data from third parties, or collect it from mobile apps, browsers, devices, sensors, etc.
• Data dictionary/feature store
• Data warehouse
• Data Pipeline (ETL): move data to the analytics platform (e.g. a Hadoop cluster)
• Data Prep:
  • Driver Set: define the population for training/testing/validation; append the metadata needed for evaluation.
  • Feature Engineering: variables that are already generated on the fly can be logged; newly created variables need to be simulated offline.
  • Data Sanity Check: check that the data is clean and usable.
• Model Building: train and test ML models.
• Online Variables: computed on the fly (e.g. velocity), or pre-generated lookups loaded into a cache (e.g. Aerospike, Ehcache).
• Model Deployment: run models in offline batch mode, or deploy them to an online system for real-time scoring.
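Two of the data prep steps above, the driver set and the data sanity check, can be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: the function names, the 70/15/15 split ratios, the hash-based assignment, and the sample rows are all hypothetical.

```python
import hashlib
import math


def assign_split(record_id, train=0.7, validation=0.15):
    """Deterministically map a record id to 'train', 'validation', or 'test'.

    Hashing the id (rather than drawing a random number) keeps a record in
    the same split every time the driver set is rebuilt.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < train:
        return "train"
    if bucket < train + validation:
        return "validation"
    return "test"


def sanity_check(rows, required_fields):
    """Return (row_index, problem) pairs for missing or non-finite values."""
    problems = []
    for i, row in enumerate(rows):
        for field in required_fields:
            value = row.get(field)
            if value is None:
                problems.append((i, f"missing {field}"))
            elif isinstance(value, float) and not math.isfinite(value):
                problems.append((i, f"non-finite {field}"))
    return problems


# Made-up rows, for illustration only.
rows = [
    {"id": "a1", "amount": 12.5},
    {"id": "a2", "amount": None},
    {"id": "a3", "amount": float("nan")},
]
issues = sanity_check(rows, ["id", "amount"])
splits = {row["id"]: assign_split(row["id"]) for row in rows}
```

Here `issues` flags the second and third rows. A real check would usually also cover ranges, duplicates, and label leakage.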

Use Cases

• Risk Management And Anti-Fraud
• Precise Marketing: customer profiling, segmentation, and acquisition
• Network Security
• User Intentions:
  • Predict churn
  • Customer value: predict customer value and identify high-value accounts
  • Inactive account reactivation: predict the probability that an inactive account will be reactivated
• Personalization/Recommendation
• Image recognition: face recognition, OCR, autonomous vehicles
• Machine translation
• and more...

Machine Learning Tools Abstraction Layers

• one-button-click
• declarative ML framework: provides common abstractions for many different model architectures, so users can change the underlying model with a single line of code
• workflow/pipeline/components
• library: TensorFlow
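The "single line of code" idea behind a declarative layer can be sketched with a small model registry driven by a config. Everything here is hypothetical and hand-rolled for illustration (the two toy model classes, `MODEL_REGISTRY`, `build_and_train`, and the data); it is not the API of any particular framework.

```python
from collections import Counter


class MajorityClassModel:
    """Predicts the most frequent label seen during training."""

    def fit(self, xs, ys):
        self.prediction_ = Counter(ys).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.prediction_


class NearestNeighborModel:
    """Predicts the label of the closest training example (squared Euclidean distance)."""

    def fit(self, xs, ys):
        self.xs_, self.ys_ = list(xs), list(ys)
        return self

    def predict(self, x):
        distances = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.xs_]
        return self.ys_[distances.index(min(distances))]


# Common abstraction: every model exposes fit() and predict().
MODEL_REGISTRY = {
    "majority": MajorityClassModel,
    "nearest_neighbor": NearestNeighborModel,
}


def build_and_train(config, xs, ys):
    """Instantiate and train whichever model the config names."""
    return MODEL_REGISTRY[config["model"]]().fit(xs, ys)


# Made-up training data, for illustration only.
xs = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (0.2, 0.1)]
ys = [0, 0, 1, 0]

config = {"model": "nearest_neighbor"}  # change this one line to swap the model
model = build_and_train(config, xs, ys)
```

Because both classes share the same `fit`/`predict` interface, the rest of the code does not change when the config value does.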

Tutorials

Some classic machine learning problems.

Titanic: OneR, Naive Bayes
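For orientation, OneR ("one rule") builds, for each feature, a rule that maps every feature value to its majority label, and then keeps the feature whose rule makes the fewest training errors. Below is a minimal sketch under assumptions: the function name, the column names, and the rows are made up in the spirit of the Titanic problem and are not the real passenger data.

```python
from collections import Counter, defaultdict


def one_r(rows, feature_names, label_name):
    """Return (best_feature, rule, errors); rule maps feature value -> predicted label."""
    best = None
    for feature in feature_names:
        # Count labels for each value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[label_name]] += 1
        # The rule predicts the majority label for each feature value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[feature]] != row[label_name])
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


# Made-up toy rows; NOT real Titanic passenger data.
toy_rows = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 1},
    {"sex": "female", "pclass": 2, "survived": 1},
    {"sex": "male", "pclass": 1, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 2, "survived": 1},
]

feature, rule, errors = one_r(toy_rows, ["sex", "pclass"], "survived")
```

On these toy rows the rule built on `sex` makes one training error and the rule built on `pclass` makes two, so OneR keeps `sex`. Numeric features would need to be discretized first, which this sketch leaves out.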