Data Sanity Check
Updated: 20190113
Check the following items:
 missing/invalid values: e.g. more than half of all the values are empty or null
 data size: e.g. the data load is 10x smaller or larger than expected, perhaps because floating-point numbers are being stored as strings
 data volume: total count of records.
 data distribution: e.g. the level of activity normally seen in the data has shifted by a large factor
 cardinality: count of unique values
 uniqueness: e.g. a field that must hold only unique values contains a duplicate
 median/percentiles: 50th, 95th, 99th, etc.
 mode: most frequent discrete value (for categorical variables)
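Several of the checks above can be sketched for a single column with the Python standard library. This is only a minimal illustration: the function name profile_column, the report keys, and the choice to treat None and NaN as missing are assumptions made here, not part of the original notes.

```python
import math
from collections import Counter
from statistics import median, quantiles


def profile_column(values):
    """Summarize one column (a list). None and NaN are treated as missing (an assumption)."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = [v for v in values if not is_missing(v)]
    counts = Counter(present)
    report = {
        # data volume: total count of records
        "volume": len(values),
        # missing/invalid values: share of empty entries
        "missing_fraction": (len(values) - len(present)) / len(values) if values else 0.0,
        # cardinality: count of unique values
        "cardinality": len(counts),
        # uniqueness: values that appear more than once
        "duplicates": [v for v, c in counts.items() if c > 1],
        # mode: most frequent value
        "mode": counts.most_common(1)[0][0] if counts else None,
    }
    # median/percentiles only make sense for numeric data
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numeric) >= 2:
        cuts = quantiles(numeric, n=100, method="inclusive")  # 99 cut points
        report["p50"] = median(numeric)
        report["p95"] = cuts[94]
        report["p99"] = cuts[98]
    return report


if __name__ == "__main__":
    print(profile_column([1, 2, 2, None, 4, float("nan")]))
```

A report like this could be compared against thresholds (for instance, flagging a column when missing_fraction exceeds 0.5), with the thresholds themselves chosen per dataset.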
ANOVA

ANOVA: Analysis of Variance, includes only one dependent variable
 an ANOVA model can have several error terms (depending on the design), whereas regression has only a single error term
 mainly used to determine whether data from several groups share a common mean
 MANOVA: Multivariate Analysis of Variance, includes multiple dependent variables.
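To make the "common mean" idea concrete, here is a minimal sketch of the one-way ANOVA F statistic (one dependent variable, several groups) in plain Python. The function name one_way_anova_f and the sample data are illustrative assumptions. The sketch stops at the F statistic and its degrees of freedom; it does not compute a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Assumes at least two groups and some variation within the groups;
    otherwise the division below fails.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    group_means = [mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F means the group means differ by more than the within-group spread would suggest. Turning F into a p-value requires the F distribution, which a statistics library can supply.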
Read more: ANOVA vs MANOVA