Cloud / Distributed Systems - Data Warehouse
- Amazon Redshift: based on ParAccel(initially based on PostgreSQL, columnar)
- Google BigQuery: based on Google's Dremel, battle tested inside Google for years.
- Microsoft Azure SQL Data Warehouse
Hive can be used as Data Warehouse, to store huge amount of data. However other compute engines like Presto are often used to accelerate queries.
- logging and messaging system(kafka)
- scraping(dumping) data from DB(e.g. MySQL) to DW(e.g. Hive); dump to a staging table, then copy to the target table
The fact table contains business facts (or measures), and foreign keys which refer to candidate keys (normally primary keys) in the dimension tables.
Contrary to fact tables, dimension tables contain descriptive attributes (or fields) that are typically textual fields (or discrete numbers that behave like text). These attributes are designed to serve two critical purposes: query constraining and/or filtering, and query result set labeling.