Course 9: Applied Data Analytics

Last Update: June 25, 2020

Course Objective:

The objective of this course is to help students develop competence in the statistical techniques needed for data analysis, and in the data mining techniques and algorithms used in practical problems that require processing big data for decision-making purposes.

Learning Outcomes:

On completion of this course, students will be able to:

    • Apply various inferential statistical analysis techniques to describe data sets and draw useful conclusions from them (e.g., confidence intervals, hypothesis testing)
    • Apply data visualization techniques and key data mining techniques (e.g., classification analysis, association rule learning, anomaly/outlier detection, clustering analysis, regression analysis) in dealing with big data sets
    • Implement the analytic algorithms for practical data sets
    • Perform large scale analytic projects in various industrial sectors

Prerequisite: None

Course Outline:

Module 1:  Basic Data Analysis

I.      Basic Concepts

    1.      Descriptive Statistics
    2.      Statistical Inferences
    3.      Data Measurement
    4.      Measures of Central Tendency and Dispersion
    5.      Common Statistical Graphs
    6.      Determination of Outliers

II.      Statistical Inferences

    1.      Point Estimation and Required Properties of Point Estimators
    2.      Interval Estimations for Mean, Proportion and Variance of Population
    3.      Sample Size Determination

III.        Hypothesis Testing

    1.      Hypothesis Testing for Mean, Proportion and Variance of Population – Single-Sample Tests
    2.      Hypothesis Testing for Mean, Proportion and Variance of Population – Two-Sample Tests
    3.      Type I and Type II Errors – Power of the Test
    4.      Observed Significance Level

Module 2:  Data Visualization

IV.       Data Visualization

    1.      Introduction to Data Visualization
    2.      Basic Charts for Numerical Data and Categorical Data
    3.      Distribution Plots
    4.      Multivariate Charts: Combo Chart, Combination Chart, Stacked Column Chart

V.       Data Dashboard

    1.      What is a Data Dashboard?
    2.      Applications and Benefits of Data Dashboard
    3.      Design and Construct a Data Dashboard

Module 3:  Key Data Mining Techniques

VI.       Regression Analysis

    1.      Linear Regression and the Least Squares Method
    2.      Residual Analysis
    3.      Multiple Regression
    4.      Goodness of Fit Tests

VII.       Data Classification

    1.      k-Nearest Neighbor Algorithm for Estimation and Prediction
    2.      Distance Functions: Euclidean, Manhattan, Minkowski; Min-Max Normalization, Z-Score Standardization
    3.      Logistic Regression
    4.      Bayesian Networks
    5.      Model Evaluation Measures for Classification Task

VIII.      Data Clustering

    1.      Hierarchical Clustering Method
    2.      k-Means Clustering
    3.      Measuring Cluster Goodness: The Silhouette Method and The Pseudo-F Statistic

IX.       Association Rules

    1.      Affinity Analysis
    2.      The A Priori Algorithm – Generating Frequent Itemsets
    3.      The A Priori Algorithm – Generating Association Rules
    4.      Measuring the Usefulness of Association Rules

X.       Case Studies/Group Projects

Week  Topic                           Workshop                                Learning Materials  Teaching Materials  Note
----  ------------------------------  --------------------------------------  ------------------  ------------------  ----
1     Basic Concepts                  -                                       MSIE-09-L-M1S1      MSIE-09-T-M1S1      -
2     Basic Concepts (continued)      MSIE-09-T-M1S1-W01, MSIE-09-T-M1S1-W02  -                   -                   -
3     Statistical Inferences          MSIE-09-T-M1S2-W01                      MSIE-09-L-M1S2      MSIE-09-T-M1S2      -
4     Hypothesis Testing              -                                       MSIE-09-L-M1S3      MSIE-09-T-M1S3      -
5     Hypothesis Testing (continued)  MSIE-09-T-M1S3-W01                      -                   -                   -
6     Data Visualization              -                                       MSIE-09-L-M2S1      MSIE-09-T-M2S1      -
7     Data Dashboard                  -                                       MSIE-09-L-M2S2      MSIE-09-T-M2S2      -
8     Regression Analysis             MSIE-09-T-M3S1-W01                      MSIE-09-L-M3S1      MSIE-09-T-M3S1      -
9     Regression Analysis (cont.)     -                                       -                   -                   -
10    Data Classification             -                                       MSIE-09-L-M3S2      MSIE-09-T-M3S2      -
11    Data Classification (cont.)     MSIE-09-T-M3S2-W01                      -                   -                   -
12    Data Clustering                 -                                       MSIE-09-L-M3S3      MSIE-09-T-M3S3      -
13    Data Clustering (cont.)         -                                       -                   -                   -
14    Association Rules               -                                       MSIE-09-L-M3S4      MSIE-09-T-M3S4      -
15    Association Rules (cont.)       -                                       -                   -                   -


Laboratory Sessions: None

Learning Resources:

Textbooks: No designated textbook, but class notes and handouts will be provided.

Reference Books:

    1. Larose, D.T. and Larose, C.D., Data Mining and Predictive Analytics, 2nd edition, Wiley, 2015
    2. Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R. and Lichtendahl Jr., K.C., Data Mining for Business Analytics – Concepts, Techniques, and Applications in R, Wiley, 2018
    3. Ankam, V., Big Data Analytics, Packt, 2016
    4. Walkowiak, S., Big Data Analytics with R, Packt, 2016
    5. Grolemund, G., Hands-on Programming with R, O’Reilly, 2014
    6. Wickham, H. and Grolemund, G., R for Data Science, O’Reilly, 2017
    7. Wexler, S., Shaffer, J. and Cotgreave, A., The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios, Wiley, 2017
    8. O’Connor, E., Microsoft Power BI Dashboards Step by Step, Microsoft Press, 2019

Journals and Magazines:

    1. Management Science, Informs
    2. Journal of Supply Chain Management, Wiley
    3. Computational Statistics & Data Analysis, Elsevier
    4. Advances in Data Analysis and Classification, Springer

Teaching and Learning Methods:

Teaching is delivered through lectures by the instructor. Tutorial/workshop sessions cover the use of tools for each topic. Learning methods include group discussions, individual and group assignments, and a group project/case study.

Time Distribution and Study Load:

Lectures: 30 hours

Tutorials/Group Discussions: 30 hours

Self-study: 45 hours

Group project: 40 hours

Evaluation Scheme:

The final grade will be computed according to the following weight distribution: mid-semester examination 20%, assignments and group projects 50%, final examination 30%. In final grading:

An “A” would be awarded if a student shows a deep understanding of the knowledge learned, as demonstrated through home assignments, project work, and exam results.

A “B” would be awarded if a student shows an overall understanding of all topics.

A “C” would be given if a student shows below-average understanding and application of basic knowledge.

A “D” would be given if a student does not meet expectations in both understanding and application of the given knowledge.
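
As an illustration of the weight distribution above, the following minimal Python sketch computes a weighted final score. The weights come from the evaluation scheme; the component names, the 0-100 scale, and the example scores are assumptions made only for this sketch. The syllabus does not define numeric cut-offs for letter grades, so none are applied here.

```python
# Weights are taken from the evaluation scheme; everything else is hypothetical.
WEIGHTS = {
    "mid_semester_exam": 0.20,
    "assignments_and_projects": 0.50,
    "final_exam": 0.30,
}


def final_score(scores: dict[str, float]) -> float:
    """Return the weighted average of component scores (assumed 0-100 scale)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the weighted components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example = {
    "mid_semester_exam": 70.0,
    "assignments_and_projects": 85.0,
    "final_exam": 80.0,
}
print(round(final_score(example), 2))  # 0.2*70 + 0.5*85 + 0.3*80 = 80.5
```

Because assignments and group projects carry half of the total weight, a change in that component shifts the final score more than the same change in either examination.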

Developers: Huynh Trung Luong (AIT), Sirorat Pattanapairoj (KKU), Komkrit Pitituek (KKU), Wimalin Laosiritaworn (CMU)