- Apply various inferential statistical analysis techniques (e.g., confidence intervals, hypothesis testing) to describe data sets and draw useful conclusions from them
- Apply data visualization techniques and key data mining techniques (e.g., classification analysis, association rule learning, anomaly/outlier detection, clustering analysis, regression analysis) in dealing with big data sets
- Implement the analytic algorithms for practical data sets
- Perform large scale analytic projects in various industrial sectors
Prerequisite: None
Course Outline:
Module 1: Basic Data Analysis
I. Basic Concepts
- Descriptive Statistics
- Statistical Inferences
- Data Measurement
- Measures of Central Tendency and Dispersion
- Common Statistical Graphs
- Determination of Outliers
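As an illustration of the measures of central tendency, dispersion, and outlier determination listed above, here is a minimal sketch using only the Python standard library. The data values are hypothetical, and the 1.5 × IQR fence rule is one common convention, assumed here for illustration rather than prescribed by the course.

```python
from statistics import mean, median, quantiles, stdev

def iqr_outliers(values):
    """Return (q1, q3, outliers) using the 1.5 * IQR fence rule."""
    # method="inclusive" treats the data as the whole population of interest
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return q1, q3, [v for v in values if v < lower or v > upper]

# Hypothetical sample, not taken from the course materials
data = [12, 15, 14, 10, 18, 16, 13, 45]
q1, q3, outliers = iqr_outliers(data)

print("mean:", mean(data), "median:", median(data), "stdev:", round(stdev(data), 2))
print("Q1:", q1, "Q3:", q3, "outliers:", outliers)
```

Other quartile conventions (for example `method="exclusive"`) can move the fences slightly, so the flagged points may differ between tools.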
II. Statistical Inferences
- Point Estimation and Required Properties of Point Estimators
- Interval Estimations for Mean, Proportion and Variance of Population
- Sample Size Determination
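Interval estimation for a population mean and sample size determination can be sketched as follows. This example assumes a normal (z-based) interval, which is only an approximation for small samples where a t-based interval would normally be used. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def z_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean using the normal quantile."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return centre - z * standard_error, centre + z * standard_error

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n such that the margin of error z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical measurements
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3]
low, high = z_interval(sample)
n_needed = required_sample_size(sigma=10, margin=2)

print("95% interval:", round(low, 3), round(high, 3))
print("required sample size:", n_needed)
```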
III. Hypothesis Testing
- Hypothesis Testing for Mean, Proportion and Variance of Population – Single Sample Test
- Hypothesis Testing for Mean, Proportion and Variance of Population – Two Samples Test
- Type I and Type II Errors – Power of the Test
- Observed Significance Level
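The relationship between the test statistic, the observed significance level (p-value), and the power of a test can be sketched for the simplest case: a single-sample z-test for a mean with known population standard deviation. The numbers and function names below are hypothetical, and the power calculation assumes an upper-tailed alternative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided z-test for H0: mu = mu0; returns (z statistic, p-value)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value

def upper_tail_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the upper-tailed z-test (H1: mu > mu0) when the true mean is mu1."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    # Power = 1 - P(Type II error)
    return 1 - STD_NORMAL.cdf(z_alpha - shift)

# Hypothetical scenario: H0: mu = 100, sigma = 15, n = 36, observed mean 105
z, p_value = z_test_p_value(105, 100, 15, 36)
power = upper_tail_power(100, 105, 15, 36)

print("z:", round(z, 3), "p-value:", round(p_value, 4), "power:", round(power, 3))
```

Rejecting H0 when the p-value is below the chosen significance level fixes the Type I error rate at that level, while increasing n raises the power against a given alternative.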
Module 2: Data Visualization
IV. Data Visualization
- Introduction to Data Visualization
- Basic Charts for Numerical Data and Categorical Data
- Distribution Plots
- Multivariate Charts: Combo Chart, Combination Chart, Stacked Column Chart
V. Data Dashboard
- What is a Data Dashboard?
- Applications and Benefits of Data Dashboard
- Design and Construct a Data Dashboard
Module 3: Key Data Mining Techniques
VI. Regression Analysis
- Linear Regression and Least Squares Method
- Residual Analysis
- Multiple Regression
- Goodness of Fit Tests
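A simple linear regression fitted by the least squares method, together with its residuals and coefficient of determination, can be sketched as below. The data are hypothetical, and R² is shown as one common goodness-of-fit measure rather than as the specific test used in the course.

```python
from statistics import mean

def least_squares_fit(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def r_squared(y, residuals):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = mean(y)
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
r2 = r_squared(y, residuals)

print("intercept:", round(intercept, 3), "slope:", round(slope, 3), "R^2:", round(r2, 4))
```

With an intercept in the model, the residuals sum to zero up to rounding, which is a quick sanity check before plotting them for residual analysis.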
VII. Data Classification
- k-Nearest Neighbor Algorithm for Estimation and Prediction
- Distance Functions (Euclidean, Manhattan, Minkowski) and Feature Scaling (Min-Max Normalization, Z-Score Standardization)
- Logistic Regression
- Bayesian Networks
- Model Evaluation Measures for Classification Task
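The distance functions and scaling methods listed above, and their use in a k-nearest neighbor classifier, can be sketched as follows. The training points, labels, and function names are hypothetical. Z-score standardization here uses the sample standard deviation, which is an assumption, since some texts use the population form.

```python
from collections import Counter
from statistics import mean, stdev

def minkowski(a, b, p=2):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def min_max(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Centre on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def knn_predict(train, query, k=3, p=2):
    """Majority vote among the k training records nearest to the query."""
    nearest = sorted(train, key=lambda row: minkowski(row[0], query, p))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labelled records: (features, class)
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B")]

print(minkowski((0, 0), (3, 4), p=2), minkowski((0, 0), (3, 4), p=1))
print(min_max([2, 4, 6]))
print(knn_predict(train, (2, 2)))
```

Scaling matters for k-NN because a feature measured on a large scale would otherwise dominate the distance.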
VIII. Data Clustering
- Hierarchical Clustering Method
- k-Means Clustering
- Measuring Cluster Goodness: The Silhouette Method and The Pseudo-F Statistic
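A minimal sketch of k-means clustering and the mean silhouette value follows. It assumes Euclidean distance, caller-supplied initial centroids (so the run is deterministic), and hypothetical two-dimensional points. It requires Python 3.8 or later for `math.dist`.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until centroids stop moving."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        updated = [tuple(sum(c) / len(members) for c in zip(*members))
                   if members else centroids[i]
                   for i, members in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def mean_silhouette(clusters):
    """Average of (b - a) / max(a, b) over all points; singletons score 0."""
    scores = []
    for i, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: mean distance to the nearest other cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i and other)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical points forming two visibly separate groups
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, [(1, 1), (9, 8.5)])
silhouette = mean_silhouette(clusters)

print("centroids:", centroids)
print("mean silhouette:", round(silhouette, 3))
```

A mean silhouette close to 1 suggests compact, well-separated clusters, while values near 0 suggest overlapping clusters.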
IX. Association Rules
- Affinity Analysis
- The Apriori Algorithm – Generating Frequent Itemsets
- The Apriori Algorithm – Generating Association Rules
- Measuring the Usefulness of Association Rules
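The two stages of association rule mining listed above can be sketched as a level-wise frequent itemset search with subset pruning, followed by support, confidence, and lift for a candidate rule. This is a simplified illustration of the Apriori idea, not an optimized implementation, and the market-basket transactions are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built from frequent (k-1)-itemsets."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset, then check support
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return frequent

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift

# Hypothetical transactions
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
)]

frequent = frequent_itemsets(transactions, min_support=0.6)
rule_support, confidence, lift = rule_metrics(
    frozenset({"diapers"}), frozenset({"beer"}), transactions)

print(len(frequent), "frequent itemsets")
print("support:", rule_support, "confidence:", round(confidence, 2), "lift:", round(lift, 2))
```

A lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent.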
X. Case Studies/Group Projects
Week | Topic | Workshop | Learning Materials | Teaching Materials | Note |
1 | Basic Concepts | | MSIE-09-L-M1S1 | MSIE-09-T-M1S1 | |
2 | Basic Concepts (continued) | MSIE-09-T-M1S1-W01 | | | |
3 | Statistical Inferences | MSIE-09-T-M1S2-W01 | MSIE-09-L-M1S2 | MSIE-09-T-M1S2 | |
4 | Hypothesis Testing | | MSIE-09-L-M1S3 | MSIE-09-T-M1S3 | |
5 | Hypothesis Testing (continued) | MSIE-09-T-M1S3-W01 | | | |
6 | Data Visualization | | MSIE-09-L-M2S1 | MSIE-09-T-M2S1 | |
7 | Data Dashboard | | MSIE-09-L-M2S2 | MSIE-09-T-M2S2 | |
8 | Regression Analysis | MSIE-09-T-M3S1-W01 | MSIE-09-L-M3S1 | MSIE-09-T-M3S1 | |
9 | Regression Analysis (continued) | | | | |
10 | Data Classification | | MSIE-09-L-M3S2 | MSIE-09-T-M3S2 | |
11 | Data Classification (continued) | MSIE-09-T-M3S2-W01 | | | |
12 | Data Clustering | | MSIE-09-L-M3S3 | MSIE-09-T-M3S3 | |
13 | Data Clustering (continued) | | | | |
14 | Association Rules | | MSIE-09-L-M3S4 | MSIE-09-T-M3S4 | |
15 | Association Rules (continued) | | | | |
Laboratory Sessions: None
Learning Resources:
Textbooks: No designated textbook, but class notes and handouts will be provided.
Reference Books:
- Larose, D.T. and Larose, C.D., Data Mining and Predictive Analytics, 2nd edition, Wiley, 2015
- Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R. and Lichtendahl Jr., K.C., Data Mining for Business Analytics – Concepts, Techniques, and Applications in R, Wiley, 2018
- Ankam, V., Big Data Analytics, Packt, 2016
- Walkowiak, S., Big Data Analytics with R, Packt, 2016
- Grolemund, G., Hands-on Programming with R, O’Reilly, 2014
- Wickham, H. and Grolemund, G., R for Data Science, O’Reilly, 2017
- Wexler, S., Shaffer, J. and Cotgreave, A., The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios, Wiley, 2017
- O’Connor, E., Microsoft Power BI Dashboards Step by Step, Practice Files, 2019
Journals and Magazines:
- Management Science, Informs
- Journal of Supply Chain Management, Wiley
- Computational Statistics & Data Analysis, Elsevier
- Advances in Data Analysis and Classification, Springer
Teaching and Learning Methods:
Teaching is delivered through lectures by the instructor. Tutorial/workshop sessions cover the use of tools in each subject. Learning methods include group discussions, individual and group assignments, and a group project/case study.
Time Distribution and Study Load:
Lectures: 30 hours
Tutorials/Group Discussions: 30 hours
Self-study: 45 hours
Group project: 40 hours
Evaluation Scheme: The final grade is computed according to the following weight distribution: mid-semester examination 20%, assignments and group projects 50%, final examination 30%. In final grading:
An “A” is awarded if a student shows a deep understanding of the knowledge learned, as demonstrated through home assignments, project work, and exam results.
A “B” is awarded if a student shows an overall understanding of all topics.
A “C” is given if a student performs below average expectations in understanding and applying basic knowledge.
A “D” is given if a student does not meet expectations in both understanding and application of the given knowledge.
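The weight distribution stated in the evaluation scheme can be applied as a simple weighted sum. The component scores below are hypothetical, and the sketch deliberately stops at the numeric score because the syllabus defines letter grades qualitatively rather than by cut-off marks.

```python
# Weights in percent, taken from the evaluation scheme
WEIGHTS = {"midterm": 20, "assignments_and_projects": 50, "final_exam": 30}

def weighted_score(scores):
    """Combine component scores (each on a 0-100 scale) into one final score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# Hypothetical student
example_scores = {"midterm": 80, "assignments_and_projects": 90, "final_exam": 70}
final_score = weighted_score(example_scores)
print("final score:", final_score)
```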
Developers: Huynh Trung Luong (AIT), Sirorat Pattanapairoj (KKU), Komkrit Pitituek (KKU), Wimalin Laosiritaworn (CMU)