Data Warehousing and Data Mining
- Site Owner: Tigabu Akal
This course is about discovering hidden patterns (knowledge) in a data warehouse or other data source, using different data mining functionalities, for research and business applications.
Course Information
Objectives
On successful completion of the course, students will be able to:
- Understand the concept of data warehouse and data mining
- Understand the different data mining functionalities: association, classification, clustering, etc.
- Understand the data warehouse operations: slicing, dicing, pivoting, rolling up, drilling down, etc.
- Understand and use data mining process models such as CRISP-DM
- Develop skill in measuring the performance of a data mining system
- Develop skill in measuring the quality of a data set for decision making
- Develop confidence in doing research in the area of data mining and data warehousing
- Develop and test data mining systems
- Develop a spirit of teamwork
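As an illustration of the performance-measurement objective above, the following minimal sketch computes accuracy, precision, recall and F1 for a binary classifier from its predictions. The function names and the toy labels are assumptions made for this example; they are not part of the course material.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count tp/fp/fn/tn, treating `positive` as the class of interest."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def classification_metrics(actual, predicted, positive):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    c = confusion_counts(actual, predicted, positive)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented purely for illustration.
actual = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes", "no", "no", "yes"]
metrics = classification_metrics(actual, predicted, positive="yes")
print(metrics)
```

Accuracy alone can be misleading on imbalanced data, which is why precision and recall are reported alongside it.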
Course Content
1. Overview
- Brief description of data mining
- Data warehousing, data mining and database technology
- Online transaction processing (OLTP) and data mining
2. Data warehousing
- Design
- Tools
- Operations
- Issues
3. Data Preprocessing
- Data cleaning
- Data integration
- Data reduction
- Data transformation and data discretization
4. Classification Rule Mining
- Description
- Principle
- Design
- Algorithm
- Rule evaluation
5. Clustering
- Description
- Principle
- Design
- Algorithm
- Result Analysis
6. Association Rule Mining
- Description
- Principle
- Design
- Algorithm
- Rule evaluation
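To make the rule-evaluation topic for association rules concrete, here is a small sketch of the three most common measures: support, confidence and lift. The market-basket transactions and the function names are hypothetical, chosen only for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; > 1 suggests a positive association."""
    cons = support(transactions, consequent)
    return confidence(transactions, antecedent, consequent) / cons if cons else 0.0


# Hypothetical market-basket data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "jam"},
]

rule_support = support(transactions, {"bread", "butter"})
rule_confidence = confidence(transactions, {"bread"}, {"butter"})
rule_lift = lift(transactions, {"bread"}, {"butter"})
print(rule_support, rule_confidence, rule_lift)
```

For the rule bread -> butter, 3 of the 5 baskets contain both items and 4 contain bread, so support is 3/5 and confidence is 3/4 (up to floating-point rounding).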
Methodology
This course will be offered through lectures, presentations, class discussions, laboratory work and group project work. Students present their assignments and receive feedback.
References
- J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 2nd edition.
- Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy, "Advances in Knowledge Discovery and Data Mining", AAAI Press/The MIT Press, 1996.
- J. Ross Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, 1993.
- Michael Berry and Gordon Linoff, "Data Mining Techniques (For Marketing, Sales, and Customer Support)", John Wiley & Sons, 1997.
- Sholom M. Weiss and Nitin Indurkhya, "Predictive Data Mining: A Practical Guide", Morgan Kaufmann Publishers, 1998.
- Alex Freitas and Simon Lavington, "Mining Very Large Databases with Parallel Processing", Kluwer Academic Publishers, 1998.
- A. K. Jain and R. C. Dubes, "Algorithms for Clustering Data", Prentice Hall, 1988.
- V. Cherkassky and F. Mulier, "Learning From Data", John Wiley & Sons, 1998.
Assessment
Method of Assessment
The evaluation criteria and their percentage shares are shown below:
- Researching data warehouse architecture (individual work): 5%
- Critical review of a data mining and data warehousing paper, with presentation (group work): 20%
- Report (concept note) on selected topics, with presentation (group work): 20%
- Online oral examination (questions drawn by lottery): 10%
- Final project work (individual): 45%
Final Project Description (45%):
- The goal of this project is to learn how to uncover the hidden knowledge within a dataset using Python, Weka or another data analysis tool, and to report the findings.
- Requirements: for this project you need to:
- Select a dataset with 10 or more attributes and at least 2500 entries (https://archive.ics.uci.edu/ml/datasets.php).
- Datasets from Ethiopian industries are highly recommended.
- Clean the dataset if there are any incomplete, erroneous or missing values.
- Choose the task (classification, clustering, association rule mining or a combination of them) you intend to implement in Python, Weka or another tool or programming language. Note: run two or more algorithms on the selected dataset.
- Project Report
Prepare a publishable paper with an abstract (1 page), introduction, problem statement and objective (1-2 pages), literature review (3 pages), methods (1 page), experimentation (discussing any preprocessing, model creation, test results and findings; 2-3 pages), concluding remarks and recommendations (1 page), and references (1 page).
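Before starting the analysis, it may help to confirm that a candidate dataset meets the size requirements stated above (10 or more attributes, at least 2500 entries) and to see where values are missing. The sketch below does this for a CSV file using only the Python standard library. The function name, the set of missing-value markers and the synthetic demo data are assumptions made for illustration; adjust them to the dataset at hand.

```python
import csv
import io

# Assumption: placeholders that commonly denote a missing value.
MISSING_MARKERS = {"", "?", "NA", "NaN"}


def summarize_dataset(csv_file, min_attributes=10, min_entries=2500):
    """Count attributes, entries and missing values in an open CSV file."""
    reader = csv.DictReader(csv_file)
    attributes = reader.fieldnames or []
    missing = {name: 0 for name in attributes}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in attributes:
            value = row.get(name)  # None when the row is too short
            if value is None or value.strip() in MISSING_MARKERS:
                missing[name] += 1
    return {
        "attributes": len(attributes),
        "entries": n_rows,
        "meets_size_requirement": (len(attributes) >= min_attributes
                                   and n_rows >= min_entries),
        "missing_per_attribute": {k: v for k, v in missing.items() if v},
    }


# Synthetic in-memory CSV, generated only to demonstrate the function.
header = [f"a{i}" for i in range(10)]
lines = [",".join(header)]
for r in range(2500):
    values = [str(r + i) for i in range(10)]
    if r % 500 == 0:
        values[3] = "?"  # inject a missing-value marker
    lines.append(",".join(values))

summary = summarize_dataset(io.StringIO("\n".join(lines)))
print(summary)
```

For a real file, the same function can be called on an opened file object, for example `with open("my_dataset.csv", newline="") as f: summarize_dataset(f)`, where `my_dataset.csv` is a placeholder name.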
Coaches
Site Owner
Tigabu Akal