Dr. Min Chi
CSC 522 Automated Learning and Data Analysis
3 Credit Hours
Introduction to the problems and techniques for automated discovery of knowledge in databases. Topics include representation, evaluation, and formalization of knowledge for discovery; classification, prediction, clustering, and association methods. Selected applications in commerce, security, and bioinformatics. Students cannot get credit for both CSC 422 and CSC 522.
CSC 226 Applied Discrete Mathematics or LOG 201 Logic, ST 370 Probability and Statistics for Engineers, MA 305 Introductory Linear Algebra and Matrices, or equivalent classes.
The aim of this class is to introduce students to the concepts and methods of data mining. Upon completion, you will be able to:
- recognize and describe the major types of data
- identify proper methods for data preprocessing, exploration, summarization, and visualization
- explain representative data mining applications
- describe and perform common data mining tasks
- design, implement, and evaluate strategies for solving real-world data mining problems
- master R or MATLAB software for statistical computing
Course topics include:
- describing data
- data preprocessing
- data exploration and visualization
- association analysis
- cluster analysis
This is an approximate list. We will introduce additional topics as student interest and time allow, and we will omit topics if time runs short.
The coursework consists of lectures, readings, homework assignments, exams, and a group project. The exact grading scheme has not been determined yet, but exams will account for ~45% of the grade. We will have one midterm and one final exam. In addition, there will be four homework assignments (together accounting for ~20%), and each student must plan and implement a data mining project (accounting for ~35%). Groups of up to four students may collaborate on the data mining project.
Homework assignments and project reports must be typed and submitted electronically. Submissions must be PDF files without any scanned handwritten parts. A late submission will receive a 10% point reduction on the first day after the due date and a 40% point reduction on the second day. No credit will be given for submissions that are three or more days late.
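The late policy above can be read as a simple lookup from days late to a point reduction. The following Python sketch is only an illustration of that reading, not an official grading tool: the function names `late_reduction_percent` and `adjusted_score` are hypothetical, and it assumes that lateness is counted in whole days after the due date.

```python
def late_reduction_percent(days_late: int) -> int:
    """Return the point reduction (in percent) for a late submission.

    Assumption: days_late counts whole days after the due date,
    with 0 meaning the submission was on time.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late == 0:
        return 0      # on time: no reduction
    if days_late == 1:
        return 10     # first day after the due date
    if days_late == 2:
        return 40     # second day after the due date
    return 100        # three or more days late: no credit


def adjusted_score(raw_score: float, days_late: int) -> float:
    """Apply the late reduction to the points earned on an assignment."""
    reduction = late_reduction_percent(days_late)
    return raw_score * (100 - reduction) / 100


if __name__ == "__main__":
    # Show how the same raw score changes with each day of lateness.
    for days in range(4):
        print(days, adjusted_score(80, days))
```

Under these assumptions, an assignment that earned 80 points would keep all of its points if submitted on time, and none if submitted three or more days late.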
Introduction to Data Mining (2nd Edition) by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Paperback, 792 pages, published 2013.
ISBN-10: 0-13-312890-3 / 0133128903
ISBN-13: 978-0-13-312890-1 / 9780133128901