Dr. Nagiza Samatova
CSC 591 615 Graph Data Mining
3 Credit Hours
Graph data mining is a growing area of Big Data Analytics due to the ubiquitous nature of graph data. The discovery and forecasting of insightful patterns from graph data are at the core of analytical intelligence in government, industry, and science. This course teaches both basic and more advanced techniques required for routine analytical intelligence operations on graph data. Students will be exposed to the underlying theory and to real-world applications, and will learn to design effective and efficient algorithms and data structures for handling huge volumes of complex and noisy graph data.
Undergraduate-level knowledge of statistics, linear algebra, and programming in C/C++ and Java. Basic knowledge of data mining and machine learning concepts, as well as R, SAS, SPSS, or Matlab, is a plus. Otherwise, consent of the instructor is required.
The instructor will teach all the course lectures. Lecture sessions will alternate with highly interactive, hands-on, in-class exercises carried out by student teams. After-class learning activities will involve:
- Tutorials: Two tutorials on industry-used data mining toolkits (e.g., R, SAS, SPSS, Matlab).
- Programming Projects: Six projects from real-world applications, including, but not limited to, social network analysis, sentiment analysis, recommender systems, network and homeland security intelligence, medical diagnostics and healthcare. The projects will be implemented using R, SAS, SPSS, or Matlab, and demonstrated in various infrastructure environments (cloud computing, supercomputing, graph databases).
- Homework: Biweekly assignments to practice graph-theoretical concepts, the design and analysis of graph mining algorithms, and the design and evaluation of graph data structures.
- Quizzes & Exams: In-class pop quizzes and two take-home exams.
Part I: Static Graphs: Advanced theoretical and algorithmic knowledge of graph mining techniques for discovery and prediction of frequent and anomalous patterns in graph data using techniques of link analysis, cluster analysis, community detection, graph-based classification, and anomaly detection. Key strategies for effective graph mining, including kernelization, random walks, graph-based dimension reduction, and graph properties (e.g., centrality metrics).
Part II: Time-evolving Graphs: Intermediate knowledge and experience with dynamic network models, percolation networks, network resilience and robustness, with applications to Internet vulnerability and traffic on complex networks, epidemic/viral spreading, social networking and collective behavior, such as social influence, rumor propagation, and opinion mining.
Part III: Graphical Models and Association Networks: Intermediate knowledge of probabilistic graphical models (Hidden Markov Models and Conditional Random Fields) and association network inference using penalized linear regression methods, with applications to optical character recognition, automatic inference of semantic networks from unstructured text, inference of association networks for the number of crimes in a community, etc.
Part IV: Big Graph Data Analytics: Basic knowledge and experience in dealing with Big Data in cloud computing (Hadoop, MapReduce, Hive), supercomputing (OpenMP, MPI, parallel R) and large graph databases (SPARQL, RDF) infrastructure environments.
By the end of the course, students will have gained knowledge of a range of methods and techniques for graph mining and analytics, will be able to critically analyze the pros and cons of applying these techniques in different contexts, and will be aware of the applications that require such graph analytic techniques. Finally, students will be able to conceptualize and design efficient and effective graph mining solutions for a variety of real-world problems.
Furthermore, graph mining and analytics is a hot topic at many companies, such as Netflix, Facebook, and Google, and the course projects will make good additions to students’ résumés.