Instructor

Dr. Munindar Singh

Computer Science

Phone: 919-515-5677
Fax: 919-515-7896
Email: singh@ncsu.edu (preferred means of communication)
Instructor Website
Research Website

CSC 791 603 Natural Language Processing

3 Credit Hours

This course is self-contained, and provides the essential foundation in natural language processing. It identifies the key concepts underlying NLP applications as well as the main NLP paradigms and techniques.

This course combines the core ideas developed in linguistics and in artificial intelligence to show how to understand language. Key topics include regular expressions, unigrams, and n-grams; word embeddings; syntactic (phrase-structure) and dependency parsing; semantic role labeling; language modeling; sentiment and affect analysis; question answering; text-based dialogue; discourse processing; and applications of machine learning to language processing.

The course provides the necessary background in linguistics and artificial intelligence. This course is suitable for high-performing students who are willing and able to learn abstract concepts, complete programming assignments, develop a student-selected project, and produce a term paper.

Ordinarily, the term paper would describe a research topic based on the project. The term paper could instead be a substantial review of the literature on some specific aspect of NLP or be an original contribution.

Please discuss (with me and any concerned faculty member) any potential overlap of your project and term paper with your other work; also report any such overlap within your project report and term paper. Overlap is acceptable as long as there is an assurance that the work performed uniquely for this course is substantial.

Prerequisite

The course is self-contained. The main informal prerequisite is maturity in thinking about subtle concepts, such as might be gained through experience with conceptual modeling in databases or software.

Prior encounters with AI (knowledge representation and machine learning) or data science will help but aren’t necessary.

From long experience, I have discovered that the material in CSC 226 is essential for my courses. Here is a (partial) list of topics that will be assumed: elementary set theory, relations, partial orders, functions, concept of a theorem, propositional logic, and predicate logic.

I recommend you brush up on these topics if you aren’t comfortable with them. These topics are covered in CSC 226: Applied Discrete Mathematics. You may review Chapters 1 to 6 from the following book, which is sometimes used as the CSC 226 textbook:

Kenneth H. Rosen, Discrete Mathematics and its Applications, McGraw-Hill, 7th edition, 2012. ISBN 0-07-289905-0.

Course Objectives

Upon completion of this course, students will be able to do the following.

  • Understand the capabilities, limitations, and promise of NLP.
  • Understand and apply concepts of computational models underlying NLP.
  • Develop software to carry out NLP tasks based on leading existing libraries for natural language processing and machine learning.
  • Design and implement a new NLP application bringing together known techniques with creative ideas of their own.
  • Evaluate emerging NLP concepts and techniques with respect to key principles of language and artificial intelligence.

Course Requirements

Readings and software

  • Required: Readings linked from the schedule page. These are available openly, through the university library, or from the course website.
  • All necessary software will be available for use on NCSU laboratory computers or for free download for academic purposes.

Topics

The tentative schedule lists the main topics of this course.

Grading

I will assign +/- grades. There will be a fair amount of work—please plan to spend about eight hours (plus time in class) each week.

Exams:   30%
Programming:   60%
Homework:    5%
Message Board Participation:    5%
Term Paper:   10%
Total  110%

The following programming assignments jointly add up to the programming component of the course grade in the above table. The weights of the assignments are based on their expected complexity. I may change the weights as the semester progresses.

Assignment:   Weight
TBD 1:   15%
TBD 2:   15%
Project report (R0):   10%
Project report (R1):   10%
Project report (R2):   10%
Project report (R3) and demo:   40%

CSC 791 students must submit a term paper worth approximately 9% of their total grade (10 of the 110 points in the grading table above). A general-purpose rubric for term papers is here. However, if you base your term paper on the same topic as your semester project, you can submit it simply by extending your final project report by two pages. In general, the project-based option will be far less work for you.
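As a rough illustration of how these weights combine, here is a minimal Python sketch. It is not an official grade calculator. It assumes that each component is scored on a 0-100 scale, that the overall score is the weighted average normalized by the 110-point total, and that the assignment table reads as 15%, 15%, 10%, 10%, 10%, and 40% of the programming component. The names (WEIGHTS, PROGRAMMING_WEIGHTS, effective_shares, weighted_score) are hypothetical and chosen only for this example.

```python
# Component weights from the grading table; they sum to 110 for CSC 791.
WEIGHTS = {
    "exams": 30,
    "programming": 60,
    "homework": 5,
    "participation": 5,
    "term_paper": 10,
}

# Assumed reading of the assignment table: percentages of the
# programming component (these may change during the semester).
PROGRAMMING_WEIGHTS = {
    "assignment_1": 15,
    "assignment_2": 15,
    "report_r0": 10,
    "report_r1": 10,
    "report_r2": 10,
    "report_r3_demo": 40,
}


def effective_shares(weights):
    """Return each item's fraction of the total weight."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def weighted_score(scores, weights):
    """Weighted average of 0-100 scores, normalized by the total weight."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total


shares = effective_shares(WEIGHTS)
print(f"Term paper share: {shares['term_paper']:.1%}")  # 10/110, about 9.1%
```

Under these assumptions, the programming score would first be computed with weighted_score over PROGRAMMING_WEIGHTS, and that result would then enter the overall computation with weight 60.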

Textbook

Dan Jurafsky and James H. Martin, Speech and Language Processing, 3rd edition (in draft form), Prentice Hall, ISBN not known.

The draft book can be downloaded from https://web.stanford.edu/~jurafsky/slp3/