# CSCI-GA.2590 Natural Language Processing, Spring 2023

## About

How can we teach machines to understand human languages so that they can answer queries, summarize dense information, or hold a conversation with us? The primary goal of this course is to provide students with the principles and state-of-the-art tools needed to solve a variety of NLP problems. We will focus on three paradigms in the field of NLP: supervised learning, pretrain-then-finetune, and the most recent large language models. Students are expected to read research papers and gain hands-on experience through coding assignments and course projects.

## Prerequisites

Students are expected to have a solid mathematics background and strong programming skills.

- Probability, statistics, linear algebra (DS-GA.1002, MATH-UA.140, MATH-UA.235)
- Algorithms and data structures (CSCI-UA.102)
- Basic knowledge of machine learning (DS-GA.1003, CSCI-UA.0473). We will not spend a significant amount of time on machine learning basics, so some prior exposure to the supervised learning framework (e.g., loss functions, SGD) is expected.

## Logistics

**Lectures**: Tue 4:55pm–6:55pm, CIWW 101
- Join on Zoom using your NYU account
- Zoom recordings can be found on Brightspace

**Office hours**: We will have three office hours each week: one with the instructor (lecture or general questions), one with the TA (lecture or assignment questions), and one with the graders (grading questions). Details can be found on the Staff page. You are also encouraged to ask questions on Campuswire.

**Communication**: We will use Campuswire as our main communication tool for announcements and answering questions related to the lectures, assignments, and projects. The registration link is available on Brightspace.

## Grading

**Assignments (36%)**: There will be three assignments, each counting 12%.

**Midterm (18%)**: There will be an online midterm on March 7.

**Project (46%)**: You are required to complete a (group) project applying techniques learned in this course.

## Coursework

### Assignments

The assignments will contain both written problems and programming problems.

**Late policy**: All assignments are due at 12:00pm noon (New York time) on the due date. You have **5 late days** in total that can be distributed among the assignments. However, no assignment will be accepted more than 48 hours after its deadline.

**Collaboration policy**: You may discuss problems with your classmates. However, you must write up the homework solutions and the code from scratch, without referring to notes from your joint session. In your solution to each problem, you must write down the names of everyone with whom you discussed the problem; this will not affect your grade.

**Submission**: Assignments are submitted through Gradescope. At the beginning of the semester, you will be added to the Gradescope roster through Brightspace. Please do not register on Gradescope separately or change your email, since doing so will put the rosters out of sync.

**Grading**: We aim to release grades within two weeks of the submission date. Once the grades are released, you will have one week to submit any rebuttal.

### Midterm

The midterm will be hosted online through Gradescope on March 7, before spring break. Details will follow as the date gets closer. If you have accessibility requirements, please let us know as soon as possible.

### Project

The project is an important component of this course. It allows you to apply what you have learned to a real problem. You are asked to complete the project in a group of **1 to 5** students. Larger groups are expected to show more effort.

**Topics**: You can choose any topic in NLP. Take a look at the ACL proceedings for inspiration. Here are some general directions:
- A new algorithm or model for an important problem, e.g., detecting out-of-domain examples in QA
- An application of NLP technology, e.g., identifying misinformation and disinformation online
- Analysis of a dataset, a model, or an approach, e.g., what are the caveats, pros/cons, and interpretations
- Replication of a published result (see the Reproducibility Challenge)

**Deliverables**:
- Proposal (6%): You are required to submit a one-page proposal by March 21.
- Presentation (10%): Each group will give a short presentation (3 minutes) of their work, followed by Q&A (1 minute), in the last lecture.
- Report (30%): The final report is due on **May 5**. Each group should submit a PDF report using the ACL template.