Instructor | David Rosenberg |
---|---|
Lecture | Tuesday 5:20pm–7pm, GSACL C95 (238 Thompson St.) |
Lab | Wednesday 6:45pm–7:35pm, MEYER 121 (4 Washington Pl) |
Office Hours | Instructor: Wednesdays 5:00–6:00pm, CDS (60 5th Ave.), 6th floor, Room 650; Section Leader: Wednesdays 7:45–8:45pm, CDS (60 5th Ave.), Room C15; Graders: Mondays 3:30–4:30pm, CDS (60 5th Ave.), 6th floor, Room 660 |

This course covers a wide variety of topics in machine learning and statistical modeling. While mathematical methods and theoretical aspects will be covered, the primary goal is to provide students with the tools and principles needed to solve the data science problems found in practice. This course also serves as a foundation on which more specialized courses and further independent study can build.
This course was designed as part of the core curriculum for the Center for Data Science's Master's degree in Data Science. Other interested students who satisfy the prerequisites are welcome to take the class as well. This class is intended as a continuation of DS-GA-1001 Intro to Data Science, which covers some important, fundamental data science topics that may not be explicitly covered in this DS-GA class (e.g. data cleaning, cross-validation, and sampling bias).
We will use Piazza for class discussion. Rather than emailing questions to the teaching staff, please post your questions on Piazza, where they will be answered by the instructor, TAs, graders, and other students. For questions that are not specific to the class, you are also encouraged to post to Stack Overflow for programming questions and Cross Validated for statistics and machine learning questions. Please also post a link to these postings in Piazza, so others in the class can answer the questions and benefit from the answers.
Other information:
Homework (40%) + Midterm Exam (20%) + Final Exam (20%) + Project (20%)
Many homework assignments will have problems designated as “optional”. At the end of the semester, strong performance on these problems may lift the final course grade by up to half a letter grade (e.g. B+ to A- or A- to A), especially for borderline grades. You should view the optional problems primarily as a way to engage with more material, if you have the time. Along with the performance on optional problems, we will also consider significant contributions to Piazza and in-class discussions for boosting a borderline grade.
Session | Slides | Notes | References |
---|---|---|---|
ML Prereqs Jan 1 | Slides | Notes | References |
Lecture Jan 23 | Slides | Notes | References |
Lab Jan 24 | Slides | Notes | References |
Lecture Jan 30 | Slides | Notes | References |
Lab Jan 31 | None | None | References |
Lecture Feb 6 | Slides | Notes | References |
Lab Feb 7 | Slides | None | None |
Lecture Feb 13 | Slides | Notes | References |
Lab Feb 14 | Slides | Notes | References |
Lecture Feb 20 | Slides | Notes | References |
Lab Feb 21 | Slides | None | None |
Lecture Feb 27 | Slides | Notes | None |
Lab Feb 28 | Slides | None | None |
Midterm Exam Mar 6 | None | None | None |
Project Adviser Meetings Mar 7 | None | None | None |
Lecture Mar 20 | Slides | Notes | References |
Canceled for snow Mar 21 | None | None | None |
Lecture Mar 27 | Slides | Notes | References |
Lab Mar 28 | Slides | Notes | References |
Lecture Apr 3 | Slides | None | References |
Lab Apr 4 | Slides | None | References |
Lecture Apr 10 | Slides | Notes | References |
Lab Apr 11 | Slides | Notes | References |
Lecture Apr 17 | Slides | None | References |
Project Adviser Meetings Apr 18 | None | None | None |
Lecture Apr 24 (Video) | Slides | None | References |
Course Review Apr 25 | None | None | None |
Lecture May 1 | Slides | None | References |
Project Adviser Meetings May 2 | None | None | None |

Late Policy: Homework is due at 10pm on the date specified. Late homework will still be accepted for up to 48 hours after the deadline, but will incur a 20% penalty.
Collaboration Policy: You may discuss problems with your classmates. However, you must write up the homework solutions and the code from scratch, without referring to notes from your joint session. In your solution to each problem, you must write down the name of every person with whom you discussed the problem; this will not affect your grade.
Homework Submission: Homework should be submitted through Gradescope. If you have not used Gradescope before, please watch this short video: "For students: submitting homework." At the beginning of the semester, you will be added to the Gradescope class roster. This will give you access to the course page, and the assignment submission form. To submit assignments, you will need to:
Homework Feedback: Check Gradescope to get your scores on each individual problem, as well as comments on your answers. Since Gradescope cannot distinguish between required and optional problems, final homework scores, separated into required and optional parts, will be posted on NYUClasses.
The project is your opportunity for in-depth engagement with a data science problem. In job interviews, it's often your course projects that you end up discussing, so it has some importance even beyond this class. That said, it's better to pick a project that you will be able to go deep with (in terms of trying different methods, feature engineering, error analysis, etc.) than to choose a very ambitious project that requires so much setup that you will only have time to try one or two approaches.
A good project for this class is one that's a real "problem", in the sense that you have something you want to accomplish, and the best approach is not necessarily clear from the beginning. The techniques used should be relevant to our class, so most likely you will be building a prediction system. A probabilistic model would also be acceptable, though we will not be covering these topics until later in the semester.
To be clear, the following approaches would be less than ideal:
The project proposal should be roughly 2 pages, though it can be longer if you want to include figures or sample data that will be helpful to your presentation. Your proposal should do the following:
The main objective of the project writeup is to explain what you did in a self-contained report. There are no strict guidelines on the format of the report, but the goal is to make it something you'd be proud to share with a potential employer. Some of the content will resemble your project proposal. Make sure to:
David is a data scientist in the office of the CTO at Bloomberg L.P. Formerly he was Chief Scientist of YP Mobile Labs at YP.
Ben is a 2017 NYU Data Science MS graduate. He currently works as a data scientist for the University of Chicago's Crime Lab New York (CLNY), where his portfolio includes several prediction problems that arise in criminal justice and social policy.
Lisa Ren (Head Grader)
Lisa is a second-year student in the Data Science program at NYU.
Utku is a second-year Courant Computer Science M.Sc. student from Turkey interested in neural networks and their energy landscape.
Mi is a second-year student in the CS department at Courant.
Sanyam is a Masters student in Computer Science at NYU Courant and currently works as a Researcher in Machine Learning at the NYU Center for Data Science.
Nan is a second-year student in the Data Science program at NYU.
Zemin is a second-year student in the Data Science program at NYU.
Kurt is a researcher at the quantitative hedge fund PDT Partners.
Brian is Director of Data Science at Zocdoc, and he was formerly the VP of Data Science at Dstillery. He is also an Adjunct Professor of Data Science at NYU Stern School of Business.
Bonnie is VP Data Science at Pegged Software. Prior to Pegged, she was Director, Cognitive Algorithms, at IBM Research and has also served on the faculty at the New Jersey Institute of Technology.
Daniel is at the Institute for Advanced Studies in Toulouse and Toulouse School of Economics. He is a former Chair of Law and Economics at ETH Zurich (2012-2015), Duke Assistant Professor of Law, Economics, and Public Policy (2010-2012), and Kauffman Fellow at the University of Chicago Law School (2009-2010).
Vitaly is a Research Scientist at Google Research, New York.
Elliott is a Visiting Scholar at Princeton University's Woodrow Wilson School of Public Affairs and Assistant Professor of Economics at University of Warwick. His research combines methods from applied microeconometrics, natural language processing, and machine learning to provide empirical evidence on the socioeconomic impacts of legal and political institutions.
David Frohardt-Lane is a portfolio manager at 3Red Trading, overseeing a quantitative trading team. Previously he worked as a trader at GETCO for 8 years. He is a former professional gambler who has been involved in sports analytics for over 15 years.