Data Science for Linguists 2025

Course home for LING 1340/2340

Class Schedule

*Class schedule is subject to revision throughout the semester.

W Date Due (before class @ 12:45pm) Topics
Tools
#To-do/Homework
Project
1 1/8 (W) [slides] Course introduction, setup
1/10 #1 [slides] Data management and version control
2 1/13 #2 [slides] Linguistic datasets
1/15 Homework 1: Explore linguistic data [slides] Processing linguistic data
1/17 #3 Data processing fundamentals [slides, JNB] Python's numpy library
3 1/22 (W) #4 [slides, JNB] Data frames with pandas
1/24 (F) #5 [slides, JNB] More pandas
4 1/27 [JNB] Pandas wrap
1/29 #6 Statistics [JNB] Statistics crash course
1/31 Homework 2: Process the ETS corpus (partial) [JNB] More stats, visualization
5 2/3 Homework 2: Process the ETS corpus (2nd) [JNB] Stats wrap
2/5 Homework 2: final submission [JNB] HW2 review
2/7 #7 HW2 review continued
6 2/10 #8 Open access & data publishing Linguistic data sharing: discussion
2/12 Corpora, Annotation, Web & social media mining [slides] Linguistic annotation projects
2/14 #9 [slides] ELAN for APLS (guest presentation by Maya Asher)
7 2/17 #10 [slides] Linguistic annotation
2/19 #11 [slides] Data formats
2/21 - [slides] Social media and web mining, data formats and conversion
8 2/24 #12 Machine learning [JNB1] Regression
2/26 [JNB2] Classifiers: KNN, Naive Bayes, count vectors, TF-IDF
2/28 - [JNB2] Pipelines, confusion matrix, feature weights
No class: Spring break
9 3/10 - ML (ctd) [JNB3] Categorical data, SVM, cross-validation
3/12 Homework 3: Machine Learning with ETS data (partial) [JNB2] HW3 review: Task2
3/14 Homework 3: final submission [JNB1, slides] HW3 review: Task1, GitHub collaboration
10 3/17 #13 [JNB1, JNB3] HW3 review: Task1, Task3
3/19 - [JNB3] HW3 review: Task3, dimensionality reduction, ensemble models
3/21 Big data at CRC, Machine learning (ctd), and Advanced NLP [slides] Shell, command-line tools
11 3/24 #14 [slides] Supercomputing, running jobs on CRC
3/26 #15 [slides] Big data wrangling, OnDemand on CRC
3/28 - [slides, JNB1, JNB2] Computational efficiency, clustering
12 3/31 Homework 4: Supercomputing Yelp Data [JNB2, JNB3] Topic modeling; grid search & parallel processing
4/2 #16 [JNB4] Advanced NLP: spaCy, Stanza
4/4 - Speech & multimedia [slides, JNB] Speech sounds: IPA, phonological features
13 4/7 [slides] Speech data and corpora, tools
4/9 #17 [slides] Montreal Forced Aligner
4/11 - [slides, JNB] ASR
14 4/14 [slides, JNB] Riley presentation on SQL, Day 1: QF
4/16 Day 2: AB, CM
4/18 Day 3: SR, JH
15 4/21 Day 4: JB, LC
4/29 (Tue) Finals week