Data Science for Linguists 2023

Course home for LING 1340/2340


Class Schedule

*Class schedule is subject to revision throughout the semester.

W Date Due (before class @ 9:45am; #To-do / Homework / Project) Topics / Tools
1 1/9 [slides] Course introduction, setup
1/11 #1 [slides] Data management and version control
1/13 #2 [slides] Linguistic datasets
2 1/18 (W) Homework 1: Explore linguistic data [slides] Processing linguistic data
1/20 (F) #3 Data processing fundamentals [slides, JNB] Python's numpy library
3 1/23 #4 [slides, JNB] Data frames with pandas
1/25 #5 [slides, JNB] More pandas
1/27 [JNB] Pandas wrap
4 1/30 #6 Statistics [JNB] Statistics crash course
2/1 - [JNB] Stats (ctd), visualization
2/3 Homework 2: Process the ETS corpus (1st half) [JNB] Stats wrap, visualization
5 2/6 Homework 2 (2nd half) [JNB] HW2 review
2/8 #7 Open access & data publishing, Data mining [JNB] Twitter mining
2/10 #8 (due @9am!!) Guest speaker: Dr. Lauren Collister
6 2/13 Corpora, Annotation [slides, JNB] Corpora: data formats, HW2 review
2/15 - [slides] Text data files, conversion
2/17 #9 [slides] Web mining, linguistic annotation
7 2/20 #10 [slides] Annotation continued
2/22 - [slides] Annotation wrap, HW2 revisited
2/24 Machine learning [JNB1] Regression
8 2/27 #11 [JNB2] Classifiers: count vectors, TF-IDF
3/1 - [JNB2] Continued; Naive Bayes
3/3 - [JNB3] SVC, categorical data, cross-validation
No class: Spring break
9 3/13 Homework 3: Machine Learning with ETS data (1st half) ML (ctd) [slides] GitHub collaboration, cross-validation, ML comparisons
3/15 Homework 3 (2nd half) [JNB2, JNB1] HW 3 review
3/17 #12 [JNB1, JNB3] HW 3 review
10 3/20 - [JNB3] HW3 wrap: dimensionality reduction, ensemble model
3/22 - Big data at CRC, Machine learning (ctd), Advanced NLP [slides] Shell, command line
3/24 [slides] Supercomputing, command line tools
11 3/27 #13 [slides] Running jobs on CRC
3/29 #14 [slides] Big data wrangling, OnDemand on CRC
3/31 - [slides, JNB1] Computational efficiency, big data wrangling on CRC
12 4/3 Homework 4 [JNB2, JNB3] Advanced NLP, Clustering & topic modeling
4/5 #15 [JNB3, JNB4] Topic modeling, grid search and parallel processing
4/7 - Speech & multimedia [slides] Speech data and corpora
13 4/10 [slides] Speech data tools, forced aligner
4/12 #16 [slides, JNB] Montreal Forced Aligner, ASR
4/14 #17 [slides] ELAN demo by Emma; Day 0.5: Sen
14 4/17 Day 1: Alex, Moldir, Seth
4/19 Day 2: Soobin, Camryn, Mack
4/21 Day 3: Wilson, Ashley, Varun
15 4/30 (6pm) Finals week