Data Science for Linguists 2024

Course home for LING 1340/2340


Class Schedule

*Class schedule is subject to revision throughout the semester.

W Date Due (before class @ 10:45am) Topics
Tools
#To-do/Homework
Project
1 1/8 [slides] Course introduction, setup
1/10 #1 [slides] Data management and version control
1/12 #2 [slides] Linguistic datasets
2 1/17 (W) Homework 1: Explore linguistic data [slides] Processing linguistic data
1/19 (F) #3 Data processing fundamentals [slides, JNB] Python's numpy library
3 1/22 #4 [slides, JNB] Data frames with pandas
1/24 #5 [slides, JNB] More pandas
1/26 [JNB] Pandas wrap
4 1/29 #6 Statistics [JNB] Statistics crash course
1/31 - [JNB] Stats (ctd), visualization
2/2 Homework 2: Process the ETS corpus (part 1) [JNB] Stats wrap, visualization
5 2/5 Homework 2 (part 2) [JNB] HW2 review
2/7 #7 HW2 review continued
2/9 #8 (due @9:30am!!) Open access & data publishing, Data mining Guest speaker: Dominic Bordelon (Pitt Library)
6 2/12 [slides] Corpus data formats, conversion
2/14 - Corpora, Annotation [slides, JNB] Formats, social media and web mining
2/16 #9 [slides] Web mining, linguistic annotation
7 2/19 #10 [slides] Annotation continued
2/21 - [slides] Annotation wrap
2/23 Machine learning [JNB1] Regression
8 2/26 #11 [JNB1, JNB2] Classifiers: count vectors, TF-IDF
2/28 - [JNB2, JNB3] Naive Bayes, pipelines, categorical data
3/1 - [JNB3] SVC, categorical data, cross-validation
9 3/4 Homework 3: Machine Learning with ETS data ML (ctd) [slides] GitHub collaboration, cross-validation, feature weights
3/6 - [JNB2, JNB1] HW 3 review
3/8 #12 [JNB1, JNB3] HW 3 review
No class: Spring break
10 3/18 - ML (ctd) [JNB3] HW3 wrap: dimensionality reduction, ensemble model
3/20 - Big data at CRC, Machine learning (ctd), Advanced NLP [slides] Shell, command line
3/22 - [slides] Command line tools
11 3/25 #13 [slides] Supercomputing, running jobs on CRC
3/27 #14 [slides] Big data wrangling, OnDemand on CRC
3/29 - [slides, JNB1, JNB2] Computational efficiency, big data wrangling on CRC, advanced NLP
12 4/1 Homework 4 [JNB3, JNB4] Clustering & topic modeling; grid search & parallel processing
4/3 #15 [slides] Text generation with TensorFlow by Ashley Feiler
4/5 - Speech & multimedia [slides] Speech data and corpora
13 4/8 [slides] Speech data tools, forced aligner
4/10 -- [slides, JNB] Montreal Forced Aligner, ASR
4/12 -- [slides] ELAN for APLS by Maya Asher
14 4/15 RH
4/17 MA, DA
4/19 TD, MP
15 4/28 (11pm) Finals week