Data Science for Linguists 2019

Course home for LING 1340/2340


Class Schedule

*Class schedule is subject to revision throughout the semester.

W Date Due (before class @ 3:30pm) Topics / Tools / #To-do/Homework / Project
1 1/8 [slides] Course introduction, setup
1/10 #1 [slides] Data in linguistics
2 1/15 Homework 1: Explore linguistic data [slides] Processing linguistic data
1/17 #2 Data processing fundamentals, statistics [slides, JNB] Python's numpy library
3 1/22 #3 [JNB] Data frames with pandas
1/24 #4 [slides, JNB] More pandas, text processing, stats
4 1/29 [JNB] Stats crash course, visualization
1/31 Homework 2: Process the ETS corpus Putting it all together: HW2 review
5 2/5 #5 [JNB, JNB] To-do 5, HW2 review
2/7 Corpus linguistics, annotation [JNB, slides] HW2 wrap up, corpus concepts, building & processing
6 2/12 #6 [slides] Annotation, data standards & exchange formats
2/14 #7 Open access & data publishing [slides] Guest speaker Lauren Collister
7 2/19 #8 Data mining and machine learning [JNB, slides] Annotation, data-mining web & social media
2/21 [JNB] Regression, NB classifier, count vectors, TF-IDF
8 2/26 #9 [JNB] Classifiers continued, categorical data
2/28 #10 [JNB] Dimensionality reduction, cross-validation
9 3/5 Homework 3: Data mining & machine learning [JNB, JNB, JNB] Homework 3 review
3/7 #11 HW3 review, Bash and command line
No class: Spring break
10 3/19 Big data [slides] Command line, Bash, grep
3/21 #12 [slides] Supercomputing at CRC, SSH, command line
11 3/26 #13 [slides, Word Embeddings and Clustering] Computational efficiency, machine learning big data, word embeddings
3/28 Homework 4: Supercomputing Yelp Data Homework 4 review
12 4/2 #14 Speech & multimedia [slides] Speech data, ASR theory
4/4 #15 Speech data
13 4/9 [slides] More speech data, multimodal data
4/11 #16 Project presentations DB, TS, EC
14 4/16 EB, JS, PS
4/18 KT, MB, CM
15 4/26 No class: finals week