Data Science for Linguists 2021

Course home for LING 1340/2340


Learning Resources by Topic

This is not just a link dump. These resources are carefully curated textbook stand-ins, and you are fully expected to learn from them! There are multiple types:

  1. Online tutorials. Watch, practice, and learn. I pre-screened these and narrowed them down to only the most essential and relevant content, so you can stop wondering whether you should learn the whole thing!
  2. Articles. Read them -- they will be referenced in lectures and used in classroom discussions.
  3. Books and book chapters. Python Data Science Handbook aligns neatly with our data science focus and doubles as a reference book. Parts of the NLTK Book will also be referenced.
  4. Software installation links. Download and install on your machine.
  5. Bookmark pages. These are lists of useful links compiled by someone else, which often contain pointers to data sets or resources. Explore them and use them as needed; you should become familiar with what's on them.
  6. References -- for looking things up.

Linguistic Data, Open Access, Data Publishing

Corpus Linguistics

Linguistic Annotation, Ontology, and Knowledge Engineering

Speech and Multimedia Data

Statistics References

Data Processing Fundamentals: Python’s numpy, pandas, and visualization libraries

Web and Social Media Mining

Machine Learning

Big Data Essentials

Advanced NLP

(We only touched on this topic in class; these resources are for your own future learning!)



The resources below focus more on the software tools side.

Git and GitHub


Anaconda and Jupyter Notebook

Command line, Bash/Zsh and Unix Tools

Text Editor

Speech and Multimedia Software

The topics below are not among the focus areas of this course, but parts of them will be relevant. They are provided for reference.

Natural Language Processing, NLTK, Computational Linguistics

Python References