Our lab

The Cornell Computational Linguistics Lab is a research and educational lab in the Department of Linguistics and Computing and Information Science. It is a venue for lab sessions for classes, computational dissertation research by graduate students, undergraduate research projects, and grant research.

The lab collaborates with a large group at Cornell, including faculty and students in Cognitive Science, Computer Science, Psychology, and Information Science. The Department of Computing and Information Science provides system administration support for the lab, and some computational work is done on hardware at the Department of Computer Science and the Center for Advanced Computing.

Faculty

Graduate Students

Undergraduate Research Assistants

Lab Alumni

Projects

Students and faculty are currently working on diverse projects in computational phonetics, phonology, syntax, and semantics.

Finite-state phonology

Mats Rooth • Simone Harmath-de Lemos • Shohini Bhattasali • Anna Choi

In this project, we train a finite-state model to detect prosodic cues in a speech corpus. We are specifically interested in detecting stress cues in Brazilian Portuguese and Bengali and in finding empirical evidence for current theoretical views.

Xenophobia and dog whistle detection in social media

Kaelyn Lamp • Marten van Schijndel

In this collaboration with the Cornell Xenophobia Meter Project, we study the linguistic properties of social media dog whistles to better identify extremist trends before they gain traction.

Models of code-switching

Marten van Schijndel • Debasmita Bhattacharya • Vinh Nguyen • Andrew Xu

In this work, we study bilingual code-switching (that is, where two languages are used interchangeably in a single utterance). We are particularly interested in how information flows across code-switch boundaries: how information from a span in Language 1 can influence the production and comprehension of spans in Language 2. We are also studying what properties influence code-switching and whether code-switches occur mainly to ease production for the speaker or mainly to ease comprehension for the listener.

Representation sharing in neural networks

Marten van Schijndel • Forrest Davis • Debasmita Bhattacharya • William Timkey

Much work has gone into studying which linguistic surface patterns are captured by neural networks. In this work, we study how various surface patterns are grouped into larger linguistic abstractions within the networks and how those abstractions interact. Is each instance of a linguistic phenomenon, like filler-gap, related to other instances of that phenomenon (i.e., models encode a filler-gap abstraction), or is each contextual occurrence encoded as a separate phenomenon?

Summarization as Linguistic Compression

Marten van Schijndel • Fangcong Yin • William Timkey

In this work, we conceptualize summarization as a linguistic compression task. We study how different levels of linguistic information are compressed during summarization and whether automatic summarization models learn similar compression functions. We also study how each aspect of linguistic compression is correlated with various measures of summary quality according to human raters.

Recent publications (2022, 2021, and 2020)

Here are some selected publications from recent work by faculty and graduate students:

Recent Courses

If you are interested in computational linguistics, these classes are a great way to get started in this area:

LING 4424: Computational Linguistics

Introduction to computational linguistics. Possible topics include syntactic parsing using functional programming, logic-based computational semantics, and finite state modeling of phonology and phonetics.

LING 4434: Computational Linguistics 2

This course introduces techniques to probe for linguistic representations in neural network models of language. It is centered around discussion of current research papers as well as student research projects.

LING 4429/6429: Grammar Formalisms

This course introduces different ways of "formalizing" linguistic analyses, with examples from natural language syntax. Students learn to identify recurrent themes in generative grammar, seeing how alternative conceptualizations lead to different analytical trade-offs. Using distinctions such as rule vs. constraint, transformational vs. monostratal, and violable vs. inviolable, students emerge better able to assess others' work in a variety of formalisms, and better able to deploy formalism in their own analyses.

LING 4485/6485: Topics in Computational Linguistics

Current topics in computational linguistics. Recent topics include computational models for Optimality Theory and finite state models.

LING 2264: Language, Mind, and Brain

An introduction to neurolinguistics, this course surveys topics such as aphasia, hemispheric lateralization and speech comprehension as they are studied via neuroimaging, intracranial recording and other methods. A key focus is the relationship between these data, linguistic theories, and more general conceptions of the mind.

Resources

Access to Cornell's G2 Computing Cluster
More than 850 language corpora in 60+ languages (e.g., news text, dialogue corpora, and television transcripts)

Downloads

English 97, BankBaseline, and PF Linear Expansion: Please contact Mats Rooth.


The Cornell Conditional Probability Calculator (CCPC): Please contact ccpc@cornell.edu.


DeepParse 2.2, DepPrint 1.1, NegraToConfig: Please contact Marisa Boston.


Computational lexicon of Modern Greek annotated with POS and lemma, Newspaper corpus of Modern Greek: Please contact Effie Georgala.

Useful links

Linguistics department
Natural Language Processing group
Cognitive Science program
Association for Computational Linguistics
Cornell Linguistics Circle