Our lab

The Cornell Computational Linguistics Lab is a research and educational lab in the Department of Linguistics and the Department of Computing and Information Science. It is a venue for lab sessions for classes, computational dissertation research by graduate students, undergraduate research projects, and grant research.

The lab collaborates with a large group at Cornell, including faculty and students in Cognitive Science, Computer Science, Psychology, and Information Science. The Department of Computing and Information Science provides system administration support for the lab, and some computational work is done on hardware at the Department of Computer Science and the Center for Advanced Computing.



Past Members

Marisa Boston • Zhong Chen • Anca Chereces • Tejaswini Deoskar • Effi Georgala • Kyle Grove • Tim Hunter • David Lutz • Jiwon Yun • Yuping Zhou


Students and faculty are currently working on diverse projects in computational phonetics, phonology, syntax, and semantics.

Multi-Word Expression Processing

John Hale • Shohini Bhattasali • Jixing Li

This project investigates how the brain processes multi-word expressions (MWEs) in English and French, using data from an fMRI experiment. With neuroimaging data recorded during speech comprehension, we examine how the results bear on current linguistic theories that contrast compositional meaning with frozen meaning.

Generalizable Learning

Jacob Collard • John Hale • Mats Rooth

This project explores the learnability of syntactic formalisms such as Categorial Grammars and Dependency Grammars. It aims to provide a more naturalistic algorithm for learning syntax, one that relies on inferences from basic knowledge rather than on extensively annotated structures. We are also investigating how learned grammars compare to engineered grammars and to claims made in theoretical syntax.

Finite-state phonology

Mats Rooth • Simone Harmath-de Lemos • Shohini Bhattasali

In this project, we train a finite-state model to detect prosodic cues in a speech corpus. We are specifically interested in detecting stress cues in Brazilian Portuguese and Bengali and in finding empirical evidence for current theoretical views.

Recent publications

Here are some selected publications from recent work by faculty and graduate students:

  • John Hale, Shohini Bhattasali, Jonathan Brennan, Jixing Li, Wen-Ming Luh, Christophe Pallier, R. Nathan Spreng. (2017). Localizing Structure-building and Memory Retrieval in Naturalistic Language Comprehension. 9th Annual Meeting of the Society for the Neurobiology of Language (SNL 2017).
  • Matthew Nelson, Imen El Karoui, Kristof Giber, Xiaofang Yang, Laurent Cohen, Hilda Koopman, Sydney S. Cash, Lionel Naccache, John Hale, Christophe Pallier, Stanislas Dehaene. (2017). Neurophysiological dynamics of phrase-structure building during sentence processing. Proceedings of the National Academy of Sciences.
  • Matthew Nelson, Stanislas Dehaene, Christophe Pallier, and John Hale. (2017). Entropy Reduction correlates with Temporal Lobe Activity. Proceedings of the 7th Workshop on Cognitive Modelling and Computational Linguistics (CMCL 2017).
  • Jonathan Howell, Mats Rooth, and Michael Wagner. (2016). Acoustic classification of focus: on the web and in the lab. Doi 1813/42538
  • Jonathan R. Brennan, Edward P. Stabler, Sarah E. Van Wagenen, Wen-Ming Luh, and John T. Hale. (2016). Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language 157, 81-94.
  • John Hale. (2016). Information-theoretical complexity metrics. Language and Linguistics Compass.
  • Jixing Li, Jonathan Brennan, Adam Mahar, and John Hale. (2016). Temporal lobes as combinatory engines for both form and meaning. Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC 2016).
  • John T. Hale, David E. Lutz, Wen-Ming Luh, and Jonathan R. Brennan. (2015). Modeling fMRI time courses with linguistic structure at various grain sizes. Proceedings of CMCL 2015.
  • Shohini Bhattasali, Jeremy Cytryn, Elana Feldman, and Joonsuk Park. (2015). Automatic identification of rhetorical questions. Proceedings of the ACL 2015.

Recent courses

If you are interested in computational linguistics, these classes are a great way to get started in this area:

LING 4424: Computational Linguistics

Introduction to computational linguistics. Possible topics include syntactic parsing using functional programming, logic-based computational semantics, and finite-state modeling of phonology and phonetics.

LING 4429/6429: Grammar Formalisms

This course introduces different ways of "formalizing" linguistic analyses, with examples from natural language syntax. Students learn to identify recurrent themes in generative grammar, seeing how alternative conceptualizations lead to different analytical trade-offs. Using distinctions such as rule vs. constraint, transformational vs. monostratal, and violable vs. inviolable, students emerge better able to assess others' work in a variety of formalisms, and better able to deploy formalism in their own analyses.

LING 4485/6485: Topics in Computational Linguistics

Current topics in computational linguistics. Recent topics include computational models for Optimality Theory and finite-state models.

LING 2264: Language, Mind, and Brain

An introduction to neurolinguistics, this course surveys topics such as aphasia, hemispheric lateralization and speech comprehension as they are studied via neuroimaging, intracranial recording and other methods. A key focus is the relationship between these data, linguistic theories, and more general conceptions of the mind.


English 97, BankBaseline, and PF Linear Expansion: Please contact Mats Rooth.

The Cornell Conditional Probability Calculator (CCPC): Please contact ccpc@cornell.edu.

DeepParse 2.2, DepPrint 1.1, NegraToConfig: Please contact Marisa Boston.

Computational lexicon of Modern Greek annotated with POS and lemma, Newspaper corpus of Modern Greek: Please contact Effi Georgala.

Useful links

Linguistics department
Natural Language Processing group
Cognitive Science program
Association for Computational Linguistics
Cornell Linguistics Circle