Projects
Students and faculty are currently working on diverse projects in computational phonetics, phonology, syntax, and semantics.
Finite-state phonology
Mats Rooth • Simone Harmath-de Lemos • Shohini Bhattasali • Anna Choi
In this project we train a finite-state model to detect prosodic cues in a speech corpus. We are specifically interested in detecting stress cues in Brazilian Portuguese and Bengali, and in finding empirical evidence that bears on current theoretical accounts of stress.
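A minimal sketch of the finite-state idea, under simplifying assumptions: syllables have already been reduced to discrete cue symbols ('s' for a syllable with stress-like acoustic cues such as longer duration or higher pitch, 'u' otherwise), and the acceptor below hard-codes the penultimate-stress pattern that is the default in Brazilian Portuguese. The project's model is trained from corpus data rather than hand-written like this.

```python
# Hand-written DFA accepting cue strings of the form u* s u, i.e., words
# whose stress-like cues fall on the penultimate syllable. The cue alphabet
# and states are illustrative assumptions, not the project's trained model.
TRANSITIONS = {
    ("q0", "u"): "q0",  # any run of unstressed syllables
    ("q0", "s"): "q1",  # the single stressed syllable
    ("q1", "u"): "q2",  # exactly one unstressed syllable after it
}
ACCEPT = {"q2"}

def penultimate_stress(cues: str) -> bool:
    """True iff the syllable-cue string matches the penultimate-stress pattern."""
    state = "q0"
    for symbol in cues:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPT

print(penultimate_stress("uusu"))  # True,  e.g. ca-va-LEI-ro
print(penultimate_stress("uuus"))  # False: final stress
```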
Xenophobia and dog whistle detection in social media
Kaelyn Lamp • Marten van Schijndel
In this collaboration with the Cornell Xenophobia Meter Project, we study the linguistic properties of social media dog whistles to better identify extremist trends before they gain traction.
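As a rough illustration of one supervised starting point (not the Xenophobia Meter pipeline itself), the sketch below trains a bag-of-ngrams classifier on labeled posts. Real dog-whistle detection is substantially harder, since dog whistles are innocuous on the surface by design; the toy posts and labels here are invented for illustration.

```python
# A generic supervised text-classification baseline; the data and labels are
# hypothetical placeholders, not examples from the project's corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "globalists control the media",           # 1: contains a known dog whistle
    "great weather for a picnic today",       # 0: benign
    "we must defend our heritage",            # 1
    "my cat knocked a plant off the shelf",   # 0
]
labels = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # unigram and bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(posts, labels)
print(model.predict(["the globalists are at it again"]))
```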
Models of code-switching
Marten van Schijndel • Debasmita Bhattacharya • Vinh Nguyen • Andrew Xu
In this work, we study bilingual code-switching (where two languages are used interchangeably within a single utterance). We are particularly interested in how information flows across code-switch boundaries: how information from a span in Language 1 influences the production and comprehension of spans in Language 2. We are also studying which properties influence code-switching, and whether switches occur mainly to ease production for the speaker or mainly to ease comprehension for the listener.
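One concrete way to quantify information flow of this kind is token-level surprisal from a language model, sketched below. The model choice is an assumption for illustration: "gpt2" is monolingual, and a genuinely multilingual model would be needed for real code-switched data; the English-Spanish example sentence is likewise invented.

```python
# Token-level surprisal (in bits) around a hypothetical code-switch point,
# using the HuggingFace transformers library. "gpt2" is a stand-in model.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence):
    """Return (token, surprisal) pairs for every non-initial token."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # logits[0, i] predicts token i+1: take log-probs of the actual next tokens
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    next_ids = ids[0, 1:]
    nats = -log_probs[torch.arange(next_ids.size(0)), next_ids]
    bits = nats / math.log(2)
    return zip(tokenizer.convert_ids_to_tokens(next_ids), bits.tolist())

# Invented English-Spanish code-switched sentence; surprisal should spike
# at the switch if Language 1 context poorly predicts Language 2 material.
for token, s in token_surprisals("I bought the flowers para mi madre"):
    print(f"{token:>12} {s:6.2f}")
```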
Representation sharing in neural networks
Marten van Schijndel • Forrest Davis • Debasmita Bhattacharya • William Timkey
Much work has studied which linguistic surface patterns neural networks capture. In this work, we are interested in how those surface patterns are grouped into larger linguistic abstractions within the networks, and in how those abstractions interact. Is each instance of a linguistic phenomenon, such as filler-gap, related to other instances of that phenomenon (i.e., do models encode a filler-gap abstraction), or is each contextual occurrence encoded as a separate phenomenon?
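The sketch below illustrates one simple way to pose that question (an illustrative method, not necessarily the lab's): if a model encodes a shared filler-gap abstraction, representations of different filler-gap sentences should resemble one another more than matched non-filler-gap controls. The model choice and the example sentences are assumptions.

```python
# Compare sentence representations for filler-gap items vs. controls,
# using mean-pooled hidden states from a stand-in model ("bert-base-uncased").
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def sentence_vec(sentence):
    """Mean-pooled final-layer hidden state for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # [1, tokens, dim]
    return hidden.mean(dim=1).squeeze(0)

fillers = ["What did the chef prepare for the banquet?",
           "Which book did the critic praise in the review?"]
controls = ["The chef prepared a stew for the banquet.",
            "The critic praised the novel in the review."]

cos = torch.nn.functional.cosine_similarity
f0, f1 = map(sentence_vec, fillers)
c0, c1 = map(sentence_vec, controls)
# If instances share an abstraction, expect the first similarity to be higher.
print("filler-gap vs filler-gap:", cos(f0, f1, dim=0).item())
print("filler-gap vs control:   ", cos(f0, c0, dim=0).item())
```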
Summarization as linguistic compression
Marten van Schijndel • Fangcong Yin • William Timkey
In this work, we conceptualize summarization as a linguistic compression task. We study how different levels of linguistic information are compressed during summarization and whether automatic summarization models learn similar compression functions. We also study how each aspect of linguistic compression is correlated with various measures of summary quality according to human raters.
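As a toy illustration of this framing, the sketch below computes a few surface-level compression measures for a source/summary pair. These simple ratios are stand-ins for the richer, multi-level linguistic measures the project actually studies, and the example texts are invented.

```python
# Surface-level compression measures for a (source, summary) pair.
import re

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def compression_stats(source, summary):
    src, summ = tokens(source), tokens(summary)
    return {
        "length_ratio": len(summ) / len(src),       # how much material was cut
        "ttr_source": len(set(src)) / len(src),     # lexical diversity before...
        "ttr_summary": len(set(summ)) / len(summ),  # ...and after compression
        # share of summary vocabulary absent from the source: a crude
        # proxy for abstraction/paraphrase rather than pure deletion
        "novel_types": len(set(summ) - set(src)) / len(set(summ)),
    }

source = ("The committee met on Tuesday and, after several hours of debate, "
          "voted to postpone the final decision until next quarter.")
summary = "The committee postponed its decision until next quarter."
print(compression_stats(source, summary))
```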
Publications
-
Sidharth Ranjan, Marten van Schijndel, Sumeet Agarwal, and Rajakrishnan Rajkumar. (2022) Dual Mechanism Priming Effects in Hindi Word Order.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022).
-
Sidharth Ranjan, Marten van Schijndel, Sumeet Agarwal, and Rajakrishnan Rajkumar. (2022) Discourse Context Predictability Effects in Hindi Word Order.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
-
Forrest Davis and Gerry T.M. Altmann. (2021) Finding Event Structure in Time: What Recurrent Neural Networks can tell us about Event Structure in Mind.
Cognition, 213: 104651, 2021.
-
Marten van Schijndel and Tal Linzen. (2021) Single-stage prediction models do not explain the magnitude of syntactic disambiguation difficulty.
Cognitive Science, 45(6): e12988, 2021.
-
Simone Harmath-de Lemos. (2021) Detecting word-level stress in continuous speech: A case study of Brazilian Portuguese.
Journal of Portuguese Linguistics, 20(1), 2021.
-
Eric Campbell and Mats Rooth. (2021) Epistemic semantics in guarded string models.
Proceedings of the Society for Computation in Linguistics (SCiL). 2021.
-
William Timkey and Marten van Schijndel. (2021) All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2021.
-
Forrest Davis and Marten van Schijndel. (2021) Uncovering Constraint-Based Behavior in Neural Models via Targeted Fine-Tuning.
Proceedings of the 2021 Annual Conference of the Association for Computational Linguistics (ACL). 2021.
-
Matt Wilber, William Timkey, and Marten van Schijndel. (2021) Understanding How Abstractive Summarizers Paraphrase Text.
Findings of the Association for Computational Linguistics (ACL). 2021.
-
Samuel Ryb and Marten van Schijndel. (2021) Analytical, Symbolic and First-Order Reasoning within Neural Architectures.
Proceedings of the 2021 Workshop on Computing Semantics with Types, Frames and Related Structures. 2021.
-
Cory Shain, Idan Blank, Marten van Schijndel, William Schuler, and Evelina Fedorenko. (2020) fMRI reveals language-specific predictive coding during naturalistic sentence comprehension.
Neuropsychologia, 138: 107307, 2020.
-
Forrest Davis and Marten van Schijndel. (2020) Recurrent neural network language models always learn English-like relative clause attachment.
Proceedings of the 2020 Annual Conference of the Association for Computational Linguistics (ACL). 2020.
-
Forrest Davis and Marten van Schijndel. (2020) Discourse structure interacts with reference but not syntax in neural language models.
24th Conference on Computational Natural Language Learning (CoNLL). 2020.
-
Debasmita Bhattacharya and Marten van Schijndel. (2020) Filler-gaps that neural networks fail to generalize.
24th Conference on Computational Natural Language Learning (CoNLL). 2020.
-
Forrest Davis and Marten van Schijndel. (2020) Interaction with context during recurrent neural network sentence processing.
Proceedings of the 42nd Annual Virtual Meeting of the Cognitive Science Society (CogSci). 2020.
Publications prior to 2020
-
Marten van Schijndel, Aaron Mueller, and Tal Linzen. (2019) Quantity doesn't buy quality syntax with neural language models.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
-
Grusha Prasad, Marten van Schijndel, and Tal Linzen. (2019) Using Priming to Uncover the Organization of Syntactic Representations in Neural Language Models.
Proceedings of the 2019 Conference on Computational Natural Language Learning (CoNLL). 2019.
-
Forrest Davis and Abby Cohn. (2019) Effects of lexical frequency and compositionality on phonological reduction in English compounds.
25th Architectures and Mechanisms for Language Processing Conference (AMLaP 2019).
-
Jacob Collard. (2018) Finite State Reasoning for Presupposition Satisfaction.
Proceedings of the First International Workshop on Language Cognition and Computational Models (COLING 2018).
-
Shohini Bhattasali, Murielle Fabre, John Hale. (2018) Processing MWEs: Neurocognitive Bases of Verbal MWEs and Lexical Cohesiveness within MWEs.
Proceedings of the 14th Workshop on Multiword Expressions (COLING 2018).
-
Simone Harmath-de Lemos. (2018) What Automatic Speech Recognition Can Tell Us About Stress and Stress Shift in Continuous Speech.
Proceedings of the 9th International Conference on Speech Prosody 2018.
-
Jixing Li, Murielle Fabre, Wen-Ming Luh, John Hale. (2018) Modeling Brain Activity Associated with Pronoun Resolution in English and Chinese.
Proceedings of the NAACL Workshop on Computational Models of Reference, Anaphora, and Coreference (CRAC 2018).
-
Jacob Collard. (2018) A Naturalistic Inference Learning Algorithm.
Linguistic Society of America (LSA 2018).
-
Shohini Bhattasali, John Hale, Christophe Pallier, Jonathan R. Brennan, Wen-Ming Luh, R. Nathan Spreng. (2018) Differentiating Phrase Structure Parsing and Memory Retrieval in the Brain.
Proceedings of the Society for Computation in Linguistics (SCiL 2018).
-
Mats Rooth. (2017) Finite-state intensional semantics.
12th International Conference on Computational Semantics (IWCS 2017).
-
Matthew Nelson, Imen El Karoui, Kristof Giber, Xiaofang Yang, Laurent Cohen, Hilda Koopman, Sydney S. Cash, Lionel Naccache, John Hale, Christophe Pallier, Stanislas Dehaene. (2017) Neurophysiological dynamics of phrase-structure building during sentence processing.
Proceedings of the National Academy of Sciences.
-
Matthew Nelson, Stanislas Dehaene, Christophe Pallier, and John Hale. (2017) Entropy Reduction correlates with Temporal Lobe Activity.
Proceedings of the 7th Workshop on Cognitive Modelling and Computational Linguistics (CMCL 2017).
-
Jacob Collard. (2016) Inferring Necessary Categories in CCG.
9th International Conference on the Logical Aspects of Computational Linguistics (LACL 2016).
-
Jonathan Howell, Mats Rooth, and Michael Wagner. (2016) Acoustic classification of focus: on the web and in the lab.
hdl:1813/42538.
-
Jonathan R. Brennan, Edward P. Stabler, Sarah E. Van Wagenen, Wen-Ming Luh, and John T. Hale. (2016) Abstract linguistic structure correlates with temporal activity during naturalistic comprehension.
Brain and Language, 157: 81-94.
-
John Hale. (2016) Information-theoretical complexity metrics.
Language and Linguistics Compass.
-
Jixing Li, Jonathan Brennan, Adam Mahar, and John Hale. (2016) Temporal lobes as combinatory engines for both form and meaning.
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC 2016).
-
John T. Hale, David E. Lutz, Wenming Luh, and Jonathan R. Brennan. (2015) Modeling fMRI time courses with linguistic structure at various grain sizes.
Proceedings of CMCL 2015.
-
Shohini Bhattasali, Jeremy Cytryn, Elana Feldman, and Joonsuk Park. (2015) Automatic identification of rhetorical questions.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015).
LING 4424: Computational Linguistics
Introduction to computational linguistics. Possible topics include syntactic parsing
using functional programming, logic-based computational semantics, and finite-state
modeling of phonology and phonetics.
LING 4434: Computational Linguistics 2
This course introduces techniques for probing linguistic representations in neural network models of language. It centers on discussion of current research papers and on student research projects.
LING 4429/6429: Grammar Formalisms
This course introduces different ways of "formalizing" linguistic analyses, with
examples from natural language syntax. Students learn to identify recurrent themes in
generative grammar, seeing how alternative conceptualizations lead to different analytical
trade-offs. Using distinctions such as rule vs. constraint, transformational vs. monostratal,
and violable vs. inviolable, students emerge better able to assess others' work in a variety
of formalisms, and better able to deploy formalism in their own analyses.
LING 4485/6485: Topics in Computational Linguistics
Current topics in computational linguistics. Recent topics include computational models
for Optimality Theory and finite-state models.
LING 2264: Language, Mind, and Brain
An introduction to neurolinguistics, this course surveys topics such as aphasia,
hemispheric lateralization and speech comprehension as they are studied via neuroimaging,
intracranial recording and other methods. A key focus is the relationship between these data,
linguistic theories, and more general conceptions of the mind.