Reply to Kodner et al.: Fundamental misunderstanding of both model and methods

Our recent work (1) shows that a program-learning model can acquire key structures of natural language, including recursive hierarchies and patterns that require more than context-free capacities (2). Kodner et al.'s (KCY) commentary (3) rests on several fundamental misunderstandings.

Most notably, they claim that our "model transforms candidate hypotheses into probabilistic context-free grammars that are evaluated against the training data via Bayesian inference." This is unambiguously incorrect: At no point does our model convert hypotheses into probabilistic context-free grammars. Not only does it not do that, but that approach could not work, because we show that the model can learn languages that are not context-free: No method of comparing context-free grammars could learn the languages our model succeeds on. Our model compares programs, not grammars, and we showed examples of these programs in the paper.

KCY claim that we conceive of "language as strings" and fail to recognize that language has structure. This is another fundamental misunderstanding. It is true that the data provided to the model are strings (sequences of characters), following prior work on learnability, but the model uses the strings to discover structure, much as a linguist would. Finding latent structure behind strings is the only way for the model to generalize beyond what it has seen, and our results document that it does.

KCY contend that our analysis method is flawed because an n-gram model can show high performance on some languages under our methods. Their interpretation is not correct. KCY examined the performance of an n-gram model only on finite-state languages (figure 1 of ref. 3), which are precisely the languages an n-gram model can represent. So of course an n-gram model does well on their examples. In fact, no evaluation metric could show that n-gram models are poor on these languages because, simply, they are not. One has to look at nonfinite-state languages, where our evaluation scheme shows that an n-gram model fails (Fig. 1). Thus, our evaluation does exactly what it should: It scores an n-gram model high on languages it can learn (figure 1 of ref. 3) and low on languages it cannot learn (Fig. 1). The same metric shows that our model learns everything from finite sets to context-sensitive grammars.

Finally, KCY seize on our statement that people do not necessarily use the same methods as our implementation. Ours is a standard Marr (4) computational-level analysis: We hoped to formalize the problem people solve (Bayesian selection of generative processes) without necessarily knowing how they solve it. Our more modest claim is warranted because there is no evidence about how children solve this problem. Marr's insight also answers KCY's final question of how the model helps us understand acquisition: The model shows that learning a generating process for languages is possible.

[Fig. 1: Performance of an n-gram model on nonfinite-state languages, including xx, Dyck, ab, and abc, where it fails.]
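To make the program-versus-grammar contrast concrete, here is a minimal sketch in Python. It is illustrative only, not the authors' implementation; the names program_sample, train_bigram, and bigram_accepts are ours. A short program with one shared counter generates the language a^n b^n c^n, which no context-free grammar can generate, while a bigram model trained on its output cannot reject ill-formed strings, because enforcing equal counts exceeds purely local, finite-state statistics.

import random
from collections import defaultdict

def program_sample(max_n=5):
    # A program, not a grammar: one shared counter n yields a^n b^n c^n,
    # a language beyond the reach of any context-free grammar.
    n = random.randint(1, max_n)
    return "a" * n + "b" * n + "c" * n

def train_bigram(corpus):
    # Count character-bigram transitions, with ^ and $ as boundary markers.
    counts = defaultdict(int)
    for s in corpus:
        padded = "^" + s + "$"
        for pair in zip(padded, padded[1:]):
            counts[pair] += 1
    return counts

def bigram_accepts(counts, s):
    # True iff every bigram in s was observed in training: the most a
    # purely local statistic of this kind can check.
    padded = "^" + s + "$"
    return all(counts[pair] > 0 for pair in zip(padded, padded[1:]))

corpus = [program_sample() for _ in range(1000)]
counts = train_bigram(corpus)

print(bigram_accepts(counts, "aaabbbccc"))  # True: well formed
print(bigram_accepts(counts, "aabbbcc"))    # also True, yet NOT in a^n b^n c^n

On this assumed setup, the bigram acceptor passes the ill-formed string aabbbcc because all of its local transitions occur in well-formed training strings; only a representation with shared structure, such as a program, distinguishes the two.

The computational-level claim itself can also be stated compactly. A minimal sketch, under the assumption (consistent with the letter's framing, not the authors' exact scoring function) that hypotheses are generative programs scored by a description-length prior plus a likelihood of the observed strings:

import math

def log_posterior(program_length_bits, log_likelihood):
    # Bayesian selection of generative processes at Marr's computational
    # level: prefer short programs (prior) that explain the observed
    # strings well (likelihood), independently of how any particular
    # learner searches the space of programs.
    log_prior = -program_length_bits * math.log(2)
    return log_prior + log_likelihood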

Keywords: context-free grammars; n-gram model; fundamental misunderstanding

Journal Title: Proceedings of the National Academy of Sciences of the United States of America
Year Published: 2022
