
Stanford University and Google continue to pursue the 1950s model of language built on parts of speech in their Universal Dependencies (UD) project. This is the model shown to be NP-hard (computationally intractable) in the late 1980s, yet the work continues anyway, with humans manually annotating documents using this design.
“The … scheme is based on … (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).”
Why is this a topic to consider? There is nothing wrong with scientific inquiry, of course, but the old thesis behind this work, revived from antiquity in the 1930s, is highly ambiguous. Surely we should improve the theory with the latest research in linguistics and neuroscience!
But when a prominent university (Stanford) and the company dominating global search (Google) are behind the work, their efforts will encourage others to follow the same line of research despite its problems, rather than consider alternatives.
Today I will look at the Universal Dependencies project, an attempt not only to make parts of speech "work" but to do so across multiple languages, and then compare it to our working model, which is based on meaning.
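To make the target concrete, here is what a UD annotation looks like in practice: a universal POS tag and a dependency relation assigned to every word. The following is a minimal sketch using Stanza, Stanford's own UD pipeline; the example sentence and the printed columns are chosen here purely for illustration, and the model download assumes network access.

```python
# A minimal sketch of UD-style annotation using Stanford's Stanza library.
# The sentence is an arbitrary example; the English model must be downloaded once.
import stanza

stanza.download("en")  # fetch the English UD model (requires network access)
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

doc = nlp("The old model persists anyway.")
for sentence in doc.sentences:
    for word in sentence.words:
        # upos: universal part-of-speech tag; deprel: dependency relation to the head
        print(f"{word.id}\t{word.text}\t{word.upos}\t{word.head}\t{word.deprel}")
```

The printed columns mirror CoNLL-U, the format in which UD treebanks are distributed: token id, word form, universal POS tag, head index, and dependency relation.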
Problem: Model is Flawed
What’s Wrong? Syntax-First Linguistics is Stuck
In the late 1950s, the world of linguistics was revolutionized by syntax, and most scientists followed the new, formal approach: to solve linguistics, just find the set of sentences that are grammatical and exclude those that aren't.
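That program reduces to a membership test: a sentence is "grammatical" if and only if the grammar derives it. Here is a minimal sketch of the idea using NLTK's chart parser; the toy grammar and both example sentences are invented for illustration.

```python
# A toy version of the formal program: a sentence is "grammatical"
# iff the grammar derives it. Grammar and examples are illustrative only.
import nltk

grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N
    VP  -> V NP
    Det -> 'the'
    N   -> 'model' | 'language'
    V   -> 'describes'
""")
parser = nltk.ChartParser(grammar)

def is_grammatical(sentence: str) -> bool:
    """True iff the toy grammar derives the sentence."""
    tokens = sentence.lower().split()
    # Membership test: at least one complete parse exists.
    return any(True for _ in parser.parse(tokens))

print(is_grammatical("the model describes the language"))  # True
print(is_grammatical("model the describes language the"))  # False
```

The chart parser enumerates every derivation, so the membership test only has to check whether at least one parse exists; the formal program is this same test scaled up to full grammars of real languages.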
Fast forward sixty years. No progress. It didn’t work.
Parsing Persistence
Why so persistent? Isn't now the right time to pivot to a different model?