Universal Dependencies are Fundamentally Flawed

John Ball
11 min read · Apr 28, 2022


The 1950s model of language is behind the attempt at a universal model of grammar, but a model based on meaning is the great simplification for scientific language models (Image: Adobe Stock)

Stanford University and Google continue to pursue the 1950s model of language, using parts-of-speech, in their Universal Dependencies (UD) project. This is the model that was shown to be NP-hard (“unsolvable”) in the late 1980s, yet the work continues anyway, with humans manually annotating documents using their design.

“The … scheme is based on … (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).”
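To see what this scheme produces in practice, here is a minimal sketch using Stanza, Stanford’s Python toolkit that implements the UD annotation scheme. The sentence and the printed format are my own illustration, not part of the project itself:

```python
# Minimal sketch: inspecting Universal Dependencies output with Stanza,
# Stanford's UD-based NLP toolkit. Assumes `pip install stanza` plus a
# one-time download of the English models.
import stanza

stanza.download("en")        # fetch the English UD models (one time)
nlp = stanza.Pipeline("en")  # tokenize, tag, lemmatize, and parse

doc = nlp("Time flies like an arrow.")
for sentence in doc.sentences:
    for word in sentence.words:
        # upos: the universal part-of-speech tag (Petrov et al., 2012)
        # deprel: the Universal Dependencies relation to the head word
        # word.head is a 1-based index into the sentence; 0 means root
        head = sentence.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.text:8} {word.upos:6} --{word.deprel}--> {head}")
```

Note that “Time flies like an arrow” is a classic example of part-of-speech ambiguity: “flies” can be a noun or a verb, and “like” a verb or a preposition. That is exactly the kind of choice the project’s human annotators must make by hand, one token at a time.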

Why is this a topic to consider? Of course, there is nothing wrong with scientific enquiry, but the old thesis behind this, revived from antiquity in the 1930s, is highly ambiguous. Surely we should improve the theory with the latest research in linguistics and neuroscience!

But when a prominent university (Stanford) is behind the work, along with the company dominating global search (Google), their efforts will encourage others to follow that line of research despite its problems, rather than considering alternatives.

Today I will look at the Universal Dependencies project, an attempt not only to make parts-of-speech “work,” but to do so across multiple languages, and then compare it to our working model, which is based on meaning.

An artist’s impression of the Universal Dependencies project: a theoretical mess. In the case of Universal Parts-of-Speech, the model is a fundamentally flawed compromise. Photo by Martijn Baudoin on Unsplash

Problem: Model is Flawed

What’s Wrong? Syntax-First Linguistics is Stuck

In the late 1950s, the world of linguistics was revolutionized by syntax: most scientists followed the new, formal approach. To solve linguistics, just find the set of sentences that are grammatical, and exclude those that aren’t.
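As a concrete illustration of that program (a toy example of my own, not something from the UD project), here is a sketch in Python using NLTK: a small hand-written context-free grammar that accepts some token strings, rejects others, and immediately exposes the ambiguity problem:

```python
# Toy sketch of the "grammatical set" program using NLTK: a hand-written
# context-free grammar that admits some sentences, rejects others, and
# exposes the classic PP-attachment ambiguity.
import nltk

grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N | Det N PP | 'I'
    VP  -> V NP | VP PP
    PP  -> P NP
    Det -> 'the' | 'a'
    N   -> 'man' | 'telescope'
    V   -> 'saw'
    P   -> 'with'
""")
parser = nltk.ChartParser(grammar)

tokens = "I saw the man with the telescope".split()
trees = list(parser.parse(tokens))
print(len(trees))  # 2 -- the grammar licenses two distinct parse trees
for tree in trees:
    print(tree)
```

Both parses are “grammatical,” yet they mean different things: did I use the telescope to see the man, or did the man have the telescope? Grammaticality alone cannot decide; something outside the syntax must.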

Fast forward sixty years. No progress. It didn’t work.

Parsing Persistence

Why so persistent? Isn’t now the right time to pivot the model?




Written by John Ball

I'm a cognitive scientist working on NLU (Natural Language Understanding) systems based on RRG (Role and Reference Grammar). A mouthful, I know!
