Once upon a Time in 1954…
Posted: October 28th, 2015 | Author: Domingo | Filed under: Artificial Intelligence | Tags: Baroni, Bernardi, Chomsky, Erik T. Mueller, Georgetown-IBM experiment, IBM Watson, natural language processing, NLP, Zamparelli
Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt
(The limits of my language are the limits of my world)
Ludwig Wittgenstein, Tractatus Logico-Philosophicus
… On a cold January day in 1954, the Georgetown-IBM experiment took place at IBM's headquarters in New York: the first and most influential demonstration of machine translation in history. Developed jointly by Georgetown University and IBM, the experiment involved the automatic translation of more than sixty sentences from Russian into English. The sentences were carefully chosen, and there was no syntactic analysis capable of identifying sentence structure. The approach was mainly lexicographic, based on dictionaries in which each word was linked to a set of specific rules.
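To make that approach concrete, here is a minimal sketch of a dictionary-plus-rules translator in the same spirit. The vocabulary and the rules are invented for illustration and bear no relation to the actual 1954 system:

```python
# A toy lexicographic translator: every source word carries a default
# translation plus a context rule that picks among alternatives.
# Vocabulary and rules are invented for illustration only.

LEXICON = {
    # word: (default translation, {neighbouring word: alternative})
    "my":       ("we", {}),
    "peredaem": ("transmit", {"mysli": "convey"}),
    "mysli":    ("thoughts", {}),
    "mnogo":    ("much", {"lyudej": "many"}),
    "lyudej":   ("people", {}),
}

def translate(sentence: str) -> str:
    words = sentence.lower().split()
    out = []
    for i, w in enumerate(words):
        default, rules = LEXICON.get(w, (w, {}))
        # the "rule" only inspects adjacent words, as the original
        # demonstration did: there is no syntactic analysis at all
        neighbours = words[max(i - 1, 0):i + 2]
        choice = next((rules[n] for n in neighbours if n in rules), default)
        out.append(choice)
    return " ".join(out)

print(translate("my peredaem mysli"))   # -> "we convey thoughts"
print(translate("mnogo lyudej"))        # -> "many people"
```

The brittleness is evident: the system only looks good on sentences whose words and contexts were anticipated in the dictionary, which is precisely why the demonstration sentences had to be chosen so carefully.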
The demonstration was a success. The story goes that euphoria among the researchers ran so high that they claimed the problem of automatic translation would be solved within three to five years… That was more than sixty years ago, and the language problem, the comprehension and generation of messages by machines, is still pending. It is probably the last frontier separating human intelligence from artificial intelligence.
On the structural side, Chomsky wrote in Introduction to the Formal Analysis of Natural Languages that the native speaker of a language has the ability to understand an immense number of sentences s/he has never heard, as well as to produce new sentences that are likewise understandable to other native speakers. A machine would have to mimic the child's learning: accept as input a sample of grammatical sentences and produce as output a grammar of the language, an essentially finite one. A grammar must set up a theory of those recurrent regularities encompassed by the expression "syntactic structure of the language".
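A toy illustration of this point (my own example, not Chomsky's): a handful of rewrite rules, a finite object, already generates an unbounded set of well-formed sentences that no speaker has heard before.

```python
import random

# A tiny context-free grammar: finitely many rules, yet the recursion
# in NP makes the set of sentences it generates infinite.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],   # recursive
    "VP": [["V", "NP"], ["sleeps"]],
    "N":  [["child"], ["language"], ["machine"]],
    "V":  [["learns"], ["imitates"]],
}

def generate(symbol="S"):
    if symbol not in GRAMMAR:           # terminal word: emit it
        return [symbol]
    rule = random.choice(GRAMMAR[symbol])
    return [w for part in rule for w in generate(part)]

for _ in range(3):
    print(" ".join(generate()))
# e.g. "the machine that sleeps imitates the child"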
Thanks to sixty years of progress in the study of natural language processing, the lexical and morphosyntactic levels can now largely be handled by machines. Nonetheless, what happens with the semantic level and its "fearsome" ambiguity?
The Italian researchers Baroni, Bernardi and Zamparelli state in their work Frege in Space: A Program for Compositional Distributional Semantics that semantic compositionality is the crucial property of natural language, according to which the meaning of a complex expression is a function of the meaning of its constituent parts (words) and of the mode of their combination.
The real problem of natural language processing for machines comes from the semantic load of nouns, verbs, and adjectives, and not from grammatical terms, which can be modelled very easily.
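Their program, roughly, treats some words as vectors in a meaning space and others as functions over those vectors. A minimal sketch of that compositional idea, with invented toy numbers, could look like this:

```python
import numpy as np

# Toy distributional vectors (invented numbers): each noun is a point
# in a 3-dimensional "meaning space" learned from usage.
nouns = {
    "moon":  np.array([0.1, 0.9, 0.2]),
    "house": np.array([0.8, 0.1, 0.3]),
}

# In the lexical-function spirit of Baroni, Bernardi and Zamparelli,
# an adjective is not a vector but a linear map (a matrix) that turns
# a noun vector into the vector of the adjective-noun phrase.
adjectives = {
    "red": np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.2, 0.0],
                     [0.0, 0.0, 1.5]]),
}

def compose(adj: str, noun: str) -> np.ndarray:
    """Meaning of 'adj noun' = function (matrix) applied to argument."""
    return adjectives[adj] @ nouns[noun]

print(compose("red", "moon"))    # vector for the phrase "red moon"
print(compose("red", "house"))   # vector for the phrase "red house"
```

The meaning of the phrase is thus literally a function of the meanings of its parts and of their mode of combination, which is the Fregean point the authors make.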
In his book Natural Language Processing with ThoughtTreasure, Erik T. Mueller, one of the researchers behind the success of IBM Watson, notes that since the German philosopher Gottfried Leibniz's Characteristica Universalis (17th century), there have been several attempts to find the perfect and universal language: a canonical representation of concepts. In this ideal environment, machine translation programs would translate the message from the source language into the canonical linguistic representation, and then from the latter into the target language. Some breakthroughs have been achieved in this field, but at the end of the day researchers have arrived at a cul-de-sac: programs built on canonical linguistic representations are unable to capture the elusive and open-ended nature of human concepts. As Umberto Eco affirmed, sentences will always be open to an infinite number of interpretations.
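A sketch of the interlingua pipeline described above, with an invented concept inventory; the cul-de-sac shows up as soon as the input strays from the hand-built vocabulary:

```python
# Interlingua-style pipeline (toy): analyse the source sentence into a
# canonical concept structure, then generate the target sentence from
# it. Every mapping below is invented for illustration.

TO_CONCEPTS = {              # source analysis: English -> concepts
    "the dog sleeps": ("SLEEP", "DOG"),
}
FROM_CONCEPTS = {            # target generation: concepts -> Spanish
    ("SLEEP", "DOG"): "el perro duerme",
}

def translate(sentence: str) -> str:
    concepts = TO_CONCEPTS[sentence]     # source -> canonical form
    return FROM_CONCEPTS[concepts]       # canonical form -> target

print(translate("the dog sleeps"))  # -> "el perro duerme"

# Any sentence outside the fixed concept inventory raises a KeyError:
# the canonical layer simply has no representation for it, which is
# the open-endedness problem the text describes.
```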
Some months ago I had the chance to exchange a few messages about NLP with Éric Laporte, professor and researcher at the Université Paris-Est Marne-la-Vallée, and we both agreed that the first obstacle to solving this difficult problem probably lies in the standpoint itself, as he declared:
“Effectivement, les linguistes ont laissé les informaticiens s’installer dans l’ignorance de la complexité des langues, quand ils ne les ont pas encouragés”.
(“Indeed, linguists have let computer scientists settle into ignorance of the complexity of languages, when they have not outright encouraged them in it.”)
It is high time to change the approach.