Игорь Волков – Hardware and software of the brain (page 7)
The problem is that the human brain groups words not according to the formal categories of parts of speech but according to their meaning, and it takes context into account at the same time. In fact, there are no sequential levels in the live neurocomputer. Everything happens in a complicated computational system where many different neural nets operate simultaneously, in parallel. Is syntax really useful, then? Maybe we should reduce the previous scheme to Lexicon – Pragmatics?
In the very distant past, language, and life itself, was simpler. In the minds of those people, word phrases were translated directly into static or dynamic images. Probably word categories were used too, but they were meaningful: not the noun but the object, not the verb but the action. With the development of civilization, new features were added. The gerund is a noun-type word derived from a verb. Why not? Can we take a movie, pick one frame from it, and consider this frame a static object? Yes, of course. The next example is abstract concepts. Take justice: is it an object or an action? It looks like parts of speech and abstract grammar are necessary after all.
Language imperfection
Human language is a product of evolution. Nobody developed it intentionally. Various features were added by different people in different epochs. Some elements also used by mathematics can easily be found, but look into the matter and you will discover that their development was simply never completed. Moreover, language runs on a live computational system that was created by the same principle. The main goal of this language is not precision and efficiency but workability in quite varied, often harsh environments. It is easy and convenient for simple tasks and everyday use by millions of people. If you face complications and want super reliability, it is better to use more formalized tools.
The current state of human languages is a state of overcomplication. Too many features have been piled together. Let's consider an example. The normal attribute of a noun is an adjective. What if we want to use another noun for this purpose? The normal way is 'leg of chair'. For simplicity, English allows 'chair leg', but how do we reconcile this with syntax rules? In Russian, there is a simple way to produce an adjective from a noun, and their spellings will differ. In English, two approaches are used. Either the syntax allows a noun as an attribute of another noun, or we make a double entry in the dictionary: one as a noun, another as an adjective. Both create problems for a computer parser. Even more problems arise at the semantic level, because all semantic rules must be doubled as well.
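To make the double-entry problem concrete, here is a minimal sketch with a hypothetical three-word lexicon, showing how listing 'chair' under two parts of speech multiplies the readings a parser must consider:

```python
from itertools import product

# Hypothetical toy lexicon. 'chair' gets a double entry (strategy 2 above):
# it is listed both as a NOUN and as an adjective-like attribute.
LEXICON = {
    "chair":  {"NOUN", "ADJ"},
    "leg":    {"NOUN"},
    "broken": {"ADJ"},
}

def readings(words):
    """Enumerate every POS assignment the lexicon licenses for a phrase."""
    tag_sets = [sorted(LEXICON[w]) for w in words]
    return [list(zip(words, tags)) for tags in product(*tag_sets)]

# 'broken chair leg' now has two readings the parser (and every semantic
# rule downstream) must handle: 'chair' as ADJ and 'chair' as NOUN.
for r in readings(["broken", "chair", "leg"]):
    print(r)
```

Each additional double entry multiplies the number of readings, which is exactly why the feature is cheap for humans and expensive for parsers.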
Semantics of natural language
Basic semantic categories depend both on the structure of the real world and on the operation of our perception. They represent which features we extract from nature.
Language describes different types of reality: external events in the environment, the speaker's own actions, or the same actions performed by another person.
When we learn a language, be it our native one in school or a foreign one, the focus is usually placed on grammar. Accordingly, the success of education is evaluated by the number of grammatical errors. Meanwhile, this is not the main goal of communication. If you miss a comma in a sentence but the reader understands it correctly, it does not matter. It is much worse if the sentence is grammatically correct but meaningless. I would prefer a language where I can freely choose options to express my ideas better, rather than constantly fear making an error.
Let's look at how natural language represents meaning. It generates 2D images in the neocortex. The first sentence creates an image; the next ones add details. Grammar is used to group words inside the sentence. A part of speech (POS) carries some generalized semantic load but is mainly needed by syntax rules.
POS is defined for each word separately, by enumeration; it does not obey any rules. Instead, POS itself defines how the word will be used in syntax.
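This point can be illustrated with a toy fragment (the lexicon and the single syntax rule are hypothetical): the part of speech is simply enumerated per word in the dictionary, and the syntax rule then keys off those tags, not off the words themselves:

```python
# Hypothetical per-word POS enumeration: no rule computes these tags,
# they are simply stored in the dictionary.
POS = {"the": "DET", "dog": "NOUN", "barks": "VERB", "loudly": "ADV"}

def is_sentence(words):
    """One toy syntax rule: S -> DET NOUN VERB (ADV*).
    The rule only sees POS tags, never the words themselves."""
    tags = [POS.get(w) for w in words]
    return tags[:3] == ["DET", "NOUN", "VERB"] and all(t == "ADV" for t in tags[3:])

print(is_sentence(["the", "dog", "barks"]))            # True
print(is_sentence(["the", "dog", "barks", "loudly"]))  # True
print(is_sentence(["dog", "the", "barks"]))            # False
```

Note the direction of dependency: nothing predicts a word's tag, but the tag fully determines where the word may appear.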
It turns out that live humans use two fundamentally different systems of language processing: the intuitive one, learned as a native spoken language, and the grammatical one, learned with writing or with a foreign language.
Formal semantics
The semantics of human language may be formalized, as has already been done with the lexicon and syntax. What elements of meaning can be singled out from a text? Words have their own meaning, which directly links language to the real world. We will concentrate on the next level of semantics: the meaning of syntax, that is, the meaning which emerges when words interact with each other according to the linguistic principle of compositionality. The sentence (or a clause of a complex sentence) is the smallest complete structure of language; this is enough to represent an idea. How sentences group into a text is a separate question. Let's discuss the meaning of a single sentence now.
Actions and items
Three different types of sentence exist: affirmative, interrogative, and imperative (orders). The last two are variations of the first, so let's consider the semantics of the affirmative sentence. When a person conceives it, it is transformed into some internal image. The image may have no details, as in 'A large air balloon hung in the sky.', or may be rich in various parts. In the latter case, the parts are designated by the various phrases of the sentence. The structure usually forms a hierarchy in which large-scale parts have further details. At the upper layer, the sentence is divided into the subject phrase and the predicate phrase. Which is the main part of the sentence? Probably the predicate is preferable. In that case, the whole sentence denotes some action. Static sentences such as 'An apricot is a fruit.' are not an exception; rather, they are a particular case of inaction, when nothing changes. If a text is a list of actions, then the whole of it answers the question "What happens?" A quite reasonable approach to the world, and especially to life with its dynamism.
Other parts of the sentence play certain roles in this action. The subject is the actor, the direct object is what the action is applied to, while the prepositional object is an instrument or some other supplementary part. The roles may vary: if there is no actor in an action, the subject may designate the focus of attention. Note that the term 'object' is used differently in linguistics and in programming. The former is a purely formal element of syntax, while the latter is meaningful and may be a very complicated construct; objects in programming may represent both actions and items.
Now we need some lexical semantics. Of course, any word has its own meaning, but words fall into several large groups. Verbs usually designate actions, nouns designate items, and other words are used to build complex constructs. Adjectives denote properties of items. If you add an adjective to a noun, you create a noun phrase and can add color, dimension, smell, even the texture of a surface to some object. Similarly, adverbs modify actions. The simplest verb phrase is verb + adverb; in addition, it may include other elements. As mentioned above, the subject denotes the main participant of the action. There is also the indirect object in the form of a prepositional phrase (He came with a new book.). The indirect object without a preposition (I gave him a new article.) is an ellipsis where the preposition is dropped. An equivalent prepositional variant exists (I gave a new article to him.), so such reduced constructs may be considered derivative, auxiliary. The preposition itself designates a relation. This is especially obvious for spatial prepositions: 'upon' and 'under' designate a direction (vertical, as opposed to horizontal, in this case) and also determine what is on top.
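As a rough sketch of this compositional picture (all names here are hypothetical), an adjective can be modeled as a function that adds a property to the item a noun denotes, so a noun phrase is literally the composition of its parts:

```python
def noun(name):
    """A noun denotes an item: a bare object with no properties yet."""
    return {"kind": "item", "name": name, "properties": []}

def adjective(prop):
    """An adjective denotes a function: item -> item with one more property."""
    def apply(item):
        # Return a new dict so the original item is left untouched.
        return dict(item, properties=item["properties"] + [prop])
    return apply

red = adjective("red")
large = adjective("large")
balloon = noun("balloon")

# 'large red balloon' = compose the modifiers onto the item
print(large(red(balloon)))
```

Adverbs could be modeled the same way as functions over actions; the mechanism, not the particular encoding, is the point.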
Overall semantics of the sentence may be denoted as
predicate(subject_phrase, direct_object, adverb, prepositional_object)
This looks like a function from the C programming language or from mathematics, right? Sentences of natural language are a powerful tool for describing the variability of the analog world in discrete words. They are a subset of all the structures possible in mathematics. Why? Because they reflect the internal gears of human perception. If you write a program in a human-like language, the computer will think like a human being.
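The formula can be tried out literally: the predicate becomes a function and the other parts of the sentence become its arguments. The verb 'gave' and its argument handling below are a hypothetical illustration, mirroring predicate(subject_phrase, direct_object, adverb, prepositional_object):

```python
def gave(subject, direct_object, adverb=None, prepositional_object=None):
    """The predicate as a function: its arguments are the sentence's roles.
    Optional roles (adverb, prepositional object) may simply be omitted."""
    parts = [subject, "gave"]
    if adverb:
        parts.append(adverb)
    parts.append(direct_object)
    if prepositional_object:
        parts.append("to " + prepositional_object)
    return " ".join(parts)

print(gave("I", "a new article", prepositional_object="him"))
# I gave a new article to him
```

Note how the optional arguments play the same role as optional phrases in a sentence: dropping one leaves a shorter but still complete statement.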
These are only the basics. They were enough for people millennia ago, but language then evolved and became more and more complicated. This evolution is controversial: on the one hand, it made it possible to describe more situations; on the other hand, more and more troubles emerged. As new elements were added, or old ones found additional uses, the previously clean composition was affected. Nobody supervised these "amendments". Those who introduced new elements did not even think about what they were doing, so now we have literally a pile of features which are often not coordinated with each other. If you try to implement them mechanically from a list, the program simply will not work. The main problem for those who want to work with a closer approximation of human language is not only to implement more features separately, but also to ensure that they work in various combinations. Let's try to list these features one by one.