Evolution of Human Languages - An Introduction
There are currently about 6,000 languages on our planet,
some of them spoken by millions and some by only a few dozen people. A primary
goal of EHL researchers is to provide a detailed classification of these
languages, organizing them into a genealogical tree similar to the accepted
classification of biological species. Since all representatives of the species
Homo sapiens presumably share a common origin, it would be natural to suppose,
although this has yet to be demonstrated, that all human languages also go
back to some common source. Most existing classifications, however, do not go
beyond some 300-400 language families that are relatively easy to discern. This
restriction has natural reasons: languages must have been spoken and constantly
evolving for at least 40,000 years (and quite probably more), while any two
languages separated from a common source inevitably lose almost all superficially
common features after some 6,000-7,000 years.
Nevertheless, despite widespread scepticism and reluctance to tackle the
problem, there are a number of scholars who believe that these obstacles are
not insurmountable. Research has been going on over the past several decades
that appears to indicate that larger genetic groupings are not only possible,
but indeed quite plausible. It can be shown that most of the world's language
families can be classified into roughly a dozen large groupings, or macrofamilies.
Two sorts of evidence can be used for this purpose:
1) Even a superficial analysis of the vocabulary of a large number of
linguistic families reveals numerous lexical similarities extending far beyond
the borders of the smaller genetic units. They are frequently restricted to
individual macrofamilies (such as Eurasiatic, Afroasiatic etc.), but a
significant number of such matches have already been found between the
macrofamilies themselves, pointing to the probability of common origin.
2) Classical historical linguistics has developed a very powerful tool -
the comparative method - that allows the reconstruction of unattested language
stages, so-called proto-languages. It turns out that whereas modern languages
may vary significantly, proto-languages in a number of cases tend to be much more
similar to one another. This is the case, e.g., with Indo-European, Uralic and
Altaic: modern English, Finnish, and Turkish may have almost nothing in common,
but their respective ancestors - Proto-Indo-European, Proto-Uralic and
Proto-Altaic - appear to have many more common traits and common vocabulary.
This means that it is possible to extend the time perspective and to
reconstruct even earlier stages of human language; much of this research
has already been conducted.
The amount of information that has to be
processed in order to achieve a deep linguistic taxonomy is enormous: it
covers thousands of languages and hundreds of
linguistic families. Modern computer technology, however, provides some
solutions to these problems. The first step that needs to be taken is a
compilation of computer databases containing established matches between
related languages - etymologies. The primary goal of the EHL research is
therefore to collect and compile such databases and to make them easily
available: in the present world this means making them available on the Web.
A large set of computer databases is already available and many of them are
already online. The databases provided by the EHL participants, and freely
browseable on the Web, include Altaic, Dravidian, (North) Caucasian, Yenisseian,
Sino-Tibetan, Indo-European, Austroasiatic, Chukchi-Kamchatkan, and Semitic.
For many other language families the databases are still in preparation.
Etymological databases for several macrofamilies are also being compiled,
and several of them - Australian, Eurasiatic (Nostratic) and Afroasiatic - are
already near completion. Once an etymological database becomes available it can
be used to significantly simplify the task of searching for lexical cognates
and building up higher level databases. Etymological databases can also be used
(and are being used) for a statistical evaluation of taxonomic correlations.
The number of etymological matches between languages is a good measure of the
distance between them, and such counts can also be employed for evaluating the
time depth of any linguistic family. In fact, so-called lexicostatistics is the
only available tool for absolute linguistic dating, and developing its theoretical
rationale and practical application is one of the central tasks of the EHL project.
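As a rough illustration of how match counts can be turned into a distance and
a date, the sketch below computes the share of cognate matches between two
wordlists and applies the classical glottochronological formula
t = ln(c) / (2 ln(r)). This is the textbook formula, not necessarily the
method used by the EHL project. The function names, the integer cognate-set
IDs, and the sample wordlists are hypothetical, and the retention rate of
0.86 per millennium is the value conventionally quoted for a 100-item list,
taken here as an assumption.

```python
import math


def shared_cognate_fraction(list_a, list_b):
    """Share of comparable meanings whose words fall in the same cognate set.

    Each wordlist maps a meaning to a cognate-set ID (hypothetical integers),
    or to None when no word is recorded for that meaning.
    """
    comparable = [
        meaning for meaning in list_a
        if meaning in list_b
        and list_a[meaning] is not None
        and list_b[meaning] is not None
    ]
    if not comparable:
        raise ValueError("the two wordlists share no comparable meanings")
    matches = sum(1 for m in comparable if list_a[m] == list_b[m])
    return matches / len(comparable)


def divergence_time_millennia(c, retention=0.86):
    """Classical estimate t = ln(c) / (2 * ln(r)), in millennia.

    c is the shared cognate fraction; retention is the assumed share of
    basic vocabulary a language keeps per millennium (0.86 is an assumption).
    """
    if not 0 < c <= 1:
        raise ValueError("c must lie in the interval (0, 1]")
    if c == 1:
        return 0.0
    return math.log(c) / (2 * math.log(retention))


# Made-up wordlists: the IDs carry no linguistic meaning.
language_a = {"one": 1, "two": 2, "water": 3, "fire": 4}
language_b = {"one": 1, "two": 2, "water": 7, "fire": None}

c = shared_cognate_fraction(language_a, language_b)
print(c, divergence_time_millennia(c))
```

The factor of 2 in the denominator reflects the assumption that both
languages have been losing vocabulary independently since their separation.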
While the project is concentrated on building up a hierarchical system of
etymological databases, reflecting the hierarchical taxonomy of the linguistic
genealogical tree, it is also concerned with collecting and putting online
primary language wordlists as well as existing etymological sources. The ideal
etymological database system should be able to provide an etymology for any
word in any modern or ancient language, tracing its origin as far as possible.
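One way to picture such a hierarchical system is as a chain of linked
records, where each entry points to its source in the next-higher database.
The sketch below is only an assumed layout and not the actual EHL database
format: the Entry type, its fields, and the placeholder forms are all
hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    level: str                         # language, family, or macrofamily
    form: str                          # attested or reconstructed form (placeholder)
    parent: Optional["Entry"] = None   # entry in the next-higher database


def trace(entry):
    """Follow parent links upward and return the etymology as a list.

    Assumes the links form a simple chain with no cycles.
    """
    chain = []
    while entry is not None:
        chain.append((entry.level, entry.form))
        entry = entry.parent
    return chain


# Placeholder records, not real reconstructions.
macro = Entry("macrofamily", "*FORM-A")
family = Entry("family", "*form-b", parent=macro)
word = Entry("language", "form-c", parent=family)

for level, form in trace(word):
    print(level, form)
```

Tracing stops at the highest level for which an entry exists, which matches
the idea of following a word's origin only as far as the evidence allows.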
The participants of the project have provided source wordlists for poorly
explored language families such as Indo-Pacific and Australian, where most of
the comparative work is yet to be done. They have also scanned, processed with
optical character recognition, and converted to database format some of the
major existing etymological
dictionaries, such as Pokorny's Indo-European etymological dictionary.
The ultimate goal of the system of databases described above is to arrive
at a stage when an absolute majority of the world's languages can be reduced
to a minimum number of huge language macrofamilies, which in turn can be traced
back to a Proto-Sapiens stage, should the databases provide sufficient
evidence to support the hypothesis of monogenesis. With the database system
completed, and the basics of the Proto-Sapiens structure established, we can
hope to come into possession of a vital tool for helping us understand the
nature of the origin of language itself.