Crosslinguistic correlations between size of syllables, number of cases, and adposition order[1]
Gertraud Fenk-Oczlon & August Fenk
In: G. Fenk-Oczlon & Ch. Winkler (eds.) 2005. Sprache und Natürlichkeit.
Gedenkband für Willi Mayerthaler. p.75-86.
Tübinger Beiträge zur Linguistik 483. Tübingen: Narr.
Previous crosslinguistic
studies by the authors have shown that a small number of phonemes per syllable
is associated with a high number of syllables per word and per clause, and,
moreover, with Object-Verb (OV) order and agglutinative morphology. And since
OV order is often connected with a tendency to postpositions (e.g. Greenberg
1966) and agglutinative morphology with both a tendency to postpositions and a
tendency to a higher number of cases, the present study investigates the
assumption of crosslinguistic correlations between these two tendencies and
between them and our „metric” variables mean size of syllables in terms of
phonemes and mean size of clauses in terms of syllables.
The results: All correlational coefficients showed the expected tendency,
i.e. the expected sign (+ or −). And a high number of morphological cases
turned out to be correlated with low syllable complexity (almost significant)
and with a tendency to postpositions (highly significant). Our interpretation
focuses on the association between syllable complexity and rhythmic organization,
e.g. an association of stress-timed rhythm with a tendency to higher syllable
complexity and fusion of morphemes.
The title of this paper names three relevant variables that will be connected in the following
sections. In order to characterize the theoretical background of the study, we
will address three other, rather abstract concepts:
· The program and goals of a „holistic”, or „systemic”, or „natural” (?) typology.
· The central role of „rhythm” within a holistic view of language.
· Cognitive constraints as constraints on language variation: the concepts of „cognitive costs” and „cognitive economy” have explanatory power with respect to language universals and language evolution (Fenk-Oczlon & Fenk 2002).
„Cognitive costs” are a central issue in Naturalness
Theory. For Mayerthaler, „‘more or less natural’” with respect to universal grammar and/or single languages’
grammar „boils down to ‘more or less easy for the human brain’” (Mayerthaler
1987:27). „At this point ‘more or less natural’ (with respect to universals)
corresponds to ‘more or less easy for the human brain’” (Dressler &
Mayerthaler 1987:11).
Systemic Typology (Fenk-Oczlon & Fenk 1999) also
tries to explain constraints of language variation by constraints of relevant resources,
in particular by limits of our cognitive capacities. In this respect it corresponds
to Naturalness Theory. Thus it was tempting to call it „Natural Typology”
(Fenk-Oczlon 1997). From our point of
view this term might still be tempting in two other respects as well:
· With respect to the „natural” role of frequency: the more frequent, the more familiar, and the lower the cognitive costs of processing! In terms of Zipf’s (1949) tool analogy: The artisan refines and rearranges the tools in such a way that frequently used tools are multifunctional, smaller, and nearer to him. Shorter distance and smaller size mean shorter access time and reduced (cognitive) effort. Applied to the unit „word”: Frequently used words are more familiar; they tend to become shorter, more easily retrievable, and rather polysemous.
· With respect to the „natural” interdependencies between different levels of language or different levels in the description of language:
Systemic Typology suggests systematic interactions between sound structure, morphology
and syntax. Several authors (e.g. von der Gabelentz 1901, Skalička 1935,
Lehmann 1978, Donegan & Stampe 1983, Gil 1986, Plank 1998) have already assumed,
stated or described co-variations between prosodic, phonological, morphological,
and syntactic properties:
In recent times, typologists have often confined themselves to seeking dependencies
among variable language-parts WITHIN syntax, WITHIN morphology, or WITHIN
phonology. As to dependencies BETWEEN levels or modules, syntax and morphology
were considered essentially the only candidates showing some real typological
promise. Dependencies between sound structure on the one hand and word, phrase,
clause, sentence, and discourse structure, or also lexical structure, on the
other were something respectable main stream typology has steered clear of. /…/
Nonetheless, the temptation to link phonological parameters of crosslinguistic
variation on the one hand and morphological and syntactic ones on the other has
now and again proved irresistible to the more adventurous, perhaps encouraged
by the ever popular all-encompassing master maxim that languages are systèmes
où TOUT se tient … (Plank 1998:195f)
The aim of linking phonological parameters of
crosslinguistic variation with morphological and syntactic parameters is the demanding
program of systemic or holistic typology, or, according to von der Gabelentz
(1901), of typology as such. Von der Gabelentz suggests that some of the components
interacting within the system language might be more decisive than others.
According to Donegan & Stampe (1983: 350) such a decisive factor might be
accent: „What but accent could be behind such holism? Accent is the only factor
pervading all the levels of language”. A language’s accent, or more
generally, a language’s rhythm will play a key role in the interpretation of
our empirical results.
In a previous crosslinguistic study (Fenk-Oczlon &
Fenk 1999) native speakers of 34 typologically different languages translated a
certain German „text” (a set of 22 unconnected simple declarative sentences)
into their mother tongue.[2]
Crosslinguistic computation revealed a pattern of significant correlations
between the „size” of syllables (in phonemes), of words (in syllables), and of
sentences (in syllables, in words). For instance: the fewer phonemes per
syllable, the more syllables per sentence. This was, as far as we can see, the
first really „crosslinguistic” correlation, i.e. a computation where each one
of the data-pairs (mean n of phonemes/syllable – mean n of syllables/sentence)
represents one of the languages of the sample. The results reported in this
study form a set of mutually dependent correlations:
a) The more syllables per clause, the fewer phonemes per syllable: r = −0.75 (p < 0.1%)
b) The more syllables per word, the fewer phonemes per syllable: r = −0.54 (p < 0.1%)
c) The more syllables per clause, the more syllables per word: r = +0.47 (p < 1%)
d) The more words per clause, the fewer syllables per word: r = −0.66 (p < 0.1%)
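The notion of a crosslinguistic correlation used here, in which every data pair stands for one language, can be sketched as follows. This is only a minimal illustration: it assumes Pearson's product-moment coefficient (the text does not name the coefficient used), and the function name and the five value pairs are invented for demonstration. They are not data from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical language means, one pair per language (NOT the study's data)
phonemes_per_syllable = [2.0, 2.2, 2.5, 2.8, 3.0]
syllables_per_clause = [10.0, 9.0, 8.5, 7.0, 6.0]

r = pearson_r(phonemes_per_syllable, syllables_per_clause)
print(f"r = {r:+.2f}")
```

With such invented values the coefficient comes out negative, which is the direction of relation a): languages with larger syllables have fewer syllables per clause.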
Additional results: Languages with simple syllables
showed a tendency to Object-Verb (OV) order (and to syllable-timed rhythm and
agglutinative morphology), while languages with more complex syllables tended
to Verb-Object (VO) order (and to stress-timed rhythm and fusional or isolating
morphology).
Agglutinative morphology is, moreover, often assumed
to be associated with a rather high number of cases and postpositions. And OV
order is not only associated with less complex syllables, but also with a
tendency to postpositions (e.g. Greenberg 1966 and our sample, where 72 % of
the postpositional languages showed OV and 90 % of the prepositional languages
VO.)
These results and considerations were the starting point for the following
correlational assumptions generated and examined in the present paper. In the
following hypotheses two of our metric variables – size of sentence in
syllables (A), size of syllable in phonemes (B) – are linked to the non-metric
variables number of cases (C) and predominant adposition order (D).
Correlations rAC and rAB are coupled to their partners rCD and rBD
by the above mentioned significant negative correlation between the number of
phonemes per syllable and the number of syllables per sentence.
Hypothesized correlations with C (n of cases):
rCD: the fewer phonemes per syllable (D), the higher the number of cases (C)
rAC: the more syllables per sentence (A), the higher the number of cases (C)
Hypothesized correlations with B (prepositional versus postpositional):
rBD: a low number of phonemes per syllable (D) is associated with a tendency to postpositions (B)
rAB: a high number of syllables per sentence (A) is associated with a tendency to postpositions (B)
The tendency to suffixing is generally stronger than
the tendency to prefixing (e.g. Greenberg 1966). If postpositions get more
easily attached to the stem, thus forming a new semantic case (e.g. a local
case), then we may assume that
rBC: a tendency to postpositions (B) is associated with a tendency to a higher number of cases (C)
The following pattern of „inductive reasoning” runs
through the generation of the hypotheses (of the former and of the present
study), their statistical evaluation, and the diagrammatic representation of
the results: If a certain variable (let’s say A) is known or assumed to be correlated
with two other variables (B, C), then it
is not implausible (and the more plausible the higher the correlations A-B and
A-C) to expect a correlation between B and C as well. Correlations A-B and A-C
together are at least a useful indication to search for a correlation B-C. The
most plausible expectation regarding the sign (+ or −) of the correlation
B-C depends on the signs of the correlations A-B and A-C: If these correlations
have the same sign, the prediction of a positive sign is more plausible than
the prediction of a negative sign: If an increase (or decrease) of A
corresponds to an increase (or decrease) in both partners (B and C), this
favours a positive correlation between these two partners. Correspondingly, in
the case of different signs it is more plausible to expect a negative sign of
the third correlation (Figure 1).
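The sign rule, and the remark that the inference becomes more plausible the higher the two given correlations are, can be made concrete in a short sketch. The first function merely encodes the rule stated above. The second uses a standard property of Pearson correlations computed on one and the same sample (the correlation matrix cannot have negative eigenvalues), which is not part of the original argument and is added here as an assumption. The function names and the example coefficients are hypothetical.

```python
from math import sqrt

def predicted_sign(r_ab, r_ac):
    """Sign of B-C suggested by the rule: equal signs -> '+', different -> '-'."""
    product = r_ab * r_ac
    if product == 0:
        return "0"  # no prediction if one of the given correlations is zero
    return "+" if product > 0 else "-"

def third_correlation_bounds(r_ab, r_ac):
    """Range of r_BC that is mathematically compatible with r_AB and r_AC."""
    centre = r_ab * r_ac
    half_width = sqrt((1 - r_ab ** 2) * (1 - r_ac ** 2))
    return centre - half_width, centre + half_width

# Hypothetical coefficients: two strong and two weak correlations
strong = third_correlation_bounds(0.8, 0.9)
weak = third_correlation_bounds(0.3, 0.3)
print(predicted_sign(0.8, 0.9), strong)
print(predicted_sign(0.3, 0.3), weak)
```

For the two strong coefficients the whole admissible range of the third correlation lies above zero, so the positive sign is forced. For the two weak coefficients the range still includes negative values, so the rule yields only a plausible expectation.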
Figure 2 illustrates our theoretical model. It is,
first of all, inspired by the empirical and hypothetical arguments mentioned in
the last paragraphs of section 2. And it claims internal consistency, interlocking
several triangles (i.e. inductive inferences) of the sort explicated in
Figure 1: If, for instance, all correlations forming the square are negative,
then it is plausible to expect positive rather than negative correlations in
both diagonals. And if all the correlations, with the exception of rBC,
are already given with the signs as illustrated in the figure, then we may
expect a negative coefficient rBC.
But what about the empirical validity of this model?
Figure 1: Two given correlations (between A and B, between A and C) indicating a third
correlation (between B and C) and the sign of the third correlation: If the
edges A-B and A-C represent correlations with different signs (left panel), the
most valid prediction regarding a possible correlation B-C is a negative sign.
In cases of equal signs in A-B and A-C (right panel) the most valid prediction
regarding B-C is a positive sign.
Figure 2: Statistical arguments forming a „plausibility square” by interlocking four
triangles of the type shown in Figure 1. The pattern of inductive reasoning is
the same as in the triangles of Figure 1 (see text).
Assumptions were tested on a database of 32 languages.
(For 2 of our 34 languages, Annang and Ewondo, sufficient grammatical
information has not been available so far.) For all these assumptions the respective
crosslinguistic correlations showed the expected tendency, i.e. the expected
sign. Only correlations rCD (−0.145) and rAC (+0.056) were far from statistical significance.
Coefficients regarding rBD (−0.208) and rAB (+0.314)
were somewhat higher, and correlation rBC (−0.494) turned out to be
highly significant despite the relatively small sample of languages. And
correlation rCD, when computed only for those 20 languages having
case, was r = −0.371. This is rather promising: Given the same coefficient in
a sample with about ten more languages, this coefficient would already be
significant.
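The dependence of significance on sample size can be checked with a small sketch. It assumes a Pearson coefficient and a two-tailed t-test at the 5% level; the text does not state which test was used, so this is an assumption, and the function names are hypothetical. The two critical values are taken from standard tables of Student's t distribution.

```python
from math import sqrt

def t_statistic(r, n):
    """t value for testing a Pearson correlation r from n data pairs (df = n - 2)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Two-tailed 5% critical values of Student's t for df = 18 and df = 28
CRITICAL_T_5_PERCENT = {18: 2.101, 28: 2.048}

def is_significant(r, n):
    """True if |t| exceeds the tabulated two-tailed 5% critical value."""
    df = n - 2
    return abs(t_statistic(r, n)) > CRITICAL_T_5_PERCENT[df]

r_cd = -0.371  # coefficient reported for the 20 languages having case
print(is_significant(r_cd, 20))  # sample size of the present computation
print(is_significant(r_cd, 30))  # same coefficient, ten more languages
```

Under these assumptions the coefficient falls short of the critical value with 20 languages and exceeds it with 30.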
Figure 3: A comparison between stress-timed languages (left panel) and syllable-timed
languages (right panel) with respect to the parameters
A: mean number of syllables per sentence
B: tendency to prepositions
C: number of cases
D: mean number of phonemes per syllable
Thus we may say that the new statistical results match our theoretical construction
illustrated in Figure 2, either as highly significant (rBC), or as
almost significant (rCD), or at least with respect to the sign (+ or
−) of the correlation coefficient.
A good indication of the external validity and
prognostic value of our model are the facts
§ that all six correlational assumptions tested, i.e. all six lines forming this theoretical model (Figure 2), show the expected tendency (+ or −) and
§ that two of these correlations (rAD and rBC) are highly significant despite the rather small sample of languages.
If we connect, regardless of their significance, the
present results with our previous results, a division emerges between languages with
syllable-timed rhythm and languages with stress-timed rhythm (see Figure 3 and
Table 1). In Figure 3, comparing stress-timed with syllable-timed languages,
the pattern of correlations is the same in both cases and the same as in Figure
2. But high parameter values in the left panel correspond to low parameter values in the
right panel, and vice versa:
Stress-timed rhythm is associated with a low number of
complex syllables per word and per clause, a high number of words per clause,
and non-metric properties such as the tendency to prepositions and to a low
number of cases.
Syllable-timed rhythm is associated with a high number of
simple syllables per word and per clause, a low number of words per clause, and
non-metric properties such as the tendency to postpositions and to a high
number of cases.
Table 1: A comparison between languages with stress-timed rhythm versus languages with syllable-timed rhythm

stress-timed rhythm                  syllable-timed rhythm

metric properties:                   metric properties:
high n of phonemes per syllable      low n of phonemes per syllable
low n of syllables per clause        high n of syllables per clause
low n of syllables per word          high n of syllables per word
high n of words per clause           low n of words per clause

non-metric properties:               non-metric properties:
fusional or isolating morphology     agglutinative morphology
VO order                             OV order
tendency to prepositions             tendency to postpositions
low n of cases                       high n of cases
cumulative case exponents            separatist case exponents
Some aspects of our interpretation are already
anticipated in Table 1: Our empirical findings suggest that it is first of all
rhythm that discriminates between languages. A language’s
rhythmic organization seems to be a determinant rather than a consequence or
a specific aspect of its morphological type (isolating, agglutinative,
fusional). And the variability and size of the beats, the basic measure of
this rhythm, reflect first of all the respective language’s syllable complexity.
For instance: A language having exclusively V- and CV-syllables represents the
absolute minimum of both the size and the variability of beats.
So far the interpretation only deals with characteristics
of segmentation, especially with the syllable complexity going hand in hand
with characteristic rhythmic patterns. But how should we imagine the association
of such metric properties with other properties such as adposition order and
number of cases?
Stress-timed languages are often (e.g. Dauer 1983,
Auer 1993) characterized by their proneness to reduction processes such as the
deletion of unstressed vowels, which results in relatively complex syllables.
Such reduction processes will, of course, also affect (grammatical) morphemes.
And if stress-timed rhythm also favours the fusion, cumulation and deletion of
morphemes, this will result in fusional and/or isolating morphology. This
means, moreover, that cumulative exponents will predominantly occur in
stress-timed languages. According to Plank (1986:32) „cumulative exponents
simultaneously express at least two co-occuring inflexional categories without
being formally segmentable into two or more parts, while separatist exponents
express only one inflexional category of a word form.” And languages with
cumulative exponents tend to a lower number of cases than languages with
separatist exponents (Plank 1986). These tendencies taken together might
„explain” the associations found between certain phonological traits like
syllable complexity and morphological traits such as the number of cases.
But why do languages with cumulative case exponents
tend to a lower number of cases? And why are cumulative exponents, as reported
by Plank (1999), associated with variance and separatist case exponents with
invariance?
We would argue (Fenk-Oczlon & Fenk 2000) that
frequency (token frequency) is a key concept that offers a rather simple explanation:
If a language has predominantly multifunctional cases (one case „accumulates”
two or more functions), then this language will manage with a rather low number
of case forms but will need and use each one of these case forms very often.
And signs with high token frequency tend to both high variability and short
coding for obviously economic reasons (e.g. Zipf 1949). This explanation should
also hold for „split morphology” – cumulation/variance and
separation/invariance – within languages: the most frequent cases in a given
language will tend to cumulation and variance.
Let us again take up Mayerthaler’s idea that
„natural”, when applied to (grammatical) language universals, boils down to
„easy for the human brain”. Language has to meet cognitive constraints, such as
short-term memory limits in terms of elements or chunks of elements that can be
kept within the focal attention, and the time limit known as the „psychological
present”. All languages, irrespective of their typological character, have to
adapt to these constraints. According to the „holistic” or „systemic” approach
of typology, each language goes through „natural” self-regulatory processes optimizing
the interaction between its phonology, morphology, and syntax and the
interaction with its „natural” environment, e.g. the articulatory and the
cognitive system.
Auer, Peter (1993). Is a rhythm-based typology possible? A study of the role of prosody in phonological typology. KontRI Working Paper (University
Dauer, Rebecca M. (1983). Stress-timing and syllable-timing reanalysed. Journal of Phonetics 11, 51-62.
Donegan, Patricia & Stampe, David (1983). Rhythm and the holistic organization of language structure. In J.F. Richardson et al. (eds.), Papers from the Parasession on the Interplay of Phonology, Morphology and Syntax, 337-353.
Dressler, Wolfgang U. & Mayerthaler, Willi (1987). Introduction. In Wolfgang U. Dressler, Willi Mayerthaler, Otto Panagl & Wolfgang U. Wurzel (eds.), Leitmotifs in Natural Morphology, 3-20. Amsterdam/Philadelphia: John Benjamins.
Fenk-Oczlon, Gertraud (1983). Bedeutungseinheiten und sprachliche Segmentierung. Eine sprach-vergleichende Untersuchung über kognitive Determinanten der Kernsatzlänge. Tübingen: Gunter Narr.
- (1997). Thesen zu einer natürlichen Typologie. Papiere zur Linguistik 56, 1, 107-116.
Fenk-Oczlon, Gertraud & Fenk, August (1999). Cognition, quantitative linguistics, and systemic typology. Linguistic Typology 3-2, 151-177.
- (2000). The magical number seven in language and cognition: empirical evidence and prospects of future research. Papiere zur Linguistik 62/63, 3-14.
- (2002). The clausal structure of linguistic and pre-linguistic behavior. In T. Givón & Bertram F. Malle (eds.), The evolution of language out of pre-language, 215-229.
- (2003). Crosslinguistic correlations between size of syllables, number of cases, and adposition order. Paper presented at the Fifth International Conference of the Association for Linguistic Typology. Cagliari, September 2003.
Gabelentz, Georg von der (1901). Die Sprachwissenschaft: Ihre Aufgaben, Methoden und bisherigen Ergebnisse. 2nd edition. Leipzig: Tauchnitz.
Gil, David (1986). A prosodic typology of language. Folia Linguistica 20, 165-231.
Greenberg, Joseph H. (1966). Some universals of grammar with particular reference to the order of meaningful elements. In Joseph H. Greenberg (ed.), Universals of Language, 73-113.
Lehmann, Winfred P. (1978). English: A characteristic SVO language. In W. P. Lehmann (ed.), Syntactic typology: Studies in the phenomenology of language, 169-222.
Mayerthaler, Willi (1987). System-independent morphological naturalness. In Wolfgang U. Dressler, Willi Mayerthaler, Otto Panagl & Wolfgang U. Wurzel (eds.), Leitmotifs in Natural Morphology, 25-96. Amsterdam/Philadelphia: John Benjamins.
Miller, George A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review 63, 81-97.
Plank, Frans (1986). Paradigm size, morphological typology, and universal economy. Folia Linguistica 20, 29-48.
- (1998). The co-variation of phonology with morphology and syntax: A hopeful history. Linguistic Typology 2, 195-230.
- (1999).
Skalička, Vladimír (1935). Zur ungarischen Grammatik. Praha:
Zipf, George K. (1949). Human behavior and the principle of least effort. An introduction to human ecology.
Authors’ addresses:
Gertraud Fenk-Oczlon
Department of Linguistics and Computational Linguistics
August Fenk
Department of Media and Communication Studies
University of Klagenfurt, Universitaetsstrasse 65-67, 9020 Klagenfurt, Austria.
[1] This is an extended version of a paper
presented at the Fifth International Conference of the Association for
Linguistic Typology in
[2] Actually, the translations by 27 native speakers are already documented
in a doctoral dissertation (Fenk-Oczlon 1983) supervised by Willi Mayerthaler.
This study showed that Miller’s (1956)
„magical number seven” is also effective in natural language processing and in
the sense of a language universal. The above-mentioned 1999 study already
includes 34 languages, 18 Indo-European and 16 non-Indo-European. And meanwhile,
during a 2004 Fulbright research grant for Fenk-Oczlon at the