How many nouns are there in Finnish? A paper by Fred Karlsson investigates that question, and also examines the sound structure of Finnish nouns.
Karlsson used a machine-readable version of the Reverse Dictionary of Modern Standard Finnish (RDF, Suomen kielen käänteissanakirja), which lists 72,785 entries. Of those, 34,673 (47.6%) carry the word-class code ‘S’ for noun (including words like suomalainen ‘Finn; Finnish’, which have homonymous nominal and adjectival readings).
Number of nouns
Karlsson focussed on atomic free root morphemes (excluding derivatives and compounds) in the core vocabulary, that is, words presumed to be known by every normal native speaker. He therefore excluded:
- at least 16,000 fully productive and transparent derivatives like pysäköi+nti ‘parking’, nuole+skel+u ‘(habit of) licking’, ost+el+ija ‘one who habitually buys’, tilaa+ja ‘one who orders’, tanssi+ja+tar ‘female dancer’, dumppa+us ‘dumping’, suvaitse+vais+uus ‘tolerance’, riittä+mättöm+yys ‘insufficiency’, marksi+lainen ‘Marxist’.
- many morphologically complex nouns, for example the 251 words ending in -isti and the 285 ending in -ismi, such as aktivisti ‘activist’ and aktivismi ‘activism’.
- at least 5,000 clear borrowings, including around 4,000 nouns containing the foreign letters <b d g z x f c š w q> (the <g> of the native digraph <ng> and the <d> of genuinely Finnish words were not counted as foreign).
- specialised terms (eg musical terms) and obsolete and dialectal words.
Karlsson concluded that there are around 12,000 genuinely Finnish, morphologically atomic noun roots, perhaps even fewer. He suggests this is much lower than popular belief would have it.
Karlsson then went on to analyse the sound structure of the nouns, focussing on the monosyllabic and disyllabic ones.
In RDF, Karlsson found only 29 core-vocabulary nouns (words known to any normal speaker of Finnish) that consist of a single syllable in the nominative singular. Their sound patterns are shown below in terms of consonants (C) and vowels (V):
- Almost all of them (24) have the pattern CVV: hai, hää, jää, koi, kuu, kyy, luu, maa, pii, puu, pyy, pää, suo, suu, syy, sää, tee, tie, tiu, työ, täi, voi, vuo, vyö
- 4 have the pattern CVVC: mies (‘man’), syys (‘autumn’), hius (‘hair’), ruis (‘rye’). Three of them inflect in a way that suggests they were originally disyllabic: mies – miehe+n, hius – hiukse+n, ruis – rukii+n. Syys allows no inflection.
- One word has the pattern VV: yö (‘night’)
- No nouns occur with the patterns V, VVV, CV, CVC, VC or VVC (ien, oas and äes look like VVC, but they are disyllabic).
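The CV notation can be derived mechanically from spelling, because in native Finnish words each vowel letter fills one V slot and each consonant letter one C slot. The sketch below uses a hypothetical helper, cv_pattern, and assumes native spelling only (no <ng> digraph, no foreign letters); applied to the 29 monosyllables listed above, it reproduces the 24/4/1 split.

```python
from collections import Counter

# Finnish vowel letters; every other letter is treated as a consonant.
# Simplifying assumption: <ng> and foreign letters get no special handling.
VOWELS = set("aeiouyäö")


def cv_pattern(word: str) -> str:
    """Hypothetical helper: map each letter of a written word to C or V."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower())


# The 29 monosyllabic core-vocabulary nouns listed above.
MONOSYLLABLES = (
    "hai hää jää koi kuu kyy luu maa pii puu pyy pää suo suu syy sää "
    "tee tie tiu työ täi voi vuo vyö mies syys hius ruis yö"
).split()

counts = Counter(cv_pattern(word) for word in MONOSYLLABLES)
print(counts)  # Counter({'CVV': 24, 'CVVC': 4, 'VV': 1})
```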
Disyllabic nouns ending in an open syllable
Table 1 shows disyllabic nouns ending in an open syllable (a syllable ending in a vowel). A full stop <.> shows the boundary between syllables.
| Pattern | Number of nouns | Examples |
|---|---|---|
| CV.CV | 756 | kala, peto, maku |
| V.CV | 61 | aho, ele, äly |
| CVV.CV | 950 | jousi, laatu, määrä, tuoli |
| VV.CV | 52 | aamu, aika, ääni |
| CVC.CV | 1,795 | hihna, kukko, pentu |
| VC.CV | 143 | ahma, olki, ämmä |
| CVVC.CV | 628 | haaska, juusto, lieska |
| VVC.CV | 33 | aalto, aitta, äänne |
| CVCC.CV | 503 | harppi, kalske, lamppu |
| VCC.CV | 23 | ankka, arkki, yrtti |
| (C)VVCC.CV | 6 | aortta, nyanssi, seanssi |
Karlsson makes the following comments on the table:
- very few underived noun roots have a long vowel or diphthong in their 2nd syllable. There are, however, around 100 borrowings with one (eg filee, revyy, turnee) and many bimorphemic derivatives such as takuu ‘guarantee’ (from takaa-, the inflectional stem of the verb ‘to guarantee’).
- strikingly, long (bimoraic) 1st syllables, CVC.CV (1,795) and CVV.CV (950), are much commoner than the theoretically simplest pattern CV.CV (756).
- surprisingly, the trimoraic pattern CVVC.CV (628) is almost as frequent as monomoraic CV.CV.
- the monomoraic patterns (C)V.CV account for only 756 + 61 = 817 of the 4,950 nouns in the table (about 16%).
- the four-moraic pattern (C)VVCC.CV appears only in a few borrowings.
- as is to be expected, CV-initial 1st syllables (eg CV.CV) are much more frequent than V-initial 1st syllables (eg V.CV), by a factor of 10-20.
- VV.V does not occur in underived words; its only example is the derivative ai.e ‘intention’ (ai+e, from aiko- ‘intend’).
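The comments above can be checked by recomputing them from Table 1. The sketch below assumes a simple weight rule consistent with the labels used here (onset consonants carry no mora; each vowel slot and each coda consonant carries one), and writes the last row without its optional onset "(C)". The names TABLE_1, morae and ratios are illustrative only.

```python
from collections import Counter

# Table 1 counts keyed by the shape of the 1st syllable (the 2nd is always CV).
# The optional "(C)" onset of the last row is dropped: onsets add no mora.
TABLE_1 = {
    "CV": 756, "V": 61,
    "CVV": 950, "VV": 52,
    "CVC": 1795, "VC": 143,
    "CVVC": 628, "VVC": 33,
    "CVCC": 503, "VCC": 23,
    "VVCC": 6,
}


def morae(syllable: str) -> int:
    """Assumed weight rule: strip the onset, then count one mora per remaining slot."""
    return len(syllable.lstrip("C"))


total = sum(TABLE_1.values())
by_weight = Counter()
for shape, n in TABLE_1.items():
    by_weight[morae(shape)] += n

for weight in sorted(by_weight):
    share = by_weight[weight] / total
    print(f"{weight}-moraic 1st syllable: {by_weight[weight]} nouns ({share:.1%})")

# Consonant-initial vs vowel-initial counts for the same rhyme, eg CV vs V.
ratios = {
    shape: TABLE_1["C" + shape] / TABLE_1[shape]
    for shape in TABLE_1
    if shape.startswith("V") and "C" + shape in TABLE_1
}
for shape, ratio in ratios.items():
    print(f"C{shape} vs {shape}: {ratio:.1f} times as frequent")
```

Run as is, it reports 817 (16.5%), 2,940 (59.4%), 1,187 (24.0%) and 6 nouns for one to four morae, and consonant-initial to vowel-initial ratios between about 12 and 22.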
Disyllabic nouns ending in a closed syllable
The above table covers only disyllabic nouns with an open 2nd syllable. There are around 800 underived disyllabic nouns with a closed 2nd syllable (ending in a consonant). Some 650 of them have a bimoraic first syllable.
Karlsson’s paper does not analyse trisyllabic nouns in detail, but he says a quick check shows that more than 75% of them, too, have a bimoraic (or even heavier) 1st syllable.
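Adding the approximate closed-syllable figures to the Table 1 rows gives a rough total for underived disyllabic nouns, in line with Karlsson's conclusion that fewer than 6,000 of them exist, and shows the bimoraic preference holding for closed 2nd syllables as well. The closed-syllable numbers are Karlsson's round figures ("around", "some"), so the results are approximate.

```python
# Table 1 row counts (disyllabic nouns with an open 2nd syllable).
open_counts = [756, 61, 950, 52, 1795, 143, 628, 33, 503, 23, 6]
# Approximate figures for a closed 2nd syllable, as quoted in the text.
closed_total, closed_bimoraic = 800, 650

open_total = sum(open_counts)
print(open_total)                               # 4950
print(open_total + closed_total)                # 5750
print(f"{closed_bimoraic / closed_total:.0%}")  # 81%
```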
Preference for bimoraic 1st syllables
Karlsson also says the preference for bimoraic 1st syllables holds across the whole vocabulary: 75% of the lexemes listed in RDF have a 1st syllable that is at least bimoraic (CVC: 40,378; CVV: 13,899; CV: 17,171).
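The three counts add up to 71,448, slightly fewer than the 72,785 RDF entries, so the exact share depends on the denominator chosen. A minimal check of both readings:

```python
# First-syllable counts over all RDF lexemes, as quoted in the text.
cvc, cvv, cv = 40_378, 13_899, 17_171
rdf_entries = 72_785  # total number of RDF entries

at_least_bimoraic = cvc + cvv
share_of_rdf = at_least_bimoraic / rdf_entries          # over all entries
share_of_listed = at_least_bimoraic / (cvc + cvv + cv)  # over the three types

print(at_least_bimoraic)         # 54277
print(f"{share_of_rdf:.1%}")     # 74.6%
print(f"{share_of_listed:.1%}")  # 76.0%
```

Either way the result is about three quarters, as Karlsson states.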
Karlsson’s findings surprised him because languages generally prefer light CV syllables. He suggests the following possible reasons why Finnish may prefer bimoraic rather than monomoraic 1st syllables:
- languages with few phonemes (eg Finnish, with 21) tend to have longer words than languages with more phonemes. He cites a study finding that the mean word length in !Kung, an African Khoisan language with 147 phonemes, was only 4.02 segments, against 6.44 segments in Turkish, which has 28 phonemes.
- bimoraic (and heavier) syllables amplify the effect of word stress, which in Finnish is fixed on the first syllable.
- for morphophonological reasons, new words and borrowings prefer quantitative over qualitative consonant gradation. Quantitative consonant gradation is an alternation between the double consonants /pp/, /tt/ and /kk/, and their single counterparts /p/, /t/ and /k/. It can only occur in syllables that are (at least) bimoraic (eg rokki ‘rock’ [nominative singular] – roki+n [genitive singular]).
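The rokki ~ roki+n alternation can be mimicked with a toy rule. This sketch is hypothetical and deliberately narrow: weak_grade and genitive_singular are illustrative names, only the stops /pp tt kk/ are handled, and it assumes a vowel-final noun whose genitive singular is simply the stem plus -n. Real Finnish inflection (qualitative gradation, e-stems and so on) needs much more.

```python
import re

# Hypothetical toy rule: quantitative gradation of /pp tt kk/ only.
# Assumption: the noun ends in a vowel and its genitive singular is stem + -n;
# the -n closes the final syllable, which selects the weak (single) grade.
# Qualitative gradation and e-stems (jousi ~ jousen) are NOT modelled.
GEMINATE_STOP_BEFORE_FINAL_VOWEL = re.compile(r"(pp|tt|kk)([aeiouyäö])$")


def weak_grade(stem: str) -> str:
    """Shorten a geminate stop before the stem-final vowel (rokki -> roki)."""
    return GEMINATE_STOP_BEFORE_FINAL_VOWEL.sub(
        lambda m: m.group(1)[0] + m.group(2), stem
    )


def genitive_singular(noun: str) -> str:
    """Toy genitive singular: weak grade of the stem plus the ending -n."""
    return weak_grade(noun) + "n"


for noun in ["rokki", "lamppu", "aitta", "kala"]:
    print(noun, "->", genitive_singular(noun))
```

Nouns without a geminate stop, such as kala, pass through unchanged and simply take -n.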
The phonemes of Finnish
If phonemically long vowels and consonants are treated as combinations of identical phonemes, Finnish has 8 vowel phonemes /i e æ y ø u o a/.
/æ/ is standardly written as <ä> and /ø/ as <ö>.
Finnish has 13 consonant phonemes /p t k d s h v j l r m n ŋ/, giving 21 phonemes in all. /ŋ/ is phonemic only when long, and is written as the digraph <ng>.
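Treating long segments as doubled phonemes can be made concrete with a small transcription helper. This is a sketch under simplifying assumptions: to_phonemes is a hypothetical name, and only the spelling conventions mentioned here (<ä>, <ö> and <ng>) are converted.

```python
# Inventory as listed above; long vowels and consonants are written as
# two identical phonemes, so they need no symbols of their own.
VOWEL_PHONEMES = ["i", "e", "æ", "y", "ø", "u", "o", "a"]
CONSONANT_PHONEMES = ["p", "t", "k", "d", "s", "h", "v", "j", "l", "r", "m", "n", "ŋ"]

# Assumed spelling rules: only <ä>, <ö> and the digraph <ng> (always /ŋŋ/)
# are converted. Everything else, including <nk> (pronounced [ŋk]) and
# foreign letters, passes through unchanged.
SPELLING_RULES = [("ng", "ŋŋ"), ("ä", "æ"), ("ö", "ø")]


def to_phonemes(word: str) -> str:
    """Hypothetical helper: transcribe native Finnish spelling into phonemes."""
    result = word.lower()
    for letters, phonemes in SPELLING_RULES:
        result = result.replace(letters, phonemes)
    return result


print(len(VOWEL_PHONEMES) + len(CONSONANT_PHONEMES))  # 21
print(to_phonemes("kangas"), to_phonemes("pää"), to_phonemes("yö"))
```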
To sum up, Karlsson concluded that Finnish has in its core vocabulary only 12,000 or fewer underived native words. Only 29 of them are monosyllabic and fewer than 6,000 are disyllabic.
There is a strong preference for bimoraic 1st syllables. Trimoraic 1st syllables are about as common as monomoraic ones. CV-initial 1st syllables (eg CV.CV) are much more frequent than V-initial 1st syllables (eg V.CV). In disyllabic words, many more 2nd syllables are open than are closed.
Fred Karlsson (2005). Phonotactic complexity of Finnish nouns. In A. Arppe, L. Carlson, K. Lindén, J. Piitulainen, M. Suominen, M. Vainio, H. Westerlund and A. Yli-Jyrä (eds), Inquiries into Words, Constraints, and Contexts. Festschrift for Kimmo Koskenniemi on his 60th Birthday. Online at kimmo_festschrift_Karlsson (helsinki.fi).