I have just read a paper describing 16 differences between Bosnian, Croatian, Montenegrin and Serbian. For each difference, the paper looked for the geographical border between the varieties that the difference separates. The paper also looked at whether those borders match national borders and how close the varieties are to each other.
The authors based their analysis on Twitter messages posted between 2013 and 2016. The paper is Borders and boundaries in Bosnian, Croatian, Montenegrin and Serbian: Twitter data to the rescue, by Nikola Ljubešić, Maja Miličević Petrović and Tanja Samardžić, Journal of Linguistic Geography, Volume 6 Issue 2 (2018).
I summarise below:
- some background on languages spoken in today’s states of Bosnia-Herzegovina, Croatia, Serbia and Montenegro
- the methodology for the study reported in the paper (including a description of the 16 differences studied)
- the results of the study
- the authors’ conclusions
I also give a link to a site providing more resources on this topic.
Background
This post deals with language varieties used in the area occupied today by Bosnia-Herzegovina, Croatia, Montenegro and Serbia. For simplicity, this post uses the term BCMS territories as shorthand for that area.
Linguistic varieties in the BCMS territories
Linguistic standardisation took place in the BCMS territories in the 19th century. At that time, several varieties of Southern Slavonic were used in the BCMS territories. Those varieties are best distinguished by two prominent features:
- the form of the question word meaning what: što / kaj / ča
- the modern counterpart of the Proto-Slavic vowel known as jat: [e] / [je] / [i]. The ča variety mostly has [i] as the modern counterpart of jat and the kaj variety has [e]. The što variety—which is the most widespread—can have any of [e] / [je] / [i].
(For a discussion of how jat developed in Russian and Bulgarian, please see Bulgarian through Russian – Language Miscellany)
When standardisation began, those features were distributed as follows:
- Belgrade (Serbia): što + e or što + je. The first variety was (and still is) predominant here, but the second one was also used, especially in folk tales and poems, highly valued at the time. The second variety also allowed a connection with the Serbian-oriented (Orthodox) population outside the territory under the influence of Belgrade.
- Zagreb (Croatia): kaj + e (Kajkavian) or ča + i (Čakavian) or što + je (Štokavian). There was no clear preference for any of the three varieties. The first one was (and still is) spoken in the city of Zagreb and had a literary tradition. The second one also had a rich literary tradition and prestige, mostly in Dalmatia. The third one was (and still is) most widely spread in the Croatian-oriented (Catholic) population.
- Sarajevo (Bosnia) and Cetinje (Montenegro): što + je was the only variety.
Standardisation in the 19th century
None of the main cultural centres—Belgrade (Serbia), Zagreb (Croatia), Sarajevo (Bosnia-Herzegovina), or Cetinje (Montenegro)—had the political power to implement standardisation fully. The BCMS territories were split between the Austro-Hungarian and Ottoman empires. Montenegro and Serbia were the first to obtain full independence (in 1878), though not within today’s borders.
The approach to standardisation was strongly influenced by an interplay between:
- an integrationist vision (of a common future for all Slavic groups living in a single independent Slavic state); and
- a separatist vision (of keeping local cultural and, especially, religious differences).
This interplay has continued to the present day.
Vuk Karadžić, a prominent 19th-century Serbian language reformer supported by the Austrian authorities, proposed adopting što + je. His proposal was accepted in Zagreb, the centre of the Illyrian movement. That movement had the goal of unifying all South Slavs and countering the dominance of the German and Hungarian languages. Belgrade accepted the proposal by Karadžić but kept the što + e version. Thus, almost the same variety was codified in all four centres. This provided the basis for further unification efforts, especially when Yugoslavia adopted Serbo-Croatian (or Croato-Serbian) as its main official language.
Almost as strong as demands for unifying the South Slavs were demands, especially in Zagreb, to keep local varieties and connections with not only Štokavian but also Kajkavian and Čakavian literary traditions, and to avoid Serbian dominance.
Divergences were codified in the 1960s into two ‘variants’: ‘eastern’ (Belgrade) and ‘western’ (Zagreb). From 1974 the constitution allowed separate ‘standard idioms’ in the four Serbo-Croatian speaking republics. After Yugoslavia broke up, the desire to re-codify and separate the four standards became predominant in all four centres. Nevertheless, all four centres still have as a base ‘almost the same’ variety: the one chosen for codification in the 19th century.
Names are still fluid
There may not yet be a consensus on whether the four language varieties spoken in the BCMS territories are:
- separate (though closely related) languages; or
- varieties or dialects of one (or more) language(s).
As a result, the names for the four varieties still seem to be fluid. Evidence of this fluidity comes from Montenegro’s 2011 census: respondents speaking one of the 4 varieties gave 9 different names for their mother tongues: Montenegrin (229,251), Serbian (265,895), Bosnian (33,077), Bosniak (3,662), Croatian (2,791), Montenegrin-Serbian (369), Croatian-Serbian (224), Serbo-Croat (12,559), Serbo-Montenegrin (618). For some speakers at least, some of the following labels might also refer to one of the 4 varieties: Mother tongue (3,318), Regional languages (458), Does not want to declare (24,748).
Other languages listed were: Albanian (32,671), Roma (5,169), Russian (1,026), Macedonian (529), Hungarian (225), English (185), German (129), Slovenian (107), Romanian (101), Other (2,917).
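To make the spread of names easier to compare, the nine labels quoted above can be turned into relative shares. The following is a minimal Python sketch. The dictionary and function names are my own, and the shares are relative only to respondents who gave one of these nine labels, not to the whole population.

```python
# Counts copied from the census figures quoted above (nine BCMS-related labels only).
CENSUS_LABELS = {
    "Montenegrin": 229_251,
    "Serbian": 265_895,
    "Bosnian": 33_077,
    "Bosniak": 3_662,
    "Croatian": 2_791,
    "Montenegrin-Serbian": 369,
    "Croatian-Serbian": 224,
    "Serbo-Croat": 12_559,
    "Serbo-Montenegrin": 618,
}


def label_shares(counts):
    """Return each label's share of the total across the given labels."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    shares = label_shares(CENSUS_LABELS)
    # Print labels from most to least frequent.
    for label, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{label:<20} {share:6.1%}")
```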
Methodology
The researchers studied 693,111 tweets (by 13,102 users) that were geo-encoded in Bosnia-Herzegovina, Croatia, Montenegro or Serbia and posted on Twitter between June 2013 and the end of 2016. The geo-encoding identifies where the user tweeted, not which variety of the language(s) the user speaks. For example, someone tweeting in Bosnia-Herzegovina might speak Bosnian, but might equally speak Serbian, Croatian or Montenegrin.
The researchers studied 16 variables, selected to meet the following 3 criteria:
- The variables reflected differences identified in grammars, orthography manuals, or studies on differences between the new standard languages. Most early studies focused on Serbian and Croatian. Recent ones have given more attention to Bosnian, largely to disentangle its Croatian-like and Serbian-like features. As the youngest of the standard languages, often viewed in the past as just a variant of Serbian, Montenegrin has had the least coverage.
- The variables needed to be easy to retrieve from the data automatically. Excluded were variables that are difficult to retrieve automatically, for example because of homonymy. For instance, the contrast between te (Croatian) and pa (Serbian), both meaning ‘then’, was not studied. This was because other words are also spelled te: the accusative singular 2nd person personal pronoun (as in vidim te ‘I see you’) and a demonstrative pronoun (as in te kuće ‘those houses’).
- The variables had to appear often enough to permit meaningful analysis of their spatial distribution. For example, the lexical variables studied were function words and not content words, such as voz / vlak ‘train’, hleb / kruh ‘bread’, even though differences in content words are often viewed as being the most prominent differences.
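The second criterion, automatic retrievability, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The country codes, sample tweets and function names are invented for illustration, and the sketch assumes each tweet has already been reduced to a (country, text) pair. It counts whole-word matches for the tko / ko variable and converts the counts into per-country shares.

```python
import re
from collections import Counter, defaultdict

# Hypothetical variable definition: each variant maps to a whole-word pattern.
WHO_VARIABLE = {
    "tko": re.compile(r"\btko\b", re.IGNORECASE),
    "ko": re.compile(r"\bko\b", re.IGNORECASE),
}

# Invented sample tweets, for illustration only.
SAMPLE_TWEETS = [
    ("HR", "Tko je to rekao?"),
    ("HR", "Ne znam tko dolazi."),
    ("RS", "Ko je to rekao?"),
    ("RS", "Ne znam ko dolazi, a ko ne."),
]


def count_variants(tweets, variable):
    """Count occurrences of each variant per country."""
    counts = defaultdict(Counter)
    for country, text in tweets:
        for variant, pattern in variable.items():
            counts[country][variant] += len(pattern.findall(text))
    return counts


def variant_shares(counts):
    """Convert raw counts into per-country relative frequencies."""
    shares = {}
    for country, counter in counts.items():
        total = sum(counter.values())
        if total:  # skip countries with no occurrences at all
            shares[country] = {v: n / total for v, n in counter.items()}
    return shares
```

The same approach would fail for te / pa: a whole-word pattern for te would count the conjunction, the pronoun and the demonstrative alike, which is the homonymy problem that led the researchers to exclude that pair.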
Results
The researchers focused on the geographical boundaries between the areas where different values of the variable occurred. For example, the analysis looked at the boundaries between areas where the interrogative pronoun meaning ‘who’ takes the form tko and the areas where it takes the form ko.
The researchers identified 4 patterns of results:
- Croatia vs the rest (4 of the 16 variables studied)
- Croatia and Bosnia-Herzegovina vs Montenegro and Serbia (6 variables)
- Serbia vs the rest (2 variables)
- Eastern vs Western, but division not aligned with political boundaries (4 variables)
Croatia vs the rest
For 4 variables, the researchers identified as dominant one form in Croatia but a different form in the other 3 countries:
- In deriving borrowings from international verbs, the verbal suffix -ira is typical in Croatia (as in promovirati ‘promote’, registrirati ‘register’), but -isa and -ova prevail in Serbia, Bosnia-Herzegovina and Montenegro (promovisati, registrovati).
- The interrogative pronoun meaning ‘who’ takes the form tko in Croatia but ko in Serbia, Bosnia-Herzegovina and Montenegro. The same goes for the derived pronouns niko/nitko ‘nobody’, svako/svatko ‘everybody’, neko/netko ‘somebody’, and iko/itko ‘anybody’.
- In words of Greek origin that originally began with ch-, k is used in Croatia (eg kemija ‘chemistry’), whereas h is used in Serbia, Bosnia-Herzegovina and Montenegro (eg hemija).
- In some words, eg jučer/juče ‘yesterday’ and također/takođe ‘also’, the final r can be kept (typical of Croatia) or dropped (typical of Serbia, Bosnia-Herzegovina and Montenegro).
Croatia and Bosnia-Herzegovina vs Montenegro and Serbia
For 6 variables, the researchers identified as dominant one form in Croatia and Bosnia-Herzegovina but a different form in Serbia and Montenegro:
- After modals (voliti ‘like’, moći ‘can’, morati ‘must’, smeti ‘dare, may’, trebati ‘need’) or phasal verbs (početi ‘begin’, završiti ‘end’), an infinitive is used in Croatia and Bosnia-Herzegovina if the subject remains the same (as in volim pisati ‘I like to write’). In Serbia and Montenegro, these verbs take as complement da (‘that’) + present tense form of the verb (as in volim da pišem ‘I like to write’, literally ‘I like that I write’)—a construction also typical of some other languages in the Balkans.
- In Croatia, where personal forms are normally accompanied by infinitives, the modal verb trebati ‘need’ normally appears in a personal form: trebam ići (‘I need to go’). In Serbia, trebati is often used impersonally: treba da idem (‘it needs that I go’).
- The intensifying adverbs mnogo and puno ‘many, a lot’ are both used in all variants of BCMS, but puno is particularly typical of Croatia and Bosnia-Herzegovina, and mnogo of Serbia and Montenegro.
- h is sometimes omitted at the beginning of a word, and omitted or replaced with an alternative (typically j or v) within a word. Examples are hrđa/rđa ‘rust’, snaha/snaja ‘sister/daughter-in-law’, čahura / čaura ‘cocoon; capsule’, and gluh/gluv ‘deaf’. The orthographic norm of Serbo-Croatian requires h where this reflects etymological criteria, and in practice Croatian and Bosnian keep the h, but Serbian mostly allows both forms. Furthermore, in Bosnian, h is added in some words that do not contain it in Croatian and did not contain it etymologically—examples are kahva ‘coffee’ (Croatian kava, Serbian kafa), lahko ‘easily’ (Croatian and Serbian lako). The Bosnian norm also bans suv ‘dry’, duvan ‘tobacco’, and other similar Serbo-Croatian alternatives to suh and duhan. Montenegrin seems to pattern with Serbian, but without a clearly formulated rule, and with some inconsistencies.
- Croatian has an analytic form of the future tense, with the infinitive (in its short form, discussed separately below) and the auxiliary (clitic forms of the auxiliary hteti ‘want’) written as separate words (as in pisat ću ‘I will write’). The future tense has a synthetic form for most verbs in Serbian, with the clitic merged onto the verb (as in pisaću). The analytic form is used in Serbian too when the verb ends in -ći (reći ću ‘I will say’). Bosnian uses both forms, apparently with no clear preference for one or the other. The Montenegrin norm allows both forms, but synthetic forms may be more common.
- The preposition s(a) ‘with’ is more often s in standard Croatian and Bosnian, but is more often sa in standard Serbian.
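The two future-tense patterns described in this list can be sketched as a small Python function for the first person singular. This is only an illustration of the examples given above (pisat ću, pisaću, reći ću). The function name is my own, and the sketch deliberately ignores any verbs whose stem changes before the clitic.

```python
def future_first_singular(infinitive: str, synthetic: bool) -> str:
    """Build a first-person-singular future form from a full infinitive.

    synthetic=False gives the analytic pattern (short infinitive + separate clitic).
    synthetic=True gives the synthetic pattern (clitic merged onto the verb).
    Simplified sketch: verbs with stem changes are not handled.
    """
    if infinitive.endswith("ći"):
        # Verbs in -ći keep the full infinitive and a separate clitic in both patterns.
        return f"{infinitive} ću"
    if not infinitive.endswith("ti"):
        raise ValueError(f"not a -ti or -ći infinitive: {infinitive!r}")
    if synthetic:
        # Drop -ti and attach the clitic directly: pisati -> pisaću.
        return infinitive[:-2] + "ću"
    # Drop only the final -i and write the clitic separately: pisati -> pisat ću.
    return infinitive[:-1] + " ću"


if __name__ == "__main__":
    for verb in ("pisati", "reći"):
        print(verb, future_first_singular(verb, synthetic=False),
              future_first_singular(verb, synthetic=True), sep=" | ")
```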
Serbia vs the rest
For two variables, the researchers found one dominant form in Serbia but a different dominant form in the other 3 countries.
- The modern counterpart of the Proto-Slavic vowel known as ‘jat’ is [e] (as in mleko ‘milk’, or pesma ‘song’) in Serbia but [(i)je] (mlijeko, pjesma) in Croatia, Bosnia-Herzegovina and Montenegro. The researchers comment that this is the most pervasive of all the differences they studied.
- To derive feminine agent nouns, the suffix -ica (as in nastavnica ‘teacher’) dominates in Croatia and Bosnia-Herzegovina, but in Serbian the suffixes -ka (čitateljka ‘reader’) and -inja (laborantkinja ‘lab technician’) are frequent as well. Inter-varietal differences between -ica and -ka mostly occur after -or and -ar (as in profesor – profesorica / profesorka ‘professor’, or zubar – zubarica / zubarka ‘dentist’). The study looked only at -rica and -rka, because both -ica and -ka are too generic and do not always mark agents. Also, the study did not look at the most widely discussed suffix pair, -telj/-lac (as in čitatelj/čitalac ‘reader’), because those endings are too generic.
Eastern vs Western, but division not aligned with political boundaries
For 4 variables, the researchers found an east-west split, but not along political boundaries. For the following 2 variables, the split was roughly Serbia vs the other 3 countries:
- Yes/no questions are asked using interrogative particles je li and da li. Je li is the norm in Croatian, where da li only occurs in the colloquial register. Serbian uses both forms, but commonly shortens je li to je l’, jel’ or jel and uses it colloquially, whereas the preferred full form is da li. Bosnian seems mixed, with a moderate preference for Croatian-type question forms. The Montenegrin orthography manual lists both je li and da li.
- The full infinitival form of verbs ends in either -ti or -ći (pisati ‘write’; ići ‘go’). In Croatian, it is common to shorten the infinitives by removing the final i (as in pisat, ić). Sometimes this occurs in forming the future tense (for verbs ending in -ti, eg pisat ću ‘I will write’, as mentioned above in discussing the analytic form of the future tense). Sometimes this occurs colloquially. This shortening is rare in Serbian.
For the other 2 variables, the split was one not seen for other variables: roughly Croatia and Montenegro on one side, Bosnia-Herzegovina and Serbia on the other side. This split too was not aligned closely with the political borders:
- In dialects based on što (as opposed to kaj and ča), the standard form of the interrogative pronoun ‘what’ is što in Croatian, Bosnian and Montenegrin. Serbian reference works list both šta and što, but šta is more common.
- A vowel is sometimes added to the end of an adjective for easier pronunciation or as a stylistic marker. The most typical case is -a in genitive singular masculine adjectives: for example novoga (‘of the new’, frequent in standard Croatian) instead of novog (more typical of Serbian).
Conclusions
The authors draw the following main conclusions from their analysis:
- For most variables, the boundaries between variants follow national political borders. These boundaries reflect long-standing linguistic and normative differences, as well as recent standardisation processes, which are separate for each country. However, the match is never complete, and boundaries differ for different variables.
- There is an overall split between east (Serbia) and west (Croatia). Bosnia-Herzegovina and Montenegro pattern sometimes with the east and sometimes with the west.
- Within Bosnia-Herzegovina, some linguistic boundaries reflect ethnic divides. For 5 variables, parts of Bosnia-Herzegovina populated heavily with either ethnic Croats or ethnic Serbs resemble that ethnic group’s ‘mother country’:
- In southern Bosnia-Herzegovina, with a large Croatian population, the form preferred in Croatia occurs for 2 variables: (1) initial k-, not h-, in words of Greek origin (eg kemija, not hemija); (2) retention of final -r in, for example, jučer ‘yesterday’ and također ‘also’.
- In the area of central-northern Bosnia-Herzegovina populated mostly by ethnic Serbs, the form preferred in Serbia occurs for 3 variables: (1) omission or replacement of h in, for example, rđa (vs hrđa), snaja (vs snaha), čaura (vs čahura) and gluv (vs gluh); (2) the synthetic form of the future tense (as in pisaću), rather than the analytic form (as in pisat ću); (3) more frequent use of sa rather than s (‘with’).
- Usage within Serbia is more consistent than within the other 3 countries. This is because ‘it is more compact dialect-wise than Croatia, and more centralized standard-wise than Bosnia-Herzegovina and Montenegro’. Usage is most diverse in Bosnia-Herzegovina, probably because its more mixed population contains competing influences from both Croatia and Serbia.
The authors also:
- comment that they observed a considerable degree of clustering between different variables. The researchers found no plausible linguistic explanations for the clusters of variables.
- caution that their results depend heavily on the particular variables they selected.
Another resource
In preparing this post, I came across a site at the University of Graz (Austria) that documents differences between Bosnian, Croatian and Serbian. That site is for the project Unterschiede zwischen dem Bosnischen/Bosniakischen, Kroatischen und Serbischen at
https://www-gewi.uni-graz.at/gralis/projektarium/BKS-Projekt/index.html