Cyber-Latinica: A Comparative Analysis of Latinization in Internet Slavic (Volume 12, 2015)



The Latin alphabet is used in Slavic languages in non-standard, complex ways. This is especially evidenced in computer-mediated writing. For example, googling the Cyrillic-lettered nonsense phrase кгыылшн нфяшл kgyylshn nfyashl brings up search results referring to the Russian language (русский язык russkiy yazik). The connection between the input and the results becomes transparent if one types the transliterated phrase russkiy yazik in the Google search box using the standard QWERTY keyboard but choosing the 'non-phonetic' Russian layout.
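The keyboard correspondence behind this example can be sketched in a few lines of Python. The table below assumes the common Russian ЙЦУКЕН layout on a QWERTY keyboard and covers only the lowercase letter and punctuation keys; the function name is illustrative and not part of the original study.

```python
# Sketch: reproduce the effect of typing Latin text while the Russian
# (non-phonetic, ЙЦУКЕН) layout is active on a QWERTY keyboard.
# Assumption: only the lowercase letter and punctuation keys are mapped;
# every other character (spaces, digits) passes through unchanged.

QWERTY_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
JCUKEN_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"

# Key-by-key translation table (both strings have the same length).
LAYOUT_TABLE = str.maketrans(QWERTY_KEYS, JCUKEN_KEYS)


def type_on_russian_layout(text: str) -> str:
    """Return what appears on screen when `text` is typed key for key."""
    return text.lower().translate(LAYOUT_TABLE)


print(type_on_russian_layout("russkiy yazik"))  # кгыылшн нфяшл
```

A search engine that recognizes such input presumably applies the inverse mapping, although the article does not describe how Google actually handles it.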

As shown in Figure 1, the nonsense Cyrillic search term кгыылшн нфяшл prompts Google to suggest an alternative search term denoting the Russian language, in Russian Cyrillic. In addition, Google's "intelligent" search engine (a special algorithm built to give results that best fit the query) suggests four phrases in Russian, again all related to the language, including русский язык ('Russian language'), русский язык для детей ('Russian for children'), and русский язык в Украине ('Russian in Ukraine').

Figure 1. Google search results for кгыылшн нфяшл (kgyylshn nfyashl)

What made the developers of the Google search engine take into account such a random letter combination? And why would anyone use a five-letter cluster нфяшл with all but one letter representing consonants? These are some of the questions underpinning the complexities of the non-standard use of the Latin alphabet in Internet Slavic (Magner, 2001). This article addresses these and other related issues concerning the emergence and use of latinized paradigms (sub-systems) in Internet Slavic in languages written in the Cyrillic alphabet (hereafter called Cyrillic-alphabeted), based on the examples of Bulgarian, Macedonian, Russian, and Serbian (BMRS). It argues that, to a greater or lesser extent, these Slavic languages undergo latinization from below – the emergence of grassroots, non-standard Latin-based orthographic conventions – in computer-mediated writing.

In the literature on non-standard writing practices in Slavic languages there is only sparse and sporadic mention of latinization, and when it is mentioned, it is in support of other research objectives (e.g., Angermeyer, 2012; Hentschel, 1998; Magner, 2001; Mironovschi, 2007). To address this gap, the present study pursues two interdependent goals: first, to identify and describe non-standard latinization practices in Slavic; and second, to account for the emergence of the latinized subsystems, defined as sets of distinct, Latin-based orthographic conventions sharing a particular trait, such as the loss of diacritical markings or use of numerals to represent letters. In addition, a comparative perspective on Internet Slavic offers an opportunity to examine possible similarities as well as differences in non-standard online orthographies that cut across mutually-related languages.

This article is organized as follows. After a review of latinization online, including the terminology and corpus of work on the use of the Latin alphabet in Slavic and other non-Latin-alphabeted languages, I present historical and contemporary contexts of alphabet use in BMRS. A structural analysis is conducted to identify and describe the Latin-based subsystems emerging from online writing in the four languages. In a subsequent analysis, in addition to the role of the language and alphabet standardization that is discussed in the background section, factors which give rise to latinization and specific variation patterns in and across these languages are addressed: strategies underlying the non-standard graphemic encodings; orthographic economy in computer-mediated writing; shared alphabet inventory; alphabet and keyboard mappings; and diasporic, linguistic minority, and second language writing practices. The article concludes with suggestions for further research on the topic of latinization in Internet Slavic.

Latinization from Below and Orthographic Subsystems

Latinization refers to the substitution of native, non-Latin-based graphemes with graphemes from the Latin alphabet (which I will hereafter call by the Slavic name Latinica for Slavic) in languages where standard usage prescribes a different writing system. In the case of Serbian, whose standard permits the use of both alphabets,2 the term latinization denotes the preference of Latinica over Cyrillic.

The qualifier from below describes the language use and linguistic practices of mostly anonymous, “non-expert” users, as opposed to solutions imposed and prescribed by “experts,” for instance, in language academies. On this view, these de facto norms most likely will never be sanctioned from above (by the government or “experts”), but they are nevertheless recognized in certain domains, such as computer-mediated communication (CMC), as accepted and stabilized orthographic forms. In this article, the emphasis is on the exploration of bottom-up linguistic processes in what in the context of computer-mediated interaction has been described as “grass-roots standardization” (Dauber & Hinrichs, 2007, p. 22), “spontaneous spelling reform” (Jensen, 1995), or even an “anti-standard” (Sebba, 1998). The results of such processes are orthographic subsystems of varying degrees of autonomy.

An orthographic subsystem is defined as "a set of independent or interdependent orthographic variants, associated by a convention of use and/or by the presence of a common graphemic trait, such as the loss of diacritics (de-diacriticization)" (Ivković, 2013, p. 342). An orthographic subsystem is a system itself and a component of a suprasystem. In the context of the present article, both standard and non-standard conventionalized sets of orthographic paradigms constitute subsystems within the total Latin-alphabeted paradigm/suprasystem associated with a particular language. In principle, whereas standard orthographic subsystems are closed, autonomous, and complete, non-standard subsystems are open, more or less autonomous, and incomplete. The non-standard subsystems thus require, and allow, inter-systemic “mixing and matching” of graphemic representations dynamically, on the fly.

Computer-Mediated Latinization

In the pre-Internet age, latinization was confined to transcription and transliteration of texts written in non-Latin-based scripts. For example, the Latin-based orthography for Mandarin, pinyin, was formally adopted in 1958 in the People's Republic of China. It was devised for pedagogical reasons (Coulmas, 2003), to introduce the novice student of Mandarin, native and non-native speakers alike, to the complex Chinese logographic system (Rogers, 2005), as well as for other purposes, including library cataloguing and lexical glossing. For other non-Latin-based scripts, international transliteration standard encodings into Latin characters were introduced with similar functions in mind. Rosowsky (2010) illustrates how Latin-based transcription systems are used by Arabic, Panjabi, and Urdu speakers in the UK as a pedagogical 'shortcut' for learning the classical non-Latin scripts.

These standards are known by their numerical names for individual languages, suggesting that these fully functional and autonomous writing systems were created for special purposes rather than for everyday communication or as official alphabets. For example, the International Standard (ISO) 233 is the standard introduced for transliteration of Arabic characters into the Latin alphabet, ISO 259 for Hebrew, ISO 843 for Greek, and ISO 9 for transliteration of Cyrillic characters into Latin equivalents. What differentiates these orthographic systems from the writing patterns in new media is the source of the conventions (from above) and orthographic regularity.

The practice of using the Latin alphabet in place of non-Latin-based systems is characteristic and emblematic of language representation in digital discourses (including SMS messages and Internet discussion boards), taking place primarily at the grassroots level (from below). Latinization in computer-mediated writing has been documented in various languages, including Arabic (e.g., Jarbou & al-Share, 2012; Palfreyman & Khalil, 2007), Cantonese (Lee, 2007), Greek (e.g., Androutsopoulos, 2007), and Mandarin (e.g., Dai, 2009; Yang, 2007), to mention a few. Unlike the standardized Latin-based transcription systems (which, similar to the standard orthographies in their respective languages, require a strict adherence to the prescribed conventions in representing speech), latinization online is associated with more or less spontaneous, idiosyncratic, and heteroglossic practices.

The non-standard online orthographic forms exist alongside the official standard orthographies. One of the best-researched cases of the non-standard use of the Latin alphabet is 'Greeklish' or latinized Greek (Androutsopoulos, 2007, 2009; Tseliga, 2007). The use of Latin-alphabeted Greek is "restricted to contexts of computer mediated interaction" (Androutsopoulos, 2007, p. 227), but 'Greeklish' is also used by the Greek Diaspora, including in offline writing, as well as on road signs and in passports, for the purpose of transnational communication. 'Greeklish' draws on the Latin alphabet inventory without diacritical markings, both in online and offline writing.

Androutsopoulos (2009) identified two main transliteration schemes for Greeklish: phonetic (based on correspondences between Greek phonemes and Latin graphemes) and orthographic, which he reconstructs "on the basis of inductive generalizations and by taking into account users' metalinguistic awareness" (p. 231). Within the orthographic scheme, he further identifies two sub-schemes: the keyboard-based scheme, whereby users type on the keyboard as though typing in Greek letters, and the visual scheme, which "aims at simulating the shape of Greek letters with Latin characters as closely as possible" (p. 232). Notable examples of the visual sub-scheme in Greek are <w> for <ω>; the numerals <8>, <0>, and <9> for <θ>; and <3> for <ξ>. Similarly, in Arabic, the letter <ح> /ħ/ is represented as the numeral <7> because of the similarity in shape between the two (Palfreyman & al Khalil, 2003).

In her study of SMS compliments in Russian, Mironovschi (2007) found that among 187 SMS messages only 28 messages were written in Cyrillic. Some of the orthographic conventions she identified from her sample are the use of the graphemes <w> and <q> in place of the Cyrillic <ш> /ʂ/ and <я> /ya/, respectively, as well as the use of the numerals <4> and <6> in place of <ч> /tɕ/ and <ш> /ʂ/. These conventions, although predominantly occurring in online writing, are also documented in offline discourses. Angermeyer (2012), for example, reports the use of the numerals <4>, <6> and <3>, the last one replacing the Cyrillic letter <з> /z/, on the licence plates in the USA of obviously Russian-speaking owners, such as HE TPE3B не трезв ne trezv 'not sober' (p. 266).

Hentschel (1998) notes that in online discussion forums both Russian and Serbian posters often substitute letters in the Russian and Serbian alphabets with transcription conventions in the Latin alphabet. Referring to these forms as Internet Slavic, Magner (2001) claims that in the Slavic languages "all diacritics are eliminated and, if Cyrillic is involved, transliteration without diacritics also occurs" (p. 24). With regard to online writing practices in Serbian, Ivković (2013) argues that unlike in Russian and Greek, the use of Latinica in Serbian is rather "an alternative orthographic practice, since in this language Latinica is already one of two officially recognized scripts" (p. 337). In his empirical study of alphabet choice and use of non-standard variants on two Serbian news websites, Ivković observes two interconnected processes: the dominance of Latinica over Cyrillic and the stabilization of non-standard Latin variants.

The emergence and stabilization of certain orthographic forms and patterns is not random, but rather is motivated by the specifics of their historical development, including language and orthography standardizations. The following section examines the historical development of Cyrillic orthographies in BMRS.

A Comparative Historical Perspective on Orthographies in BMRS

Slavic literacy originated in the Balkans, among the South Slavs. However, it was the North, through the Russian language, that influenced both alphabet choice and the variant of Cyrillic used among many nations in Central Asia and the Far East. The power of the Russian-led empires, including the Soviet state, imposed the Russian version of Cyrillic on numerous languages of the former Soviet Union, as well as on some languages spoken predominantly outside the Soviet state, such as Mongolian. It is hard to determine the influence of Russian orthographic practices on Bulgarian, however, since the latter already used Cyrillic, and Slavic literacy originated in Bulgaria itself. It is more likely that Russian and Bulgarian orthographic practices were mutually reinforcing and interdependent. As a consequence, the Cyrillic inventory of Old Church Slavonic1 is considerably better preserved in Russian and Bulgarian Cyrillic than, for example, in Serbian Cyrillic (see Figure 2), Serbian being in closer sociolinguistic contact with, and a borrower from, the Latin-alphabeted languages German and Hungarian, especially in northern Serbia.

Serbian is today a clear example of a European language exhibiting synchronic digraphia (the use of two different scripts by the same speech/writing community; Dale, 1980; De Francis, 1984; Grivelet, 2001; Zima, 1974). The Cyrillic alphabet (ћирилица) is the preferred choice of the government (Ivić, 2001, p. 11), while Latinica, or Serbian written in the Latin alphabet, prevails in colloquial writing, especially on the Internet. Modern Serbian orthography follows the phonological principle of 'one phoneme, one grapheme.' The Serbian Cyrillic inventory also became the basis for the Macedonian alphabet system, with both languages (with some exceptions) following the motto 'write as you speak, and speak as it is written.'

The history of literary Macedonian can be traced to the latter part of the 18th century. However, it was not until 1944 that standardization of the literary language took place, based on the West Central region of Vardar Macedonia, which was then part of Yugoslavia (Friedman, 1993). Although structurally closer to Bulgarian than Serbian, standard Macedonian shares a considerable lexical pool with Serbian; also, with minor exceptions, it draws on the Serbian Cyrillic inventory. As Serbo-Croatian was the lingua franca of the former Yugoslavia, many Macedonians are still bilingual with Serbian. However, the extent of active bilingualism with Serbian most likely has decreased since the collapse of Yugoslavia in the 1990s. Conversely, the influence of Albanian in Macedonia has grown stronger due to a number of factors, including the improved status of the Latin-alphabeted Albanian language as a language at all levels of education, a working language of the national parliament, and an official and dominant language in some municipalities, especially in western Macedonia (e.g., Tetovo, Gostivar).

Research Questions

The results reported in a previous study of latinization and digraphia in Internet Serbian (Ivković, 2013) serve as the basis for the research questions examined in the present study. In his study of latinization and orthographic variation on two Serbian news websites, the bi-alphabetic Politika Online and the 'Latin-only' B92, Ivković found that the use of Latinica markedly exceeded the use of Cyrillic on the two websites (especially on B92), and that de-diacriticized Latinica was used more than standard Latin-based Serbian orthography. The present study expands on these findings by adopting a comparative perspective to account for the emergence of subsystems, as well as the similarities and differences in Internet writing across the other three Slavic languages: Bulgarian, Macedonian, and Russian. Thus the research questions that guide this study are:

RQ1: What type of Latin-alphabeted variations do the other three Slavic languages – Bulgarian, Macedonian, and Russian – exhibit in Internet writing?

RQ2: What factors drive the stabilization of ad hoc orthographic practices in Internet Slavic, resulting in the emergence of non-standard subsystems and similarities in writing on the Internet among specific languages?

Data and Methodology

To address the research questions, data (threads, comments, and sentences written in the Latin alphabet) were collected from YouTube, Facebook, and discussion forums in online news sites between January 2010 and March 2011. No algorithm for random data selection was used, as the study is not concerned with statistical measurements. Instead, the collection targeted material likely to elicit the use of a particular language: for instance, YouTube commentaries on a video posting of a Bulgarian, Macedonian, Russian or Serbian song, or commentaries on Macedonian and Serbian news websites (e.g., Nova Makedonija and B92). The list of orthographic variants examined in the present study is not (and cannot be) exhaustive, because a potential poster is free to express herself/himself by selecting various resources, although informal observation suggests that the variants account for the vast majority of instances of latinized letters found on the Internet in the four languages.

To describe and explain the motivations and functions of online latinization variation and emerging patterns in the four Slavic Cyrillic-alphabeted languages, the letters and letter clusters displaying variability were identified, sorted, and classified based on a common classifying principle, e.g., de-diacriticization (loss of diacritical markings) or numeration (substitution of alphabetic characters with numerals). For each language, a set of variants was selected for analysis. The variants were then compared and categorized into language-specific subsystems, such as LB1 (non-standard Latinica simple) for latinized Bulgarian, as described below.

Non-standard Orthographic Subsystems in Internet Bulgarian, Macedonian, Russian, and Serbian

The analysis of online orthographies in Serbian and Macedonian identified two main types of variation: de-diacriticization and transliteration. In these languages, the Cyrillic alphabet is used exclusively according to standard prescriptive rules (Table 1, SC; Table 2, MC), but the data in the Latin alphabet can be categorized into standard diacriticized Latin, only in Serbian (SL), as well as into several non-standard de-diacriticized latinized subsystems: non-standard Latinica simple (SL1, ML2), non-standard Latinica composite (SL2, ML3), and non-standard Albanized Latinica (MA). The subsystems are based on the sibilants /ʃ/ and /ʒ/ and the affricates /tʃ/ and /d͡ʒ/, together with /tɕ/ and /dʑ/ in Serbian and /c/ and /ɟ/ in Macedonian (see Appendix). These phonemes correspond to the graphemes that have diacritics in SL (Table 1).


Table 1 shows graphemic correspondences in the Internet Serbian subsystems.

Table 1. Orthographic subsystems in Internet Serbian (S)

Two non-standard subsystems were identified for Serbian: Serbian Latinica simple, without diacritics (SL1), and Latinica composite (SL2), alongside standard Serbian Cyrillic (SC) and standard Latinica with diacritics (SL).2 SL1 and SL2 are illustrated in Examples 1 and 2, respectively (taken from B92). In the examples that follow, the non-standard subsystems in each of the languages, such as SL1 in Serbian, are illustrated with sentences taken from the original sources, followed by analogous sentences in the standard version in each language (i.e., BC, MC, RC, SL). The bialphabetic Serbian is here represented with the Latin alphabet.


(1) Jedino mi nije jasno zasto, nazalost, neki od nasih trenera, koji su uzgred najbolji ucitelji kosarke u Evropi nikad se nisu oprobali u Americi. (SL1)

Jedino mi nije jasno zašto, nažalost, neki od naših trenera, koji su uzgred najbolji učitelji košarke u Evropi nikad se nisu oprobali u Americi. (SL)

'The only thing I don't understand is why some of our trainers, who are, by the way, the best basketball instructors in Europe, have never tested their skills in America.'


(2) Na republichkim izborima mogu svi da glasaju, a na opshtinskim samo ljudi iz te opshtine. (SL2)

Na republičkim izborima mogu svi da glasaju, a na opštinskim samo ljudi iz te opštine. (SL)

'Everyone can vote in state elections, but in municipal elections, only people from that municipality can vote.'

SL1 forms (Example 1) are characterized by the loss of diacritics (de-diacriticization), whereby the graphemes <č>/<ć>, <š>, and <ž> from the standard Serbian Latin inventory are represented by <c> (note the graphemic syncretism), <s>, and <z>, respectively. SL2 forms (Example 2) are characterized by one-to-many phoneme-to-grapheme relations, whereby the following substitutions take place: <č>/<ć> => <ch>, <š> => <sh>, <ž> => <zh>, and <đ> => <dj> (see Ivković, 2013).
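The two sets of substitutions can be written out as lookup tables. The following Python sketch is only an illustration: the names are hypothetical, and the treatment of <đ> in SL1 is left open because the description above does not specify it.

```python
# Sketch of the SL -> SL1 and SL -> SL2 substitutions described above.
# Assumption: <đ> is left untouched in SL1, since the text gives no SL1
# variant for it; table and function names are illustrative only.

SL1_SIMPLE = {"č": "c", "ć": "c", "š": "s", "ž": "z"}
SL2_COMPOSITE = {"č": "ch", "ć": "ch", "š": "sh", "ž": "zh", "đ": "dj"}


def respell(text: str, table: dict) -> str:
    """Re-spell standard Latinica (SL) text using one substitution table."""
    out = []
    for ch in text:
        repl = table.get(ch.lower())
        if repl is None:
            out.append(ch)                 # letter has no non-standard variant
        elif ch.isupper():
            out.append(repl.capitalize())  # keep initial capitals: Č -> Ch
        else:
            out.append(repl)
    return "".join(out)


print(respell("zašto, nažalost", SL1_SIMPLE))  # zasto, nazalost
print(respell("republičkim", SL2_COMPOSITE))   # republichkim
```

Because <č> and <ć> share a target in both tables, neither mapping can be reversed without lexical knowledge, which is the syncretism noted above.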


Table 2 shows graphemic correspondences in the Internet Macedonian subsystems. Four non-standard Latinized subsystems were identified: Macedonian Latinica with diacritics or an apostrophe (ML1), Latinica simple without diacritics (ML2), Latinica composite (ML3), and Albanized (Macedonian) Latinica (MA), besides the standard Macedonian Cyrillic subsystem (MC).

Table 2. Orthographic subsystems in Internet Macedonian (M)

These subsystems are illustrated in the following four examples, taken from YouTube (Example 3), Facebook (Example 4), and Nova Makedonija (Examples 5 and 6).


(3) Uživajte so našiot najsakan Toše i so edna od najdobrite njegovi baladi. (ML1)

Уживајте со нашиот најсакан Тоше и со една од најдобрите његови балади. (MC)

'Enjoy our beloved Toshe [a late, popular Macedonian singer] and one of his best ballads.'


(4) Мnogu mi se dopaga pesnata, ama frizurata i e stvarno mnogu losa i udara u oci. Toa e dobronameren komentar za nea, ako ne kazeme tuka, kako ke i kazeme? (ML2)

Многу ми се допаѓа песната, ама фризурата е стварно многу лоша и удара у очи. Тоа е добронамерен коментар за неа, ако не кажеме тука, како ќе и кажеме. (MC)

'I like the song very much, but her hairstyle is really very bad and strikes the eyes. This is a well-meaning commentary. If we don't say it here, how can we tell her that?'


(5) Vo toa vreme ushte mozheshe da se najdat chesni i poshteni lue. (ML3)

Во тоа време уште можеше да се најдат чесни и поштени луѓе. (MC)

'One could still find honest people then.'


(6) No i jas sum gazda vo mojata kuqa.Ti me kanish jas ne doagjam i reshen problem. Dosta beshe veqe. Koga makedonija beshe vo bllokada od juzhniot sosed, togash mu se najdoa vo llosho istocniot i zapadniot sosed. (MA)

Но и јас сум газда во мојата куќа. Ти ме каниш јас не доаѓам и решен проблем. Доста беше веќе. Кога Македонија беше во блокада од јужниот сосед, тогаш му се најдоа во лошо источниот и западниот сосед. (MC)

'But I am also the boss in my house. You invite me, and I don't come, and the problem is solved. Enough already. When Macedonia was under blockade by its southern neighbour, its eastern and western neighbours were there to help.'

Example 3 illustrates the ML1 subsystem, characterized by the presence of diacritics used in standard Serbian Latinica (SL). ML2 (Example 4) forms are characterized by the loss of diacritics. The de-diacriticization can be explained relative to the Latin standard subsystem in SL (e.g., <č>, <š>, <ž>), which serves as an intermediary subsystem underlying this strategy (see De-diacriticization below).

Figure 2, a Macedonian advertisement for mobile phones, is an illustration of the “encroachment” of Cyber-Latinica in the Macedonian linguistic landscape (Kramer et al., 2014). The same message is presented with slight variations on two signs on a storefront, most likely set up at different times, with ML2 on top and ML1 on the bottom.

Top sign, ML2: Servisiranje, Dekodiranje i Prodazba na site vidovi na mobilni telefoni i moblina galanterija / MOZNI I ZAMENI od stari za novi mobilni ili otkup na novi i stari telefoni

Bottom sign, ML1: Servisiranje, Dekodiranje I Prodažba na site vidovi na mobilni telefoni I mobilna galanterija / MOŽNI I ZAMENI na stari za novi mobilni ili otkup na novi I stari telefoni

‘Servicing, decoding and sale of all types of mobile phones and mobile devices / ALSO POSSIBLE EXCHANGES of older mobiles for new ones or purchase of new and older phones’

Figure 2. < z > on top and < ž > on the bottom: Cyber-Latinica in a store advertisement

On these signs, the Macedonian word for ‘sale’ (продажба) is spelled as both prodažba and prodazba, and ‘possible’ (можни) is spelled as both možni and mozni, using the conventions from the ML1 and ML2 subsystems, respectively. Another feature, and index, of computer-mediated digraphia in Macedonian is capitalization of the Macedonian conjunction i (‘and’) in the sign on the bottom, probably as a result of the ‘autocorrect’ feature in the (English-language based) word processor used to create the text, resulting in automatic conversion of the Macedonian conjunction, in latinized Macedonian written as i, into the first person English pronoun written with a capital I. In contrast, the non-standard capitalization of the first letters of the words for ‘servicing,’ ‘decoding,’ and ‘sale’ on both signs is most likely intentional and used for emphasis.

ML3 (Example 5) shares the variants with the Serbian Composite subsystem (SL2), except for the digraphs <kj> and <gj>, which represent the palatalized phonemes /c/ and /ɟ/, which do not exist in standard Serbian. These phonemes may also be represented with an apostrophe after the grapheme (e.g., srek'en). MA (Example 6) occurs only in the online writings of ethnic Albanians in Macedonia. This subsystem is characterized by the following encodings: 1) substitution of the Macedonian Cyrillic letter <ќ> by the Albanian Latin letter <q>, with <ќ> and <q> representing the voiceless palatal plosive /c/ in the respective languages; 2) de-diacriticization of <ç> /tʃ/, producing <c> for <ч>; and 3) double <ll>, the digraph being used inconsistently, but nevertheless reflecting the distinction between the velarized (<ll>, /ɫ/) and non-velarized (<l>, /l/) (hard and soft /l/),3 following the rules of Albanian phonology.


In Bulgarian and Russian, the subsystems are also based on the sibilants and affricates, as well as on cases where a single grapheme represents two phonemes (e.g., <я> in Bulgarian and Russian) or where a grapheme does not represent an individual phoneme, but has another function, such as <ь> to indicate palatalization of the preceding consonant in Russian (see Appendix).

Table 3 shows graphemic correspondences in the Internet Bulgarian subsystems. A structural analysis of the Bulgarian sample identified four non-standard latinized subsystems: Bulgarian Latinica simple (BL1), Bulgarian Latinica composite (BL2), Bulgarian Latinica numerized (BL3), and Bulgarian Latinica iconized (BL4), besides the standard Bulgarian Cyrillic subsystem (BC).

Table 3. Orthographic subsystems in Internet Bulgarian (B)

Examples 7 and 8, taken from YouTube and Facebook, respectively, illustrate the Bulgarian subsystems.


(7) Kato dojde (BL1) avgust, vsichki (BL2) balgari (BL1) v strznstvo se zavrashtat (BL2), a pak mestnite zachezvat (BL2) po vakanzii (BL1).

Като дойде август, всички българи в стрзнство се завращат, а пак местните зачезват по ваканции. (BC)

'When August comes, all the Bulgarians who live abroad return, and the local population goes on vacation.'


(8) Chestit rojden den :) Jiva (BL1) i zdrava, Uspeh vav (BL2) vsi4ko (BL3, BL4), lubov, 6tastie (BL3), kasmet (BL1),, neveroqtni (BL4) momenti sas (BL1) semeistvoto (BL1)

Честит рожден ден. Жива и здрава. Успех във всичко, лубов, щастие, късмет, невероятни моменти със семейството. (BC)

'Happy birthday. Health and long life. Success in everything, love, happiness, luck, wonderful moments with your family.'

In BL1, the graphemes representing the phonemes /j/, /ʒ/, /ts/, and /v/, unlike in Serbian and Macedonian, are borrowed from orthographic conventions in some major European Latin-alphabeted languages: from English, <y> for /j/; from French, <j> and <g> for /ʒ/; and from German, <w> for /v/ and <z> for /ts/. What is characteristic of Internet Bulgarian and Russian is the use of numerals <4> and <6> (BL3 and RL3) to represent sounds in the respective languages. In addition, the numeral <4> and the grapheme <q> visually resemble their Cyrillic counterparts, the graphemes <ч> and <я>.


The structural analysis of the Russian sample identified four non-standard Latinized subsystems: Russian Latinica simple (RL1), Russian Latinica composite (RL2), Russian Latinica numerized (RL3), and Russian Latinica iconized (RL4), besides the standard Russian Cyrillic subsystem (RC). Table 4 shows graphemic correspondences in the Internet Russian subsystems.

Table 4. Orthographic subsystems in Internet Russian (R)

These subsystems are illustrated in Examples 9-11 (from YouTube).


(9) A vot chitat' (RL2, RL1) kommentarii dazhe (RL2) ne hochetsya (RL2). Kakoe otnoshenie (RL2) natsional'nost (RL2, RL1) imeet k TALANTU? Kak-budto vse russkie ili amerikantsy (RL2) talantlivy (RL1)?

А вот читать комментарии даже не хочеться. Какое отношение национальность имеет к ТАЛАНТУ? Как-будто все русские или американцы талантливы? (RC)

'Well, I don't feel like reading the commentaries. What does nationality have to do with talent? As if all Russians or Americans were talented?'


(10) Nu da dlja (RL2) kakih detey (RL1)? etot dima bilan (RL1) ne stoit daje (RL1) bit pevtzom (RL1)! 4to (RL3,4) ti govorish (RL2)? Mojet (RL1) s greciyey sravnivat’ ne nado no tam bili namnogo lu4we (RL3) pevtzi ! Ta daje germaniya byla lu4she (RL4)!

Ну да для каких детей? Этот Дима Билан не стоит даже быт певцом! Что ты говоришь? Может с Грецие сравнивать не надо то там были намного лучше певцы. Та даже Германия была лучше! (RC)

'Well, for what kind of children? This Dima Bilan shouldn't be a singer! What are you talking about? Perhaps, one shouldn't compare with Greece, but there were much better singers there. Even Germany was better.'


(11) U teb9 ma6ina (RL3) kakaq (RL4)?

У тебя машина какая? (RC)

'What kind of car do you have?'

Russian displays patterns of latinization similar to Bulgarian in all four subsystems. Both languages are characterized by the use of numerals (e.g., <4> for <ч>) and graphemes from the Latin alphabet that bear some visual resemblance to their Cyrillic counterparts, such as <w> for <ш> and <q> for <я>. The major distinction in online latinization between Bulgarian and Russian is the use of an apostrophe in Russian, which in this language serves to represent palatalized consonants (e.g., sravnivat') (RL1). In Russian, unlike in Bulgarian, the ‘hard-soft’ (palatalized-non-palatalized) distinction is systematic, with most consonants coming in ‘hard-soft’ pairs.
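The numerized and iconized substitutions seen in the Russian examples can be collected in a small lookup table. The Python sketch below is only illustrative: it takes <9> for <я> from Example 11 (teb9 for тебя), files <4> under numeration even though it also resembles <ч> visually, and uses hypothetical names throughout.

```python
# Sketch: flag characters in a latinized Russian token that may stand for
# a Cyrillic letter via numeration or iconization.
# Assumptions: the table covers only substitutions attested in the examples;
# <w> and <q> are treated as iconic wherever they occur, so ordinary Latin
# uses of these letters (e.g., in loanwords) would be flagged as well.

RUSSIAN_SUBSTITUTIONS = {
    "4": ("ч", "numeration"),
    "6": ("ш", "numeration"),
    "9": ("я", "numeration"),
    "w": ("ш", "iconization"),
    "q": ("я", "iconization"),
}


def find_substitutions(token: str) -> list:
    """Return (latin_char, cyrillic_letter, strategy) for each flagged char."""
    return [
        (ch, *RUSSIAN_SUBSTITUTIONS[ch])
        for ch in token.lower()
        if ch in RUSSIAN_SUBSTITUTIONS
    ]


for token in ("ma6ina", "kakaq", "lu4we"):
    print(token, find_substitutions(token))
```

A token such as lu4we draws on both strategies at once, which matches the mixing of subsystems visible in Example 10. The table is specific to Russian: in the Bulgarian Example 8, <6> stands for <щ> (6tastie for щастие).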

Non-standard subsystems are open, unlike standard subsystems, which, as a rule, do not permit variations and are closed systems. The former are mostly unregulated and therefore susceptible to changes, variations, idiosyncrasies, inconsistencies, and hybridization (mixing of two or more subsystems). The non-standard subsystems tend to mix with other subsystems, especially – as shown in Example 8 in Bulgarian and Example 10 in Russian – with the 'numerized' and 'iconized' subsystems. These are the least stable, as they cover only a small number of graphemic substitutions and therefore require complementation from other subsystems. In contrast, some subsystems are relatively stable, such as the loss of diacritics in Serbian and Macedonian, since they cover larger sets of phonemes and are based on a single principle.

Strategies Underlying Orthographic Encodings

Emergent orthographic encodings are in principle motivated and not random, resulting from the interplay of strategies of varying degrees of complexity with various pragmatic and social factors. The strategies identified in this section are described and explained based on introspection (cf. Androutsopoulos, 2009, p. 241, on the role of introspection in describing visual schemes in 'Greeklish'). However, in some cases, material evidence, such as the loss of diacritics or iconic correspondence between referents, is adduced to support the analysis. The orthographic encodings4 identified in the sample of online texts considered in the present study are de-diacriticization, transliteration (anglicization, francization, germanization, italianization), numeration, and iconization.

De-diacriticization refers to the loss of diacritical markings while the base grapheme is preserved. De-diacriticization may result in graphemic syntagmatic syncretism, whereby a single grapheme comes to stand for two or more phonemes within the same subsystem. For example, the loss of the háček (caron) in <č> and the kreska (acute) in <ć> in Serbian results in the phonological overloading of the grapheme <c> in SL1, which comes to represent two additional phonemes, /tʃ/ and /tɕ/, besides the phoneme /ts/. De-diacriticization is characteristic of Serbian and Macedonian, as shown in the following two examples from these languages:

De-diacriticization of <č>/<ч> /tʃ/

(12) SL1: <c> <= De-diacriticize (č); covek ЧОВЕК/ČOVEK 'man'

(13) ML1: <c>(č) <= De-diacriticize (SerbCroLatinize(ч)); pocetok ПОЧЕТОК/POČETOK 'beginning'

Whereas in the bi-alphabetic Serbian (SL1), the de-diacriticization of <č> is direct, in Cyrillic-alphabeted Macedonian, the change involves an intermediate step of non-standard latinization (marked with an asterisk) drawing from the Serbian Latinica inventory, followed by de-diacriticization. Another example of mediated de-diacriticization is the loss of the cédille in <ç> in MA (the Albanized Macedonian subsystem), as shown in (14):

(14) MA: <c>(ç) <= De-diacriticize (AlbLatinize(ч)); istocniot ИСТОЧНИОТ/ISTOÇNIOT 'eastern'
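
The De-diacriticize() step in (12)-(14) can be illustrated with a short Python sketch. It assumes that Unicode canonical decomposition is an acceptable stand-in for what writers do by hand; the function name `dediacriticize` is hypothetical and not part of the study. Letters whose mark is not a separable combining character (e.g., <đ>) are left unchanged by this sketch.

```python
import unicodedata

def dediacriticize(text: str) -> str:
    """Drop combining marks (hacek, kreska, cedille) and keep the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

# Intermediate Latin forms from examples (12)-(14).
for word in ("čovek", "početok", "istoçniot"):
    print(word, "->", dediacriticize(word))

# Syncretism: three distinct graphemes collapse into the single grapheme <c>.
print({dediacriticize(g) for g in ("c", "č", "ć")})  # {'c'}
```

The last line shows the overloading of <c> described above: once the marks are removed, the three graphemes can no longer be told apart without context.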

Transliteration draws on the existing alphabet inventories of major Latin-alphabeted languages, primarily English, but also French, German, and Italian. More specifically, transliteration is here referred to as anglicization, francization, germanization, and italianization, denoting "a stabilization of transliteration norms drawing on the orthographic conventions of [the respective language]" (Ivković, 2013, p. 342), as shown in the following examples:

Anglicization /ʃ/

(15) RL2: <sh> <= Anglicize(ш); e.g., dumaesh ДУМАЕШЬ 'you think'

Francization /ʒ/

(16) RL1: <j> <= Francisize(ж); e.g., mojet МОЖЕТ 'can' 3rd pers. sing.

(17) BL1: <g> <= Francisize(ж); e.g., Snege СНЕЖЕ 'female name'

Germanization/italianization /ts/

(18) BL1: <z> <= Germаnize/Italianize(ц); e.g., vakanzia ВАКАНЦИЯ 'vacation'

Anglicization mostly follows the Russian version of the ISO 9 standard for Cyrillic-based languages (that is, GOST 7.79 System B), whereby one Cyrillic character corresponds to one or more Latin characters without diacritics (15).5 Where the keyboards do not support diacritics, the English orthography is 'ideal' since the English alphabet has no letters with diacritics. Transliteration schemes based on orthographic practices in other major European languages – French (16, 17), German, and Italian (18) – also occur, but are sporadic and less systematic. Certain conventions, such as the use of <w> for /v/ and <ch> for /ʃ/, are indexical of diasporic writing and are influenced by the conventions of the language from the adoptive country (e.g., German and French in Germany and France) (see Bilingualism).
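
As a rough illustration of how the function notation in (15) and (16) can be read, the following Python sketch treats each transliteration convention as a small lookup table layered over a base table. The base table is hypothetical and covers only the letters needed for the two examples; dropping the soft sign <ь> is an assumption taken from the form dumaesh in (15), not a general rule.

```python
# Hypothetical base table, limited to the letters in examples (15) and (16).
BASE = {
    "а": "a", "д": "d", "е": "e", "м": "m", "о": "o", "т": "t", "у": "u",
    "ь": "",  # assumption: soft sign dropped, as in dumaesh
}

ANGLICIZE = {"ш": "sh"}   # (15) RL2: <sh> <= Anglicize(ш)
FRANCISIZE = {"ж": "j"}   # (16) RL1: <j> <= Francisize(ж)

def transliterate(word: str, *conventions: dict) -> str:
    """Apply the base table plus any convention-specific overrides."""
    table = dict(BASE)
    for convention in conventions:
        table.update(convention)
    # Characters missing from the table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in word.lower())

print(transliterate("думаешь", ANGLICIZE))   # dumaesh
print(transliterate("может", FRANCISIZE))    # mojet
```

Keeping each convention in its own table makes it easy to combine several of them in one call, which mirrors the mixing of subsystems observed in the data.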

Iconization results in an iconic relationship being established between the underlying referent (i.e., the Cyrillic letter) and the resulting referent (i.e., a numeral or a letter from another orthographic system) via visual schemes. Describing the numerical substitution of graphemes in digitally mediated writing in Arabic as ASCII-ization, Palfreyman and al Khalil (2007, p. 53) note that in this language, "visual resemblance is clearer in some cases than in others, and in some cases involves mirror-image reversal of all or part of the symbol." Iconization is present in Internet Bulgarian and Russian, as shown in (19) and (20):

(19) BL4, RL4: <q> <= Iconize(я); e.g., priqtelka ПРИЯТЕЛКА ('female friend' in Bulgarian)

(20) RL4: <w> <= Iconize(ш); e.g., mawina МАШИНА ('car' in Russian)

The graphical resemblance between Latin <w>, <q>, and Cyrillic <ш>, <я>, respectively, is somewhat sketchy, lacking all the details, but preserving key features. The letters <q> and <я> have a circle in common (that is on the same side of the main vertical), whereas the letters <w> and <ш> (in the cursive form represented as <ш>) share three vertical bars with connections at the bottom, which seems to be enough to approximate (that is, allow users to code and decode with a relatively high degree of accuracy) their correspondences even in the absence of context.

Numeration is a form of numeric graphization, or encoding of graphemes into numerals based on visual resemblances. The numerals thus acquire alphabetic functions and lose the numerical functions of quantification and ordering. Systematic numeration takes place only in Russian and Bulgarian (21, 22). Numeration may be coupled with another supporting function, such as digitization, as shown in (23):

(21) BL3, RL3: <6> <= Numerize(шесть); e.g., ma6ina МАШИНА ('car' in Russian)

(22) BL3, RL3: <4> <= Numerize(4 <= Digitize(Iconize(ч))); e.g., zna4i ЗНАЧИ ('mean, signify' in Bulgarian)

In Slavic, the word for the numeral 6 (шесть in Russian, шест in Bulgarian, Macedonian, and Serbian) begins with the phoneme represented by the letter <ш>. Similarly, the word for the numeral 4 (четыре in Russian, четири in Bulgarian, Macedonian, and Serbian) begins with the phoneme /tʃ/ represented by the letter <ч>. The numeral <4> also has an iconic relationship with the Cyrillic letter <ч> in its non-digital variant. In the case of substitution of the letter <ч> with the numeral <4> in Bulgarian and Russian, an additional strategy, here termed digitization, takes place, as demonstrated in (23).
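
The first-letter principle behind <6> and <4> can be sketched in Python as follows. The numeral words are the Russian forms cited above; the base latinization table is hypothetical and limited to the letters needed for the two examples. The sketch models only the link between a numeral and the first letter of its name, not the visual (iconic) link between <4> and <ч>.

```python
# Russian numeral words cited in the text.
NUMERAL_WORDS = {"6": "шесть", "4": "четыре"}

# First-letter rule: the initial letter of the numeral word is replaced by the digit.
NUMERIZE = {word[0]: digit for digit, word in NUMERAL_WORDS.items()}

# Hypothetical base latinization, limited to the letters in the examples.
BASE = {"а": "a", "з": "z", "и": "i", "м": "m", "н": "n"}

def numerize(word: str) -> str:
    """Latinize a word, writing <ш> and <ч> as the numerals derived above."""
    table = {**BASE, **NUMERIZE}
    return "".join(table.get(ch, ch) for ch in word.lower())

print(NUMERIZE)             # {'ш': '6', 'ч': '4'}
print(numerize("машина"))   # ma6ina
print(numerize("значи"))    # zna4i
```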

(23) RL3, RL4: '4to'

1. (ч)to <= Latinize(что) ('what'); cf. Bulgarian 'pove4e' ПОВЕЧЕ ('more')

2. <handwritten 4> <= Numerize(ч)

3. <4> <= Digitize(<handwritten 4>)

4. <4to> <= Append(4, to)

The letter <ч> in что (Russian for 'what') undergoes a number of intermediary changes: from the letter <ч> to the almost identical visual representation of the numeral 'four' as written by hand (non-digitally), and from there to its digital version, <4>. The underlying phonological value of the letter <ч> in the word что, however, is not /tʃ/ but /ʂ/,6 which may serve as evidence that online writers of Russian and Bulgarian rely primarily on visual schemes in latinization. Note that the relationship between the numeral and the grapheme <ч> is iconic.

In contrast, in Serbian and Macedonian, where the spelling-sound correspondence is direct, latinization schemes are tied to phonological representations. In a set of five experiments conducted among Serbian-language skilled readers of both alphabets on the role of word-level recognition and visual transference from one alphabet to another, Kumar (2008, p. 313) concluded that "skilled readers do not appear to rely on a style of analysis that is primarily tied to the visual form of a word." Rather, recognition is on the level of phonological representations, where the mapping is one-to-one. Internet Serbian, in contrast, tends to eliminate "redundant" diacritical markings that otherwise exist in standard Serbian Latinica, following the principle of orthographic economy in computer-mediated writing.

Subsystemic switching among variants may also occur and is motivated by ad hoc pragmatic concerns at the word, sentence, or discourse level. As shown in Example 10, the letter <ш> in the Russian word лучше ('better') is written both as <sh> and <w> (that is, lu4she and lu4we) by the same poster and in the same paragraph. While these idiosyncrasies may not be explainable even by the poster herself/himself, the resulting patterns may be explained by circumstantial factors such as the available technology, alphabet standardization, and delocalized writing communities, as well as general principles of human cognition, such as economy in linguistic production. These pragmatic, technological, and social factors are discussed in the following section.
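
The hybrid forms lu4she and lu4we can be decoded with one table that accepts several subsystems at once, as in the Python sketch below. The table is hypothetical and covers only the characters in these two forms; the function tries a two-character digraph before a single character, which is one simple way (an assumption, not a claim about how readers process such text) to keep <sh> from being read as <s> plus <h>.

```python
# Hypothetical decoding table for the two spellings of лучше in Example 10.
DECODE = {
    "sh": "ш",  # anglicized digraph (RL2)
    "w": "ш",   # iconized letter (RL4)
    "4": "ч",   # numeral standing for <ч> (RL3)
    "l": "л", "u": "у", "e": "е",
}

def decode(word: str) -> str:
    """Map a latinized word back to Cyrillic, longest match first."""
    out, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in DECODE:
                out.append(DECODE[chunk])
                i += len(chunk)
                break
        else:
            out.append(word[i])  # unknown characters pass through unchanged
            i += 1
    return "".join(out)

print(decode("lu4she"))  # лучше
print(decode("lu4we"))   # лучше
```

Both spellings resolve to the same Cyrillic word, which illustrates why the variation within a single post does not hinder comprehension.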

Factors Underlying the Emergence of Variation Patterns

What accounts for the non-standard orthographic variants in BMRS? This section discusses several factors that give rise to the emergence of orthographic subsystems and that also account for interlinguistic similarities between Bulgarian and Russian on the one hand, and Macedonian and Serbian on the other: orthographic economy in computer-mediated writing, shared alphabet inventory, alphabet and keyboard mappings, and societal bilingualism.

Orthographic Economy in Computer-Mediated Writing

Economy is at the heart of language use. Pragmatic considerations favor those forms that align themselves with some function, giving priority to cognitive/mental and/or physical shortcuts. One of these shortcuts is the omission of diacritics where these secondary markings may not be seen as absolutely necessary, especially if the presence of context provides additional information. Given sufficient contextualization cues to avoid possible ambiguity, these arguably redundant forms tend to disappear, if forms are not imposed from above. In this sense, the elimination of diacritics constitutes one of the core features of orthographic economy.

By orthographic economy in computer-mediated writing, I mean achieving maximum effect with minimum cognitive and physical effort in orthographic production (writing). The principle of orthographic economy is in accordance with Grice's Maxim of Quantity (1989, p. 26), which states:

1. Make your contribution as informative as is required (for the current purposes of the exchange); and

2. Do not make your contribution more informative than is required.

Translated from spoken to written language use (specifically, the pragmatics of orthography), the maxim of quantity in computer-mediated writing might read as follows:

1a) Spell/write your words as informatively as is required (for the current purposes of the written exchange); and

2a) Do not use "redundant" orthographic markings (e.g., cédille/cedilha, háček, kreska, hooks, tilde) more than is required, e.g., to avoid ambiguity.

The maxim as applied to CMC can be interpreted as an explanation for typographical "laziness"; at the same time, the Maxim of Quantity has a cognitive basis. Anis (2007), for instance, argues that "the loosening of norms," such as loss of accents in French, is related to "orthographic negligence," which in turn is associated with the "reduction of cognitive resources allocated to spelling" (p. 95). The principle of orthographic economy complies with Zipf's Principle of Least Effort (1949). According to Zipf, people naturally choose the "path of least resistance" that facilitates forward motion by a given object or entity.

At the same time, it is important to note that the diacritics are not entirely redundant, as they serve to differentiate word meanings in some cases. For example, in Internet Serbian, without the kreska in <ć> and contextual cues, the sentence Moja kuca je l(ij)epa ('My house (kuća)/little dog (kuca) is beautiful') is ambiguous. The loss of the diacritic may consequently lead to an increase in ambiguity and higher processing demands. In this case, the role of context is crucial. Even when context is present, the omission of diacritics may represent an additional hindrance when reading, especially for foreign language learners. The overwhelming tendency in online writing to drop diacritics (see Ivković, 2013), however, supports the claim that diacritics carry a relatively light information load. As a consequence, they can be omitted without much loss of comprehensibility.
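
The kind of ambiguity illustrated by kuća/kuca can be detected mechanically by grouping words under their de-diacriticized form, as in the following Python sketch. The mini-lexicon is hypothetical (only kuća and kuca come from the text), and the helper `dediacriticize` is an illustrative name that relies on Unicode decomposition.

```python
import unicodedata
from collections import defaultdict

def dediacriticize(text: str) -> str:
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical mini-lexicon; only kuća/kuca are taken from the text.
LEXICON = ["kuća", "kuca", "moja", "lepa"]

def collisions(words):
    """Return de-diacriticized forms shared by two or more distinct words."""
    groups = defaultdict(set)
    for word in words:
        groups[dediacriticize(word)].add(word)
    return {form: sorted(ws) for form, ws in groups.items() if len(ws) > 1}

print(collisions(LEXICON))  # {'kuca': ['kuca', 'kuća']}
```

Only one of the four words collides in this toy lexicon, which is in line with the observation that diacritics carry a relatively light information load.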

In addition to the low information load diacritics carry, they also require extra effort to produce. If one's software and hardware are not already adjusted to standard orthographic practices in a particular language, producing diacritics may require using additional keystrokes (e.g., Alt+Shift in Windows), performing extra steps – such as installing and selecting/changing a language, searching for language-specific unmarked Cyrillic graphemes on a Latin-based keyboard (including punctuation symbols) (see Tables 3 and 4) – and switching between languages on the keyboard. These are all possible deterrents to their use.

The latinization strategies described above and the suggested Zipfian principles of orthographic economy in CMC may account for the emergence of specific subsystems across languages (cross-linguistic similarities) in general. Shared alphabet inventory, alphabet and keyboard mapping, and societal bilingualism (diasporic, linguistic minority, and second language writers), in turn, help to explain the inter-linguistic similarities in online writing between Bulgarian and Russian on the one hand, and Macedonian and Serbian on the other.

Shared Alphabet Inventory

It is no coincidence that Macedonian displays non-standard latinization patterns similar to those in Serbian, while Bulgarian compares to Russian, since these language pairs also share similar alphabet inventories (Figure 3). When selecting novel, non-standard graphemic representations, online writers draw from existing, standard conventions and make necessary accommodations in the digital context. In Bulgarian, Macedonian, and Russian, standard writing requires the use of Cyrillic, whereas in Serbian both alphabets are permitted. However, Bulgarian uses a version of Cyrillic similar to that of Russian, whereas Macedonian uses Serbian Cyrillic, with some modifications to accommodate Macedonian phonology. Figure 3 shows the total of 42 graphemes used in BMRS Cyrillic (for phoneme-to-grapheme correspondences in BMRS, see the Appendix).

Figure 3. A Venn synchronic view of the Cyrillic inventory in BMRS

In addition to a shared class of 24 graphemes, shown in the central circle, there are six language-specific circles: 1) graphemes common to Bulgarian and Russian only, with the same phonological referents (<я>, <ю>, <й>, <ь>); 2) graphemes common to Bulgarian and Russian only, with different phonological referents (<щ>, <ъ>); 3) graphemes specific to Russian (<э>, <ы>, <ё>); 4) graphemes common to Serbian and Macedonian only (<љ>, <њ>, <ј>, <џ>); 5) graphemes specific to Serbian (<ђ>, <ћ>); and 6) graphemes specific to Macedonian (<ѓ>, <ќ>, <s>).
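
The arithmetic behind the total of 42 graphemes can be checked with a few set operations in Python. The six language-specific sets follow the list above; the 24 shared letters are not enumerated in the text, so the `SHARED` set below is an assumption reconstructed from the standard alphabets. The Macedonian letter written <s> above is encoded here as the Cyrillic character <ѕ>.

```python
# Assumed shared core (not listed in the text), reconstructed from the standard alphabets.
SHARED = set("абвгдежзиклмнопрстуфхцчш")

BG_RU_SAME = set("яюйь")   # 1) Bulgarian and Russian, same referents
BG_RU_DIFF = set("щъ")     # 2) Bulgarian and Russian, different referents
RU_ONLY = set("эыё")       # 3) Russian only
SR_MK = set("љњјџ")        # 4) Serbian and Macedonian
SR_ONLY = set("ђћ")        # 5) Serbian only
MK_ONLY = set("ѓќѕ")       # 6) Macedonian only

ALPHABETS = {
    "Bulgarian": SHARED | BG_RU_SAME | BG_RU_DIFF,
    "Macedonian": SHARED | SR_MK | MK_ONLY,
    "Russian": SHARED | BG_RU_SAME | BG_RU_DIFF | RU_ONLY,
    "Serbian": SHARED | SR_MK | SR_ONLY,
}

for name, letters in ALPHABETS.items():
    print(name, len(letters))

total = set().union(*ALPHABETS.values())
print(len(SHARED), len(total))  # 24 42
```

Under this reconstruction, the individual alphabets come out at 30 (Bulgarian), 31 (Macedonian), 33 (Russian), and 30 (Serbian) letters, which matches the sizes of the standard alphabets.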

While Bulgarian and Russian draw on the Cyrillic inventory of Old Church Slavonic, Serbian and Macedonian use a number of graphemes either directly from, or that are influenced by, the Latin alphabet. Based on the Serbian version of the Cyrillic alphabet, the Macedonian standard adopted the same letter, and also introduced the letter <s> from the Latin alphabet, albeit for the phoneme /d͡z/, not /s/ (see Appendix). In addition, to indicate the 'softness' of the palatalized consonants, the Macedonian standard introduced the superscript line (kreska or 'little stroke' in Polish) (e.g., <ѓ>), which is the marker of 'softness' in Polish Latinica (e.g., <ć>, <ń>, <ś>, <ź>). The kreska is also used in Serbian Latinica in the letter <ć>.

Alphabet and Keyboard Mappings

Alphabet-to-alphabet (in the bi-alphabetic Serbian) and keyboard mapping (distribution of graphemic representations of phonemes on the keyboard) also play a role in the selection of particular orthographic resources, resulting in possible inter-linguistic similarities. In modern Serbian, Cyrillic-to-Latinica mapping is one to one, with the exception of three letters that map to digraphs: the ligatures <љ> and <њ> map to <lj> and <nj>, respectively, and <џ> maps to <dž>. Graphemic representations of phonemes occupy the same position in both the Latin and Cyrillic keyboard layouts. For instance, the keys for the phoneme /p/, represented with <p> in the Latin alphabet and <п> in Cyrillic, occupy the same relative position on the keyboard. This is in accordance with the principle of orthographic economy, which relies on the minimum of cognitive and physical effort: The keyboard user needs to be familiar with only one layout for the two alphabetic inventories.
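
The Cyrillic-to-Latinica correspondence in Serbian is regular enough to be written as a single lookup table. The Python sketch below gives the standard correspondences for the 30 letters of the Serbian alphabet; the function name `to_latinica` is illustrative, and letters rendered as digraphs simply appear as two-character values in the table.

```python
# Standard Serbian Cyrillic-to-Latinica correspondences (lowercase only).
SR_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

def to_latinica(word: str) -> str:
    """Convert a Serbian Cyrillic word to standard Latinica, letter by letter."""
    return "".join(SR_CYR_TO_LAT.get(ch, ch) for ch in word.lower())

print(to_latinica("човек"))   # čovek
print(to_latinica("љубав"))   # ljubav
```

Because the conversion needs no context, a writer who knows one alphabet can produce the other mechanically, which is the property that the shared keyboard layout builds on.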

The Serbian Cyrillic layout, inherited from that of Serbo-Croatian, is the Latin-based phonetic QWERT(Z) layout (the first five letters in the left upper part of the keyboard). This is the most common layout and, as QWERT(Y), it is also used in North America.7 The current Macedonian ordering, which is based on the Serbian keyboard layout, is also phonetic. Table 5 shows the non-phonetic correspondences on the Serbian and Macedonian keyboard layouts for the phonemes that do not exist in the US English-based QWERTY scheme.

Table 5. Keyboard layout correspondences in Serbian and Macedonian QWERTY

In Serbian and Macedonian, in the majority of cases, the correspondence is straightforward and is based on the standard transliteration ISO encodings into the Latin alphabet. However, for a number of Cyrillic letters (<ч>, <ш>, <ж>, <ћ>, <ђ>, <ќ>, <ѓ>, <љ>, <њ>) there are no corresponding keys on the keyboard. Therefore, non-alphabetic keys, such as those representing <;>, <[>, <\>, <'>, and <]> in QWERTY, are used, as well as the letters not used in the standard Serbian Latin alphabet, such as <q> and <w>. These factors make typing prone to guessing and missing the right keys. As shown in Table 5, Serbian Cyrillic, Macedonian, and Serbian Latinica all map to the same QWERTY-based layout.

In contrast, in both the traditional and phonetic Bulgarian and Russian layouts, QWERTY mapping produces four different sets of correspondences (see Table 6).

Table 6. QWERTY correspondences in the Bulgarian and Russian keyboard layouts

Table 6 shows examples of correspondences of the Bulgarian and Russian traditional and phonetic keyboard layouts for the characters mapped against the standard QWERTY layout in Mac OS. In contrast, the so-called phonetic keyboard layout, as in Serbian and Macedonian, follows the conventions of standard Latin transcription. For example, the Cyrillic <а> is mapped to the Latin <a>; <п> /p/ -> <p>; and <г> /g/ -> <g>. In addition, as shown in the greyed-out boxes, some letters that do not have corresponding phonetic representations in the QWERTY layout, such as <q> for <я> in Russian and Bulgarian, <w> for <ш> in Russian, and <y> for <ъ> in Bulgarian, reflect de facto conventions in Internet writing (see Tables 3 and 4).8
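
The effect of a non-phonetic layout can be reproduced with a key-for-key substitution. The Python sketch below covers only the 26 letter keys of the traditional Russian layout as they sit on a US QWERTY keyboard; the Cyrillic letters placed on punctuation keys are left out, and the function name is illustrative. It reproduces the nonsense string from the opening of this article.

```python
# Letter keys of US QWERTY and the Cyrillic letters on the same physical keys
# in the traditional (non-phonetic) Russian layout. Punctuation keys are omitted.
QWERTY = "qwertyuiopasdfghjklzxcvbnm"
RUSSIAN = "йцукенгшщзфывапролдячсмить"

KEYMAP = str.maketrans(QWERTY, RUSSIAN)

def type_on_russian_layout(keystrokes: str) -> str:
    """Return what appears on screen when Latin keystrokes hit the Russian layout."""
    return keystrokes.lower().translate(KEYMAP)

print(type_on_russian_layout("russkiy yazik"))  # кгыылшн нфяшл
```

Since the substitution is one to one, a search engine can invert it just as easily, which is presumably how the intended query is recovered from the mistyped one.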

Societal Bilingualism: Diasporic, Linguistic Minority, and Second Language Writers

Other important factors that impact script choice are keyboard availability in the country where each language is spoken, and the writing practices – including script choice – of diasporic communities. The latter are influenced by the technological options available abroad and the linguistic repertoires of the diaspora writers. The majority of recent immigrants from the former USSR, former Yugoslavia, and Bulgaria are residents of Canada, Australia, the US, or one of the Western European countries. Thus it is reasonable to expect that they would overwhelmingly use Latin-based keyboard layouts, either in localized versions or the standard QWERTY layout. These writers also bring with them writing practices and linguistic repertoires from the language(s) of their adoptive countries, including orthographic innovations.

In addition, various minority ethno-linguistic groups establish their own conventions, based on the orthography and phonology of their first language (e.g., the Albanized Macedonian subsystem; see Example 6), which are usually restricted to usage within the community. In the Albanized Macedonian case, the specific Latinized forms are constrained to usage in, and therefore indexical of, the Albanian speech community in Macedonia. At the same time, as shown in the examples of de-diacriticization in Macedonian (e.g., Example 5), the orthographic practices in a second language (here Serbian) may be adopted as transliteration models (e.g., pocetok <= početok <= почеток).

Concluding Remarks

This study has presented a systematic account of latinization in three Slavic languages that are natively written in Cyrillic – Bulgarian, Macedonian, and Russian – as well as Serbian, which is bi-alphabetic. With its focus on Slavic languages, the study complements research on latinization in other non-Latin-alphabeted languages.

It was shown that online writers of the four languages employ a range of strategies, resulting in a clear distinction between the non-standard latinization practices in Serbian and Macedonian, on one side, and Bulgarian and Russian, on the other. The main findings can be summarized as follows:

i) The principal latinization pattern in Serbian and Macedonian is de-diacriticization.

ii) De-diacriticization in Macedonian is mediated through the Latinica used in Serbian, and follows the Serbian pattern, with a few exceptions.

iii) The numerals (e.g., <4>, <6>, <9>) and non-phonetically motivated correspondences (e.g., <q> for <я> /ja/, <w> for <ш> /ʃ/ in Bulgarian, and /ʂ/ in Russian) systematically occur in Bulgarian and Russian, but not in Macedonian and Serbian.

iv) Various transliteration conventions that draw from the standard orthographies of major European languages, primarily English, are used in all four languages, especially in Bulgarian and Russian.

v) Russian and Bulgarian writers employ conventions that follow the visual schemes tied to numerals, as well as letters from the Latin alphabet, using a variety of strategies including visualization/iconization and digitization.

vi) These strategies (including de-diacriticization) are guided by principles of orthographic economy in computer-mediated writing derived from Grice's Maxim of Quantity and Zipf's Principle of Least Effort, which, it is suggested, postulate that one should spell/write words as informatively as required by the context, and avoid redundant markings.

vii) Other input parameters that may determine orthographic choices (excluding ideological concerns, which are outside the purview of the study) are shared alphabet inventory, keyboard layouts, and the bilingualism of the writers.

Moreover, the data reveal that the encodings in the Latin alphabet in Bulgarian and Russian are comparable to those in Arabic and Greek (involving, for example, ASCII-ization of graphemes as numerals). Nonetheless, some of these strategies are absent in Serbian and Macedonian. One potential explanation for this divergent orthographic behavior within Internet Slavic is that languages whose speakers/writers use the Latin alphabet to a greater extent and in a variety of quotidian situations (that is, Serbian and Macedonian) are “drawn” to the existing standard orthography in their first or second language, that being Serbian in both cases here. Therefore, in CMC, the writers of Serbian and Macedonian resort to de-diacriticization, limited to the existing Latin-based inventory, and closely following the phonological Latin-to-Cyrillic mappings. The writers of Bulgarian and Russian, in contrast, having no analogous Latin-based standard orthography in their first or second language to draw from, tend to explore other strategies and schemes, such as transliteration and numeration. Online orthographic practices in Arabic (e.g., Jarbou & al-Share, 2012), Cantonese (e.g., Lee, 2007), Greek (e.g., Androutsopoulos, 2007), and Mandarin (e.g., Dai, 2009) support this claim.

A special case of numeration that appears to be unique to Slavic (specifically, in Bulgarian and Russian) is the strategy of representing a phoneme (e.g., in Russian, /ʂ/ in машина /mɐʂˈɨnə/ ‘car’) by the numeral that starts with the phoneme (e.g., /ʂ/ in шесть /ʂˈɛstʲ/ ‘six’). By way of comparison, in English, a numeral may replace a word or morpheme that is pronounced the same as that numeral, as in '2' for English 'to.' This latter strategy does not seem to occur systematically in Internet Slavic, however.

Script and variant choice has little to do with the grammatical structures of languages and genetic relatedness. For example, Macedonian, an analytic East South Slavic language, displays online latinization patterns almost identical to those used in Serbian, a highly inflectional, synthetic West South Slavic language, with which Macedonian shares a common alphabet inventory as well as the status of an official language within the former Yugoslavia. In contrast, Bulgarian, another analytic East South Slavic language, displays online latinization patterns similar to those of Russian, which is an East Slavic, synthetic language. The script and variant choices seem to be primarily motivated by pragmatic concerns, coupled with the specifics of current practices.

Informal observation also suggests that online latinization in BMRS constitutes a continuum, ranging from Serbian to Macedonian, as significantly latinized, to Bulgarian and Russian, as less latinized. This hypothesis could be tested in future research. Such research, with a focus on the sociolinguistics of script and variant choice in BMRS, is recommended in order to paint a more complete picture of latinization in computer-mediated writing in these languages.


I am grateful to two Language@Internet reviewers for their helpful feedback. I am especially grateful to Christina Kramer for her insightful comments in our numerous discussions on writing practices in Russian and South Slavic, and to Susan Herring for her extensive feedback.


  1. Old Church Slavonic is the first Slavic literary language, to this day used as a liturgical language by some Eastern Orthodox churches.

  2. In Ivković (2013), I include the now obsolete Tanjug subsystem, which comes from the “telegraph” era, and is inherited from the transcription practices of the Yugoslav news agency Tanjug, which employed a particular convention, such as <cc> for <č> (pp. 343-344).

  3. I am grateful to Christina Kramer for her observation regarding the representation of the hard and soft l in the writing practices of ethnic Albanians in the Republic of Macedonia.

  4. The strategies are here represented using a function notation comparable to the conventions for mathematical function notation. For example, for Bulgarian Latinica composite (BL2), the notation "BL2: <ch> <= Anglicize(ч)" reads as follows, from right to left: "The function Anglicize() takes the input argument letter <ч>, and returns the value (digraph) <ch>, within the domain of BL2:."

  5. "ГОСТ 7.79-2000: Система стандартов по информации, библиотечному и издательскому делу. Правила транслитерации кирилловского письма латинским алфавитом [GOST 7.79-2000: System of standards on information, librarianship and publishing. Rules of transliteration of Cyrillic script by Latin alphabet]" (in Russian). Authentic Russian version of ISO 9. Retrieved July 23, 2012 from http://gost.ruscable.ru/cgi-bin/catalog/catalog.cgi?i=6464 .

  6. With rare exceptions, in standard Russian, the letter <ч> represents the phoneme /tʃ/. In a very few words, such as the pronoun что 'what,' <ч> is pronounced as /ʂ/.

  7. Retrieved July 21, 2012 from http://en.wikipedia.org/wiki/Keyboard_layout#Keyboard_layouts_for_non-Latin_alphabetic_scripts .

  8. Note that in Bulgarian and Russian, different operating systems display slight variations in layout mappings.


Androutsopoulos, J. (2007). Language choice and code-switching in German-based diasporic web forums. In B. Danet & S. C. Herring (Eds.), The multilingual Internet: Language, culture, and communication online (pp. 340-361). New York: Oxford University Press.

Androutsopoulos, J. (2009). 'Greeklish': Transliteration practice and discourse in a setting of computer-mediated digraphia. In A. Georgakopoulou & M. Silk (Eds.), Standard languages and language standards: Greek, past and present (pp. 221-249). Aldershot, UK: Ashgate.

Angermeyer, P. S. (2012). Bilingualism meets digraphia: Script alternation and script hybridity in Russian-American writing and beyond. In M. Sebba, S. Mahootian, & C. Jonsson (Eds.), Language mixing and code-switching in writing: Approaches to mixed-language written discourse (pp. 255-272). New York: Routledge.

Anis, J. (2007). Neography: Unconventional spelling in French SMS text messages. In B. Danet & S. C. Herring (Eds.), The multilingual Internet: Language, culture, and communication online (pp. 87-115). New York: Oxford University Press.

Coulmas, F. (2003). Writing systems: An introduction to their linguistic analysis. Cambridge, UK: Cambridge University Press.

Dai, X. (2009). Thematic and situational features of Chinese BBS texts. Language@Internet, 6, article 1. Retrieved June 12, 2015 from http://www.languageatinternet.org/articles/2009/2011

Dale, I. R. H. (1980). Digraphia. International Journal of the Sociology of Language, 26, 5-13.

Deuber, D., & Hinrichs, L. (2007). Dynamics of orthographic standardization in Jamaican Creole and Nigerian Pidgin. World Englishes, 26(1), 22-47.

De Francis, J. (1984). Digraphia. Word, 35, 325-340.

Friedman, V. (1993). Macedonian. In B. Comrie & G. G. Corbett (Eds.), The Slavonic languages (pp. 249-305). London, UK: Routledge.

Grice, H. P. (1989). Studies in the ways of words. Cambridge, MA: Harvard University Press.

Grivélet, S. (2001). Introduction. International Journal of the Sociology of Language, 150, 1-10.

Hentschel, E. (1998). Communication on IRC. Linguistik online, 1. Retrieved June 12, 2015 from http://www.linguistik-online.de/irc.htm

Ivić, P. (2001). Language planning in Serbia today. International Journal of the Sociology of Language, 151, 7-17.

Ivković, D. (2013). Pragmatics meets ideology: Digraphia and non-standard orthographic practices in Serbian online news forums. Journal of Language and Politics, 12(3), 335-356. doi: 10.1075/jlp.12.3.02ivk

Jarbou, S., & al-Share, B. (2012). The effect of dialect and gender on the representation of consonants in Jordanian chat. Language@Internet, 9, article 1. Retrieved June 12, 2015 from http://www.languageatinternet.org/articles/2012/Jarbou

Jensen, J. (1995). Writing Portuguese electronically: Spontaneous spelling reform. Hispania, 78(4), 837-845.

Kramer, C., Ivković, D., & Friedman, V. (2014). Seeing double: Latin and Cyrillic in the linguistic landscape of Macedonia. Proceedings PLLP, Skopje 2014, 14-20 (UDK 81’246.2) Зборник ПЈЈП, Скопје 2014, 14-20. Retrieved June 16, 2015 from www.fon.edu.mk/files/ZBORNIK_lingvistika.pdf

Kumar, A. (2008). Bi-alphabetism and design of reading mechanism. In Visual communication: A media for research and planning (pp. 301-326). New Delhi, India: Global India.

Lee, C. (2007). Linguistic features of email and ICQ instant messages in Hong Kong. In B. Danet & S. C. Herring (Eds.), The multilingual Internet: Language, culture, and communication online (pp. 184-208). New York: Oxford University Press.

Magner, T. (2001). Digraphia in the territories of the Croats and Serbs. International Journal of the Sociology of Language, 150, 11-26.

Mironovschi, L. (2007). Russian SMS compliments. Written Language and Literacy, 10, 53-63.

Palfreyman, D., & al Khalil, M. (2007). "A funky language for teenzz to use": Representing Gulf Arabic in instant messaging. In B. Danet & S. C. Herring (Eds.), The multilingual Internet: Language, culture, and communication online (pp. 43-63). New York: Oxford University Press.

Rogers, H. (2005). Writing systems: A linguistic approach. Malden, MA: Blackwell.

Rosowsky, A. (2010). "Writing it in English": Script choices among young multilingual Muslims in the UK. Journal of Multilingual and Multicultural Development, 31, 163-179.

Sebba, M. (1998). Phonology meets ideology: The meaning of orthographic practices in British Creole. Language Problems and Language Planning, 22(1), 19-47.

Tseliga, T. (2007). "It's all Greeklish to me!" Linguistic and sociocultural perspectives on Roman-alphabeted Greek in asynchronous computer-mediated communication. In B. Danet & S. C. Herring (Eds.), The multilingual Internet: Language, culture, and communication online (pp. 116-141). New York: Oxford University Press.

Yang, C-H. (2007). Chinese Internet language: A sociolinguistic analysis of adaptations of the Chinese writing system. Language@Internet, 4, article 2. Retrieved June 12, 2015 from http://www.languageatinternet.org/articles/2007/1142

Zima, P. (1974). Digraphia: The case of Hausa. Linguistics, 124, 57-69.

Zipf, G. K. (1949). Human behaviour and the principle of least effort. Cambridge, MA: Addison-Wesley.


Biographical Note

Dejan Ivković [dejan.lj.ivkovic@gmail.com] has a Ph.D. from York University. His research interests include multilingualism, computer-mediated communication, and linguistic landscapes (www.dejanivkovic.com). He is currently a sessional lecturer in the Department of Slavic Languages and Literatures at the University of Toronto and Senior Research Associate in the Faculty of Education at York University.


Any party may pass on this Work by electronic means and make it available for download under the terms and conditions of the Digital Peer Publishing License. The text of the license may be accessed and retrieved at http://www.dipp.nrw.de/lizenzen/dppl/dppl/DPPL_v2_en_06-2004.html.