Issues in the standardization of the Romani language: An overview and some recommendations

Ian Hancock

Te astaras o jekhipe amara čhibake avel o angluno phir karing amaro jekhedžengo khetanipe.
[“To achieve the unity of our language will be the first step toward achieving our unity as a people.”]

And Yahweh said: “So this is what they can do when all share one language!  There will be no limit on what they can accomplish if they have a mind for it.” [Genesis 11:6]

Since my first monograph on the standardization of Romani, which appeared in 1975, the topic has gathered considerable momentum.  In that essay, the opening statement of the historical and social summary maintained that we were one Romani population when our ancestors reached the gates of Europe, and that we spoke one language.  Half a lifetime later, I have changed that position, though the change does not alter the points that followed it; rather, it helps explain the roots of the situation that we are here to discuss.  Briefly, I have come to accept that both the Romani people and the Romani language were formed in Anatolia out of a conglomerate of Indian, Byzantine Greek and other elements, and that during the period in which this was taking place, i.e. from the late 11th to the mid-14th centuries, there were at least three major moves out of Asia Minor into Europe.  I believe that the consequences of this are fundamental to our understanding of both the social and the linguistic history of our people.  I repeat here what I first stated over thirty years ago:

a. Because of historical and ongoing factors, not least of which are antisocial pressures from the host societies that continue to divide the Romani-speaking populations, there are today a great many widely differing dialects of that language.

b. Perhaps the greatest obstacle in achieving political and cultural unity is the lack of communication among the various Romani populations throughout the world.

c. It may be assumed that progress towards reunification would be more easily made if a common dialect were available to all groups.

The problems in creating a standardized dialect were identified thus:

1. No single dialect spoken anywhere is so close to the common protoform spoken upon arrival in Europe that it may be adopted without modification. In other words, whatever dialect is chosen will have to be adapted to a more internationally acceptable form, especially phonologically and lexically.

2. With existing means of education, the propagation of such a standard will be very unevenly achieved. Sedentary, already literate Roma, such as predominate in eastern European countries, will have a far better opportunity to acquire such a standardized dialect. For illiterate and nomadic Roma, the task would be very much harder.

3. Not all Roma everywhere will ever learn, or be disposed to learn, such a dialect. This will create a “linguistic elite” composed solely of those who have learned to use the international standard.

Addressing the problem

When gypsilorist Matt Salo stated that his research revealed “a lack of common ethnic consciousness” among the different Romani groups (Salo 1975: 2), he nevertheless made it clear that each is aware of a common ethnic and linguistic link, but that group identity beyond that is negligible. What Salo should have described was the lack of a constructive sense of ethnic consciousness. While this may be explained in light of the chronology of Romani history, sociolinguistic and sociohistorical evidence incontrovertibly demonstrates a fundamental shared linguistic, cultural and genetic history.  Only now, with increased education, is this history beginning to be learnt and accepted, by Roma and gadje alike.

As a result, this lack of concern on the part of our people for members of groups other than our own is changing; we may cite the European Roma and Travellers’ Forum as an example. […] to live without reliance upon the others.  This has not been the case in Europe.  Still, Porter (1975: 186-187) and others have commented upon the marked tendency of those who study Romanies to deal with a single group and then to apply their findings to all groups, or else not to acknowledge any group other than the one they have studied, an example of which is what Thomas Acton has called “Kalderashocentrism.” Salo (loc. cit.) made this clear:

Recently, political activists, both those who claim Gypsy identity and those who do not, have attempted to construct a pan-Gypsy identity, dismissing as irrelevant the ethnic categories of the actors themselves . . . some scholars have reacted by adopting the ethnocentric perspective of the group they have studied, dismissing the others as not true Gypsies.

This attitude is central to the formation of a standardized dialect, and it is important that the definitions and attitudes established in the gadjikano mind be revised, since cooperation from the non-Romani population is essential if progress is to be made. This became a central issue, in fact, at the Rom and Cinti Union’s National Congress, which was held in Mülheim-Ruhr in November 1990, and at which the foundation was laid for our international Romani Parliament. Among its demands was the call for “the immediate dissolution of non-Romani organizations who speak for the Roma and who help perpetuate a circumscribed definition of who we are.”

As I say, our current findings support the likelihood that the divisions within the population have existed since the arrival in Europe itself, which took place at different times (referred to in Romani as the aresajpe; I also suggest using the word nakhipe for the crossing over from Asia into Europe when we write our own history in our own language); Marcel Courthiade’s three strata demonstrate this.  While Graffunder’s “grand reunion” has to be seen as an impossibility, a “grand union” (an awareness of common origins) is happening, and is desirable. Without sacrificing identity as a Sinto or a Romanichal or a Xoraxano, it is possible to be aware of sharing a common linguistic and cultural heritage, and of the blood which binds Roma together as one people. Without that awareness, we are drastically weakened as a people in confronting and dealing with the massive social problems that beset us today.  At Babel, the confusion of tongues wrought upon humanity by God was intended precisely to prevent communication and cooperation.  It is imperative that we do something about our own linguistic confusion.


There are only two workable options available in selecting a standard dialect: the creation of an artificial Union variety, or the selection and cultivation of a dialect that already exists. There are arguments to be made for each of these, and arguments against each. In order to tackle the problems of standardization, it is necessary first of all to determine which dialect or group of dialects is to constitute the basis for the new standard, whether it be an a priori or an a posteriori choice.

According to the model proposed by Haugen (1966; see also Hancock, 2003), the initial stage in language planning is selection, and it rests primarily upon social and political considerations. Romani exists in some 60 dialects, which fall into five or six branches. These share a high proportion of common grammar and lexicon, but because of the fragmentation of the population in Europe, there are also far-reaching differences apparent among them. These are the result of external, rather than internal, factors, something I have elaborated upon in a paper dealing with the Romani reunification movement (Hancock 1988b). Some varieties of Romani have become so attenuated that their morphosyntax and phonology now belong to other languages; Caló and Pogadijib are examples of this and have been discussed in print by Marcel Courthiade (see bibliography). Like Courthiade, I would also omit such varieties from consideration (except perhaps for lexical purposes), since their retention of native Romani grammar, phonology, and semantic content is minimal or nonexistent. Internal factors, on the other hand, account for natural divergence: for the development of /h/ from /s/ in some Northern Romani dialects, for instance; for vocabulary loss, such as masak ‘month’ and vraker- ‘talk’ in the Vlax group; or for morphological reduction, such as the loss of the first and second person singular and plural emphatic subject pronouns in all but Welsh Romani and dialects in the Southern branch.

External differences are the result of contact with the various gadjikane idioms spoken in the environments of the different Romani dialects. They include phonological modification, such as the loss of the aspirate/non-aspirate distinction in French and Italian Sinti and the acquisition of palatalized sounds in Kalderash Vlax; morphological intrusion, such as the incorporation of a narrative suffix -li (from Bulgarian) in Drindari; and extensive calquing and relexification. It is the last of these in particular, as I said earlier, that constitutes the greatest barrier to interdialectal intelligibility, for even where morphologies differ, a shared lexicon will usually continue to provide a basis for communication.

It is unlikely that an artificially created dialect, perhaps a linguistic reconstruction of Proto-Romani such as that attempted by Schultz (1974) or Higgie (1984), would attract much support. It will be interesting to see the extent to which reconstructed Iberian Romani, such as that used for a translation of the Spanish Constitution (Heredia 1989), will be learnt by Kale Roma in Spain. Recently (December 14th, 2006), a strongly worded response was posted on the Roma Virtual Network protesting an attempt by the Indian linguist Janardhan Pathania to create a highly Indianized standardized dialect of our language:

This gadjo Janardhan aims at destroying our language and it is evident that there is a Hinduist political agenda throughout his letter. You can see by yourself, Romale, how does he write words that do not exist in our language, and that not a single Rom in the world knows such terms. Please do not pay attention to his letter, that is pure silliness.
Whoever among you who knows any Rom who greets saying “namaste”, please tell us. Our greeting is “lashó dzhes” (Kalderash), “lacho dives” (Sinti), or other similar expressions. We do not say “lekh”, this word is not Romany but Hindi. We Roma do not know what does “antarashtri” written in this gadjo’s letter. It is not Romany what is written in his letter!
Our Roma in the whole world, those who have lost our language, now want to learn Kalderash so that they can communicate with Roma wherever they go, as Kalderash is the best known tongue among Roma worldwide.
But baxt thai sastimós savorhenge. [“Much luck and health to everyone.”]

The varieties of Romani used most extensively for purposes of documentation at the present time are Central Vlax (such as Kalderašicko), the Erli dialect of Balkan Romani, and the Slovak variety of Central Romani. The Balkan dialect, spoken in Niš, Skopje, &c., supports a considerable local literature, including Puxon and Kenrick’s (1990) translation of The Destiny of Europe’s Gypsies. Indeed, a recent conference at which our colleague Hristo Kyuchukov was present dealt with the specific issue of creating a regional standard, something I shall return to. The Central dialects are being increasingly used for written media by a growing number of Czech and Slovak activists, and are the principal varieties used in the journal Džaniben.  Northern dialects appear frequently these days in e-mails from Poland, where the current President of the IRU, Stanisław Stankiewicz, lives.

Elsewhere in Europe, however, as well as in North and South America and in Australia, it is Vlax that appears to serve as the vehicle for the widest communication, since its speakers are the most widely scattered geographically. All Romani-language materials published in the United States are in Kalderašicko, as are most of those being published in Europe. There are more contemporary grammars and dictionaries available for Vlax than for any other dialect, and more unpublished theses and dissertations. It is quite clear that, for the most widely applicable practical use, a Kalderaš-based dialect would be the logical choice for a standardized dialect. Balkan dialects differ from Vlax more than conservative Central dialects do, and their geographically restricted use argues against their adoption. Differences between Vlax and Central dialects, which are primarily phonological and lexical, can be minimized with only a small risk of creating too artificial a dialect, native to no one. The lexical differences are the subject of a monograph by Kochanowski (1986).


Haugen’s (1966) second stage is called codification, defined by Fishman et al. (1971: 295) as dealing with “the normalization (standardization) of regional, social, class or other variation in usage via the preparation of recommended (or ‘official’) grammars, dictionaries, orthographic guides, etc.” At the Fourth World Romani Congress, held in Warsaw in April 1990, the Language Commission (o Kolo le Alomaske la Rromana Čhibake) voted upon an orthography to serve for the standard dialect now being developed. It made use of the archiphonemic characters <ç> (to represent either /s/ or /ts/) in the postposition -sa, <θ> (to represent either /t/ or /d/) in the postpositions -tar and -te ‘at’, and <q> (to represent either /k/ or /g/) in the postposition -ke and the adjectival series in -k- + V; in each pair, the second value occurs after the masculine and feminine plural oblique marker /-n-/. Consonantal modifications are indicated with an acute accent, thus alveolar <s> contrasts with palato-alveolar <ś>; uvular /r/ is written as <rr>, contrasting with the front /r/ written with a single character <r>, and a second non-Romani symbol, <ʒ>, has been introduced to stand for the voiced palato-alveolar fricative. What are voiceless aspirated palatal fricatives in some dialects are articulated with retroflexion by speakers of others; one grapheme, <ćh>, represents both possibilities. <c> represents /ts/, and <x> is a voiceless uvular fricative. The new orthography also incorporates the accents < 4 > (Eastern Europe) or < ! > (Western Europe) and < ( > to indicate /j/ onset (for example, <ǎ> = /ja/), non-predictable, non-final stressed syllables on athematic stems, and vowel centralization, respectively.
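The archigrapheme rule just described is mechanical enough to state as a procedure. The Python sketch below is an illustration only, not the Commission's own specification: it assumes that the sole conditioning environment is an immediately preceding <n>, as stated above, and it ignores dialect-specific realizations. The function name realize and the test words (singular and plural forms of Rrom with -ça, -qe and -θar, built by applying the rule) are hypothetical.

```python
# Archigraphemes of the 1990 Warsaw orthography, as described above:
# (value elsewhere, value after the plural oblique marker -n-).
# <c> stands for /ts/ in this orthography.
ARCHIGRAPHEMES = {
    "ç": ("s", "c"),
    "θ": ("t", "d"),
    "q": ("k", "g"),
}


def realize(word: str) -> str:
    """Replace each archigrapheme with the consonant it stands for.

    Simplified sketch: the second value of a pair is chosen only when the
    letter immediately before the archigrapheme is <n>.
    """
    out = []
    for i, ch in enumerate(word):
        if ch in ARCHIGRAPHEMES:
            elsewhere, after_n = ARCHIGRAPHEMES[ch]
            out.append(after_n if i > 0 and word[i - 1] == "n" else elsewhere)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical test words: singular vs. plural forms of Rrom
    # with the postpositions written -ça, -qe and -θar.
    for w in ["Rromeça", "Rromença", "Rromesqe", "Rromenqe", "Rromesθar", "Rromenθar"]:
        print(w, "->", realize(w))
```

The point of the design is that each postposition keeps a single written shape whatever stem it follows, while its pronunciation remains recoverable from the letter before it.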

Even if a Vlax dialect, say Russian Kalderaš, were to be selected as the basis for a new standard, we ought ideally to have at our disposal a complete grammatical and lexical study based upon all of the Romani dialects from which to draw in order to supplement the base dialect selected. Vlax has a number of forms lacking in some other branches, such as causative and inchoative verbs, but it has in turn lost, for example, the thematic comparative construction; any supplementation or replacement of such features should ideally draw on thematic (that is, pre-European) models.

There are a number of ways in which the base lexicon might be augmented: (a) incoining, (b) phrasing, (c) native retrieval, and (d) foreign adoption.

Incoining is already a widespread mechanism for lexical expansion in Romani. It involves combining existing morphemes of the language in new ways, either with no external model or as calques on another language. Examples are American Vlax šudro-bakso ‘refrigerator’ (lit. ‘cold box’) and gadžengo pleso ‘public’ (lit. ‘place of the gadje’).

Phrasing involves replacing a single athematic (non-native) item, where it has been lost, with a descriptive phrase employing native vocabulary. Examples from American Vlax include glinda te dikhen palal ‘rear-view mirror’ (lit. ‘mirror that you [use to] look behind’) and mačhina kaj ramol ‘typewriter’ (lit. ‘machine which writes’).

Native retrieval has been used for a number of developing languages, such as Malay and Hebrew, and consists of reviving obsolete words from the historical native stock to augment the contemporary lexicon. For Romani, this would mean the resurrection of Sanskrit words, for example *merdika ‘freedom’, a concept for which only adoptions from European languages now exist (such as American Kalderaš frijimos, Russian Kalderaš slobuzenja). It would then remain to be decided whether such items should retain their Sanskrit form or be modified according to the rules of change that have produced modern Romani phonology, in which case *merdika (<Sanskrit mrdīká-) would have the form *mareko in Romani. This presupposes considerable linguistic sophistication on the part of the committee whose task it would be to select and modify such items. An alternative has been proposed by Kochanowski (1971: 76-77), who suggests modern Hindi as a lexical reservoir for Romani.

Foreign adoption means simply the acquisition of new lexical items from any athematic source. Kochanowski (1971) again suggests that international vocabulary be adopted and, where this is insufficient, that words common to French and English be incorporated, in each case made to conform to Romani grammar. This is already happening in the European dialects, where it is not always possible to identify the immediate source of such widely occurring items as tilefono ‘telephone’ or mikroskopo ‘microscope’. The choice of Vlax is particularly useful in this regard, since it has both a thematic and an athematic grammatical paradigm, so that the morphological behavior of any new item is already determined for it. But while it is true that both French and English have wide international currency, neither is commonly heard in eastern Europe, where pan-Slavicisms would seem to be a more logical source to supplement the language.

To obtain an idea of the proportion and character of the non-core lexicon of Romani, a breakdown is provided here of two different paragraphs chosen at random from letters that have been sent to me. In each, the non-core items are italicized, and the spelling remains as received:

Balkan, from southern Yugoslavia:

Bičhalava tuke jekh foto-kopi katar o žurnali i Arena, kai vakerela o Prof.  S.J. baš i Indija -kaj vov arakhla o M.D. ano London, kaj i Indija vazdinda amari rezolucija ko U.N.O., taj so ka kerel, so rodel, o Duito Kongreso ani Ženeva avutno bers.

Vlax, from Trieste:

Sayekh mangav tutar tay me či bišalav tuke šoha khanči. Te trubul tu vareso, te na lažes, numa motho. Si tu kodi knyiga katar o V. tay M. pa e šib le Lovarengi ando Ungriko? Te niči, bišalav tuke fotokopiya te kames; but interežnyime [= interežno] si.

Of the two passages, non-native elements constitute about 20 percent of the whole, though over half of this consists of proper names. Of the remainder, the majority may be considered “international vocabulary” (foto-kopi, fotokopiya, žurnali, rezolucija, kongreso, interežnyime), two are grammatical particles (šoha ‘never’, trubul ‘need’), and one is a local adoption from Serbian (knyiga ‘book’), with which a native form, lil, alternates. Thus, exclusive of proper nouns, less than a tenth of the vocabulary in these passages is derived from external sources. In less conservative dialects the percentage is much higher; in the following sample of French Sinti, for instance, some 40 percent of the lexicon represents accreted material:

Mē am trin nebudi, žjam te rodas šáfreba ačas kek. Ijo “Jean-Jean”, bišštar beršengro, “Niglo,” bišjek un o “Ratam,” bišberšengro čače morš, ledige čave. Memke hart šáfreba darā gar, safráxa fort. Vejam so fus trianda panč kilomēngri ano foro. Ačam gar o rašaj, krat dui batríja un i pisla sastar pur te xas i kotar māro.

Of the 3,600-item glossary of Swedish Kalderaš by Gjerdman and Ljungberg (1963: 193-396), some 1,750 words, or almost 48 percent of the total, derive from non-native sources, overwhelmingly from Romanian. But, as the authors point out (1963: xx), “. . . the vocabulary originally brought by the Gipsies from their Indian motherland is, despite its paucity, of much greater significance. For this is, after all, the material from which the principal features of the Romani language are derived.” In this respect, it compares with the Anglo-Saxon component of modern English, some 28 percent according to dictionary count, but as high as 85 percent in ordinary discourse.


Having discussed some of the issues related to the selection and implementation of an international standard for written Romani, I suggest that the following scheme be adopted in the creation of such a dialect:

a. A number of representative Romani dialects from the most widespread or numerically most important branches be selected. These would include a conservative dialect of Vlax (such as Kalderaš), as well as Balkan (such as Erli), Central (such as Bašaldo), and Northern (such as Sinti).
b. All foreign material (lexical, phonological, morphosyntactic, etc.) be removed from each of these representative dialects, and codification made of the remaining thematic material. Shared and non-shared features would be listed separately.
c. Features absent in the natural dialect selected as the base of the Standard, which I suggest be Rusicko Kalderaš, be supplemented from other dialects where they exist.
d. A standard lexicon be developed using the techniques outlined above. My own position, and that of a growing number of my colleagues, is that Romani crystallized in Asia Minor, not in India, from a conglomerate of pre-Anatolian languages in the matrix of Byzantine Greek.  I believe therefore that all of the languages that have contributed to this are equally valid as constituting the “core” lexicon.  That the speakers were leaving under different circumstances and entering Europe at different times across the span of perhaps three centuries clearly accounts for the different representations of these core items from one present-day dialect group to the other.  Nevertheless, if we are to scour all the dialects for Indian-origin items in order to supplement the lexicon, we must equally well look to items derived from Greek, Armenian, Persian and other pre-European languages.  It is only after the arrival of the different early groups into Europe and their subsequent dispersal that the different dialect groups began to acquire separate, non-core words.  I have appended to this paper a list of all such pre-nakhipe items as I have been able to locate in all documented dialects; together they provide a rich basis from which to draw.
e. If regional standards are to be developed, as was the focus of the conference in Skopje on December 17-19 last, which discussed a common variety intended to serve populations in Macedonia, Serbia, Kosovo and Bulgaria, then the same recommendations apply as those made here, but limited to the specific area.
f. My suggestion regarding orthography is that we follow the lead that is being taken naturally in our e-mails to each other: rather than using diacritics, English graphemes be employed, thus <sh> rather than <š>, <ch> rather than <č> and <zh> rather than <ž>.  I would retain <x> for the voiceless uvular fricative, and <rr> for the voiced.  This system has been used in a number of publications, especially from Hungary; an example appears among the attached samples of different orthographies.
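The substitutions proposed in (f) amount to a simple one-to-many character mapping. The Python sketch below assumes that only <š>, <č> and <ž> (and their capitals) change, so <x>, <rr> and every other letter pass through untouched; the function name to_english_graphemes and the sample words are hypothetical. Text is first normalized to Unicode NFC so that letters typed as a base letter plus a combining caron are converted as well.

```python
import unicodedata

# Substitutions proposed in (f): caron letters become English-style digraphs.
# Assumption: these three letters (and their capitals) are the only ones that
# change; <x>, <rr> and every other character are left as they are.
_TO_DIGRAPHS = str.maketrans({
    "š": "sh", "Š": "Sh",
    "č": "ch", "Č": "Ch",
    "ž": "zh", "Ž": "Zh",
})


def to_english_graphemes(text: str) -> str:
    """Rewrite caron spellings (š, č, ž) with the digraphs sh, ch, zh."""
    # NFC composes "s" + U+030C COMBINING CARON into the single letter "š",
    # so decomposed input is converted too.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_TO_DIGRAPHS)


if __name__ == "__main__":
    for word in ["Kalderašicko", "čhib", "džal", "Rrom", "xal"]:
        print(word, "->", to_english_graphemes(word))
```

Under this mapping, aspirated <čh> comes out as <chh> and <dž> as <dzh>. Converting back is only safe if a plain <s>, <c> or <z> is never directly followed by <h> in the diacritic spelling; otherwise <sh>, <ch> and <zh> become ambiguous.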


