T.A. Acton (ed.)
Scholarship and the Gypsy Struggle: Commitment in Romani Studies
Hatfield: University of Hertfordshire Press (2000), pp. 1-13.


Ian Hancock


THE PRESENT hypothesis rests upon the supposition that the ancestors of the Romanies were members of the Kshattriya or military caste, who left India with their camp followers during the first quarter century of the second millennium in response to a series of Islamic invasions led by Mahmud of Ghazni. Linguistic and historical arguments in support of this position are found in Hancock (1999), and are summarised below.

It is maintained that together with Urdu, Romani developed from a contact language (of the type known as a koïné, and for which I employ here the term Rajputic) over a millennium ago, from a levelling of the medley of languages spoken on the battlefields of north-western India. While the speech of those troops who remained in India subsequently normalised in the direction of the surrounding local Indian languages, those who moved away from the area and became linguistically isolated from it experienced no such metropolitanising factor, their speech and linguistic behaviour, as well as their social patterns, developing differently as a result. I also demonstrate that rather than being an isolated case, language contact situations are typical in India, and propose that the Domari and the Ghorbati languages, as well as others, emerged from the same type of sociolinguistic framework.


Nida and Fehderau (1970) were probably the first linguists to establish the koïné as a typological category. They saw it as a language of wider communication typified by “modifications in the direction of simplification of morphological and syntactic structure [... but which] presents no such structural break as is clearly present in the case of pidgins” (1970:147), and which is “always mutually intelligible with at least some forms of the standard language” (1970:152). The definition most usually relied upon these days, however, is Siegel’s (1985:375-376):

Koïnéization is the process which leads to mixing of linguistic subsystems, that is, of language varieties which either are mutually intelligible or share the same genetically related superimposed language. It occurs in the context of increased interaction among speakers of these varieties. A koïné is the stabilised composite variety that results from this process. Formally, a koïné is characterised by a mixture of features from the contributing varieties, and at an early stage of development, is often reduced or simplified in comparison to any of these varieties. Functionally, a koïné serves as a lingua franca among speakers of the different varieties. It may also become the primary language of amalgamated communities of these speakers.

Although they are not subject to the drastic restructuring essential in the definition of a pidgin, koïnés do share with the latter the fact of being native to no one; thus every speaker of a koïnéised (or pidginised) variety of a language speaks another language or dialect natively. The particular status of such linguistic systems “that stand between ‘normal’ primary languages... and pidgin and Creole languages” has been addressed in a theoretical framework by Whinnom (1981), and is becoming a specialised area of typological study (e.g. Jahr, 1992; Siegel, 1985 and 1993; Bakker and Mous, 1994); in Reinecke’s (1937:80) earlier ground-breaking study of the classification of different kinds of contact language, he deals with Hindustani briefly in his category of ‘lingua francas’.

Koïnés may also be characterised by extensive lexical adoption, i.e. their mixed-source vocabularies, either drawn from other languages, or reflecting different dialects of the same language, a process known as levelling. Also - because of their non-native status - they are typified by the lack of certain registers, such as formal style or baby talk. In some cases they may in time acquire some measure of official standardization and status and may be referred to as ‘Union’ languages; Union Dyula, spoken in West Africa, is one such example. When this happens, a koïné can, in the course of time, lose its identity as an auxiliary dialect, and become a first language, thereby developing all of the characteristics associated with the same: socially- and gender-determined registers, proverbs, a folklore and so on. The best-known koïné, that which emerged from various Ancient Hellenic dialects, became the basis for Modern Standard Greek (Comrie, 1990:412). The word itself is from that language: ἡ κοινὴ διάλεκτος “the common dialect.”

Because for its speakers a koïné lacks the emotional rootedness of a first language, it is more readily susceptible to linguistic modification and adaptation than other languages, and its speakers are generally more tolerant of variability within it, in pronunciation, and in grammatical and lexical selection. With multiple inputs contributing to its initial emergence, it remains open to ongoing modification as long as it continues to exist as an adjunct linguistic system.

Rajputic was such a koïné and was, therefore, able to adapt to its changing linguistic environments. Those speakers who returned to, or remained in, India, accommodated accordingly. In Islamic areas, Rajputic continued to be modified by pressure from Persian and Arabic, eventually giving rise to Urdu, the official language of Pakistan and the state language of Jammu and Kashmir in India. In Hindu areas, the modifying influence was Sanskrit, giving rise to Hindi, the official language of India, by the fourteenth century (Khullar, 1995).

India and language contact

Because of its tremendous linguistic diversity, India has long been recognised as a part of the world where language contact has produced numbers of hybridised and restructured languages - meriting a section of its own in the Bibliography of Pidgin and Creole Languages (Reinecke, 1975:632-5). While the specific case of Rajputic was one of koïnéisation rather than pidginisation, Southworth (1971:270-1) at least “assumes that pidginisation took place throughout the Indo-Aryan area, but that its long-range linguistic effects were tempered or reinforced by other social factors [...which] have led, at the extreme end of the spectrum, to a result that is similar to the classic modern cases of pidginization known from the Caribbean and the Pacific.” He discusses the Marathi language in particular, which is also dealt with in the same framework by Gumperz and Wilson in the same volume.

In wording hardly acceptable today, Petersen (1912:421-32) was the first to discuss this as a phenomenon dating from the very beginnings of Aryan-non-Aryan contact:

As the old Aryans invaded the Indian peninsula and conquered certain aboriginal tribes, they would impose their language upon those whom they enslaved and which consequently formed a part of their society. But since these black aborigines had organs of speech as well as linguistic habits that differed widely from those of the Aryan invaders, they were unable to learn the language in the same form as the one in which it was spoken by their conquerors, and it was modified to suit their own characteristics in much the same way as the American negro has modified the English language through his own physiological and mental peculiarities. And just as many peculiarities of the negro dialect are common to the whole large area of the South or his original American home, since the peculiarities which cause these aberrations are common to the whole race, just so a number of phonetic changes in Prakrit were common to all of the widely scattered areas where these popular dialects were spoken, since here also common racial peculiarities would cause common effects . . . and it is therefore not surprising that Prakrit and Vedic should have been virtually coexistent not only from the beginning of the transmission, but ever since the Aryans first invaded India and began enslaving the aborigines.

The first to bring attention to this specifically in the context of contact phenomena was James Clough well over a century ago, in his book On the Existence of Mixed Languages. In it, he explains the emergence of Hindustani as a two-stage process: firstly that the Uzbeks (or “Uzbek Tatars”, as he called them) from the north of Kabul invaded Persia in the fifth century, adopting the dialect of the Persian court (Zeban Deri) as their language of administration rather than any regional vernacular (Zeban Parsi), and then, some centuries later, that they went on to conquer India during the reign of Mahmud of Ghazni (AD 997-1028):

Mahmud’s administrators in India... experienced some difficulty in communicating with their new subjects. A lingua franca was composed, consisting principally of corrupt Persian and Hindi, and this was known under the name of Urdu Zeban, or camp language, to distinguish it from the court language, but the poets called it Rekhta, or ‘scattered,’ on account of the variety of elements composing it (Clough, 1876:15).

It was in fact the Huns, a nomadic Turkic-speaking people (Baskakov, 1969; Maenchen-Helfen, 1973), rather than the Uzbeks who invaded Persia and north-western India in the fifth century, and it was Mahmud himself who invaded India rather than the Huns, as Clough intimates. Kachru’s (1990:470-1) more modern (and more accurate) account tells some of the same story, maintaining that Hindi itself originated in this contact situation:

Hindi as a language is said to have emerged from the patois of the market place and army camps during the period of repeated Islamic invasions and establishment of Muslim rule in the north of India between the eighth and tenth centuries AD. The speech of the area around Delhi, known as khari bholi, was adopted by the Afghans, Persians and Turks as a common language of interaction with the local population. In time, it developed a variety called Urdu (from Turkish ordu, ‘camp’). This variety, naturally, had a preponderance of borrowings from Arabic and Persian. Consequently, it was also known as Rexta ‘mixed language’.

Ketalaer’s grammar of Hindustani, dating from the late seventeenth century, describes a clearly koïnéised variety of that language, as does that of Lebedeff published in 1801, and reduced varieties of Hindi continue to be spoken in India and elsewhere today, indicating that this process is an old and ongoing one (cf. Chatterji, 1931; Khubchandani, 1963; Zograf, 1963; Kelley, 1966; Abbas, 1969; and Mesthrie, 1993).

Both Clough’s and Kachru’s accounts make reference to Urdu’s being a military lingua franca, although neither elaborates on this. In Turkish, the primary dictionary entry for ordu is in fact “army” (Alderson and İz, 1959:259). The use of the language by non-Indians is reflected in its two non-Indian names: Urdu (from Turkish) and Rekhta (from Persian).

The Rajputs

Among the Hindu soldiers involved throughout the early period of Indian conflict with Islam were the Rajputs, who were a conglomerate force conscripted from many ethnic and linguistic populations in India, including the Ahirs, the Gujjars, the Lohars, the Lobhanas, the Saudagars, the Siddhis and the Tandas; in this way they were “welded out of different non-Aryan material into a martial society” (Watson, 1988:88). According to Thakar (1969:227), “Most authorities accept the view that the Rajput clans were either descended from the Huns settled in northern and western India or from those tribes and peoples who had entered India together with the Hun invaders.” Collectively, all foreigners entering India were called (in Sanskrit) mlechchha, or “impure,” a status gradually lost by those who became assimilated over time. By the ninth and tenth centuries, during the height of the period of Islamic expansion, the Rajputs had come to wield considerable political power and had achieved warrior caste status. They were divided into a number of clans, four of which had special status: the Pratiharas, the Chauhans, the Solankis and the Pawars. In being given royal lineage by the Brahmins (their name Rajputs means “sons of princes”), they were also allowed to display emblems signifying their descent from the Sun and the Moon as military insignia.

The Rajputs were not originally a single ethnic population but a conglomerate, professional one drawn from many distinct peoples. Over the past thousand years, however, a number of distinct ethnic populations that trace their descent to them have emerged in India. Those associated particularly with Romanies, both in their own tradition and according to the consensus of many western scholars, are the Ghor, or Banjaras. In addition to those cited here, other sources on the Banjara include Thurston and Rangachari (1909), Rathod (1988) and Isaac (1984).

The Banjara

Banjara tradition maintains that during the Ghaznavid period, some Rajputs left India through the Himalayan passes never to return. “The Khyber, most famous of the passes, has been an immemorial trade-link with Central Asian and Mediterranean communications. South of the Khyber the main routes from the Iranian Plateau are by the Gomal and the Bolan passes, and finally along the Makran coast” (Watson, 1988:12). Rathore (1997:2) gives the names of several Banjara and non-Banjara historians who have written about Ghor history, and about the Rathore and Chauhan Rajputs who left Rajputana in response to the Ghaznavid invasions, spreading out to the four points of the compass. Naik (1978:5) writes of the Rajputs who “about 1,000 years ago, during the invasion of Mohammed Ghori and Mohammed Ghazni into Greater Panjab... fled through the Khyber and Bolan Passes and went to Central Asia and moved further into Europe...[they] came to be known as Gypsies, and another group remained in India in jungles and Chambal Khore, a valley in Rajasthan, and came to be known as Lamans or Banjaras.” Western historians, too, acknowledge this:

Some Rajput clans, or portions of them, after offering fierce resistance to various Muslim armies—tales of these exploits are also part of widespread folk tradition— drifted north or south into the mountainous regions of Central India or the Himalayas, and some may have gone as far as Nepal. Many of them remained in north India (Minturn and Hitchcock, 1966:11).

The fact that Rajput communities in twenty-one out of the thirty-one states throughout India speak varieties of the same language (in two distinct dialect divisions) is indicative of their having dispersed from a single area at an earlier time. Grierson (1907:ix:56) says:

Banjari falls into two main dialects—that of the Panjab and Gujarat, and that of elsewhere (of which we may take the Labhani of Berar as the standard). All these different dialects are ultimately to be referred to as the language of western Rajputana. The Labhani of Berar possesses the characteristics of an old form of speech, which has been preserved unchanged for some centuries. It may be said to be based partly on Marwari and partly on northern Gujarati.

Grierson goes on to say that the Banjari language of southern India is mixed with the surrounding Dravidian languages. This is what we would expect, given the social and linguistic history of the population; an almost identical situation exists among the Seminoles, with whom the Banjara may be compared: over the past three centuries they have become a single Native American ethnic people, but in fact they descend from fugitives from the British colonists who represented some twenty or more quite separate language groups, both Indian and African, who were able to find refuge in Spanish Florida. Their very name is from Spanish cimarrones, which means “fugitives,” and they maintain two mutually unintelligible languages, Muskogee and Mikasuki, amongst themselves in their community today (Hancock, 1980).


In order to accept the hypothesis that Romani emerged from a military koïné, it must first be established that the ancestors of the Romanies were indeed a military force at this time and place in history. In Hancock (1999) I assemble linguistic, historical and cultural evidence to support this position, which I summarise here:

1. The linguistic features of Romani identify it as a new-Indic language rather than an old-Indic language, which dates its time of separation from India at no earlier than ca. AD 1000.
2. The Romani language cannot be traced to any single Prakritic branch of the Indic languages, but has features from several of them, although it is most like those of the Central group. The language closest to Romani is Western Hindi, which itself emerged from Rajputic.
3. Romani includes a substantial Dardic component (particularly from Phalura) and items from Burushaski, a language-isolate spoken in the Pamir and nowhere else. This, and other linguistic evidence, points to an exodus through this particular area—the same area through which the Ghaznavids moved into India.
4. The various Romani terms for non-Romani peoples suggest a military-non-military relationship; thus gadžo is traceable to an original Sanskrit form (gajjha) which means “civilian,” das and goro both mean “slave, enemy, captive,” and gomi means “one who has surrendered.”
5. Romani has a military vocabulary of Indian origin, including the words for “soldier”, “sword,” “attack,” “spear,” “trident,” “battlecry” and “gaiters.” Most of its (for example) metalworking or agricultural vocabulary, on the other hand, consists of words not originating in India.
6. Some Romani groups in Europe today maintain the emblems of the Sun and the Moon, as did the Rajputs, as identifying insignia. Tod (1920:i:69) traces this to the Mongols.
7. Cultural practices of some Romani groups in Europe today resemble elements of Shaktism or goddess-worship, as in the Rajputs’ worship of the warrior goddess Parvati, another name for Kali-Durga. Although the figure of St. Sara in Saintes-Maries comes from an older local myth, and “black” Madonnas and other statues of dark-coloured wood are hardly uncommon in Europe, the European pre-eminence of Les Saintes-Maries among such festivals may be taken to indicate a certain cultural affinity (Fraser, 1995:313). Much as the ancient Romans rediscovered Jupiter in the Greek Zeus, so the Indian goddess Kali may be rediscovered in the Romani Sara-Kali in France today. Her statue is immersed in the Mediterranean just as it is in the Ganges once a year in India.
8. Throughout the earliest fifteenth and sixteenth century written records we find that Romanies told their largely uncomprehending western interlocutors that they had been defeated after conflicts with Islamic forces (Fraser, 1995:72,83). We should recall that the period after the Muslim invasion of India was also a period in which Byzantines, Crusaders and Armenians sustained a patchwork of anti-Islamic military resistance in Anatolia, with the last Armenian principality being reduced by Ottomans only in 1361. The oral tradition of some Romani groups in Europe includes stories of a conflict with Islam leading to the original migration West.
9. The mixed linguistic nature of Romani is evident from the numbers of synonyms of Indic origin in modern Romani, e.g. the multiple words for ‘wash,’ ‘burn,’ ‘awaken,’ ‘back,’ ‘dog,’ ‘fight,’ ‘belt,’ ‘give,’ ‘birth,’ ‘arise,’ ‘bracelet,’ ‘cold,’ ‘comb,’ ‘day,’ ‘excreta,’ ‘fear,’ ‘food,’ ‘heel,’ ‘leave,’ ‘man,’ ‘move,’ ‘non-Romani,’ ‘open,’ ‘pay,’ ‘sing,’ ‘straw,’ ‘thin,’ ‘tomorrow,’ ‘raw,’ ‘wet’ and so on.

If we hypothesise that Rajputic, the military lingua franca of the army camps a thousand years ago, became reshaped differently as its speakers separated and moved into different areas, we can account for its Islamisation in Muslim areas, where it became the Persian- (and increasingly Arabic-) influenced Urdu (written in Arabic script), and its Hinduisation in Hindu areas, where it became Sanskritised (and written in Devanagari script). But some Rajputic speakers “remained in jungles and Chambal Khore, a valley in Rajasthan, [and] came to be known as Lamans or Banjaras . . . the group which fled to the forests and remained there are the present-day hill-tribes called the Banjaras” (Naik, 1978:5). Under such circumstances, their original Rajputic did not develop into either Hindi or Urdu, but emerged as a distinct but related new language: Ghorbati (also called Ghorboli, Lamani or Banjari). The speech of those Ghor (Banjaras) who subsequently left Rajasthan for other parts of India to the South underwent further change in the direction of the surrounding languages, as Grierson noted, paralleling the re-shaping of Rajputic into Romani in the Greek-language environment.

The movement out of India by those Rajputs who were the ancestors of the Romanies is accounted for in Banjara history; less easily explained is why they did not return to India. Possibly they continued to maintain themselves as a mercenary military force, engaging in conflicts with the Muslims along the Silk Road and the Caspian littoral; the linguistic evidence points to this as their route to the West. Soulis (1961:163) suggests that Seljuq raids in the area at the end of the eleventh century were a factor [in Hancock, 2004, I bring the “Seljuq Factor” into the discussion as providing an account of how the pre-Romanies moved from India to Anatolia]. Marushiakova and Popov (1997:63) attribute their eventual movement into Europe to the twelfth and thirteenth century Ottoman invasions. Significantly, and as further support for their original profession as (besides soldiers) shiviranuchara or camp followers, they document those early Romanies as “servants in the auxiliary detachments or as craftsmen servicing the army.”

Their separation from India caused something of a linguistic and social trauma. Even within India, the Rajputs were able to maintain a distinctiveness from the surrounding peoples, but lacking even that normalising factor, the ancestors of the Romanies were left completely isolated with their language and mixed ethnic identity, becoming increasingly remote from their homeland. As with the Rajputs before them, the sense of being a composite population on one level, but a distinct and “special” population on another, came in time to characterise a distinctively Romani world view which permitted the Romanies to continue to accept new people into their number, providing they took on Romani identity, and a Romani linguistic trait which allowed the language to incorporate and assimilate new grammatical and lexical material. This capacity to absorb and modify, socially as well as linguistically, represents a continuum from the time of the creation of the Rajputs, and remains an integrally defining aspect of Romani identity.


In Hancock (1995), I demonstrated that, contrary to the established belief, Romani and Domari have distinct and unrelated histories, separated by over five hundred years. It was possible to show this by examining the Iranic lexical content of both, and concluding that the percentage of shared items was far too low if both languages had been part of the same migration through Persia. If the third member of the conventional trilogy comprising this theory is included—Lomavren, spoken in Armenia—then there is no single Iranic item shared by all three branches. Domari is clearly of an old-Indic type, and differs significantly from Romani in its phonology, grammar and lexicon.

While it can be demonstrated that Domari, spoken in several dialects throughout the Middle East, but particularly in Syria, is not related to Romani (other than that they are both Indian languages—which could equally well be said about Romani and Sinhalese), accounting for its origin presents a different problem.

The movement of the ancestors of the Romanies out of India has traditionally been explained by looking to Firdausi’s account of the gift of several thousand musicians in the fifth century from the Indian king Shankal to the Sassanid shah of Persia. That such an episode occurred is quite likely, since it has been documented in a number of places (Hancock, 1999). But it does not hold up linguistically, since the musicians were “Sindhian,” from the north-western Prakritic area, and Domari shows much closer lexical similarities with languages of the Central group (Nseir, 1998). According to The Encyclopedia of Islam (Minorsky: V:818), there is a group which identifies itself as Kurdish east of Bohtan in Persia, “which bears the suggestive name Sindi or Sindiyan (the Sindhis).”

Instead, the presence of the Dom (Domari speakers) as well as of other ‘Gypsy’ populations in the Middle East such as the Karači, whose linguistic affiliation has still to be determined (Kenrick, 1976c,d), might well be accounted for using the same theoretical framework as that proposed here for Romani, i.e. composite military troops moving westwards out of India. Like Romanies, the Dom also refer to non-Dom as kajja “civilians.” To justify this, it must be shown that there was an invasion of India in this area at this time, and that Indian troops left India to engage them.

This proves to be the coming of the Turkic-speaking Huns who, starting from Bactria, eventually occupied other parts of Persia and India, but who were defeated and driven back by the Persians in the sixth century (Thapar, 1966:142), and who were gradually absorbed into the local population in India by the end of the millennium. While their cultural impact was not extensive, the linguistic consequences of their invasion were far-reaching:

Prakrit is of linguistic interest as illustrative of the linguistic evolution from Prakrit to Apabhramsha (literally ‘falling down’). Apabhramsha, a corrupt form of Prakrit dialect, is believed to have originated in the north-west and travelled from that region with the migrations of people who scattered and settled in central and western India after the Hun invasions (Thapar, 1966:257).

There are accounts of the invasion of the Huns, and references to their being “pushed back” in a “brave defence” by King Skanda Gupta who, “for the duration of his twelve-year reign . . . was obliged to ward off their predatory assaults” (Wolpert, 1977:94):

From the middle of the fifth century a new barbarian invader, the Hun, made his ominous appearance, as his kindred were doing in Europe. For a generation or so the Guptas succeeded in holding off this menace from the northwest. But toward the close of the century it reappeared in the persons of Toramana and Mihirakula, the latter holding Kashmir, western India and part of the Gangetic basin. In fifty years the Huns were pushed back to Kashmir and parts of the northwest, and they never again became a threat, losing their identity among the Rajput clans of later fame (Wright, 1969:301; see also Thapar, 1966:140).

The White Huns, or Ephthalites, or as the Indians called them, Hunas, were barbarians in the sense most painful to settled societies... While Attila’s Huns streamed across Eastern Europe to attack Italy in the mid-fifth century, a southward wave overwhelmed the Sassanid Empire in Persia and the Kushan remnants of the Indian north-west. Skanda Gupta (455-470) is credited with a brave defence of his western frontiers in face of this irruption, but by about 500 a Hun chieftain was recorded as far south as Ujjain . . . The Gupta dynasty is taken to extend, though interrupted in the fifth century by Hun invasions, from 320 to about 450 . . . [the Buddhist university city of] Taxila, beyond the Gupta boundaries, [was] in the fifth century devastated by the Huns (Watson, 1988:60,62,66).

There are also some references to the possibility that the Indians who moved out to confront the Huns at this time were from Kabul rather than Sindh, and that the two events have been confused over time. Harriot (1830:524-5) reported that a Persian historian, Fateh Ali Khan, told him that Firdausi’s musicians were from Kabul, not Sindh, and Gobineau’s Kauli informants in Persia in the 1850s assured him that their ancestors were from Kabul, not Sindh, and that their name was originally Kabuli (1857:690). At the time of the Hun invasions, Kabul was not just the city it is today, but an entire Hindu kingdom (also called Kapisha) which extended from the Kabul River Valley to the Hindu Kush. Siraiki, Multani and some other languages transitional between the central and northwestern groups are now spoken in this area, though we need to reconstruct the linguistic situation as it was there fifteen centuries ago.


The mixed nature of Romani, and the social and linguistic clues evident in an examination of (particularly) its lexicon, make a strong case for its having taken its initial form as a military koïné which left India with its speakers, subsequently developing outside of its homeland. Work begun on an examination of its historical phonology and grammar also supports the fact of its multi-source origins. The identity of its first speakers, and the circumstances of its emergence, established a sociolinguistic ‘character’ which continues to typify the Romani people and language.

The same type of military historical scenario may also explain the presence of Indian languages and peoples such as the Domari throughout the Middle East.



