Monday, July 23, 2012

Middle Chinese part 3 - coronals

Previous posts on this topic: 1 2

MC had three types of stops: voiced, unvoiced, and aspirated. There are complications in distinguishing these from each other. For simplicity's sake, I'll focus exclusively on MC aspirates for now, and in a later post I'll describe how to distinguish them.

The following consonants were some possible initials in MC: *tʰ *ʈʰ *tsʰ *ʈʂʰ *s *ʂ. Compare their reflexes in the modern languages:

 太 'great'
Mandarin: tai4
Cantonese: taai3
Japanese: tai
Korean: tae
Vietnamese: thái
徹 'penetrate'
Mandarin: che4
Cantonese: cit3
Japanese: tetsu
Korean: cheol
Vietnamese: triệt
菜 'vegetable'
Mandarin: cai4
Cantonese: coi3
Japanese: sai
Korean: chae
Vietnamese: thái
 車 'cart'
Mandarin: che1
Cantonese: ce1
Japanese: sha
Korean: cha
Vietnamese: xe 
三 'three'
Mandarin: san1
Cantonese: saam1
Japanese: san
Korean: sam
Vietnamese: tam

山 'mountain'
Mandarin: shan1
Cantonese: saan1
Japanese: san/sen
Korean: san
Vietnamese: sơn

Firstly, note that *s and *ʂ are distinguished only in Mandarin and Vietnamese, and *ʈʰ and *ʈʂʰ are distinguished only in Japanese and Vietnamese. Vietnamese is the only one of these languages that does not distinguish between *tʰ and *tsʰ (both become th), so it must be used in conjunction with another language.

With some inspection, we see that the following combinations of languages are sufficient:
  1. M+J
  2. V+M
  3. V+C
  4. V+J
  5. V+K
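The inspection can also be done mechanically. Here is a minimal Python sketch, assuming each MC initial has exactly the one reflex per language seen in the six example words above (a simplification). The table and the names REFLEXES, is_sufficient and sufficient_pairs are my own, for illustration only. A set of languages is sufficient when no two MC initials produce the same combination of reflexes.

```python
from itertools import combinations

LANGS = ["M", "C", "J", "K", "V"]

# Initial consonant of each example word, in LANGS order (romanized as in the post).
REFLEXES = {
    "*tʰ":  ["t",  "t", "t",  "t",  "th"],  # 太
    "*ʈʰ":  ["ch", "c", "t",  "ch", "tr"],  # 徹
    "*tsʰ": ["c",  "c", "s",  "ch", "th"],  # 菜
    "*ʈʂʰ": ["ch", "c", "sh", "ch", "x"],   # 車
    "*s":   ["s",  "s", "s",  "s",  "t"],   # 三
    "*ʂ":   ["sh", "s", "s",  "s",  "s"],   # 山
}

def is_sufficient(langs):
    """True if the chosen languages give every MC initial a distinct reflex tuple."""
    idx = [LANGS.index(lang) for lang in langs]
    seen = {tuple(row[i] for i in idx) for row in REFLEXES.values()}
    return len(seen) == len(REFLEXES)

def sufficient_pairs():
    """All two-language combinations that separate the six initials."""
    return [pair for pair in combinations(LANGS, 2) if is_sufficient(pair)]

print(sufficient_pairs())
```

Under these assumptions the script reports the same five pairs as the list above: M+J, M+V, C+V, J+V and K+V.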

Interestingly, 1 and 3-5 are the same sets of languages which sufficed to distinguish the liquid initials! (See last post.) Thus it makes sense to eliminate option 2 and again focus on the following combinations of languages:
  1. M+J
  2. V+C
  3. V+J
  4. V+K
For option 1, the algorithm is:
  1. M /t/ => MC *tʰ
  2. M /s/ => MC *s
  3. M /sh/ => MC *ʂ
  4. M /c/ => MC *tsʰ
  5. M /ch/ & J /t/ => MC *ʈʰ
  6. M /ch/ & J /sh/ => MC *ʈʂʰ
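As a rough illustration, the six rules for option 1 can be written as a small Python function. This is only a sketch: the function name reconstruct_mj is hypothetical, the inputs are the romanized initials used in this post, and it assumes the word is already known to have one of the six initials under discussion.

```python
def reconstruct_mj(m_initial, j_initial):
    """Guess the MC initial from the Mandarin and Japanese initials.

    Returns None when the pair is not covered by the six rules.
    """
    # Rules 1-4: Mandarin alone decides.
    mandarin_only = {"t": "*tʰ", "s": "*s", "sh": "*ʂ", "c": "*tsʰ"}
    if m_initial in mandarin_only:
        return mandarin_only[m_initial]
    # Rules 5-6: Mandarin /ch/ is ambiguous, so Japanese breaks the tie.
    if m_initial == "ch":
        if j_initial == "t":
            return "*ʈʰ"
        if j_initial == "sh":
            return "*ʈʂʰ"
    return None

# 徹: Mandarin che4, Japanese tetsu
print(reconstruct_mj("ch", "t"))
# 車: Mandarin che1, Japanese sha
print(reconstruct_mj("ch", "sh"))
```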
For options 2-4, the algorithm is:
  1. V /tr/ => MC *ʈʰ
  2. V /x/ => MC *ʈʂʰ
  3. V /t/ => MC *s
  4. V /s/ => MC *ʂ
  5. V /th/ & C/J/K /t/ => MC *tʰ
  6. V /th/ & C /c/ or J /s/ or K /ch/ => MC *tsʰ
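The Vietnamese-based rules can be sketched in Python the same way. Again this is only an illustration under the same assumptions: reconstruct_v is a hypothetical name, the initials are given in the romanizations used above, and the second language is passed as "C", "J" or "K".

```python
def reconstruct_v(v_initial, other_lang, other_initial):
    """Guess the MC initial from Vietnamese plus one of Cantonese/Japanese/Korean.

    Returns None when the combination is not covered by the six rules.
    """
    # Rules 1-4: Vietnamese alone decides.
    vietnamese_only = {"tr": "*ʈʰ", "x": "*ʈʂʰ", "t": "*s", "s": "*ʂ"}
    if v_initial in vietnamese_only:
        return vietnamese_only[v_initial]
    # Rules 5-6: Vietnamese /th/ is ambiguous, so the second language decides.
    if v_initial == "th":
        if other_initial == "t":
            return "*tʰ"
        affricate_reflex = {"C": "c", "J": "s", "K": "ch"}
        if affricate_reflex.get(other_lang) == other_initial:
            return "*tsʰ"
    return None

# 太: Vietnamese thái, Korean tae
print(reconstruct_v("th", "K", "t"))
# 菜: Vietnamese thái, Cantonese coi3
print(reconstruct_v("th", "C", "c"))
```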

(Side note: In the first post I mentioned using Mandarin + Japanese to reconstruct final consonants. I have to add that there's another ambiguity created by this, as some Japanese loans have long vowels due to an original diphthong rather than a final consonant, e.g. 高 Japanese kou Mandarin gao1. This means that sometimes one cannot tell if the MC final consonant is -p or null. Really the best thing is just to use Cantonese, Korean, or Vietnamese.)

Sunday, July 22, 2012

Middle Chinese part 2

(Continued from previous post)

The Middle Chinese (MC) initial consonants *l *m *n are reflected in a pretty straightforward way in our set of languages. The only things to note are:
(1) The reflex of *l in Japanese and Korean may be written as <r> (in fact, these languages don't distinguish between [r] and [l])
(2) In Japanese, the nasals are denasalized (*m>b, *n>d) in older borrowings (the so-called "Go-on" readings)

六 'six'
Mandarin: liu4
Cantonese: luk6
Japanese: roku
Korean: ryuk
Vietnamese: lục

明 'bright'
Mandarin: ming2
Cantonese: ming4
Japanese: myou / mei
Korean: myeong
Vietnamese: minh

男 'male'
Mandarin: nan2
Cantonese: naam4
Japanese: nan / dan
Korean: nam
Vietnamese: nam

MC *m developed an allophone *mv in some environments which becomes /w/ in Mandarin and /v/ in Vietnamese, but is otherwise identical:
文 'writing'
Mandarin: wen2
Cantonese: man4
Japanese: mon / bun
Korean: mun
Vietnamese: văn

The MC initial *ŋ has more diverse reflexes. In Mandarin and Korean it is completely dropped, becoming null /∅/ (which may leave the word with an initial glide). In Cantonese it is sometimes preserved as /ŋ/ and sometimes dropped. Japanese reflects it as /g/, and Vietnamese preserves it as /ŋ/ (written ng).

五 'five', 語 'language'
Mandarin: wu3, yu3
Cantonese: ng5, jyu5
Japanese: go, go
Korean: o, eo
Vietnamese: ngũ, ngữ

The peculiar MC initial *r, whose phonetic realization is not totally clear, is reflected quite differently in each of these languages:
Mandarin: /r/
Cantonese: /j/
Japanese: /n/ (early borrowings - Go-on), /z/ (late borrowings - Kan-on)
Korean: /∅/
Vietnamese: /ɲ/
人 'person'
Mandarin: ren2
Cantonese: jan4
Japanese: nin / zin
Korean: in
Vietnamese: nhân

Now to create an algorithm:

Firstly, to distinguish *mv we need one language from the set {M V} and one from the set {C J K}, since in the former it becomes like a glide and in the latter like a nasal. Conveniently, this allows us to distinguish *r, since M and V have unique reflexes of this sound. If we choose V, we can also distinguish *ŋ, but if we choose M we need to choose J from our second set (since C and K may also reflect *ŋ as null).

Thus we see that the possible combinations are:
  1. V+C
  2. V+J
  3. V+K
  4. M+J
The algorithm to reconstruct the MC initial is as follows:
  1. If M/V has initial /m/, /n/ or /l/, the MC initial is *m, *n or *l (respectively); if M has initial /r/ or V has initial /ɲ/, the MC initial is *r
  2. If M/V has initial /w/ or /v/ and C/J/K has initial /m/ or /b/, the MC initial is *mv
For combinations #1-3:
  1. If V has initial /ŋ/, the MC initial is *ŋ
For combination #4:
  1. If M has null initial /∅/ and J has initial /g/, the MC initial is *ŋ
One last point: Per the last post, combinations #1-3 allow reconstruction of MC final consonants uniquely (since V preserves them). Combination #4 allows slightly incomplete reconstruction (final *m and *n are indistinguishable, but the other finals can be reconstructed).
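The algorithm for the nasal and liquid initials can be sketched in Python as follows. This is a rough illustration, not a definitive implementation. The name reconstruct_sonorant and the romanization conventions ("ng" for /ŋ/, "nh" for /ɲ/, "" for a null initial) are my own assumptions. I also read the last rule as yielding *ŋ, as the 五 / 語 examples indicate, and treat pinyin w- and y- as the glide that a null initial may leave behind.

```python
def reconstruct_sonorant(first_lang, first_initial, second_lang, second_initial):
    """first_lang is "M" or "V"; second_lang is "C", "J" or "K".

    Returns the reconstructed MC initial, or None if no rule applies.
    """
    # Rule 1: plain nasals and *l are read straight off Mandarin/Vietnamese.
    plain = {"m": "*m", "n": "*n", "l": "*l"}
    if first_initial in plain:
        return plain[first_initial]
    # Rule 1 (continued): Mandarin /r/ and Vietnamese /ɲ/ both point to *r.
    if (first_lang, first_initial) in {("M", "r"), ("V", "nh")}:
        return "*r"
    # Rule 2: glide-like in M/V but nasal (or denasalized /b/) in C/J/K.
    if first_initial in {"w", "v"} and second_initial in {"m", "b"}:
        return "*mv"
    # Combinations 1-3: Vietnamese preserves the velar nasal.
    if first_lang == "V" and first_initial == "ng":
        return "*ŋ"
    # Combination 4: null (or bare glide) in Mandarin plus /g/ in Japanese.
    if (first_lang, second_lang) == ("M", "J"):
        if first_initial in {"", "w", "y"} and second_initial == "g":
            return "*ŋ"
    return None

# 文: Mandarin wen2, Japanese bun
print(reconstruct_sonorant("M", "w", "J", "b"))
# 五: Mandarin wu3, Japanese go
print(reconstruct_sonorant("M", "w", "J", "g"))
```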

Wednesday, July 11, 2012


Recently I've been pondering the issue of how the Proto-Semitic language fits into Jewish theology. There's a popular conception among religious Jews that all languages descend from Hebrew, making it the ancestor of the other Semitic languages. In addition, does the sanctity of the Hebrew language preclude it having descended from a precursor? I'm thinking I may post a few articles over time about small aspects of this topic.

Disclaimer: I will base my discussion on a literal reading of Genesis and the associated Jewish sources. Creative approaches have been suggested for dealing with similar issues, e.g. evolution, and they would most likely be relevant to the Proto-Semitic issue as well. However, I would like to strengthen the standing of Proto-Semitic by showing that it does not contradict the literal approach.

In the description of the creation of woman in Genesis, the Torah states that Adam named the woman formed from his rib:

לְזֹאת יִקָּרֵא אִשָּׁה, כִּי מֵאִישׁ לֻקְחָה-זֹּאת
"This one shall be called woman ("ishah"), for she was taken from man ("ish")."

On this, the midrash in Bereishit Rabbah (18:4, also brought down by Rashi on the above verse) states:

רבי פנחס ורבי חלקיה בשם רבי סימון אמרי: כשם שניתנה תורה בלשון הקודש, כך נברא העולם בלשון הקודש, שמעת מימיך אומר גיני גיניא?! אנתרופי אנתרופא?! גברא גברתא?! אלא, איש ואשה. למה? שהלשון הזה נופל על הלשון הזה:  

Rabbi Pinchas and Rabbi Chilkiah said in the name of Rabbi Simun: Just as the Torah was given in Lashon HaKodesh [Hebrew], so too was the world created with Lashon HaKodesh. Have you ever heard someone say "gini ginia"? "Anthropi anthropa"? "Gavra gavreta"? Rather, "ish" and "ishah" [in Hebrew]. Why? Because these two words share the same form.

 According to this midrash, we see that the world was created with Hebrew, since the play on words "ish"-"ishah" does not work in languages such as Aramaic or Greek where the words for "man" and "woman" are unrelated (Greek: "anthropos", "gyni", Aramaic: "gavra", "itteta").

First of all, it's worth mentioning in passing that the words "ish" and "ishah" are probably not etymologically related. Although "-ah" is indeed the feminine gender marker in Hebrew, these words actually have dissimilar vocalization -- "ishah"  אִשָּׁה has a dagesh in the shin, while "ish" אִישׁ is spelled with a yud, reflecting earlier pronunciations /iʃːaː/ and /iːʃ/. Now, based on the rules of Hebrew morphophonology, a word like אִשָּׁה /iʃːa/ would have an expected masculine form אֵש /eʃ/ or אַש /aʃ/, since the underlyingly geminate /ʃ/ does not allow a preceding long vowel. Similarly אִישׁ /iːʃ/ has expected feminine אִישָה /iːʃaː/.

In fact, the cognates of these words in other Semitic languages show that the two ש's were originally different consonants. איש ish is cognate to Aramaic אנשא enasha "man" and Arabic ناس naas "mankind", where the alternation "sh"-"sh"-"s" is thought to reflect proto-Semitic */ʃ/. On the other hand, אשה ishah is cognate to Aramaic איתתא itteta "woman" and Arabic أنثى antha "female", where the alternation "sh"-"t"-"th" is thought to reflect proto-Semitic */θ/.

Now, this doesn't really present a problem with the midrash. On the contrary, it actually emphasizes the fact that Hebrew stands out in having the words "ish" and "ishah" sound similar. For more info, see Balashon on "ish" and "ishah".

There are certain issues this midrash raises that aren't directly relevant. For instance, English "man" and "woman" also sound similar -- does this midrash not disprove that the world was created with English? For now, I'll leave this question open.

The real question for me here is whether this is consistent with the existence of a proto-Semitic language. When the midrash says that the world was created with Hebrew, does this mean that all current languages descend from Hebrew? Moreover, if Hebrew descended from a previous language, then how could Adam have spoken Hebrew?

Without discussing literalism versus allegory, I think that saying that Adam spoke Hebrew is not inconsistent with proto-Semitic. My argument revolves around the story of the Tower of Babel. The Torah (Bereishit 11:1) states that before this incident:

וַֽיְהִ֥י כָל־הָאָ֖רֶץ שָׂפָ֣ה אֶחָ֑ת...
The whole world [had] one language...

The Targum Yerushalmi and Midrash Tanchuma assume that this language was Hebrew:

TY (ibid): בלישן קודשא הוה ממללין דאיתבריא ביה עלמא מן שרויא
They would speak in Lashon HaKodesh (Hebrew), for with it the world was created in the beginning. 

MT (Noach 19): שהלשון הראשון היו מדברים בלשון הקדש ובו בלשון נברא העולם
For the first language they [the generation of Babel] spoke was Lashon HaKodesh, with which the world was created.

This opinion, also brought down by Rashi and the Baal HaTurim, holds that the inhabitants of the world continued to speak Hebrew, the language of Adam. Quite a lot of time had passed since the creation of the world (1996 years, according to Artscroll), but even if the language had evolved quite a bit, perhaps it could still be called "Lashon HaKodesh". Indeed, the Torah and the Mishnah were more than a thousand years apart, and despite major differences in language, both could be termed "Lashon HaKodesh". (Cf. Sotah 49b: והאמר רבי בא"י לשון סורסי למה אלא אי לה"ק אי לשון יוונית)

In the continuation of the story of Babel, Hashem causes the people to disperse, their languages becoming distinct from each other. Rashi (Bereishit 11:7), based on Bereishit Rabbah 38:10, understands this to have been an immediate change:

Rashi: זה שואל לבינה וזה מביא טיט, וזה עומד עליו ופוצע את מוחו

One would ask for a brick, another [misunderstanding] would give him mortar, and another would stand over him and bash his head.

According to this opinion, the existence of Proto-Semitic may be explained. Perhaps it was one of the languages which was created during the dispersion. Since the glottogenesis event was immediate, it did not need to descend immediately from Hebrew. Thus the divine hand caused it to come into being at this point.

An alternative approach is that of the Ibn Ezra (ibid), who states that the language change was a result of the dispersion. A similar opinion is recorded in Yerushalmi (Megillah 1:9), that even before the dispersion הוי מדברים בשבעים לשון they spoke in seventy languages, meaning that significant language change had already occurred, but as Rabbi Chaim Kanievsky explains, they could still understand each other. (I suppose either their languages were still mutually intelligible, or more likely they were all versed in one international language, which enabled them to conspire.) Presumably the dispersion then caused them to be unable to understand each other. Both the Ibn Ezra and the Yerushalmi are similar in that the language change happened as a natural consequence rather than as a result of divine glottogenesis.

Either way, one could still explain the existence of Proto-Semitic. Perhaps it did descend from Hebrew, but in such a long and complicated process that it is not traceable through historical linguistics. (Historical linguists do believe that comparative methods cannot reconstruct beyond a certain time depth.) This is somewhat difficult to say due to the short time frame involved (~2000 years), but the time-depth problem is one which also faces other sciences (geology, astronomy, biology) and thus the answers which they give may also apply here. It's also possible that some drastic language change processes occurred, such as pidginisation, or as the Ibn Ezra suggests, perhaps the groups deliberately changed their language in order to become separate. All of these possibilities could create a dramatically new linguistic ecosystem, allowing for the later development of Proto-Semitic.

There is a separate question of whether Hebrew continued to be spoken. For example, Avraham, alive during the dispersion, had his son Yitzchak afterwards, whose name has Hebrew etymology. I may deal with this in a later post, but it is beyond the scope of the immediate discussion. In any case, one can still say that the Hebrew spoken at the time of the giving of the Torah descended from Proto-Semitic rather than the Hebrew of Adam.

It also seems like there is a deeper question here. It's hard to deny that Hebrew has a special status Jewishly, both from a halachic standpoint (certain things, e.g. sifrei Torah and tefillin, must be written in Hebrew) as well as from other statements of chazal (e.g. darshening the shapes of the letters in the Hebrew alphabet). There are also other midrashim that state that the Torah was given in Hebrew (cf. Bereishit Rabbah 31:8, Megillah 2b). Does the sanctity of Hebrew preclude it having developed naturally from a previous language?

I would like to argue that this is not an issue. One could say that Hashem chose for proto-Semitic to develop into Hebrew specifically because of its intrinsic holiness. Indeed, it seems that one is forced to say this -- for if Hashem created Hebrew as the first language, then the chance that Proto-Semitic would randomly develop into the language that Adam spoke is infinitesimally small, and thus it must have been Divine providence which caused this to occur.

A literal reading of Jewish sources supports that Hebrew was the first language, and that Hebrew has intrinsic sanctity. This does not preclude the conclusions of historical linguistics about a Proto-Semitic language. The dispersion at Babel may have caused this language to come into being, which Hashem then caused to evolve over time into Hebrew.

I'm not sure if Hebrew as the original language of Adam is the only authentic opinion. Sanhedrin 38a states that Adam spoke Aramaic. Perhaps I'll investigate this further.

Sunday, July 8, 2012

Beginning with finals...

I'm back, after a long absence...

My latest armchair linguistics project is finding a convenient method to reconstruct Middle Chinese words using a few widely-spoken East Asian languages. According to Wikipedia, Middle Chinese (MC) was the variety of Chinese spoken in the 6th-12th centuries CE. My understanding is that most modern Chinese languages, as well as the Chinese loanwords found in vast quantities in Japanese, Korean, and Vietnamese, derive from MC. A particularly useful tool for reconstructing MC was rhyme tables which were written in China, listing groups of rhyming words; this in conjunction with comparative evidence leads to a pretty good reconstruction. See Wikipedia for a phoneme chart for MC.

Right now my goal is to find a method to reconstruct MC words using easily-accessible data, without having to check out a book from the library. I'll post one interesting tidbit now, and perhaps follow up with some posts in the near future:

In MC, as in modern Chinese languages, each morpheme (unit of meaning) is monosyllabic. (In Mandarin Chinese, for instance, a word like 中国 zhōngguó 'China' is composed of two monosyllabic morphemes, 中 zhōng 'central' and 国 guó 'country'.) Each Chinese character corresponds to one such morpheme/syllable.

Each MC syllable could end in either a vowel/diphthong (I'll write this as *0) or one of six consonants, *-m *-n *-ng *-p *-k *-t. Seven characters which are reconstructed with these endings are 四 'four' 三 'three' 人 'person' 上 'above' 十 'ten' 六 'six'  一 'one', respectively.

In some modern languages these endings (MC *0 *-m *-n *-ng *-p *-k *-t) are reflected as follows (with the pronunciations of the characters 四 三 人 上 十 六  一 in parentheses)
Cantonese: 0 -m -n -ng -p -k -t (sei3 saam1 jan4 soeng5 sap6 luk6 jat1) [the numbers represent tones]
Korean: 0 -m -n -ng -p -k -l (sa sam in sang sip ryuk il)
It follows that either Cantonese or Korean alone can be used to reconstruct MC final consonants uniquely.

Mandarin: 0 -n -n -ng 0 0 0 (sì sān rén shàng shí liù yī)
Japanese: 0 -n -n 0[1] 0[1] -ku/-ki -tsu/-chi (shi san jin jou juu roku ichi)
[1] Lengthens preceding vowel
With Mandarin and Japanese combined, one can almost uniquely reconstruct MC finals, except for the distinction between *-m and *-n. Here is a verbal description of the process:
  1. If Japanese has -ku/-ki, the MC final consonant is *-k
  2. If Japanese has -tsu/-chi, the MC final consonant is *-t 
  3. If Mandarin has -ng, the MC final consonant is *-ng
  4. If Japanese has a long vowel and Mandarin does not have -ng, the MC final consonant is *-p
  5. If Mandarin and Japanese have -n, the MC final consonant is either *-m or *-n
  6. Otherwise, there is no MC final consonant (0)
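The six steps above can be sketched in Python. This is a rough illustration under several assumptions of mine: Mandarin is given as pinyin with a trailing tone number, Japanese in the romanization used in this post, and a "long vowel" is detected naively as a reading ending in ou or uu (as in jou, juu). The name reconstruct_final_mj is hypothetical.

```python
import re

def reconstruct_final_mj(mandarin, japanese):
    """Return a tuple of candidate MC finals from a Mandarin and a Japanese reading."""
    m = re.sub(r"\d+$", "", mandarin)  # drop the tone number, e.g. "shang4" -> "shang"
    j = japanese

    def j_ends(*suffixes):
        # Require something before the suffix, so a bare "ki" or "chi" does not match.
        return any(j.endswith(s) and len(j) > len(s) for s in suffixes)

    if j_ends("ku", "ki"):                    # step 1
        return ("*-k",)
    if j_ends("tsu", "chi"):                  # step 2
        return ("*-t",)
    if m.endswith("ng"):                      # step 3
        return ("*-ng",)
    if j.endswith(("ou", "uu")):              # step 4 (Mandarin -ng already excluded)
        return ("*-p",)
    if m.endswith("n") and j.endswith("n"):   # step 5: *-m and *-n are not separable
        return ("*-m", "*-n")
    return ("*0",)                            # step 6

# 四 三 人 上 十 六 一
pairs = [("si4", "shi"), ("san1", "san"), ("ren2", "jin"), ("shang4", "jou"),
         ("shi2", "juu"), ("liu4", "roku"), ("yi1", "ichi")]
for m, j in pairs:
    print(m, j, reconstruct_final_mj(m, j))
```

On the seven example characters this gives *0, *-m/*-n, *-m/*-n, *-ng, *-p, *-k and *-t, in that order.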
One last point to mention is that Vietnamese also has many Chinese loanwords, but they are a bit harder to find since Chinese characters (known in Vietnam as Chu Nom) have not been widely used in Vietnamese for close to a century. One can find Chinese loanwords by using an online Chu Nom lookup tool or Wiktionary -- Chinese characters were often pronounced with borrowed readings from Chinese, so entering a Chinese character into this tool will return a Chinese loanword cognate to the readings of the character in other languages. Using this tool, we see that Vietnamese has the following reflexes of the MC final consonants:
Vietnamese: 0 -m -n -ng -p -c -t (tứ tam nhân thượng thập lục nhất)
Thus Vietnamese may also be used to uniquely determine MC final consonants, if one knows that a Vietnamese word is borrowed from Chinese.
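For the three languages that preserve the finals, the reconstruction is a plain table lookup. Here is a minimal Python sketch, assuming romanized readings as given in this post. The names FINALS and reconstruct_final are my own, and only the endings listed in the tables above are handled, so other spellings (for example Vietnamese -nh in minh) fall through to *0.

```python
import re

# Reflex of each MC final, per language, taken from the tables in this post.
FINALS = {
    "Cantonese":  {"m": "*-m", "n": "*-n", "ng": "*-ng", "p": "*-p", "k": "*-k", "t": "*-t"},
    "Korean":     {"m": "*-m", "n": "*-n", "ng": "*-ng", "p": "*-p", "k": "*-k", "l": "*-t"},
    "Vietnamese": {"m": "*-m", "n": "*-n", "ng": "*-ng", "p": "*-p", "c": "*-k", "t": "*-t"},
}

def reconstruct_final(language, reading):
    """Look up the MC final from a single Cantonese, Korean or Vietnamese reading."""
    syllable = re.sub(r"\d+$", "", reading)  # drop a Cantonese tone number if present
    table = FINALS[language]
    # Try longer endings first so "ng" is checked before "n".
    for ending in sorted(table, key=len, reverse=True):
        if syllable.endswith(ending):
            return table[ending]
    return "*0"  # no final consonant

print(reconstruct_final("Cantonese", "luk6"))
print(reconstruct_final("Korean", "il"))
print(reconstruct_final("Vietnamese", "thượng"))
```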

Friday, August 26, 2011

Bukhori Echad Mi Yodea

I thought it was cool to find a video in Bukhori (Judeo-Tajik) with Bukhori subtitles using the Hebrew alphabet:

Tuesday, August 23, 2011

yeshivish orthography

Despite the lack of seriousness with which it is treated, it seems to me that the Yeshivish sociolect of English bears great similarity to other Jewish languages in their formative states. It is interesting to speculate how it would be written if the Hebrew alphabet were adapted to it, as happened to many other Jewish languages. Although I doubt this will ever happen given the contemporary sociolinguistic situation, here is an experimental proposal (which probably requires some tweaking):

1) Hebrew and Yiddish loanwords are written in their original orthography. Thus nezek = נזק, Yiddish = אידיש, etc.

2) English words which contain sounds that also exist in Yiddish are written phonetically according to Yiddish conventions:
soon = סון
eat = איט
table = טייבל

This allows acceptable variation where Yiddish orthography would also vary:
do = דו or דוא
nod = נאַד or נאד

Silent consonants may be included if desired:
debt = דעט or דעבט

3) The English consonants /θ ð ŋ w/ (written th, th, ng, w) have the following representations:

/θ/ : ת
thimble = תימבל
(other options: ת', תה)

/ð/ : ד
this = דיס
(other options: ד', ת, תה, דה)

/ŋ/ : נג
thing = תינג

/w/ : ו; if needed to distinguish from vocalic ו or consonantal וו, use וא;
week = ואיק
well = ועל
womb = ואומב

4) Each word below shows how its first vowel is written. Note the influence of Daytshmerish-style doubled consonants on vowel quality:

cat = קאַטט
cot = קאַט
caught = קאָט
about = עבאַוט
[spott]ed = ספ?טעד
sit = סיטט
seat = סיט
date = דייט
bed = בעד
bird = בערד
bared = ביירד
but = באָט
put = פוטט
you = יו
my = מײַ
boy = בוי
no = נאָו
now = נאַו

Note that there is some ambiguity (the quality of אָ), but this is probably an acceptable amount. Also, the Daytshmerish doubling is optional and lengthening ה's may be used instead or in addition, e.g. סיהט = seat.

Now to illustrate this orthography, here is an excerpt from the Yeshivish Gettysburg address:
בערך א יובל אַנד א האַלף עגאָו, דע מייסדים שטעלד אַוועק אָן דיס מקום א נײַע מלכות ואית דע כוונה דאטט נאָואָן שולד האוו בעלות אָואווער דייר חבר, אנד אָן דיס יסוד דאטט עווריואָן האזז דע זעלבע זכותים.
Be'erech a yoivel and a half ago, the meyasdim shtelled avek on this makom a naiya malchus with the kavana that no one should have bailus over their chaver, and on this yesoid that everyone has the zelba zchusim.

Monday, July 18, 2011

doof duif

I've been doing some reading lately on my heritage language, Litvish Yiddish. The "Standard Yiddish" variety which is found in literature and academia is pretty close to Litvish, but I found to my frustration that Litvish forms aren't totally derivable from Standard. The main difference is that Standard has /oi/ in many words where Litvish has /ei/, e.g. Standard oivn, Litvish eivn 'oven'. The catch is that sometimes both have /oi/, e.g. Standard=Litvish hoiz 'house'. The question for me was: how can I learn Litvish from a book that teaches Standard?

Now, it turns out that Litvish /oi/ is the reflex of one particular vowel (something like */uw/) in Proto-Yiddish, denoted by the number 45 in Yiddish linguistics. Cognate words in Middle High German (MHG) have /u:/, and in Modern German /au/, e.g. German haus 'house'. Unfortunately Modern German /au/ also results from MHG /ou/, cognate to Litvish /ei/ e.g. MHG ouge, Modern German Auge, Litvish eig (Standard oig). This does, at least, mean that if a word doesn't have /au/ in Modern German, it won't have /oi/ in Litvish, so for example Modern German Brot 'bread' is breit in Litvish (cf. Standard broit).

Eventually I discovered that the key lies with Dutch. Dutch also originally had */u:/ at an early stage in words with Yiddish 45, which eventually developed into /œy/, spelled ui. This is the only source of the vowel ui in Dutch. Thus to determine the form of a Standard word in Litvish, replace any "oi" with "ei" unless its Dutch cognate has ui. For example, Standard toib may mean either 'deaf' or 'dove'. Its cognates in Dutch are doof 'deaf' and duif 'dove'. Thus in Litvish, the words are teib 'deaf' and toib 'dove'.
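The rule is simple enough to write as a short Python function. This is just a sketch: the name standard_to_litvish is hypothetical, the words are in the romanization used in this post, and the Dutch cognate has to be supplied by hand.

```python
def standard_to_litvish(standard, dutch_cognate):
    """Convert a Standard Yiddish form to Litvish using the Dutch cognate as a test."""
    # Dutch ui marks the vowel that stays /oi/ in Litvish.
    if "ui" in dutch_cognate:
        return standard
    # Every other Standard "oi" corresponds to Litvish "ei".
    return standard.replace("oi", "ei")

print(standard_to_litvish("toib", "doof"))  # 'deaf'
print(standard_to_litvish("toib", "duif"))  # 'dove'
```

The same check covers hoiz (Dutch huis, so the oi stays) and broit (Dutch brood, so it becomes breit). Those two Dutch cognates are supplied by me rather than taken from the discussion above.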

Incidentally, some Litvish Yiddish varieties have /eu/ in words where the Standard has oi and (conventional) Litvish has ei, for instance having breut for broit/breit 'bread'.

Here is a nice map of the distribution of the Yiddish words for 'deaf' and 'dove' in Litvish Yiddish.