Sunday, September 17, 2006

Thunder rain sky spirit forbids to hit hand machine

Last March, before going to Beijing, I thought I would try to learn a few useful characters and sentences. As it happened, I did not have enough time to really learn anything useful before the trip, but I have been fascinated by the language ever since, and I have continued studying. I hope this is not a metaphor for the relationship between theory and practice in computing.

After almost ten years in the US, my pronunciation of "interesting," "pseudorandom," and other long words is still the butt of jokes, so I am under no illusion that I will ever speak understandable Mandarin. I would like, however, to make some progress in reading and writing Chinese and in understanding Mandarin as spoken by a Beijinger or a Taiwanese.

(If you speak Chinese, either stop reading here or, by continuing, pledge not to make fun of my neophyte enthusiasm.)

An educated Chinese speaker knows at least 5,000 characters, and a basic level of literacy corresponds to about 2,000 characters. I hope to eventually learn the 1,067 characters in the main part of this book. There is a method to the madness of so many characters. There are about 200 basic components, called radicals, from which all characters are built. In the simplest cases, the radicals combine to give the meaning. For example, the character 好 (hao) is a combination of the radicals 女, "woman," and 子, "child," and it means "to love," "to be good," and also "good" as an adjective. Similarly, 安 (an) is a combination of the radicals for "roof" and for "woman," and it means "peace." (There is peace if there is a woman in the house.) In other cases, one combines a similarly pronounced character, which suggests the pronunciation, with a radical that suggests the meaning. For example, 客 (ke) means "guest" and contains the radicals for "roof," "to follow" and "mouth." The explanation is that it combines "roof," which suggests the meaning, with the character 各 (ge), which suggests the pronunciation. Why 各 (ge), which means "each," is made of "to follow" and "mouth," I have no idea.

Knowing many characters is not, however, enough to have a good vocabulary. Many words, in fact, are composed of two (sometimes three) characters. Sometimes, the combination makes perfect sense. For example, 电 (dian) means "electricity," 视 (shi) means "to look at" and 机 (ji) means "machine," hence 电视机 (dianshiji) "television." Or consider that 避 (bi) means "to avoid," 孕 (yun) means "(to be) pregnant" and 套 (tao) means "case" (as in pillowcase), hence 避孕套 (biyuntao). Other combinations are strange, for example 太 (tai) means "too" (as in "excessively"), but 太太 (taitai) means "wife," or 东 (dong) means "East," 西 (xi) means "West" and 东西 (dongxi) means "something."

Anyway, now that I have learnt a little of the language, I thought I would go back to some pictures of signs that I had taken in China and see if I could reconstruct what they meant.

So here is one sign:

I start by looking up the characters in a dictionary, but how do you look up a character in a dictionary? There is a shortcut if you know the pronunciation, but what about a character you know nothing about? We said each character is made of a set of radicals, and one radical is considered the "main" radical of the character. I don't quite understand how you recognize it, but at worst one can do trial and error. Another fact is that, by looking at a character, it is typically possible to reconstruct how it is supposed to be drawn and how many strokes it takes to draw it. With this information (main radical and total number of strokes) you go to the dictionary. It has an index of radicals and then, for each radical, a list of all the characters that have it as their main radical, ordered by number of strokes. There you find your character. It is interesting that the way we look up a word in a dictionary for an alphabetic language is essentially binary search. Here, instead, we have more of a hash function that maps a character to the pair (radical, strokes), and collisions are handled by linear search.
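To make the hash-table analogy concrete, here is a minimal Python sketch. It assumes a tiny hand-built index of a few characters. The helper names (`build_index`, `lookup_by_shape`, `lookup_alphabetic`) are hypothetical. Following the description above, the sketch keys on total stroke count; real dictionaries differ in details, and many order characters by the strokes remaining after the radical.

```python
from bisect import bisect_left
from collections import defaultdict

# Toy index (an assumption, not a real dictionary):
# character -> (main radical, total stroke count, gloss)
ENTRIES = {
    "雷": ("雨", 13, "thunder"),
    "零": ("雨", 13, "zero"),
    "雪": ("雨", 11, "snow"),
    "安": ("宀", 6, "peace"),
    "客": ("宀", 9, "guest"),
    "好": ("女", 6, "good"),
}


def build_index(entries):
    """Bucket characters by the 'hash key' (radical, strokes)."""
    index = defaultdict(list)
    for char, (radical, strokes, gloss) in entries.items():
        index[(radical, strokes)].append((char, gloss))
    return index


def lookup_by_shape(index, char, radical, strokes):
    """Jump to the bucket for (radical, strokes), then scan it linearly,
    comparing shapes, which is how collisions get resolved."""
    for candidate, gloss in index.get((radical, strokes), []):
        if candidate == char:
            return gloss
    return None


def lookup_alphabetic(sorted_words, word):
    """An alphabetic dictionary is sorted, so lookup is a binary search."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


if __name__ == "__main__":
    index = build_index(ENTRIES)
    print(index[("雨", 13)])                              # [('雷', 'thunder'), ('零', 'zero')]
    print(lookup_by_shape(index, "雷", "雨", 13))          # thunder
    print(lookup_alphabetic(["cat", "dog", "thunder"], "dog"))  # True
```

Note that 雷 and 零 land in the same bucket. The (radical, strokes) key only narrows the search, and the reader finishes it by scanning a short list.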

Back to the picture. We have the characters
雷 (lei) 雨 (yu) 天 (tian) 气 (qi) 禁 (jin) 打 (da) 手 (shou) 机 (ji)

Here 雷 (lei) means "thunder" and 雨 (yu) means "rain," so together they mean "thunderstorm." Then we have 天 (tian), which means "heaven" or "day," and, in this case, "sky," and 气 (qi), which means "breath," "energy" or "soul." Is it heavenly spirit? No, 天气 (tianqi) means "weather," and it's a two-character word. So the first part is roughly "thunderstorm weather." Then 禁 (jin) means "to forbid." 打 (da) means "to hit," and sometimes it means "to play," as in playing a musical instrument. More generally, it can mean operating a machine, especially one that produces sound. 手 (shou) means "hand" and (remember the TV) 机 (ji) means "machine." The "hand machine" 手机 (shouji) is a cell phone. So
It is forbidden to use cell phones during a thunderstorm
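Reading the sign meant first grouping the characters into words (雷雨, 天气, 禁, 打, 手机) and only then translating. One rough way to automate that grouping is greedy "longest match first" segmentation against a word list. This is a minimal sketch under assumptions: the lexicon is a toy one built only from the glosses in this post, and `segment` is a hypothetical helper. Real segmenters use much larger dictionaries plus statistics, and the greedy rule can choose the wrong split when candidate words overlap.

```python
# Toy lexicon (an assumption): words and single characters from the sign,
# glossed as in the post. Not a real dictionary.
LEXICON = {
    "雷雨": "thunderstorm",
    "天气": "weather",
    "手机": "cell phone",
    "禁": "to forbid",
    "打": "to hit / to operate",
    "雷": "thunder",
    "雨": "rain",
    "天": "sky",
    "气": "breath",
    "手": "hand",
    "机": "machine",
}


def segment(text, lexicon, max_len=3):
    """Greedy forward maximum matching: at each position take the longest
    lexicon entry that matches, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon or length == 1:
                words.append(piece)
                i += length
                break
    return words


if __name__ == "__main__":
    sign = "雷雨天气禁打手机"
    for word in segment(sign, LEXICON):
        print(word, LEXICON.get(word, "?"))
```

On the sign this yields 雷雨 / 天气 / 禁 / 打 / 手机, the same grouping as the reading above.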


(If you can't see the characters in this entry and you are using Windows XP, go to Start -> Control Panel -> Regional and Language Options -> Languages and check the "Install files for East Asian languages" box. It just takes a few seconds.)


  1. Unknown
    9/17/2006 10:03:00 PM

    I recommend learning the traditional characters first. Learning traditional characters makes it easy to learn simplified characters or Japanese kanji. Going from simplified to traditional is much harder.

    Most of the Chinese books I want to read are published in Hong Kong or Taiwan, or are older books from Singapore, or are library copies printed before the language reform (there are, of course, worthwhile contemporary books from the Mainland, but they seem scarce). The huge new Eslite bookstore in Taipei near 101 (a five-story bookstore that is open 24 hours, not Page One) has a special "mainland room" with about 10,000 volumes in it, so it is easy to compare the mainland and Taiwanese versions of texts. I usually find the Taiwanese versions better.

    However, I learned the Joyo kanji (and even their kyuujitai forms) before I learned hanzi, so your experience may vary.

  2. Anonymous
    9/17/2006 10:15:00 PM

    I guess "好 (hao)" is mostly translated as "good". You know that "好" is a combination of "woman (女)" and "child (子)", but did you also know that the word 女子 actually means "lady"?

    There is another Chinese character, 妙 (miao), which means "wonderful". Do you know what 少女 means in Chinese? It means "girl" or "young lady". OK, now: a lady is good, but a young lady is wonderful. :)

    Enough joking. Reading this post reminds me of what I learned in primary school. I had almost forgotten those things, since after many years of using a language all these basic facts and rules seem so natural or trivial. But for anyone who doesn't know Chinese, they are the first step in learning it.

    This is exactly what I've been doing here in the US. My English was really not good when I first moved to this country. I wanted to speak good English, so I tried to mimic other people's words, tones and expressions. But sometimes someone used a tone to stress something. I didn't know that; I thought it was the way the English sentence should be spoken. But that tone or expression may not be appropriate, and may even be disrespectful, in another conversation. Sometimes I learned from others' reactions, sometimes not. I guess this language problem must happen to everyone, but native speakers may have forgotten it. I feel that I'm doing what people did in primary school.

    Not surprisingly, this also happens because of cultural differences, which may be more subtle. But anyway, practice (and only practice) can help. I'm happy to see that I'm improving at this. :)


  3. Anonymous
    9/17/2006 10:47:00 PM

    Prof. Luca, I live in Hong Kong. I speak Mandarin Chinese, and if you need any help with it, I'm ready to give you a hand. :-)

  4. Luca
    9/18/2006 08:47:00 PM

    SY: I did not know about 妙, it's very cute.

    The best way to learn English (or any other language that one has to learn as an adult) is to find yourself a girlfriend/boyfriend who speaks English as a first language and speaks no other language you know. Then (i) you will have an extra motivation to learn; (ii) you will practice a lot; (iii) you will practice with considerably more complex conversations than the ones you would have with friends.

    (In the spirit of complexity theory, I am offering a reduction from a hard problem to a harder one.)

  5. Luca
    9/18/2006 08:58:00 PM

    Hi Doug, thanks for your comment. I was very undecided about which system to study, especially because I got conflicting advice from different people. Typically, those who have studied simplified characters in school tend to downplay the "one-way" problem you describe and suggest the simplified system, while the Taiwanese abhor simplified characters and point out the "legacy" problem of older books. The best story I heard is that ai, "to love," is 愛 in traditional characters, with a heart 心 in the middle, but it's 爱, without the heart, in simplified characters. And, really, who wants to write in a system in which love has no heart? I had not thought of the possibility of studying Japanese in the future; it's another good point.

  6. Anonymous
    9/18/2006 11:14:00 PM

    My best advice for learning Chinese is to date a Chinese person.

  7. Anonymous
    9/19/2006 03:17:00 AM

    What is also interesting is how two contemporary and geographically adjacent civilizations could address the language problem with such radically opposite solutions.

    While Mandarin has more than 5000 atomic structures, Sanskrit in Devanagari, like English, has just a handful. I gather from what Prof. Luca has said that Chinese words are usually aggregates of a few of these atomic characters. In Sanskrit you just pile them on and on, constructing massive words that often translate into several English sentences. It's the oldest and most classic formalization of RISC vs. CISC, I guess.

    Sorry if the comment is tangential to the original subject. Just a thought.

  8. Luca
    9/19/2006 05:49:00 PM

    I have heard this RISC/CISC analogy before, but my impression is that it goes in the other direction.

    If you look at the smallest units that have an independent semantic value, Chinese has only 5000 characters, of which 2000 suffice in most cases, while Indo-European languages like English and Italian (and, I suppose, Sanskrit, which shares their Indo-European ancestry) have tens of thousands of words.

    Then, in Chinese, you get more words by combining characters in interesting ways (as in electric-brain for "computer," electric-see-machine for "tv," a-lot-a-little for "how much," short-tall for "height" and so on). In my opinion, it's Chinese that is more like RISC.

    If you are counting the number of letters in the alphabet, then the right analogy is the number of different types of strokes, of which I don't think there are many more than a dozen.

  9. Anonymous
    9/20/2006 05:04:00 AM

    Does it?

    A large portion of English/Italian/Sanskrit words are proper nouns like you said. Possibly the outcome of specifying everyday objects over ages of scientific progress - so that they can be conveniently addressed : e.g. "Bob, throw me the refrigerator!"
    If we discount a large number of words like these, stripping the language down to its "primitive" primitives, would the expressive ability of phonetic-script languages be reduced? Throw a refrigerator back in time, and who's to say you couldn't communicate the idea by calling it a ColdBox? This, of course, would be the pictographic-script approach (Chinese/Japanese), and it would count as a different word in written English.

    Also, a large number of English words simply specify degree: the relative forms, the hundreds of synonymous adjectives, verbs, etc. Strip all these redundancies away as well. I do not know if Chinese has these too, but surely not as many.

    Once we have the fundamentals, we can measure. Obviously, the size of your primitives will determine expressive ability. Too coarse, and your language is all about approximation and interpretation (e.g. "electric-brain"). Too fine-grained, and you end up with Advanced Webster. But IMHO, neither of these represents the power of the essential approach.

    Doesn't Chinese also suffer from overspecification by the learned few (à la the Complexity Zoo)? :)

  10. Anonymous
    9/20/2006 05:09:00 AM

    I guess I concur in part. But, funnily enough, I get the feeling that grammar and sentence-construction rules will determine which is RISC and which is CISC on a subjective, case-by-case basis. The question is: is the compute-cycle/memory tradeoff objectively assessable in this case?

  11. Anonymous
    9/21/2006 08:11:00 AM

    To look up ideograms by shape or to learn their etymology, you need this book: (Traditional)

  12. Anonymous
    9/21/2006 04:53:00 PM

    The word is pronunciation.

