ARCHIVE - news aggregator | The Science of Language

Quadrilingual Washlet Instructions

Language Log - Sat, 2009-08-22 07:30

Half an hour before touchdown at Narita, the pilot turns on the "fasten seat belt" sign. Because something (or some things) served during the in-flight meals on the 14-hour flight did not quite agree with your alimentary tract, you are already experiencing ominous rumblings down in your bowels.

You do your best to ignore the bouncing and jolting of the huge 747 as it descends through the various layers of stormy clouds. Breathing deeply and slowly, you focus all of your thoughts on the first toilet you will encounter when you enter the terminal.

Finally, the plane screeches to a halt, then slowly, ever so slowly and with many pauses and turns, it taxis to the gate. Since you know that you will have a major evacuation and it may take some time, you deplane along with everyone else. But, horrors! You are guided down lengthy hallways and escalators, then stand in line to wait for a bus that will take you to another part of the terminal to go through immigration. After arriving at the immigration hall, you stand in line, alternating between doing a jig and exercising maximum sphincter control. At last you pass through immigration and customs, then race to the nearest toilet you can find, open the door, dash to the only unoccupied stall you can find, enter, and come face to face with THIS.

What to do? Which button(s) to push? You can't even spot which language to read for any given instruction. There's no help for it but just to sit down, do your business, and read all the instructions later, hoping that such rashness will not lead to a major calamity in the WC.

In case you couldn't guess from the coinage, a "washlet" is a toilet that also washes your bottom and does all sorts of other fun things to make you feel nice and clean. The Japanese invented washlets and have become increasingly dependent upon them.

Here are a couple of videos that demonstrate how they work.

The star of the second video is W. Hodding Carter IV (son, grandson, and great-grandson of other distinguished Hodding Carters).

He's also the author of Flushed: How the Plumber Saved Civilization.

As someone who learned to like Japanese squat toilets back in the 70s (see paragraph 5 here), I must say that I was quite intimidated the first time I encountered a top-of-the-line washlet. But once you get used to them, you can't do without them. I'm pretty sure that's why Mr. W. Hodding Carter IV had one installed in his house up in Maine.

My thanks to Miki Morita for sending me the photograph of the instructions.

Ask Language Log: Prescriptivism in Europe

Language Log - Fri, 2009-08-21 04:51

From yesterday's mail:

An idle question from a big Language Log fan: Do you have any idea if the nice folks in, say Germany or Italy or Spain, go as nuts as Americans seem to when native speakers make "fundamental" grammar errors?

It appears that the strong form of "going nuts" that we've called word rage is mainly an Anglophone phenomenon, with the British as the originators and still the champions. But the sociolinguistic settings in Germany, Italy, and Spain are very different from the situation in the U.S. — and as a result, they have their own kinds of language wars over there.

The most obvious difference is the role of traditional local language varieties. Each of the European standard languages developed in the midst of a complex dialect continuum, where differences increase with geographical and social distance, and enough distance creates differences like those between German and Dutch, or French and Italian. As a result, many if not most Europeans speak a local "dialect" that is very different in morphology, pronunciation, and word stock from the standard national language that they also control to one extent or another; and in practice, the local and standard varieties are often mixed to a variable degree depending on circumstances.

Something of the same kind is also true in the U.S., but the differences are generally not as great.

Do our European friends comment among themselves as often as Americans seem to about their neighbor's grammar?

I don't know the facts about the conversational density of metalinguistic commentary, and I don't think that anyone has ever studied this empirically. But in Europe, there's more to talk about, since geographical and social differences among language varieties are bigger and more complicated.

Do Germans hotly chastise newspaper editors for an occasional faulty case? Do the Spanish roll their eyes when a writer fails to employ the subjunctive? Do Italians suspect the imminent demise of civilization if a subject and verb fail to agree?

There's apparently quite a bit of concern in Germany about the fate of the dative case, with journalists very much under the gun on this question. I'm not sure about the ideology of mood in Spanish, but there was a fair amount of discussion a few years ago about whether a Francophone mass murderer used the subjunctive appropriately.

Those are both instances of concern about the evolution of the standard national language, and there are plenty of those around. But most European countries have one or more governmental institutions charged with establishing and maintaining language standards (the Institut für Deutsche Sprache, the Académie française, etc.), and perhaps this makes the citizenry less prone to take up pitchforks and torches on their own initiative.

A different sort of struggle is described by Jillian Cavanaugh in "Remembering and Forgetting: Ideologies of Language Loss in a Northern Italian Town" (Journal of Linguistic Anthropology, 14(1), 2004):

One wintry afternoon in the northern Italian town of Bergamo, I had coffee with Giani, a retired engineer in his mid-sixties, and talked about the impending loss of his dialetto (‘dialect’), Bergamasco. Giani, speaking Italian with a strong Bergamasco accent, told me a story about the punishment children endured for speaking Bergamasco when he attended elementary school. Every morning, the first time the schoolteacher heard a child say something in Bergamasco, he or she would hand that child a wooden baton, which the children cheekily called—in Bergamasco—a bastù. This child held the baton until they heard another child speak Bergamasco, and then passed it on; this continued throughout the day. At the end of the school day, the teacher called the last unlucky child to the front of the class and made them tell who had passed them the baton. That child was then called up to the front to tell who had given it to them, and so on back to the original offender. The entire chain of Bergamasco-speaking children was then punished in front of everyone else (often with a strap). The next day, the gruesome relay began again. Giani laughed as he told me that on some days, practically the whole class would end up in the front of the room. “Everyone was poor and spoke Bergamasco,” he said, “and though we suffered for it, we all got through it.”

In most European countries, I believe that struggles of this kind remain very much alive. In some cases, the result is the loss of the local variety; in other cases — for example Catalan — the local variety has become established as a standard in its own right. In either case, the struggle seems to leave less ideological energy to spare for questions like whether a change in word sense threatens the foundations of civilization.

Preaching the gospel of wrong is right?

Language Log - Thu, 2009-08-20 08:46

If you want to see all the illogic and angst of the prescriptive poppycock merchants on display, Howard Jacobson provides one-stop shopping. I don't think the UK has a more unprepossessing columnist of the foaming-at-the-mouth language-is-going-to-the-dogs persuasion. Oddly, he is not in the Telegraph but in the relatively liberal Independent. You might (or you might not) want to look at the way his last piece of rambling, ranting, frothing bitterness ends. It is entitled "In the face of overwhelming ignorance, it is the pedant's duty to keep battling on". Read on if that title holds any appeal…

We have no regard for schoolteachers. There are countries where to be a schoolteacher is to enjoy considerable esteem. Here, we pay them badly and encourage our children to treat them with contempt. The reason for this is that we fear learning and would rather mock it than acquire it. No one must draw attention to our ignorance, no one must teach us how to think, how to say what we mean or how to mean something better, no one must correct our spelling or our syntax or our speech. The very concept of correction is anathema to us.

The capitulation of the pedant himself to this free-for-all of knowing nothing was in evidence this week on Fry's English Delight, Stephen Fry's Radio 4 programme about the English language. A schoolmasterly man himself, Fry listened, I thought, with regret, as an assortment of language experts — I mean no disrespect: some of my best friends are lexicographers and linguisticians — preached the gospel of wrong is right because whatever the people decide to make of language is what language must become.

Say less when you mean fewer, infer when you mean to imply — none of it matters because what the unlettered populace does with words today the rest of us will meekly do tomorrow. Brute proof, of course, is on the side of those who argue in this fashion; yesterday's sins do indeed become forgotten in the democracy of usage. But that doesn't mean there is not a vice called illiteracy, and that we shouldn't, every now and then, seek to save something from its all-devouring maw.

Take the uninterested/disinterested confusion which Fry's programme mentioned. It is true that these words have changed places over time; that disinterested once meant unconcerned and uninterested meant without bias, whereas it is now the other way around. Or would be the other way around had absence of bias not become a forgotten concept and unconcern — do I look bovvered? — not carried all before it. But it is not pedantry for pedantry's sake that makes one argue for the retention of disinterested. It is because the state of mind it describes — freedom from self-seeking, preparedness to think and act impartially, without taking account of personal advantage, a grand carelessness of profit — is one we cannot afford to lose.

Differentiation matters. Ignorance is not argument. Disinterestedness is not another word for "Whadever!". We are quick to outlaw words when they don't suit the temper of the times. We should, to defy the temper of the times, try rescuing a few.

What is this stuff about preaching "the gospel of wrong is right"? (I'm afraid I do not listen to Stephen Fry's Radio 4 series on language. I have heard promos, and a snatch of it, and despite being a Stephen Fry fan I find it unbearable.) It's hard not to read Jacobson as declaring wrongness to be permanent. That is, he seems to deny that the spreading of what was once an error or a confusion can eventually solidify into a feature of the language. Yet Jacobson admits that the words "disinterested" and "uninterested" have changed their meanings during the recorded history of English and he accepts the newer, changed meanings, so he is not being consistent.

Jacobson also shows some signs of being in the grip of the fallacy that if anything goes, everything goes. If ever some form of words once thought to be a solecism is taken to have become part of Standard English, then all is lost. Acceptance would be capitulation to the schoolteacher-hating ignorance of a culture that tosses aside all generalizations about usage, refusing to accept them precisely because they involve judgment and the possibility of correction. "Say less when you mean fewer, infer when you mean to imply — none of it matters…", he wails. But why does none of it matter, just because one opinion about acceptable usage is revised?

I am not suggesting that there is anything to revise about disinterested and uninterested, by the way: I am not a fan of the tendency to use the former for the meaning that the latter standardly bears. (And it's interesting, I think, that there seems to be no current sign of any tendency to shift meanings in the other direction. The two words are not collapsing together.) But suppose we did decide that it had become standard for disinterested to be ambiguous between "unbiased" and "uncaring". Why would that imply a cataclysm of abandonment, a whole domino series of cascading usage mergers?

I happen to think that the generalization about less and fewer (the former goes with non-count nouns and the latter with count nouns), which Jacobson mentions, has been erroneously formulated by many usage authorities. (This is particularly clear when we consider count nouns that are units of time: I am unable to believe that less than five years violates the syntax of my native language.) But that doesn't mean I have to toss away the distinction between imply (something that the speaker does) and infer (something that the hearer does in response): that distinction seems well grounded, and I am happy to follow the usual dictionary descriptions of it. Certainly, I am not bound to abandon that distinction just because I have a revisionist opinion about less than.

For those who take an intelligent interest in language, there can be reasoned discussion about what exactly the rules exclude or permit — discussion that is disinterested in the modern sense, rather than committed in advance to a defense of current conservative dogma and uninterested in hearing anything to the contrary. But not for Howard Jacobson. For him it seems to be a choice between, on the one hand, adherence to every single rule any purist nutball has ever defended, and on the other, flushing all syntactic and lexical distinctions down the toilet. I reject this insane dichotomy.

People report that Jacobson has given up his former academic pretensions (he once taught in higher education institutions), and that when not pretending to be apoplectic over dangers to the English language he writes extremely funny novels about Jewish life in Britain. It's odd how little of his humor comes through when he writes about English instead of in it. But I think I've said as much before, in "Educational sky is falling says blithering windbag" and at the end of "Canoe wives and unnatural semantic relations".

Levels of misunderstanding

Language Log - Thu, 2009-08-20 03:50

The most recent xkcd:

(The original has the title tag "You know what really helps an existential crisis? Wondering how much shelf space to leave for a Terry Pratchett collection.")

Smallpox / Ceiling Light

Language Log - Wed, 2009-08-19 12:22

Fail Blog has a picture of a panel with two switches labeled as follows:

天花燈夜燈
SMALLPOX NIGHT LIGHT

This photograph elicited considerable discussion at Fail Blog, but — despite well over 150 comments — there was much consternation and little comprehension of why or how the confusion occurred. The quality of the discussion at ADS-L was much higher (though far more limited), yet still left a number of questions unresolved.

Since, in the past, many Chinese friends (and even many Chinese teachers) have asked me why the Mandarin words for "smallpox" and "ceiling" share the same two characters, I've decided to make a fairly determined effort to explain how it happened. Here's the etiology, not of smallpox, but of the failure.

夜燈 means exactly what the translation on the panel says: YE4DENG1 — "night light" — so we won't worry about that.

天花燈 is TIAN1HUA1DENG1 — "sky flower light." How in the world do we get "smallpox" out of that?

The problem arises because the word for "ceiling" in Mandarin is TIAN1HUA1BAN3 天花板 ("heaven flower board," a reasonable enough term since proper ceilings were decorated and "heaven" signifies "above," hence, "a decorated board above"), while the word for "smallpox" is simply TIAN1HUA1 天花 ("heaven flower[s]"). Obviously, the label on the left should have been "ceiling light," not "smallpox."
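
The collision between the two words can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not say how the label was actually produced, and the tiny lexicon and the `gloss` function below are hypothetical. The sketch glosses a string by greedy longest-match lookup, which is enough to show that 天花燈 falls through to the shorter entry 天花 because 天花板 is not present in the string.

```python
# Hypothetical toy lexicon; the glosses are the ones given in the post.
LEXICON = {
    "天花": "smallpox",
    "天花板": "ceiling",
    "夜燈": "night light",
    "燈": "light",
}


def gloss(text, lexicon=LEXICON):
    """Gloss text by greedy longest-match lookup, scanning left to right."""
    glosses = []
    longest = max(len(entry) for entry in lexicon)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in lexicon:
                glosses.append(lexicon[chunk])
                i += size
                break
        else:
            # No entry matched: keep the character itself and move on.
            glosses.append(text[i])
            i += 1
    return glosses


print(gloss("天花燈"))    # the label on the panel
print(gloss("天花板燈"))  # the same label with 板 ("board") included
```

With 板 present, the longer entry wins and the gloss starts with "ceiling"; without it, only 天花 matches and the gloss starts with "smallpox".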

There is also a more scientific term in Chinese for "smallpox" and that is DOU4CHUANG1 痘瘡, but it is much less used than TIAN1HUA1 (Google hits 3,390,000). Aside from the simple fact that it is more technical, I suspect that people avoid DOU4CHUANG1 (Google hits 88,400) partly because it looks and sounds a lot more scary than TIAN1HUA1. First of all, both characters have the frightful Kangxi radical 104 for "illness, sickness" on the top and left side; just looking at 痘瘡 bashes you with a double dose of disease. Second, the first character means "pox," and its phonophore calls up associations of some bean-like eruption. Third, the CHUANG1 character means "skin ulcer," not a very pleasant thing to contemplate. Certainly, TIAN1HUA1 天花 ("heaven flower[s]") is a lot easier to deal with than DOU4CHUANG1 痘瘡 ("bean-like pox — skin ulcer")!

There is much controversy in the medical literature over just why "smallpox" is called TIAN1HUA1 in Chinese. Most people would agree that the HUA1 ("flower[s]") part refers to the appearance of the pustules that cover the body of the afflicted. See the photographs here and here.

Now comes the hard part: why should smallpox pustules be characterized as "heaven(ly)"?

Fundamentally, there are two main theories in Chinese medical thought about the characterization of "heaven(ly)" for smallpox pustules. The first is that they were caused by a smallpox deity (TIAN1 can also refer to deities). Others hold that the smallpox deities (which were very much in evidence in premodern Chinese towns and villages) were there to protect people from smallpox, not cause them to get it. The second main theory is that smallpox was "natural, innate, inborn"; I shall explain in detail below what the connection with TIAN1 is in this case. A growing consensus among contemporary researchers seems to accept the second main theory over the first one.

I would add an additional theory of my own that I don't think has ever been broached before. Namely, perhaps "heavenly flowers" was used as a sort of euphemism for this horrible disease. I do not think that, if indeed such a euphemism were operative to any degree, it necessarily would have been employed instead of one or another of the above explanations, but rather it might have been used concurrently as a way to soften the harsh reality of the affliction. The main reason I make this suggestion is that the expression TIAN1HUA1 天花 ("heaven flower[s]") was already well established in Buddhist terminology before it was applied to smallpox. Consequently, it would have been a familiar term that could have been used euphemistically in a novel way to refer to smallpox pustules.

TIAN1HUA1 天花 ("heaven flower[s]") was the Chinese translation of Sanskrit KHA-PUS.PA = KHA-CITRA ("a picture in the sky") — anything impossible or not existing. TIAN1HUA1 天花 ("heaven flower[s]") was also the Chinese translation for various Sanskrit plant terms, but I don't want to go into them because they are too botanically complicated and not really essential for our purposes anyway. I should mention, however, that one of the plants with which TIAN1HUA1 天花 ("heaven flower[s]") has been associated is Hibiscus mutabilis (common name "cotton rose"). If we look at pictures of Hibiscus mutabilis, the red ones do bear a resemblance to smallpox pustules (the white ones resemble the final stages of the flaking scabs to a lesser degree).

Be that as it may, KHA-PUS.PA, or TIAN1HUA1 in Chinese translation, were the divine flowers in the Lotus sutra, one of the most popular Buddhist texts in East Asia. These divine flowers in the Lotus sutra were of four kinds, two red and two white. It is curious that, in the 10-12 day period of development of the disease from macules to papules to pustules to lesions and finally scabs, they pass through stages of firm, fleshy redness to flaky, depigmented whiteness.

Another way of writing TIAN1HUA1 天花 ("heaven flower[s]") in Chinese is TIAN1HUA1 天華, which also means ("heaven flower[s]"); this form was used to translate Sanskrit DIVYA-PUS.PA ("divine flower" — Ner[i]um odorum).

Smallpox became endemic in China around the 10th century, well after the Buddhist terminology in the Lotus sutra had become established and people were thoroughly familiar with the notion of TIAN1HUA1 天花 ("heaven flower[s]").
Once smallpox was endemic, it became a disease of children, almost a rite of passage. If they survived smallpox, they were safe and had a good chance of growing up to adulthood. People began to assume that some component of smallpox was inborn, a "fetal poison" (TAI1DU2 胎毒) that everybody carried around — the toxic residue of conception, some said — and that under the influence of seasonal energy (SHI2QI4 時氣), it would erupt into a case of smallpox. In this sense, smallpox was "innate" ("inborn," "natural" — in Chinese, TIAN1 天 can imply all of these things as well as "heaven"). This theory of the "fetal (i.e., innate) poison" that could potentially cause smallpox was already prevalent from the Tang period (618-907).

The theory of "fetal (i.e., innate [heaven-born] poison)" also ties into Chinese medical ideas about XIAN1TIAN1 先天 ("pre-heaven, i.e., congenital") traits and HOU4TIAN1 後天 ("post-natal") disorders.

Just before I finished writing this blog, Randy Alexander wrote to me from Jilin (China) and mentioned that he had made a new addition to the discussion at ADS-L. Randy's remarks are mostly focused on Manchu materials but are very helpful for understanding the sensitivity of late imperial Chinese toward this terrifying illness. There can be no doubt that the Manchus dreaded this disease; two of their emperors died from it, Shunzhi (1638-1661) and Tongzhi (1856-1875).

The leading researcher on the history of smallpox in China is Chia-feng Chang. Here are some of his important publications on the subject (Randy mentions the last one in his post):

Chang Chia-feng. 1995. “Strategies of Dealing with Smallpox in the Qing Imperial Family,” in Hashimoto, Jami, Skar, eds., East Asian Science, 199-205.
______. 1996a. “Aspects of Smallpox and its Significance in Chinese History.” Ph.D. dissertation, University of London, School of Oriental and African Studies.
______ [張嘉鳳]. 1996b. “Qing chu de bi dou yu cha dou zhidu” 清初的避痘與查痘制度 (“Eradicating and diagnosing smallpox during the early Qing dynasty”). Hanxue yanjiu, vol. XIV, no. 1, 135-56.
______ [張嘉鳳]. 1996c. “Qing Kangxi huangdi cai yong rendou fa de shijian yu yuan yinshi tan,” Zhonghua yishi zazhi 26.1, 30-2.
______. 2000. “Dispersing the Foetal Toxin of the Body: Conceptions of Smallpox Aetiology in Pre-modern China.” In Conrad and Wujastyk, eds., Contagion: Perspectives from Pre-Modern Societies, 23-38.
______ [張嘉鳳]. 2001. “‘Jiyi’ yu ‘xiangran’: yi Zhubing yuanhou lun wei zhongxin shilun Wei Jin zhi Sui Tang zhijian yiji de jibing guan” ‘疾疫’與‘相染’以“諸病源候論”為中心試論魏晉至隋唐之間醫籍的疾病觀 (“Epidemics and Contagion: Using the Treatise on the Origins and Symptoms of Various Diseases to discuss the medical perspective on illness from the Wei and Jin to the Sui-Tang period”). Taida lishi xuebao 27 (June): 37-82.
______. 2002. “Disease and Its Impact on Politics, Diplomacy, and the Military: The Case of Smallpox and the Manchus (1613–1795).” Journal of the History of Medicine and Allied Sciences 57.2: 177-197.

Thanks to Che-chia Chang, Marta Hanson, Hilary Smith, Charlotte Furth, and Wenkan Xu for assistance with the medical literature.

The truth about iqualuit

Language Log - Wed, 2009-08-19 09:37

In response to my question here, an authoritative answer from Alana Johns, who was asked by Ewan Dunbar, who was asked by Bill Idsardi:

iquq means stuff hanging down around the anus (dingleberries?). S___ says when they were kids they would tease each other by calling each other "iquq" (in English we also say "you dirty bum!")

Adding -aluk would intensify the noun 'large, impressive' and then of course it is pluralized with -it:

iqu(q) + alu(k) + it 'many large dirty bums' → iqualuit

BUT iqaluit (the name of the capital of Nunavut) is

iqalu(k) 'fish, normally char' + it plural → iqaluit
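
The two derivations above can be sketched as a small Python function. This is a minimal illustration, not a description of Inuktitut phonology in general: the rule that a stem-final q or k drops before a vowel-initial suffix is an assumption read off the parenthesized consonants in the derivations, and the names `attach` and `derive` are hypothetical.

```python
VOWELS = "aiu"


def attach(stem, suffix):
    """Join stem and suffix, dropping a stem-final q/k before a vowel.

    The deletion rule is an assumption inferred from the two quoted
    derivations, not a general statement about the language.
    """
    if stem and stem[-1] in "qk" and suffix and suffix[0] in VOWELS:
        stem = stem[:-1]
    return stem + suffix


def derive(root, *suffixes):
    """Attach each suffix in turn, left to right."""
    word = root
    for suffix in suffixes:
        word = attach(word, suffix)
    return word


print(derive("iquq", "aluk", "it"))  # iqualuit
print(derive("iqaluk", "it"))        # iqaluit
```

The two outputs differ only in the extra "u", which is the whole of the difference between the misspelling and the place name.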

For Americans, perhaps a more idiomatic translation of iqualuit would be "big poopybutts", or "major dingleberries".

Next question: is the syllabification i-qu-a-lu-it vs. i-qa-lu-it, so that the place name has one fewer syllable than (the Inuktitut pronunciation of) the wrongly spelled version?

An Old Person's Guide to "No Homo"

Language Log - Tue, 2009-08-18 19:10

Those who enjoyed Penny Arcade's take on ghey may also like Jay Smooth's "Old Person's Guide to 'No Homo'":

Why journalists need to know morphology

Language Log - Tue, 2009-08-18 14:11

According to Terry Pedwell, "PMO Iqaluit bumble draws smiles, frowns", The Canadian Press, 8/18/2009:

A bumble by the Prime Minister's Office has residents of Nunavut alternately chuckling and cringing.

A news release sent out Monday outlined Prime Minister Stephen Harper's itinerary as he began a five-day Arctic tour.

The release repeatedly spelled the capital of Nunavut as Iqualuit - rather than Iqaluit, which means "many fish" in the Inuktitut language.

The extra "u" makes a big difference.

"It means people with unwiped bums," said Sandra Inutiq of the office of the Languages Commissioner of Nunavut.

I suspect that I'm not alone in hoping for an interlinear transcription, along with some related examples and discussion of the relevant phonological, morphological and syntactic issues.

[Hat tip: Peter Breslauer]

[Update — morphological and lexicographic details here.]

A little more on Stephen Hawking

Language Log - Tue, 2009-08-18 08:33

Sarah Lyall's piece "An Expat Goes for a Checkup" (front page of the NYT Week in Review, August 16) discusses American attacks on Britain's National Health Service (and affronted British responses, and her own experiences with the NHS), leading with the Investor's Business Daily invoking the physicist Stephen Hawking in an August 3 editorial opposing Barack Obama's health care proposals. As Geoff Pullum posted here last week, IBD (an American enterprise) barreled into the matter with the (utterly mistaken) assumption that Hawking is an American. The question Geoff asked was where IBD got this idea.

It's possible that the people at IBD just took it for granted that any really eminent scientist would be American, but Geoff had a cleverer idea: that Hawking's voice synthesizer (which allows him to communicate despite grave neurological deficits) doesn't sound at all British.

I suggested to Geoff that the (U.S.) Presidential Medals of Freedom might have contributed to IBD's misapprehension, via the assumption that such awards go only to Americans. But this is a misapprehension as well; this year's 16 recipients included not only Hawking, but also Archbishop Desmond Tutu, Mary Robinson (former president of Ireland), and Bangladeshi economist Muhammad Yunus. Unfortunately for my idea, the timing is bad: the award ceremony was on August 12, more than a week after IBD's editorial. So it looks like it's back to the speech synthesizer proposal.

[Addendum: Melissa K. Fox writes to say, "while I think Dr Pullum's speech-synthesizer explanation for IBD's confusion has a lot in it, I think there might be something in the Medal of Freedom explanation as well — the ceremony was August 12, sure, but the recipients were announced July 30, several days before the editorial. You can both be right!"]

Think B4 You Speak

Language Log - Tue, 2009-08-18 06:03

According to Tycho at Penny Arcade ("The True Face of Our Enemy", 8/17/2009):

The Think B4 You Speak campaign is basically incoherent, and operates from some deep misconceptions about how and why people communicate. These assertions have been collated and placed sequentially in today's comic offering

The strip in question:

His conclusion:

No-one responds to this kind of diffuse scolding, least of all young men, least of all from strangers who present themselves as archwizards of prim speech and perfect morality. Bigots and stupid kids speak this way expressly to promulgate the root concepts or to provoke a reaction. Telling them to "knock it off," as this campaign hilariously does, is like exposing your belly to these wolves.

Lexically speaking, the word Gay is a battleground of warring meanings, uses, and baggage. The fact that the slur has retained its power - for all parties involved - is evidence that the conflict is ongoing, and that its destiny is not yet established. I have tremendous support for them in their aim: the wresting of language, which is identity, from the unworthy foe. If you want to hunt this kind of game, you need bigger ordnance.

This is only partly true, it seems to me.

First, a lot of people do respond to such campaigns. In some cases, they respond by complaining about "political correctness", or in other negative ways; but often, they actually try to modify their language, as complicated and confusing as that sometimes seems to be. It's true that the people who change their usage are generally not the people with genuinely offensive attitudes — but the campaign is specifically aimed at those who offend others without really wanting to do so, not at hard-core bigots.

And second, people's word choices are sometimes ignorant rather than thoughtless. I was in my 20s before I learned that gyp (meaning "cheat") had anything to do with gypsies. It's not that I thought it didn't — I just had never given the question any thought at all, because it had never occurred to me to do so. A friend of mine often used the expression "jew down" (as in "I jewed the dealer down to 5% over his cost"); when I pointed out to him that this reinforced anti-semitic stereotypes in an especially ironic way, since he as a WASP was a notoriously tenacious bargainer, he was taken aback, and claimed never to have realized that there was a connection between "jew down" and jews. And I once met a college freshman who was convinced that gay as a term for sexual orientation was derived from a pre-existing word meaning "foolish, stupid, socially inappropriate or disapproved of", rather than from a pre-existing word meaning "brilliant, showy, merry, sportive".

It's true that campaigns of this kind are generally pretty ineffective. "Just say no" and "This is your brain on drugs" seem to have enriched the culture's stock of catch-phrases without having much impact on the popularity of sex and drugs. But maybe political correctness is different.

No non-Portuguese textbooks?

Language Log - Mon, 2009-08-17 17:59

I was just looking for something in international mail regulations and stumbled on something curious. Among the items that it is prohibited to send to Brazil are: "Primary educational books not written in Portuguese". I have no desire to send any such textbooks to Brazil - in fact I'm not planning on sending anything to Brazil - I noticed this while looking for something else - but I'm curious as to the reason for this prohibition. It stands to reason that in a country whose primary language is Portuguese most primary textbooks will be in Portuguese, but I should think that there would be some schools in which some textbooks are not, such as international schools. And even if no schools use such textbooks, I can imagine foreign residents importing books in their own language for the use of their children, or teachers and educationists who want to examine textbooks from other countries. Against these legitimate uses for non-Portuguese textbooks, it is hard to imagine the threat posed by non-Portuguese textbooks. Do any of our readers know what this is about?

My illiterate search for the Sicilian animals (3)

Language Log - Mon, 2009-08-17 11:13

Well, now it is time to tell you the answer. (If you are saying "The answer to what?", you're in the wrong place. Start here, then go to here, and then come back.) Before I do, I should mention that half the readers of Language Log seem to have mailed me with their suggestions or quibbles or whatever. I'd like to express my sincere thanks to the other half. For the ones who suggested "sessilians", sorry, there are indeed animals that are sessile (rooted to the spot and immobile), and even a kind of barnacle called the sessilia, but they do not constitute an order called "sessilians" — you made that word up.

The thing I forgot is that there are spellings like Caesar, caesarean, caesium, etc. The fact that Americans generally use the Webster-simplified spelling cesium for the latter is as irrelevant as the fact that the bad Italian restaurant at the end of the mall that you mistakenly went to once spells caesar salad as ceasar salad: the crucial thing is that some words beginning with [s] do start with ca-, or to be more precise, cae-. (In fact there are one or two, such as coelacanth, that start with co-, leaving only cu- as always indicating [k]. The English spelling system really is an utter mess.)

But I never did find the word via dictionaries or word lists. (By the way, I fixed up the original post to mention that people had pointed out that the initial letters could have been ps- for all I knew. Quite right. But that turned out to be just another red herring to follow.) The way I found the order in question was through aimless searching of zoological taxonomies. I had a vital clue that you didn't have: I had heard Sir David refer to the Sicilian creatures in the same breath as amphibians and snakes. So I went to Wikipedia and started working up from reptiles and amphibians to older and more inclusive classes, going back and forth and following links to possibly relevant articles about reptilia and amphibia, until by accident I hit upon one that had a link to the order I was looking for (but had never heard of before): the caecilians (their order is also known as the Gymnophiona or Apoda). Strange animals indeed. Blind subterranean legless amphibians with teeth, living only in the tropics. Very little is known about them in some respects. Fascinating. I really want to see one now. And my temporary illiteracy is over, thank goodness.

The people who figured it out were largely people who had heard the Radio 4 program and were thus equipped with the clue that they should look up amphibians and browse around in that biological area. As far as I know, virtually no one got it by searching dictionaries or word lists. The spelling defeated us all. Except for Bill Walderman, who hit on the brilliant and very rapid technique of telling Google to search for "cycilian": it promptly corrected him to "caecilian" and showed him pictures! Nice work, Bill. Honorable mention. But the very first person to mail in a correct answer was Paul Bickart, exactly thirty minutes ahead of the second, Ast Moore. All three of these will have their subscription fee to Language Log waived for one year as a prize.

The etymology of caecilian, by the way, goes back to a Latin root for blindness, or (equivalently) the Latin word for the "slow worm", which is a legless lizard that lives under things (it is not actually blind, but the Romans apparently thought it was).

Computational eggcornology

Language Log - Mon, 2009-08-17 09:02

Chris Waigl, keeper of the Eggcorn Database, brings to our attention a paper that was presented at CALC-09 (Workshop on Computational Approaches to Linguistic Creativity, held in conjunction with NAACL HLT in Boulder, Colorado, on June 4, 2009). As part of a session on "Metaphors and Eggcorns," Sravana Reddy (University of Chicago Dept. of Computer Science) delivered a paper entitled "Understanding Eggcorns." Here's the abstract:

An eggcorn is a type of linguistic error where a word is substituted with one that is semantically plausible – that is, the substitution is a semantic reanalysis of what may be a rare, archaic, or otherwise opaque term. We build a system that, given the original word and its eggcorn form, finds a semantic path between the two. Based on these paths, we derive a typology that reflects the different classes of semantic reinterpretation underlying eggcorns.

You can read the PDF of Reddy's paper here. Yet another advance in the recognition of eggcornology as a legitimate linguistic subdiscipline.

My illiterate search for the Sicilian animals (2)

Language Log - Mon, 2009-08-17 06:27

You shouldn't be reading this if you didn't read My illiterate search for the Sicilian animals (1): if you're starting here, don't. Follow this link and read that first. Then come back. Because all I am doing in this brief follow-up post is giving Language Log readers a clue concerning the crucial feature of the awful English spelling system that I had temporarily forgotten. I had forgotten (how?) about the emperors of Rome, and the most southeasterly of that city's hills, and bypassing the birth canal, and the radioactive soft metal isotope used in atomic clocks, and the opening part of the large intestine. That's your clue. (What do you mean that's not enough? I'm the quizmaster here. I'm the one who says what's enough.)

TOC: Language Learning Vol 59, No 3 (2009)

Linguist List: Journal Contents - Sun, 2009-08-16 19:54

A word on the wall

Language Log - Sun, 2009-08-16 11:29

"Best. Cartoon. Ever", says Jesse Sheidlower, about The Rut for 7/15/2008:

"Well, maybe not greatest, but I really love this", he hedged.

I also enjoyed many of the other cartoons at the same site.

The and a sex: a replication

Language Log - Sun, 2009-08-16 10:41

On the basis of recent research in social psychology, I calculate that there is a 53% probability that Geoff Pullum is male. That estimate is based on the percentage of the and a/an in a recent Language Log post, "Stupid canine lexical acquisition claims", 8/12/2009.

But we shouldn't get too excited about our success in correctly sexing Geoff: the same process, applied to Sarah Palin's recent "Death Panel" facebook post ("Statement on the Current Health Care Debate", 8/7/2009), estimates her probability of being male at 56%.

Although it's easy to make jokes about this, it's based on a solid and interesting result. A recent survey (Newman, M.L., Groom, C.J., Handelman, L.D., & Pennebaker, J.W., "Gender differences in language use: An analysis of 14,000 text samples", Discourse Processes, 45:211-236, 2008) looked at more than 50 "linguistic dimensions", and the difference in use of the/a/an was one of the largest sex differences found.

A few days ago, in a post on "Linguistic analysis in social science", I observed that

Traditional mass media are now nearly all digital; new media are documenting (and creating) social interactions at extraordinary scale and depth; more and more historical records are available in digital form. The digital shadow-universe is a more and more complete proxy for the real one. And in the areas that matter to the social sciences, much of the content of this digital universe exists in the form of digital text and speech.

As a result, I argued, data based on the analysis of speech and language will play an increasingly large role in the social sciences. The relevant effects are generally small ones, but they're easy to measure, and when properly characterized and measured, they can be quite reliable. Furthermore, they're similar in magnitude to effects measured with much greater trouble and expense using traditional social-science methods like surveys and tests. So for today's lunch experiment™ (I was busy with other things over breakfast…) I thought I'd see if I could replicate Newman et al.'s result on sex differences in article usage, using a completely different data set.

First, let's get clear on what Newman et al. found. Their abstract:

Differences in the ways that men and women use language have long been of interest in the study of discourse. Despite extensive theorizing, actual empirical investigations have yet to converge on a coherent picture of gender differences in language. A significant reason is the lack of agreement over the best way to analyze language. In this research, gender differences in language use were examined using standardized categories to analyze a database of over 14,000 text files from 70 separate studies. Women used more words related to psychological and social processes. Men referred more to object properties and impersonal topics.

The summary table of dimensions and effect sizes is on pages 19-20, reproduced for your convenience here. The results for percentages of the/a/an, in particular, were:

Female mean (stddev)    Male mean (stddev)    Effect Size
6.00 (2.73)             6.70 (2.94)           d = -.24

The "effect size" here is estimated using "Cohen's d", which is the difference in means divided by the pooled standard deviation. This difference in article-use percentage is on the large side, not only among sexual-textual characteristics, but also among cognitive sex differences in general, for example the measures of verbal ability in meta-analytic studies like Janet Shibley Hyde and Marcia C. Linn, "Gender Differences in Verbal Ability: A Meta-Analysis", Psychological Bulletin, 104:1 53-69 (1988).
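As a rough check on the figures in the table above, the calculation can be sketched in a few lines of Python. The function name cohens_d is hypothetical, and since the table gives no group sizes, the sketch assumes equal-sized groups when pooling the standard deviations; Newman et al.'s exact procedure may differ.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Difference in means divided by the pooled standard deviation.

    Assumption: equal group sizes, so the pooled SD is the root mean
    square of the two standard deviations.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Article percentages from the table: female first, male second.
d = cohens_d(6.00, 2.73, 6.70, 2.94)
print(round(d, 2))
```

Under that assumption the result lands close to the reported d = -.24; small differences are to be expected from rounding in the published means and from the unequal group sizes in the real sample.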

For more on the interpretation of effect sizes in general, see "Gabby guys: the effect size", 9/23/2006. In this case, if we assume that the distribution of article-use percentages by sex is "normal", then over a large collection of writing-samples, the cited means and standard deviations would yield a distribution of percentages like this:

It's important to be clear that this is not a very big effect, when you look at it from the point of view of men and women as individuals. Since the correlation ("Pearson's r") is related to Cohen's d as

r = d/sqrt(d^2 + 1/(p*q))

(where p and q are the proportions of the two groups being compared), and since their sample was about 58% female, the correlation between sex and article use should be about r = 0.119 — and thus the percent of variance in article use that is accounted for by sex, in their dataset, is about r^2 = 1.4%.
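The conversion is easy to reproduce. Here is a minimal sketch in Python that plugs the reported d and the approximate 58% female share into the formula above; the function name d_to_r is hypothetical.

```python
import math

def d_to_r(d, p):
    """Convert Cohen's d to Pearson's r, where p and 1 - p are the
    proportions of the two groups being compared."""
    q = 1.0 - p
    return d / math.sqrt(d ** 2 + 1.0 / (p * q))

r = d_to_r(-0.24, 0.58)
print(abs(r), r ** 2)
```

With these rounded inputs the magnitude of r comes out at roughly 0.12 and r^2 at about 1.4%, in line with the figures quoted above.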

Looking at it from the other side, you'd have around a 55% chance of guessing sex from the percentage of the/a/an in random examples drawn from a population with equal numbers of males and females exhibiting these distributions of article-usage by sex.
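One way to arrive at a figure of this kind is sketched below, under a simplifying assumption that is not quite true of the data: both sexes' article percentages are taken to be normally distributed with the same standard deviation. In that case the best single-threshold guess is right with probability Φ(|d|/2), where Φ is the standard normal cumulative distribution function. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def guess_accuracy(d):
    """Accuracy of the best threshold guess for two equally likely groups
    whose scores are normal with a shared standard deviation (assumed)."""
    return normal_cdf(abs(d) / 2.0)

acc = guess_accuracy(-0.24)
print(round(100 * acc, 1))
```

For d = -.24 this gives an accuracy of about 55%.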

And there are bigger differences due to genre and topic — the rate of the/a/an usage in formal written text will generally be much higher than in informal conversation, for example, and the expected magnitude of that difference is more than twice as great as the sex effect.

So this sex difference in article usage (like other perceptual and cognitive sex differences) doesn't provide any meaningful scientific support (in my opinion) for Dr. Leonard Sax's single-sex-education movement. On the other hand, differences of this magnitude can be quite important in some contexts. This much of an edge in investing or gambling, for example, would be a reliable source of income. Similarly, in politics or in marketing, effects of this size can be highly useful. (I don't mean that article-usage distributions per se are of any interest to investors, politicos, or marketeers; but other reliable effects of this size certainly would be.)

And a correlation of about 0.12 is right in the mix, for effects in published social-science research. Compared to the values in a recent meta-analysis (F.D. Richard, C.F. Bond, and J.J. Stokes-Zoota, "One hundred years of social psychology quantitatively described", Review of General Psychology, 2003), it's below the mean, but above the mode:

This article compiles results from a century of social psychological research, more than 25,000 studies of 8 million people. A large number of social psychological conclusions are listed alongside meta-analytic information about the magnitude and variability of the corresponding effects. References to 322 meta-analyses of social psychological phenomena are presented, as well as statistical effect-size summaries.

The distribution of r values that they found:

OK, on to the replication.

In order to show that this effect is reliable and easy to calculate, I took the transcripts and speaker demographics from the Fisher Corpus of conversational speech, a collection of more than 10,000 telephone conversations lasting up to 10 minutes each, recorded in 2003-2004 and published by the LDC.

In 9,789 conversational sides spoken by males, I found 9,409,848 words, of which 471,820 were the/a/an, for an overall percentage of 5.01%. In 13,007 conversational sides produced by females, there were 12,186,985 words spoken, including 554,827 articles, for an overall percentage of 4.55%.

These percentages are lower than in Newman et al.'s overall tabulation, as we expect given that this is informal conversation rather than written text.

What about the distribution of percentages across speakers? Here's a graphical representation of what I found (0.2%-wide bins from 0.1% to 10.1%):

And here's a table of the summary statistics:

Female mean (stddev)    Male mean (stddev)    Effect Size
4.47 (1.16)             4.89 (1.27)           d = -.34

Thus the effect is in the same direction, and the effect size is somewhat larger, consistent with Newman et al.'s observation that

Although these effects were largely consistent across different contexts, the pattern of variation suggests that gender differences are larger on tasks that place fewer constraints on language use.

The key thing is that this kind of analysis is now very easy to do. Starting from the raw Fisher-corpus transcripts and metadata files, writing and running the (gawk and R) scripts for this replication took me 17 minutes of wall clock time.
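To give a sense of how little code the core measurement needs, here is a minimal sketch in Python (not the gawk and R scripts used for the replication). The function name, the tokenization rule, and the sample sentence are all assumptions made for illustration; real Fisher-corpus transcripts would need their own parsing and a join with the speaker metadata.

```python
import re

ARTICLES = {"the", "a", "an"}

def article_percentage(text):
    """Percentage of word tokens in `text` that are the, a, or an."""
    # Assumed tokenization: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in ARTICLES)
    return 100.0 * hits / len(tokens)

sample = "I saw a dog and an owl in the park"
print(article_percentage(sample))
```

Applied to each conversational side, a function like this would yield the per-speaker percentages whose means and standard deviations go into the summary table.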

I haven't tried to persuade you that this effect is an interesting one, only that it's reliable (in the aggregate), comparable in size to many traditional social-psychological measures, and easy to compute. Though something might be learned by trying to figure out where this phenomenon comes from, it seems to me that it shouldn't be seen as an end in itself, but rather a feature that might help us understand other social and psychological differences.

There are hundreds of features that can now be calculated in similarly trivial ways. (And the resulting distributions show large age, class, and mood effects as well as sex effects, as Jamie Pennebaker and others have found — are there also effects of political philosophy, for example?)

As more and more text and speech become available, as better and more sophisticated automatic analyzers are developed (such as those involved in "sentiment analysis"), and as the modeling of feature distributions in these larger datasets becomes more sophisticated, it's inevitable that the scope of such research will become broader, and the number of studies will increase.

I believe that the social value and intellectual interest of these studies will also increase — and I'll try to persuade you, in occasional future posts, that this is already happening.

[Update: D.O. in the comments asked about pronoun percentages. The numbers from Newman et al. are

Female mean (stddev)    Male mean (stddev)    Effect Size
14.24% (4.06)           12.69% (4.63)         d = 0.36

I'm not sure that I've replicated their calculations exactly, because I'm not certain my list of pro-forms is the same as theirs. But for what it's worth, here's what I get for the Fisher data:

Female mean (stddev)    Male mean (stddev)    Effect Size
16.155% (2.352)         15.496% (2.531)       d = 0.27

And here's the graphical version:

Again, the direction is the same, though there's an effect of genre that's larger than the effect of sex.]

My illiterate search for the Sicilian animals (1)

Language Log - Sun, 2009-08-16 08:35

My parents tell me that I could read well before my 4th birthday. As a result, I have virtually no experience of what it would be like to be illiterate. It would be easier for me to imagine blindness than complete inability to read. I did have a glimpse of it when I first spent some time in Japan, and was surrounded by an advanced culture using an utterly alien writing system in which I couldn't even read out the names off the signs (as I can in any of the alphabets of Europe). But I had another glimpse this morning when I heard a word on the radio that I couldn't guess how to spell, not even vaguely. Tracking it down was a terrible job. My dictionary was no help, precisely because dictionaries are organized in such a way as to be helpful only to the literate. The great naturalist Sir David Attenborough, on Radio 4, mentioned a curious-sounding class of animals that he appeared to be calling Sicilians. (Not a class in the technical terminology; technically they are actually a whole separate order of animals.) I listened carefully; it definitely sounded like "Sicilians". But what was this word? These creatures (he made it clear) did not live in Sicily.

I went to the American Heritage Dictionary (I wanted to know the meaning and learn about the animals that had the name, so I used a hard-copy dictionary that includes pictures), and simply examined all the words in the dictionary that had anything like a plausible beginning.

The only letters that can represent the [s] sound are c and s. (Just in case, I checked x and z, which actually represent [z] when initial; I thought perhaps I had misheard the voicing. But I had not; it was a blind alley, and I will ignore it hereafter. People have pointed out to me since I wrote this that the silent p words like psychology reveal another possibility. True. But it turned out not to be relevant.) The vowel letters that could represent [I] after [s] would be i as in city or sit and y as in cyst or system, and just possibly (in an unstressed syllable) e as in Cecilia or serenity. It couldn't be a or o or u, because before those letters a c stands for the stop consonant [k]. And the third sound, the second [s], could be spelled (for all I knew) c or s or ss.

And I came up with zip. Nada. Nichts. Nothing.

There I was, illiterate in English (while holding the position of head of the top Linguistics and English Language department in the U.K. — what a fraud!), fumbling through the dictionary, unable to find a word — solely because dictionaries are organized entirely on the assumption that you know how to spell, at least to an approximation. I found no sign of the word at all, in any of the relevant places. I simply couldn't remember this happening to me before. Most unpleasant. So this is how adult illiteracy feels. Dictionaries become useless.

Now, I do have a fair command of Unix tools, and those, used appropriately, can dramatically reduce your feelings of illiteracy and inadequacy. I knew that what I had to find was in the set of all words that would match the egrep regular expression: "^[cs][iey](c|ss?)". That is, I wanted to see a list of any words beginning with c or s followed by i or e or y followed by either c or else s with perhaps a second s after that. (To allow for ps-, the expression could be modified to "^(ps|[cs])[iey](c|ss?)". Doesn't make any difference.)

On any Unix system you can use the egrep program to produce an exhaustive list of all of them by searching the standard word list in /usr/share/dict/words (it is /usr/dict/words on some systems). In fact you can make egrep give you a list of all the words that begin the right way and include an l (there had to be one of those in the word) and end in n plus (just possibly) a silent e. So I tried the magic of egrep. I typed this to the prompt in the Terminal program on my Mac OS X laptop:

egrep "^[cs][iey](c|ss?)[a-z]*l[a-z]*ne?$" /usr/share/dict/words

But the results were disappointing: the four words it comes up with are cyclone, cyclopean, cyclotron, and seclusion. No plausible candidates there. Where the hell was this zoological word that sounded just like Sicilian that I had never heard before and that couldn't be found in an excellent dictionary or in the Unix standard word list?
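For readers without a system word list to hand, the same filter can be tried in Python, whose re module accepts this pattern unchanged. This is only a sketch: the function name and the short sample list are made up for illustration, and the real search ran egrep over a full dictionary file.

```python
import re

# Same pattern as the egrep command: c or s, then i, e, or y, then c, s, or ss,
# then an l somewhere later, ending in n with an optional silent e.
PATTERN = re.compile(r"^[cs][iey](c|ss?)[a-z]*l[a-z]*ne?$")

def candidates(words):
    """Return the words that match the spelling pattern."""
    return [w for w in words if PATTERN.search(w.lower())]

# Hypothetical sample; substitute the lines of a real word list.
sample_words = ["cyclone", "seclusion", "sicilian", "sicily", "system", "psychology"]
print(candidates(sample_words))
```

Of the sample words, only cyclone, seclusion, and sicilian satisfy every part of the pattern; the others fail on the first letter, the required l, or the final n.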

I turned to a larger word list. There is a 235,000-word list in a file called /usr/dict/web2 that is now supplied with many Unix and Linux systems (it is linked to /usr/share/dict/words on some of those), and I tried out this command:

egrep "^[cs][iey](c|ss?)[a-z]*l[a-z]*ne?$" /usr/dict/web2

And I still drew a blank. I found out later that the word is in there, but the above command will not find it. It produces this list of 45 words (which reveals to you why web2 is often less useful than the shorter list — it is way too big, and contains all sorts of learned and scientific junk):

ciclatoun cyclene cyclohexanone cyclopentadiene seclusion cisalpine cyclian cyclohexene cyclopentane sectionalization cisleithan cyclization cycloidean cyclopentanone secularization cisplatine cycloalkane cycloidian cyclopentene sesquialteran cycadofilicinean cyclobutane cyclomyarian cyclopropane sesquipedalian cyclamen cyclodiolefin cyclone cyclothurine sicilian cyclamin cycloheptane cycloolefin cyclotron sicilienne cyclamine cycloheptanone cycloparaffin secalin sickleman cyclane cyclohexane cyclopean secaline sysselman

Well, it turned out that I was missing something. Eventually I did track the word down, by a less orthodox technique, not based on the alphabet. And then I knew the crucial orthographic fact about English that I had been overlooking. It didn't seem so arcane once I reached that point. But until then it was completely opaque to me.

Let me explain…

No; on second thoughts, I don't have a lot more time right now, and it will be fun for you trying to figure it out. I will tell you tomorrow, and I'll explain how I tracked down the word.

[Comments cannot be allowed at this point, of course, because Language Log now has tens of thousands of readers and many of you are extremely smart, and others are zoologists (in addition to being extremely smart), and you would blow the puzzle within three minutes flat and tell everyone what the secret was.]

The 2009 Linguistic Institute ends

Language Log - Sat, 2009-08-15 16:36

Yesterday the six-week faculty and the second-session three-week faculty ended our teaching stints at the 2009 Linguistic Institute sponsored by the Linguistic Society of America and the University of California at Berkeley. The two second-session Language Loggers, Adam Albright and I, were in complementary distribution with the two first-session Language Loggers, Geoffs Nunberg and Pullum: we did not meet in Berkeley. Not all of us have finished our work for our classes — I still have 15 of my 42 papers to grade — but our tight-knit community — living in the same dorm, sorry, residential unit (palatial by my loooong-ago student-era standards) and eating at the same university dining hall (spectacular by my ditto standards) — is history.

What a great Institute! I learned any number of cool things, partly from my own students (who hailed from places as distant as Florida, Boston, Singapore, France, Poland, and Taiwan), partly from the two classes I sat in on, Adam Albright's morphological change class and Emmon Bach & Pat Shaw's class on Wakashan Linguistics, and partly from fellow faculty members. From my students I learned, for instance, about Singapore's Speak Good English Movement (launched in 2000 and based on the earlier Speak Mandarin Campaign, vintage 1979, which was designed to encourage Chinese speakers to abandon other Chinese "dialects" and switch to Mandarin). The SGEM is supposed to promote Standard Singapore English and demote "Singlish", a.k.a. Singapore Colloquial English. I also learned about Bhindi, a mixture of Hindi and Marathi and maybe Dravidian languages as well, spoken in Bombay/Mumbai. And the students raised the question of whether the Nicaraguan Sign Language is best seen as a pidgin, or a creole, or something else. (I argued for abrupt creole status, on the grounds that the Deaf kids who created it brought their home-sign systems to the school where the NSL developed, and presumably contributed material from those sign systems to the emerging creole.)

From Mark Donohue, who was teaching Phonological Typology of Papuan Languages in the second session, I learned a whole bunch of fascinating things: about the Doutai speakers (NW New Guinea) who suppressed their language's implosive consonants for several weeks while Mark was studying their language (a phenomenon reminiscent of Dan Everett's experience of living three years among the Pirahas in the Amazon before they stopped suppressing their linguo-labial stops in talking to him); about plugging in typological features — word order, consonant types, etc., etc., etc. — to biologists' statistical models and coming up with areal rather than genetic groupings in known cases (e.g., Rumanian grouped with Slavic rather than with the rest of the Romance languages); and about a wide range of other things.

There's lots more, but you'll have to attend an LSA Linguistic Institute yourself — say, in Boulder, Colorado, in 2011 — to get your own collection of exciting linguistic facts, perspectives, new theories. No matter how old or young you are in age and/or in the field of linguistics, there's something at an Institute for you. About half the students in my language contact class at this Institute were undergraduates, for instance, and a few of them were studying at almost linguistics-free colleges. Neophytes, in other words. And they did just fine. For language-lovers, it doesn't get much better than a Linguistic Institute. And that's even aside from the occasional perks, like the view of the Golden Gate Bridge from my dorm window.

When peeves collide

Language Log - Sat, 2009-08-15 07:28

… the result is a grammatical bar brawl. An excellent example is on display over at Ask MetaFilter, where someone innocently asked

So which sentence is proper English grammar: "If you eat like Bob and me, you will be healthy." or "If you eat like Bob and I, you will be healthy."

KA-POW: "it's the second one…" WHAAM: "No, it's the first…" BIFF: "The verb 'do' is implied…" DOINK: "'like' … is indisputably a preposition in this case. It can't even function as a conjunction."

And so on. There's some sensible and well-informed advice mixed in with the mud and the blood and the beer, including a link to my discussion of a similar question a few weeks ago ("Write like me?", 7/24/2009). But overall, the chaos of contradictory confidence is likely to reinforce the culture's general state of nervous cluelessness about grammar.

In this case, an ample supply of the relevant clues can be found in two entries in MWDEU. The first is the entry for between you and I (pp. 181-182):

And the second is the entry like, as, as if (pp. 600-603):

Unfortunately, the answer suggested by the MetaFilter free-for-all still stands — no matter what you do with this question, some peevologist is likely to take a poke at you. But maybe this background will help you get out of the place in one piece.
