Language Log

How many ethnic groups?

Mon, 10/12/2009 - 7:05am

Counting languages isn't an easy task; in particular, it's hard to say whether two varieties are related languages or dialects of a single language. Making these decisions on linguistic grounds is difficult enough, but political, cultural, and social considerations often intervene, to compound the difficulty. The latest Ethnologue (16th ed., 2009) advertises itself as "an encyclopedic reference work cataloging all of the world's 6,909 known living languages", but the introduction lays out the problems in identifying and counting languages and acknowledges that the methods used in reaching this very exact number are not the only possible ones and that these methods involve judgment calls at several points.


Same thing in counting ethnic groups. But sometimes an authority just stipulates a figure, as in this report on school textbooks in China ("The fragility of truth", The Economist, October 10, p. 45):

The authorities have said that 56 columns erected on Tiananmen Square for the celebrations [the recent 60th anniversary celebrations] will stay put. They represent China's officially recognized 56 "ethnic groups". It is a number that few Chinese schoolchildren would dare to challenge. Yet when the communists came to power there were found to be more than 400. Officials eventually settled on the much lower figure and suppressed further debate.

(That's 55 official minority nationalities, plus Han Chinese.)

I don't see any official pronouncement on the number of languages in China. What the Ethnologue says is:

The number of individual languages listed for China is 293. Of those, 292 are living languages and 1 has no known speakers. (link)

Language and ethnicity are related in complex ways, but 291 languages (discarding the 2 special cases) seems to me to be an awful lot for only 56 ethnic groups.

The genitive of lifeless things

Sun, 10/11/2009 - 1:53am

I've heard many interesting papers here at AACL 2009. Here's one of them: Bridget Jankowski, from the University of Toronto, "Grammatical and register variation and change: A multi-corpora perspective on the English genitive".  She was kind enough to send me a copy of her slides, from which I've taken (most of) the graphs below.

In order to study the history of choices like "Ontario's government" (s-genitive) vs. "the government of Ontario" (of-genitive), she created two small historical corpora, sampling Maclean's magazine and the Hansard transcripts of debates of the Ontario Provincial Legislature at three time points: 1906, 1956, and 2006. She picked three authors or three speakers from each source at each time point. All of the speakers and authors were men aged 30-60 at the time of the sample.

Her first result is a replication of the observation that the s-genitive has been gaining ground:

Compare, for example, this figure from Hinrichs and Szmrecsanyi, "Recent changes in the function and frequency of standard English genitive constructions: a multivariate analysis of tagged corpora", English Language and Linguistics 11(3): 437–474, 2007:

Jankowski then broke the trends down further by coding the possessors as

  1. Human: a student’s schoolwork, Mrs. Hale’s reaction
  2. Organizations (animate “collectivities of humans which display some degree of group identity”): the local school board’s ruling; the federal government’s plan
  3. Places: Canada’s foreign language press, Ontario’s roads, the streets of Rome, the raw edge of the world, the people of this American continent
  4. Inanimate objects, activities, units of time, states

This made it clear that the increase in use of s-genitives has been especially strong in the case of organizations, and even stronger in the case of places:

Her category 4 ("Inanimate objects, activities, units of time, states") was realized with the of-genitive 96% of the time in Maclean’s and 99% of the time in Hansard. So her results are generally consistent with Otto Jespersen's observation in A Modern English Grammar on Historical Principles: Part VII (1949) that

In poetry and in higher literary style, the genitive of lifeless things is used in many cases where of would be used in ordinary speech. […] During the last few years the genitive of lifeless things has been gaining ground, (especially among journalists)…

but only if "lifeless things" is taken to include organizations and places, and not "inanimate objects, activities, units of time, states".

She also compared her results to data from a corpus of conversational speech collected recently in Toronto, using speaker age to create two "apparent time" collections comparable to the 1956 and 2006 samples. This suggests that in the spoken language, human possessors have almost always gotten the s-genitive, consistently across time, while inanimate possessors (in this graph including her categories 3 and 4) have consistently gotten the of-genitive:

On this analysis, the increase in s-genitives for human possessors in Maclean's magazine makes the journalistic prose more and more like the spoken language; but the parallel increase in s-genitives for inanimate possessors makes the journalistic prose less and less speech-like.

Her presentation also considered the effect of the length of the possessor (a shorter possessor is more likely to take an s-genitive) and the possessum ("shorter possessum will be more likely to take an of-genitive and so appear first in the construction, while a longer possessum is more likely to take the s-genitive"), as well as other relevant features such as "lexical density":

and "thematicity":

and did a multivariate analysis of all the various factors taken together.

You'll have to read her (I trust forthcoming) paper to learn how it all comes out — I have a 6:40 a.m. plane to catch — but I hope that this much is enough to convince you that there's a rich and interesting pattern of variation to be untangled here. It certainly convinced me.

And it also increased my general feeling that the time is right for the application of automatic or semi-automatic methods of analysis (here in assigning her four categories of possessors, in determining the lengths of the possessor and possessum constituents, in counting local phrases co-referential with the possessor, etc.) to the study of syntactic variation across time, genre, register and so on. Because she had to annotate everything by hand, Jankowski's sample was fairly small — 50K words of Maclean's, and 100K words of Hansards. With automatic or semi-automatic annotation, she could look at larger collections with denser time samples of more sources, and easily add other features, like various word and phrase frequencies, grammatical role and phrasal position of the whole genitive construction, etc.
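Once each genitive token carries annotations of this kind, whether assigned by hand or by a tagger, the tallying step is simple. Here is a minimal Python sketch, assuming each token has been reduced to a (year, possessor category, construction) triple. The rows in TOKENS are invented for illustration and are not Jankowski's data, and the function name s_genitive_rates is hypothetical.

```python
from collections import defaultdict

# Hypothetical annotated tokens: (year, possessor_category, construction).
# These rows are invented for illustration; they are NOT Jankowski's data.
TOKENS = [
    (1906, "place", "of"),
    (1906, "place", "of"),
    (1906, "human", "s"),
    (2006, "place", "s"),
    (2006, "place", "of"),
    (2006, "human", "s"),
]

def s_genitive_rates(tokens):
    """Return {(year, category): share of tokens realized as s-genitive}."""
    totals = defaultdict(int)
    s_counts = defaultdict(int)
    for year, category, construction in tokens:
        totals[(year, category)] += 1
        if construction == "s":
            s_counts[(year, category)] += 1
    return {key: s_counts[key] / totals[key] for key in totals}

rates = s_genitive_rates(TOKENS)
for (year, category), rate in sorted(rates.items()):
    print(year, category, f"{rate:.0%}")
```

The hard part, of course, is producing the triples automatically; the sketch only shows that everything downstream of annotation is cheap.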

Time and the river

Fri, 10/09/2009 - 2:45pm

The latest xkcd is a brilliant way to introduce the topic of child language acquisition and cognitive development:



The trouble is, the first-year college students in this year's intro linguistics courses were only 9 on 9/11. And in about five more academic years, the entering students will be too young to remember 9/11 as a personal experience at all.

University of Alberta's motto: "whatever"

Fri, 10/09/2009 - 11:08am

The University of Alberta, hosting the AACL 2009 conference where I'm spending a couple of days, has recently moved up in the Times Higher Education World University Ranking, from 133rd in 2006, 97th in 2007, and 74th in 2008, to 59th in 2009. (I believe that it comes out 4th in Canada, after McGill, Toronto, and UBC.) It's hard to make that kind of move; the responsible faculty and administrators should be congratulated.

And when I saw it for the first time yesterday, I thought that the motto on the University's seal expressed just the right attitude: quaecumque vera, or after translation from the Latin, "whatever". Well, I suppose literally it means "whatever [things are] true", but the "true" part is redundant, right? I mean, when you say "OK, whatever", isn't what you mean "OK, whatever is true, I'm fine with it"?

More seriously, the university senate's web site explains that

The University Motto, Quaecumque vera, is taken from the Latin Vulgate version of the Bible, the Epistle of St. Paul to the Philippians, Chapter 4, Verse 8:

De cetero, fratres, quaecumque sunt vera, quaecumque pudica, quaecumque justa, quaecumque sancta, quaecumque amabilia, quaecumque bonae famae, si qua virtus, si qua laus disciplinae, haec cogitate.

The same passage from the King James version is: Finally, brethren, whatsoever things are true, whatsoever things are honest, whatsoever things are just, whatsoever things are pure, whatsoever things are lovely, whatsoever things are of good report; if there be any virtue, and if there be any praise, think on these things.

Right; whatever. Or to avoid using what is allegedly the most annoying phrase in English, quaecumque vera.

Hoc est enim corpus linguistics

Fri, 10/09/2009 - 5:01am

I'm at the AACL 2009 meeting in Edmonton — that's the meeting of the American Association for Corpus Linguistics, which is neither American nor an Association, as John Newman explained to me.  I'll report later on some of what I see and hear.

So far, the most notable thing has been the outside temperature of 20 F or so, experienced on a morning walk around campus — the conference itself hasn't started yet — but the program looks interesting.

It seems to me that no reputable naming consultant would have approved the choice of the word corpus – Latin for "body" — in corpus linguistics, which involves the study of "bodies" or "collections" of text.  There's an unfortunate resonance with corpse, which makes the whole enterprise sound faintly icky. (It isn't — the method is used on living languages as well as dead ones. Not that the dead ones are icky either…)

The OED gives citations back to the 18th century for corpus in the sense "A body or complete collection of writings or the like; the whole body of literature on any subject":

1727-51 CHAMBERS Cycl. s.v., Corpus is also used in matters of learning, for several works of the same nature, collected, and bound together..We have also a corpus of the Greek poets..The corpus of the civil law is composed of the digest, code, and institutes.

The more specialized sense "The body of written or spoken material upon which a linguistic analysis is based" is cited only back to the 1950s:

1956 W. S. ALLEN in Trans. Philol. Soc. 128 The analysis here presented is based on the speech of a single informant..and in particular upon a corpus of material, of which a large proportion was narrative, derived from approximately 100 hours of listening. 1963 Language XXXIX. 1 In the analysis of the data, the structural features of the corpora will first be described. 1964 E. PALMER tr. Martinet's Elem. General Linguistics ii. 40 The theoretical objection one may make against the ‘corpus’ method is that two investigators operating on the same language but starting from different ‘corpuses’, may arrive at different descriptions of the same language.

There's more to be said about the ideas involved — methodological issues can become quasi-religious for some people, as Geoff Pullum observed here a few years ago, and he was describing the mere residue of earlier battles that were much more bitter.  But as Moore's Law and the digitization of society have made it easier and easier to apply corpus-linguistics methods, the methodological arguments about whether, when and how to apply them have become much less violent.

"Annoying word" poll results: Whatever!

Thu, 10/08/2009 - 9:17pm

Proving once again that peevology is the most popular form of metalinguistic discourse in the U.S., the media yesterday was all over a poll from the Marist Institute for Public Opinion, purporting to reveal the words and phrases that Americans find most annoying. As was widely reported, whatever won with 47%, followed by you know (25%), it is what it is (11%), anyway (7%), and at the end of the day (2%). As was not so widely reported, those were the only options that respondents to the poll were given, so it's not like half of Americans are really tearing their hair out about whatever.

For more on the poll and its media reception, see my latest Word Routes column on the Visual Thesaurus. And check out recent Language Log posts on whatever (here) and at the end of the day (here, here, and here).

Exploring the cliche-by-president matrix

Thu, 10/08/2009 - 12:49am

A couple of days ago, in "Fact-checking George F. Will, one more time", I noted that Will complained about the

…  egregious cliches sprinkled around by the tin-eared employees in the White House speechwriting shop. The president told the Olympic committee that: "At this defining moment," a moment "when the fate of each nation is inextricably linked to the fate of all nations" in "this ever-shrinking world," he aspires to "forge new partnerships with the nations and the peoples of the world."

While admitting that "I don't have a program ready to hand for measuring cliche-density, much less cliche egregiosity", I nevertheless offered the opinion that "in speeches prepared for ceremonial occasions like this one, the cliche density of presidential rhetoric has been fairly constant for decades if not centuries".

I still don't have a metric for cliche-density, but we can learn something by exploring the site http://www.presidentialrhetoric.com/ for certain fixed phrases.

For example, there's no question that "defining moment" is a defining phrase for Barack Obama, who has used it ten times in texts indexed on that site. The only other president who has ever used it (in texts indexed there, anyhow) is, interestingly, George W. Bush, who used it once.

As for that "ever-shrinking world", Obama has used this phrase once before the Olympic pitch, and no other president (or presidential speech-writer) has ever done so.

And no other president has apparently ever aspired in so many words to "forge new partnerships". But George W. Bush ("America and China: Address in Thailand", 8/7/2008) promised to "forge new relationships with countries that share our values". In fact, that phrase came from a sentence notably dense in the high-sounding abstract phrases that George Will seems to dislike so much in Barack Obama's speech:

America has pursued four broad goals in the region: reinvigorate our alliances, forge new relationships with countries that share our values, seize new opportunities for prosperity and growth, and confront shared challenges together.

Interestingly, presidents Bush and Obama are the only two presidents to voice the aspiration to "confront * challenges": Obama three times, and Bush four times. Many other presidents have confronted challenges, but only these two have used those words.

And even more than confronting challenges, George W. Bush was fond of (talking about) confronting problems: the string "confront problems" occurs 36 times in his texts, and — amazingly — not once in the texts of any other American president.

GW Bush was also fond of (talking about) seizing things: "seize new opportunities" (W 2, no others), "seize this|that opportunity" (W 2, Gore 1, no others), "seize opportunities" (W 2, Clinton 2, no others), "seize this moment" (W 3, Kerry 1, no others), "seize the moment" (W 4, no others), "seizing this moment" (W 1, no others), "seize the initiative" (W 1, no others), "seize control" (W 4, Carter 1, no others).

Overall, such phrases often seem to be associated with particular time periods and with particular presidents. Thus James Monroe and Andrew Jackson were each "deeply impressed" three times, and James Polk and Martin Van Buren once each. The only other presidents to have been "deeply impressed" were, again, that unlikely pair George W. Bush and Barack Obama, once each.

It would be interesting to run a collocation-detection algorithm over the whole collection of presidential speeches. This might give something approximating a cliche-density metric (though really there should be some normalization relative to usage patterns in the wider world). But the eigenstructure of the president-by-cliche matrix might tell us whether Barack and W are really rhetorical brothers beneath the skin, and reveal other hidden (or at least amusing) affinities.
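As a rough illustration of what a president-by-cliche matrix might look like, here is a minimal Python sketch. It counts a fixed list of phrases per speaker (with "*" standing for one intervening word, as in the searches above) and compares speakers by cosine similarity, which is a much simpler stand-in for a real eigen-analysis. The speakers and sentences in SPEECHES are invented, not real quotations or counts from presidentialrhetoric.com, and the helper names are hypothetical. No normalization against wider usage is attempted.

```python
import math
import re

# Hypothetical mini-corpus: invented sentences per speaker, NOT real
# quotations and NOT the counts reported from presidentialrhetoric.com.
SPEECHES = {
    "A": "We must seize the moment and confront new challenges together.",
    "B": "Let us confront shared challenges. Let us seize the moment.",
    "C": "I am deeply impressed by the progress of our commerce.",
}

# Fixed phrases to look up; "*" stands for exactly one intervening word.
PHRASES = ["seize the moment", "confront * challenges", "deeply impressed"]

def phrase_regex(phrase):
    """Compile a case-insensitive pattern, expanding '*' to one word."""
    parts = [r"\w+" if tok == "*" else re.escape(tok) for tok in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def count_matrix(speeches, phrases):
    """Return {speaker: [count of each phrase]} in the order of `phrases`."""
    patterns = [phrase_regex(p) for p in phrases]
    return {name: [len(p.findall(text)) for p in patterns]
            for name, text in speeches.items()}

def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

matrix = count_matrix(SPEECHES, PHRASES)
for name, row in matrix.items():
    print(name, row)
print("A vs B:", round(cosine(matrix["A"], matrix["B"]), 3))
print("A vs C:", round(cosine(matrix["A"], matrix["C"]), 3))
```

With real texts the phrase list would come from a collocation detector rather than being typed in by hand, and the rows would need to be scaled by the amount of text per president before any comparison meant much.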

A dangler in The Economist

Thu, 10/08/2009 - 12:01am

My view on the classic prescriptive bugaboo known as dangling modifiers or dangling participles (henceforth, danglers) is, I think, a bit unusual. I don't regard danglers as grammatical mistakes; that is, I think the syntax of English does not block them. Yet I do think they constitute mistakes, in a broader sense, so in a way I am with the prescriptivists on this one. A dangler is an error in a domain that I have compared (for want of a better way to put it) to courtesy or manners. I regard danglers as minor offenses against communicational etiquette, but not against grammar. The argument against danglers being grammar errors is simple: they are too common in even careful published writing, and come too fluently to the keyboards of even excellent writers, and are accepted without remark by too many educated readers. If you ask what evidence there is that, for example, verbs come before objects in English, the answer is that it is overwhelmingly clear from just about all of everybody's usage just about all the time, and from the blank "What's gone wrong with you?" reactions if you try putting the object before the verb. The evidence on danglers goes entirely the other way. Here, for example, is an example in the carefully edited prose of The Economist (October 3rd, 2009, p. 79):

A report to the British House of Commons this year highlighted the case of an elderly British citizen called Derek Bond, who was arrested, at gunpoint, in February 2003 while on holiday in South Africa. After being held for three weeks, it turned out that the American extradition request was based on a fraudster who had stolen Mr Bond's identity.



The only relevant thing the syntax says, I believe, is that subjectless non-finite clauses, and preposition phrases having such clauses as complement of the preposition, and predicative constituents such as adjective phrases, may be used as adjuncts.

And all the semantics says is that the target of predication in such cases is filled in by reference to a grammatically salient noun phrase (NP) in the immediate vicinity. That's it.

Consider in this light the task of interpreting the second sentence in the quotation above. After what? Somebody being held for three weeks. Who was held? We're guessing thus far, so let's wait and see what the subject of the matrix clause is… Hmm, the pronoun it. That's not very promising: what non-human could have been held? Let's go on. It turned out that… This makes it clear that the it was a dummy — a meaningless placeholder in a context where a complement clause is in extraposition (postponed till the end of the clause containing it). Well, what's the subject of the clause in extraposition? The American extradition request. But surely that is not what was held. Let's go on. Was based on a fraudster… Could the target of predication be a fraudster? No, that makes no sense. Any other NPs? Well, there is one more (though we're down to NPs that could hardly be called grammatically salient now): the object of stolen, namely, Mr Bond's identity. But that doesn't make sense either: this isn't about the South Africans holding the man's identity.

Wait a minute, though: if we look inside that NP we see that its determiner is the genitive NP Mr Bond's. Perhaps the thing to do is to ignore the genitive case on that and try Mr Bond as the target of predication. After Mr Bond had been held for three weeks. Yes, that would make sense. We'd better assume that.

You can get there. But what a struggle. Floundering around for what could be as much as an extra second, which in language processing is a very long time, there were four different false leads planted in the text for us to pursue — four NPs that were not the right choice for the target of predication we needed to plug together with the being held clause.
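The search just described can be caricatured in a few lines of Python. This is only a sketch of the reader's procedure as laid out above: the candidate NPs are listed in the order the walkthrough tries them, and the plausibility judgments are assigned by hand as an assumption of the sketch, not computed by any parser. The function name find_target is hypothetical.

```python
# Candidate NPs in the order the walkthrough tries them, each paired with a
# hand-assigned judgment of whether "X was held for three weeks" makes sense.
# The True/False values are this sketch's assumptions, not parser output.
CANDIDATES = [
    ("it (dummy subject)", False),
    ("the American extradition request", False),
    ("a fraudster", False),
    ("Mr Bond's identity", False),
    ("Mr Bond", True),
]

def find_target(candidates):
    """Return (first plausible NP, list of NPs rejected before it)."""
    rejected = []
    for phrase, plausible in candidates:
        if plausible:
            return phrase, rejected
        rejected.append(phrase)
    return None, rejected

target, false_leads = find_target(CANDIDATES)
print("target:", target)
print("false leads:", len(false_leads))
```

Every entry in the rejected list is a moment of wasted effort for the reader, which is the whole of the complaint.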

It is true that if we had looked back at the previous sentence instead of plowing on we would have noticed that there was an indefinite NP, an elderly British citizen called Derek Bond, which was a prime candidate. If we had happened to be still holding onto that, and we had tried plugging in a definite version of that ("the aforesaid elderly British citizen called Derek Bond"), it would have worked like a charm. But that NP was embedded in a larger one (the case of an elderly British citizen called Derek Bond), and following it we had read four other NPs (gunpoint; February 2003; holiday; and South Africa). Any syntactic salience that NP might have had was lost before we began the next sentence.

Hearers and readers can't be expected to hold onto every NP they run across, keeping all of them live and active in short-term syntactic memory just in case perhaps one of them might be suddenly needed to make a subjectless clause adjunct interpretable. That's not how we work, or so it seems to me. Mostly we expect the sentences we encounter to be parsable independently: take any one of them on its own and you should be able to understand it down to the level where all that remains is assigning antecedents to pronouns and filling in gaps due to ellipsis. And that second sentence does not meet the condition. We had to fumble around and look all over the place to find a target of predication for the subjectless clause in the initial PP.

That's a shortcoming on the part of the writer. Not a disastrous blunder or a major display of ignorance; just a minor discourtesy to the reader. That's what I think danglers are.

But they are extraordinarily common, and they occur now and then even in what is in general terms excellent writing. The more sensitive to syntax you are, the more you will be struck by them and incommoded by them. The more you exercise your common sense rather than your syntactic sense when figuring out what a subjectless non-finite clause adjunct must mean, the less you will notice them. But they will be out there, in everything you read (somewhat less frequently in conversation because of its lower syntactic complexity — we don't use non-finite clause adjuncts so much when chatting about who's going to pick up the milk).

Just for fun (but not out of a lack of courtesy) I embedded a deliberate dangler in the paragraphs above. Now you will know how careful a reader you are. If you didn't notice it, that underlines my point that you probably do not operate by a set of syntactical rules that forbid danglers. And if you did notice it, and experienced that odd extra second of squirming around looking for a target of predication, then you'll know what I've been talking about.

When did the Supreme Court make us an 'is'?

Wed, 10/07/2009 - 10:45am

In my recent post "The United States as a subject", I discussed the often-repeated story that the American Civil War turned "the United States are" into "the United States is", and observed that "no one seems ever to have checked, at least not very thoroughly". It's a good thing that I said "seems", since Minor Myers has gently pointed me to his article "Supreme Court Usage and the Making of an 'Is'", 11 Green Bag 2d 457, August 2008, in which he checks this very point, very carefully, in opinions of the United States Supreme Court from 1790 to 1919.

And the answer? In the case of U.S. Supreme Court opinions, we apparently became an 'is' somewhat gradually, between 1840 and 1910. And the effect of the Civil War (or at least its immediate aftermath) was apparently to retard the change, not to accelerate it.

After citing the Shelby Foote "It made us an 'is'" quote that I also gave, Myers adds some evidence of the ubiquity of this view. Thus he quotes James McPherson, Battle cry of freedom: The Civil War era, 1988:

Before 1861 the two words ‘United States’ were rendered as a plural noun: ‘the United States are a republic.’ The war marked a transition of the United States to a singular noun.

And also William Michael Treanor, "Taking Text Too Seriously: Modern Textualism, Original Meaning, and the Case of Amar's Bill of Rights", Michigan Law Review, vol. 106, 487-544 (Dec. 2007):

‘United States’ was often matched with a plural verb in 1787 and consistently matched with a singular verb after the Civil War.

In order to evaluate these claims in the case of Supreme Court opinions, Myers used the following method:

For each decade in the survey period, I ran word searches for “United States is” and “United States are” through the Westlaw Supreme Court database. To eliminate false positives, I reviewed the search results to identify opinions where (1) “United States” was a subject and (2) the associated verb was “is” (or “are,” depending on the search). To isolate only usage choices made by the author, anything appearing only in a quotation from a statute, a court rule, or another case was ignored, as was anything in West headnotes. Each opinion in a particular case was treated as a separate work, and thus a case could have more than one entry if more than one justice wrote or if a justice used both “is” and “are” in the same opinion. I collected data on usage in the opinions of justices, the arguments of counsel before the court, and supplementary material prepared by the reporter of decisions (e.g., a syllabus). Except where noted, the focus of the presentation here is on usage in opinions of the justices; data on usage in other portions of the case reports appear in the Appendix.

Here are his basic results in graphical form:

His conclusion:

The Civil War does not appear to have altered the Supreme Court’s usage in a fashion as dramatic as Foote and McPherson have suggested. In the 1860s, the usage pattern shifts away from “are” and toward “is,” and it is during that decade that usage of “is” first predominates. But the change is not wholesale – “are” and “is” were used roughly equally in the 1860s. In the following decade, Court usage reverted back to antebellum patterns. For the remainder of the nineteenth century, plural usage predominated in Supreme Court opinions, though by slowly declining margins.

Usage was quite clearly unsettled in the latter part of the nineteenth century. One of the most striking demonstrations of this is Justice Samuel F. Miller’s majority opinion in United States v. Lee. Justice Miller managed to compose a sentence with both usages:

“[T]he doctrine [of sovereign immunity], if not absolutely limited to cases in which the United States are made defendants by name, is not permitted to interfere with the judicial enforcement of the established rights of plaintiffs when the United States is not a defendant or a necessary party to the suit.”

He observes that some of the obvious theories about sources of variation don't pan out, at least in this data set:

Geography does not help explain this pattern. Looking at the geographic latitudes of the justices’ residences prior to appointment, there is no meaningful difference between the mean latitude for the exclusive “are” users and the mean for those who dabbled in “is.”

Politics doesn't seem to help either, at least in the obvious way:

To see whether the Civil War might have influenced usage in a different way, I isolated the usage by justices who were appointed by President Abraham Lincoln. In fact, during the period when at least one justice appointed by Lincoln was on the Court, the five Lincoln-appointed justices used “are” slightly more frequently than did the other justices.

Here's his appendix, giving the counts in different sorts of SCOTUS material:

It would be interesting to look at some other features as well: number agreement with verbs other than is/are (which would help to increase the rather small counts from this source); the distribution of pronouns co-referential with the United States (19th-century newspapers give us examples of they, it, she, and we); what fraction of "United States" instances are subjects as opposed to modifiers or PP complements or whatever; how usage is affected by the topic, e.g. relations of the federal government to foreign governments, to the states separately, to individual citizens or companies, etc.

West's materials are definitely not accessible for automatic processing of such questions; but most if not all of the same documents are available on the web, I think, so this might be a good testing ground for the idea of automatic or semi-automatic analysis of this type.
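The first, fully automatic pass of such an analysis might look something like the following Python sketch, which tallies raw string matches for "United States is" and "United States are" by decade. The records in DOCS are invented stand-ins, not excerpts from actual opinions, and the function name usage_by_decade is hypothetical. Note what the sketch leaves out: it does not check that "United States" is actually the subject, and it does not exclude quoted statutes or headnotes, which are the steps Myers handled by reviewing the search results himself.

```python
import re
from collections import Counter

# Hypothetical records: (year, text). The sentences are invented stand-ins,
# not excerpts from actual opinions.
DOCS = [
    (1845, "The United States are bound by the treaty."),
    (1868, "The United States is a party here. The United States are not liable."),
    (1905, "The United States is entitled to recover."),
]

# Raw string match only: no check that "United States" is the subject.
PATTERN = re.compile(r"\bUnited States (is|are)\b")

def usage_by_decade(docs):
    """Return a Counter keyed by (decade, verb) over all pattern matches."""
    counts = Counter()
    for year, text in docs:
        decade = year // 10 * 10
        for verb in PATTERN.findall(text):
            counts[(decade, verb)] += 1
    return counts

counts = usage_by_decade(DOCS)
for decade in sorted({d for d, _ in counts}):
    print(decade, "is:", counts[(decade, "is")], "are:", counts[(decade, "are")])
```

The semi-automatic part would then be a human pass over the matches, or a parser good enough to pick out subjects and skip quotations.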

A "semantic" difference

Wed, 10/07/2009 - 9:59am

From a NYT story (Shaila Dewan, "Pollster's Censure Jolts News Organizations", October 3) on the polling company Strategic Vision, which has been reprimanded by a professional society of pollsters for failing to disclose "essential facts" about its methods:

As for the accusation that the company's claim to be based in Atlanta was misleading, Mr. Johnson [David E. Johnson, the founder and chief executive of Strategic Vision] acknowledged that the main Strategic Vision office was in Blairsville, Ga., 115 miles away, but said the difference was "semantic".

Yeah, yeah, blame it on the words. "Semantic" here means 'only semantic, not substantive' and locates the problem not in differences of matters of fact but in differences in the meanings of linguistic expressions. The claim is that some people use certain expressions (like based in Atlanta) one way, while other people use these expressions somewhat differently, so that any dispute about the state of things is "just / merely / only" a dispute about word meanings.

Now, there's plenty of variation in the meanings people assign to words (and other expressions), and lexicographers, dialectologists, sociolinguists, and theoretical linguists examine this variation all the time. The question is whether SV's use of based in Atlanta is an instance of this sort of variation. As a rule of thumb, you should be suspicious whenever someone who's not professionally involved in the study of semantic variation dismisses some difference as "(just) semantic(s)" or the like; it's likely to be a dodge, or at least a stretching of the truth.

In the case at hand, what's at issue is what counts as being in some location. There's a certain amount of allowable leeway in such things, according to which you can get by saying that your company is located in X when in fact it's in a suburb of X or in a separate jurisdiction within the boundaries of X. So if your company is located in West Hollywood, Santa Monica, or Burbank, it wouldn't be entirely misleading to say (in some contexts) that it's in Los Angeles (though "in the Los Angeles area", or something similar, would be a more scrupulous phrasing).

But even when places are in the same metropolitan area and are close to one another, sometimes few people would accept "located/based in [principal city of the whole area]" as an identification for a company. A company based in Oakland, Berkeley, Palo Alto, or Mountain View (not to mention San Jose or Santa Cruz) can't get away with saying it's "based in San Francisco". "Based in the Bay Area", yes, but not "based in San Francisco". Sometimes, close doesn't count.

Washington DC and Baltimore MD are different locations, even though they're only 34 miles apart; similarly, Boston MA and Providence RI, only 41 miles apart. And then on to New York NY and New Haven CT, 67 miles apart; Chicago IL and Milwaukee WI, 83 miles apart; New York NY and Philadelphia PA, 86 miles apart; Cleveland OH and Erie PA, 92 miles apart. (All under the 115-mile mark.)

You don't have to cross state lines: Philadelphia and Harrisburg PA are 90 miles apart; Columbus and Cincinnati OH, 100 miles apart; Los Angeles and San Diego CA, 111 miles apart. Just a tad over the 115-mile mark are Columbus and Cleveland OH, 124 miles apart. (These lists are not intended to be exhaustive, merely representative; the distances are from the Geobytes City Distance Tool.)

The point is that it can be seriously misleading to say that your company is based at location X when in fact it's based at location Y, even if Y is not far from X or is in its cultural orbit.

Note: "in its cultural orbit". I imagine that David Johnson would like to claim that he can say that his company is based in Atlanta because Atlanta is the largest city near the tiny town of Blairsville (though Knoxville TN is 122 miles from Blairsville). But thinking this way would allow all sorts of mischievous misrepresentation. For instance, a company with its headquarters in Auburn AL could represent itself as "based in Atlanta" (a mere 106 miles away from Auburn).

It might have been useful for Johnson to refer to Atlanta in some way in locating his company. (Who knows about Blairsville, after all?) Or he could have said that the company is located in Union County, at the very northern edge of Georgia. But saying that the company is located in Atlanta just won't do, and trying to deflect criticism of this claim by saying it's all a matter of semantics won't do either.

[Addendum 10/8: Peter Taylor writes: "One of the criticisms levelled at Strategic Vision, LLC is that there is a well-known (and, it is alleged, better-known) polling company called Strategic Vision, Inc. based in San Diego. At the moment your LL post refers simply to "Strategic Vision". You may wish to clarify."]

Variation and second language transcription

Wed, 10/07/2009 - 4:00am

I was trying to keep up with the news on Iran's "secret new nuclear enrichment facility" a couple of weeks ago, as I'm sure many of our readers were also doing. In reading one update in the NYT, I came upon this quotation:

[Vice President Ali Akbar Salehi, head of Iran's nuclear program, said in an interview with ISNA news agency on Sunday] that Iran had taken defensive measures against possible military threats against the facility into consideration. "We are always faced with threats," he said. "We don't think that those threats would necessarily take place but we have prepared ourselves for the worse."

Shouldn't that be "for the worst"?, I found myself asking as I read this. But then I remembered the fact that [t] and [d] are highly likely to be deleted (= unpronounced) in this kind of position (word- and utterance-finally and after another consonant) in many if not most (most if not all?) spoken varieties of English, even when the distinction between e.g. worse and worst is at stake — a somewhat subtle distinction in most contexts anyway, including this one. This deletion has also been found to be even more likely among (some groups of) second language speakers, which we can reasonably assume the translator (and/or the transcriber) to be. [ I've not been able to find the original quote, but given that the ISNA is primarily a Persian-language news agency (with an available English-language version), I assume that this English quotation was not original to Salehi but rather that it is a translation of the Persian original. ]

Quick Google searches for {"prepare for the worst"} and {"prepare for the worse"} reveal both that the variant with worst is almost 10 times more common than the variant with worse (~25M ghits vs. ~2.6M ghits) and what appears to me to be a subtle but not insignificant class distinction of sorts: the worst variant seems to be found in more formal, "corporate" sites (book publishers, magazines, and the like), while the worse variant seems to be found in more informal sites (message boards, blogs, and the like). Overall, though, not too shabby a showing for the worse variant.

Fact-checking George F. Will, one more time

Tue, 10/06/2009 - 11:55am

George F. Will, "An Olympic Ego Trip", WaPo, 10/6/2009:

In the Niagara of words spoken and written about the Obamas' trip to Copenhagen, too few have been devoted to the words they spoke there. Their separate speeches to the International Olympic Committee were so dreadful, and in such a characteristic way, that they might be symptomatic of something that has serious implications for American governance.

Both Obamas gave heartfelt speeches about . . . themselves. Although the working of the committee's mind is murky, it could reasonably have rejected Chicago's bid for the 2016 Games on aesthetic grounds — unless narcissism has suddenly become an Olympic sport.

In the 41 sentences of her remarks, Michelle Obama used some form of the personal pronouns "I" or "me" 44 times. Her husband was, comparatively, a shrinking violet, using those pronouns only 26 times in 48 sentences. Still, 70 times in 89 sentences conveyed the message that somehow their fascinating selves were what made, or should have made, Chicago's case compelling.

The last time George F. Will trotted out his opinion that President Obama is "inordinately fond of the first-person pronoun", I did some counts ("Fact-checking George F. Will", 6/7/2009). As I explained:

…since I'm one of those narrow-minded fundamentalists who believe that statements can be true or false, and that we should care about the difference, I decided to check. …

I took the transcript of Obama's first press conference (from 2/9/2009), and found that he used  'I' 163 times in 7,775 total words, for a rate of 2.10%. He also used 'me' 8 times and 'my' 35 times, for a total first-person singular pronoun count of 206 in 7,775 words, or a rate of 2.65%.

For comparison, I took George W. Bush's first two solo press conferences as president (from 2/22/2001 and 3/29/2001), and found that W used 'I' 239 times in 6,681 total words, for a rate of 3.58%, a rate 71% higher than Obama's rate. President Bush also used 'me' 26 times, 'my' 31 times, and 'myself' 4 times, for a total first-person singular pronoun count of 300 in 6,681 words, or a rate of 4.49% (69% higher than Obama).

For a third data point, I took William J. Clinton's first two solo press conferences as president (from 1/29/1993 and 3/23/1993), and found that he used 'I' 218 times, 'me' 34 times, 'my' 22 times, and 'myself' once, in 6,935 total words. That's a total of 275 first-person singular pronouns, and a rate of 3.14% for 'I' (50% higher than Obama), and 3.97% for first-person singular pronouns overall (50% higher than Obama).

As a result of this previous experience, I had a first-person-counting script all ready to go, and it took only a few seconds to check the new transcripts. This time around, Barack Obama's Olympic remarks included 26 first-person-singular words out of 1130, for a rate of 2.3%. This is slightly below his typical rate for presidential press conferences, and a bit more than half the rate of the George W. Bush pressers that I measured earlier (2.3/4.49 = 51%, to be precise).
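For readers who want to run the same kind of tally, here is a minimal sketch of a first-person-singular counter. It is not the script used for the counts above; the function name `fps_rate`, the tokenization rule, and the decision to count contractions such as "I'm" under "I" are all assumptions made for illustration.

```python
import re

# Forms tallied in the counts above: I, me, my, myself.
# (Assumption: "mine" is left out, since the post does not mention it.)
FPS_FORMS = {"i", "me", "my", "myself"}

def fps_rate(text):
    """Return (hits, total_words, percent) for first-person-singular forms."""
    # Normalize curly apostrophes so that contractions tokenize consistently.
    text = text.replace("\u2019", "'")
    # A word is a run of letters, optionally followed by an apostrophe suffix.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    # Assumption: a contraction such as "I'm" counts as an instance of "I".
    hits = sum(1 for w in words if w.split("'")[0].lower() in FPS_FORMS)
    percent = 100.0 * hits / len(words) if words else 0.0
    return hits, len(words), percent

if __name__ == "__main__":
    sample = "I think my plan is fine, and I'm sure of it myself."
    hits, total, percent = fps_rate(sample)
    print(f"{hits} of {total} words ({percent:.2f}%)")
```

Different choices about tokenization (hyphenated words, numbers, transcript annotations) will shift the word totals a little, so a sketch like this should not be expected to reproduce the exact figures quoted here.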

[Give me some links for presidential remarks at events more comparable to these, and I'll check them out as well — I don't have time to look around this afternoon.]

It's true that Michelle's tally was higher — 45 first-person-singular words out of 781, for a rate of 5.76%.

This is almost as much as the 6.4% first-person-singulars registered by Nancy Reagan's statement on Edward Kennedy's death, or the 7.0% achieved by her remarks at the christening of the USS Ronald Reagan in 2001, or the 10.0% notched by her discussion of the assassination attempt on her husband. [Again, give me pointers to ceremonial remarks by former first ladies on occasions like the Copenhagen meeting, and I'll tally them as well.]

Mr. Will also complains about the

…  egregious cliches sprinkled around by the tin-eared employees in the White House speechwriting shop. The president told the Olympic committee that: "At this defining moment," a moment "when the fate of each nation is inextricably linked to the fate of all nations" in "this ever-shrinking world," he aspires to "forge new partnerships with the nations and the peoples of the world."

Unfortunately, I don't have a program ready to hand for measuring cliche-density, much less cliche egregiosity, but I'll work on it. My prediction: in speeches prepared for ceremonial occasions like this one, the cliche density of presidential rhetoric has been fairly constant for decades if not centuries.

There are two interesting questions here, it seems to me. The first one is why George F. Will is so struck by rates of first-person usage, on the part of Barack and Michelle Obama, that are significantly lower than has been typical of recent presidents and first ladies on similar occasions. The second question is how many pundits and talking heads will follow his brainless lead this time around.  For some attempts to tally the score from the last go-round, you could check out these LL posts:

"Fact-checking George F. Will" (6/7/2009); "Obama's Imperial 'I': spreading the meme" (6/8/2009); "Inaugural pronouns" (6/8/2009); "Another pack member heard from" (6/9/2009); "I again" (7/13/2009); "'I' is a camera" (7/18/2009).

And if you're curious about what inferences, if any, can be drawn from someone's rate of first-person-singular usage, see Jamie Pennebaker's guest post "What is 'I' saying?",  8/9/2009.

[Now that I think of it, there's another significant question here as well. How in the world did our culture  award major-pundit status to someone whose writings are as empirically and spiritually empty as those of George F. Will?]

[Update — I clearly haven't been paying attention to the right pundits. The "Obama is a narcissist" meme has seen a surge among Republican beltway insiders in recent weeks:

Mona Charen, "Obama's Self-Worship", Real Clear Politics, 9/25/2009:

President Obama's speech to the United Nations has been called naive and even "post-American." It was something else, as well: the most extravagant excursion into self-worship we have yet seen in an American leader.

Michael Gerson, "All about Obama", 9/26/2009, Washington Post:

I can recall no other major American speech in which the narcissism of a leader has been quite so pronounced.

David Frum, "Obama's Narcissism", newmajority, 9/26/2009:

Michael Gerson's reading of President Obama's speech to the U.N. is both shrewd and damning.

Marty Peretz, "Rio, 1 — Chicago, 0. The Politics of Narcissism and General McChrystal", TNR, 10/4/2009:

What I suspect is that the president is probably a clinical narcissist. This is not necessarily a bad condition if one maintains for oneself what the psychiatrists call an "optimal margin of illusion," that is, the margin of hope that allows you to work. But what if his narcissism blinds him to the issues and problems in the world and the inveterate foes of the nation that are not susceptible to his charms?

And so on.  So George Will was just adding his pebble to a pot of stone soup that was already on the boil.  I'm not sure whether this makes his column less stupid — because he's chiming in to support one of his cohort's talking points — or more stupid — because the idea, though apparently vacuous, is not even his.]

[Update #2 — in the comments, Sinfonian points us to his tally of FPS pronouns in three pages of George Will's essay "The Cubs and Conservatism": 29 in 853 words, or 3.4%.  Less than George W. Bush's press conference, but more than Obama's Copenhagen speech.]

Safire on Sunday

Tue, 10/06/2009 - 7:56am

That's what I called my own piece on William Safire, which runs today on "Fresh Air" and is online here. I cover some of the same ground that Ben does in his pitch-perfect Times magazine piece, mentioning his generosity to his critics and his willingness to acknowledge his mistakes. A very different tenor from his weekday columns — I think his Sunday readers got the best of him. I also pay tribute to his disinclination to engage in the rhetorical high jinks of other popular grammarians:

He was no snob. You can't imagine him comparing a poet who confused between and among with someone picking his nose at a party, the way John Simon once did. And he wasn't susceptible to the grammatical vapors that affect writers like Lynne Truss — the people who like to describe lapses of grammar as setting their teeth on edge, making their skin crawl, or leaving them gasping for breath, as if they'd spent all their lives up till now closeted with Elizabeth and Darcy in the morning room at Pemberley. 

Above all, there was his ability to convey his pleasure in ruminating on language: "It wasn't just that he loved words — who doesn't? But he really, really liked them."

Other things on Safire worth looking at include Jan Freeman's piece in the Boston Globe (if I had read this before I wrote mine I probably wouldn't have bothered) and Todd Gitlin's in the New Republic, as well as a Newsweek reminiscence by Aaron Britt, who served as Safire's assistant for a while. (The New Republic also posted part of a 1987 review of one of Safire's language books by Louis Menand.) For a more unforgiving take, see David Bromwich's "Wars Made Out Of Words." Feel free to add links to other pieces in the comments.

The United States as a subject

Tue, 10/06/2009 - 7:05am

The widely-watched PBS documentary The Civil War included this commentary by Shelby Foote:

Before the war, it was said "the United States are." Grammatically, it was spoken that way and thought of as a collection of independent states. And after the war, it was always "the United States is," as we say today without being self-conscious at all. And that sums up what the war accomplished. It made us an "is."

Innumerable history lectures have featured similar rhetoric, but as a biologist friend of mine once said about a popular but flamboyantly inventive documentary in his area of specialization, "this is, well, poetically true". In real life, that is, it's false. The Civil War may have "made us an 'is'", but it doesn't seem to have brought about any abrupt change in the grammar of "the United States".

I write "doesn't seem to" because no one seems ever to have checked, at least not very thoroughly. So after a few years of intending to get to it, I've done a bit of poking around. And I've discovered two things. First, we need a change in how historical text archives are managed. (At least, I do.) And second, number-agreement — on whatever time scale it happened — is not at all, in my opinion, the most interesting historical change in the grammatical treatment of "the United States".

The executive summary of these two points: First, web-based search of digital text archives is well and good, but it's also critical for scholars to be able to run arbitrary computer programs over entire historical text corpora. In most cases, there's no provision for distribution of the texts that would make that possible; in some cases, the "business model" for the digitization process may actually prevent it.  Second, and more substantively, there's a striking increase during the 19th century in the propensity of the phrase "the United States" to occur in subject position, reflecting an increase in perceived agency and perhaps even in animacy (i.e. personification). In the early decades of the 19th century, "the United States" hardly ever occurs as a grammatical subject; today, about half of all textual occurrences are in subject position. Much more research will be needed to determine the time course of this change, but in newspaper text, it may have been associated with reporting and editorializing about military and diplomatic activities in the 1840s such as the struggle over Oregon and the Mexican-American War.

Let me start by tracing Shelby Foote's pontification to its historical roots. Basil Lanneau Gildersleeve wrote in Hellas and Hesperia; or, The vitality of Greek studies in America (1909) that

Not that I am ashamed of being a grammarian, and if I chose I might enlarge on the historical importance of grammar in general, and Greek grammar in particular. It was a point of grammatical concord which was at the bottom of the Civil War — "United States are," said one, "United States is," said another; and a whimsical scholar of my acquaintance used to maintain that the ignorance of Greek idiom that brought about the mistranslation "Men and brethren" (Acts ii, 29) is responsible for the humanitarian cry, "Am I not a man and a brother?" which made countless thousands mourn.

Gildersleeve, who fought for the Confederacy, is referring to a popular anti-slavery medallion by Josiah Wedgwood showing a kneeling slave in chains with the inscription "Am I not a man and a brother?". His little joke about United States number agreement was not apparently founded on any textual scholarship, but neither was it original — a similar thought can be found in G H Emerson, "The Making of a Nation", The Universalist Quarterly and General Review, January 1891:

For about a decade the states, under the technical name, "The United States of America," were a Confederacy; but when the Constitution was adopted the United States was. "They" gave place to "it." And as Mr. Fiske in his latest book, "Civil Government in the United States," has noted, the change from the plural to the singular was vital, though it has taken a War of Rebellion to make the difference unmistakable.

And Fiske in turn expressed the thought this way, in his 1891 work Civil Government in the United States Considered with some Reference to its Origins:

From 1776 to 1789 the United States were a confederation; after 1789 it was a federal nation. The passage from plural to singular was accomplished, although it took some people a good while to realize the fact.

All of these pre-Foote versions of the meme assume that the grammatical consequence of this political change was a gradual one, starting with the Constitutional Convention and proceeding through the 19th century. In this picture, the Civil War was one episode in a long argument over interpretation, starting earlier and continuing later; it was not the cause of any abrupt change in grammatical behavior.  Foote's contribution to this area of  meta-linguistic ideology was to invent (or at least popularize) the whole abrupt behavior-change story.

I've taken these citations from Ben Zimmer's discussions in alt.usage.english ("These United States", 10/5/2004), Language Log ("Life in these, uh, this United States", 11/24/2005) and in his Word Routes column at the Visual Thesaurus ("The United States Is… Or Are?", 7/3/2009). As Ben observes, there is at least one limited attempt at genuine textual scholarship on this point, in the form of a newspaper article by John W. Foster, "ARE OR IS?; Whether a Plural or a Singular Verb Goes With the Words United States", NYT, May 4, 1901:

The reason which has largely controlled the use of the plural verb with "United States" is one of euphony. It seems more natural and euphonistic to couple with this phrase "have" or "were," rather than "has" or "was." In public documents, such as the Presidents' messages, I find a number of examples where both the singular and plural forms are used in the same paper, and sometimes in the same sentence. For instance, Secretary Bayard: "The United States have no reason to believe that any discrimination against its citizens is intended." As the writer gets away from the phrase in the plural form, he escapes the euphonistic influence, and recurs to the true significance of the words.

[…]

The result of a somewhat cursory examination of the treatment of "United States" by our public men and official bodies may be found curious, if not decisive of the proper or permissive use of the verb and pronoun in connection with that phrase. It is found that in the earlier days of the Republic the prevailing practice was the use of the plural, but even then many of our public men at times employed the singular. Among statesmen who have used the singular form may be cited Hamilton, Webster, Silas Wright, Benton, Schurz, Edmunds, Depew. Of our Secretaries of State Jefferson, Marcy, Seward, Fish, Evarts, Blaine, Frelinghuysen, Bayard, Gresham, and Olney. Among diplomats Motley, C. F. Adams, E.J. Phelps, and Reid. Of living professors of international law and lawyers Woolsey of Yale, Moore of Columbia, Huffcut of Cornell, and James C. Carter of New York. In the earlier messages of the Presidents the use of the singular verb is seldom found, Jackson's being the only one noted; but in later years Lincoln, Grant, Cleveland, Harrison, and McKinley. Messages of the last three are found in which the singular verb alone is used throughout the message in connection with "United States."

The decisions of the Supreme Court in the earlier years rarely show the use of the singular, but several cases have been found, and in later years its use has been growing much more frequent.

The result of my examination is that, while the earlier practice in referring to the "United States" usually followed the formula of the Constitution, our public men of the highest authority gave their countenance, by occasional use, to the singular verb and pronoun: that since the civil war the tendency has been toward such use; and that to-day among public and professional men it has become the prevailing practice.

For today, I'll close with a few counts and examples from the Pennsylvania Civil War Newspapers archive at Penn State, "A collection of newspapers from the civil war era dated from February 23,1831 to February 14,1877."

I checked the first 50 articles containing the phrase "the United States" in the year 1836, published between January 1 and April 25. These involved roughly 150 tokens of the phrase (I didn't try to count them, but there are typically several in an article where there is one). Of all of these, there was only one in subject position (with plural verb agreement), in a story about the Texas War of Independence:

Volunteers arrive daily; and our marine is in a state to blockade the Mexican ports. The result of the delay in the actual strife with the central government will be a radical separation; and if we may credit rumors, the United States propel to this; we shall see hereafter.

There is one other where "the United States" refers to the frigate rather than to the nation:

The United States, we believe, was built in Philadelphia. [refers to the ship "which has recently undergone a thorough repair at New York"]

The other examples are all attributives (the United States Mint, the United States Senate, the United States Infirmary, the United States Bank, the United States ports), or heads of prepositional phrases (the government of the United States, the president of the United States, the Bank of the United States, trade with the United States, the northern coasts of the United States), or verbal objects, or etc.

Searching similarly in the year 1846, I looked at the first 50 articles containing the phrase "the United States", published between January 1 and June 27. These contained 18 cases where the phrase occurs in subject position. (And singular agreement is almost as common as plural agreement.) FWIW, here are the examples:

It is contended, on the part of Great Britain, that the United States acquired and hold the Spanish title subject to the terms and conditions of the Nootka Sound convention

In the mean time, the United States were proceeding with the discoveries which served to complete and confirm the Spanish American title to the whole of the disputed territory.

Will the United States allow 20,000 of these bitter and irreconcilable foes [the Mormons] to take possession of any portion of the Pacific coast that is now or may hereafter by purchase become ours?

In the discussion on the address in the Chamber of deputies, the United States and Texas have likewise come in for a good deal of observation. […] He observed that it was appeared to him, from the remarks in the President's message, that the U. States were dissatisfied in the Texas affair, …

The United States in annexing Texas had assumed the responsibility that devolved upon Texas antecedent to that event.

The unprejudiced of all parties, we doubt not, will freely admit that the United States have a clear right to the territory on which Gen. Taylor is stationed with his troops, and if so, the charge of the Gazetter, that the President has invaded Mexico, is utterly untrue.

Certain it is, that if Texas had not that right, then the United States had not;

He attempted to show that President Polk had trampled upon the constitution of his country — that Gen. Taylor, by his orders, had invaded Mexico — that his army was posted upon soil which did not belong to Texas, and over which neither the Republic or the United States had even exercised civil jurisdiction.

This newspaper has always maintained that neither England nor the United States is entitled to Oregon, and it seizes this occasion to recommend the French government to insist on the whole territory being declared neutral.

…it is now entirely proper to remind our readers that the United States has for a long series of years in terms mild and conciliatory, been endeavoring to obtain from Mexico a fair and just remuneration for the "injuries and wrongs" sustained by our citizens.

Against Mexico the United States had a black catalogue of robbery, insult and perfidy, anterior to the Texan controversy.

As we said before, we have those in our midst who declare that the United States is in the wrong.

Is he really willing to vote for resolutions recommending a vigorous prosecution of the war, and in the same breath to declare that it is an unjust war, and that the United States is in the wrong?

From what I can collect, I am of opinion that if the United States, at present, were to attempt to conquer Mexico, or even to annex any considerable portion of its territory, they would cause great dissatisfaction in France; …

This calculation is based somewhat upon the idea that the United States will order an expedition from the Missouri river upon the northern provinces.

The United States of America will never recede in the face of Monarchy; they must greet a kindred Republic across the Rio Grande, or advance and entrench themselves upon the ragged steeps and defiles of the Sierra Madre.

In other words, while the treaty of peace and commerce between Mexico and the United States is in full force, the United States, presuming on her strength and prosperity, and on our supposed imbecility and cowardice, attempts to make you the blind instruments of her unholy and mad ambition, and force you to appear as the hateful robbers of our dear homes, and the unprovoked violators of our dearest feelings as men and patriots.

Two swallows don't make much of a summer, but that's all for now.

I've done a bit more research, which I'll cover in a later post, along with an account of the ideas about animacy, agency and subjecthood pioneered by Michael Silverstein ( "Hierarchy of Features and Ergativity", in R.M.W. Dixon (ed.), Grammatical Categories in Australian Languages, 1976), and widely discussed since by linguists (e.g. Judith Aissen, "Markedness and Subject Choice in Optimality Theory", Natural Language and Linguistic Theory, 1999) and psycholinguists (e.g. F. Ferreira, "Choice of Passive Voice is Affected by Verb Type and Animacy", Journal of Memory and Language, 1994; Willem Mak et al. "Animacy in processing relative clauses: The hikers that rocks crush", Journal of Memory and Language, 2006).

If I could get hold of the underlying texts, then rather than painfully reading all this stuff by hand, I could classify examples automatically on a large scale, and make more serious progress much more rapidly on a picture of this phrase's changes in number agreement and subjecthood — and their relationship — over time and space. That's just what I hope to do, if the archivists are kind.
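To give a rough idea of what such automatic classification might look like, here is a minimal sketch. It is an illustration under stated assumptions, not a description of any existing tool: the function name `classify`, the short verb lists, and the list of prepositions are all hypothetical choices. It only inspects the word immediately before and after the phrase, so it misses modals, past-tense verbs, inverted questions, and anything with intervening material.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a serious study would need a parser.
SINGULAR_VERBS = {"is", "was", "has", "does"}
PLURAL_VERBS = {"are", "were", "have", "do"}
PREPOSITIONS = {"of", "in", "with", "to", "for", "from", "by", "against", "between"}

# Optional preceding word, then "the United States" (or "the U. States"),
# then the word that follows.
PATTERN = re.compile(
    r"(?:(\w+)\s+)?\bthe\s+(?:United|U\.)\s+States\s+(\w+)",
    re.IGNORECASE,
)

def classify(sentence):
    """Crudely label the first occurrence of the phrase in a sentence."""
    match = PATTERN.search(sentence)
    if match is None:
        return "no match"
    before = (match.group(1) or "").lower()
    after = match.group(2).lower()
    if before in PREPOSITIONS:
        return "non-subject"   # e.g. "the government of the United States"
    if after in SINGULAR_VERBS:
        return "singular"
    if after in PLURAL_VERBS:
        return "plural"
    return "unclear"           # subject or not, this heuristic cannot tell

if __name__ == "__main__":
    examples = [
        "we have those in our midst who declare that the United States is in the wrong.",
        "In the mean time, the United States were proceeding with the discoveries",
        "The United States in annexing Texas had assumed the responsibility",
    ]
    print(Counter(classify(s) for s in examples))
```

Even a heuristic this crude could be run over a whole archive in seconds, which is the point: the bottleneck is access to the texts, not the counting.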

Further thoughts on the Language Maven

Mon, 10/05/2009 - 8:48pm

In this Sunday's "On Language" column in the New York Times Magazine (already available online here), I take a look back at the legacy of the column's founder, William Safire. As I write there, "Safire's acute awareness of the limits of his own expertise was often lost on fans and critics alike." Indeed, the "language maven" title that he liked to use was intended to be self-deprecating. (Some might say "self-depreciating," but let's not open that can of worms.)

Part of that self-awareness was a willingness to acknowledge his errors in judgment. In that spirit, I follow up the "On Language" tribute with my latest Word Routes column on the Visual Thesaurus, taking a look at one of Safire's early miscues: declaring, in 1979, that could care less was a "vogue phrase" on its way to extinction. Thirty years later, the verdict is: not so much. Fortunately, Safire didn't often confuse his language mavenry with futurology.

Invented facts from the Vicar of St. Bene't's, part 2

Mon, 10/05/2009 - 12:57am

The Reverend Angela Tilby ended her scandalously unresearched little "Thought for the Day" talk of 1 October 2009 (part of which I have already discussed in this recent post) by suggesting that during the British political party conference season (i.e., right about now) we should try taking a blue pencil and editing out all the adjectives from the political speeches so that we could "see what is really being said about people, places, things, deeds and actions". She holds to the ancient nonsense about how nouns tell us the people, places, and things while verbs give us the deeds and actions but adjectives give us nothing but qualifications and hot air and spin — they contribute no content. And she is clearly implying that she (cynically) expects political speeches to be full of adjectives. But as before, she hasn't done any checking at all, she has just spouted her conjectures straight into the microphone. So let's try a second breakfast experiment, shall we?

I examined the first few paragraphs of the transcript of Prime Minister Gordon Brown's speech to the 2009 Labour Party Conference the other day (curiously, he seems to have begun with the coordinator and). Here is the first part of the text, with the adjectives underlined (and again, I am counting them very conservatively, ignoring many items that traditional grammars include under the adjective heading):

And so today, in the midst of events that are transforming our world, we meet united and determined to fight for the future.

Our country confronts the biggest choice for a generation. It's a choice between two parties, yes. But more importantly a choice between two directions for our country.

In the last 18 months we have had to confront the biggest economic choices the world has faced since the 1930s.

It was only a year ago that the world was looking over a precipice and Britain was in danger. I knew that unless I acted decisively and immediately, the recession could descend into a great depression with millions of people's jobs and homes and savings at risk.

And times of great challenge mean choices of great consequence, so let me share with you a little about the choices we are making.

The first choice was this: whether markets left to themselves could sort out the crisis; or whether governments had to act. Our choice was clear; we nationalised Northern Rock and took shares in British banks, and as a result not one British saver has lost a single penny. That was the change we chose. The change that benefits the hard working majority, not the privileged few.

And we faced a second big choice — between letting the recession run its course, or stimulating the economy back to growth. And we made our choice; help for small businesses, targeted tax cuts for millions and advancing our investment in roads, rail and education. That was the change we chose - change that benefits the hard working majority and not just a privileged few.

And then we had a third choice, between accepting unemployment as a price worth paying, or saving jobs. And we in Britain made our choice, it's meant half a million jobs saved. And so, Conference, even in today's recession there are 29 million people in work. 2 million more men and women providing for their families than in 1997.

That's 23 adjectives in 332 words, or 6.9 percent. Decisively less than the scientific paper analyzed earlier, and roughly the frequency one would expect from any ordinary text.

What the Rev. Tilby says is that we should try deleting all the adjectives, which is really absurd (though in fact it is exactly what Alistair Cooke seems to have thought, delusionally, that he used to do to all his radio scripts).

The self-appointed writing gurus who preach in these extreme terms against adjectival modification seem to forget that sometimes adjectives are there because they are crucial not only to the sense but to the structure. Delete the adjectives in this sentence of Brown's and you get a result that doesn't even seem grammatical, and certainly doesn't have anything like the truth conditions of the original. These two are not synonymous:

In the last 18 months we have had to confront the biggest economic choices the world has faced since the 1930s.
In the 18 months we have had to confront the economic choices the world has faced since the 1930s.

Another example, where the result is not even grammatical:

[T]he challenge of change demands nothing less than a new model for our economy, a new model for a more responsible society and a new model for a more accountable politics.
*The challenge of change demands nothing less than a model for our economy, a model for a more society and a model for a more politics.

To make any sense of the claim that such prose could be improved by removing adjectives one would have to propose completely removing all traces of the adjective phrases to which they belong. That would give us the following:

The challenge of change demands nothing less than a model for our economy, a model for a society and a model for a politics.

Why is the Rev. Tilby suggesting that we would understand Brown's proposals better if he couldn't draw the distinction between models and new models, between societies and responsible societies, between politics and accountable politics?

Not that the Prime Minister would have been totally unable to convey his drift, of course. He could in principle have rephrased using only abstract nouns, thus completely avoiding the anti-adjective critique:

[T]he challenge of change demands nothing less than a model for our economy that has novelty, a model (with novelty) for a society that has responsibility to an extent exceeding the responsibility of society as it now exists and a model (with novelty) for a politics with a degree of accountability that exceeds the degree of accountability that politics has today.

Is the Rev. Tilby expecting us to believe that this is an improvement, bringing greater clarity? Has she completely lost her wits? Or did she simply not give any thought to what she was saying?

The notion that you can better see what is being said when the adjectives are removed is simply (yes, I do have to use an adjective here) asinine. Gordon Brown says at one point:

[T]hese are my values — the values I grew up with in an ordinary family in an ordinary town. Like most families on middle and modest incomes we believed in making the most of our talents.

Deleting the adjectives from it yields this:

These are my values — the values I grew up with in a family in a town. Like most families on incomes we believed in making the most of our talents.

What is the point of this ridiculous pretense that it would be a better political world if Brown were blocked from distinguishing ordinary families from unusually affluent ones, not allowed to draw the distinction between having an income and having a median-level income?

Here's why I bothered to write anything at all about a pathetic little 500-word radio sermon. I am so sick of seeing stupid writing advice handed out by pusillanimous pseudo-experts on language: dim-witted vicars like Angela Tilby, pontificating authoritarians like E. B. White in the chapter he added to The Elements of Style, and all the English teachers who have (while hypocritically feeling free to use adjectives constantly in their own writing) poisoned the reputation of adjectives down the centuries (see the first chapter of Ben Yagoda's delightful little book on the parts of speech, When You Catch an Adjective, Kill It).

These people are wasting educational time and effort, and helping to drive students into a state that I have written about before, characterized by "vague unease instead of a sense of mastery," and feeling "less sure of themselves, yet no better informed," so that their writing ability is "probably being harmed rather than enhanced" — in short, a state of nervous cluelessness about language.

Repeating the falsehood that adjectives are bad in general makes people less able to see what is wrong when they really are over-used. For a remark about the real lesson of Dan Brown's over-use of adjectives, with a diagnosis of what is wrong, see my piece "He doesn't trust us" on the New York Magazine site. There's a real point to be made, I think; but it's not about the adjective category per se.

Adjectives are neither good nor bad. The dumb usage pundits who recommend eschewing them totally are handing out advice that is at best exactly what Angela Tilby wrongly claims adjectives are (vapid, empty, and superfluous), and at worst clearly mistaken.

BBC signals crash blossom threat

Sun, 10/04/2009 - 9:35am

Josh Fruhlinger sends along today's entry in the "crash blossom" sweepstakes, a headline from the BBC News website:

SNP signals debate legal threat

Crash blossoms (as we've discussed here and here) are infelicitously worded headlines that cause confusion due to a garden-path effect. Here we begin with SNP, which British readers at least will recognize as the abbreviation for the Scottish National Party. Then comes signals, which can be a plural noun or a singular present verb; following a noun, most readers would expect it to work as a verb. The third word, debate, can be a singular noun or a plural verb, and if you've parsed the first two words as Noun + Verb, then you'll be inclined to take debate as the direct object of the verb. So far, so good. But then comes legal threat. What to do now?

Well, you could go back to the beginning of the headline for a reparsing, now construing signals as a plural noun modified by SNP. That would allow you to continue on with debate as a plural verb and legal threat as the object of the verb. But what in the world are SNP signals and why are they debating a legal threat?

Turns out the first path was moving in the right direction. Signals is indeed the verb here, and the object of the verb is debate legal threat — one of those wonderfully opaque compound nouns that British headlines are prey to. You see, debate legal threat refers to threatened legal action that could be taken if the SNP isn't permitted to take part in televised debates before the next UK election. And the SNP is now signaling that it may follow through on this threat.
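For readers who like to see the garden path laid out mechanically, here is a rough sketch in Python of the two parsing paths described above. Everything in it is an assumption made for illustration: the mini-lexicon, the tag names, and the toy rule that a headline clause has exactly one verb agreeing in number with the word right before it. It is not a real parser, just a way of enumerating the readings.

```python
from itertools import product

# Hypothetical mini-lexicon: the tags each headline word could carry.
LEXICON = {
    "SNP": ["NOUN"],
    "signals": ["NOUN_PL", "VERB_SG"],
    "debate": ["NOUN", "VERB_PL"],
    "legal": ["ADJ"],
    "threat": ["NOUN"],
}

def candidate_taggings(words):
    """Every combination of tags that the lexicon allows for the words."""
    return [tuple(tags) for tags in product(*(LEXICON[w] for w in words))]

def is_clause(tags):
    """Toy filter: exactly one verb, agreeing in number with the word before it."""
    verbs = [i for i, tag in enumerate(tags) if tag.startswith("VERB")]
    if len(verbs) != 1 or verbs[0] == 0:
        return False
    i = verbs[0]
    subject_is_plural = tags[i - 1] == "NOUN_PL"
    return (tags[i] == "VERB_PL") == subject_is_plural

headline = "SNP signals debate legal threat".split()
readings = [tags for tags in candidate_taggings(headline) if is_clause(tags)]

for tags in readings:
    print(" ".join(f"{word}/{tag}" for word, tag in zip(headline, tags)))
```

Under these assumptions two of the four candidate taggings survive: the one with signals as the verb, and the one with debate as the verb. The sketch says nothing about which a reader prefers, or about how hard the compound debate legal threat is to digest once the right path has been chosen.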

We've had fun with such outrageous compounding in previous posts (Geoff Pullum in "Noun noun noun noun noun verb," "Canoe wives and unnatural semantic relations," and "Dentist fear girl," and Mark Liberman in "UK death crash fetish?"). This one's a bit different in that the second element of the compound noun, legal threat, is a noun phrase consisting of an adjective modifying a noun. That makes debate legal threat unusually hard to parse.

Noun-Adjective-Noun compounds are possible in English, of course — think of such constructions as Minnesota Supreme Court, Obama White House, Guardian front page, or Microsoft legal team. In those cases, however, the Adjective-Noun component is a set phrase (Supreme Court, White House, front page, legal team), which makes the addition of a premodifying noun unproblematic. But legal threat is not such a set phrase, and debate is not an immediately obvious choice for an attributive noun ready for grafting (certainly not compared to the proper nouns in my examples: Minnesota, Obama, Guardian, Microsoft). So these factors, plus the ambiguous syntactic role of the preceding word, signals, conspire to make this crash blossom particularly crashy.

Invented facts from the Vicar of St. Bene't's, part 1

Sun, 10/04/2009 - 8:09am

"Thought for the Day" is a four-minute reflective sermon delivered each morning on BBC Radio 4 at about ten to eight by some representative of one of the country's many religious faiths. On the first day of October the speaker was the Reverend Angela Tilby, Vicar of St Bene't's in Cambridge, England. (Bene't is an archaic shortened form of Benedict.) Developing a familiar theme from prescriptivist literature, she preached against adjectives. It was perhaps the most pathetic little piece of inspirational prattle I have ever heard from the BBC (read the whole misbegotten text here).

"Adjectives advertise," claims the Rev. Tilby, and "brighten up the prose of officialdom", but she was always "encouraged to be a bit suspicious" of them when she was a girl: "Rules of syntax kept them firmly in their place" (as if the rules of syntax left everything else to do what it wanted!). This was good, she seems to think, because "For all their flamboyance they don't really tell you much." Adjectives "float free of concrete reality" like balloons, and are guilty of "not delivering anything except, perhaps, hot air." Which aptly describes her babbling thus far. But now, inflated with overconfidence, she risks some factual statements. And steps from the insubstantial froth of metaphor into the stodgy bullshit of unchecked empirical claims about language use.

I shall deal with only one such claim in this post. Another will be dealt with later.

Because adjectives are so airy-fairy, the Rev. Tilby holds, "you don't find many adjectives in scientific prose and when you do they are precise and exact." I'm sure that Language Log readers will realize instantly that it is time for what Mark Liberman calls a breakfast experiment.

Keep in mind, as I undertake the experiment, that in most kinds of English prose about 6% of the words are adjectives (see Douglas Biber et al., Longman Grammar of Spoken and Written English, London: Longman, 2002, p. 506). In academic prose it's a little higher, around 8%.

I turned to the home page of what is arguably the most important general science journal in the world, Nature, picked the second article title from the top of the page ("Cheater resistance is not futile", by Anupama Khare, Lorenzo A. Santorelli, Joan E. Strassmann, David C. Queller, Adam Kuspa, and Gad Shaulsky, doi:10.1038/nature08472; it just looked somewhat more interesting to me than the first one), and did just a little bit of counting.

You'll notice that the last word of the title (futile) happens to be an adjective, so that's 20% in the title. The first word of the opening sentence of the abstract (cooperative) is also an adjective, and so is the second, and so is the 5th (that's over 15% so far). Here is the whole of the abstract, with the adjectives underlined (I've been very conservative, not counting many items that traditional grammars classify as adjectives: articles, demonstratives, numerals, other determinatives, genitive pronouns, or nouns functioning as attributive modifiers):

Cooperative social systems are susceptible to cheating by individuals that reap the benefits of cooperation without incurring the costs. There are various theoretical mechanisms for the repression of cheating and many have been tested experimentally. One possibility that has not been tested rigorously is the evolution of mutations that confer resistance to cheating. Here we show that the presence of a cheater in a population of randomly mutated social amoebae can select for cheater-resistance. Furthermore, we show that this cheater-resistance can be a noble strategy because the resister strain does not necessarily exploit other strains. Thus, the evolution of resisters may be instrumental in preserving cooperative behaviour in the face of cheating.

That's over 9% adjectives. A bad sample? I took the first paragraph of the text and did the same:

Dictyostelium cells propagate as unicellular amoebae in the soil. Upon starvation, they aggregate into multicellular structures and differentiate into viable spores and dead stalk cells. Stalk-cell differentiation supports spore maturation and dispersal, but this altruistic behaviour can be exploited by cheaters that make more than their fair share of spores in chimaeric fruiting bodies. The genetic potential for cheating is high and cheaters abound in nature, but cheating behaviour can be restrained by various mechanisms, such as intrinsic lower fitness of the cheater, pleiotropy of the cheater gene, high genetic relatedness in natural populations, and kin discrimination.

That's 16 adjectives in 97 words of that paragraph, or over 16%. In total, the title and abstract and opening paragraph of the first scientific paper that I picked — genuinely a random choice — are nearly 13% composed of adjectives, well over double the frequency that you find in most prose.
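The arithmetic behind these percentages is trivial, but for anyone who wants to rerun the breakfast experiment on another text, a minimal sketch in Python follows. The function name adjective_share and the BASELINE constant are my own labels, not anything from a library; the counts are the ones stated above, and the hard part (deciding what counts as an adjective) is still done by hand.

```python
def adjective_share(adjectives: int, words: int) -> float:
    """Return adjectives as a percentage of all words in a sample."""
    if words <= 0:
        raise ValueError("sample must contain at least one word")
    return 100.0 * adjectives / words

# Hand counts taken from the text above.
title = adjective_share(1, 5)        # "futile" in a five-word title
paragraph = adjective_share(16, 97)  # first paragraph of the paper

BASELINE = 6.0  # the roughly 6% figure cited above for most prose

print(f"title: {title:.1f}%")
print(f"first paragraph: {paragraph:.1f}%, {paragraph / BASELINE:.1f} times the baseline")
```

The same function works for any sample once the two counts are in hand.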

Now, I could check a few hundred more words, of course. But wait: why me? Why am I doing the work for her? What am I, an unpaid assistant curate of St. Bene't's? Did the Rev. Tilby do even as much elementary checking as I have done so far — glancing at a couple of hundred words in a random paper — before spouting her ridiculous remark? Of course not. Her method is a time-honored one in amateur writing on language: she just makes stuff up. On the basis of nothing but prejudice about science, she invented her data and went straight to the microphone with it.

Her suggestion that in science the adjectives are "precise" is further evidence of uninformed stereotyping. There's nothing precise about the meanings of words like cooperative, social, viable, altruistic, fair, lower, high, natural… These are vague terms, in the classic technical sense: in any situation there will be clear cases for their application, but also a border area where the appropriacy of applying them is in doubt.

There is of course nothing wrong with vague terms, with denotations partly set through common sense and reference to context; we use them literally every minute that we speak or write. Their logic and semantics can be studied with ruthless precision (see, for an example of the technical literature, Stewart Shapiro's lucid and masterful Vagueness in Context). Science is replete with them, and has to be. (Think of global warming, for heaven's sake: there's a truly vague concept. How warm? How global? Yet it's an important one, and serious science is being done every day to flesh it out and give it clearer content.)

It is merely one more sign of the Rev. Tilby's contempt for truth, and cluelessness about science, that she thinks science is all precision. Scientists live their lives floating in a probabilistic soup of uncertainty and unclarity, murky associations and ill-defined tendencies, statistical degrees and extents.

Tilby sees herself as a minister of religion and thus a professional talker; and she therefore assumes (the crucial fallacy) that she is an expert on language; so she doesn't need to check a thing. As a vicar, she thinks she can go into the Radio 4 studio and simply invent her facts.

It is not that she lied; it is worse than that. It is not that Tilby knew what the facts about adjectives in scientific prose were and stated untruths about them to mislead her audience: she simply didn't care whether she was uttering untruths or not. It wasn't lies; it was bullshit, in the sense defined by Harry Frankfurt. And as Frankfurt notes, the purveyor of bullshit is worse than a liar, in virtue of caring less about truth. The liar at least keeps track of what's true and recognizes its special status. (That is precisely why it is a tangled web we weave when first we practice to deceive: the committed liar has to attempt to remain consistent.)

God speed the plow

Sun, 10/04/2009 - 4:06am

A recent xkcd:


In the case of power, the original ordinary-language meaning is still dominant for most people, especially in a frame like "With great __ comes great __". But it's easy to forget how recently words (and concepts) like speed, distance, and duration took on their current "literal" meanings as aspects of ordinary-language physics rather than as terms referring to prosperity, dissension, endurance, and so on.

The physicists' sense of power as "work per unit time" seems to date from the early 19th century, and the specifically electrical sense, featured in this strip's caption, is somewhat later. But it was only a century or two earlier that today's meanings for words like distance came into general use, replacing earlier meanings that (like power) had more to do with personal struggle than with physical interaction.

A deeply flawed character

Sat, 10/03/2009 - 2:03pm

When phrases are coordinated, readers infer that the juxtaposed elements are in some way parallel. Careless coordination produces unwanted inferences. Today's Daily Beast serves up an object lesson:

Stunned colleagues Friday described veteran CBS News producer Joe Halderman—who was arrested outside the network’s West 57th Street offices Thursday in the alleged scheme to blackmail David Letterman—as a rogue and a womanizer, a lover of literature, a “smart frat boy,” a swashbuckling journalist, and an occasional barroom brawler who distinguished himself in dangerous war zones and occasionally displayed a certain reckless streak.

Fucking literature lovers.