
Damn speech synthesizer

Language Log - Wed, 2009-08-12 13:02

It is truly almost beyond belief that the Investor's Business Daily could say in an editorial (which after much ribald mockery on the blogs they have now altered):

People such as scientist Stephen Hawking wouldn't have a chance in the U.K., where the National Health Service would say the life of this brilliant man, because of his physical handicaps, is essentially worthless.

The minor issue for me is the fact that the NHS does not have what the Republicans allege are "death panels" that judge whether an individual's life is worth living. (There is a panel that decides if a drug is too expensive relative to the increase in length and quality of life it provides for — there's a limit to the quantities of public money the NHS will spend on supplying expensive drugs for free when they don't do much good. But that's not about judging individuals' lives.) No, the real kicker is that the journalists at IBD didn't even know that Stephen Hawking (long-time holder of the chair that Isaac Newton once held at the University of Cambridge) is a British physicist, and has lived his whole life in Britain! His motor neurone disease has been constantly and expertly treated under the NHS and he has received constant nursing care (he says, "I wouldn't be here today if it were not for the NHS. I have received a large amount of high-quality treatment without which I would not have survived"). It's a linguistic issue, of course: it's that damn speech synthesizer Hawking uses. The people at IBD have heard it speak his words, but they couldn't tell from its odd and mildly Swedish-flavored enunciation that he is not an American (and that would be the default assumption for anyone brilliant, naturally). Britain's speech scientists need to work on that synthesizer and get it talking more like Prince Charles. It looks like Americans' hopes of a reform of their broken health insurance system are going to depend on such things. If it is left up to IBD, and the sort of people who think Medicare is going to be taken over by the gov'ment, all hopes of reform are doomed.

Stupid canine lexical acquisition claims

Language Log - Wed, 2009-08-12 04:25

Dogs as intelligent as two-year-old children, says a headline in the Daily Telegraph, a newspaper that is marketed to people of a conservative disposition and their dogs. And in case you did not quite understand the headline, they say it again in the subhead: "Dogs are as intelligent as the average two-year-old child, according to research by animal psychologists." It is bylined "By Richard Gray, Science Correspondent". (Science Correspondent! He almost certainly has a Master's degree, possibly in Science!)

Research conducted at Language Log Plaza has shown a somewhat different result. Dogs are not as bright linguistically as a human two-year-old. But what is true is that dogs have the same general intelligence and ability to detect bullshit as the average Science Correspondent for the Daily Telegraph or BBC News.

The details Mr. Gray reports are that "Researchers have found that dogs are capable of understanding up to 250 words and gestures, can count up to five and can perform simple mathematical calculations." And in case you (or your dog) did not quite understand that, it is repeated in more detail for you:

"The average dog is about as bright linguistically as a human two-year-old," said Professor Stanley Coren, a leading expert on canine intelligence at the University of British Columbia in Vancouver who has carried out the work.

"This means they can understand about 165 words, signs and signals. Those in the top 20 per cent were able to understand as many as 250 words and signals, which is about the same as a two and a half year old.

"Obviously we are not going to be able to sit down and have a conversation with a dog, but like a two-year-old, they show that they can understand words and gestures."

The evidence of understanding words comes from experiments in which a border collie was trained to go and fetch a ball when "Ball!" was shouted at it, and so on for other medium-sized fetchables. (Ah, border collies. Long-time Language Log readers will recall that we have been here before.)

And the evidence of mathematical calculations was that a trained border collie can work out the square root of an arbitrary integer written on a chalkboard, to an accuracy of at least three decimal places.

No it wasn't. I said that just to see if you were paying attention. If you're a Daily Telegraph reader you probably believed me. The evidence was from differential gaze experiments: if you drop three doggie treats behind a screen and then surreptitiously remove one or two before shifting the screen, a dog looks a little bit longer at the remaining treat or treats than it does when all three are still there like they should have been. They are capable of noticing, in other words, when slightly weird shit is going down.

If this all satisfies you — if you now think border collies can understand the meaning of lexical items and do mental arithmetic — then Professor Coren has won his game of spoof the public. But it has left me wondering whether I will be reading stories about lexical item acquisition in dogs and other stupid fake pet communication tricks until my dying day, or whether one day we will wake up on a bright new morning and Science Correspondents will have realized that they don't have to just paraphrase the press release put out at the APA convention, they can ask a few penetrating questions about what it means to understand the meaning of a word. (Like, could a dog understand an adverb, such as "surreptitiously"? Why is it always nouns and verbs triggering trained physical actions like fetching? I understand the noun "turd", but if you say it to me I don't run to try and find one.)

[Hat tip: Brian Davies.]

(I have left comments open below, but if you are a dog, please say so. On the Internet, nobody knows these things.)

Transletteration

Language Log - Tue, 2009-08-11 11:44

A friend in Taiwan sent me the following inquiry:

===
From an article in the NYTimes:

"Early Thursday, the attackers sent out a wave of spam under the name Cyxymu, which is a Latin transliteration of the Cyrillic name of the capital of Abkhazia, Sukhumi."

By which is meant that Latin Cyxymu is a "transliteration" of Cyrillic Сухуми (in italics, С у х у м u).

I think that this is an improper use of the word "transliteration" (to refer to "Sukhumi" as a transliteration of Сухуми, however, would be correct), but I don't know what to call this rendering of Cyrillic Сухуми as Latin "Cyxymu".


When I asked my LLogger colleagues their opinions, I received the following suggestions:

Arnold Zwicky:  maybe it can be called "cross-alphabet transfer"

David Beaver:  transletteration

Benjamin Zimmer:  "Volapuk encoding," via The Tensor

And Mark Swofford weighed in with this:  There are probably discussions out there already on how the Russian word for restaurant appears (when written in all caps: "PECTOPAH" — sorry, not typed in real Cyrillic) to those who know the Latin but not Cyrillic alphabet as if it were a word pronounced "pectopah". Of course, that's just coincidence, not any sort of intentional letter play (at least on the part of the Russians).

===
This reminds me of a couple of faux amis that I have encountered in Pinyin Mandarin:

1. A Chinese friend of mine saw a sign near our home that depicted the tracks of a railroad followed by this word XING ("crossing"), which he read as 行  ("go")!

2. When I first went to Beijing in 1981, some naughty female clerks in a curio shop at the Temple of Heaven tried to embarrass me by showing me the word FUXING written on a piece of paper and asking me what it meant in English.  I played dumb and insisted that it was a Chinese word (復興) meaning "rebirth, renaissance, resurgence," etc.

My examples are not the same as the Cyxymu or Volapuk phenomenon, but all of these things seem like faux amis to me.
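The letter-shape substitution behind Cyxymu and PECTOPAH can be sketched in a few lines of Python. This is only an illustration: the lookalike table below is my own assumption, chosen to cover the two examples in this post, and the function name `transletterate` is hypothetical (borrowed from David Beaver's suggestion). It swaps each Cyrillic letter for a Latin letter of roughly similar shape and pays no attention to sound, which is exactly what separates it from transliteration.

```python
# Hypothetical lookalike table: Cyrillic letter -> Latin letter of similar shape.
# Keys are written as escapes so the two alphabets cannot be confused in the source.
LOOKALIKES = str.maketrans({
    "\u0421": "C",  # Cyrillic capital Es
    "\u0443": "y",  # Cyrillic small U
    "\u0445": "x",  # Cyrillic small Ha
    "\u043c": "m",  # Cyrillic small Em
    "\u0438": "u",  # Cyrillic small I (resembles "u" in italic type)
    "\u0420": "P",  # Cyrillic capital Er
    "\u0415": "E",  # Cyrillic capital Ie
    "\u0422": "T",  # Cyrillic capital Te
    "\u041e": "O",  # Cyrillic capital O
    "\u0410": "A",  # Cyrillic capital A
    "\u041d": "H",  # Cyrillic capital En
})


def transletterate(text):
    """Replace Cyrillic letters by shape, leaving anything not in the table alone."""
    return text.translate(LOOKALIKES)


sukhumi = "\u0421\u0443\u0445\u0443\u043c\u0438"
restaurant = "\u0420\u0415\u0421\u0422\u041e\u0420\u0410\u041d"
print(transletterate(sukhumi))     # Cyxymu
print(transletterate(restaurant))  # PECTOPAH
```

A real transliteration table would instead map the same letters by sound (for instance Cyrillic Es to "S"), which is how one gets "Sukhumi".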

Fry's English Delight: So Wrong It's Right

Language Log - Tue, 2009-08-11 05:31

Stephen Fry — British comedian, quiz show host, and public intellectual — has just started a new series of his BBC Radio 4 program on the English language, "Fry's English Delight." In "So Wrong It's Right," Fry "examines how 'wrong' English can become right English." Our old friend the eggcorn makes an appearance about 11 minutes in. Jeremy Butterfield, author of A Damp Squid: The English Language Laid Bare, explains eggcorns to Fry (damp squid is an eggcornization of damp squib, in case you didn't know). Butterfield also talks about spelling changes, like the back-formation of pea(s) from pease, and how lexicographers use corpora to track changes in language (with specific reference to the Oxford English Corpus, the main subject of A Damp Squid).

You can hear the whole thing online, at least for the next week.

And for more of Fry's linguistic musings, see my post, "Fry on the pleasure of language."

(Hat tip, Damien Hall.)

Speech science in social psychology

Language Log - Tue, 2009-08-11 04:34

In response to yesterday's post on "Linguistic analysis in social science", my old Bell Labs colleague Bob Krauss wrote that

There may be more language-related research being done in social psychology than you're aware of.   Attached is a chapter Jen Pardo and I contributed to a book about connections between social psych and other disciplines.

I was glad to see the chapter, which was published a few years ago as Robert M. Krauss and Jennifer S. Pardo, "Speaker Perception and Social Behavior: Bridging Social Psychology and Speech Science", pp. 273-278 in Paul A.M. Van Lange (Ed.), Bridging Social Psychology: Benefits of Transdisciplinary Approaches, 2006. But reading this chapter, and skimming the rest of the book, confirmed my view that at present, there is remarkably little language-based research in the social sciences.

Their (persuasive) abstract:

Language plays a critical role in social life, and the semantic-pragmatic levels of linguistic analysis has become an important research  focus in social psychology. Considerably less attention has been paid to the organized sound system that underlies speech.  We distinguish between speech perception, which studies the processes underlying comprehension of the linguistic content of speech, and speaker perception, which examines effects of variability in speech that is not linguistically significant. Much of the latter deals with phenomena that lie at the heart of social psychology.  We describe two broad research areas that illustrate the insights consideration of the phonological level of speech can contribute to an understanding of social behavior.

Their (inspiring) conclusion:

In this brief essay, we have argued that the sound structure of speech contains information that can contribute importantly to our understanding of social behavior.

Speaker perception studies the way a particular utterance reflects a speaker's identity and internal state, and his/her definition of the situation. The variability that these factors produce can be studied both as a dependent variable and an independent variable. That is to say, we can examine the effects on voice of inductions involving activated identities, internal state or situational definitions; we also can examine how variability in features of voice (either natural or synthetically created) affect listeners' perceptions of the speaker and the semantic content of the utterance.

I hadn't seen their chapter before, but I know most of the works in their bibliography. And in keeping with the theme of my earlier post, I'll predict that as digital audio archives become more and more available, and computational analysis and synthesis of "voice features" becomes more and more accessible, this kind of work should logically increase in prominence.

After all, when Bob and I overlapped at Bell Labs 30-odd years ago, if you wanted (for example) to compare the "activated identities" of politicians as revealed in their speeches or press conferences, you'd need to buy audio tapes from media companies, have them physically shipped to you, and then digitize and analyze them using million-dollar minicomputers. ("In the snow, uphill, both ways.") Today you can download the audio for free over the internet, do acoustic analysis and synthesis on your laptop, and run perception experiments over the net as well.

But despite Bob's 30-odd years of evangelism from positions of well-deserved authority and respect, and despite the fact that various forms of speech-and-language-based research are becoming easier and easier, it's still not very common for social psychologists to do the sort of research that he persuasively recommends.

The Krauss and Pardo chapter is 5 pages in a 489-page book focusing broadly on interdisciplinary applications of social psychology — and I was unable to find any other discussion in the book of research based on linguistic analysis of any sort. ("Conversation analysis" and "discourse analysis" are mentioned once each, in a parenthetical and essentially contentless sort of way.) This sampled proportion (5/489 = 1%) strikes me as a plausible estimate for the field as a whole.

Again, I predict that this is certain to change, as social-science researchers (and social psychologists in particular) respond to their changing environment. But the cultural conservatism of the academy means that it's going to take a while.

Tangled phrases or straight-out lies?

Language Log - Mon, 2009-08-10 21:41

About a week ago, Arthur Laffer said the following on CNN:

I mean, i- i- i- if you like the Post Office and the Department of Motor Vehicles, and you think they're run well, just wait till you see Medicare, Medicaid, and health care, done by the government.

Dylan Matthews at The Treatment ("Now Don't You Let The Government Get A Hold Of My Medicare", 8/4/2009) compared this to an earlier example of conservative pandering to public ignorance:

[Senator John Breaux] was walking through the New Orleans airport, returning home, when an elderly female constituent approached him. "Senator, Senator," she said, plucking emotionally at his sleeve. "Now don't you let the government get a hold of my Medicare." Breaux, ever the charmer, smiled and said reassuringly of this greatest of government entitlement programs, "Oh, no, we won't let the government touch your Medicare."

And Matthews commented, "I don't believe I have to explain what this says about the Republican economic policy elite" — which is a bit confusing, because John Breaux was a Democratic senator.

Ramesh Ponnuru at NRO ("Re: Hands Off Medicare", 8/5/2009) compounded the confusion by making an implausible linguistic argument in Laffer's defense:

I think this is a simple misunderstanding. Laffer seems to me to be saying that Medicare and Medicaid are not run well, and neither will health care in general when the government expands its role in it. "Done by the government," that is, modifies only "health care," not "Medicare, Medicaid, and health care."

This makes no syntactic, semantic, rhetorical, or phonetic sense.

Laffer is using a common rhetorical pattern of the form

If you like X, just wait till you see Y.

This can be used straight ("If you like the single, just wait til you see the video") or ironically ("Government: If you like the problems we cause, just wait 'til you see our solutions!"). But in either case, the author suggests that X is viewed by many as good (or bad, in the ironic case), and predicts that if you're one of those that share that view, then Y will turn out to be even better (or even worse, in the ironic case). This implies that you're not already familiar with Y, or that Y will change in some way that will affect your evaluation.

Laffer's statement is clearly an instance of this rhetorical template being used ironically. When he says "If you like X", for X = "the Post Office and the Department of Motor Vehicles", he's suggesting that many people view the Post Office and the Department of Motor Vehicles as bad — and indeed these are standard objects of right-wing scorn, disdained as bloated, inefficient and unhelpful bureaucracies. And thus in his next clause, "just wait till you see Y", for Y = "Medicare, Medicaid, and health care, done by the government", he's predicting that Y will turn out to be even worse than those icons of awfulness.

But Medicare, Medicaid, and health care already exist, as Laffer's hearers know well. So why is he making a prediction about our future perceptions of their quality? The only sensible construal is that these programs will change their nature when the government takes them over. Which in turn implies the absurd (but apparently widely-held) view that Medicare and Medicaid are not now government programs.

Ponnuru's proposed parse doesn't work phonetically either. If we eliminate punctuation from our transcript of the crucial clause, and annotate pause durations, we get:

just wait till you see [0.194]
Medicare Medicaid and health care [0.285]
done by the government.

In fluent speech, we expect the duration of pauses between words to correlate with the syntactic and semantic importance of the juncture. On that basis, if we place two pauses within a structure like

[just wait till you see
…[ Medicare Medicaid
……[and health care done by the government]]]

we'd expect them to fall after see and Medicaid — which is not what happened.

In contrast, if the structure were

[just wait till you see
…[[ Medicare Medicaid and health care]
…. [done by the government]]]

we'd expect the two pauses to occur after see and health care — which is exactly what happened.
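The comparison between the two bracketings can be made mechanical with a small sketch. The scoring rule here is my own simplifying assumption, not an established model of prosody: the strength of a juncture between two adjacent words is taken to be the number of brackets that open or close between them, and the n strongest junctures are where pauses are predicted. The function names and the nested-list encoding of the two parses are likewise hypothetical.

```python
def leaves_with_paths(tree, path=()):
    """Yield (word, path) for every word in a nested-list bracketing."""
    for i, node in enumerate(tree):
        if isinstance(node, list):
            yield from leaves_with_paths(node, path + (i,))
        else:
            yield node, path + (i,)


def boundary_strengths(tree):
    """For each adjacent word pair, count the brackets crossed between them."""
    leaves = list(leaves_with_paths(tree))
    strengths = []
    for (left, p1), (_, p2) in zip(leaves, leaves[1:]):
        common = 0
        for a, b in zip(p1, p2):
            if a != b:
                break
            common += 1
        # Brackets closed after the left word plus brackets opened before the right one.
        crossed = (len(p1) - 1 - common) + (len(p2) - 1 - common)
        strengths.append((left, crossed))
    return strengths


def predicted_pauses(tree, n=2):
    """Return the words that precede the n strongest junctures."""
    # sorted() is stable, so ties keep their left-to-right order.
    ranked = sorted(boundary_strengths(tree), key=lambda pair: -pair[1])
    return [word for word, _ in ranked[:n]]


# Ponnuru's construal: "done by the government" modifies only "health care".
PONNURU = ["just", "wait", "till", "you", "see",
           ["Medicare", "Medicaid",
            ["and", "health", "care", "done", "by", "the", "government"]]]

# The alternative: "done by the government" modifies the whole coordination.
ALTERNATIVE = ["just", "wait", "till", "you", "see",
               [["Medicare", "Medicaid", "and", "health", "care"],
                ["done", "by", "the", "government"]]]

print(predicted_pauses(PONNURU))      # ['see', 'Medicaid']
print(predicted_pauses(ALTERNATIVE))  # ['see', 'care']
```

Only the second structure puts its two strongest junctures where the measured pauses fall, after "see" and after "health care". Bracket counting is a crude stand-in for syntactic and semantic importance, so treat this as a restatement of the argument rather than independent evidence for it.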

But wait, there's more! The pitch contour is also exactly what we'd expect if [Medicare Medicaid and health care] were a constituent, with [done by the government] as a following modifier, but it makes no sense under Mr. Ponnuru's preferred construal.

In conclusion, I can see three ways to explain this:

(1) Dr. Laffer believes that Medicare and Medicaid are not now government programs, and would be changed for the worse if the government took them over. If this is true, he should educate himself, modify his views, and apologize to those that he may have misled. And CNN should be ashamed of itself for giving air time to an "expert" who is so badly informed about the topic he's asked to comment on.

(2) Many older Americans have a high opinion of Medicare and Medicaid, believe (counterfactually) that these programs are not now run by the government, and worry about the government "taking them over" and ruining them. Dr. Laffer knows that this belief is preposterously at variance with the truth, but chose to pander to it anyhow. If so, no self-respecting news organization should ever give him a platform again, except perhaps to give him the opportunity to apologize for being a lying weasel.

(3) Dr. Laffer became confused, got his clauses tangled up, and by mistake seemed to say something that he didn't believe. If this is true, then he should take the next opportunity to go on national television to clarify his views and apologize to those he may have misled.

Timothy Noah at Slate takes (2) for granted, which leads him to observe that "If there is a hell for libertarian poseurs, Laffer has secured himself a berth in it."

Paul Krugman comes down at about 2.2:

… if he was garbling his words, there was method in his garble. Right now, right-wingers do not, repeat, do not want people to understand that Medicare is the prime example of that dreaded condition, “government-run health care”; because if people understood that, they might think that government-run care is actually pretty good. So we don’t need to worry about what Laffer really meant; what he said was the party line, which is, “don’t let the government get its hands on Medicare.”

My own impression is that the error bars run from about 1.8 to 2.2. There's some more evidence from Dr. Laffer's own mouth: his next statement is "… I mean, the single provider, I think, is a real problem, Judy, …", which suggests that he is either deeply ignorant or shamelessly deceitful, since no single-provider plan is even remotely under consideration.

But mostly I'm shocked (shocked!) that the CNN anchordroid didn't pick up on Laffer's idiocy/dishonesty/gaffe during the broadcast.

[Note, by the way, that Paul Krugman and Nate Silver are not convinced that the DMV and the Post Office should be "unquestioned bywords for 'something bad'", and apparently most Americans agree with them, at least as far as the postal service is concerned.]

The first LOLcat?

Language Log - Mon, 2009-08-10 07:29

From YouRememberThat.com, a 1905 postcard that may be the oldest extant LOLcat:


The source suggests:

Perhaps soon, archeologists will discover an even older LOLcat on the walls of an Egyptian tomb… perhaps a cat with the caption, "I see what you did there!"

But it seems to me that if there are Egyptian (or Sumerian, or Mayan, or Mycenean, or Etruscan, or …) LOLcats, they've probably already been dug up and are now lying on some dusty museum shelf, waiting to be revealed to the world by a scholar who reads Language Log.

[Hat tip: Randy Alexander]

[Update — commenters (below) have informed us that this appeared last December on icanhascheezburger.com, and that this was one of a large number of similar proto-LOLcats produced by Harry Whittier Frees (1879-1953), a sample of which can be found here.]

What is "I" saying?

Language Log - Sun, 2009-08-09 20:47

Over the past couple of months, there's been a surge of media interest in various politicians' pronoun use. For some of the Language Log coverage, with links to articles by George F. Will, Stanley Fish, and Peggy Noonan (among others), see "Fact-checking George F. Will" (6/7/2009);  "Obama's Imperial 'I': spreading the meme" (6/8/2009); "Inaugural pronouns" (6/8/2009); "Another pack member heard from" (6/9/2009); "I again" (7/13/2009); "'I' is a camera" (7/18/2009).

In a comment on one of those posts, Karl Hagen asked:

Other than gut instinct, what's the evidence for assuming that greater use of first-person pronouns actually indicates excessive ego involvement? The absolute rate of first-person pronouns will obviously vary a lot depending on the context, but even controlling for context, is it really the case that those who say I more often are really more ego-involved?

I responded:

The best person to comment on this is Jamie Pennebaker. Pending his contribution, I'll quote relevant observations from a summary page on his web site

Prof. Pennebaker has graciously contributed a guest post on the meaning of "I", which follows.

In the last few months, a number of pundits have been analyzing the language of Barack Obama in an attempt to uncover who he really is.  What is attracting the most attention is his use of first person singular pronouns, or I-words.  As Mark Liberman and many others have noted, surprisingly few people have actually counted Obama’s use of 1st person singular pronouns and even fewer have stopped to think what “I” means.

Before reading any further, it might be best if you took a quick 10-item “I-Exam.”  This is a very brief quiz about who uses 1st person singular pronouns more than others.  I’m serious, go to www.utpsyc.org/itest, take the test, check out your feedback.  Then come back.

************

Welcome back.  You should now have a better sense of the social and psychological meaning of I-words.

A little bit of background might be helpful.  Not surprisingly, first-person singular (FPS) pronouns are used at very high rates in everyday speech.  Across thousands of natural conversations that we have recorded, transcribed, and analyzed, the word “I” is consistently the most frequently used word (averaging 4.73% of all words, compared with 0.56% “me” and 0.69% “my”).

A data set that my students and I have been relying on in the study of Bush, Obama, and others comes from press conferences or press opportunities wherein the person responds to questions posed by the press or, on some occasions, legislators or interested citizens.  In preparing the press conference texts, we strip out prepared remarks that typically occur at the beginning as well as the actual questions.  As presidents, both Obama and Bush have answered questions from the press approximately once per week.

Consistent with Liberman’s analyses of Obama’s and Bush’s inaugural addresses and other important speeches, Obama uses FPS pronouns at much lower rates than Bush. During the first 6 months of their presidencies, FPS pronouns accounted for 4.35% of Bush’s words and 2.88% of Obama’s.  Bush was significantly higher for all FPS words.
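For readers who want to try this kind of count themselves, here is a minimal sketch of how an FPS rate like the percentages above could be computed. It is not the LIWC software cited in the references. The word list, the tokenizer, and the function name `fps_rate` are all assumptions made for illustration, and contractions such as "I'm" are counted by their first element.

```python
import re

# Assumed first-person singular word list; a real dictionary may differ.
FPS_WORDS = {"i", "me", "my", "mine", "myself"}


def fps_rate(text):
    """Return first-person singular pronouns as a percentage of all words."""
    # Tokens are runs of letters, optionally followed by an apostrophe part ("i'm").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.split("'")[0] in FPS_WORDS)
    return 100.0 * hits / len(tokens)


print(fps_rate("I am a hunter, I'm a gun owner."))  # 25.0
```

Run over a full press-conference transcript with the prepared remarks and questions stripped out, a function like this should give figures in the same spirit as those reported here, though the exact numbers will depend on the tokenizer and word list. Note that this simple pattern only recognizes the straight apostrophe, so text with typographic apostrophes would need normalizing first.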

What do I-words mean?

From a psychological perspective, the use of FPS can reflect a number of overlapping processes.

The attention rule. Pronouns can be thought of as markers of attentional focus.  If the speaker is thinking and talking about a friend, expect high rates of third person singular pronouns.  If worried about communists, right wing radio hosts, or university administrators, words such as “they” and “them” will be higher than average.

The word “I” is no different.  If people are self-conscious, their attention flips to themselves briefly but at higher rates than people who are not self-conscious.  For example, people use the word “I” more when completing a questionnaire in front of a mirror than if no mirror is present.  If their attention is drawn to themselves because they are sick, feeling pain, or deeply depressed, they use “I” more.  And, by the same token, if they are deeply immersed in a task, FPS can drop to almost zero.

A common misperception is that I-use is associated with arrogance and dominance.  Studies consistently find the opposite: people higher in the social hierarchy use “I” words less.  The secure boss is surveying her or his kingdom calculating how to get more goodies.  The insecure underlings are trying to control their behaviors so as not to offend the leader.

The ownership rule. Use of FPS can also serve as a territorial marker.  If people want to emphasize their connection with their topic, they may increase their use of I-words.  The use of I-words, then, links their connection of self to their conversational target.  By the same token, people occasionally distance themselves from a target.  Across multiple studies on deceptive communication, the best predictor of lying is a drop in the use of I-words.  For example, if an administrator is asked why large sums of money were spent for office décor, an open speaker might say, “I felt that the funds were being used in the ways the donors had requested.”  The more deceptive might say, “It was felt that the funds…”

The graceful-I versus the sledgehammer-I. Not all I-words are alike. The graceful-I, often associated with the use of hedges, is one where the person is subtly acknowledging multiple perspectives.  Phrases starting with “I think that..”, “I wonder if..”, or “It seems to me that..” are all examples where the person is implicitly or politely making a request or an observation.  “I think it’s cold outside” (as opposed to “It’s cold outside”) is actually saying “I know that there are many views on this matter and I may, in fact, be wrong.  Indeed, when you go outside you might find it a bit warm but I personally felt that it was a bit cold outside. But I don’t want to intrude.”

One can imagine a continuum of I-phrases that range from the ultra-polite graceful-I’s to more reportorial phrases (e.g., “I saw”, “I heard”) to egotistical, controlling sledgehammer I’s.  Sledgehammer-I’s are typically associated with action verbs such as hit, won, stop, or push.  Intuitively, the person who uses graceful-I’s is a person who is trying to withdraw from the action, attempting to be smaller from the listener’s perspective.  The sledgehammer user, on the other hand, is attempting to expand in the psychological environment.

Based on informal counts within natural conversations, the rate of graceful-I’s is quite high whereas the sledgehammer-I’s are low.  Perhaps because of these base rates, we tend to hear and to remember sledgehammer statements more than graceful ones. Indeed, this is why most people’s stereotypes of I-usage are wrong:  they are based on the relatively infrequent sledgehammer-I users.

A good example of the different uses of I-words occurred in the 2004 presidential election. According to a New York Times article at the time, John Kerry’s advisors were working with him to change the way that he spoke.  They felt he used the word “I” too much and that he should use the more inclusive “we” at higher rates.  Without going into detail here, use of “we” in the political world is a reliable marker of being cold, distant, and arrogant.  Our analyses of the speeches, debates, and press conferences up to that time showed Kerry to use I-words at rates far below his opponents (especially Bush) and his We-words far above the others.

Kerry’s handlers did not understand personal pronouns and, in particular, the important distinction between graceful and sledgehammer I-use.  To appreciate the difference in I-use, look at 10 randomly selected uses of I from the third Bush-Kerry presidential debate in 2004:

Kerry:
I have supported or voted for tax cuts over 600 times.
I broke with my party in order to balance the budget
I voted for IRA tax cuts.
When I'm president, I'm sending that back to Congress
I'll make them secure.
I believe it was a failure of presidential leadership
I am a hunter, I'm a gun owner.
I ran one of the largest district attorney's offices in America
I put people behind bars for the rest of their life.
I've broken up organized crime; I know something about prosecuting.
I was hunting in Iowa last year

Bush:
Actually, I made my intentions — made my views clear.
I did think we ought to extend the assault weapons ban
I believe law-abiding citizens ought to be able to own a gun.
I called the attorney general and the U.S. attorneys and said…
And the prosecutions are up by about 68 percent — I believe — is the number.
To me, that's the best way to secure America.
Mitch McConnell had a minimum-wage plan that I supported
But let me talk about what's really important for the worker
I remember a lady in Houston, Texas, told me
I haven't gotten a flu shot, and I don't intend to because I want to make sure those who are most vulnerable get treated.

Although Bush used the word “I” at much higher rates in the debates and other interactions, virtually everyone assumed that Kerry was the big FPS user.  The misperception is attributable to how the two men used their pronouns.  Bush was the graceful user and Kerry the sledgehammer.  Whereas Bush thought, believed, remembered, intended, talked, and called, Kerry was busy voting, breaking, sending, running, hunting, and making people secure. Oh, and he was a gun owner.

At this point, we don’t yet have a trustworthy psychological profile of the sledgehammer-I user.  An educated guess, however, would be that people adopt sledgehammer pronoun use in settings where they are insecure but want to come across as active, powerful, and confident.  The element of insecurity is central to this framework in that the sledgehammer user simultaneously is emphasizing that an important action was taken but, at the same time, I ME MYSELF caused that action.  It is important that you the listeners know how important I ME MYSELF am.

Who is Obama and what can his I’s tell us?

In an interview on NPR’s Weekend Edition on August 8, 2009, Dan Balz and Haynes Johnson were asked about their recent book, The Battle for America 2008: The Story of an Extraordinary Election.  Johnson, looking back over his distinguished political reporting career studying presidents since Eisenhower, noted that Obama is “the single most self-confident of all the presidents” he has ever seen.

Obama’s use of pronouns supports Johnson’s view.  Since his election, Obama has remained consistent in using relatively few I-words compared to other modern U.S. presidents.  His usage is overwhelmingly gentle-I as opposed to sledgehammer-I.  Contrary to pronouncements by various media experts, Obama is neither “inordinately fond” of FPS (George Will, Washington Post, 6/7/2009) nor exhibiting “the full emergence of a note of … imperial possession” (Stanley Fish, NYT, 6/7/2009).  Instead, Obama’s language suggests self-assurance and, at the same time, an emotional distance.

References

(Other relevant papers can be downloaded from my Publications page)

Chung, C.K., & Pennebaker, J.W. (2007). The psychological functions of function words. In K. Fiedler (Ed.), Social communication (pp. 343-359). New York: Psychology Press.
Pennebaker, J.W., Chung, C.K., Ireland, M., Gonzales, A., & Booth, R.J.  (2007).  The development and psychometric properties of LIWC2007. [Software manual]. Austin, TX: LIWC.net
Pennebaker, J.W., & Lay, T.C.  (2002).  Language use and personality during crises:  Analyses of Mayor Rudolph Giuliani’s press conferences.  Journal of Research in Personality, 36, 271-282.
Pennebaker, J.W., Mehl, M.R., & Niederhoffer, K.G.  (2003).  Psychological aspects of natural language use:  Our words, our selves.  Annual Review of Psychology, 54, 547-577.
Tausczik, Y.R., & Pennebaker, J.W. (in press).  The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology.

[Above is a guest post by Prof. James W. Pennebaker.]

Linguistic analysis in social science

Language Log - Sun, 2009-08-09 11:45

It's a strange fact about social scientists that hardly any of them, in recent years, have paid any analytic attention to language, which is the main medium of human social interaction.  At schools of "communication", you'll generally find that neither the curriculum nor the faculty's research publications feature much if any analysis of speech and language. In other disciplines — sociology, social psychology, economics, history — you'll find even less of it. (The main systematic exception, Linguistic Anthropology, deserves a separate discussion — but the conclusion of such a discussion, I believe, would note a steep decline in empirical linguistic analysis. And of course I'm leaving out sociolinguistics, which is healthy enough but largely alienated from the rest of the social sciences.)

There are notable exceptions of several kinds, such as Erving Goffman, Manny Schegloff, or Jamie Pennebaker. But such work emphasizes the paradox, since it shows that we can't blame the effect on a lack of intellectual opportunity.

It's not only in the social sciences where linguistic anemia is evident, of course. Over the past generation, the amount of language-related teaching and research in "language departments" (including departments of English) has declined to an unprecedented level. It's common to find highly-ranked English departments where neither undergraduates nor graduate students are trained in any sort of linguistic analysis at all, except perhaps by accident (see this earlier post for a more specific discussion).

But climate change is coming, in my opinion. And in this case, the driving force is not carbon emissions, but digital technology.

To state the obvious: Traditional mass media are now nearly all digital; new media are documenting (and creating) social interactions at extraordinary scale and depth; more and more historical records are available in digital form.  The digital shadow-universe is a more and more complete proxy for the real one. And in the areas that matter to the social sciences, much of the content of this digital universe exists in the form of digital text and speech.

A future social scientist who wants to use this proxy universe to learn about the real one had therefore better know how to analyze the form and meaning of large digital archives of text and speech. And future social scientists who choose not to do this will work under a significant competitive disadvantage. (Numerical data, video recordings, and various kinds of relationship graphs are of course important too, but without analysis of speech and text, their value is lower.)

The required tools include a good deal of computer science and statistics, but you also need to know what to program and what to model.  As a result, the basic concepts and skills of speech and text analysis are an important part of the future social science tool kit.

There's an increasing amount of research along these lines, mostly by computer scientists and computational linguists, along with a few rogue social scientists like Jamie Pennebaker. We've blogged about quite a few examples over the years. But I suspect that most social scientists don't see most of this stuff, because it appears in conference proceedings and journals that they don't read.

All the same, change is sure to come. I predict that over the next 20 years or so, this work will go mainstream. (I know that 20 years in internet time is a millennium or two, but Academia is culturally conservative to a degree that would turn Pashtun village elders green with envy.)

One symptom (and cause) of corpus-based social science going mainstream is that individual pieces of research will increasingly break out into the old media (or go viral in new media). This happened a few days ago to Peter Sheridan Dodds and Christopher M. Danforth, whose paper "Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents" (Journal of Happiness Studies, published online 7/17/2009) was covered in the New York Times (Benedict Carey, "Does a Nation's Mood Lurk in Its Songs and Blogs?", 8/3/2009).

Here's the paper's abstract:

The importance of quantifying the nature and intensity of emotional states at the level of populations is evident: we would like to know how, when, and why individuals feel as they do if we wish, for example, to better construct public policy, build more successful organizations, and, from a scientific perspective, more fully understand economic and social phenomena. Here, by incorporating direct human assessment of words, we quantify happiness levels on a continuous scale for a diverse set of large-scale texts: song titles and lyrics, weblogs, and State of the Union addresses. Our method is transparent, improvable, capable of rapidly processing Web-scale texts, and moves beyond approaches based on coarse categorization. Among a number of observations, we find that the happiness of song lyrics trends downward from the 1960s to the mid 1990s while remaining stable within genres, and that the happiness of blogs has steadily increased from 2005 to 2009, exhibiting a striking rise and fall with blogger age and distance from the Earth’s equator.

Here's the figure showing the secular trend in song-lyric happiness:

Here's the figure showing the recent trend in emotional valence estimated from aspects of blog posts:

And finally, the effects of age, latitude, and day of the week (phase of the moon is not pictured):

Like most work of this type, the linguistic analysis involved is pretty simple — but it's still more than you'll now find in the collected works of the faculty of the communications schools that I've looked at.

And you could raise various questions about their methods and their conclusions, as always in science (though the work seems basically sound to me). But the nice thing about this kind of research is that all of their data is published — their paper gives the URLs that they got it from. (In fact, they doubtless undertook this study in large part because the basic data is easily available.) And they could easily publish their code as well (though the algorithms seem simple and easy to replicate).
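
To give a sense of how simple such an algorithm can be, here is a minimal Python sketch of one plausible approach: a frequency-weighted average of per-word ratings. The `RATINGS` values below are invented for illustration and are not taken from the paper or from any published word list, and the function name is likewise hypothetical. Treat it as a sketch under those assumptions, not as the authors' code.

```python
import re
from collections import Counter

# Hypothetical word ratings on a 1-9 scale, invented for illustration.
# A real study would load a published list of human ratings instead.
RATINGS = {"love": 8.7, "happy": 8.2, "rain": 5.0, "hate": 2.1, "lonely": 2.2}

def mean_valence(text, ratings=RATINGS):
    # Count every word, then keep only the words that have a rating.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    rated = {w: n for w, n in counts.items() if w in ratings}
    total = sum(rated.values())
    if total == 0:
        return None  # no rated words, so no estimate
    # Average of the ratings, weighted by how often each word occurs.
    return sum(ratings[w] * n for w, n in rated.items()) / total
```

Words without a rating are simply ignored, so the estimate rests on whatever fraction of the text the rating list happens to cover; that coverage is one of the first things worth checking when trying to qualify or extend results of this kind.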

So if you have an idea about how to qualify, modify or extend their findings, go to it!

[I'll note in passing that linguistics was left out of the publicity in this case: thus the NYT article quotes Prof. Pennebaker to the effect that “The new approach that these researchers are taking is part of movement that is really exciting, a cross-pollination of computer science, engineering and psychology. […] And it’s going to change the social sciences; that to me is very clear.”  From Jamie's mouth to God's ear; but let's recognize that this type of work will not reach its full potential unless the researchers involved also understand something about how speech and language work.]

Too much vacuum in his head

Language Log - Sun, 2009-08-09 04:17

That's what Descartes said to Huygens about Pascal. Another Shoebox cartoon, this one by brian, gives the background:

There's some serious history behind this famous but widely misunderstood slogan, coined originally (I think) by Hero of Alexandria. Before Pascal and Descartes (and Boyle and Hobbes and a cast of thousands), there was Hero's footnote to Aristotle's "plenism".

For the background, we turn to the Stanford Encyclopedia of Philosophy's article on Nothingness:

Aristotle denied the void can explain why things move. Movement requires a mover that is pushing or pulling the object. An object in a vacuum is not in contact with anything else. If the object did move, there would be nothing to impede its motion. Therefore, any motion in a vacuum would be at an unlimited speed.

Aristotle's refutation of the void persuaded most commentators for the next 1500 years. There were two limited dissenters to his thesis that vacuums are impossible. The Stoics agreed that terrestrial vacuums are impossible but believed there must be a void surrounding the cosmos. Hero of Alexandria agreed that there are no naturally occurring vacuums but believed that they can be formed artificially. He cites pumps and siphons as evidence that voids can be created. Hero believed that bodies have a natural horror of vacuums and struggle to prevent their formation. You can feel the antipathy by trying to open a bellows that has had its air hole plugged. Try as you might, you cannot separate the sides. However, unlike Aristotle, Hero thought that if you and the bellows were tremendously strong, you could separate the sides and create a vacuum.

Hero's views became more discussed after the Church's anti-Aristotelian condemnation of 1277 which required Christian scholars to allow for the possibility of a vacuum. […]

… scholars writing in the aftermath of the condemnation of 1277 proposed various recipes for creating vacuums. One scheme was to freeze a sphere filled with water. After the water contracted into ice, a vacuum would form at the top.

Aristotelians replied that the sphere would bend at its weakest point. When the vacuists stipulated that the sphere was perfect, the rejoinder was that this would simply prevent the water from turning into ice.

Neither side appears to have tried out the recipe. If either had, then they would have discovered that freezing water expands rather than contracts. […]

Hero was eventually refuted by experiments with barometers conducted by Evangelista Torricelli and Blaise Pascal. Their barometer consisted of a tube partially submerged, upside down in a bowl of mercury. What keeps the mercury suspended in the tube? Is there an unnatural vacuum that causes the surrounding glass to pull the liquid up? Or is there no vacuum at all but rather some rarefied and invisible matter in the “empty space”? Pascal answered that there really was nothing holding up the mercury. The mercury rises and falls due to variations in the weight of the atmosphere. The mercury is being pushed up the tube, not pulled up by anything.

When Pascal offered this explanation to the plenist Descartes, Descartes wrote Christian Huygens that Pascal had too much vacuum in his head. Descartes identified bodies with extension and so had no room for vacuums.

Descartes was not the only 17th-century wit to make this joke. Thus Thomas Pecke, "Upon Marcus", 1659:

Why durst you offer Marcus to aver
Nature abhorr'd a vacuum ? confer
But with your empty skull, then you'll agree
Nature will suffer a vacuitie.

There's a more extended and more effective version of the joke in Roger Boyle, Earl of Orrery, Mr Anthony, A Comedy (which seems to have been a sort of 17th-century Breakfast Club), where Mr. Anthony and his friend Jack Plot gang up on Mr. Pedagog:

Plot: How like you this, Mr. Pedagog , have I not taught your Pupil rarely this Morning?

Anthony:  Prethee let me have my full swinge at him (for he has had his many a dismal time at me:) I say, if thou dost not conform to all the Maxims of Jack Plot, Tom Art , and my own dear self, I will peach thee at such a rate to my Sire, as shall provoke him to uncase thee out of thy Pedagogical Cassock, Condemn to the Flame, Martyrlike all thy Ferula's, Grammars, Dictionaries, Classick Authors, and Common-Place Books; nay, take thy Green Glasses out of thy Spectacles, and leave thee only thy Horn-cases to look through; by which, thou wilt be as able to read Prayers with thy Nose as with thy Eyes.

Plot: Nay, if thou dost not frisk as lustily to a single Kit, whenever thy late Pupil and my present Convert bids thee, as to 24 Violins, I will Convert thy Lictorian Bundles of Birch, which Consul-like thou hast carryed before thee, into Rods for thy own Posteriors, and have no more mercy on thy Hanches, than thou usest to have on my Friend Anthony 's, when he cannot say his Lesson, though he be the greatest Dunce of the two; only his Imbecillity, varnish'd over with a Pythagorean Gravity, passes for profound Knowledge in thy Fathers Shallow Pate; where, if there is a Vacuum in Nature, there it needs must be.

Anthony: By this hand, I long to open it, to try the Experiment.

It's not clear to me whether (or how) Mr. Anthony's author, Roger Boyle,  was related to Robert Boyle, the antagonist of Thomas Hobbes in an important controversy about the existence of vacuums and the proper conduct of scientific investigation. Anyhow, Hobbes hated Descartes, but agreed with him about vacuums.

See Shapin and Schaffer, Leviathan and the Air-Pump, 1985; but also see Noel Malcolm, Aspects of Hobbes (2002), pp. 190-196:

By the time he published De corpore in 1655, Hobbes had actually adopted … a plenist theory, denying the existence of any empty spaces — even at the atomic level — in the universe. Why did he make this change? According to Shapin and Schaffer, the fundamental reasons were political. Hobbes settled on a materialist plenism in order to exclude 'incorporeal substances', such as spirits and the human soul …, because these were the props and devices used by priestcraft to harness people's fears and thereby gain power in the state, subverting lawful authority. The difficulty with this explanation is that his attack on 'incorporeal substance' required only materialism; it did not require plenism too. […]

The real reasons for Hobbes' shift from vacuist to plenist are to be found in experimental physics […] 'It is said that one can see through the empty space […] from which it follows that the action of a light-producing body is being propagated through a vacuum (which I think is impossible).'

Though action at a distance remains unmediated by ethereal vortices, the story continues, and not just in the comics:

Historians of science wonder whether the ether that was loudly pushed out the front door of physics is quietly returning through the back door under the guise of “space”. Quantum field theory provides especially fertile area for such speculation. Particles are created with the help of energy present in “vacuums”. To say that vacuums have energy and energy is convertible into mass, is to deny that vacuums are empty. Many physicists revel in the discovery that vacuums are far from empty.

[Update — I'm very sorry to say that the remark attributed to Descartes may be a myth, or perhaps an exaggeration (in the Stanford Encyclopedia, too, and many other apparently authoritative places!). At least, according to Daniel Garber, Descartes' Metaphysical Physics, p. 142:

All Descartes ever got from Pascal was the promise of a refutation of his preferred explanation. At the end of the Expériences nouvelles, a preliminary outline of a never completed treatise on the vacuum, Pascal promised to respond to the objection "that a matter imperceptible, extraordinary, and unknown to all of the senses fills the space [above the column]," a position formulated with Descartes in mind, no doubt. Descartes seems to have received the work in good humor. Writing to Huygens on 8 December [1647], shortly after having received the Expériences nouvelles, he noted:

It appears to me that the young man who wrote this booklet has the vacuum a bit too much on his mind, and is somewhat hasty. I wish the volume he promises were already available, so that one could see his reasons, which are, if I am not mistaken, insufficiently solid for what he has undertaken to prove. (AT V 653)

It's not at all clear from this translation that D was really making a joke about P having "too much vacuum in his head" — it would be nice to see the original French (or Latin?).]

The "moist" chronicles, continued

Language Log - Sat, 2009-08-08 11:09

People's aversion to the word moist has attracted our attention for a while now (most recently in this post — see also the links in this one). Mark Peters recently wrote about the moist phenomenon for Good, quoting Language Log discussion as well as a Word Routes column I wrote for the Visual Thesaurus. And now Mark's Good column just got noticed by the folks at "Wait, Wait Don't Tell Me!" on NPR — Mark and I were quoted in their "limericks" segment (skip to about 3:00 in):


And just so it's clear that I'm not pinning moist aversion entirely on the "oi" diphthong, here is what I originally wrote:

Why does moist merit a Facebook group of haters, while hoist and joist go unnoticed? It's more than just the sound of the word: the disliked words tend to have some basic level of ickiness… slimy stuff, bodily discharge, or other things that people would prefer not to think about. Icky words include nostril, crud, pus, and pimple. Ointment and goiter share the "oi" sound with moist: there must be something about that diphthong that gets under people's skin.

(Read the rest here.)

This guy is falling

Language Log - Sat, 2009-08-08 04:10

Jem S at Shoebox turns the Purple Haze mondegreen around:

It's been done, but (apparently) not in a cartoon.

[Hat tip: Felix Hayman]

Unspecified large number

Language Log - Fri, 2009-08-07 12:05

Some corrections to and clarifications of my posting on by the hundreds / by hundreds / by the hundred.

First, an apology for having posted about Burchfield's note on by the hundreds (which declares it to be "unidiomatic") without going back and checking the original Fowler and Gowers's edition of Fowler; instead Tim Moon and I just looked at our project files about Fowler and Gowers, and these files were (for some reason) missing coverage of the expression.

Now we know that the trail goes back to Fowler and that the "unidiomatic" label reflects Fowler's taste.

Second, an apology for rushing to post and not adding all the qualifications about searches for the three expressions at issue. Just searching for by the hundreds / by hundreds / by the hundred, looking for uses conveying 'in large number(s)', pulls up a fair amount of irrelevant material: instances of by the hundred 'in lots of 100', examples like "noticed by hundreds of researchers", and so on. In my earlier posting I narrowed the search by adding the verb came, and found all three variants to be attested, but with different frequencies (with the hundreds well in front). I find all three variants acceptable, but prefer the hundreds.

Commenters tried a variety of searches (and expanded the scope to include score and dozen, which don't seem to work quite the same way as hundred and thousand). Mark Liberman noted that hits for the hundreds in books seem to be "mainly recent and/or American" — mainly but by no means exclusively. All three variants are attested for some time back and in British as well as American sources, but it's reasonable to speculate that Fowler's taste reflected British preferences of his time.

A final remark about a misunderstanding that crops up repeatedly in comments on Language Log postings about variation. When I expressed a personal preference for the variant the hundreds, some readers seem to have taken me to be disparaging the other variants, and responded by saying they found one of the other variants (in particular, the hundred) unexceptionable. But I never dissed the other variants, and indeed said quite clearly that I found all three variants acceptable. Different people have different preferences, and many people will use two or all three of the variants on different occasions.

My current guess at what's going on in these responses is that some people are implicitly subscribing to some version of the One Right Way principle, so that if one variant is allowed, other variants are disfavored, or even disallowed. But no scholar of variation or usage holds to such a principle.

The meaning of timing

Language Log - Fri, 2009-08-07 11:18

Today's Cathy:



The interpretation of timing — not within but between communications — hasn't been studied much. There's Wally Chafe's Discourse, Consciousness, Time, but I don't recall much in it about interpreting the length of gaps or silences. There's some of this sort of thing in work on computer-mediated conversation. And there's some work on intercultural differences in conversational "dead air", which I'll try to find. But it's a topic that might repay further investigation, I think, in a culture where more and more interaction is asynchronous.

[Update — The work I was thinking of, on communicative norms in certain North American Indian cultures, is summarized by John Gumperz ("Contextualization and Ideology in Intercultural Communication", in DiLuzio, Günthner and Orletti (Eds.), Culture in Communication) this way:

Conversations are often punctuated with relatively long pauses and silences. In informal gatherings, Indian people may sit or stand quietly, without speaking. If addressed, they may look away and remain silent for a relatively long time (at least from the perspective of mainstream Americans) before responding. When a person is asked a question and she has no new information to provide, nothing new to say, she is likely to give no answer. In all such cases, American Indians themselves interpret the silence as a sign of respect, a positive indication, showing that the other's remarks or questions are being given full consideration that is their due.

He cites work by Philips on the Warm Springs reservation in Oregon, and by Basso on Western Apache.

Thus Susan Philips, "Some sources of cultural variability in the regulation of talk", Language in Society 5(1): 81-95, 1976:

… Indian exchanges proceed at a slower pace than those of Anglos. […] The pauses between two different speakers' turns at talk are frequently longer than is the case in Anglo interactions. There is a tolerance for silences — silences that Anglos often rush into and fill. […]

For Anglos, answers to questions are close to obligatory … That this is not the case with Warm Springs Indians was pointed out to me by an Indian from another reservation who had married into the Warm Springs reservation. He observed wryly that it is often difficult to get an answer out of 'these old people' (and I should add that the phrase 'old people' has the connotation of respect). And he told an anecdote about posing a question that got answered a week after it was asked.

In other words, answers to questions are not obligatory. Absence of an answer merely means the floor is open, or continues to belong to the questioner. This does not mean, however, that the question will not be answered later. Nor does it mean that it ought not to be raised again, since the questioner may reasonably assume his audience has had time to think about it.

This absence of a requirement for immediate response is also apparent in the handling of invitations.

Keith Basso notes the various interpretations that failure to speak can have in mainstream American culture ("'To Give up on Words': Silence in Western Apache Culture", Southwestern Journal of Anthropology, 26(3), 1970):

Although the form of silence is always the same, the function of a specific act of silence — that is, its interpretation by and effect upon other people — will vary according to the social context in which it occurs. For example, if I choose to keep silent in the chambers of a Justice of the Supreme Court, my action is likely to be interpreted as a sign of politeness or respect. On the other hand, if I refrain from speaking to an established friend or colleague, I am apt to be accused of rudeness or harboring a grudge. In one instance, my behavior is judged by others to be "correct" or "fitting"; in the other, it is criticized as being "out of line."

He goes on to list a number of situations where conversation would be nearly obligatory in mainstream American culture, on pain of rudeness or other negative interpretation, but would be optional or even strongly discouraged among the Western Apache.

I've heard descriptions similar to those of Philips and Basso from several American Indianists whose experience and judgment I trust — though there is apparently (and unsurprisingly) cultural variation among different groups, as suggested by Philips' anecdote.]
