
Goo goo goo joob, coo coo ca-choo, boop-oop-a-doop

Language Log - Tue, 2009-09-08 22:11

Last week, in the comments to Mark Liberman's post on the mystifying reggae chant at the beginning of Scotty's "Draw Your Brakes," I asked:

Now that we've looked into "Ma ma se, ma ma sa, ma ma coo sa" and this one, what's the next impenetrable pop lyric/chant we should tackle?

KCinDC promptly responded:

How about "goo goo g'joob"? Is it the same as "coo coo ca-choo"?

Ask and ye shall receive. Just in time for the rollout of the Beatles remasters and the "Beatles: Rock Band" video game, my latest Word Routes column on the Visual Thesaurus takes on "goo goo goo joob" (that's how it appears in the Magical Mystery Tour lyric sheet), "coo coo ca-choo," and, for good measure, "boop-oop-a-doop."

(I'll leave it to Mark to provide the requisite study in syncopation.)

Our love was real!

Language Log - Mon, 2009-09-07 10:50

I'm in Brighton for InterSpeech 2009, but unfortunately duties in Philadelphia made it impossible for me to get here in time to act as a human control in the 2009 Loebner Prize competition, the annual administration of the "Turing Test". As the ISCA Secretariat put it,

We are seeking volunteers to pit themselves against the entries — and prove to the judges just how human they are!

The test involves using a computer interface to chat (type messages) for 5 minutes with a judge, who does the same with the program, not knowing which is which. The judge has to determine which is the true human.


It's no accident that the next-to-last xkcd strip dealt with a version of this problem:


I've looked around, and asked around, but if the chat logs for this year's competition have been posted, I can't find them.

But there's actually an xkcd VK testing site (that's a reference to the Voight-Kampff machine from Do Androids Dream of Electric Sheep? and Blade Runner). For more discussion, see the xkcd blag:

I hope no hearts out there are broken, but it’s important to know these things. Bots can handle thousands of connections at once, so you don’t know who else your internet partner is chatting with. There’s nothing worse than a Turing Test coming back positive for chlamydia.

[Update — Shalom Lappin responded by email:

Sorry I missed you at Interspeech. I was one of the judges for the Loebner prize. The contest was organized locally by Philip Jackson of the University of Sussex, and he might be able to provide you with the transcripts of the interactions.

None of the judges had any difficulty in distinguishing human from non-human interlocutors after the first or second turn in the conversation. The two main features which allowed me to identify a human vs. a non-human agent are (i) capacity for fluent domain general discourse marked by frequent and unpredictable changes in topic, (ii) willingness to allow the judge to take over the conversation, (iii) capacity to handle ellipsis, pronouns, and non-sentential fragments, and (iv) typing errors and corrections in human but not program contributions. The relative absence of progress in developing general purpose conversational agents contrasts sharply with the substantial progress of the past 10-15 years in task driven, domain specific dialogue management systems and other types of NLP.

]

Teen speech in overdrive

Language Log - Mon, 2009-09-07 08:44

Another Zits cartoon on teenspeak:

And no, I can't make out what he is saying, though I could catch a few words.

[Addendum: Dhananjay Jagannathan writes to say that he has decoded what Jeremy is saying as: "I'm going over to Hector's house and I don't know if I'll be back in time for dinner so start without me." ]

Non Sequence of tenses

Language Log - Sun, 2009-09-06 14:31

(Part of) today's Non Sequitur:

I have the impression that this sort of thing ("He can now clearly see that there were going to be a lot more questions") happens a lot with the historical present, but I don't have any other examples at hand.

Google Demotes Literary Stars

Language Log - Sun, 2009-09-06 08:20

My post about Google's metadata problems, along with a similar piece in the Chronicle of Higher Education, got a lot of people talking about the problem in the press and the blogs. (I even ran into an allusion to it in a La Repubblica piece on the Google Book Settlement when I arrived in Rome yesterday morning.) A number of people passed along their own experiences with flaky metadata. Others criticized me on grounds that could be broadly summed up as "Don't look a gift horse in the server," "It's better than nothing," "Who needs metadata anyway?," "Just give them time," and "Why concentrate on trivialities like metadata while ignoring the real perils of corporate monopoly" (as in "serving as a consultant for monitoring the proper temperatures of the pitchforks in hell").

This is all to the good, if it helps move up the metadata issues in Google's queue. I do think this will get a lot better as Google puts its considerable mind to it. But there was one other aspect of the metadata problem which I hadn't noticed or even thought about, but which in its own small way was the unkindest cut of all. It was noticed by the children's book author Ace Bauer, who was prompted by my account of the metadata problems to check his Google Books listing:

Turns out my review rating ranked only one star out of 5. That's dim. But see, the review upon which they based this ranking was Kirkus's. Kirkus loved the book. They gave it a star. One star. That's all they give folks. It's considered a major honor.

Indeed it is, and actually the falling-star glitch affects a number of writers, for example Roy Blount, Jr., the president of the Authors Guild, who has been an enthusiastic backer of the settlement. Google Books assigns a one-out-of-five star rating to at least two of Blount's books, Crackers and First Hubby, on the basis of their starred Kirkus reviews, and visits similar review rating downgrades on books by Guild vice-president Judy Blume and Guild board members Nick Lemann, James Gleick, and Oscar Hijuelos, among others.

I don't know exactly what the Google people will say when they cotton to this one, but it's a good guess the first sentence will begin with "oy."

It's got to be a frustrating if very minor gaffe, particularly given the trouble Google went to to reach an accord with the authors. Of course it isn't hard to see how it could have happened, and it probably won't be that hard to fix, but it underscores the usefulness of having book-savvy people looking over your shoulder when you're setting up your metadata, whether you generate them yourself or get them from a provider. There's a transitivity to cluelessness. If you pass on obviously broken data, whether about starred review rankings or the quarter of a million Portuguese language books all dated 1899, you're apt to look foolish yourself, the same way you do as soon as you put on the dumb t-shirt your grandmother sent you from Atlantic City.

Added 9/7: On reflection, this feels like piling on. I do think Google wants to get this stuff right, and in this particular case, "right" isn't as complicated as it can be elsewhere.

Like shooting feet in a barrel

Language Log - Sat, 2009-09-05 11:01

So Roy Ortega thinks that the Spanish-language media in the U.S. have an obligation to become "more proactive in encouraging [their] audience to seek full fluency in the English language". (Immediate side note: why do people seem to tend to write "the English language" instead of just "English" when making pronouncements like this?)

But let's not get Ortega wrong here. "By no means [is he] a rabid advocate of the English Only or English First movements. [He] certainly [doesn't] support declaring English as the country's official language and [he is] not calling on anyone to forsake their fist [sic] language." He's simply observing that "English is the dominant language of the U.S. and should be spoken by all of its citizens", that "[w]ithout adequate English-speaking skills, few can expect to achieve the highest levels of success in U.S. society", that "far too many adult immigrants and legal foreign residents living in the U.S. have failed to master the English language despite some having lived in this country for decades", and that "[m]any simply don't recognize fluency in English as an important part of their personal development".

The contradiction is outstanding — unless you take Ortega to mean that he is, in fact, an advocate of the English Only and English First movements (just not a "rabid" one) and that he supports English as the official language of the U.S. (he just doesn't support "declaring" this).

And please don't buy the I'm-only-saying-this-for-their-own-good platitude that Ortega is selling here. Actual research on the topic of English language adoption among immigrant populations in the U.S. has repeatedly found what Calvin Veltman is often credited with finding in an article entitled "Modelling the Language Shift Process of Hispanic Immigrants" (International Migration Review, Vol. 22, No. 4, pp. 545-562; published by The Center for Migration Studies of New York, Inc., 1988). Here's the article's abstract; you can read a more detailed abstract via PubMed here.

This article provides a longitudinal interpretation of the 1976 Survey of Income and Education data on the linguistic integration of Hispanic immigrants to the United States. The assumptions required to sustain such an analysis are examined, followed by the presentation of data suggesting that age at time of arrival and length of residence in the U.S. largely explain observed patterns of language shift. The analysis shows that movement to English is extremely rapid, occurring within fifteen years of arrival in the U.S. Further, most of the younger immigrants make English their preferred personal language.

The body of research that has been produced on this topic consistently finds rapid language shift across generations, from monolingual Spanish (or whatever the non-English immigrant language may be) in the first generation, to some level of bilingualism in the second generation, to monolingual English in the third generation — a remarkably stable observation generally referred to as the "three-generation rule", and if anything, the trend has been for this shift to speed up towards becoming a "two-generation rule". Ortega is thus not completely off-base in saying (in more provocative words) that many adult immigrants do not learn English, either because they don't feel the need to or for some other reason; these are just overwhelmingly likely to be first-generation immigrants whose children and grandchildren are speaking more and more English, at the expense of Spanish — an unfortunate breakdown in intergenerational communication that the Spanish-language media could arguably be helping with by encouraging bilingualism rather than language shift, as Ortega would have it.

Ergotopographs

Language Log - Sat, 2009-09-05 06:13

Back in July, the New Scientist's Feedback page reported that

THE powers that be at Guy Robinson's place of work insist that employees tell the office if they're "working from home". Human laziness being what it is - sorry, we meant to say "the employees being committed to maximising productivity in a forward-looking sense" - the welter of emails on Monday mornings got shortened to the three letters "WFH". Then someone was stuck working at an airport and sent the message "WFA".

Then, given the insistence by the virus that is language on mutating whenever possible, the changes poured in and escaped the limitations of the alphabet: "WFT" working on a train, "WF\__" working from a sunlounger (not being smug or anything) and "WF\_O__/" working from a plane (ditto).

Guy's colleagues suggest "WF#" for "working from prison", but they have not needed to use this, yet. Feedback suggests a few others: "WF=====" for working at a linear accelerator and "WF() - -()" for working in a laser lab (with lenses).

The Feedback editor suggested that

Now the phenomenon just needs a sciency name. It has to be "ergotopography", from the Greek words for "work", "place" and "writing".

and called for more examples from readers. A couple of days ago, the results were reported:

Unsurprisingly, we received a number of suggestions that are unprintable - either because they use characters or symbols our printers don't have, or because they would make the magazine illegal in Herat and/or Houston. Perhaps surprisingly, most of the former were also the latter.

Another which may cause the typesetters pause is John Harvey's "WF    " for "working from space". It will also probably stretch your manager's credulity, unless your business cards bear the words "Space Agency".

Among the remainder, we would thank Philip Ritchie for WF(O<-<) for "working from the bath", if it weren't for the office health and safety supremo standing over us, shrieking: "Don't do that!"

Jeremy Bailey claims to be working from a flat-bottomed boat with an outboard motor: WFgl___/, whereas Tom Hasker's boat has an entirely more serious propeller: WF§-\___/, and Mike Forsyth claims to be on a sailboat: WF~~~4~~~.

And, oh, all right then: following enthusiastic lobbying by several mums in the New Scientist office, here is one that Feedback had at first put in the unprintable category. It comes from Belinda Anyos, who suggests that an obstetrician writing from a labour ward might want to send: WF/\(☺)/\.

I don't know Greek well enough to tell whether combining the roots of ergon and topos to mean "place of work" is plausible or not, but at least according to LSJ, no classical author seems to have tried it. Nor does the OED know of any English coinages starting with "ergotopo-".

A more authentic word for workplace might be ἐργᾰστήριον, but ergasteriography is only a little less awkward than ergotopography. And anyhow, πόνος "toil, labor" might be a better choice for the "work" morpheme in this case.

Still, an awkward and perhaps inauthentic neologism is a good fit for a concept whose instances, as the New Scientist's feedback editor admits, "appear, in the words of one contributor's confession, to have been 'bred in captivity'".

In any case, I'm WF here:

Is your size your size?

Language Log - Fri, 2009-09-04 08:19

According to today's Cathy, men now have to worry about this too:

I don't understand spell-checkers

Language Log - Fri, 2009-09-04 07:29

Steffi Lewis asked whether this sentence (which, as she says, is attributed to Chico Marx) is well analyzed: Time flies like an arrow; fruit flies like a banana.

I answered as follows (with apologies to syntacticians for the casual low-class nontechnical description):

In the sensical version of the sentence, "time" is a noun phrase and "flies like an arrow" is a verb phrase (with "like an arrow" an adverbial modifier of the verb "flies"), while "fruit flies" is a noun phrase and "like a banana" is a verb phrase (with "a banana" as the object of the verb "like").  In the nonsensical version of the sentence, you just reverse those two analyses.


The system I was typing the response on uses a spell-checker, which objected to sensical, and I can't really blame it for that, because I sort of made it up, although I got three hits for it when I googled it just now, so (as I already knew) I'm obviously not the only person to make up that word, and besides, I find that there's an obsolete word sensical in the Oxford English Dictionary. Anyway, the spell-checker's complaint about sensical didn't bother me. But it also objected to analyses, and this seems very weird. I assume it wanted analysis instead; but can someone more expert in spell-checking than I am tell me why on earth the spell-checker wouldn't be trained to recognize the plural? What did it expect, analysises? Maybe it did: I just googled that, and got 76,100 hits for it. But at least Google asked if I meant to google analyses instead, and for analyses I got 68,000,000 hits. So if Google knows about analyses, why doesn't the spell-checker? (I am sorry to have to report that the Language Log spell-checker is also objecting to analyses, which it has underlined in red on every occurrence in my draft of this post. The shame!) (It doesn't like analysises either. So I conclude that spell-checkers don't want you to have more than one analysis.)

Atrocious

Language Log - Fri, 2009-09-04 01:19

Linguists around the world right now are packing for a trip to Scotland to attend the 50th Anniversary Golden Jubilee meeting of the Linguistics Association of Great Britain here in Edinburgh (it starts on Sunday). And those listening to the BBC's Radio 4 this Friday morning may have been a little discomfited to hear the weather man, in his official capacity, use the adjective atrocious to describe the weather in Scotland over the past few days. Really! Adjective control is getting lax at Broadcasting House. The word choice should be interpreted, however, in a cultural context. Not to put too fine a point on it, a linguistic context of whingeing, moaning, snivelling, grumbling, and overstatement about the weather that probably goes back to the first settlement by Angles, Saxons, and Jutes. The fact is that no one whose experience has been limited to the British Isles has any idea what would be an appropriate meteorological use of the adjective atrocious.

Barbara and I just got back last night from a train trip to Oban, a small coastal town in the windy, rainy, northwest of Scotland (trains on time almost to the minute). We also took a day trip to the Isle of Mull (ferry system perfectly integrated with the bus times — a wondrous thing for anyone inured to California's hopelessly unintegrated public transportation chaos). Yes, it did rain. Every ten minutes, day and night. The "bright intervals" that feature so heavily in the most optimistic of British weather forecasts were generally only a few minutes at a time. So we wore head-to-toe waterproof gear that we basically took off only when in our hotel room, and the trip was a delight. That judgment is not in any way clouded by overindulgence in the excellent single malt whisky to which Oban lends its name (as it happened, we didn't do the distillery tour and never even sipped the product). The fact is that Scotland at sea level is always fairly temperate — there is nothing in the whole U.K. that could possibly be compared with the kinds of temperatures familiar to residents of the northern USA and Canada.

We walked everywhere, past dark stone walls, the brightly painted harborside houses of Tobermory, dripping green hedgerows, washed ripe blackberries for the picking, publicly accessible (and entirely unsupervised) disused castles covered with moss. We enjoyed spectacular seafood meals (the Waterfront Restaurant on Railway Quay in Oban is truly excellent), and walked home through the rain afterwards. It was wonderful. We were never cold. If this was atrocious, I'd like to experience a lot more atrocity.

Edinburgh is currently about the same: reasonable temperatures (even when Arctic southward air streams set in, Edinburgh gets temperatures no worse than a crisp October day in New England), and continuous rain. Rain on grey volcanic rocks and handsome Georgian architecture and a thousand-year-old castle, past which I will (after donning my waterproof raingear) walk to my office at the university. So don't be afraid to pack a long plastic raincoat and rainproof hat, and come to Edinburgh for the LAGB. You'll love it. It's atrocious.

Semantic fail

Language Log - Thu, 2009-09-03 19:11

Leena Rao at TechCrunch points out a case where semantic search turned into anti-semitic search.

This morning I wrote about NetBase Solutions' healthBase, a semantic search engine that aggregates medical content from millions of authoritative health sites including WebMD, Wikipedia, and PubMed. But is it a semantic engine or an anti-semitic search engine?

Several of our readers tested out the site and found that healthBase’s semantic search engine has some major glitches (see the comments). One of the most unfortunate examples is when you type in a search for “AIDS,” one of the listed causes of the disease is “Jew.” Really.

The ridiculousness continues. When you click on Jew, you can see proper “Treatments” for Jews, “Drugs And Medications” for Jews and “Complications” for Jews. Apparently, “alcohol” and “coarse salt” are treatments to get rid of Jews, as is Dr. Pepper! Who knew?

Apparently this was not the result of amalgamating medical advice from Hamas, but rather a consequence of some artificial stupidity applied to Wikipedia, as a company representative explained:

This is an unfortunate example of homonymy, i.e. words that have different meanings.
The showcase was not configured to distinguish between the disease “AIDS” and the verb “aids” (as in aiding someone). If you click on the result “Jew” you see a sentence from a Wikipedia page about 7th Century history: “Hispano-Visigothic king Egica accuses the Jews of aiding the Muslims, and sentences all Jews to slavery. ” Although Wikipedia contains a lot of great health information it also contains non-health related information (like this one) that is hard to filter out.
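
The failure the representative describes can be illustrated with a toy example. The following is only a sketch under stated assumptions: `naive_stem`, `matches_naive`, and `matches_case_aware` are hypothetical helpers invented here, and nothing is known about how healthBase actually normalizes text. It shows how lowercasing plus crude suffix stripping maps both the acronym "AIDS" and the verb form "aiding" to the same key, and how treating an all-caps query as an acronym avoids that particular collision.

```python
import re

def naive_stem(token: str) -> str:
    """Toy normalizer (hypothetical): lowercase, then strip a trailing 'ing' or 's'."""
    word = token.lower()
    for suffix in ("ing", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text: str) -> list[str]:
    """Split text into alphabetic tokens, keeping original case."""
    return re.findall(r"[A-Za-z]+", text)

def matches_naive(query: str, sentence: str) -> bool:
    """Case-insensitive, stemmed match: conflates 'AIDS' with 'aiding'."""
    key = naive_stem(query)
    return any(naive_stem(tok) == key for tok in tokenize(sentence))

def matches_case_aware(query: str, sentence: str) -> bool:
    """Treat an all-caps query as an acronym that must match exactly."""
    if query.isupper():
        return query in tokenize(sentence)
    return matches_naive(query, sentence)

# The Wikipedia sentence quoted by the company representative.
sentence = ("Hispano-Visigothic king Egica accuses the Jews of aiding "
            "the Muslims, and sentences all Jews to slavery.")

print(matches_naive("AIDS", sentence))       # the naive matcher finds a "hit"
print(matches_case_aware("AIDS", sentence))  # the acronym-aware matcher does not
```

A case check like this handles only the acronym-versus-verb collision; telling health content apart from 7th-century history would need more than token matching.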

And that's not the end of the fun and games:

If you look at the pros of AIDS (yes, it thinks there are pros to having AIDS), it comically lists the “Spanish Civil War.” One of the causes of hemorrhoids is “Bronco” (I don’t even want to know).

It only took a few clicks for me to get here:

Or here:

And vice versa respectively

Language Log - Thu, 2009-09-03 13:24

At some time approximately 30 to 35 years ago — that is, in the 1970s, back when disco had a future — I received a letter from my friend Jim Hurford. We were young lecturers then, me in London and him in Lancaster, though he was later to become Professor of General Linguistics at the University of Edinburgh. Here is what his letter asked me:

"Can you construct a grammatical and meaningful English sentence that ends with the words and vice versa, respectively ?"

Jim is now Professor Emeritus, and I now hold the Chair that he held for so many years, and I still have not succeeded in constructing an example of the mind-twistingly difficult sort he requested.

But Jim made the mistake of asking me many years before the Internet was invented. Today there is the web, and there are blogs, and comments areas, and there is Language Log. I am sure that our ingenious and talented readers — that would be you — will take up the challenge in the comments space below. If any of them should succeed, I will present the sentence to Jim, and take all the credit. (He will not read this post, because he is working busily on a book about the evolution of grammatical structures, and has no time for things like Language Log.) You (if you are the one who succeeds) will have the inward pleasure of knowing that you solved the puzzle. I will have the gratitude and admiration of my friend Jim. And Jim will have the example sentence he has so long sought. Doesn't that sound like a win-win-win scenario to you?

[Afterword: I don't know what reminded me of the puzzle Jim set all those years ago, or why it seemed so hard at the time, or what he thought would be the relevance of the structure. It is of course not a serious surprise that Language Log readers were able to solve it (see below). But it is perhaps a surprise that some of them completed it in less than ten minutes after the post went up on this site! What is particularly wonderful is that (of course, as JS Bangs was the first to point out) today we can simply Google up genuine examples from real texts. I didn't bother because I knew you would. The real point of this post is that thirty years has changed the nature of syntactic explorations beyond imagining. Back in the 1970s when we were young, you could have devoted your whole life optimistically scanning print sources and had only decades of disappointment for your pains. Today we have the trillion-word corpus of the web to search, and the entire investigation takes maybe eight seconds. (Incidentally, I'm deleting the comments that construct sentences that merely quote the phrase. If you allow quotations of ungrammatical bits and pieces to count, then instantly every string becomes grammatical and syntax is pointlessly trivialized. If you don't, then I think it turns out that nearly all word strings are ungrammatical.) —GKP]

Misleading pseudo-scientific argument of the week

Language Log - Thu, 2009-09-03 08:15

According to Abigail Norfleet James, Teaching the Male Brain: How Boys Think, Feel, and Learn in School (2007), p 37:

The shape of the inner ear is not the same for boys and girls. As we have seen in the previous chapter, the female cochlea responds more quickly to sound than does the male cochlea (Don et al., 1993) That means that boys are likely to respond to aural information of questions just a bit slower than girls will. Because boys don't hear soft or high sounds very well and because they don't respond to sounds as rapidly as do girls, boys may have trouble with auditory sources of information.

The reference is to M. Don et al., "Gender differences in cochlear response time: An explanation for gender amplitude differences in the unmasked auditory brain-stem response", J. Acoust. Soc. Am. 94(4): 2135-2148, 1993. And yes, there really are sex differences in cochlear response time — but the distributions for males and females overlap, as usual, and the average sex differences are less than a thousandth of a second.

There are different ways to measure the response, and different frequency bands to check (you can read the paper to survey all the differences in detail), but here's a typical figure from Don et al. showing the sex differences in cochlear latency for two types of measurement in one frequency range:

And another figure showing differences for one type of measurement in different frequency ranges:

Most of the differences are in the range of .0001 to .0003 seconds (1 to 3 ten-thousandths of a second), and none are larger than .0007 sec.

In comparison, simple acoustic reaction time in adults ranges from about 120 to 300 msec. (R.D. Luce, Response times, 1986). For children, mean simple acoustic reaction times range "from 465 msec at age five to 190 msec at 15 years" (K. Andersen et al., "The Development of Simple Acoustic Reaction Time in Normal Children", Developmental Medicine and Child Neurology 26(4), 2008). "Simple acoustic reaction time" is how long it takes to respond to a sound when you know it's coming, and all you need to do is to press a key as soon as you hear it. Choice reaction times (where you need to interpret the stimulus and respond accordingly) are much longer. And the time that it takes even the most attentive and cooperative child to respond to a simple verbal instruction is more like two or three seconds, if only because the instruction itself is likely to take nearly that long to be expressed.

So the average sex difference in cochlear response time of .0002 to .0006 seconds — even if this difference is preserved through the brain stem and the cortex, and translated to the interpretation of the stimulus and the formulation and execution of the response — is roughly a thousandth of children's typical simple acoustic reaction time, and about one part in ten thousand of the time that it takes them to respond to even the simplest verbal instructions.
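
The proportions in the paragraph above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted in this post; the variable names and the helper `ratio_range` are introduced here purely for illustration.

```python
# Figures quoted in the post, all converted to seconds.
cochlear_diff = (0.0002, 0.0006)   # average sex difference in cochlear response time
child_reaction = (0.190, 0.465)    # mean simple acoustic reaction time (age 15, age 5)
verbal_response = (2.0, 3.0)       # time to respond to a simple verbal instruction

def ratio_range(small, large):
    """Return (lowest, highest) possible value of small/large over the two ranges."""
    return (small[0] / large[1], small[1] / large[0])

reaction_ratio = ratio_range(cochlear_diff, child_reaction)
verbal_ratio = ratio_range(cochlear_diff, verbal_response)

print(f"share of simple reaction time: {reaction_ratio[0]:.5f} to {reaction_ratio[1]:.5f}")
print(f"share of verbal response time: {verbal_ratio[0]:.6f} to {verbal_ratio[1]:.6f}")
```

On these inputs the first ratio runs from about 0.0004 to about 0.003, which brackets "a thousandth", and the second from about 0.00007 to 0.0003, which brackets "one part in ten thousand".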

While we're looking at the Don et al. paper, though, let's also reproduce what they found about pure-tone thresholds:

[The subjects in the Don et al. study were 17 females and 14 males aged 18-38.]

Dr. James claims that "Because boys don't hear soft or high sounds very well and because they don't respond to sounds as rapidly as do girls, boys may have trouble with auditory sources of information."

I don't know whether it's really true that boys have more "trouble with auditory sources of information" than girls do. I do know, however, that when Dr. James tries to persuade her readers of this by citing research about sex differences in cochlear response times and audiometric profiles, her argument is at best irrelevant and at worst dishonest.

And this error is not an isolated one. There is a growing popular literature on the biology of human sex differences, and the use of misinterpreted or overinterpreted "scientific" evidence is all too typical of the strand of this work that emphasizes sex differences in order to argue for sex-specific (and often sex-segregated) educational practices. For more on this with respect to sex differences in audiometric thresholds, comfortable listening levels, etc., see here, here, and here.
