Sunday, December 11, 2005

Statistically Improbable Phrases

Recently I've become obsessed with the SIPs feature on Amazon. SIPs are Statistically Improbable Phrases that the Search Inside function has deemed distinctive in a book. Amazon explains, "If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. ... SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!."
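Amazon doesn't say how the calculation is actually done, so what follows is only my guess at the general idea: a minimal Python sketch that counts two-word phrases in one book and ranks them by how much more often they turn up there than in a comparison set of other books. The function names, the `min_count` cutoff, and the add-one smoothing are all my own assumptions, not anything Amazon has described.

```python
from collections import Counter
import re


def bigrams(text):
    """Split text into lowercase words and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def improbable_phrases(book, other_books, min_count=2, top=5):
    """Rank a book's two-word phrases by how over-represented they are
    compared with the other books. A hypothetical scoring rule, not Amazon's."""
    book_counts = Counter(bigrams(book))
    corpus_counts = Counter()
    for other in other_books:
        corpus_counts.update(bigrams(other))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scored = []
    for phrase, n in book_counts.items():
        if n < min_count:
            continue  # ignore phrases the book itself barely uses
        book_rate = n / book_total
        # Add-one smoothing (an assumption) so phrases that never appear
        # in the other books don't cause a division by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scored.append((book_rate / corpus_rate, " ".join(phrase)))

    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top]]
```

Fed one text and a list of others, `improbable_phrases` returns the pairs that are common in the first and rare everywhere else, which is roughly what the quoted definition describes.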

Here is today's weirdest SIP, shitty fish, which came up when I looked at Lydia Millet's Everyone's Pretty. The phrase shows up four times in the novel and only once in any other book scanned by Search Inside.

"... Nature is unappealing at the dinner table. -Fish with a shit. It's a shitfish! -Language Barbara! Language! -Fishy shit and shitty fish, fishy shit and shitty fish, sang Babs, and began to hop on one leg, spilling wine on the carpet. Phillip's ..." (68)

"... -Mind of Christ and shitty fish, mind of Christ and shitty fish, continued Babs singsong, and then stopped hopping long enough to empty her goblet. -Come ..." (69)

The SIP also appears in a translation of Alexander Solzhenitsyn's One Day in the Life of Ivan Denisovich:

"... show them any smaller!" "I tell you what, if they brought that meat to our camp today instead of the shitty fish we get and chucked it in the pot without washing or scraping it, I think we'd . . ." (122)

Amazon explains, "For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements." This isn't a useful claim for many of the SIPs that turn up in the searches I've done (and yes, I've done more than a few, as it's paper-writing season). Without any other context, I'm guessing that 'shitty fish' is an example of foolish echolalia in Millet's book and a marker of desperation in Solzhenitsyn. The proximity of the words 'shitty' and 'fish' is statistically improbable, but its appearance in two very different books doesn't connect the two texts.

I guess I'm most interested when statistical improbabilities appear in multiple texts but have no relevance to one another: delightful examples of the atomization of language. The SIP function takes in complete sentences, paragraphs, chapters, etc. and spits out pairs of statistically improbable words. It atomizes the text, and, far from helping to identify "important plot elements," it reduces the words only to what they are: configurations of letters with no meaning attached.