Computerized Textual Analysis

Apologies for this rough post, but I don’t have much time before I have to catch my shuttle to the airport. I’m in Victoria at the Digital Humanities Summer Institute, taking a course with David Hoover (NYU) called “Out of the Box Text Analysis.” All week, I’ve been trying to work through my own skepticism about whether:

  1. computerized analysis of literary texts merely confirms/denies what we already know;
  2. the results are interesting and valuable enough to justify the tedious work of prepping the texts;
  3. we come up with reasons to justify the results so that they confirm what we already think about an author;
  4. these new, high-tech intellectual exercises serve as a way to justify talking about the same old texts and questions we always talk about;
  5. & so on…

But yesterday, I was able to produce my first results, and it’s amazing how the thrill of generating a meaningful graph can override the ache of skepticism.

Here’s what I did: I wanted to examine images of whiteness in Imagist poetry of the 1910s. I downloaded the PDFs of Imagist anthologies from 1914, 1915, 1916, and 1917, converted them to plain text, and cleaned up most of the errors. Then I combined the anthologies into one text set and ran a word frequency test. Much to my delight, the first word to appear after all the common words like articles (a, an, the), pronouns (I, he, it), prepositions (on, to, of), and “to be” verbs (is, are, was) was… WHITE!
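For anyone curious about what a basic word frequency test involves, here is a minimal Python sketch. It is not the software used in the course. It assumes the anthology text has already been converted to plain text and cleaned. The short stop-word list (`STOP_WORDS`) and the function names are hypothetical choices, built from the categories of common words listed above.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list covering the categories
# mentioned above (articles, pronouns, prepositions, forms of "to be").
# Real text-analysis tools ship much longer lists.
STOP_WORDS = {
    "a", "an", "the",
    "i", "he", "she", "it", "you", "we", "they", "me", "my", "his", "her",
    "on", "to", "of", "in", "at", "with", "from", "into",
    "is", "are", "was", "were", "be", "been",
    "and", "or", "but", "like",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def top_content_words(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Count every token that is not a stop word and return the n most frequent."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The white moon on the white sea; a pale star is white."
    print(top_content_words(sample, 3))
```

To run this on real data, you would read the combined anthology file with something like `Path("anthologies.txt").read_text(encoding="utf-8")` from `pathlib` (the filename is hypothetical) and pass the result to `top_content_words`. The choice of stop words matters a great deal: whatever the list excludes determines which words rise to the top.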

In this case, the computerized text analysis did confirm what I already suspected about Imagist poetry, i.e., that it’s riddled with images of whiteness. But running the various tests (which I’ll spell out step by step when I have more time) offered information that I could otherwise have gleaned only through careful close reading and tedious tabulation, such as:

  1. By inserting dividers (<div></div>) between poets in my anthology texts, I could see that certain poets, such as Richard Aldington, H.D., and Amy Lowell, used “white” a lot (10-15 times each in a small set of poems), while others, such as F. S. Flint and D. H. Lawrence, hardly used the word at all.
  2. By testing for color words more generally, I learned that Imagist poetry is rife with color words, and that terms connoting whiteness, such as “silver,” “pale,” “moon,” “stars,” “ivory,” and “swan,” are the most common, with terms for yellow, including “gold,” “golden,” and “sun,” probably in second place.
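The per-poet and color-word counts above can be sketched in a few lines of Python. This sketch assumes a hypothetical file layout where poets' sections are separated by a `<div></div>` line and each section begins with a line naming the poet. The term set `WHITENESS_TERMS` is illustrative and drawn from the words mentioned above. It is not a complete color vocabulary, and none of this reproduces the course software.

```python
import re
from collections import Counter

# Hypothetical marker placed between poets' sections in the plain-text file.
DIVIDER = "<div></div>"

# Illustrative whiteness terms, taken from the words discussed above.
WHITENESS_TERMS = {"white", "silver", "pale", "moon", "stars", "ivory", "swan"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def split_by_poet(text: str) -> dict[str, str]:
    """Split on the divider; assume the first non-blank line of each chunk names the poet."""
    sections: dict[str, str] = {}
    for chunk in text.split(DIVIDER):
        lines = [ln for ln in chunk.strip().splitlines() if ln.strip()]
        if not lines:
            continue  # skip empty chunks, e.g. a divider at the very start
        poet, body = lines[0].strip(), "\n".join(lines[1:])
        # If a poet appears in several anthologies, merge their sections.
        sections[poet] = sections.get(poet, "") + "\n" + body
    return sections


def term_counts(text: str, terms: set[str]) -> Counter:
    """Count only the tokens that belong to the given term set."""
    return Counter(t for t in tokenize(text) if t in terms)


if __name__ == "__main__":
    corpus = (
        "H.D.\nWhite rocks, white surf.\n"
        "<div></div>\n"
        "F. S. Flint\nThe grey street.\n"
    )
    # Occurrences of "white" per poet (a Counter returns 0 for absent words).
    for poet, body in split_by_poet(corpus).items():
        print(poet, term_counts(body, {"white"})["white"])
    # Whiteness terms across the whole toy corpus.
    print(term_counts(corpus, WHITENESS_TERMS))
```

Raw counts favor poets with more poems in the anthologies, so comparing poets fairly would mean dividing each count by the length of that poet's section.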

Here’s a graph of the most common color words in the Imagist anthologies I tested.

[Figure: graph of the most common color words in the Imagist anthologies]

I then attempted a more complex test of Cyrena Pondrom’s brilliant argument in her article “H.D. and the Origins of Imagism.” In that essay, Pondrom combines traditional close reading with historical and biographical analysis to argue that, although Ezra Pound is typically credited as the founder of Imagism, H.D. actually originated the style. She was writing Imagist-like lyrics well before Pound, and only after he saw her poetry and labeled it Imagiste did he begin adopting the concise, spare style in his own verse.

To test Pondrom’s argument, I ran a word-frequency comparison of a collection of H.D.’s poems and a collection of Pound’s poems against a test set made up of the four Imagist anthologies combined. Each of these anthologies contains poems by H.D., Pound, and about a half dozen other poets.
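One widely used way to measure stylistic distance by word frequency is John Burrows's Delta. You take the most frequent words across all the texts, convert each text's relative frequencies for those words to z-scores, and average the absolute z-score differences between two texts. A smaller Delta means two texts are stylistically closer. What follows is a minimal sketch of that general technique, not necessarily the measure the course software computed. It assumes each text (or each chunk of a text) is already a plain-text string, and `burrows_delta` is a hypothetical helper name.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def relative_freqs(tokens: list[str], vocab: list[str]) -> list[float]:
    """Frequency of each vocabulary word as a share of the text's tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(texts: dict[str, str], n_words: int = 100) -> dict[tuple[str, str], float]:
    """Pairwise Burrows's Delta over the n most frequent words of the whole corpus."""
    tokenized = {name: tokenize(t) for name, t in texts.items()}
    corpus_counts = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [w for w, _ in corpus_counts.most_common(n_words)]
    freqs = {name: relative_freqs(toks, vocab) for name, toks in tokenized.items()}

    # z-score each word's frequency across the texts being compared.
    zscores: dict[str, list[float]] = {name: [] for name in texts}
    for i in range(len(vocab)):
        column = [freqs[name][i] for name in texts]
        mu, sigma = mean(column), pstdev(column)
        for name in texts:
            zscores[name].append((freqs[name][i] - mu) / sigma if sigma else 0.0)

    # Delta for each pair = mean absolute difference of their z-scores.
    names = list(texts)
    result: dict[tuple[str, str], float] = {}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            result[(a, b)] = mean(abs(x - y) for x, y in zip(zscores[a], zscores[b]))
    return result
```

Passing in many chunks per author (for example, each poem or each 1,000-word block as its own entry) rather than three whole collections would give many points per author, as in a scatter plot. For simplicity, this sketch computes the z-scores only over the texts passed in rather than over a separate reference corpus.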

[Figure: word-frequency comparison of H.D. (blue dots), Pound (red dots), and the combined Imagist anthologies (green dots)]

My comparison shows that the Imagist group [green dots] is in fact stylistically closer (as measured by word frequency) to H.D. [blue dots] than to Pound [red dots], which does imply that she may be the original “author” of the style. But perhaps more interesting than this rather loose conclusion is the list of most distinctive words for each poet that my test generated. The top 25 words that most distinguish Pound from H.D. are these:

hath, thee, thou, thy, ye, doth, time, mine, hast, lo, unto, ways, things, oh, ’tis, been, good, lady, glory, thine, art, truth, o’er, soul, seen

Compare that list to the top 25 words that most distinguish H.D. from Pound:

lift, has, cut, could, rocks, feet, across, fire, break, flower, touch, rock, leaf, caught, bright, wild, salt, must, gift, goddess, hurt, wet, beach, race, left
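Lists of "most distinctive words" like the two above are often produced with a Zeta-style test. The sketch below follows Hugh Craig's variant of Burrows's Zeta. It cuts each author's text into equal-sized segments and scores each word by the share of author A's segments that contain it plus the share of author B's segments that lack it, so scores run from 0 to 2. The segment size, the cutoff of 25, and the name `zeta_rank` are assumptions. This illustrates the technique and is not necessarily the measure the course software used.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def segments(tokens: list[str], size: int) -> list[set[str]]:
    """Split tokens into consecutive chunks of `size` tokens (dropping a short final chunk),
    keeping only the set of word types in each chunk."""
    return [set(tokens[i:i + size]) for i in range(0, len(tokens) - size + 1, size)]


def zeta_rank(text_a: str, text_b: str, size: int = 500, top: int = 25) -> list[str]:
    """Return the `top` words that most distinguish text_a from text_b (Craig's Zeta)."""
    seg_a = segments(tokenize(text_a), size)
    seg_b = segments(tokenize(text_b), size)
    if not seg_a or not seg_b:
        raise ValueError("each text needs at least one full segment")
    vocab = set().union(*seg_a, *seg_b)

    def score(word: str) -> float:
        in_a = sum(word in s for s in seg_a) / len(seg_a)          # share of A's segments using it
        absent_b = sum(word not in s for s in seg_b) / len(seg_b)  # share of B's segments lacking it
        return in_a + absent_b

    # Highest scores first; ties broken alphabetically so the output is deterministic.
    return sorted(vocab, key=lambda w: (-score(w), w))[:top]
```

Calling `zeta_rank(pound_text, hd_text)` would rank the words that favor Pound, and swapping the arguments ranks the words that favor H.D. Because Zeta counts whether a word appears in each segment rather than how often, it favors words a poet uses consistently across many poems over words repeated heavily in a single poem.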

What’s so striking and surprising is that Pound’s list of distinctive words is chock full of archaic poetic diction: the “hath,” “thee,” and “thou” that characterize old-fangled English poetry, not the strikingly modern diction of Imagism. H.D.’s most distinctive words, in contrast, are mostly short, concrete nouns and active verbs, the very kinds of language that characterize the Imagist Doctrine, which warns: “go in fear of abstractions.”

This list provides stronger evidence that H.D.’s poetry aligns more closely with Imagism stylistically, lending additional support to Pondrom’s argument.

I may not yet have generated an original argument, but I’ve learned enough to begin to see how computerized textual analysis can complement (rather than substitute for) close reading of poetry, helping me to test, extend, and deepen my findings.
