A Description of Ethan’s Slides

Slide 9: Whose words are in your bag of words?
Topic modeling means you’re dealing with words that have to be online and computerized. It is not always clear who the actual editor is behind the online editions you are dealing with (e.g., publication information is often not provided for Project Gutenberg editions).

Slide 10: Harkness
We’re working with novels, and it seems like it probably wouldn’t matter whether a few words were misspelled in a novel. Because a novel contains so many words, a few mix-ups can feel like a negligible percentage. However, the bibliographical community agrees (this article by Bruce Harkness is one representative) that textual scholarship is just as important for novels as for any other kind of text.

Slide 11/12: Here is the issue at hand
In this diagram we’ve outlined the issue at hand: something happens between the author writing a work and that work ending up in our hands as a book or in an online edition à la Gutenberg. The problem is that there are almost always mix-ups, errors, complications, or other problematic occurrences between the writing and the text we end up holding. This is a problem even when we’re looking at just one edition. When we’re doing this with dozens or hundreds of novels in a topic model like this one, the problem is greatly compounded.

Slide 13: Overwhelmed
Don’t worry! And don’t feel like your whole model will be ruined by this. Even though this can be a huge problem, we’re not sure how important it is to topic modeling, given some of the things topic modeling takes for granted.

Slide 14: Kinds of editions
In this image, G. T. Tanselle shows the kinds of editions that scholars can try to make; each rests on different principles. Some seek to recreate a previous historical document exactly, all errors intact. Others try to create an ideal eclectic text based on what the editors take to be the author’s intention, introducing some changes that may not be present in any existing edition. Others don’t care about history at all, and editors or publishers introduce changes based on something like aesthetic preference. All of these are here to help us think about the principles behind editions and whose words you’re getting in your bag of words: words from authors, publishers, modern editors, and so on. This introduces a number of problems. How do the principles behind different editions stack up? What if we are mixing contemporary editorial words with historical authorial ones? Editorial methods frequently rest on principles that are incompatible with one another.

Slide 15/16: Case study of the Gabler edition of Ulysses
The Gabler edition of Ulysses has a particularly thorny history, but this is a text we have in our corpus. We used the edition from Project Gutenberg, but as you can see, what we found in our edition was very different from many other editions at one pivotal moment, where it adds multiple sentences. So whose words are these? Are they Joyce’s, or Gabler’s? Does it matter that they came from a different manuscript? Do we care for the purposes of a model?

Slide 17: How much does all this matter?
We’re already “massaging” texts by adding stop words (another way of saying taking out words). So whose words are in our bag of words, and whose are not? The insights of topic modeling come at a cost, and one cost seems to be the careful consideration of the texts as historical objects representative of the time in which they were written or edited (something models are often used to think about). Keep in mind that this was just one example of one change in one edition of one novel in our corpus. It is something to consider as you go about making your corpus and interpreting your model!
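
The question of whose words end up in the bag of words can be made concrete with a small sketch. The Python below is an illustration only, not the workflow used for these slides: the stop list, the function names, and the two "edition" strings are all invented placeholders (the strings are not quotations from Ulysses or any other text). It builds a bag of words for each placeholder edition after removing stop words, then lists the word counts that appear in one edition but not the other.

```python
import re
from collections import Counter

# Hypothetical stop list, for illustration only; real projects use longer lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "was"}


def bag_of_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and count the non-stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def edition_difference(bag_a, bag_b):
    """Return the surplus word counts of each bag relative to the other."""
    # Counter subtraction keeps only the counts that remain positive.
    return bag_a - bag_b, bag_b - bag_a


# Invented placeholder sentences, NOT quotations from any real edition.
edition_a = "The editor kept the word that the author wrote."
edition_b = (
    "The editor kept the word that the author wrote. "
    "The editor restored a lost sentence."
)

only_in_a, only_in_b = edition_difference(
    bag_of_words(edition_a), bag_of_words(edition_b)
)
print("Only in edition A:", dict(only_in_a))
print("Only in edition B:", dict(only_in_b))
```

Run on two real editions of the same novel, a comparison along these lines would give a rough sense of how many words an editorial difference contributes to the bag, which is one way to start judging whether the difference matters for a model.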
