Text mining can prove a useful tool when attempting to tease out the connotations of specific words within large bodies of text. Close reading limits a single scholar to a manageable number of books, but with distant reading tools one scholar can take a large corpus and apply the same methods of analysis across it to gather support for their argument. The argument can thus open itself up to under-studied works.

In this project I hope to have shown a way in which traditional scholarship can be translated into distant reading methods and then applied to texts not served by the original scholarship. My keyword findings adhere to the original argument made by Gilbert and Gubar, and the texts I processed from outside the scope of their argument revealed similar themes. Though my analysis still required examining the specific context of each word, it was a far more efficient means of digging into several novels at once than close reading, and it yielded similar results. For example, my comparison against A Tale of Two Cities shows that the female authors were, in fact, incorporating words and tropes in a manner similar to their male contemporaries, which supports the argument that women were covertly incorporating messages of rebellion within already popularized literary styles.

This type of “dataset for distant reading,” as Ted Underwood might call it, is not limited to Gilbert and Gubar’s work or even to literary studies. Text mining tools like AntConc can be employed to study a wide range of art forms, including video games, film, and even fanfiction. The great thing about digital humanities projects is that they “actively and deliberately [invite] other perspectives into the data analysis and storytelling process” (D’Ignazio and Klein). Within the scope of my project I did not have the chance to open up my processes to books far outside the original lens of Gilbert and Gubar, because the project functions as a prototype. My goal was to show that my processes can yield valid results that can then be analyzed in support of a given argument, mine being about concealed themes. Given the results of my dataset, a future project would extend this type of analysis to works that have never set foot in the realm of literary scholarship. In future studies, I would also add keywords that reflect synonyms of the time, or words that different cultures used with the same meaning. In addition, I would add a larger control group of male-authored texts and make a more thorough comparison of female versus male use of the keywords.

With the steps and processes I have laid out here, though, I need not be the only one to undertake the next steps of this project. Accessibility, in this sense, applies not only to the texts being studied but also to the person studying them. Anyone with a computer and a text file can run the same experiments that I have. I hope this inspires others to create processes that can help open up the world of traditional scholarship.
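
As a rough illustration of how little is needed to begin, the sketch below approximates in Python two things a concordance tool such as AntConc provides: a keyword-in-context listing and a keyword frequency normalized per 10,000 words. It is not AntConc's own implementation and not the exact procedure used in this project. The function names, the sample sentence, and the keyword "attic" are placeholders chosen for illustration only.

```python
import re


def tokenize(text):
    """Split text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)


def keyword_in_context(text, keyword, window=5):
    """Return (left context, keyword, right context) for every occurrence.

    `window` is the number of words kept on each side of the keyword.
    """
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def per_10k(text, keyword):
    """Occurrences of keyword per 10,000 words, so that texts of
    different lengths can be compared on the same scale."""
    tokens = [t.lower() for t in tokenize(text)]
    if not tokens:
        return 0.0
    return 10000 * tokens.count(keyword.lower()) / len(tokens)


# Placeholder sentence, not drawn from any novel in the corpus.
sample = "She paced the attic room, and the attic door stayed locked."

for left, word, right in keyword_in_context(sample, "attic", window=2):
    print(f"{left:>12} [{word}] {right}")

print(round(per_10k(sample, "attic"), 1))
```

To try this on a real novel, the placeholder string could be replaced with the contents of a plain text file, for instance one read with Python's built-in `open` function. Each keyword's context would still need to be read and interpreted by hand, as in the analysis above.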
