Text Analysis Methods Grad Workshop

This is a script for a workshop on using Voyant and TAPoR for a graduate class on research methods.

1.0 Introduction

  • Overview
    This workshop will quickly introduce you to computer-assisted text analysis using Voyant and TAPoR. Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell and is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. TAPoR is a web site for the discovery and review of text analysis tools, including those in Voyant.
  • Outline
    In this workshop we will:

    • First, look at how to use a single Voyant tool, Cirrus, with different texts.
    • Then learn how to use the normal skin of Voyant with a single text and then a corpus.
    • Then learn how to load your own text into Voyant.
    • Finally, look at TAPoR, where you can find other tools.
  • Help
    Remember that the tools listed in TAPoR, such as Voyant, are research tools and will often fail, especially when a whole group of people use them at once. Multiple versions are running, so if one server is down, try another. If you need help, go to Hermeneuti.ca and explore the resources there. Here are some useful links:

  • Voyant Tools

2.0 Preparing a text for a question

The first step in text analysis is to assemble a text to fit your question(s). What do you want to ask about? What sort of text would help you ask questions about an issue? How can you use the internet to build a text?

For this workshop, let's assemble a text from the internet.

  • Decide on some aspect of popular culture or computing culture well documented on the internet.
  • Google keywords associated with the subject you want to study.
  • Skim the results and then develop selection criteria for what you want to scrape.
  • Scrape a set of texts from the pages Google finds.
  • Copy and paste the texts into a text file. Clean out the navigation information and other irrelevant parts.
  • Save the result as a plain text file for text analysis.
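The cleaning steps above can also be scripted. The sketch below is a minimal Python example, not part of Voyant or TAPoR, that assumes you have saved each scraped page as an .html file in one folder (the folder and file names are hypothetical). It strips the HTML tags, drops any text inside script, style, nav, header and footer elements (an assumption about where navigation usually lives), and writes everything to a single plain text file.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping elements that usually hold navigation or code.

    The SKIP set is an assumption; many sites put menus in ordinary <div>s,
    which this filter will not catch.
    """

    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns "&amp;" into "&"
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Each run of text becomes its own line; inline tags such as <b> split a
        # sentence across lines, which does not matter for word counts.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_text_file(html_dir: str, out_path: str) -> None:
    """Clean every .html file in html_dir (alphabetical order) into one text file."""
    pieces = []
    for page in sorted(Path(html_dir).glob("*.html")):
        pieces.append(html_to_text(page.read_text(encoding="utf-8", errors="replace")))
    Path(out_path).write_text("\n\n".join(pieces), encoding="utf-8")
```

For example, build_text_file("pages", "corpus.txt") would read every .html file in a hypothetical pages folder. Skim the output afterwards: automatic filtering is only a first pass, and the manual cleaning described above still matters.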

For more, see Appendix 1: Finding and Preparing an Electronic Text


3.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools.

Go to the Cirrus tool at http://voyant-tools.org/tool/Cirrus and load your text.

There are a number of ways to load a text. You can:

  • Enter one or more URLs of texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (Voyant will try to extract the text)
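If you have prepared several cleaned text files, you can bundle them into one archive for the zipped-collection option. This Python sketch uses only the standard library. The folder and archive names are placeholders, and the sketch assumes Voyant will treat each file in the archive as a separate document in the corpus.

```python
import zipfile
from pathlib import Path


def zip_corpus(text_dir: str, zip_path: str) -> list[str]:
    """Bundle every .txt file in text_dir into one zip archive.

    Returns the file names added, in alphabetical order.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(text_dir).glob("*.txt")):
            zf.write(path, arcname=path.name)  # store flat, without folder paths
            names.append(path.name)
    return names
```

For example, zip_corpus("my_texts", "corpus.zip") would produce an archive you could then upload to Cirrus or the Reading Skin.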


To learn more about the Cirrus tool, go to http://docs.voyant-tools.org/tools/ and scroll down to the Cirrus entry, or go to TAPoR 2.0 and read a review.

You can see Cirrus with a text like “Frankenstein” here: http://bit.ly/VoyantCirrusFrankenstein [tempmainbeta]

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
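To think about the second question, it can help to count words yourself and compare the result with Cirrus. The sketch below is not Voyant's code. It uses a deliberately tiny, hypothetical stop list (Voyant has its own, much longer lists), a simple tokenizer that only recognizes unaccented letters, and a linear scale from counts to font sizes, which is only one common way a word cloud might tie size to frequency.

```python
import re
from collections import Counter

# Hypothetical, tiny stop list for illustration only.
STOP_WORDS = {"the", "and", "of", "to", "a", "i", "in", "was", "it", "my"}


def top_words(text: str, n: int = 10, stop_words=STOP_WORDS) -> list[tuple[str, int]]:
    """Return the n most frequent words, lower-cased, with stop words removed."""
    # Simplification: ASCII letters only, keeping contractions like "don't".
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


def font_sizes(freqs, min_pt=10, max_pt=48):
    """Map (word, count) pairs linearly onto font sizes between min_pt and max_pt.

    If every count is equal, every word gets min_pt.
    """
    if not freqs:
        return {}
    lo = min(c for _, c in freqs)
    hi = max(c for _, c in freqs)
    span = hi - lo or 1  # avoid dividing by zero when all counts match
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in freqs}
```

Running font_sizes(top_words(text)) on your own text gives a rough picture of which words a frequency-based cloud would emphasize; differences from Cirrus usually come from its stop list and tokenization.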

Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That’s what we will look at next.

Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that may not work well. Try your text in different tools.


4.0 Using a Reading Skin

Voyant Tools can also be composed into “skins” that combine tools as panels so that they can be used interactively. Links to the same Frankenstein text and to an Austen corpus in a simple skin are given below.

Go to Voyant and load your text into the Reading Skin: http://voyant-tools.org

To see a single text in the Reading Skin, look at Frankenstein: http://bit.ly/VoyantFrankensteinStop [tempmainbeta]

For a corpus, see Austen (5 novels): http://bit.ly/VoyantAustenStop [tempmainbeta]

To learn about using the full Reading skin you can go to

In this skin, clicking in one panel will often (but not always) update the other panels. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click in the Word Trends panel and play with the KWIC (keyword in context) panel.
  • Changing Settings: Try changing the Cirrus settings by clicking on the small gear icon. Try playing with the Word Trends panel too.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.
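The KWIC and Word Trends panels rest on two simple ideas: listing each occurrence of a word with a few words of context on either side, and measuring how often a word appears in successive parts of a text. The Python sketch below illustrates both; it is not how Voyant implements them. The function names, the default context width, the number of segments, and the per-10,000-tokens scale are all choices made for this example.

```python
import re

TOKEN = re.compile(r"\w+(?:'\w+)?")  # words, keeping simple contractions like "don't"


def kwic(text: str, keyword: str, width: int = 4) -> list[str]:
    """Keyword in context: one line per occurrence of keyword (case-insensitive),
    with up to `width` words on each side."""
    tokens = TOKEN.findall(text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")  # right-align the left context
    return lines


def word_trend(text: str, word: str, segments: int = 10) -> list[float]:
    """Split the text into `segments` roughly equal runs of tokens and return the
    word's relative frequency (occurrences per 10,000 tokens) in each run."""
    tokens = [t.lower() for t in TOKEN.findall(text)]
    target = word.lower()
    n = len(tokens)
    trend = []
    for i in range(segments):
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        trend.append(chunk.count(target) * 10_000 / len(chunk) if chunk else 0.0)
    return trend


if __name__ == "__main__":
    sample = "It was on a dreary night of November that I beheld the accomplishment of my toils."
    for line in kwic(sample, "night", width=3):
        print(line)
    print(word_trend(sample, "of", segments=2))
```

Relative rather than raw frequencies are used in the trend so that segments or documents of different lengths can be compared, which matters once you move from a single text to a corpus such as the Austen novels.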

When in doubt, restart the session by refreshing the page.



5.0 Other Stuff

Here are some links to other tools, different corpora and skins for specialized tools:


6.0 More Information

Finding Texts:

Aggregating and Cleaning Texts:


7.0 Other Tools

What other tools are out there? See TAPoR 2.0 for a growing list of tools.