Category Archives: Workshops

DH2015 Workshop: The New, The Neat & The Gnarly

@VoyantTools

This workshop will focus on the second major release of Voyant Tools (2.0), which addresses several of the major shortcomings and irritants of version 1.0. In addition to performance improvements throughout, the search and filtering functionality has been vastly enhanced and Voyant now supports proximity and n-gram operations.

We have designed this workshop to be of interest both to new users of Voyant, who will get an introduction to the platform, and to existing users, who will discover all the new functionality 2.0 has to offer. Please note that some URLs in this workshop document are time-sensitive and may not be functional beyond the workshop – you may want to consult the Workshops page to see if there’s a more recent workshop document.

Outline

  • Getting Setup
  • First Steps: Cirrus
  • Next Steps: The Full Environment
  • Bring Your Own Texts
  • Getting to Know the Tools
  • Exploring More Tools
  • Advanced Search Functionality
  • Exporting URLs, Tools & Data
  • Voyant Tools Roadmap

Getting Setup

One of the strengths of Voyant Tools has always been that it’s freely and conveniently accessible online – there’s a hosted version that anyone can use (at voyant-tools.org, though we’ll be using a more recent beta version). There’s also a downloadable version of Voyant Tools that can be run locally and that has several potential advantages:

  • You can keep your texts confidential as they will not be cached on our server.
  • You can restart the server if it slows down or crashes.
  • You can handle larger texts without the connection timing out.
  • You can work offline (without an Internet connection).
  • You can have participants in a group (like in this workshop) run their own instance without encountering load issues on our server.

For this workshop, it’s strongly recommended that you use the standalone local instance of Voyant Tools (available through VoyantServer):

  • download  the VoyantServer 2.0 zip archive
  • double-click on the zip archive to expand its contents
  • double-click on VoyantServer.jar
    • on Mac, because of security restrictions on applications that aren’t signed and approved by Apple, you may need to Ctrl-click on the VoyantServer.jar file, select open from the menu, and then click open (not the default button) in the next dialog box
    • you’ll need Java 1.7+ for this; your computer will tell you if you need to download Java

You can find more information about Running VoyantServer, including tips in case of problems. If you’re unable to run VoyantServer (because of a problem with your machine or because you’re using a tablet, or for any other reason), you should be able to follow along using one of the two following URLs (in order of preference):

For most of the Workshop outline below we will provide a list of links for the different URLs possible, such as for the home page [local, workshop, beta].

First Steps: Cirrus

Cirrus is the word cloud tool in Voyant. Have a look at an example [local, workshop, beta].

Voyant Tools (Austen)

  • What text do you think it is a cloud for?
  • What features are metrical (based on measuring the text in some way)? How are the other features generated?
  • What words are missing?

new All tools and visualizations in Voyant 2.0 are HTML5-based; there are no more Flash or Java applets (which cause cross-platform compatibility and security issues).

Next Steps: The Full Environment

Voyant Tools is an environment that can host different individual tools (like Cirrus) in different views and layouts. The default view of Voyant is composed of 5 panels where the tools interact with one another. Try opening the Austen corpus [local, workshop, beta]. If you click on a word in Cirrus, the Trends graph will update. If you click on a node in the Trends graph, the Contexts tool will update. Here’s a summary of the 5 visible tools:

Voyant Tools Numbered (Austen)

 

  1. Cirrus: a simple wordcloud that displays the highest frequency terms in the corpus (that aren’t in the stopword list)
  2. Reader: an infinite scrolling reader for the actual text in the corpus (this fetches the next part of the text as needed)
  3. Trends: a visualization of word frequency across the corpus or within each document (depending on the mode)
  4. Summary: a high-level summary of data from the corpus
  5. Contexts: a list of occurrences of a specified word (this is sometimes called a concordance or a keyword in context)
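To make the idea of a concordance (keyword in context) more concrete, here is a minimal Python sketch. It is not how Voyant implements the Contexts tool; the `tokenize` and `kwic` helpers and the simple regular-expression tokenizer are assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberate simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(text, keyword, context=3):
    """Return (left, keyword, right) tuples for each occurrence of keyword.

    `context` is the number of words kept on each side of the keyword.
    """
    tokens = tokenize(text)
    target = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            rows.append((left, token, right))
    return rows


sample = ("It is a truth universally acknowledged, that a single man in "
          "possession of a good fortune, must be in want of a wife.")
for left, word, right in kwic(sample, "a", context=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>25} | {word} | {right}")
```

The `context` parameter plays the same role as the context size option in the Contexts tool: a larger value shows more words on each side of the keyword.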

Explore the visible tools (we’ll come back to the other tools later):

  • what happens when you hover over the help icon? what if you click it?
  • which tools trigger responses from which other tools?
  • what scale is each tool (entire corpus, entire document, part of a document, etc.)?
  • what is the visualization in the bottom of the Reader (middle-top) panel?
  • try a simple search in the Reader panel
  • what is relative frequency in the Trends tool?
  • what are vocabulary density and distinctive words in the Summary tool?
  • what does the plus icon do in the Contexts tool?
  • what is the difference between context and expand in the Contexts tool?

new Voyant 2.0 uses a new, crisper theme (the global appearance of the interface).

Bring Your Own Texts

A strength of Voyant Tools has always been that you can use an existing corpus (such as the Austen corpus we used above), or you can create your own corpus from the home page [local, workshop, beta]. There are three primary ways of creating a corpus:

  1. type or paste text into the large box (you can copy-and-paste text from a webpage or word processor, for instance) – in this case you’ll be creating a corpus with one document
  2. type or paste URLs into the large box, one URL per line – this will create a corpus with as many documents as you have URLs; Voyant will try to fetch the content from the specified locations (so they can’t be behind a password or restrictive firewall); the URLs can point to documents in various supported formats (see below)
  3. click the upload button and select one or more files to upload – the files can be in a variety of formats, including plain text, HTML, XML, RTF, MSWord, and PDF, or a Zip (archive) file containing documents in one of the supported formats

For the purposes of the workshop it might be best to try first with a simpler file format (like plain text or MSWord), but it’s also possible to use XML very powerfully by clicking on the options icon (when hovering in the Add Texts header) and defining XPath expressions to documents, body content and metadata such as title and author.
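As a rough illustration of what XPath expressions for documents, body content and metadata can look like, here is a small Python sketch using the standard library. The XML element names (`collection`, `item`, `head`, `body`) and the `extract_documents` helper are invented for this example, and Python's ElementTree only supports a subset of XPath, so the expressions you type into Voyant may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML file: these element names are made up for illustration.
SAMPLE_XML = """
<collection>
  <item>
    <head><title>First Letter</title><author>A. Writer</author></head>
    <body>Dear friend, the weather is fine.</body>
  </item>
  <item>
    <head><title>Second Letter</title><author>B. Writer</author></head>
    <body>Dear cousin, the journey was long.</body>
  </item>
</collection>
"""


def extract_documents(xml_text, doc_path, title_path, author_path, body_path):
    """Split one XML file into documents using path expressions.

    doc_path selects each document node; the other paths are evaluated
    relative to that node.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for node in root.findall(doc_path):
        docs.append({
            "title": node.findtext(title_path, default=""),
            "author": node.findtext(author_path, default=""),
            "body": node.findtext(body_path, default="").strip(),
        })
    return docs


docs = extract_documents(SAMPLE_XML, ".//item", "head/title", "head/author", "body")
for doc in docs:
    print(doc["title"], "by", doc["author"])
```

The point of the sketch is the division of labour: one expression decides where each document starts, and the others pick out the title, author and text within it.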

new When uploading files, you can now select multiple files at once by using the Ctrl and/or Shift keys.

Getting to Know the Tools

Each of the several tools in Voyant has its own particularities and peculiarities, but here are some general principles that apply to several tools.

Options. Many of the tools provide parameters directly visible (usually in the bottom part of the tool). The Contexts tool for instance (bottom right-hand corner of the default skin) has options for searching, for the context size (how many words to show on each side of the keyword in the table), and for expand size (how many words to show on each side of the keyword when you expand the occurrence by clicking on the plus icon in the first column of the row). In addition to these visible options, some tools also have additional options that can be accessed through the options icon in the top header. The Cirrus tool, for instance, has an option for modifying the stopword list.

Voyant Tools Options

Stopwords. The stopword list contains words that usually carry less meaning and are very common in most texts, such as determiners (“the”, “a”) and prepositions (“to”, “in”, “from”), etc. One person’s stopword is another person’s treasure, and it may be worth looking at the list of words to see if there are ones you’d prefer to show or if there are words that you don’t want to show and that should be added to the stopword list. You can edit the list by clicking on the options icon (in Cirrus, for instance) and clicking the edit button. Note that you can apply the newly selected or edited list to the current tool only or globally to all tools that support stopwords (globally is the default).
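The effect of a stopword list on a frequency-based display like Cirrus can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the tiny `STOPWORDS` set and the `top_terms` helper are hypothetical, and Voyant's real lists and tokenization are more elaborate.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "to", "in", "from", "of", "and", "is", "it", "that"}


def top_terms(text, stopwords=STOPWORDS, limit=5):
    """Return the most frequent terms that are not in the stopword list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in stopwords)
    return counts.most_common(limit)


text = "The cat and the dog and the cat sat in the garden."
print(top_terms(text, limit=3))                   # with stopwords removed
print(top_terms(text, stopwords=set(), limit=3))  # with no stopword list
```

Comparing the two calls shows why editing the list matters: without it, the highest-frequency terms tend to be function words.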

Voyant Tools Options

new Voyant 2.0 now uses auto-detect by default so it’s no longer necessary to choose a stopword list (unless the auto-detect option doesn’t work for you).

Table/Grid Headers. The column headers in table/grid views include functionality that may not be obvious. First, a help tip will appear when you hover over most column headers to briefly explain what that column is showing. Next, a down arrow will appear in the right part of the column header, and clicking on the down arrow will allow you to sort by that column (when possible) and to toggle the visibility of columns. Finally, if a column is sortable, you can also click on the header to toggle between ascending and descending order for sorting the table by that column.

Grid Headers

new Infinite Scrolling Tables/Grids. Tables can sometimes contain a huge number of logical items (for instance, tens of thousands of terms in a document), which would be impractical to load at once. In Voyant 1 there was a paging mechanism that allowed the user to see 50 items at a time by advancing or rewinding by “page”. In Voyant 2 items are loaded on-demand as the user scrolls through the table – in most cases that should happen fairly seamlessly.

Corpus/Document Modes. Some of the tools can operate at variable scale, either showing data at the corpus level or at the individual document level – this can be a bit confusing if you’re not sure what you’re seeing. For instance, by default Cirrus shows top frequency terms for the entire corpus, but you can also generate a Cirrus from the terms of an individual document – one way to do this is to click on the Documents tab in the lower left-hand panel and click on one of the document rows. The Cirrus that appears will be for just one document, and if you want to revert to Corpus mode you can click on the “reset” button that appears in the lower right-hand corner of the Cirrus tool.

Cirrus Scale

Resizing. The individual tool panels are resizable: the mouse pointer should change to a resize icon when you are hovering over the inner borders between tools, and you can drag the border to resize. Similarly, the columns in table/grid tools are resizable.

Exploring More Tools

new The way you access other tools in Voyant 2.0 has been improved and simplified, particularly with the introduction of tabs (multiple tools available from each panel) and the introduction of the tool switching menu.

In addition to the five tools that are displayed by default (Cirrus, Reader, Trends, Summary and Contexts), each of the five panels makes it easy to access additional tools, some of which we’ve mentioned already. Here are the other tools available from the tabs:

  • Corpus Terms: displays frequency and distribution information for terms (types or unique words) in the corpus
  • Links: displays a network graph of the collocates of keywords (the highest frequency terms that occur close to the specified search terms) – you can click on individual terms to fetch more terms and you can drag terms off the tool to remove them
  • Collocates: similar to Links, but this presents collocates of search terms in a table form
  • Documents: lists the documents in the corpus, including some metadata (where available, such as title and author), as well as counts of words/tokens, types and a ratio of types to tokens
  • Phrases: lists the recurring phrases in the corpus (though any phrase must be repeated in a document before it is counted at the corpus level); this is a new tool in Voyant 2.0 and one of the most useful functions can be to see the longest repeating phrases (without having to specify a search query); note that there are different options for handling overlapping phrases
  • Bubblelines: this is another representation of the distribution within each document in the corpus, it can be helpful for perceiving where different terms appear together (overlap)
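The counts reported by the Documents tool (words/tokens, types, and the ratio of types to tokens) can be approximated with a short Python sketch. The `document_stats` helper and its simple tokenizer are assumptions for illustration; Voyant's own tokenization will give somewhat different numbers.

```python
import re


def document_stats(text):
    """Count tokens (all words), types (unique words) and their ratio."""
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    ratio = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ratio": ratio}


stats = document_stats("The cat sat on the mat")
print(stats)
```

A ratio close to 1 means few words are repeated; longer documents generally have lower ratios, which is worth keeping in mind when comparing documents of very different lengths.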

All of these tools can be accessed through the tabs, but they can also be invoked from the tool switching menu (a windows-like icon) that appears when you hover over the header of any tool.

Tool Switch

If you click on the tool switching icon a nested menu will appear. The first items will be a list of one or more tools that fit most naturally in that tool panel, but you can also navigate tools by scale (corpus or document) or by tool type (visualizations, tables/grids, other).

The skin header (the blue bar at the top) also has a tool switching menu which allows you to replace the entire page with one tool. This is also a convenient way to access the ScatterPlot tool which provides a visualization of Correspondence Analysis or Principal Component Analysis (more complex analysis of how terms are shared between documents).

Note that some of the tools from the current 1.0 version of Voyant have not yet been implemented in version 2.0, such as TermsRadio, Knots, and Bubbles. Those should be implemented in the coming months, though some of the other tools may be abandoned, especially those that rely on Flash or Java.

Advanced Search Functionality

new Much of the advanced search functionality is new in Voyant 2.0 – we’ll go through some highlights below.

Help with the search syntax is displayed when you hover over the question mark icon in a search box. The hovering tip box will disappear after a few seconds, and you can click on the question mark to have a dialog box appear until you dismiss it.

Search Syntax

Search functionality is fairly consistent in all tools that support search. For experimentation, let’s work in the Corpus Terms tool (which is the second tab in the upper left-hand panel where the Cirrus wordcloud is displayed by default). These examples use the Austen corpus [local, workshop, beta].

  • exact match (think): this searches for the exact word (though it’s case insensitive; there’s currently no way to perform a case-sensitive search)
  • wildcard match (think*): this matches the root of a word and includes variants as a single term (think, thinks, thinking, etc.); note that for now wildcards can’t be used at the beginning of words and produce inconsistent results when used in the middle of words
  • expanding wildcard match (^think*): this is similar to the previous wildcard match, but this time each variant is counted and displayed as a separate term (this can be useful for seeing what terms are actually included in a wildcard match)
  • multiple matches (think*, ^think*): you can search multiple terms (two or more) by separating them with commas – a simple search might be for the exact matches think, thinking, but you can also use more complex searches like think*, ^think* to get the best of both worlds from wildcard matches (counting the total wildcard matches as one term and also seeing the individual matches)
  • combined matches (think|thinking): use a combined match to merge two or more search terms into one result – this might be useful for counting singular and plural forms of a word, but not all wildcard forms (time|times but not timely, etc.)
  • phrase match ("time enough"): this matches an exact phrase or sequence of words – note the use of quotes (if you exclude the quotes you’re essentially performing a combined match for time|enough, though that may change in the future)
  • proximity match ("time enough"~10): this is essentially a NEAR match, where the terms in quotes (there can be more than two) must occur within a specified number of words (in this case within 10 words, but you can specify a different number for the proximity); note that words can appear in any order, so enough might occur before time; it’s not possible to expand the match with the ^ operator like with wildcard searches, but you can use the Contexts tool to see the actual occurrences that are being matched
  • multiple matches (time*, time|times, "time enough"~10): it’s possible to mix and match the different syntaxes, as with this example, which separates a wildcard match, a combined match, and a proximity match with commas
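The differences between these match types can be sketched in Python. This is a simplified model written for this handout, not Voyant's search engine: all function names are hypothetical, and the proximity function in particular is a rough approximation of a NEAR search.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (searches are case insensitive)."""
    return re.findall(r"[a-z']+", text.lower())


def exact(tokens, term):
    """think : count of the exact word."""
    return tokens.count(term.lower())


def wildcard(tokens, prefix):
    """think* : all variants counted together as a single term."""
    return sum(1 for t in tokens if t.startswith(prefix.lower()))


def expanded_wildcard(tokens, prefix):
    """^think* : each variant counted separately."""
    return Counter(t for t in tokens if t.startswith(prefix.lower()))


def combined(tokens, query):
    """time|times : the listed forms merged into one count."""
    forms = set(query.lower().split("|"))
    return sum(1 for t in tokens if t in forms)


def phrase(tokens, words):
    """Count exact consecutive sequences of words."""
    words = [w.lower() for w in words]
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def proximity(tokens, words, distance):
    """Simplified NEAR: count occurrences of the first word that have every
    other word within `distance` tokens, in either direction."""
    first, *others = [w.lower() for w in words]
    hits = 0
    for i, token in enumerate(tokens):
        if token != first:
            continue
        window = tokens[max(0, i - distance):i] + tokens[i + 1:i + 1 + distance]
        if all(other in window for other in others):
            hits += 1
    return hits


tokens = tokenize("I think she thinks that thinking takes time enough and enough time")
print(exact(tokens, "think"), wildcard(tokens, "think"))
print(expanded_wildcard(tokens, "think"))
print(combined(tokens, "think|thinking"))
print(phrase(tokens, ["time", "enough"]), proximity(tokens, ["time", "enough"], 2))
```

Note how the phrase match only counts "time enough" in that order, while the proximity match also accepts "enough time".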

Exporting URLs, Tools & Data

A distinguishing feature of Voyant Tools is its ability to generate URLs that can be bookmarked or shared and that point to a specific corpus with specific parameters.

new The URL in the browser location bar will now update automatically after you create a corpus – you can bookmark or share this URL directly.

To export the URL from the current skin (combination of tools, not just one tool), click on the export icon from the top blue header bar.

Voyant Export

This will cause a dialog box to appear with various export options, the first of which is a simple link that can be copied into the clipboard or clicked to open the URL in a new window.

Voyant Export Skin

The same basic process works for individual tool panels as well (if you just want to export or share, say, the Cirrus visualization), except that additional parameters are usually included with the tool panels (specific search terms that have been selected, for instance).

In addition to exporting a URL, you can also generate a bibliographic entry for Voyant Tools (if you wish to cite it, which would be awfully kind of you :) or export a live dynamic tool panel. The exported tool works much like a YouTube clip that can be embedded into any website – it pulls interactive content from a remote site. For both of these options, expand the “Export View” menu (see the image above).

The HTML snippet for a live tool might look something like this:

<!-- Exported from Voyant Tools: http://voyant-tools.org/.
Please note that this is an early version and the API may change.
Feel free to change the height and width values below: -->
<iframe style='width: 100%; height: 400px' src='http://beta.voyant-tools.org:80/?corpus=austen&view=Cirrus'></iframe>

Which should produce a live tool like this:

Important notes about URLs and embedded tools:

  • During this workshop we’re using special instances of Voyant Tools that may not be accessible to others – that’s certainly true for a standalone (local) instance of Voyant running on your machine, but it’s also true for the workshop and beta URLs where corpora are less likely to remain accessible, unlike the current production version of Voyant Tools where corpora remain accessible as long as they’re visited regularly (at least once every three weeks).
  • Embedding the HTML snippet may be a bit trickier with some Content Management Systems. In WordPress for instance, if you’re not an administrator, you may want to install a plugin like iframe.
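If you need several embedded tools, the export snippet can also be assembled programmatically. The following Python sketch builds a URL from the `corpus` and `view` parameters seen in the exported snippet above and wraps it in an iframe. The helper names `voyant_url` and `iframe_snippet` are hypothetical, and since the export dialog warns that the API may change, treat the parameters as examples rather than a stable interface.

```python
from html import escape
from urllib.parse import urlencode


def voyant_url(base, **params):
    """Append query parameters (for example corpus and view) to a base URL."""
    return f"{base}?{urlencode(params)}"


def iframe_snippet(url, width="100%", height="400px"):
    """Wrap a URL in an iframe, escaping it for use in an HTML attribute."""
    return (f"<iframe style='width: {width}; height: {height}' "
            f"src='{escape(url, quote=True)}'></iframe>")


url = voyant_url("http://beta.voyant-tools.org/", corpus="austen", view="Cirrus")
print(url)
print(iframe_snippet(url, height="300px"))
```

Escaping the URL turns `&` into `&amp;` inside the attribute, which is the strictly correct HTML form; browsers decode it back to the original URL.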

In addition to exporting a URL or embedding an interactive tool, Voyant provides some additional data exporting features, depending on the tool. For instance, some visualizations (like Cirrus, Trends, and Links) allow you to export data as graphics (a PNG or SVG), while the table-oriented tools (like Corpus Terms, Contexts and Phrases) allow you to export data in different formats (HTML, tab-separated values, and JSON). The tab-separated values can be especially useful since you can copy the generated output into a clipboard and paste it directly into a spreadsheet program (like Excel or Google Spreadsheets).

Export Tab-Separated Values

Note that in the current beta version it’s only possible to export the currently visible/loaded data, but in a near-future release it will be possible to export full datasets.
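Besides pasting into a spreadsheet, exported tab-separated values can be read by a script. Here is a minimal Python sketch using the standard `csv` module. The column names (`Term`, `Count`) and the values are made up for illustration; check the header row of your own export for the actual column names.

```python
import csv
import io

# Made-up sample in the shape of a tab-separated export; real exports
# may use different column names and will contain your own data.
EXPORTED_TSV = "Term\tCount\ntime\t30\nthink\t20\nenough\t10\n"


def read_tsv(text):
    """Parse tab-separated text into a list of dictionaries keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


rows = read_tsv(EXPORTED_TSV)
total = sum(int(row["Count"]) for row in rows)
for row in rows:
    share = int(row["Count"]) / total
    print(f"{row['Term']}: {share:.0%} of the exported counts")
```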

Voyant Tools Roadmap

Voyant Tools is an ongoing project and we’ll continue to improve and enhance the platform. Here’s a tentative roadmap for future development:

  • by fall 2015 we hope to release Voyant Tools 2.0 and replace the current 1.0 version – some of the major remaining work includes:
    • various bug fixes
    • allow for adding and reordering documents in existing corpora
    • adding a password protection for corpora
    • resolving backwards compatibility issues (to ensure that existing Voyant URLs continue to function correctly)
  • during fall 2015 and winter 2016 work will resume on Voyant Notebooks, a literate programming environment that allows a combination of writing, code snippets, dynamic tools, and other data output (more here). Voyant Notebooks is intended to leverage the existing analytic and visualization capabilities of Voyant while allowing users to customize some functionality and include a narrative description of their work
  • ongoing work to summer 2016 on the next version of Voyant Tools that will include functionality for part-of-speech tagging, lemmatization, and topic modelling (some work has already been done on each of these, but was put on hold to ensure that Voyant 2.0 could be released)

Please feel warmly encouraged to help improve and guide further development of Voyant Tools by providing us with feedback, including bug reports and feature requests. You can follow the developments on Github, Twitter, or contact us directly (sgsinclair at  Google’s email service).

My Very Own Voyant – DH2014 in Lausanne

This page is the outline for the DH 2014 Workshop My Very Own Voyant: From Web to Desktop Application (PDF).

0. Introductions

Where we introduce ourselves and our experience with Voyant-Tools.

  • Have you used Voyant-Tools?
  • What do you hope to learn?
  • Outline of what we will accomplish

1. Installing and running VoyantServer locally

Where you install VoyantServer on your laptop and run it.

Note: It’s best to click the Stop Server button in VoyantServer after you are finished or you may leave processes running.

2. A brief tour of Voyant

Where those unfamiliar with Voyant get a brief tour.

3. VoyantServer settings

Where you learn about controlling VoyantServer.

Managing data (corpus indices)

  • Where are corpus indices cached?  By default data is stored in a temporary directory that is specified by your operating system. Data in that directory should persist when you start and stop VoyantServer, but it may be cleaned out by your operating system when you restart your machine.
  • The easiest way to specify an alternate location, one where the data are more likely to survive a machine restart, is to create a new, empty directory called data in the same folder as where VoyantServer.jar is located.
  • You can also set another location for your data by providing a path to an existing parent folder in the server-settings.txt file. Here is an example:

data_directory = /Users/grockwel/Documents/VoyantServerData

Modifying a Corpus ID/Name

Voyant Tools automatically assigns an ID/name to a corpus, a generated value like 1404362954425.942. After creating a corpus you can see the id/name by clicking the Export icon and producing a URL.

You can change the corpus id/name, but it should be considered an advanced operation. There are two steps:

  1. In your data directory there’s a folder named trombone3_0 which contains individual folders for each corpus. The first step is to find the folder that corresponds to your corpus and rename the folder to the new corpus id/name that you wish to use (it’s best to use a reduced character set such as alphanumeric characters, dots and hyphens)
  2. In the corpus folder there’s a file named corpus-metadata.xml – open it with an XML or text editor and modify the entry that is below the id entry, near the top of the file. Save the file, and the corpus should now be available with the new corpus id/name in the URL.
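The first of these two steps (renaming the folder) can be scripted. The Python sketch below checks that the new id uses only alphanumeric characters, dots and hyphens, and then renames the corpus folder inside trombone3_0. The `rename_corpus_folder` helper is hypothetical, and it deliberately does not touch corpus-metadata.xml: the layout of that file isn't shown here, so the second step is best done by hand in an editor.

```python
import re
from pathlib import Path

# Reduced character set suggested above: alphanumerics, dots and hyphens.
SAFE_ID = re.compile(r"^[A-Za-z0-9.-]+$")


def rename_corpus_folder(data_dir, old_id, new_id, store="trombone3_0"):
    """Rename the folder for corpus old_id to new_id (step 1 only).

    data_dir is the VoyantServer data directory; store is the subfolder
    that holds one folder per corpus.
    """
    if not SAFE_ID.match(new_id):
        raise ValueError(f"unsafe corpus id: {new_id!r}")
    parent = Path(data_dir) / store
    source = parent / old_id
    target = parent / new_id
    if not source.is_dir():
        raise FileNotFoundError(source)
    if target.exists():
        raise FileExistsError(target)
    source.rename(target)
    return target
```

A call might look like `rename_corpus_folder("/path/to/data", "1404362954425.942", "austen-novels")`, where the data path is a placeholder for your own directory. It is probably safest to stop VoyantServer and make a backup of the data directory before trying this.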

Corpus Metadata

 

Handling large corpora

  • For larger corpora (>10 MB) you can increase the memory of VoyantServer (set the value in megabytes such as 1024, 2048, 4096, etc.). Remember to stop and restart.

4. Confidentiality

  • Using VoyantServer on confidential information.
  • How to make sure VoyantServer can’t be accessed.

5. Setting up a public server

  • How to run VoyantServer for others.
  • Deploying as a Tomcat application.

Deploying as a Tomcat application

Tomcat

VoyantServer ships with a compliant Java Servlet web application that can be deployed under different servlet containers, such as Apache Tomcat.  Here are some steps:

  • download Tomcat (like the core version of Tomcat 7 – the tar.gz file is recommended for Mac to preserve executable file permissions)
  • uncompress the archive (usually double-clicking on the file)
  • copy the _app folder from VoyantServer to the webapps folder in the Tomcat folder (see image below) – be sure to copy and not move the folder (on Mac you can hold the option key while dragging the folder)
  • rename the _app folder  to voyant
    • it’s actually not necessary to rename the folder, but then the URL would be something like http://127.0.0.1/_app
    • you can also make the application run in the root of the server by deleting the existing ROOT folder and renaming _app to ROOT – then the URL is something like http://127.0.0.1/
  • now start Tomcat by running bin/startup.sh from the Tomcat folder (this is typically done on the command-line, you can read the RUNNING.txt file in the Tomcat folder for more information)
  • usually you can then visit http://127.0.0.1/voyant
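The copy-and-rename steps above can be combined into one scripted operation. This Python sketch copies the _app folder into Tomcat's webapps folder under a new name, leaving the original in place. The `deploy_to_tomcat` helper and the example paths are assumptions for illustration; starting Tomcat is still done separately with bin/startup.sh.

```python
import shutil
from pathlib import Path


def deploy_to_tomcat(voyant_dir, tomcat_dir, name="voyant"):
    """Copy VoyantServer's _app folder to <tomcat>/webapps/<name>."""
    source = Path(voyant_dir) / "_app"
    target = Path(tomcat_dir) / "webapps" / name
    if not source.is_dir():
        raise FileNotFoundError(source)
    if target.exists():
        raise FileExistsError(target)
    # copytree copies rather than moves, so VoyantServer keeps working.
    shutil.copytree(source, target)
    return target
```

For example, `deploy_to_tomcat("/path/to/VoyantServer", "/path/to/tomcat")` would make the application available under /voyant, and passing `name="ROOT"` (after deleting the existing ROOT folder) corresponds to the root deployment described above. Note that a default Tomcat install listens on port 8080 unless it has been reconfigured, so you may need to include that port in the URL.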

By default the data will be stored in the temp directory in the Tomcat folder (which, despite its name, shouldn’t disappear during Tomcat or machine startup). This and many other settings can be tweaked, but it’s best to look at the Tomcat documentation.

6.0 Exporting and Skinning

If we have time we will now show you how you can experiment with other Voyant Tools like ResoViz.

  • After loading a text (like http://rss.cbc.ca/lineup/topstories.xml) you can choose the Export button and use “a URL for a different tool/skin and current data” – You can now experiment with different tools not available in the standard skin. Try ResoViz.
  • You can also try a different skin. Experiment with the skins and try the Skin Builder.

7.0 For After the Workshop

Try Voyant on a text or corpus of your own after the workshop.

  1. Find or assemble a text of your own.
  2. Try studying it with Voyant.
  3. Experiment some more with the advanced features like the Exporting to a different skin. Try opening your corpus in the Skin Builder and developing your own skin.
  4. We are developing a version called Voyant Notebooks that has a literate programming interface where you can program with Voyant. This will allow you to keep a notebook of your analysis.

Staying in touch

If you want to be kept up to date on VoyantServer you can:

  • Ask to be added to a Google group to which we will send occasional posts: announcements@voyant-tools.org (Note: this is a broadcast list not a discussion list.)
  • You can follow Voyant on Twitter @VoyantTools

To find and clean texts see:

Finding Texts:

Aggregating and Cleaning Texts:

8.0 Other Tools

What other tools are there out there? See TAPoR 2.0 for a growing list of tools.

Voyant Tools, Teaching Edition, DH2013, Lincoln, Nebraska (July 2013)

0. Introductions (GR – 15′):

Where the instructors and participants introduce themselves and their teaching context.

  • How do you want to use text analysis in your teaching?
  • What do you hope to learn?

1. Brief example of teaching text analysis with Voyant:

Where participants are led through a hands-on introduction to Voyant as if they were students.

Can you guess what text this is?

Introducing Voyant Tools in the Classroom

2. Discussion of example:

Where we discuss how the hands-on example tutorial could work (or not) in an undergraduate class.

3. Models for text analysis:

Where we break into groups that develop models for how they might use text analysis in a course. Some might develop a model for introducing text analysis in a literature course, some in a digital humanities course.

4. Managing the module:

Where we discuss what can go wrong and what learning resources there are.

Questions:

  • What can go wrong when you are teaching text analysis?
  • How can you manage the technology?
  • What learning resources are there?

5. Why bother?:

Where we conclude with a discussion of the place of text analysis in the digital humanities curriculum.

Questions:

  • Why bother teaching students text analysis?
  • Why would students want to learn text analysis?
  • What are the limits to text analysis?
  • What do they need to know to appreciate text analysis?

6. Future directions: Voyant Notebooks

Voyant Notebooks

Text Analysis Methods Grad Workshop

This is a script for a workshop on using Voyant and TAPoR for a graduate class on research methods.

1.0 Introduction

  • Overview
    This workshop will quickly introduce you to computer assisted text analysis using Voyant and TAPoR. Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell and is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. TAPoR is a web site for the discovery and review of text analysis tools including those in Voyant.
  • Outline
    In this workshop we will:

    • First, look at how to use a single Voyant tool, Cirrus, with different texts.
    • Then learn how to use the normal skin of Voyant with a single text and then a corpus.
    • Then learn how to load your own text into Voyant.
    • Finally, we will look at TAPoR where you can find other tools.
  • Help
    Remember that the tools listed in TAPoR, like Voyant, are research tools and will often fail, especially when a whole group of people uses them at once. There are multiple versions available in case one server is down. If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:

  • Voyant Tools

2.0 Preparing a text for a question

The first step in text analysis is to assemble a text to fit your question(s). What do you want to ask about? What sort of text would help you ask questions about an issue? How can you use the internet to build a text?

For this workshop let’s assemble a text off the internet.

  • Decide on some aspect of popular culture or computing culture well documented on the internet.
  • Google keywords associated with the subject you want to study.
  • Skim the results and then develop selection criteria for what you want to scrape.
  • Scrape a set of texts using Google.
  • Copy and paste the texts into a text file. Clean out the navigation information and irrelevant parts.
  • Export a text file for text analysis.
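The cleaning step (removing navigation and other irrelevant parts) can be partly automated when you have saved web pages as HTML. This Python sketch uses the standard library's HTML parser to keep visible text while skipping scripts, styles and common navigation elements. The `TextExtractor` class and the list of skipped tags are assumptions for illustration; real pages often need additional manual cleanup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside the SKIP elements."""

    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many skipped elements we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string as a single line of words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = ("<html><nav><a href='/'>Home</a></nav>"
        "<p>Hello <b>world</b></p><script>var x = 1;</script></html>")
print(html_to_text(page))
```

The resulting plain text can be saved to a file and loaded into Voyant like any other text document.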

For more see Appendix 1: Finding and Preparing an Electronic Text


3.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools.

Go to the Cirrus tool at http://voyant-tools.org/tool/Cirrus and load your text.

There are a number of ways to load a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

 

To learn more about the Cirrus tool go to http://docs.voyant-tools.org/tools/ and scroll down to read about Cirrus. Or go to TAPoR 2.0 and read a review.

You can see Cirrus with a text like “Frankenstein” here: http://bit.ly/VoyantCirrusFrankenstein [temp, main, beta]

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Try It: Try clicking on a word. It will launch a second tab or window with the full Voyant reading environment. That’s what we will look at next.

Try It: Now try other tools in Voyant. Go to http://docs.voyant-tools.org/tools and experiment with the tools. Warning: some of them are prototypes that won’t work that well. Try your text in different tools.


4.0 Using a Reading Skin

Voyant Tools can also be composed into “skins” that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text and an Austen corpus in a simple skin:

Go to Voyant and load your text into the Reading Skin: http://voyant-tools.org

If you want to see a text in the Reading Skin you can look at Frankenstein: http://bit.ly/VoyantFrankensteinStop [tempmainbeta]

For a corpus see Austen (5 novels): http://bit.ly/VoyantAustenStop [tempmainbeta]

To learn about using the full Reading skin you can go to

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.


 

5.0 Other Stuff

Here are some links to other tools, different corpora and skins for specialized tools:


6.0 More Information

Finding Texts:

Aggregating and Cleaning Texts:


7.0 Other Tools

What other tools are there out there? See TAPoR 2.0 for a growing list of tools.

CWRCshop2 (Ryerson): Using Voyant for Analyzing Texts

This is a script for a workshop on using Voyant for the CWRC community.

1.0 Introduction

2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Jane Austen’s Persuasion.

Cirrus (Austen’s Persuasion): http://voyeurtools.org/tool/Cirrus/?corpus=JaneAusten&docIndex=5&stopList=stop.en.taporware.txt&toolFlow=simple (backup)
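The link above is an ordinary URL with query parameters, so similar links can be assembled by hand or with a few lines of code. The sketch below only reuses the parameter names visible in that link (`corpus`, `docIndex`, `stopList`, `toolFlow`); the helper name `tool_url` is made up, and whether the server still responds or accepts other values is an assumption you would need to check.

```python
from urllib.parse import urlencode


def tool_url(base, tool, **params):
    """Build a URL for a single tool from a base address and query parameters."""
    return f"{base.rstrip('/')}/tool/{tool}/?{urlencode(params)}"


# Rebuild the Persuasion link from its parts.
persuasion = tool_url(
    "http://voyeurtools.org",
    "Cirrus",
    corpus="JaneAusten",
    docIndex=5,
    stopList="stop.en.taporware.txt",
    toolFlow="simple",
)
```

Changing `docIndex` in the same call would presumably point the tool at a different document in the corpus.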

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Here are some more Cirrus visualizations to consider:

These types of word clouds are prevalent from academia to advertising – they quickly provide an intriguing representation of a text, as demonstrated by this example of studying gendered language in toy advertising. But their ability to rapidly convey a picture with words comes at the cost of information reduction, and some are highly critical of word clouds as hermeneutical tools. What do you think?

Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.

Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) of the word in that text.
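If you have not met a KWIC before, the idea is simple enough to sketch in a few lines: find each occurrence of a word and show a few words on either side. This is an illustration only and not the code behind the Voyant panel; the function name `kwic` and the default context width are assumptions.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows
```

Each row lines the keyword up in the middle column, which is what makes patterns of use easy to scan.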

3.0 Using a Reading Skin

Voyant Tools can also be composed into “skins” that combine tools as panels so that they can be used interactively. Here is the same Austen corpus in a simple skin:

http://voyeurtools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt (backup)

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

Voyant: http://voyeurtools.org

Just the Cirrus tool in Voyant: http://voyeurtools.org/tool/Cirrus/

Backup version: http://beta.voyant-tools.org/

You will get a panel that asks you for a text. You can:

  • Provide one or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)
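If your corpus is a folder of plain text files, the zipped collection mentioned above can be made with any archive tool. As one option, here is a small sketch using the Python standard library; the function name `zip_texts` and the choice to include only `.txt` files are assumptions for this example.

```python
import zipfile
from pathlib import Path


def zip_texts(folder, archive):
    """Zip every .txt file in folder into archive and return the file names added."""
    files = sorted(Path(folder).glob("*.txt"))
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store only the file name so the archive has no nested folders.
            zf.write(path, arcname=path.name)
    return [path.name for path in files]
```

Calling `zip_texts("austen", "austen.zip")` on a hypothetical folder named `austen` would give you a single file to upload.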

Voyant is forgiving, but there are nonetheless bugs.

Note that you can create a persistent URL for your corpus – that way your link can be shared or bookmarked and you won’t need to reload the texts into Voyant. Click the save icon in the blue bar at the top and the first URL will be the link for your Voyant corpus.

5.0 Other Stuff

CWRCshop: Using Voyant for Analyzing Texts

This is a script for a workshop on using Voyant for the CWRC community. Please note that URLs and resources may no longer be available.

1.0 Introduction

  • The workshop leaders will introduce themselves:
    • Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca, http://www.geoffreyrockwell.com
    • Susan Brown, University of Alberta, University of Guelph, sbrown (at) uoguelph (dot) ca
  • Overview
    Voyant is currently a beta release by Stéfan Sinclair and Geoffrey
    Rockwell. It was previously called “Voyeur” so do not be confused if that name is used. Voyant is the next generation in a series of text analysis
    tools that include HyperPo and TAPoRware. It provides tables and graphs
    related to word use across a single document or a collection. Voyant
    adds, among other things, the ability to handle much larger files than
    the previous tools could.
  • Outline
    In this workshop we will:

    • First, look at how to use a single Voyant tool, Cirrus, with a small corpus of Austen texts.
    • Then learn how to use the normal skin of Voyant with a single text.
    • Finally, show how to load your own text into Voyant.
  • Now make sure you can connect to the wireless.
  • Help
    If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:

2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley’s Frankenstein. Click on this link to open.

Cirrus (Frankenstein): http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1317355585427.2492&stopList=stop.en.taporware.txt

For a backup, go here: http://voyeur.hermeneuti.ca/tool/Cirrus/ and enter the text URL http://www.gutenberg.org/cache/epub/84/pg84.txt

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.

Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) of the word in that text.

3.0 Using a Reading Skin

Voyant Tools can also be composed into “skins” that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text in a simple skin:

Frankenstein: http://dev.voyeurtools.org:8080/?corpus=1317355585427.2492&skin=simple&event=corpusTypeSelected

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

Voyant: http://voyeurtools.org

Just the Cirrus tool in Voyant: http://voyeurtools.org/tool/Cirrus/

Backup older version: http://voyeur.hermeneuti.ca

You will get a panel that asks you for a text. You can:

  • Provide one or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

Voyant is forgiving, but there are nonetheless bugs.

5.0 Other Stuff

Here are some corpora and skins: