Welcome to iscan-server’s documentation!¶
Introduction¶
If you are a new ISCAN user, please go directly to ISCAN Tutorials.
ISCAN (Integrated Speech Corpus ANalysis) is a tool to manage and analyze corpora on a single server with support for multiple clients enriching and querying the corpora as needed. The ISCAN server contains tools to install/start/stop the databases that PolyglotDB (the Python API) uses.
Additionally, the ISCAN server is used to manage multiple different Polyglot databases, so users can isolate corpora as needed, and start and stop databases as they need access to them.
ISCAN Tutorials¶
ISCAN takes you from a raw speech corpus to a data file (CSV) ready for further analysis (e.g. in R). Conceptually, this consists of a pipeline of four steps:
- Importing the corpus into ISCAN
- Result: a structured database of linguistic objects (words, phones, sound files).
- Enriching the database
- Result: Further linguistic objects (utterances, syllables), and information about objects (e.g. speech rate, word frequencies).
- Querying the database
- Result: A set of linguistic objects of interest (e.g. utterance-final word-initial syllables).
- Exporting the results
- Result: A CSV file containing information about the set of objects of interest
Preliminaries¶
Access¶
Before you can begin the tutorial, you will need login access to the ISCAN server via your web browser. To log in to the McGill ISCAN server, visit https://roquefort.linguistics.mcgill.ca, press the ‘Log in’ button at the top right of the screen, and enter the username and password provided.
NWAV 2018 Workshop: the workshop organizers will give you a username and password.
After NWAV 2018 Workshop: Please contact savanna.willerton@mail.mcgill.ca to request access to one of the ISCAN tutorial accounts.
Questions, Bugs, Suggestions¶
If at any point while using ISCAN you get stuck, have a question, encounter a bug (like a button which doesn’t work), or you see some way in which you believe the user interface could be improved to make usage more clear/smooth/straightforward/etc, then please file an issue on the ISCAN GitHub repository.
There is also a Slack channel you can join if you have quick questions or would like real-time help. Please contact savanna.willerton@mail.mcgill.ca for access.
Dataset¶
These tutorials use a tutorial corpus, which is (as of Oct 15, 2018) a small subset of ICE-Canada containing speech from all “S2A” files (two male Canadian English speakers). These files can be downloaded from the ICE Canada site, but doing so is not necessary for these tutorials!
Tutorial 1: Polysyllabic shortening¶
Motivation¶
Polysyllabic shortening refers to the “same” rhythmic unit (syllable or vowel) becoming shorter as the size of the containing domain (word or prosodic domain) increases. Two classic examples:
- English: stick, sticky, stickiness (Lehiste, 1972)
- French: pâte, pâté, pâtisserie (Grammont, 1914)
Polysyllabic shortening is often – but not always – defined as being restricted to accented syllables (as in the English example, but not the French one). Using ISCAN, we can check whether a simple version of polysyllabic shortening holds in the tutorial corpus, namely:
- Considering all utterance-final words, does the initial vowel duration decrease as word length increases?
Step 1: Import¶
This tutorial will use the tutorial corpus available to you under the title ‘iscan-tutorial-X’ (where X is a number). The data for this corpus was force-aligned using the Montreal Forced Aligner, producing one Praat TextGrid per sound file with word and phone boundaries. These files are stored on a remote server, so you do not need to upload any audio or TextGrid files.
The first step of this analysis is to create a PolyglotDB representation of the corpus suitable for analysis. This is performed in two steps:
- Importing the dataset into ISCAN, using the phone, word, and speaker information contained in the corpus
- Enriching the dataset with additional linguistic objects (e.g., syllables, utterances), as well as properties of these objects (e.g., speech rate)
To import the corpus into ISCAN, select the ‘iscan-tutorial-x’ corpus (replacing “x” with the number you’re using) from the dropdown menu under the ‘Corpora’ tab in the navigation bar. Next, click the ‘Import’ button. This will import the corpus into ISCAN and return a structured database of objects (words, phones, and sound files) that will be interacted with in the following steps.
Step 2: Enrichment¶
Now that the corpus has been imported as a database, it is necessary to enrich the database with information about linguistic objects, such as word frequency, speech rate, and vowel duration. See the Enrichment page to learn more about what enrichments are possible; in this tutorial we will use just a subset.
First, select the ‘iscan-tutorial-x’ under the ‘Corpora’ menu, which presents all of the current information available for this specific corpus. To start enrichment, click the ‘create, edit, and run enrichments’ button. This page is referred to as the Enrichment view. At first, this page will contain an empty table - as enrichments are added, this table will be populated to include each of these enrichment objects. On the right hand side of the page are a list of new enrichments that can be created for this database.
Here, we will walk through each enrichment that is necessary for examining vowel duration to address our question (“Considering all utterance final…”).
Syllables
Syllables are encoded in two steps. First, the set of syllabic segments in the phonological inventory has to be specified. To encode the syllabic segments:
- Select ‘Phone Subset’ button under the ‘Create New Enrichments’ header
- Select the ‘Select Syllabics’ preset option
- Name the environment ‘syllabics’
- Select ‘Save subset’
This will return you to the Enrichment view page. Here, press the ‘Run’ button listed under ‘Actions’. Once syllabic segments have been encoded as such, you can encode the syllables themselves.
- Under the ‘Annotation levels’ header, press the ‘Syllables’ button
- Select Max Onset from the Algorithm dropdown menu
- Select syllabics from the Phone Subset menu
- Name the enrichment ‘syllables’
- Select ‘Save enrichment’
Upon return to the Enrichment view, hit ‘Run’ on the new addition to the table.
Speakers
To enrich the database with speaker information:
- Select the ‘Properties from a CSV’ option
- Select ‘Speaker CSV’ from the ‘Analysis’ dropdown menu. The CSV for speaker information is available for download.
- Upload the tutorial corpus ‘speaker_info.csv’ file from your local machine.
- Select ‘Save Enrichment’ and then ‘Run’ from the Enrichment view.
Lexicon
As with the speaker information, lexical information is uploaded in an analogous way. Download the Lexicon CSV for the tutorial corpus, select ‘Lexicon CSV’ from the dropdown menu, upload the file, save the enrichment, and run it.
Utterances
For our purposes, we define an utterance as a stretch of speech bounded by pauses, so we must specify the minimum duration of pause that separates utterances (150 ms is typically a good default).
First, select ‘pauses’ from ‘Annotation levels’, and select ‘<SIL>’ as the unit representing pauses. As before, select ‘Save enrichment’ and then ‘run’.
With the positions of pauses encoded, we are now able to encode information about utterances:
- Under the ‘Annotation levels’ header, select ‘utterances’.
- Name the new addition ‘utterance’
- Enter 150 in the box next to ‘Utterance gap (ms)’
- Select ‘Save enrichment’, and then ‘Run’ in the Enrichment view.
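The pause-based segmentation that this enrichment performs can be sketched in R on toy data (column names here are illustrative, not ISCAN’s internal schema):

```r
# Toy sketch of pause-based utterance segmentation (not ISCAN's actual code).
# A new utterance starts whenever the silence since the previous word
# exceeds the threshold (150 ms).
words <- data.frame(
  label = c("so", "anyway", "right", "okay", "then"),
  begin = c(0.00, 0.35, 0.80, 1.50, 1.85),
  end   = c(0.30, 0.75, 1.10, 1.80, 2.20)
)
gap_threshold <- 0.150  # seconds
# Gap before each word; the first word always opens a new utterance.
gaps <- c(Inf, words$begin[-1] - words$end[-nrow(words)])
words$utterance_id <- cumsum(gaps > gap_threshold)
words$utterance_id  # 1 1 1 2 2: the 0.40 s gap splits the utterances
```

With these toy timings, the first three words form one utterance and the last two form a second, because only the gap before “okay” (0.40 s) exceeds the 150 ms threshold.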
Speech rate
To encode speech rate information, select ‘Hierarchical property’ from the Enrichment view. This mode allows you to encode rates, counts, or positions based on the corpus hierarchy (e.g., the number of syllables in a word, or syllables per second within an utterance). Here, select the following attributes:
- Enter “speech_rate” as the property name
- From the Property type menu, select rate
- From the Higher annotation menu, select utterance
- From the Lower annotation menu, select syllable
And then, as with previous enrichments, select ‘Save enrichment’ and then run.
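What a rate property computes can be sketched on toy data: the count of lower annotations (syllables) divided by the duration of the higher annotation (utterance). Column names here are illustrative:

```r
# Toy sketch of the speech_rate hierarchical property:
# rate = count of lower annotations / duration of higher annotation.
utterances <- data.frame(
  utterance_id  = c("u1", "u2"),
  begin         = c(0.0, 5.0),
  end           = c(4.0, 7.5),
  num_syllables = c(16, 5)
)
utterances$speech_rate <-
  utterances$num_syllables / (utterances$end - utterances$begin)
utterances$speech_rate  # 4.0 and 2.0 syllables per second
```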
Stress
Finally, to encode the stress position within each word:
- Select ‘Stress from word property’ from the Enrichment view menu.
- From the ‘wordproperty’ dropdown box, select ‘stress_pattern’.
- Select ‘Save enrichment’ and run the enrichment in the Enrichment view.
Step 3: Query¶
Now that the database has been enriched with all of the properties necessary for analysis, we can construct a query. Queries enable us to search the database for a particular set of linguistic objects of interest.
First, return to the Corpus Summary view by selecting ‘iscan-tutorial-x’ from the top navigation header. In this view, there is a series of property categories which you can navigate through to add filters to your search.
In this case, we want to make a query for:
- Word-initial stressed syllables
- only in words at the end of utterances (fixed prosodic position)
Here, find the section titled ‘Syllables’ and select ‘New Query’. To make sure we select the correctly positioned syllables, apply the following filters:
Under syllable properties:
- Left aligned with: word
- Under ‘add filter’, select ‘stress’ from the dropdown box, and enter ‘1’ in the text box. This will only select syllables with primary stress in this position.
Under word properties:
- Right aligned with: utterance
Warning
Note that if right alignment with utterances were specified for syllables (rather than words) in this query, the query would inadvertently be restricted to monosyllabic words: aligning a syllable with a higher linguistic type (in this case, utterances) implicitly aligns it with every intermediate linguistic type (in this case, words).
Provide a name for this query (e.g., ‘syllable_duration’) and select ‘Save and run query’.
Step 4: Export¶
This query has found all word-initial stressed syllables for words in utterance-final position. We now want to export information about these linguistic objects to a CSV file. We want it to contain everything we need to examine how vowel duration (in seconds) depends on word length. Here we may check all boxes which will be relevant to our later analysis to add these columns to our CSV file. The preview at the bottom of the page will be updated as we select new boxes:
Under the SYLLABLE label, select:
- label
- duration
Under the WORD label, select:
- label
- begin
- end
- num_syllables
- stress_pattern
Under the UTTERANCES label, select:
- speech_rate
Under the SPEAKER label, select:
- name
Under the SOUND FILE label, select:
- name
Once you have checked all relevant boxes, select ‘Export to CSV’. Your results will be exported to a CSV file on your computer. The name will be the one you chose to save plus “export.csv”. In our case, the resulting file will be called “syllable_duration export.csv”.
Examining & analysing the data¶
Now that the CSV has been exported, it can be analyzed to address whether polysyllabic shortening holds in the tutorial corpus. This part does not involve ISCAN, so it’s not necessary to actually carry out the steps below unless you want to (and have R installed and are familiar with using it).
In R, load the data as follows:
library(tidyverse)
dur <- read.csv('syllable_duration export.csv')
You may need to first install the tidyverse library using install.packages('tidyverse'). If you are unable to install tidyverse, you may use library(ggplot2) instead (note: if you do this, please use subset() instead of filter() for the remaining steps).
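For example, a base-R subset() call keeps the same rows as the dplyr filter() calls used below, with no extra packages required (toy data):

```r
# subset() is base R, so it works without the tidyverse installed.
dur_toy <- data.frame(word_num_syllables = c(1, 2, 3, 4))
kept <- subset(dur_toy, word_num_syllables < 4)
nrow(kept)  # 3 rows remain
```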
First, by checking how many word (types) there are for each number of syllables in the CSV, we can see that only 1 word has 4 syllables:
group_by(dur, word_num_syllables) %>% summarise(n_distinct(word_label))
# word_num_syllables `n_distinct(word_label)`
# <int> <int>
# 1 1 109
# 2 2 34
# 3 3 7
# 4 4 1
We remove the word with 4 syllables, since we can’t generalize based on one word type:
dur <- filter(dur, word_num_syllables < 4)
Similarly, it is worth checking the distribution of syllable durations to see if there are any extreme values:
ggplot(dur, aes(x = syllable_duration)) +
geom_histogram() +
xlab("Syllable duration")

As we can see here, there is one observation which appears to be an outlier, perhaps the result of pragmatic lengthening or an alignment error. To exclude it from the analysis:
dur <- filter(dur, syllable_duration < 0.6)
Plot the duration of the initial stressed syllable as a function of word length (in syllables):
ggplot(dur, aes(x = factor(word_num_syllables), y = syllable_duration)) +
geom_boxplot() +
xlab("Number of syllables") + ylab("Syllable duration") +
scale_y_sqrt()

Here we can see a consistent shortening effect: the more syllables in a word, the shorter its initial stressed syllable becomes.
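One way to quantify this trend is a simple linear regression of syllable duration on word length. The sketch below uses simulated data with a built-in shortening effect; with the real export, you would fit the same model to the dur data frame loaded above:

```r
# Simulated data with a built-in shortening effect of -30 ms per syllable.
set.seed(1)
dur_sim <- data.frame(word_num_syllables = rep(1:3, each = 20))
dur_sim$syllable_duration <- 0.25 -
  0.03 * dur_sim$word_num_syllables +
  rnorm(nrow(dur_sim), sd = 0.01)
m <- lm(syllable_duration ~ word_num_syllables, data = dur_sim)
coef(m)["word_num_syllables"]  # negative under polysyllabic shortening
```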
Tutorial 2: Vowel formants¶
Vowel quality is well known to vary according to a range of social and linguistic factors (Labov, 2001). The precursor to any sociolinguistic and/or phonetic analysis of acoustic vowel quality is the successful identification, measurement, extraction, and visualization of the particular vowel variables for the speakers under consideration. It is often useful to consider vowels in terms of their overall patterning together in the acoustic vowel space.
In this tutorial, we will use ISCAN to measure the first and second formants for the two speakers in the tutorial corpus, for the following vowels (keywords after Wells, 1982): FLEECE, KIT, FACE, DRESS, TRAP/BATH, PRICE, MOUTH, STRUT, NURSE, LOT/PALM, CLOTH/THOUGHT, CHOICE, GOAT, FOOT, GOOSE. We will only consider vowels whose duration is longer than 50ms, to avoid including reduced tokens. This tutorial assumes you have completed the import and enrichment sections from the previous tutorial, and so will only include the information specific to analysing formants.
Step 1: Enrichment¶
Stressed vowels
First, the set of stressed vowels in the phonological inventory has to be specified. To encode these:
- Select ‘Phone Subset’ button under the ‘Create New Enrichments’ header
- Select the ‘Select Stressed Vowels’ preset option
- Name the environment ‘stressed_vowels’
- Select ‘Save subset’
This will return you to the Enrichment view page. Here, press the ‘Run’ button listed under ‘Actions’.
Acoustics
Now we will compute vowel formants for all stressed syllables using an algorithm similar to FAVE.
For this last section, you will need a vowel prototype file. For the purposes of this tutorial, the file for the tutorial corpus is provided here:
Please save the file to your computer.
From the Enrichment View, under the ‘Acoustics’ header, select ‘Formant Points’. As usual, this will bring you to a new page. From the Phone class menu, select stressed_vowels. Using the ‘Choose Vowel Prototypes CSV’ button, upload the ICECAN_prototypes.csv file you saved. For Number of iterations, type 3, and for Min Duration (ms), type 50.
Finally, hit the ‘Save enrichment’ button. Then click ‘Run’ from the Enrichment View.
Step 2: Query¶
The next step is to search the dataset to find a set of linguistic objects of interest. In our case, we’re looking for all stressed vowels, and we will get formants for each of these. Let’s see how to do this using the Query view.
First, return to the ‘iscan-tutorial-x’ Corpus Summary view, then navigate to the ‘Phones’ section and select New Query. This will take you to a new page, called the Query view, where we can put together and execute searches. In this view, there is a series of property categories which you can navigate through to add filters to your search. Under ‘Phone Properties’, there is a dropdown menu with search options labelled ‘Subset’. Select ‘stressed_vowels’. You may select ‘Add filter’ if you would like to see more options to narrow down your search.
The selected filter settings will be saved for further use. The query will automatically be saved as ‘New phone query’, but let’s change that to something more memorable, say ‘Tutorial Formants’. When you are done, click the ‘Save and run query’ button. The search may take a while, especially for large datasets, but should not take more than a couple of minutes for the small subset of the ICE-Can corpus we’re using for the tutorials.
Step 3: Export¶
Now that we have made our query and extracted the set of objects of interest, we’ll want to export this to a CSV file for later use and further analysis (e.g. in R, MATLAB, etc.)
Once you hit ‘Save and run query’, your search results will appear below the search window. Since we queried for all stressed vowels, a long list of phone tokens (every occurrence of a stressed vowel in the dataset) should now be visible. This list of objects may not be useful to our research without some further information, so let’s select what information will be visible in the resulting CSV file using the window next to the search view.
Here we may check all boxes which will be relevant to our later analysis to add these columns to our CSV file. The preview at the bottom of the page will be updated as we select new boxes:
- Under the Phone header, select:
- label
- begin
- end
- F1
- F2
- F3
- B1 (The bandwidth of Formant 1)
- B2 (The bandwidth of Formant 2)
- B3 (The bandwidth of Formant 3)
- num_formants
- Under the Syllable header, select:
- stress
- position_in_word
- Under the Word header, select:
- label
- stress_pattern
- Under the Utterance header, select:
- speech_rate
- Under the Speaker header, select:
- name
- Under the Sound File header, select:
- name
Once you have checked all relevant boxes, select ‘Export to CSV’. Your results will be exported to a CSV file on your computer. The name will be the one you chose to save plus “export.csv”. In our case, the resulting file will be called “Tutorial Formants export.csv”.
Step 4. Examining & analysing the data¶
With the tutorial complete, we should now have a CSV file saved on our personal machine containing information about the set of objects we queried for and all other relevant information.
We now examine this data. This part doesn’t use ISCAN, so it’s not necessary to actually carry out the steps below unless you want to.
In R, load the data as follows:
library(tidyverse)
v <- read.csv("Tutorial Formants export.csv")
Rename the variable containing the vowel labels to ‘Vowel’, and reorder the vowels so that they pattern according to usual representation in acoustic/auditory vowel space:
v$Vowel <- v$phone_label
v$Vowel = factor(v$Vowel, levels = c('IY1', 'IH1', 'EY1', 'EH1', 'AE1', 'AY1','AW1', 'AH1', 'ER1', 'AA1', 'AO1', 'OY1', 'OW1', 'UH1', 'UW1'))
Plot the vowels for the two speakers in this sound file:
ggplot(v, aes(x = phone_F2, y = phone_F1, color=Vowel)) +
geom_point() +
facet_wrap(~speaker_name) +
scale_colour_hue(labels = c("FLEECE", "KIT", "FACE", "DRESS", "TRAP/BATH", "PRICE", "MOUTH", "STRUT", "NURSE", "LOT/PALM", "CLOTH/THOUGHT", "CHOICE", "GOAT", "FOOT", "GOOSE")) +
scale_y_reverse() +
scale_x_reverse() +
xlab("F2(Hz)") +
ylab("F1(Hz)")

Tutorial 3: Sibilants¶
Sibilants, and in particular, /s/, have been observed to show interesting sociolinguistic variation according to a range of intersecting factors, including gendered, class, and ethnic identities (Stuart-Smith, 2007; Levon, Maegaard and Pharao, 2017). Sibilants - /s ʃ z ʒ/ - also show systematic variation according to place of articulation (Johnson, 2003). Alveolar fricatives /s z/ as in send, zen, are formed as a jet of air is forced through a narrow constriction between the tongue tip/blade held close to the alveolar ridge, and the air strikes the upper teeth as it escapes, resulting in high pitched friction. The post-alveolar fricatives /ʃ ʒ/, as in ‘sheet’, ‘Asia’, have a more retracted constriction, the cavity in front of the constriction is a bit longer/bigger, and the pitch is correspondingly lower. In many varieties of English, the post-alveolar fricatives also have some lip-rounding, reducing the pitch further.
Acoustically, sibilants show a spectral ‘mountain’ profile, with peaks and troughs reflecting the resonances of the cavities formed by the articulators (Jesus and Shadle, 2002). The frequency of the main spectral peak, and/or main area of acoustic energy (Centre of Gravity), corresponds quite well to shifts in place of articulation, including quite fine-grained differences, such as those which are interesting for sociolinguistic analysis: alveolars show higher frequencies, more retracted post-alveolars show lower frequencies.
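The centre of gravity is simply the amplitude-weighted mean frequency of the spectrum; a toy illustration (the frequency bins and amplitudes here are invented for the example):

```r
# Toy spectrum: four frequency bins with linear amplitudes.
freqs <- c(2000, 4000, 6000, 8000)  # Hz
amps  <- c(0.1, 0.2, 0.4, 0.3)
cog <- sum(freqs * amps) / sum(amps)
cog  # ~5800 Hz: energy concentrated in the higher frequencies
```

A more retracted (post-alveolar) constriction shifts energy toward the lower bins, lowering this weighted mean.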
As with the previous tutorials, we will use ISCAN to select all sibilants from the imported sound file for the two speakers in the tutorial corpus, and take a set of acoustic spectral measures including spectral peak, which we will then export as a CSV, for inspection.
Step 1: Enrichment¶
It is not necessary to re-enrich the corpus with the elements from the previous tutorials, so here we will only include the enrichments necessary to analyse sibilants.
Sibilants
Start by looking at the options under ‘Create New Enrichments’ and press the ‘Phone Subset’ button under the ‘Subsets’ header. Here we select and name subsets of phones. To search for sibilants, we have two options for this corpus:
- For our subset of ICE-Can we have the option to press the pre-set button ‘Select sibilants’.
- For some datasets the ‘Select sibilants’ button will not be available. In this case you may manually select a subset of phones of interest.
Then choose a name for the subset (in this case ‘sibilants’ will be filled in automatically) and click ‘Save subset’. This will return you to the Enrichment view where you will see the new enrichment in your table. In this view, press ‘Run’ under ‘Actions’.
Acoustics
For this section, you will need a special Praat script, saved in the MontrealCorpusTools/SPADE GitHub repository, which takes a few spectral measures (including peak and spectral slope) for a given segment of speech. With this script, ISCAN will take these measures for each sibilant in the corpus. A link is provided below; please save the sibilant_jane_optimized.praat file to your computer: Praat script
From the Enrichment View, press the ‘Custom Praat Script’ button under the ‘Acoustics’ header. As usual, this will bring you to a new page. First, upload the saved file ‘sibilant_jane_optimized.praat’ from your computer using the ‘Choose Praat Script’ button. Under the Phone class dropdown menu, select sibilants.
Finally, hit the ‘Save enrichment’ button, and ‘Run’ from the Enrichment View.
Hierarchical Properties
Next, from the Enrichment View press the ‘Hierarchical property’ button under ‘Annotation properties’ header. This will bring you to a page with four drop down menus (Higher linguistic type, Lower linguistic type, Subset of lower linguistic type, and Property type) where we can encode speech rates, number of syllables in a word, and phone position.
While adding each enrichment below, remember to choose an appropriate name for the enrichment, hit the ‘save enrichment’ button, and then click ‘Run’ in the Enrichment View.
Syllable Count 1 (Number of Syllables in a Word)
- Enter “num_syllables” as the property name
- From the Property type menu, select count
- From the Higher annotation menu, select word
- From the Lower annotation menu, select syllable
Syllable Count 2 (Number of Syllables in an Utterance)
- Enter “num_syllables” as the property name
- From the Property type menu, select count
- From the Higher annotation menu, select utterance
- From the Lower annotation menu, select syllable
Phone Count (Number of Phones per Word)
- Enter “num_phones” as the property name
- From the Property type menu, select count
- From the Higher annotation menu, select word
- From the Lower annotation menu, select phone
Word Count (Number of Words in an Utterance)
- Enter “num_words” as the property name
- From the Property type menu, select count
- From the Higher annotation menu, select utterance
- From the Lower annotation menu, select word
Phone Position
- Enter “position_in_syllable” as the property name
- From the Property type menu, select position
- From the Higher annotation menu, select syllable
- From the Lower annotation menu, select phone
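What these count and position enrichments compute can be sketched on toy data (illustrative column names; ISCAN computes these internally over the corpus hierarchy):

```r
# Toy sketch of hierarchical "count" and "position" properties.
phones <- data.frame(
  syllable_id = c("s1", "s1", "s1", "s2", "s2"),
  phone       = c("S", "T", "AA1", "P", "AH0")
)
# count: number of phones in each syllable
counts <- tapply(phones$phone, phones$syllable_id, length)
counts  # s1 = 3, s2 = 2
# position: 1-based index of each phone within its syllable
phones$position_in_syllable <- ave(seq_len(nrow(phones)),
                                   phones$syllable_id,
                                   FUN = seq_along)
phones$position_in_syllable  # 1 2 3 1 2
```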
Step 2: Query¶
The next step is to search the dataset to find a set of linguistic objects of interest. In our case, we’re looking for all sibilants. Let’s see how to do this using the Query view.
First, return to the ‘iscan-tutorial-X’ Corpus Summary view, then navigate to the ‘Phones’ section and select New Query. This will take you to a new page, called the Query view, where we can put together and execute searches. In this view, there is a series of property categories which you can navigate through to add filters to your search. Under ‘Phone Properties’, there is a dropdown menu labelled ‘Subset’. Select ‘sibilants’. You may select ‘Add filter’ if you would like to see more options to narrow down your search.

The selected filter settings will be saved for further use. It will automatically be saved as ‘New phone query’, but let’s change that to something more memorable, say ‘SibilantsTutorial’. When you are done, click the ‘Run query’ button. The search may take a while, especially for large datasets.
Step 3: Export¶
Now that we have made our query and extracted the set of objects of interest, we’ll want to export this to a CSV file for later use and further analysis (e.g. in R, MATLAB, etc.)
Once you hit ‘Run query’, your search results will appear below the search window. Since we queried for all sibilants, a long list of phone tokens (every occurrence of a sibilant in the dataset) should now be visible. This list of sibilants may not be useful to our research without some further information, so let’s select what information will be visible in the resulting CSV file using the window next to the search view. Here we may check all boxes which will be relevant to our later analysis to add these columns to our CSV file. The preview at the bottom of the page will be updated as we select new boxes:

- Under the Phone header, select:
- label
- begin
- end
- cog
- peak
- slope
- spread
- Under the Syllable header, select:
- stress
- Under the Word header, select:
- label
- Under the Utterance header, select:
- speech_rate
- Under the Speaker header, select:
- name
- Under the Sound File header, select:
- name
Once you have checked all relevant boxes, click the ‘Export to CSV’ button. Your results will be exported to a CSV file on your computer. The name will be the one you chose to save for the Query plus “export.csv”. In our case, the resulting file will be called “SibilantsTutorial export.csv”.
Step 4: Examining & analysing the data¶
With the tutorial complete, we should now have a CSV file saved on our personal machine containing information about the set of objects we queried for and all other relevant information.
We now examine this data. This part doesn’t use ISCAN, so it’s not necessary to actually carry out the steps below unless you want to.
First, open the CSV in R:
s <- read.csv("SibilantsTutorial export.csv")
Check that the sibilants have been exported correctly:
levels(s$phone_label)
Change the column name to ‘sibilant’:
s$sibilant <- s$phone_label
Check the counts for the different voiceless/voiced sibilants - /ʒ/ is rare!
summary(s$sibilant)
# S SH Z ZH
# 2268 187 1296 3
Reorder the sibilants into a more intuitive order (alveolars then post-alveolars):
s$sibilant <- factor(s$sibilant, levels = c('S', 'Z', 'SH', 'ZH'))
Finally, plot the sibilants for the two speakers:
ggplot(s, aes(x = factor(sibilant), y = phone_peak)) +
geom_boxplot() +
xlab("Sibilant") +
ylab("Spectral Peak (Hz)") +
scale_y_sqrt() +
facet_wrap(~speaker_name)

Tutorial 4: Custom scripts¶
Often in studies it is necessary to perform highly specialized analyses. As ISCAN can’t possibly provide every single analysis that anyone could ever want, there is a way to perform analyses outside of ISCAN and then bring the results back in. This is the purpose of the ‘Custom Properties from a Query-generated CSV’ enrichment. Using it is relatively straightforward, although it requires some preliminary steps to get the data in the right format. It also requires access to the original sound files of a corpus if you wish to use these in your analysis.
In this tutorial we will be using an R script, but you can use any script or software that you so choose.
Step 1: Necessary Enrichments¶
The only necessary enrichment to do in this tutorial is to encode a sibilant subset of phones. To do this, start at the ‘iscan-tutorial-X’ corpus summary view, and click on the ‘Create, run and edit enrichments’ button in the centre column. Then, click on ‘Phone Subset’.
At the enrichment page, click on the select sibilants button, then name the subset ‘sibilants’ and save the subset.
Finally, at the main enrichment page, run the sibilant enrichment.
Step 2: Running a phone query¶
Now that we have all the enrichments we need, we can go to the Query View.
Starting at the ‘iscan-tutorial-X’ corpus summary view, navigate to the phones section of the left-most column and click ‘New Phone Query’. From there, you’ll have the option to choose various filters to select a subset of phones. For this tutorial we’re looking at sibilants, so all you need to do is select the sibilants subset from the first drop-down menu in the centre menu.
Feel free to also re-name the query to anything you’d like, for example ‘sibilant ID query’. From there, click on ‘Run query’ and wait for the query to finish.
Step 3: Exporting phone IDs¶
Once the query has finished, a new pane will appear to the right of the window. This pane will contain a list of different properties of the phones found, and properties of the syllables, words, and utterances that a phone is in. These are the columns that will be included in the CSV that you will download from ISCAN.
For the script, we will need a couple different columns.
- Under the Phone header, select:
- label
- begin
- end
- id
- Under the Sound File header, select:
- name
Once all these columns have been selected, click the “Generate CSV Export File” in the row of buttons in the centre of the screen, above the results of the query. This may take a second or two to run, then once it’s available, click on the “Save CSV export file” and save the file somewhere convenient on your computer.
An important thing to note for this section: while you can rename columns for export, you should not rename the ID column if you intend to import this CSV later. By default, a phone ID column will be labeled “phone_id”. When importing, ISCAN looks for a column that ends with “_id” and uses the first part of that column’s name to know what these IDs represent (in this case, phones). You also should not have multiple ID columns in your import CSV; if you do, ISCAN will use only the first ID column.
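The “_id” naming convention can be illustrated in a few lines of R (a sketch of the matching rule as described above, not ISCAN’s actual code; the token IDs are invented):

```r
# A minimal export with one "_id" column.
exported <- data.frame(
  phone_label = c("S", "Z"),
  phone_id    = c("token-001", "token-002"),
  stringsAsFactors = FALSE
)
# The first column ending in "_id" determines the annotation type.
id_col <- grep("_id$", names(exported), value = TRUE)[1]
annotation_type <- sub("_id$", "", id_col)
annotation_type  # "phone"
```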
Step 4: Running the R script¶
The script and associated files can be downloaded here. This script estimates spectral features in R (Reidy, 2015).
In order to get this script running on your computer, you will have to make a few minor edits once you have extracted the ZIP file. Open up ‘iscan-token-spectral-demo.R’ in your text editor.
At the top of the file, two file paths are defined. Change ‘sound_file_directory’ to the path where you have the sound files of the tutorial corpus. Then, change ‘corpus_directory’ to the path of the CSV that you downloaded from ISCAN. The example uses a relative path, but you can of course make it absolute.
sound_file_directory <- "/home/michael/Documents/Work/test_corpus"
corpus_data <- read.csv("../sibilants_export.csv")
This script also assumes you have not renamed any of the columns that you exported. If you did change any columns’ name, you will have to look through the script and change the following lines to the names of the corresponding columns.
sound_file <- paste(corpus_data[row, "sound_file_name"], '.wav', sep="")
begin <- corpus_data[row, "phone_begin"]
end <- corpus_data[row, "phone_end"]
Finally, you can run the script in R, and it will create a new CSV file, ‘spectral_sibilants.csv’ that we will import to ISCAN.
Step 5: Importing the CSV¶
Back in ISCAN, go to the enrichments page for your tutorial corpus.
Under ‘Annotation properties’, click on the ‘Custom Properties from a Query-generated CSV’ button.
From this page, click on the “browse” button and navigate to the ‘spectral_sibilants.csv’ generated in the last step. Select it for upload, then click on “Upload CSV”. This may take a second or two, so be patient.
After this, a new list of properties will appear which come from the columns of the CSV. Scroll down, and select all the features that start with ‘spectral’.
Then, click ‘save enrichment’, and from the main enrichment page, run the enrichment labelled ‘Enrich phone from “spectral_sibilants.csv”’.
Now you’re done! ISCAN will now have all of the values calculated by the R script associated with all the sibilants in the corpus. You can test this out by going to the phone query you created earlier. You should see all these new properties in the column selection pane, although you may need to click “Refresh Query” before the values appear.
Enrichment View¶
Databases can be enriched by encoding various elements. Usually, the database starts off with just words and phones, but by adding enrichments a diverse range of information will become available to the user for searching with the Query View later. All enrichments are added in the Enrichment View. Here are some details about this View.
Actions Once an enrichment has been saved, a number of possible actions become available for it. The action buttons appear as a column in the table at the top of the Enrichment View
- Run If the enrichment is runnable, the user may run it to apply this enrichment to the corpus
- Reset The user may reset an enrichment to remove changes added to the corpus by running that enrichment. The state of the database will be as it was before running this enrichment
- Edit The user may make changes to the saved enrichment. For the changes to be applied to the corpus, hit Run again after editing
- Delete This has the same effect as Reset but also removes this enrichment from the Enrichment View. To re-run a deleted enrichment, the user must start over and create a new enrichment
Phone subset Here the user may encode a subset of phones, such as vowels, stops, or sibilants, by selecting from the table which phone labels to include. All phones belonging to this subset will be labeled as such in the corpus
Hierarchical property This section allows the user to encode properties involving two levels, such as number of syllables in each utterance, or rate of syllables per second
- Higher annotation Upper level property in hierarchy
- Lower annotation Lower level property in hierarchy
- Property type The target property, such as rate, count, or position
Example: To encode the number of phones per word, set Higher annotation to word, Lower annotation to phones, and Property type to count.
Stress from word property Enriches syllables with stress information (‘stressed’ or ‘unstressed’, coded as ‘1’ or ‘0’) via a listing for each word in the lexical enrichment CSV
Properties from a CSV Here the user can import information to the corpus using CSV files saved locally
- Lexicon CSV Allows the user to assign certain properties to specific words using a CSV file. For example, the user might want to encode word frequency. This can be done with a comma-delimited text file that has words in one column and the corresponding frequencies in another.
- Speaker CSV Allows the user to enrich speakers with information by adding speaker metadata, such as sex and age, from a CSV
- Phone CSV Allows the user to add certain helpful features to phonological properties. For example, adding ‘fricative’ to ‘manner_of_articulation’ for some phones
- Sound File CSV Sound file properties may include notes about noise, recording environment, etc. Not relevant for ISCAN
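As an illustration of the Lexicon CSV described above, the following sketch writes a minimal word-frequency file. The column names and frequency values here are hypothetical; the point is the layout of one word per row with its property alongside:

```python
import csv

# Hypothetical lexicon CSV: words in one column, frequencies in another.
rows = [("word", "frequency"), ("the", 69971), ("of", 36411), ("sibilant", 3)]

with open("lexicon.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# ISCAN can then match the word column against word labels in the corpus
# and attach the frequency values as a new word property.
```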
Acoustics Here the user may encode pitch, formants, intensity, FAVE-style point formants, and more into the corpus. Custom Praat scripts may be imported here to perform other types of analyses
Pauses This allows the user to specify, for a given database, which segments should count as pauses instead of speech. These are typically among the most common words in a corpus, so the top 25 words are provided as candidates (this default of 25 can be changed). If a desired pause label is not listed, it can be entered in the search bar
Utterances Define utterances here as segments of speech separated by a gap of a specified length (typically between 0.15 and 0.5 seconds)
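The gap-based definition above can be sketched as follows. This is a simplified illustration of the idea, not ISCAN’s implementation, and the function name and threshold are assumptions:

```python
# Group time intervals into utterances: start a new utterance whenever the
# silence between consecutive intervals meets or exceeds a gap threshold.
def group_utterances(intervals, gap=0.25):
    """intervals: list of (begin, end) pairs in seconds, sorted by begin time."""
    utterances = [[intervals[0]]]
    for begin, end in intervals[1:]:
        if begin - utterances[-1][-1][1] >= gap:
            utterances.append([])  # pause long enough: new utterance
        utterances[-1].append((begin, end))
    return utterances

# Three intervals, with a 0.6 s pause before the last one:
print(group_utterances([(0.0, 1.0), (1.05, 2.0), (2.6, 3.0)]))
# [[(0.0, 1.0), (1.05, 2.0)], [(2.6, 3.0)]]
```

Lowering the gap threshold splits speech into more, shorter utterances; raising it merges them.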
Syllables If the user has encoded syllabic segments, syllables can now be encoded using a syllabification algorithm (e.g. maximum attested onset)
Administration and installation¶
Note
At the moment, iscan-server is only supported on Ubuntu.
Warning
Running analyses via script currently does not work with the Docker installation. If you plan on writing/running automated scripts, rather than interacting with the data via the web interface, use the non-Dockerized version.
Much of this documentation is meant to be more technical and is geared towards advanced users interested in setting up their own ISCAN servers, either on a desktop computer for local use or on a dedicated server for communal use. Please see the ISCAN Tutorials section for more information and a walk-through on how to use ISCAN once it has been set up. Please see the troubleshooting section below for a continuously-updated set of solutions to common installation or maintenance issues.
Installation via Docker¶
Note
Currently, Ubuntu is the only supported and tested system. We hope to support other operating systems in the future, but please bear in mind that things may not work. If you are on Windows 10, you can install the Linux subsystem. ISCAN servers are known to run using the Bash shell in Windows 10, but Docker does not work on the Linux subsystem for Windows. If this is the only option for you, please see Installation without using Docker.
Warning
Running analyses via script currently does not work with the Docker installation. If you plan on writing/running automated scripts, rather than interacting with the data via the web interface, use the non-Dockerized version.
Prerequisites¶
ISCAN server uses Docker. This containerization means that the only dependency the user must install is Docker itself.
Preparing Docker¶
First, install Docker for Ubuntu. It is easiest to install using the Install from repository method.
Next, complete Docker for Ubuntu’s post-installation instructions. This will make it unnecessary to prepend Docker commands with sudo.
Then, install Docker Compose, the tool for defining and running multi-container Docker applications.
Installation¶
First, clone the iscan-spade-server repository to your machine:
git clone https://github.com/MontrealCorpusTools/iscan-spade-server.git
Included are a Dockerfile
and a docker-compose.yml. In order to build a Docker image from these files,
navigate to the root of the repository and run:
docker-compose build
Then, run:
docker-compose up
This will launch the containers.
Initial migrations¶
The first time you use iscan-spade-server, you will need to make database migrations. In another terminal, while the containers are up, run:
docker-compose run app init
The necessary migrations will be detected and applied.
Superuser creation¶
The first time you use iscan-spade-server, you will need to set up a username and password to log in with. In another terminal, while the containers are up, run:
docker-compose run app python3 manage.py createsuperuser
This will begin a prompt that asks you for a username, email address, and password. Once you have filled them out, the prompt will close.
Then, you should be able to log in with your credentials. You should only need to perform this step once; from now on,
whenever you start the server, you should be able to log in with your defined username and password. When finished,
press Ctrl+C
to end the current server run.
Use & workflow via Docker¶
Starting and stopping the server¶
To start the server and its containers, run:
docker-compose up
In your web browser, navigate to localhost:8080. You should see the ISCAN web page.
To stop the server, press Ctrl+C only once. The terminal should show a Gracefully stopping... message and then exit.
Mounted volumes¶
This Docker instance is configured so that the contents of certain directories persist between runs of the server, and so that contents are constant between the local directory and the directory in the container. These local directories, located in the root of the repository, are:
- polyglot_source/ - the directory containing corpora to be loaded
- polyglot_data/ - the directory where corpus metadata will be stored
- iscan/ - the directory where the front-end code is stored
- iscan_server/ - the directory containing the Django project for the server
Changes you make locally in these folders should persist into the container without needing to re-build the Docker image.
Cleaning¶
The docker-compose up
command usefully regenerates fresh containers each time it is run, but old containers can
take up space. To clean up containers on your machine, first stop all of them:
docker stop $(docker ps -a -q)
Then, remove them:
docker rm $(docker ps -a -q)
Installation without using Docker¶
Note
Currently only Ubuntu is a supported and tested system. We hope to support other operating systems in the future, but
please bear in mind that things may not work. If you are on Windows 10, you can install the Linux subsystem.
ISCAN servers are known to run using the Bash shell in Windows 10. Also note that sudo
ability is required for installation
and running services that ISCAN depends upon.
Prerequisites¶
ISCAN servers have the following prerequisites:
- Python 3
- Java 8
- RabbitMQ
- SQL database (PostgreSQL recommended)
- NodeJS and NPM
- Praat
- Reaper (optional)
Java 8¶
To install Java 8 on Ubuntu, you can install the Oracle version via:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
RabbitMQ¶
For working with asynchronous tasks, a message queue is needed; RabbitMQ is the default, installed as follows:
sudo apt-get install rabbitmq-server
sudo service rabbitmq-server start
See https://simpleisbetterthancomplex.com/tutorial/2017/08/20/how-to-use-celery-with-django.html#installing-rabbitmq-on-ubuntu-1604 for more details.
Relational Database¶
ISCAN server can use an SQLite database, but in general a PostgreSQL database is recommended. It can be installed via:
sudo apt-get install postgresql postgresql-contrib libpq-dev
sudo service postgresql start
The database will also have to be set up with a user/password; see https://www.digitalocean.com/community/tutorials/how-to-use-postgresql-with-your-django-application-on-ubuntu-14-04 for instructions.
Praat¶
ISCAN requires the “barren” version of the Praat executable (the version without a graphical interface) to run its acoustic analysis scripts.
You can get this from http://www.fon.hum.uva.nl/praat/download_linux.html.
Once you extract it, make sure that the command praat points to this executable (either by creating an alias, or by renaming praat_barren to praat and making sure its parent directory is included in $PATH).
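For example, one way to do this is with a symlink. The download location below is an assumption; adjust the paths to your system:

```shell
# Symlink the barren Praat binary into a directory on $PATH so that the
# `praat` command resolves to it. Paths here are examples only.
mkdir -p "$HOME/bin"
ln -sf "$HOME/Downloads/praat_barren" "$HOME/bin/praat"
export PATH="$HOME/bin:$PATH"
```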
Reaper¶
Reaper is a program for pitch analysis, so you’ll only need to install it if you want to use Reaper’s pitch estimation in place of Praat’s.
Follow the instructions on Reaper’s GitHub repository (https://github.com/google/REAPER) to install it and put the resulting executable somewhere on the system path so that Polyglot can find it easily.
AutoVOT¶
AutoVOT is a program for automatically calculating Voice Onset Times (VOTs). It is necessary in order to run any enrichment related to VOT.
Follow the instructions on AutoVOT’s GitHub repository (https://github.com/mlml/AutoVOT) to install it and put the resulting executable somewhere on the system path so that Polyglot can find it easily.
NodeJS¶
Installation of the front-end JavaScript and its dependencies is handled by NPM, which is installed as follows:
sudo apt-get install nodejs npm
Installation¶
Start by cloning the GitHub repository:
git clone https://github.com/MontrealCorpusTools/iscan-server.git
Once there, look in the iscan-server/iscan_server/settings directory and create a file named local_settings.py.
Add the following to it, replacing any paths with the relevant paths for your system, as well as the information for the PostgreSQL database (i.e., whatever database name, user name, and password you used when setting up PostgreSQL):
SOURCE_DATA_DIRECTORY = '/path/for/where/corpora/should/be/loaded/from'
POLYGLOT_DATA_DIRECTORY = '/path/to/store/all/polyglot/data'
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': 'database_name',
'USER': 'user_name',
'PASSWORD': 'password',
'HOST': 'localhost',
'PORT': '5433',
}
}
From the root of the server directory, install all of the server’s dependencies:
pip install -r requirements.txt
For development, getting the latest version of PolyglotDB is recommended via:
pip install https://github.com/MontrealCorpusTools/PolyglotDB/archive/master.zip
Then set up the server’s database:
python manage.py migrate
To install all of the JavaScript dependencies for the front end, run:
npm install
To generate a superuser admin account for the server:
python manage.py createsuperuser
In a separate terminal, start the celery process (from the root of the iscan-server repository):
celery -A iscan_server worker -l info
Finally, run the server:
python manage.py runserver 8080
Server administration¶
This page serves as a reference for common tasks in the administration of an ISCAN server. The workflows for many of these tasks are not currently as streamlined as they ideally would be.
Updating ISCAN and PolyglotDB¶
As ISCAN and PolyglotDB are both under active development, updating to the latest version is necessary to fix issues that crop up. To perform an update, run the command:
docker-compose run app update
This will fetch the latest changes from the GitHub repositories of both packages.
Getting a tutorial corpus¶
At the moment, the ISCAN server hosted at https://roquefort.linguistics.mcgill.ca uses an aligned version of the ICE-Canada corpus. However, this tutorial corpus is currently not distributable, so we plan to make another one, based on a subset of LibriSpeech, available in the near future. For the purpose of adding tutorial corpora for each user to run through the tutorials, any smaller corpus aligned using the Montreal Forced Aligner will work once it is renamed to spade-tutorial and added to the system.
Adding new corpora¶
To add any new corpus, simply put its folder in the polyglot_source directory (see Mounted volumes for its location in the Docker installation). When not using Docker, this location is a configuration setting (SOURCE_DATA_DIRECTORY) in the local_settings.py file of the Django configuration.
Once the corpus is in the directory, performing a reload on the home page of the ISCAN server will update ISCAN’s records of databases and corpora. If the corpus does not use force-aligned TextGrids as transcripts, the corpus format will have to be changed in the admin page (i.e., go to https://hostname.com/admin/iscan/corpus/, select the corpus, and choose the appropriate supported format from the Input format dropdown).
Supported formats¶
The primary supported format is the TextGrid output of commonly used forced aligners, such as the Montreal Forced Aligner.
In addition, the following formats are supported as well:
- TextGrid output from LaBB-Cat
- BAS Partitur formatted files
- Buckeye Speech Corpus
- TIMIT
Regarding how the corpus should be structured, files should be divided into per-speaker directories where possible. If the force-aligned TextGrids contain speaker information (i.e., word and phone tiers for multiple speakers), this is not necessary.
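For instance, a speaker-divided corpus might be laid out as follows (the file and directory names are purely illustrative):

```
tutorial_corpus/
    speaker01/
        recording01.wav
        recording01.TextGrid
    speaker02/
        recording02.wav
        recording02.TextGrid
```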
Creating new users¶
User creation is currently done through the Django admin interface (i.e., https://hostname.com/admin/auth/user/). In addition to specifying their username and password, you can also make them superusers through this page (giving them access to the admin site and all corpora/databases).
Modifying user permissions¶
More specific per-corpus permissions can be given to users through the Django admin interface as well (i.e., https://hostname.com/admin/iscan/corpuspermissions/). When creating one of these, you can select the user, the corpus, and whether to grant the following permissions:
- Can edit: allows the user to edit and correct aspects of the corpus data, such as acoustic measurements (e.g., pitch/formant tracks) or time boundaries of segments/words (this functionality is currently not implemented in ISCAN)
- Can annotate: allows the user to add their own annotations to linguistic items, within an annotation framework specified in the admin interface (this functionality isn’t fully featured yet, and has primarily only been implemented for annotating utterances)
- Can view annotations: allow the user to see annotations in the corpus
- Can listen: allows the user to play audio on the corpus detail page
- Can view detail: allows the user access to the query detail view, otherwise the user can query but not see the full context of each result
- Can enrich: allows the user to create/run/reset/delete enrichments
- Can access database: allows the user to start/stop the database for this corpus
Reporting errors and issues¶
Some issues can be worked around in the admin interface. For instance, running an enrichment marks the corpus as busy; if a rare exception occurs while the enrichment is running, the corpus can remain stuck in this locked state. The busy status can be fixed by changing this property on the admin page for that corpus object.
Additionally, databases can be reset to their original non-imported state by deleting the database on the admin page for databases (i.e., https://hostname.com/admin/iscan/database/).
If any issues are encountered, please post them along with the exception message found either in the runserver window or the celery window to the GitHub issues page.
Apache server configuration¶
The current recommended configuration is to have a forward-facing web server (Apache/NGINX) proxy pass to a locally running server (Gunicorn/Django dev server). The following all assumes that there is a local server running on port 8080; Apache is used in the example configs listed below, with the hostname https://roquefort.linguistics.mcgill.ca/.
Additionally, since sensitive data is involved, we strongly recommend using HTTPS rather than HTTP.
Note
All commands assume Ubuntu 16.04. Commands may differ depending on other operating systems.
Enabling prerequisite Apache modules¶
sudo apt-get install apache2
sudo service apache2 stop
sudo a2enmod rewrite
sudo a2enmod ssl
sudo a2enmod proxy
sudo a2enmod proxy_http
HTTP server config¶
The HTTP server config uses the rewrite module to redirect any HTTP requests to HTTPS, so that http://roquefort.linguistics.mcgill.ca/ is never used in place of https://roquefort.linguistics.mcgill.ca/. The following config would be saved to a file named roquefort.linguistics.mcgill.ca.conf in /etc/apache2/sites-available/.
<VirtualHost *:80>
ServerName roquefort.linguistics.mcgill.ca # Update for other hostname
ServerAdmin webmaster@localhost
DocumentRoot /var/www/html
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
RewriteEngine on
RewriteCond %{SERVER_NAME} =roquefort.linguistics.mcgill.ca # Update for other hostname
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [END,NE,R=permanent]
</VirtualHost>
Enable the site via:
sudo a2ensite roquefort.linguistics.mcgill.ca.conf
HTTPS server config¶
The primary configuration file for the Apache server is the HTTPS one below. SSL certificates are easily generated through Let’s Encrypt.
The Proxy module for Apache is used to forward all requests to the locally running ISCAN server. The following config would be saved to a file named roquefort.linguistics.mcgill.ca-ssl.conf in /etc/apache2/sites-available/.
<IfModule mod_ssl.c>
<VirtualHost *:443>
ServerName roquefort.linguistics.mcgill.ca # Update for other hostname
ServerAdmin webmaster@localhost
DocumentRoot /var/www/html
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
SSLCertificateFile /etc/letsencrypt/live/roquefort.linguistics.mcgill.ca/fullchain.pem # Update for actual location
SSLCertificateKeyFile /etc/letsencrypt/live/roquefort.linguistics.mcgill.ca/privkey.pem # Update for actual location
Include /etc/letsencrypt/options-ssl-apache.conf
<Location "/">
ProxyPass http://localhost:8080/
ProxyPassReverse http://localhost:8080/
ProxyPreserveHost On
RequestHeader unset X-Forwarded-Proto
RequestHeader set X-Forwarded-Proto https env=HTTPS
</Location>
</VirtualHost>
</IfModule>
Enable the site via:
sudo a2ensite roquefort.linguistics.mcgill.ca-ssl.conf
Once the configuration files are set up, the Apache server can be rebooted via:
sudo service apache2 restart
Troubleshooting common issues¶
Connection refused¶
If you run into an error where a SPADE server returns the following connection error:
Traceback (most recent call last):
File "/home/linguistics/XXX/miniconda3/lib/python3.6/site-packages/neobolt-1.7.0rc5-py3.6-linux-x86_64.egg/neobolt/direct.py", line 793, in _connect
s.connect(resolved_address)
ConnectionRefusedError: [Errno 111] Connection refused
- First, check that your web server is running. Assuming you are connecting via localhost, run:
telnet localhost 8080
or
netstat | grep 8080
You should see that port 8080 is listening. If neither of these shows a listener, run:
python manage.py runserver 8080
from the root of your iscan-spade-server directory.
- Check that Neo4j is listening on the right ports. This can be verified in iscan-spade-server/polyglot_data/CORPUS/neo4j.log. Specifically, ISCAN’s Neo4j instance should be using port 7400 (compared to a default Neo4j install, which uses 7474 and 7687).
In this case, the easiest thing to do is to reset the database. Inside your SPADE
repository, run:
python reset_database.py CORPUS
where CORPUS refers to the particular corpus you are trying to use. This will delete the database files from your polyglot_data directory. If you then run:
python SPADE_SCRIPT CORPUS -r
This will rebuild your database from scratch.
Neo4j PID error on stopping¶
Sometimes there may be issues in how ISCAN tracks the Polyglot databases. It does this by storing the system PIDs for both Neo4j and InfluxDB. Sometimes, however, these records can be empty even though the database has been started and its processes are running. As a workaround for this issue, you can reset the database fully by doing the following:
- Stop the Neo4j process (kill -9 <pid>, where <pid> can be found with a ps -l or top -n 5 command; the Neo4j process will be listed as java)
- Navigate to the ISCAN admin page (locally, http://localhost:8080/admin/)
- Log in with a superuser’s credentials
- Click on Databases under ISCAN
- Click on the database name that is having issues
- Change the status to Stopped
- Retry whatever operation was having issues previously (either start the database again or run the script)
Moving parts¶
The ISCAN server uses several components, which are all managed by Docker (see Installation via Docker and Use & workflow via Docker for more information). The components are:
- A web interface, with which the user can manage and analyze corpora, written in Angular
- A Python API, PolyglotDB, which communicates between the web interface and the back-end web framework, using Django
- A relational database, which manages metadata about Polyglot databases, using PostgreSQL
- Message and job queues for working with asynchronous tasks, using RabbitMQ and Celery