Chris_2012.03.14 - The Language Archive

User guide
Executing recognizers under ELAN 4.1.2.
Christopher Rosenthal
Przemyslaw Lenkiewicz
Index
1. Setup
1.1 Install ELAN 4.1.2.
1.2 Install GuiSkin
1.3 ELAN extensions
1.4 Create your personal Work Folder
2. ELAN Audio Recognizers
2.1 ELAN audio recognizers: What they can do.
2.2 ELAN audio recognizers: Using them
2.3 IAIS 02: Standard homogeneous segmentation
2.4 IAIS 03: Fine audio segmentation
2.5 IAIS 04: Speech/Non-Speech detection based on pre-trained acoustic models
2.6 IAIS 06: Speaker diarization
2.7 IAIS 01: Silence detection
3. ELAN Video Recognizers
3.1 ELAN video recognizers: What they can do.
3.2 Skin Color Estimation with GuiSkin
3.3 Shot Segmentation with ELAN
3.4 Skin Color Estimation with ELAN
3.5 Using the hands and head tracking recognizer
1. Setup
1.1 Install ELAN 4.1.2.
1.2 Install GuiSkin
1.3 ELAN extensions
1.4 Create your personal Work Folder
The recognizers are not able to access anything on your local hard disk, nor to save anything
there. As they are executed on MPI servers, they can only access and write files that are also
on a server.
As most of the media files are already uploaded to the MPI Archive, they are in the right place
for the recognizers and can be processed directly in their location. (If some files are not
accessible, i.e. you get an error when trying to process them, please contact Przemek
Lenkiewicz or Eric Auer.) For any other files that you want to process, and for storing the
recognizers’ output files, we will create your personal work folder.
Go to your ‘Computer’ and click Map network drive.
Choose any drive letter for the new drive (in the example below X is chosen) and type the
following in the Folder field:
\\Sun4\{your MPI user name}
Figure 1. Mapping a network drive
This path leads to your home folder on the MPI network infrastructure; every MPI user has
one. We will create your work folder there.
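The same mapping can also be prepared from a script instead of the dialog. The sketch below only builds the arguments for the standard Windows `net use` command and does not run it. The helper names `home_share` and `net_use_command` and the user name `jdoe` are made up for illustration.

```python
def home_share(username: str, server: str = "Sun4") -> str:
    """Return the UNC path of a user's home folder on the given server."""
    if not username or any(c in username for c in "\\/ "):
        raise ValueError("user name must be non-empty, without slashes or spaces")
    return f"\\\\{server}\\{username}"


def net_use_command(drive_letter: str, username: str) -> list[str]:
    """Build the argument list for 'net use <drive>: <share>' (not executed here)."""
    drive = drive_letter.upper()
    if len(drive) != 1 or not (drive.isascii() and drive.isalpha()):
        raise ValueError("drive letter must be a single letter A-Z")
    return ["net", "use", f"{drive}:", home_share(username)]


if __name__ == "__main__":
    # 'jdoe' is a placeholder; use your own MPI user name.
    print(" ".join(net_use_command("x", "jdoe")))
```

On a Windows machine the resulting list could be passed to `subprocess.run`; you may still be asked for your MPI credentials.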
Figure 2. Asking for credentials
If you are asked for credentials (See Figure 2) provide your MPI username and password.
Make sure you see MPI_NL before your username.
Now open your newly created drive and create a folder named Recognizers. Any name is
allowed; we will keep using Recognizers in this example.
Once it is created, right-click the Recognizers folder and choose PROPERTIES. Choose the
SECURITY tab, then under GROUP OR USER NAMES choose EVERYONE and click EDIT.
Again make sure that EVERYONE is selected, then click Full Control (all checkboxes should
be selected after that) and click OK.
This time select your home folder (the one with your user name) that we have just mapped.
In this example this is the X:\ drive. Again right-click and choose Properties.
Go to the Security tab and select Everyone. Click Edit.
Make sure that Everyone is selected. Click the Read & execute checkbox (the following two
checkboxes should then also become selected) and click OK.
2. ELAN audio recognizers
2.1 ELAN audio recognizers: What they can do.
ELAN audio recognizers offer automatic assistance in analyzing audio files containing
speech. Audio files can be divided into meaningful segments, into speech/non-speech
segments and into silence/non-silence segments, and a speaker diarization can be performed.
Ideally, audio files fulfill the following characteristics:
- All subjects speak clearly, at the same level of volume
- The subjects' speech does not overlap
- Background noise is reduced to a minimum
2.2 ELAN audio recognizers: Using them
To start using the audio recognizers, open ELAN, go to -> FILE -> NEW. Open the desired
audio file by moving it into the Selected Files menu (seen on the picture below, on the right).
Then click OK.
You cannot process a video file with audio recognizers directly. You need to have the video
file and also a separate WAV file with the audio track. It is common practice to obtain both
files when digitizing recordings at MPI. When processing such a video recording, make
sure to include both files (audio and video) in the Selected Files window.
To create audio segmentations within an audio file, ELAN offers 2 recognizers:
1) IAIS 02: Standard homogeneous segmentation and 2) IAIS 03: Fine audio segmentation.
Both of them can be used for audio segmentation; however, IAIS 03 gives the user more
control to fine-tune the results.
2.3 IAIS 02: Standard homogeneous segmentation
Choose the IAIS 02: Standard homogeneous segmentation recognizer from the Audio
Recognizer dropdown list. Define the output file, where the results of the segmentation will
be stored. The file has to be located in a folder to which the recognizer has write access,
e.g. the Recognizers folder that was created in Section 1.4. The name of the file needs to
end with ".xml". Give this file a meaningful name, as it will be reused later, for example:
OUT [xml] Tier holding the standard segmentation:
X:\recognizers\output.audio.standard.segmentation.xml
Click on START and let ELAN define segments within your audio file.
Figure 3. IAIS 02: Standard homogeneous segmentation
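The two requirements for the output file (an ".xml" ending and a location inside the work folder) can be checked before a file name is typed into ELAN. The following is a minimal sketch; the function name `check_output_file` is hypothetical and the default folder `X:\recognizers` is simply the example used in this guide.

```python
from pathlib import PureWindowsPath


def check_output_file(path: str, work_folder: str = r"X:\recognizers") -> PureWindowsPath:
    """Return the path if it ends in .xml and lies inside the work folder."""
    candidate = PureWindowsPath(path)
    folder = PureWindowsPath(work_folder)
    if candidate.suffix.lower() != ".xml":
        raise ValueError(f"{candidate.name}: the file name has to end with .xml")
    # Windows paths compare case-insensitively, so X:\Recognizers also matches.
    if folder not in candidate.parents:
        raise ValueError(f"{candidate}: not inside the work folder {folder}")
    return candidate


if __name__ == "__main__":
    print(check_output_file(r"X:\recognizers\output.audio.standard.segmentation.xml"))
```

The check looks only at the text of the path; it cannot tell whether the recognizer really has write permission on the server.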
2.4 IAIS 03: Fine audio segmentation
The IAIS 03: Fine audio segmentation recognizer works in a similar way to the IAIS 02
recognizer. The difference is that it offers the user the chance to modify a parameter, which
controls the sensitivity of the segmentation process.
Choose the fine audio segmentation recognizer and define the output file, where the results
of the segmentation will be stored.
OUT [xml] Tier holding the fine segmentation:
E.g. X:\recognizers\output.audio.fine.segmentation.xml
Use the slider to tune the sensitivity of the recognizer and the size of the resulting segments.
Choose whether the recognizer should perform merging of the results. This step will merge
the resulting segments that are neighbors and have high similarity.
Click on START and let ELAN define segments within your audio file.
2.5 IAIS 04: Speech/Non-Speech detection based on pre-trained acoustic models
To detect which parts of the recording contain human speech, change the recognizer to
IAIS 04: Speech/Non-Speech detection. In the parameter section fill in the input file name,
which contains the segmentation information you have created earlier, and the output file
name, which will hold the speech/non-speech results. For example:
IN [xml]: Result of the segmentation with optional manual labels for training:
X:\recognizers\output.audio.segmentation.xml
OUT [xml]: Tier holding the speech/non-speech segmentation:
E.g. X:\recognizers\output.audio.speech.xml
Then click -> Start
2.6 IAIS 06: Speaker diarization
The speaker diarization recognizer tries to assign each segment of the recording to the
respective speaker. It should therefore detect the number of speakers in the recording and
who is speaking when.
Change the recognizer to IAIS 06: Speaker diarization. Within the parameter section fill in
the input file name, which contains the speech/non-speech information you have created
earlier, and output file name, which will hold the results of speaker diarization. For
example:
IN [xml]: Result of one of the speech/non-speech recognizers:
X:\recognizers\output.audio.speech.xml
OUT [xml]: Tier holding the speaker diarization:
E.g. X:\recognizers\output.audio.speaker.xml
Then click -> Start
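The audio recognizers form a chain: the segmentation output is the input of the speech/non-speech detection, whose output is in turn the input of the speaker diarization. The sketch below writes this chain down and checks that every input file is produced by an earlier step. The file names are illustrative (here the standard segmentation output is assumed to be the input of IAIS 04) and `missing_inputs` is a hypothetical helper.

```python
from pathlib import PureWindowsPath

WORK = PureWindowsPath(r"X:\recognizers")

# (recognizer, input file or None, output file); file names are examples only.
PIPELINE = [
    ("IAIS 02: Standard homogeneous segmentation",
     None,
     WORK / "output.audio.standard.segmentation.xml"),
    ("IAIS 04: Speech/Non-Speech detection",
     WORK / "output.audio.standard.segmentation.xml",
     WORK / "output.audio.speech.xml"),
    ("IAIS 06: Speaker diarization",
     WORK / "output.audio.speech.xml",
     WORK / "output.audio.speaker.xml"),
]


def missing_inputs(pipeline):
    """Return the names of steps whose input is not produced by an earlier step."""
    produced = set()
    missing = []
    for name, infile, outfile in pipeline:
        if infile is not None and infile not in produced:
            missing.append(name)
        produced.add(outfile)
    return missing


if __name__ == "__main__":
    print(missing_inputs(PIPELINE))
```

Running a step out of order, or with a mistyped input name, shows up as a non-empty list.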
2.7 IAIS 01: Silence detection
The silence detection recognizer detects audio fragments that contain silence and
separates them from fragments that contain sound (spoken language or other noise).
Change the recognizer to IAIS 01: Silence detection and define the output file, where the
results of the silence detection should be stored.
OUT [xml] Tier holding the silence segmentation:
E.g. X:\recognizers\output.audio.silence.detection.xml
Use the sliders to tune the sensitivity of the recognizer.
The recognizer seems to work best with the following parameters:
Minimal time in ms for a silent segment: around 125
Minimal time in ms for a non-silent segment: around 140
Maximal sample value to be recognized as silence in percent: around 1.1
After calibrating the parameters click -> Start.
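To get a feeling for how the three parameters interact, here is a minimal amplitude-threshold sketch. It is not the IAIS implementation, whose internals are not described in this guide. It assumes 16-bit samples (full scale 32767), labels every sample as silence or sound by the percent threshold, and then absorbs runs shorter than the minimal duration into the preceding segment.

```python
def detect_silence(samples, sample_rate, max_silence_percent=1.1,
                   min_silence_ms=125, min_sound_ms=140, full_scale=32767):
    """Return a list of (label, start_ms, end_ms) tuples, label 'silence' or 'sound'."""
    threshold = full_scale * max_silence_percent / 100.0

    # Step 1: group consecutive samples with the same label into runs.
    runs = []
    for index, sample in enumerate(samples):
        label = "silence" if abs(sample) <= threshold else "sound"
        if runs and runs[-1][0] == label:
            runs[-1][2] = index + 1
        else:
            runs.append([label, index, index + 1])

    # Step 2: a run shorter than its minimal duration is absorbed by the
    # previous segment; runs with the same label as the previous one are joined.
    min_samples = {"silence": min_silence_ms * sample_rate / 1000.0,
                   "sound": min_sound_ms * sample_rate / 1000.0}
    segments = []
    for label, start, end in runs:
        too_short = (end - start) < min_samples[label]
        if segments and (too_short or segments[-1][0] == label):
            segments[-1][2] = end
        else:
            segments.append([label, start, end])

    to_ms = 1000.0 / sample_rate
    return [(label, start * to_ms, end * to_ms) for label, start, end in segments]
```

Raising the percent value treats louder material as silence; raising the minimal durations removes short segments.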
3. ELAN video recognizers
3.1 ELAN video recognizers: What they can do.
ELAN video recognizers offer automatic assistance in analyzing hand movement in video
files. They can track and record left hand movement, right hand movement, hands joining
and head/hand overlap.
Video files, and specifically the subjects in them, need to fulfill the following characteristics:
- The subject's skin color has to differ from the background color
- The color of the subject's clothing has to differ from his/her skin color
- The subject's chest has to be covered by clothing (whose color differs from the skin color)
- The skin of the face and hands has to be fully visible
- The subject's face and body need to face the camera
- The upper body and face have to be fully visible
Requirements fulfilled, e.g.:
- Skin color differs from clothing and background color
- Upper body is fully visible and facing the camera
Requirements not fulfilled, e.g.:
- Color of clothing of left subject does not sufficiently differ from skin
- Face of right subject is not facing the camera
ELAN video recognizers require an estimation of skin color beforehand. This can be done
manually with GuiSkin (Section 3.2) or automatically with ELAN (Section 3.4).
3.2 Skin Color Estimation with GuiSkin
The Skin Color Estimation App is a graphical way to estimate skin color in a video by
manually selecting the best intervals in the YUV domain. This step has to be done before the
video file can be successfully processed by the hands and head tracking recognizer.
1) Open GuiSkin and import a video (-> FILE -> LOAD VIDEO)
2) Use the color intervals to mark all skin with "blue ink" and make sure no background is
marked
Advice: GuiSkin has 6 sliders: Y for the brightness, U and V for the colour components. The
first three sliders represent the mid-point of the selected range, the last three sliders
represent the size of the selected range. U usually ranges between 80 and 130, V between
125 and 175.
When estimating the color, you will notice this step involves trial and error. Our advice is to
start with the second slider and observe the changes in the image. When a reasonable
amount of skin color is highlighted, use the 5th slider to adjust the range for this value. The
two remaining sliders can serve to limit the amount of non-skin pixels highlighted.
Sometimes it cannot be avoided that some parts in the background will be marked as well.
Priority should always be that all skin is marked. Non-moving background can be cancelled
out by the recognizers later on.
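The slider logic can be sketched as three intervals, each given by a mid-point and a size. The code assumes that an interval extends half its size on either side of the mid-point, which this guide does not state explicitly. The names `Interval` and `is_skin` are hypothetical and not part of GuiSkin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    mid: float   # one of the first three sliders
    size: float  # the matching one of the last three sliders

    @property
    def low(self) -> float:
        return self.mid - self.size / 2

    @property
    def high(self) -> float:
        return self.mid + self.size / 2

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def is_skin(y, u, v, y_range, u_range, v_range) -> bool:
    """A pixel is marked only if all three components fall inside their interval."""
    return y_range.contains(y) and u_range.contains(u) and v_range.contains(v)


if __name__ == "__main__":
    # Mid-point 105 with size 50 gives the typical U range of 80 to 130.
    u_range = Interval(mid=105, size=50)
    print(u_range.low, u_range.high)
```

Widening an interval marks more pixels, including background, which is why the text suggests adjusting one slider at a time.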
Once done calibrating, press SAVE RESULTS and save the given XML file in your work
folder from Section 1.4 (or save it anywhere and then copy or move it to your work folder).
3.3 Shot Segmentation with ELAN
Open ELAN -> FILE -> NEW and move your video into the SELECTED FILES window and
press OK.
To your upper right, click on VIDEO RECOGNIZER and choose Segments the video by
detecting cuts. You will see 4 empty text fields:
Adjust amount of messages during calculation here [Logging_level]
Choose between normal / verbose / none. Defines the length and detail of Report-data (not
very important).
1) OUT [xml]: tier xml with shot segmentation [shotlist_xml]
This xml-file holds the shot segmentation
Fill in: X:\recognizers\shotlist.xml
2) OUT [xml]: tier xml with subshot segmentation [subshotlist_xml]
This xml-file holds the subshot segmentation
Fill in: X:\recognizers\subshotlist.xml
3) OUT [csv]: tier csv with shot segmentation [shotlist_csv]
This csv-file holds the shot segmentation
Fill in: X:\recognizers\shotlist.csv
4) OUT [csv]: tier csv with subshot segmentation [shotlist_csv]
This csv-file holds the subshot segmentation
Fill in: X:\recognizers\subshotlist.csv
Once filled in, press START
3.4 Skin Color Estimation with ELAN
Open ELAN -> FILE -> NEW and move your video into the SELECTED FILES window and
press OK.
To your upper right, click on VIDEO RECOGNIZER and choose Estimates YUV intervals
representing skin. You will see 2 empty text fields:
Threshold used to decide whether the pixels changed from last frame [Change_threshold]
Choose value between 10.0 - 50.0. Recommended value is 30.0.
Specifies whether analyze also frames with only one moving cluster or not
[Use_single_cluster]
Choose between true / false
1) IN [xml]: xml containing result of segmentation [shotlist_xml]
Here you need to import an XML file which contains segmentation information (this step is
not necessary if video consists of one shot).
Fill in: X:\recognizers\shotlist.xml
2) OUT [xml] xml updated with new information [skincolorvalues_xml]
This file holds the skin color values (the same information that can be created manually
with GuiSkin).
Fill in: X:\recognizers\skincolorvalues.xml
Once filled in, press START
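As an illustration of what a change threshold does, the sketch below compares two frames pixel by pixel and marks a pixel as changed when its value differs by more than the threshold. How ELAN applies Change_threshold internally is not documented here, so this is only an assumption. Frames are represented as nested lists of brightness values and `changed_pixels` is a hypothetical name.

```python
def changed_pixels(previous_frame, current_frame, change_threshold=30.0):
    """Return a mask of booleans: True where a pixel changed by more than the threshold."""
    if not 10.0 <= change_threshold <= 50.0:
        raise ValueError("the guide suggests a value between 10.0 and 50.0")
    return [
        [abs(current - previous) > change_threshold
         for previous, current in zip(previous_row, current_row)]
        for previous_row, current_row in zip(previous_frame, current_frame)
    ]


if __name__ == "__main__":
    before = [[100, 100], [100, 100]]
    after = [[100, 140], [125, 100]]
    print(changed_pixels(before, after))
```

Under this reading, a lower threshold counts smaller brightness differences as movement, while a higher one ignores them.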
3.5 Using the hands and head tracking recognizer
Open ELAN -> FILE -> NEW and move your video into the SELECTED FILES window and
press OK.
To your upper right, click on VIDEO RECOGNIZER and choose Tracks motion of hands and
head.
IN [xml]: xml containing skin colour parameters and/or shot information [XML_filename]
Here you need to import the XML file containing the skin color parameters. This file can
have been created either by GuiSkin or by ELAN.
Fill in, e.g.: X:\recognizers\skincolorvalues.xml
Adjust amount of log messages [Logging_level]
Choose between none / normal / verbose
Speed threshold used to detect hands movement [speed_threshold]
Choose between low / normal / high. Sensitivity of gesture-analysis.
Background image, used to improve tracking [background_image]
Choose between true / false. ELAN offers to cancel out non-moving objects in the
background which appear similar to body parts due to their color and form. ("true"
recommended in most cases)
Write output video [output_video]
Choose between true / false. ELAN offers to create an output video with all gesture tracking
information. ("true" recommended)
OUT [aux] Directory with tier files containing the annotations [Output_path]
This directory holds all output files which ELAN creates, specifically tiers and output video.
Fill in: X:\recognizers\output\
Once filled in, press START
In the current version this recognizer is somewhat unstable: it can either crash or keep you
waiting endlessly for the results.
To check whether it is still working, open your work folder and see whether the output files
it has just created are changing in size. If they are growing, the recognizer is still working.
If not, it has finished. In that case press CANCEL once or twice (this is a bug and it should
be fixed soon).
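The size check can be automated by taking two snapshots of the output folder some time apart. The following is a minimal sketch with hypothetical helper names and an arbitrary default waiting time of 30 seconds. It needs to run on a machine where the work folder is mapped.

```python
import time
from pathlib import Path


def snapshot_sizes(folder):
    """Map each file name in the folder to its current size in bytes."""
    return {p.name: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}


def has_grown(before, after):
    """True if any file is new or larger than in the earlier snapshot."""
    return any(size > before.get(name, -1) for name, size in after.items())


def is_still_working(folder, wait_seconds=30):
    """Take two snapshots wait_seconds apart and compare them."""
    before = snapshot_sizes(folder)
    time.sleep(wait_seconds)
    return has_grown(before, snapshot_sizes(folder))


if __name__ == "__main__":
    # The folder from the example above; adjust to your own output path.
    print(is_still_working(r"X:\recognizers\output", wait_seconds=30))
```

A result of False only means that nothing changed during the waiting time; it does not tell a finished run from a crashed one.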
Once the program is finished, you have to import the tiers manually.
Press FILE -> IMPORT -> IMPORT TIERS FROM RECOGNIZERS and then choose all XML files
ELAN has created, one at a time.
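Because the tier files have to be imported one at a time, it can help to list them first. The sketch below simply collects all XML files in an output folder; `tier_files` is a hypothetical helper and the folder is the example path used above.

```python
from pathlib import Path


def tier_files(output_folder):
    """Return all .xml files in the folder, sorted by name."""
    return sorted(
        p for p in Path(output_folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".xml"
    )


if __name__ == "__main__":
    for path in tier_files(r"X:\recognizers\output"):
        print(path.name)
```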
Ideally, the result should look similar to the one displayed below.
Output tiers of ELAN can be best understood with the graphic below.