User guide
Executing recognizers under ELAN 4.1.2.

Christopher Rosenthal
Przemyslaw Lenkiewicz

Index

1. Setup
   1.1 Install ELAN 4.1.2.
   1.2 Install GuiSkin
   1.3 ELAN extensions
   1.4 Create your personal Work Folder
2. ELAN Audio Recognizers
   2.1 ELAN audio recognizers: What they can do.
   2.2 ELAN audio recognizers: Using them
   2.3 IAIS 02: Standard homogeneous segmentation
   2.4 IAIS 03: Fine audio segmentation
   2.5 IAIS 04: Speech/Non-Speech detection based on pre-trained acoustic models
   2.6 IAIS 06: Speaker diarization
   2.7 IAIS 01: Silence detection
3. ELAN Video Recognizers
   3.1 ELAN video recognizers: What they can do.
   3.2 Skin Color Estimation with GuiSkin
   3.3 Shot Segmentation with ELAN
   3.4 Skin Color Estimation with ELAN
   3.5 Using the hands and head tracking recognizer

1. Setup

1.1 Install ELAN 4.1.2.

1.2 Install GuiSkin

1.3 ELAN extensions

1.4 Create your personal Work Folder

The recognizers cannot access anything on your local hard disk, nor save anything there. As they are executed on MPI servers, they can only access and write files that are also on a server. As most of the media files are already uploaded to the MPI Archive, they are in the right place for the recognizers and can be processed directly in their location [1]. For any other files that you want to process, and for storing the recognizers’ output files, we will create your personal work folder.

[1] If some files are not accessible (you get an error when trying to process them), please contact Przemek Lenkiewicz or Eric Auer.

Go to your ‘Computer’ and click Map network drive. Choose any drive letter for the new drive (in the example below X is chosen) and type the following in the Folder field:

\\Sun4\{your MPI user name}

Figure 1. Mapping a network drive

This path leads to your home folder on the MPI network infrastructure; every MPI user has one. We will create your work folder there.

Figure 2.
Asking for credentials

If you are asked for credentials (see Figure 2), provide your MPI username and password. Make sure you see MPI_NL before your username.

Now open your newly created drive and create a folder named Recognizers. Any name is allowed; we will keep using Recognizers in this example. Once it is created, right-click on the Recognizers folder and choose PROPERTIES. Choose the SECURITY tab, then under GROUP OR USER NAMES choose EVERYONE and click EDIT.

Again make sure that EVERYONE is selected, then click Full Control (all checkboxes should be selected after that) and click OK.

This time select your home folder (the one with your user name), which we have just mapped. In this example this is the X:\ drive. Again right-click and choose PROPERTIES.

Go to the SECURITY tab, select EVERYONE and click EDIT. Make sure that EVERYONE is selected. Click on the Read & execute checkbox (the following two checkboxes should then become selected as well) and click OK.

2. ELAN audio recognizers

2.1 ELAN audio recognizers: What they can do.

ELAN audio recognizers offer automatic assistance in analyzing audio files containing speech. They can divide an audio file into meaningful segments, into speech and non-speech segments, or into silent and non-silent segments, and they can perform speaker diarization.

Ideally, audio files fulfill the following characteristics:
- All subjects speak clearly and at the same volume level
- The subjects' speech does not overlap
- Background noise is reduced to a minimum

2.2 ELAN audio recognizers: Using them

To start using the audio recognizers, open ELAN and go to -> FILE -> NEW. Open the desired audio file by moving it into the Selected Files list (shown on the right in the picture below). Then click OK.

You cannot process a video file with audio recognizers directly. You need to have the video file and also a separate WAV file with the audio track. It is common practice to obtain both files when digitizing recordings at MPI.
When processing such a video recording, make sure to include both files (audio and video) in the Selected Files window.

To create audio segmentations within an audio file, ELAN offers two recognizers: 1) IAIS 02: Standard homogeneous segmentation and 2) IAIS 03: Fine audio segmentation. Both can be used for audio segmentation; however, IAIS 03 gives the user more control to fine-tune the results.

2.3 IAIS 02: Standard homogeneous segmentation

Choose the IAIS 02: Standard homogeneous segmentation recognizer from the Audio Recognizer dropdown list. Define the output file where the results of the segmentation will be stored. The file has to be located in a folder that the recognizer can access with write permissions, e.g. the Recognizers folder created in Section 1.4. The name of the file needs to end with ".xml". Try to give this file a meaningful name, as it will be reused later, e.g.:

OUT [xml] Tier holding the standard segmentation: X:\recognizers\output.audio.standard.segmentation.xml

Click on START and let ELAN define segments within your audio file.

Figure 3. IAIS 02: Standard homogeneous segmentation

2.4 IAIS 03: Fine audio segmentation

The IAIS 03: Fine audio segmentation recognizer works in a similar way to the IAIS 02 recognizer. The difference is that it lets the user modify a parameter that controls the sensitivity of the segmentation process.

Choose the fine audio segmentation recognizer and define the output file where the results of the segmentation will be stored.

OUT [xml] Tier holding the fine segmentation: E.g. X:\recognizers\output.audio.fine.segmentation.xml

Use the slider to tune the sensitivity of the recognizer and the size of the resulting segments. Choose whether the recognizer should merge the results. This step merges resulting segments that are neighbors and have high similarity.

Click on START and let ELAN define segments within your audio file.
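
The two rules for output files stated in Section 2.3 (the name ends with ".xml" and the file lies in the work folder) can be checked before typing a path into ELAN. The following is only a minimal sketch: the helper names output_file and is_valid_output are hypothetical, and the work folder is assumed to be X:\recognizers as in the examples of this guide.

```python
from pathlib import PureWindowsPath

# Assumption: the work folder from Section 1.4, reached through drive X:.
WORK_FOLDER = PureWindowsPath(r"X:\recognizers")


def output_file(kind: str, work_folder: PureWindowsPath = WORK_FOLDER) -> PureWindowsPath:
    """Build a name of the form output.audio.<kind>.xml inside the work folder."""
    return work_folder / f"output.audio.{kind}.xml"


def is_valid_output(path: str, work_folder: PureWindowsPath = WORK_FOLDER) -> bool:
    """Check that the name ends with .xml and that the file lies in the work folder."""
    p = PureWindowsPath(path)
    # PureWindowsPath compares case-insensitively, like Windows itself.
    return p.suffix.lower() == ".xml" and work_folder in p.parents


if __name__ == "__main__":
    print(output_file("fine.segmentation"))
    print(is_valid_output(r"C:\temp\out.xml"))
```

PureWindowsPath only manipulates the path text, so the sketch does not test whether the recognizer really has write permission on the folder.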
2.5 IAIS 04: Speech/Non-Speech detection based on pre-trained acoustic models

To detect which parts of the recording contain human speech, change the recognizer to IAIS 04: Speech/Non-Speech detection. In the parameter section, fill in the input file name, which contains the segmentation information you created earlier, and the output file name, which will hold the speech/non-speech results. For example:

IN [xml]: Result of the segmentation with optional manual labels for training: X:\recognizers\output.audio.segmentation.xml
OUT [xml]: Tier holding the speech/non-speech segmentation: E.g. X:\recognizers\output.audio.speech.xml

Then click -> Start

2.6 IAIS 06: Speaker diarization

The speaker diarization recognizer tries to assign each segment of the recording to its respective speaker. It should therefore detect the number of speakers in the recording and who is speaking when.

Change the recognizer to IAIS 06: Speaker diarization. In the parameter section, fill in the input file name, which contains the speech/non-speech information you created earlier, and the output file name, which will hold the results of the speaker diarization. For example:

IN [xml]: Result of one of the speech/non-speech recognizers: X:\recognizers\output.audio.speech.xml
OUT [xml]: Tier holding the speaker diarization: E.g. X:\recognizers\output.audio.speaker.xml

Then click -> Start

2.7 IAIS 01: Silence detection

The silence detection recognizer detects audio fragments that contain no speech and separates them from fragments that contain spoken language (or possibly noise).

Change the recognizer to IAIS 01: Silence detection and define the output file where the results of the silence detection should be stored.

OUT [xml] Tier holding the silence segmentation: E.g. X:\recognizers\output.audio.silence.detection.xml

Use the sliders to tune the sensitivity of the recognizer.
The recognizer seems to work best with the following parameters:

Minimal time in ms for a silent segment: around 125
Minimal time in ms for a non-silent segment: around 140
Maximal sample value to be recognized as silence in percent: around 1.1

After calibrating the parameters click -> Start.

3. ELAN video recognizers

3.1 ELAN video recognizers: What they can do.

ELAN video recognizers offer automatic assistance in analyzing hand movement in video files. They can track and record left hand movement, right hand movement, hands joining and head/hand overlap.

Video files, and specifically the subjects in them, need to fulfill the following characteristics:
- The subject's skin color has to differ from the background color
- The color of the subject's clothing has to differ from his/her skin color
- The subject's chest has to be covered by clothing (whose color differs from the skin color)
- The skin of the face and hands has to be fully visible
- The subject's face and body need to face the camera
- The upper body and face have to be fully visible

Requirements fulfilled, e.g.
- Skin color differs from clothing and background color
- Upper body is fully visible and facing the camera

Requirements not fulfilled, e.g.
- Color of clothing of left subject does not sufficiently differ from skin
- Face of right subject is not facing the camera

ELAN video recognizers require an estimation of skin color beforehand. This can be done manually with GuiSkin.

3.2 Skin Color Estimation with GuiSkin

The Skin Color Estimation App is a graphical way to estimate skin color in a video by manually selecting the best intervals in the YUV domain. This step has to be done before the video file can be successfully processed by the hands and head tracking recognizer.
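
What "selecting intervals in the YUV domain" means can be sketched in a few lines: a pixel is treated as skin when its Y, U and V values each fall inside the selected interval. This is only an illustration, not GuiSkin's actual code. The names Interval and is_skin are hypothetical, and the sketch assumes that each interval is given by a mid-point and a full width, so that it spans mid-point ± size/2; how GuiSkin interprets the size slider is not stated in this guide.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    mid: float   # mid-point of the selected range
    size: float  # size of the selected range

    def contains(self, value: float) -> bool:
        # Assumption: `size` is the full width, so the range is mid ± size/2.
        half = self.size / 2
        return self.mid - half <= value <= self.mid + half


def is_skin(y: float, u: float, v: float,
            y_int: Interval, u_int: Interval, v_int: Interval) -> bool:
    """A pixel counts as skin only if all three channels lie in their intervals."""
    return y_int.contains(y) and u_int.contains(u) and v_int.contains(v)


if __name__ == "__main__":
    # Illustrative settings: U spans 80..130, V spans 125..175,
    # and Y (brightness) is left unrestricted.
    y_all = Interval(mid=128, size=256)
    u_int = Interval(mid=105, size=50)
    v_int = Interval(mid=150, size=50)
    print(is_skin(120, 100, 150, y_all, u_int, v_int))
    print(is_skin(120, 60, 150, y_all, u_int, v_int))
```

Widening an interval marks more pixels, which helps to cover all skin but also increases the risk of marking background.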
1) Open GuiSkin and import a video (-> FILE -> LOAD VIDEO)
2) Use the color intervals to mark all skin with "blue ink" and make sure no background is marked

Advice: GuiSkin has 6 sliders: Y for the brightness, U and V for the colour components. The first three sliders represent the mid-point of the selected range; the last three sliders represent the size of the selected range. U usually ranges between 80 and 130, V between 125 and 175. Estimating the color involves trial and error. Our advice is to start with the second slider and observe the changes in the image. When a reasonable amount of skin color is highlighted, use the 5th slider to adjust the range for this value. The two remaining sliders can serve to limit the number of non-skin pixels highlighted.

Sometimes it cannot be avoided that some parts of the background are marked as well. The priority should always be that all skin is marked. Non-moving background can be cancelled out by the recognizers later on.

Once done calibrating, press SAVE RESULTS and save the resulting XML file in your work folder from Section 1.4 (or save it anywhere and then copy or move it to your work folder).

3.3 Shot Segmentation with ELAN

Open ELAN -> FILE -> NEW, move your video into the SELECTED FILES window and press OK.

In the upper right, click on VIDEO RECOGNIZER and choose Segments the video by detecting cuts. You will see 4 empty text fields:

Adjust amount of messages during calculation here [Logging_level]
Choose between normal / verbose / none. Defines the length and detail of the report data (not very important).
1) OUT [xml]: tier xml with shot segmentation [shotlist_xml]
This XML file holds the shot segmentation.
Fill in: X:\recognizers\shotlist.xml

2) OUT [xml]: tier xml with subshot segmentation [subshotlist_xml]
This XML file holds the subshot segmentation.
Fill in: X:\recognizers\subshotlist.xml

3) OUT [csv]: tier csv with shot segmentation [shotlist_csv]
This CSV file holds the shot segmentation.
Fill in: X:\recognizers\shotlist.csv

4) OUT [csv]: tier csv with subshot segmentation [shotlist_csv]
This CSV file holds the subshot segmentation.
Fill in: X:\recognizers\subshotlist.csv

Once filled in, press START

3.4 Skin Color Estimation with ELAN

Open ELAN -> FILE -> NEW, move your video into the SELECTED FILES window and press OK.

In the upper right, click on VIDEO RECOGNIZER and choose Estimates YUV intervals representing skin. You will see 2 empty text fields:

Threshold used to decide whether the pixels changed from last frame [Change_threshold]
Choose a value between 10.0 and 50.0. The recommended value is 30.0.

Specifies whether analyze also frames with only one moving cluster or not [Use_single_cluster]
Choose between true / false.

1) IN [xml]: xml containing result of segmentation [shotlist_xml]
Here you need to import an XML file which contains segmentation information (this step is not necessary if the video consists of one shot).
Fill in: X:\recognizers\shotlist.xml

2) OUT [xml] xml updated with new information [skincolorvalues_xml]
This file holds the skin color values (the same information that can be created manually with GuiSkin).
Fill in: X:\recognizers\skincolorvalues.xml

Once filled in, press START

3.5 Using the hands and head tracking recognizer

Open ELAN -> FILE -> NEW, move your video into the SELECTED FILES window and press OK.

In the upper right, click on VIDEO RECOGNIZER and choose Tracks motion of hands and head.
IN [xml]: xml containing skin colour parameters and/or shot information [XML_filename]
Here you need to import the XML file containing the skin color parameters. This file can have been created either by GuiSkin or by ELAN.
Fill in, e.g.: X:\recognizers\skincolorvalues.xml

Adjust amount of log messages [Logging_level]
Choose between none / normal / verbose.

Speed threshold used to detect hands movement [speed_threshold]
Choose between low / normal / high. Sets the sensitivity of the gesture analysis.

Background image, used to improve tracking [background_image]
Choose between true / false. ELAN can cancel out non-moving objects in the background that appear similar to body parts due to their color and form ("true" recommended in most cases).

Write output video [output_video]
Choose between true / false. ELAN can create an output video with all gesture tracking information ("true" recommended).

OUT [aux] Directory with tier files containing the annotations [Output_path]
This directory holds all output files that ELAN creates, specifically tiers and output video.
Fill in: X:\recognizers\output\

Once filled in, press START

In its current version this recognizer is somewhat unstable: it can either crash or keep you waiting endlessly for the results. To check whether it is still working, open your work folder and see whether the output files it is writing are changing in size. If they are growing, the recognizer is still working. If not, it is finished. In that case press CANCEL once or twice (this is a bug and it should be fixed soon).

Once the program is finished, you have to import the tiers manually. Press FILE -> IMPORT -> IMPORT TIERS FROM RECOGNIZERS and then choose all XML files ELAN has created, one at a time.

Ideally, the result should look similar to the one displayed below.

Output tiers of ELAN can be best understood with the graphic below.
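
The manual check described in Section 3.5 (watching whether the output files are still growing) can also be scripted. The following is a minimal sketch, not part of ELAN: the function names total_size and still_growing are hypothetical, and the 30 second waiting time is an arbitrary assumption.

```python
import os
import time


def total_size(folder: str) -> int:
    """Sum the sizes (in bytes) of all files directly inside `folder`."""
    return sum(
        entry.stat().st_size
        for entry in os.scandir(folder)
        if entry.is_file()
    )


def still_growing(folder: str, wait_seconds: float = 30.0) -> bool:
    """Return True if the files in `folder` grew during `wait_seconds`."""
    before = total_size(folder)
    time.sleep(wait_seconds)  # assumption: 30 s is long enough to see growth
    return total_size(folder) > before


if __name__ == "__main__":
    # Output directory as used in the example of Section 3.5.
    folder = r"X:\recognizers\output"
    if still_growing(folder):
        print("Output files are growing: the recognizer is still working.")
    else:
        print("No growth: the recognizer has probably finished or stopped.")
```

File sizes reported for a network drive may be updated with some delay, so a single negative result should be confirmed with a second check before pressing CANCEL.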