Introduction to Spectral Analysis and Matlab

IRIS Summer Intern Orientation, 2014
Introduction
The object of this lab is to explore the relationship between the time domain and the
frequency domain while being introduced to the numerical computing program MATLAB.
You will first look at pure sine waves as a function of time and their representation in the
frequency domain, and then examine some earthquake data. Those already proficient in
both MATLAB Scripting and Fourier analysis may wish to skip directly to the more
advanced sections covered in the Earthquake data section (Part 3 – be sure to do these in a
script).
What is Matlab?
MATLAB is a commonly used commercial package designed to manipulate and plot all
sorts of data. The MATLAB introduction states:
MATLAB is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where
problems and solutions are expressed in familiar mathematical notation. Typical uses
include:
- Math and computation
- Algorithm development
- Modeling, simulation, and prototyping
- Data analysis, exploration, and visualization
- Scientific and engineering graphics
MATLAB is an interactive system whose basic data element is an array that does not
require dimensioning. This allows you to solve many technical computing problems,
especially those with matrix and vector formulations, in a fraction of the time it would
take to write a program in a scalar noninteractive language such as C.
Today's lab will use only a few of the features offered by MATLAB, but should give you
enough of an introduction to allow you to understand the basic syntax, input/output and
plotting.
Getting Started
First log in to Fedora (not Windows) and open a terminal window.
Then create a new directory in your home directory to use for your MATLAB processing,
cd to that directory and then copy all files from the system directory written on the board
to that directory. Finally start up MATLAB by typing matlab in the terminal window.
PART ONE - Basic Matlab Commands (skip if you are very familiar with MATLAB,
although even some intermediate users may find useful techniques and hints in this
section)
The initial windows that appear include Command (largest), Workspace (usually top
right), and Command history (bottom right). Many operations can be performed either in
the command window or via the drop down menus. In most cases the command window
commands will be listed here.
Simple Arithmetic and common functions
The syntax for this is exactly as you’d expect (e.g. 1+1 (hit enter) -> 2 (output), 5*9 -> 45,
etc.).
Exponentials are denoted by the “^” sign (e.g. 2^3 -> 8).
Functions
MATLAB also contains several useful functions built into the program. A function in Matlab
will have the following form:
Function_Name(argument_1, argument_2 (not always needed), …)
Some common functions that take only a single argument are listed below:
- Square root: sqrt
- Natural log: log
- Log base 10: log10
- Sine (in radians): sin
- Sine (in degrees): sind
Example: sqrt(4) -> 2
To see what a given function does and how many arguments are required (or available),
simply type : “help Function_Name” to get a brief overview of the function.
Example: help cos
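The arithmetic and function calls described above can be tried together in a short sketch. The variable names are illustrative only and are not used elsewhere in the lab.

```matlab
% Simple arithmetic and single-argument built-in functions.
% Variable names are illustrative, not part of the lab files.
a = 1 + 1;          % addition
b = 5 * 9;          % multiplication
c = 2^3;            % exponentiation with the ^ sign
r = sqrt(4);        % square root
s_deg = sind(90);   % sine of an angle given in degrees
s_rad = sin(pi/2);  % sine of the same angle given in radians
l = log10(1000);    % log base 10
disp([a b c r s_deg s_rad l])
```

Leaving the semicolon off any of these lines prints that result to the Command Window as soon as the line runs.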
Vectors and Matrices
MATLAB is designed for easy matrix manipulation. The matrices are like excel
spreadsheets in that columns and rows can be manipulated as a unit or as individual
elements.
Arrays in MATLAB are denoted by pairs of square brackets “[]”. Elements in an array are
separated by either commas (move one column to the right) or semicolons (move down one row).
Example:
Row Vector: [1,2,3,4]
Column Vector: [1;2;3;4]
Exercise
In the text that follows, lines beginning with “ >> “ are commands to be entered into the
command window.
>> A = [1, 5, 9; 3, 7, 11]
>> B = [0, 6, 12, 18; 2, 8, 14, 20; 4, 10, 16, 22]
Note how the variables “A” and “B” now appear in the “Workspace” window. You can
double click on either variable name to view the entire array and edit individual elements.
Back in the Command Window, you can pull out one or several elements of an array using the
following format: Matrix_Name(row_number, column_number).
Examples
A(1,3) -> 9
A(1,1:3) -> [1,5,9]
To get only the first column of B: B(:,1)
Or similarly only the second row of B: B(2,:)
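As a minimal sketch, the indexing examples above can be run in one go using the A and B matrices from the exercise. The names on the left-hand side are illustrative.

```matlab
% Build the two matrices from the exercise
A = [1, 5, 9; 3, 7, 11];
B = [0, 6, 12, 18; 2, 8, 14, 20; 4, 10, 16, 22];

elem = A(1,3);     % single element: row 1, column 3
row1 = A(1,1:3);   % row 1, columns 1 through 3
col1 = B(:,1);     % every row of column 1 (a column vector)
row2 = B(2,:);     % every column of row 2 (a row vector)

% size returns the number of rows and columns
[nrows, ncols] = size(B);
```

The colon on its own means "all rows" or "all columns", which is why B(:,1) returns a whole column.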
Array Operations
Arrays may be multiplied and divided by a scalar as expected. Taking the reciprocal of each
array element (e.g. converting from period (s) to frequency (Hz)) requires the ./ operator.
Examples
Multiply every element in A by 2: 2*A
Divide every element in A by 2: A/2
Take the reciprocal of each element in A: 1./A
In addition, two arrays may be multiplied together if the dimensions of the two arrays are
compatible (the number of columns of the first must equal the number of rows of the second).
Example
Multiply A and B and save it to variable C: C=A*B; (the semicolon suppresses output to
the screen)
There are additional matrix operations that will be helpful later this week when working
on the inverse problem assignment:
Take the inverse of a square matrix M: inv(M) (C above is 2-by-4, so inv(C) returns an error)
Take the transpose of C: C'
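The sketch below collects the array operations from this section. The period values and the square matrix D are hypothetical and are there only so that every line runs.

```matlab
A = [1, 5, 9; 3, 7, 11];
B = [0, 6, 12, 18; 2, 8, 14, 20; 4, 10, 16, 22];

% Element-wise reciprocal: hypothetical periods (s) converted to frequencies (Hz)
period = [0.5, 2, 4];
freq = 1 ./ period;

% Matrix product: (2-by-3) times (3-by-4) gives a 2-by-4 result
C = A * B;

% Transpose swaps rows and columns, giving a 4-by-2 result
Ct = C';

% inv only works on square matrices, so a hypothetical 2-by-2 matrix is used
D = [2, 1; 1, 3];
Dinv = inv(D);
check = D * Dinv;   % should be close to the 2-by-2 identity matrix
```

Note the difference between * (matrix product) and .* or ./ (element-by-element operations); mixing them up is a common source of dimension errors.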
PART TWO – Loading data, Scripting using Cells, Plotting (including tips on how to
make them look ready for publications and presentations), and introduction to
Fourier Analysis
Loading data
For this part of the exercise, you will need to load into your working directory (the folder
you are currently in) at least the following files: sine_waves, multi_sine and fourier.m
The data is now in your directory, but has not yet been loaded into your
MATLAB workspace. To do so, type:
>>load sine_waves
(loads the file sine_waves from the current directory)
The variable sine_waves should now appear in your workspace and you can manipulate
and examine this array just as you would any other (see Part 1 – vectors and matrices).
Scripting and Cells
The remainder of this exercise will be performed in a Script and NOT in the Command
window. Scripting has the advantage that all of your code is saved, runs all at once, and
can be rerun at a later time. Often you will perform the same analysis on several different
data sets (e.g. different seismic stations or channels) and it will be far more efficient to
save your analysis in a script rather than redoing everything again and again from
the command line!
To open a script, click on the piece of paper icon on the top left corner of the MATLAB
window. All the commands that you would use in the Command window still work in the
script; the only difference is that everything is run at once from top to bottom.
It is not uncommon for scripts to be several hundred lines long, and sometimes we want
to work on one part of the script without having to run the rest of it. For this reason, many
people divide their script into Cells. Cells are designated in a script by %% followed by a
space and may be run independently of the rest of the script. This is especially helpful
when debugging your code and playing with your plots to get them to look right, as all the
data and processing don’t have to be redone each time you make a change.
Starting your script
The first lines of a script are generally commented to state what the script does. You can
comment a line (i.e. it is not read and interpreted by MATLAB when you run the script)
with the % symbol. When you have multiple scripts in your workspace and you don’t
remember what one does, you can read this first commented line by typing
>> help Script_Name
I prefer to set up my scripts so that I can just open one and run it without preparing my
workspace in advance. For this reason, my first line of code after the
comment line is always
>> clear all
This clears the entire workspace and is useful in preventing MATLAB from doing things
you don’t expect it to (for example if you use a variable in your script that is already
declared in your workspace, this can really goof things up).
Now load sine_waves into MATLAB and begin scripting the rest of the exercise
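A minimal sketch of how such a script might be laid out is shown here. The script name my_analysis and the synthetic data are hypothetical; in the lab the first cell would contain load sine_waves instead of the stand-in data.

```matlab
% my_analysis.m (hypothetical name): sketch of a script split into cells.
% The comment lines at the very top are what "help my_analysis" would print.
clear all   % start from an empty workspace

%% Cell 1 - get the data
% In the lab this cell would contain:  load sine_waves
% A synthetic stand-in is built here so the sketch runs without the lab files.
t = (0:0.1:1)';               % time column
data = [t, sin(2*pi*t)];      % column 1 = time, column 2 = amplitude

%% Cell 2 - inspect the data
n_samples = size(data, 1);    % number of rows (samples)
disp(n_samples)
```

Each line starting with %% and a space opens a new cell, so the second cell can be rerun on its own after the first has been run once.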
Cell #1 – Plotting simple sine waves in the Time Domain
The file sine_waves has 512 rows and 4 columns. The first column is the time in seconds.
Columns 2-4 are the amplitudes of 3 different sine waves, as sampled at the times listed in
column 1.
To view the elements of a matrix or any variable, simply type its name in the Command
Window.
>> sine_waves
or double click on the variable name in Workspace.
Questions: Answer all questions in this exercise by commenting them in your
MATLAB script
- What is the sampling interval of the data (i.e. the time in seconds between successive
samples)?
- How many samples are there per second?
- The maximum signal frequency that can be correctly observed is half the sampling
frequency. This is called the Nyquist frequency. What is the Nyquist frequency in this
case?
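One way to work these quantities out in a script is sketched below on a hypothetical time column (8 samples, 0.25 s apart). The values are made up and are not those of sine_waves; for the lab data, t would be sine_waves(:,1).

```matlab
% Hypothetical time column: 8 samples spaced 0.25 s apart (not the lab data)
t = (0:7)' * 0.25;

dt = t(2) - t(1);     % sampling interval in seconds
fs = 1 / dt;          % sampling frequency in samples per second (Hz)
f_nyquist = fs / 2;   % Nyquist frequency in Hz
```

Taking the difference of the first two time samples assumes the data are evenly sampled, which is the usual case for seismograms.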
Plotting data
To plot data, use the plot command and select the columns you want to plot against each
other:
Figure 1
>> figure(1)
>> plot(sine_waves(:,1),sine_waves(:,2)); (plots column 1 (time) vs column 2 (amplitude))
Hit the green triangle (reminds me of a play button) on the top of the script window. Now you
should have a simple (and ugly!) plot of a sine wave appear in a new window. Make it prettier
in Figure 2 by zooming in, adding labels, enlarging the tick labels, and changing the color
and style of the plot.
Figure 2 (Copy the Figure 1 commands and add these below)
>>xlim([0 5]) (Sets the plotting limits of the x-axis)
>>ylim([-1.5 1.5])
>>xlabel('Time (s)', 'fontsize', 24) (Labels the x-axis and (optionally) sets the font size
of the label)
>>ylabel('Amplitude', 'fontsize', 24)
>>set(gca,'FontSize',24) (Makes the tick labels larger. This is imperative for
presentations).
You can also change the color, width, and style of your plot. To make a pretty cyan (c)
plot with circles (o) at each data point connected by a line (-) of width 2 (four times the
default width of 0.5), change the plot command beneath figure(2) to:
>> plot(sine_waves(:,1),sine_waves(:,2),'co-','linewidth',2)
You can also find the x,y value of any point on the graph by selecting data cursor icon on
the Figure toolbar and clicking on the figure.
Now that we have a pretty figure of a single sine wave, we will plot multiple sine waves
on the same plot. If you were just to put two plot commands next to each other, the
second plot command will effectively erase the previous one. In order to prevent this from
happening we use the hold on command after the first plotting command. This must be
done once per figure.
>>hold on (holds the axes for later plots; hold off, which is the default, clears the
plot each time new data are plotted)
Figure 3 (Plot of two sine waves):
>>plot(sine_waves(:,1),sine_waves(:,3),':'); (plots column 1 vs column 3 using a dotted line)
To add a line at y=0, which makes it easier to determine frequency, create a new variable
called zero_line and fill it with zeros:
>>zero_line=zeros(512,1);
>>plot(sine_waves(:,1),zero_line);
Feel free to change the colors or styles of the plots, but be sure to keep the axes labeled
and everything neat and legible.
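The plotting steps in this cell can be sketched end to end as follows. The two waves are synthetic stand-ins with made-up frequencies and amplitudes, so that the example runs without the lab files; with the lab data, t, y1 and y2 would be columns of sine_waves.

```matlab
% Synthetic stand-ins for the time and amplitude columns (hypothetical values)
t  = (0:0.01:5)';
y1 = sin(2*pi*1*t);          % hypothetical 1 Hz wave
y2 = 0.5 * sin(2*pi*2*t);    % hypothetical 2 Hz wave with half the amplitude

figure
plot(t, y1, 'c-', 'linewidth', 2)   % first line: cyan, solid, width 2
hold on                             % keep the first line when adding more
plot(t, y2, 'k:')                   % second line: black, dotted
plot(t, zeros(size(t)), 'k-')       % zero line sized to match t
hold off

xlim([0 5])
ylim([-1.5 1.5])
xlabel('Time (s)', 'fontsize', 24)
ylabel('Amplitude', 'fontsize', 24)
set(gca, 'FontSize', 24)            % larger tick labels
```

Using zeros(size(t)) rather than a hard-coded length such as zeros(512,1) means the zero line still matches if the number of samples changes.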
Questions
- What are the frequencies of the sine waves in column 2 and column 3?
- What are their relative amplitudes (i.e. what is the ratio of their amplitudes)?
Comparing individual sine waves to their sum
Column 4 is the sum of the amplitudes of columns 2 and 3. To plot column 4:
Figure 4: Add the sum of the two sine waves to your previous plot and include a
legend
Hint: type ‘help legend’ into the MATLAB command window to find out how to set up a
legend.
Note that while you can tell that the resulting wave contains more than one frequency, it is
harder to estimate the relative amplitudes of the two frequencies when they are summed
together.
Cell #2 (start a new cell in your script here): Frequency Domain
To transform to the frequency domain, calculate the Fourier transform for the sine waves
in columns 2-4 of sine_waves. Here we will use a function called fourier.m, which is not
part of the basic MATLAB suite (you can write your own functions in MATLAB in a
manner similar to writing a script).
Again, use the help feature in MATLAB to find out how to use the function fourier. Then
set three variables (transformed2, transformed3, and transformed4) to the Fourier
transforms of columns 2-4 of sine_waves.
Hint: Your code should have the form:
>> transformed2 = fourier(arg1, arg2)
For each sine wave, a new matrix is created with frequency (Hz) in column 1, amplitude
in column 2 and phase in column 3. Looking at the numbers in transformed2, answer the
following questions in your script.
- What is the sampling interval in the transformed data (in Hz)?
- What is the maximum frequency?
- Does this agree with your determination of the Nyquist frequency?
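For readers curious about what a function like fourier might do internally, here is a sketch that uses MATLAB's built-in fft instead. This is an assumption for illustration only: the argument list, scaling and output layout of the lab's fourier.m are not shown in this handout and may differ. The signal is a hypothetical 2 Hz sine sampled at 16 Hz.

```matlab
% Hypothetical signal: 2 Hz sine, 64 samples at 16 samples per second
fs = 16;
N  = 64;
t  = (0:N-1)' / fs;
x  = sin(2*pi*2*t);

X = fft(x);                        % built-in FFT (not the lab's fourier.m)

nhalf = N/2 + 1;                   % bins from 0 Hz up to the Nyquist frequency
f   = (0:nhalf-1)' * fs / N;       % frequency axis; spacing is fs/N
amp = abs(X(1:nhalf)) / N;         % amplitude of each frequency bin
amp(2:end-1) = 2 * amp(2:end-1);   % fold in the matching negative frequencies
ph  = angle(X(1:nhalf));           % phase in radians

[peak_amp, ipeak] = max(amp);
peak_freq = f(ipeak);              % frequency of the largest spectral peak
```

With this scaling, a sine wave that fits a whole number of cycles into the record shows a peak equal to its time-domain amplitude. The frequency spacing fs/N and the last value of f are the two quantities the questions above ask about.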
Before plotting the spectra, consider what you might expect for the frequency response of
each sine wave.
Convert column one of your transformed arrays from Hz to seconds, i.e. from frequency to
period (Hint: see Array Operations at the end of Part One).
Now plot the amplitude spectrum for the sine wave that was in column 2 of sine_waves,
keeping previous plotting commands to keep everything looking neat and with relevant
axis labels (x should be seconds).
Note: You can run just the second cell by clicking within the second cell (should become
highlighted yellow) and then clicking the Evaluate Cell icon (yellow rectangle above a
white rectangle) on the left of the bottom row of the Editor toolbar.
Figure 5:
>>plot(transformed2(:,1),transformed2(:,2));
- At what frequency is there a maximum?
Figure 6: Plot the spectrum of the two sine waves as well as their sum. Use a different
color for each line. Include a Legend.
- What is the peak frequency of the column 3 sine wave?
- What is the relative amplitude of the peaks for the 2 sine waves?
- How does this ratio compare to your measured ratio of the sine wave amplitudes?
- How do the spectral amplitudes of the combined sine waves compare to the spectral
amplitudes of the individual sine waves?
This shows that the combination of sine waves is a linear process, which leads to the
concept that an arbitrarily shaped wave can be created by the addition of a sufficient
number of sine waves.
Cell #3: Construction of a wave plot
Now, in your script, load the file multi_sine.
The file multi_sine includes 10 different sine waves (in columns 2-11) which have been
phase shifted so that there is one time when all the sine waves are at a maximum. Column
1 is time as in sine_waves. You can plot them individually in the time domain to see what
they look like.
Figure 7: Plot all 10 sine waves. Use a Google search (e.g. “MATLAB colors”) to try
to make each wave a different color.
To add all the sine waves together at each point in time, you can do it the long way:
>>bigwave=multi_sine(:,2)+multi_sine(:,3)+multi_sine(:,4)+multi_sine(:,5)...
+multi_sine(:,6)+multi_sine(:,7)+multi_sine(:,8)+multi_sine(:,9)+multi_sine(:,10)...
+multi_sine(:,11);
(the three dots at the end of a line continue the command onto the next line)
or you can use MATLAB's sum utility (which sums columns) along with its transpose
utility (which swaps rows and columns):
>>bigwave=sum(multi_sine(:,2:11)')';
bigwave is now the sum of all 10 sine waves.
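The small sketch below checks that the two approaches give the same answer, and shows a third option: sum accepts a second argument naming the dimension to sum along, which avoids the double transpose. The 3-by-4 matrix M is a hypothetical stand-in for multi_sine.

```matlab
% Hypothetical stand-in: column 1 = time, columns 2-4 = three "waves"
M = [0 1 2 3;
     1 4 5 6;
     2 7 8 9];

long_way       = M(:,2) + M(:,3) + M(:,4);   % add the columns one by one
with_transpose = sum(M(:,2:4)')';            % transpose, sum columns, transpose back
with_dim       = sum(M(:,2:4), 2);           % sum along dimension 2 (across columns)
```

For the lab data the equivalent call would be sum(multi_sine(:,2:11), 2).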
Figure 8: Make a plot of “bigwave” in the time domain
Plot bigwave in one window and in another window plot the 10 sine waves used to create
it. Note how the sum of continuous sine waves results in a wave packet of finite duration.
Figure 9: Now calculate and plot the Fourier amplitude spectrum of bigwave (the
sampling interval is the same as before).
- What are the frequencies of the sine waves that make up the wave pulse in bigwave?
Cell #4 – Frequency of a spike
This example shows how many sine and cosine functions are needed to create a pulse-like
waveform. To create a single pulse, an infinite series of sine and cosine functions has to
be added together. A single pulse can be found in column 13 of multi_sine.
Figure 10: Use “subplot” to create a two-window plot. The top plot will show the spike in
the time domain, and the bottom plot will show the Fourier transform of the spike (in Hz).
- What is the frequency response of the spike in column 13 of multi_sine? Why?
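A possible layout for the two-panel figure is sketched here with a synthetic spike and the built-in fft. Both are assumptions made so the example runs on its own: the lab itself uses column 13 of multi_sine and the fourier function, and the sampling values below are hypothetical.

```matlab
% Synthetic spike: all zeros except one sample (hypothetical sampling values)
fs = 16;
N  = 64;
t  = (0:N-1)' / fs;
spike = zeros(N, 1);
spike(N/2) = 1;

amp = abs(fft(spike));        % built-in FFT amplitude (unscaled)
f   = (0:N/2)' * fs / N;      % frequencies from 0 Hz up to Nyquist

figure
subplot(2,1,1)                % top panel: time domain
plot(t, spike)
xlabel('Time (s)'); ylabel('Amplitude')

subplot(2,1,2)                % bottom panel: frequency domain
plot(f, amp(1:N/2+1))
xlabel('Frequency (Hz)'); ylabel('Spectral amplitude')
```

subplot(2,1,k) splits the figure into 2 rows and 1 column and makes panel k the current axes, so the plot and label commands that follow apply to that panel only.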
Part 3 – Looking at real Seismic Data
For the following sections, you may wish to work in pairs. Please discuss the results
with your peers. Do all of your work in a MATLAB script (not the Command Window) and
comment your answers to each question.
Start a new cell (refer to Part 2 if you don’t know what a cell is).
Now you can examine earthquake data that were collected during an earthquake hazard
assessment study of the Wellington, New Zealand region. The file quake_data includes
10 seconds of S wave recording from 3 different sites for the same earthquake. Column 2
is data from a rock site, column 3 is the recording from a sedimentary basin site and the
column 4 seismograph was located on an old peat bog (now a housing development). As
before, time is in column 1.
Figure 11: Neatly plot the 3 seismograms in the time domain and compare the
signals. Use different colors for each site and include a legend.
Time domain
- What is the sampling rate?
- What is the Nyquist frequency?
Frequency domain
Transform the time series to the frequency domain as before (don't forget to include the
new sampling rate).
Figure 12: Make a plot of the three seismograms in the frequency domain. Be sure to
include a legend.
First look at the frequency response of the rock site (column 2).
- What is the range of frequencies in the ground motion?
Now look at the frequency response of the ground motion after the seismic waves have
traveled through the soft sediments below the sites in columns 3 and 4. You may want to
change the axes so that you can focus on the low frequencies.
- Note that there are clearly defined peaks in frequency for columns 3 and 4 but not for the
rock site (column 2).
- What is the frequency at which there is a maximum for columns 3 and 4?
These data were collected in sedimentary basins that can shake like a bowl of jello. A
basin can resonate at particular frequencies just like a simple harmonic oscillator. The
resonance continues long after the seismic energy has dissipated at the nearby rock site.
The frequency of oscillation for a simple cylindrical basin is related to the velocity of the
material and the depth of the basin. A wave whose wavelength is four times the thickness
of the basin will resonate in the basin.
4 x thickness = wavelength = velocity / frequency
- The surface shear wave velocity for the site in column 3 has been measured at 110 m/s,
while for the site in column 4 it was 80 m/s. What is the approximate thickness of the two
basins?
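Rearranging the relation above gives thickness = velocity / (4 x frequency), which can be evaluated for both sites at once with element-wise division. The velocities are the ones quoted in the text; the peak frequencies below are HYPOTHETICAL placeholders and must be replaced with the values you read from your own spectra.

```matlab
% 4 x thickness = velocity / frequency  =>  thickness = velocity / (4 x frequency)
velocity  = [110, 80];    % m/s, from the text (column 3 site, column 4 site)
peak_freq = [2.5, 2.0];   % Hz, HYPOTHETICAL placeholders, not measured values

thickness = velocity ./ (4 * peak_freq);   % metres, one value per site
```

The ./ operator divides element by element, so each velocity is paired with the peak frequency of the same site.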
Advanced Topics – Work on as time permits
(New Cell) Loading Sac data, studying earthquakes, and Filtering
It is possible to filter out frequencies that aren't of interest for a particular problem, or to
select for particular frequencies. For example, local earthquakes include much higher
frequency content than distant earthquakes.
We will now examine broadband data recorded at station LKWY in Utah from a
magnitude 7.9 earthquake on the Denali fault in Alaska in November 2002. First load the
data using the m-file algorithm load_sac:
>>[sachdr,utahz] = load_sac('lkwyz.sac');
This loads information about the data in sachdr and loads the amplitude data into a single
column in utahz. The sampling frequency is 40 Hz.

- What are the latitude, longitude, and elevation of the seismic station? (Hint: look in the
sachdr header.)
Figure 13 – Plot the seismogram in the time domain by creating a time vector the
same length as the seismogram that starts at 0 seconds. Give an appropriate title to
the plot.
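A time vector of this kind can be built from the number of samples and the sampling frequency, as sketched below. The 40 Hz value comes from the text; the short synthetic trace seis is a hypothetical stand-in for utahz so that the example runs without the SAC file.

```matlab
fs = 40;                                  % sampling frequency in Hz (from the text)
seis = sin(2*pi*0.5*(0:199)'/fs);         % hypothetical stand-in for utahz

n = length(seis);                         % number of samples
t = (0:n-1)' / fs;                        % time in seconds, starting at 0

figure
plot(t, seis)
xlabel('Time (s)')
ylabel('Amplitude')
title('Synthetic stand-in trace')         % replace with a descriptive title
```

Dividing the sample index by fs is the same as multiplying by the sampling interval 1/fs, and starting the index at 0 makes the first sample fall at 0 seconds.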
- What phases can you identify? Identify the arrival time of the P-wave and the S-wave to find
the P-S travel time lag (use the data cursor tool). A very rough rule of thumb is that the distance
to the epicenter (in km) is about 10 times the P-S lag (in seconds). About how far away from the
seismic station is the earthquake? Verify your answer by looking at the header. How close were
you? If you’re far off, you probably aren’t identifying the correct phases.
Identification of phases can sometimes be made easier by filtering out unwanted
frequencies. Use the Butterworth filter function “filbutt” to filter the earthquake between 2
and 19 Hz. Hint: Type “help filbutt” into the command window to learn how to use the
function.
Figure 14 – Plot the original (top) and filtered seismogram (bottom) using a 2-panel
subplot.
Does the filter make it easier to identify the phases?
This particular filter doesn’t have a very sharp frequency cut off. One way to increase the
sharpness of the filter is to filter the data a second or third time using the same parameters.
What does this do to the waveform and the spectra?
Figure 15 – Plot the twice and thrice filtered seismograms on one plot using the
subplot command
- After a 3rd round of filtering do you see phases you didn’t see in the raw data? What do
you think might be causing the high frequency signals? Do you think they are from local
or distant sources? Why? Discuss with others at this section.
(New Cell) Effects of tapering and signal length
Tapering
Two of the reasons that the spectral amplitude peak is spread out over a range of
frequencies are the finite length of the sine wave in time and the abrupt truncation of the
waves at the end of the file. The amplitude of the spurious frequencies can be reduced by
tapering the wave so that its amplitude drops smoothly to zero at the end of the time
series.
You can do this with the function taper:
>>tapered = taper(sine_waves(:,4), 10); (taper 10% of both ends of sine_waves column 4)
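To illustrate the idea of tapering, here is a sketch of a simple cosine taper written from scratch. It is an assumption about how such a function could work, not the code of the lab's taper function, whose internals are not shown in this handout. The constant test signal is hypothetical.

```matlab
x   = ones(100, 1);    % hypothetical signal (constant, so the taper shape is visible)
pct = 10;              % percentage of samples tapered at each end

n = length(x);
m = round(n * pct / 100);                     % samples in each tapered end

% Half-cosine ramp rising smoothly from 0 toward 1
ramp = 0.5 * (1 - cos(pi * (0:m-1)' / m));

w = ones(n, 1);                % window: 1 in the middle...
w(1:m) = ramp;                 % ...ramping up at the start
w(n-m+1:n) = flipud(ramp);     % ...and ramping down at the end

tapered = x .* w;              % apply the window sample by sample
```

Only the first and last 10% of the samples are changed; the middle of the signal is multiplied by 1 and left as it was.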
Figure 16 – Again using subplot, plot the original (top) and tapered (bottom) sine wave.
Figure 17 – Use subplot to compare the Fourier Transform of the original and
tapered sine wave (change the output of the FT from Hz to Seconds).
- Has the amplitude of the spurious frequencies decreased?
Signal Length
Figure 18 – Use subplot to compare the Fourier transform of the entire sine wave
(top) to that of just the first 130 samples (bottom).
- How does the spectrum compare to that of the full length signal?
(New Cell) Inverse transform - Frequency domain to time domain
The Fourier transform can be used to go either from time to frequency or from frequency
to time. To verify that this is true, take the frequency domain result from the New Zealand
data for column 2 (rock site) and transform it to the time domain. If qtrans2 is the output
matrix from fourier, with frequency in column 1, amplitude in column 2 and phase in
column 3, then you can calculate the inverse transform by:
>>intrans2=ifourier(qtrans2,100);
where 100 is the resulting sample rate in Hz.
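The round trip from time to frequency and back can be sketched with MATLAB's built-in fft and ifft. This is an illustration under stated assumptions: the lab's fourier and ifourier functions may scale and arrange their output differently, and the short time series below is hypothetical.

```matlab
% Hypothetical time series (8 samples)
x = [1; -2; 3; 0.5; -1; 2; 0; -0.5];

X   = fft(x);         % time domain -> frequency domain
amp = abs(X);         % amplitude of each frequency component
ph  = angle(X);       % phase of each frequency component (radians)

% Rebuild the complex spectrum from amplitude and phase, then invert it
X_rebuilt = amp .* exp(1i * ph);
x_back = real(ifft(X_rebuilt));   % frequency domain -> time domain

max_err = max(abs(x_back - x));   % largest difference from the original
```

The real() call discards the tiny imaginary parts left by rounding, and max_err gives a single number for judging how closely the recovered series matches the original.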
Figure 19 – Use your favorite plotting method to compare the original recorded
time series with the one you obtained from the inverse Fourier transform of the spectrum.
- Does any information appear to have been lost in the transformations?
Filtering a spike in the time domain
Figure 20 – Use subplot to play with various bandpass filters of the spike in the time
(top) and frequency (bottom) domains. Eventually use a lowpass filter (hint: read the filbutt
help page on how to create a lowpass filter).
- What is the result in the time and frequency domains?
Lowpass filtering of the spike is a good analog of what happens to an impulsive seismic
source as the sensor moves further away from the source.
List of MATLAB commands used in lab
The syntax for all the commands can be found in the help pages except for those marked
"not a basic matlab function". The syntax for these can be found in the files with a .m
extension on the distributed CD.
xlim,ylim - sets user defined axes
clf - clears graphics window
clear - clears all variables
clear name - clears just the variable name
exit - leave matlab
hold on - keeps plot from clearing for each new line
filbutt - bandpass filter (not a basic matlab function)
fourier - calculate Fourier transform (not a basic matlab function)
load - loads in a data file
load_sac – loads in a data file in SAC format (not a basic matlab function)
plot - x vs y plot
print - saves current graphics plot to disk
save filename - saves all the current variables to disk in the file filename.mat
sum - sum the columns of a matrix
title - puts title on plot
xlabel - labels x axis
ylabel - labels y axis
zeros - creates an array of zeros
; - at end of command: execute the command but don't print the result on the screen
' - transpose of a matrix