Insight Through Computing

advertisement
15. Strings
Operations
Subscripting
Concatenation
Search
Numeric-String Conversions
Built-Ins: int2str,num2str,
str2double
Insight Through Computing
Previous Dealings
N = input(‘Enter Degree: ’)
title(‘The Sine Function’)
disp( sprintf(‘N = %2d’,N) )
Insight Through Computing
A String is an Array of
Characters
‘Aa7*>@ x!’
A
a
7
*
>
@
x
This string has length 9.
Insight Through Computing
!
Why are Stirngs Important?
1. Numerical Data often encoded as
strings
2. Genomic calculation/search
Insight Through Computing
Numerical Data is Often Encoded in
Strings
For example, a file containing
Ithaca weather data begins with the string
W07629N4226
Longitude:
Latitude:
Insight Through Computing
76o 29’ West
42o 26’ North
What We Would Like to Do
W07629N4226
Get hold of the substring ‘07629’
Convert it to floating format so that
it can be involved in numerical calculations.
Insight Through Computing
Format Issues
9 as an IEEE floating point number:
0100000blablahblah01001111000100010010
9 as a character:
01000otherblabla
Insight Through Computing
Different
Representation
Genomic Computations
Looking for patterns in a DNA sequence:
‘ATTCTGACCTCGATC’
ACCT
Insight Through Computing
Genomic Computations
Quantifying Differences:
ATTCTGACCTCGATC
ATTGCTGACCTCGAT
Remove?
Insight Through Computing
Working With Strings
Insight Through Computing
Strings Can Be Assigned
to Variables
S = ‘N = 2’
‘N = 2’
S
N = 2;
S = sprintf(‘N = %1d’,N)
sprintf produces a formatted string using fprintf rules
Insight Through Computing
Strings Have a Length
s = ‘abc’;
n = length(s);
% n = 3
s = ‘’;
n = length(s)
% the empty string
% n = 0
s = ‘ ‘;
n = length(s)
% single blank
% n = 1
Insight Through Computing
Concatenation
This:
S = ‘abc’;
T = ‘xy’
R = [S T]
is the same as this:
R = ‘abcxy’
Insight Through Computing
Repeated Concatenation
This:
s = ‘’;
for k=1:5
s = [s ‘z’];
end
is the same as this:
z = ‘zzzzz’
Insight Through Computing
Replacing and Appending
Characters
s = ‘abc’;
s(2) = ‘x’
%
s = ‘axc’
t = ‘abc’
t(4) = ‘d’
%
t = ‘abcd’
v = ‘’
v(5) = ‘x’
%
v = ‘
Insight Through Computing
x’
Extracting Substrings
s = ‘abcdef’;
x
x
x
= s(3)
= s(2:4)
= s(length(s))
Insight Through Computing
%
%
%
x = ‘c’
x = ‘bcd’
x = ‘f’
Colon Notation
s(
Starting
Location
Insight Through Computing
:
)
Ending
Location
Replacing Substrings
s = ‘abcde’;
s(2:4) = ‘xyz’
% s = ‘axyze’
s = ‘abcde’
s(2:4) = ‘wxyz’
% Error
Insight Through Computing
Question Time
s = ‘abcde’;
for k=1:3
s = [ s(4:5) s(1:3)];
end
What is the final value of s ?
A abcde B. bcdea
Insight Through Computing
C. eabcd D. deabc
Problem: DNA Strand
x is a string made up of the characters
‘A’, ‘C’, ‘T’, and ‘G’.
Construct a string Y obtained from x by
replacinig each A by T, each T by A,
each C by G, and each G by C
x: ACGTTGCAGTTCCATATG
y: TGCAACGTCAAGGTATAC
Insight Through Computing
function y = Strand(x)
% x is a string consisting of
% the characters A, C, T, and G.
% y is a string obtained by
% replacing A by T, T by A,
% C by G and G by C.
Insight Through Computing
Comparing Strings
Built-in function strcmp
strcmp(s1,s2) is true if the strings s1
and s2 are identical.
Insight Through Computing
How y is Built Up
x: ACGTTGCAGTTCCATATG
y: TGCAACGTCAAGGTATAC
Start:
After 1 pass:
After 2 passes:
After 3 passes:
Insight Through Computing
y:
y:
y:
y:
‘’
T
TG
TGC
for k=1:length(x)
if strcmp(x(k),'A')
y = [y 'T'];
elseif strcmp(x(k),'T')
y = [y 'A'];
elseif strcmp(x(k),'C')
y = [y 'G'];
else
y = [y 'C'];
end
end
Insight Through Computing
A DNA Search Problem
Suppose S and T are strings, e.g.,
S: ‘ACCT’
T: ‘ATGACCTGA’
We’d like to know if S is a substring of T
and if so, where is the first occurrance?
Insight Through Computing
function k = FindCopy(S,T)
% S and T are strings.
% If S is not a substring of T,
%
then k=0.
% Otherwise, k is the smallest
% integer so that S is identical
% to T(k:k+length(S)-1).
Insight Through Computing
A DNA Search Problem
S: ‘ACCT’
T: ‘ATGACCTGA’
strcmp(S,T(1:4))
Insight Through Computing

False
A DNA Search Problem
S: ‘ACCT’
T: ‘ATGACCTGA’
strcmp(S,T(2:5))
Insight Through Computing

False
A DNA Search Problem
S: ‘ACCT’
T: ‘ATGACCTGA’
strcmp(S,T(3:6))
Insight Through Computing

False
A DNA Search Problem
S: ‘ACCT’
T: ‘ATGACCTGA’
strcmp(S,T(4:7)))
Insight Through Computing

True
Pseudocode
First = 1; Last = length(S);
while S is not identical to T(First:Last)
First = First + 1;
Last = Last + 1;
end
Insight Through Computing
Subscript Error
S: ‘ACCT’
T: ‘ATGACTGA’
strcmp(S,T(6:9))
There’s a problem if S is not a substring of T.
Insight Through Computing
Pseudocode
First = 1; Last = length(s);
while Last<=length(T) && ...
~strcmp(S,T(First:Last))
First = First + 1;
Last = Last + 1;
end
Insight Through Computing
Post-Loop Processing
Loop ends when this is false:
Last<=length(T) && ...
~strcmp(S,T(First:Last))
Insight Through Computing
Post-Loop Processing
if Last>length(T)
% No Match found
k=0;
else
% There was a match
k=First;
end
The loop ends for one of two reasons.
Insight Through Computing
Numeric/String
Conversion
Insight Through Computing
String-to-Numeric Conversion
An example…
Convention:
W07629N4226
Longitude:
Latitude:
Insight Through Computing
76o 29’ West
42o 26’ North
String-to-Numeric Conversion
S = ‘W07629N4226’
s1 = s(2:4);
x1 = str2double(s1);
s2 = s(5:6);
x2 = str2double(s2);
Longitude = x1 + x2/60
There are 60 minutes in a degree.
Insight Through Computing
Numeric-to-String Conversion
x = 1234;
s = int2str(x);
% s = ‘1234’
x = pi;
s = num2str(x,’%5.3f’); % s =‘3.142’
Insight Through Computing
Problem
Given a date in the format
‘mm/dd’
specify the next day in the same format
Insight Through Computing
y = Tomorrow(x)
x
02/28
07/13
12/31
Insight Through Computing
y
03/01
07/14
01/01
Get the Day and Month
month = str2double(x(1:2));
day
= str2double(x(4:5));
Thus, if x = ’02/28’ then month is assigned
the numerical value of 2 and day is
assigned the numerical value of 28.
Insight Through Computing
L = [31 28 31 30 31 30 31 31 30 31 30 31];
if day+1<=L(month)
% Tomorrow is in the same month
newDay = day+1;
newMonth = month;
Insight Through Computing
L = [31 28 31 30 31 30 31 31 30 31 30 31];
else
% Tomorrow is in the next month
newDay = 1;
if month <12
newMonth = month+1;
else
newMonth = 1;
end
Insight Through Computing
The New Day String
Compute newDay (numerical) and
convert…
d = int2str(newDay);
if length(d)==1
d = ['0' d];
end
Insight Through Computing
The New Month String
Compute newMonth (numerical) and
convert…
m = int2str(newMonth);
if length(m)==1;
m = ['0' m];
end
Insight Through Computing
The Final Concatenation
y = [m '/' d];
Insight Through Computing
Some other useful string functions
str= ‘Cs 1112’;
length(str)
isletter(str)
isspace(str)
lower(str)
upper(str)
%
%
%
%
%
7
[1 1 0 0 0 0 0]
[0 0 1 0 0 0 0]
‘cs 1112’
‘CS 1112’
ischar(str)
% Is str a char array? True (1)
strcmp(str(1:2),‘cs’)
% Compare strings str(1:2) & ‘cs’. False (0)
strcmp(str(1:3),‘CS’)
% False (0)
Insight Through Computing
ASCII characters
(American Standard Code for Information Interchange)
ascii code
:
:
65
66
67
:
90
:
Insight Through Computing
Character
:
:
‘A’
‘B’
‘C’
:
‘Z’
:
ascii code
:
:
48
49
50
:
57
:
Character
:
:
‘0’
‘1’
‘2’
:
‘9’
:
Character vs ASCII code
str= ‘Age 19’
%a 1-d array of characters
code= double(str)
%convert chars to ascii values
str1= char(code)
%convert ascii values to chars
Insight Through Computing
Arithmetic and relational ops on characters
•
•
•
•
‘c’-‘a’ gives 2
‘6’-‘5’ gives 1
letter1=‘e’; letter2=‘f’;
letter1-letter2 gives -1
• ‘c’>’a’ gives true
• letter1==letter2 gives false
• ‘A’ + 2
gives 67
• char(‘A’+2) gives ‘C’
Insight Through Computing
Example: toUpper
Write a function toUpper(cha) to convert
character cha to upper case if cha is a lower case
letter. Return the converted letter. If cha is not
a lower case letter, simply return the character
cha.
Hint: Think about the distance between a letter
and the base letter ‘a’ (or ‘A’). E.g.,
a b c d e f g h …
distance = ‘g’-‘a’ = 6 = ‘G’-‘A’
A B C D E F G H …
Of course, do not use Matlab function upper!
Insight Through Computing
function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a letter then up is just cha.
up= cha;
cha is lower case if it is between ‘a’ and ‘z’
Insight Through Computing
function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a letter then up is just cha.
up= cha;
if ( cha >= 'a' && cha <= 'z' )
% Find distance of cha from ‘a’
end
Insight Through Computing
function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a letter then up is just cha.
up= cha;
if ( cha >= 'a' && cha <= 'z' )
% Find distance of cha from ‘a’
offset= cha - 'a';
% Go same distance from ‘A’
end
Insight Through Computing
function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a letter then up is just cha.
up= cha;
if ( cha >= 'a' && cha <= 'z' )
% Find distance of cha from ‘a’
offset= cha - 'a';
% Go same distance from ‘A’
up= char('A' + offset);
end
Insight Through Computing
Download