Strings:
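Strings are not built into the C++ core language, but they are provided by the C++
Standard Library (often loosely called the Standard Template Library, or STL). So to use
them you have to put this in your program (and then either write std::string or add
using namespace std;):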
#include <string>
There is also a kind of string built into the C++ language itself. These are usually
called C-strings or "native C strings." They are a holdover from the C language, on
which C++ is based, and they are there partly for compatibility with C. C-strings are
more primitive, in the sense that they do less for you and are more error-prone.
When you write a quoted sequence of characters, such as "Hello", you are writing a
C-string. When you do an assignment, such as:
string fred;
fred = "Hello";
the compiler automatically converts the C-string "Hello" to a string value. You
can also do it explicitly like this:
string fred;
fred = string("Hello");
Strings are nicer: They can grow. They have better error checking. You can do
things with them, such as compare them (<, >, etc.) and concatenate them (glue
them end to end).
String concatenation is denoted by the "+" operator, which has been overloaded to
work on strings as well as other things, such as ints, doubles, and chars.
"Overloading" means giving more than one meaning to an operator or function. You
can overload your own operators and functions too. The only rule is that the compiler
has to be able to tell from the types of the arguments which definition to use.
To understand strings, it helps to think of them as vectors of characters (the two are
almost identical). Many of the same operations apply to both.
What are some of the other things you can do with strings?
Strings have a number of member functions, which are functions that you address
to the string. (This is just like the Scribbler robots, where you address commands such
as forward() and getName() to a robot, for example robot.forward(0.75) or robot.getName().)
If S is a string, S.size() returns the length of the string.
S.find(c) returns the location of the first occurrence of the character c in S. If c
doesn't occur in S, it returns a special value, string::npos (which stands for "no
position").
We can address locations in a string with indices, just as we do for vectors. So S[0]
is the first character in S, S[1] is the second, and so forth. You can assign to these
too.
Notice that the indices of an N-element string run from 0 to N-1, so that the
last character in a string is always S[S.size()-1]. If you forget this, you'll make an
off-by-one error.
If the index I is too big (I >= S.size()), S[I] does not check it. The C++ standard
calls this undefined behavior; in practice S[I] usually fetches whatever is in memory
where S[I] would be if it existed, I bytes past the start of S's characters. It could be
anything. If you fetch it, you are getting garbage. If you assign to it, you are writing
to a memory location that is not part of S, but may belong to some other variable or
other data your program depends on. This can lead to mysterious errors or cause your
program to crash. This is called out-of-bounds subscripting or indexing, or
over-subscripting a string or vector. (One special case: for a string, reading
S[S.size()] is allowed and gives the null character '\0', but you must not assign to it.)
This is potentially a serious problem, and even a computer security risk. "Buffer
overruns" have been used to insert malicious code into computers.
Unfortunately, the [] subscript operator in C and C++ does no bounds checking for
vectors or strings, because checking takes a little computer time, so these errors are
easy to make.
There is a safe indexing operation in C++: the .at() member function.
S.at(I) is equivalent to S[I], but with bounds checking.
Bounds checking is like putting a fuse or a circuit breaker in a circuit. It costs a
little, but helps prevent things from blowing up.
.at() throws an out-of-range exception (of type std::out_of_range, which you can
"catch") if the index is too big or too small.
find is built into strings, but it's a good exercise to define your own Find function.
To write it, think about how you would do it by hand.
This is a typical example of a sequential search. If the data are not ordered in any
way, the best you can do is to search from beginning to end. If what you are
looking for happens to be at the beginning of the string or vector, then it takes one
pass through the loop. If it is at the end, then it takes N passes through the loop.
On the average (assuming everything is equally likely), it takes about N/2 passes
through the loop. So the time is proportional to N, the size of the string or vector.
So it is called an O(N), order N, or linear algorithm.
If you can organize the data (e.g., alphabetize or sort it), then you can do better:
for example, binary search takes O(log N) steps, so it is a logarithmic search. An
algorithm that takes time proportional to N^2 is a quadratic algorithm, which is not so good.
Practice: think about writing Find(string S, string T), which looks for T as a
substring of S, and tells you the first place it occurs, if any. It will require two
nested loops.
Another example: a palindrome tester.
Palindromes:
"Madam, I'm Adam." (Adam)
"Able was I ere I saw Elba." (Napoleon)
Ignore everything but letters. Ignore uppercase/lowercase.
Then see if the string is symmetrical.
Don't forget to pay attention to whether it's odd or even length.
It's a palindrome if
S[0] == S[N-1]
S[1] == S[N-2]
S[2] == S[N-3]
....
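and so on, with the general comparison
S[X] == S[N-X-1]
for X = 0, 1, 2, ... The comparisons stop when X reaches N/2 (integer division), so the
last pair checked is at X = N/2 - 1.
This is easy to see when N is even; it also works when N is odd (do you see why?)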