9 The sed Editor
Mauro Jaskelioff
(based on slides by Gail Hopkins)
Introduction
• sed is a Stream Editor
• Designed to edit files in a batch fashion
– Not interactive
• Often used for text substitution
• When you have multiple changes to make
to one or more files:
– Write down the changes in an editing script
– Apply the script to all the files
What does sed do?
• Used to edit input streams
– Input stream can be from a file, from a
pipe or from the keyboard
• Produces results on standard output
– …but results can be put in a file or sent
through a pipe
Typical Uses of sed
• Editing one or more files
automatically
– E.g. replace all occurrences of a string
within a file with a different string
• Simplifying repetitive edits to
multiple files
– E.g. perform the same operation on lots
of similar files
How Does sed Work?
• Each line of input is copied into an internal
buffer known as a “pattern space”
• All editing commands in a sed script are
applied, in order, to each line of input (in
the buffer)
• Editing commands are applied to every line
of input
– Unless line addressing is used to restrict the
lines affected
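A minimal sketch of this line-by-line cycle follows. It assumes any POSIX shell, uses printf in place of an input file, and the variable names are illustrative only. Without an address the command is tried on every line. With a line-number address, only that line is edited.

```shell
# Three lines of input; sed copies each one into the pattern space in turn.
input='one
two
three'

# No address: the command is tried on every line (first "o" on each line only).
all=$(printf '%s\n' "$input" | sed 's/o/0/')

# Line address 2: only the second line is edited; the others pass through unchanged.
second=$(printf '%s\n' "$input" | sed '2s/o/0/')

printf '%s\n' "$all" "---" "$second"
```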
How does sed Work? (2)
• If a sed command changes the input,
the next command will apply to this
new (changed) line of input, not the
original one
• E.g. given the sed script:
s/caterpillars/spiders/
s/crawl/run/
– More on this later!
• …the pattern space changes like this:
Furry caterpillars crawl slowly
Furry spiders crawl slowly
Furry spiders run slowly
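A sketch of the same two-command script, run on the example line. printf stands in for an input file and the variable names are illustrative. The second command acts on the line as already changed by the first.

```shell
# Each -e adds one command to the script; they run in order on every line.
line='Furry caterpillars crawl slowly'

# Only the first command: the pattern space after step 1.
step1=$(printf '%s\n' "$line" | sed -e 's/caterpillars/spiders/')

# Both commands: s/crawl/run/ sees the line already changed by step 1.
both=$(printf '%s\n' "$line" | sed -e 's/caterpillars/spiders/' -e 's/crawl/run/')

printf '%s\n' "$step1" "$both"
```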
How does sed Work? (3)
• When sed edits an input file, the
original input file is unchanged
– The editing commands modify a copy of
each original line of input
– When sed outputs the result, it is the copy
that is sent to STDOUT (or redirected to a
file)
• sed keeps a separate buffer, known as
the “hold space”
– Can be used to save data for later retrieval
– For most edits this isn’t needed - only if a
command refers to it
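A small sketch showing that the input file is left alone. It assumes mktemp is available (as on GNU and BSD systems). The file names are made up for the example.

```shell
# Work in a throwaway directory so no real file is touched.
dir=$(mktemp -d)
printf 'ant\nbee\n' > "$dir/myCreaturesFile"

# The edited copy is written to stdout (captured here in a variable)...
edited=$(sed 's/ant/flea/' "$dir/myCreaturesFile")

# ...or redirected into a different file.
sed 's/ant/flea/' "$dir/myCreaturesFile" > "$dir/newCreaturesFile"
saved=$(cat "$dir/newCreaturesFile")

# The input file itself still says "ant".
original=$(cat "$dir/myCreaturesFile")

rm -r "$dir"
printf '%s\n' "$edited" "$original"
```

Redirecting back onto the input file (sed ... myCreaturesFile > myCreaturesFile) does not work. The shell empties the file before sed gets to read it.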
How to Run sed from the Command Line
• sed [-n] [-e] 'command' file(s)
– For specifying an editing command on the command line
– E.g.:
• sed 's/ant/flea/g' myCreaturesFile
• sed -e 's/ant/flea/g' -e 's/worm/slug/g' myCreaturesFile
• (what does this mean??? - more about sed commands shortly…)
• sed [-n] -f scriptfile file(s)
– For specifying a scriptfile containing sed commands
– E.g.:
• sed -f myScript myCreaturesFile
• If no file specified, sed reads from STDIN
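A sketch of the command-line forms above, assuming a POSIX shell with mktemp. The sample file contents are invented. Running a script file with -f is shown with the later "Running sed from a Script" slide.

```shell
dir=$(mktemp -d)
printf 'one ant and one worm\n' > "$dir/myCreaturesFile"

# A single command; the -e is optional when there is only one.
one=$(sed 's/ant/flea/g' "$dir/myCreaturesFile")

# Two commands, each introduced by its own -e, applied in order to every line.
two=$(sed -e 's/ant/flea/g' -e 's/worm/slug/g' "$dir/myCreaturesFile")

# No file operand: sed reads standard input (here, from a pipe).
piped=$(printf 'one ant\n' | sed 's/ant/flea/g')

rm -r "$dir"
printf '%s\n' "$one" "$two" "$piped"
```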
The -n flag
• sed can be given a -n option
– This tells sed NOT to write the contents of
the pattern space by default to stdout:
• sed -n 's/ant/flea/g' myCreaturesFile
– Another way of specifying this is to put #n
on the first line of a sed script file
• Why do we want to stop sed’s output?
– We can then tell sed to print specific lines of
output, rather than the whole pattern
space:
– sed -n 's/swan/coot/p' myCreaturesFile
– NOTE the p in the above example…
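A sketch comparing default output, -n with and without the p flag, and a #n script file. It assumes a POSIX shell with mktemp. The input lines and file names are invented.

```shell
input='swan
duck
swan and duck'

# Default: every line is printed, whether or not it was changed.
default=$(printf '%s\n' "$input" | sed 's/swan/coot/')

# -n plus the p flag: only the lines where a substitution happened are printed.
changed=$(printf '%s\n' "$input" | sed -n 's/swan/coot/p')

# -n without p: nothing is printed at all.
silent=$(printf '%s\n' "$input" | sed -n 's/swan/coot/')

# #n as the first line of a script file has the same effect as -n.
dir=$(mktemp -d)
printf '#n\ns/swan/coot/p\n' > "$dir/quiet.sed"
from_script=$(printf '%s\n' "$input" | sed -f "$dir/quiet.sed")
rm -r "$dir"
```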
sed Regular Expressions
• sed uses regular expressions
• The format of these is very similar to
those used by grep
sed Regular Expressions
Symbol   Matches                                        Example
^        Beginning of line                              /^He/ matches a line that starts with He
$        End of line                                    /nd$/ matches a line that ends in nd
.        Any single character                           /./ would match a, b, 1, 2, and so on…
*        0 or more occurrences of preceding character   /we*/ matches w, we, wee, weee, etc…
?        0 or 1 occurrence of preceding character       /we?/ matches w or we
         (needs sed -E; GNU sed also accepts \? without -E, where a plain ? is a literal character)
[ ]      Any character enclosed in [ ]                  [abc] matches a, b or c
[^ ]     Any character NOT enclosed in [^ ]             [^abc] matches d, e, f, etc. but NOT a, b or c
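A sketch exercising each symbol in the table, using sed -n '/regex/p' to print matching lines much as grep would. It assumes a sed that accepts -E for the ? example (GNU and BSD/macOS sed both do). The word list is invented.

```shell
words='He said
hand
bend
w
we
wee
cab'

# ^ anchors the match to the start of the line: prints "He said".
starts_he=$(printf '%s\n' "$words" | sed -n '/^He/p')

# $ anchors the match to the end of the line: prints "hand" and "bend".
ends_nd=$(printf '%s\n' "$words" | sed -n '/nd$/p')

# . matches any single character: "dog" becomes "Xog".
dot=$(printf 'dog\n' | sed 's/./X/')

# * means zero or more e's; ^...$ makes the whole line match: w, we, wee.
w_e_star=$(printf '%s\n' "$words" | sed -n '/^we*$/p')

# ? needs extended syntax (-E): zero or one e, so only w and we.
w_e_opt=$(printf '%s\n' "$words" | sed -E -n '/^we?$/p')

# [abc]: every a, b or c is replaced; [^abc]: every other character is replaced.
in_abc=$(printf 'cabd\n' | sed 's/[abc]/X/g')
not_abc=$(printf 'cabd\n' | sed 's/[^abc]/-/g')
```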
sed Regular Expressions (2)
Symbol          Matches                                            Example
\{m,n\}         m to n repetitions of preceding character          x\{1,3\} matches x, xx or xxx
\{m,\}          m or more repetitions of preceding character       y\{4,\} matches yyyy, yyyyy, yyyyyy, etc…
\{,n\}          n or fewer (possibly 0) repetitions of preceding   we\{,5\} matches weeeee, weeee, weee, wee, we or w
                character (not every sed accepts this form; the portable spelling is \{0,n\})
\{n\}           Exactly n repetitions of preceding character       z\{6\} matches zzzzzz
\(expression\)  Group operator or region of interest               SEE LATER EXAMPLE
\n              nth group                                          SEE LATER EXAMPLE
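A sketch of the interval and group operators in sed's default (basic) syntax. The inputs are invented. The five-or-fewer case uses the portable \{0,5\} spelling.

```shell
# \{1,3\}: one to three x's; sed takes the longest, so "xxx" is replaced.
interval=$(printf 'xxxx\n' | sed 's/x\{1,3\}/Y/')

# \{4,\}: four or more y's (the whole line must match, hence ^ and $).
at_least=$(printf 'yyy\nyyyy\nyyyyy\n' | sed -n '/^y\{4,\}$/p')

# \{0,5\}: w followed by at most five e's; the sixth e is left over.
at_most=$(printf 'weeeeee\n' | sed 's/we\{0,5\}/W/')

# \{6\}: exactly six z's.
exact=$(printf 'zzzzz\nzzzzzz\n' | sed -n '/^z\{6\}$/p')

# \( \) marks a group; \1 and \2 reuse the groups in the replacement.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')
```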
sed Commands - Syntax
• sed instructions consist of addresses and
editing commands
• They have the general form:
– [address[,address]][!]command [arguments]
– NOTE: here, [] denotes something is optional
• Therefore:
– [address[,address]] - zero, one or two addresses
– ! - if present, the command applies to anything
NOT in the address(es) stated
– command - the sed command to be executed
– [arguments] - optional arguments to the command
• If the address of the command matches the
line in the pattern space (internal buffer), the
command is applied to that line
sed Addresses
• A sed command can have 0, 1 or 2
addresses
• An address in a sed command can be:
– A line number
– The symbol $ (meaning the last line)
– A regular expression enclosed in slashes
(/regex/)
• Therefore, an address can be thought of
as “something that matches” in the
pattern space
sed Addresses (2)
• If no address is specified:
– The command applies to each input line
• If one address is specified:
– The command applies to any line matching the address
– REMEMBER: an address can be a regular expression!
• If two comma-separated addresses are specified
– The command applies to the first matching line and all
succeeding lines up to and including a line matching the
second address (after that, the range can start again at
the next line matching the first address)
• If an address followed by ! is specified
– The command applies to all lines that DO NOT match the
address
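A sketch of each kind of address, with -n and p so that only the selected lines are printed. The five input lines are invented for the example.

```shell
input='one
two
three
four
five'

# Line number: only line 2.
by_number=$(printf '%s\n' "$input" | sed -n '2p')

# $: only the last line.
last=$(printf '%s\n' "$input" | sed -n '$p')

# Regular expression: every line containing "o".
by_regex=$(printf '%s\n' "$input" | sed -n '/o/p')

# Two addresses: from the line matching /two/ through the next line matching /f/.
range=$(printf '%s\n' "$input" | sed -n '/two/,/f/p')

# ! inverts the address: every line that does NOT contain "o".
negated=$(printf '%s\n' "$input" | sed -n '/o/!p')
```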
sed Commands
• Consist of a single letter or symbol
– They tell sed to “do something” to the text
at the address specified
– E.g.:
• s means substitute
• g is a flag to the s command. It means global, or
all occurrences of… (more on this later)
• sed 's/ant/flea/g' myCreaturesFile
• …means substitute all occurrences of the text ant
with the text flea in the file myCreaturesFile
– …in this example, no address is specified
and so sed applies the command to every line
of input
sed Commands (2)
• Another example:
– sed -n '/^squirrel/,/^swift/p' myCreaturesFile
• Print everything between the line starting squirrel
and the line starting swift, inclusive
• Here, there are 2 addresses, both are regular
expressions:
• /^squirrel/
– The first address matches a line with "squirrel" at the start
of the line
• /^swift/
– The second address matches the next line with "swift" at the
start of the line
– REMEMBER: regular expressions are written between / and /
• sed therefore prints between the first matching line
(with squirrel at the start) and all succeeding lines
up to and including a line matching the second
address (with swift at the start)
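A sketch of this range on invented creature lines (not the real myCreaturesFile). The second squirrel ... swift block shows the range starting again after the first one ends.

```shell
creatures='ant
squirrel red
owl
swift black
bat
squirrel grey
swift white
newt'

# Print from each line starting "squirrel" through the next line starting "swift".
between=$(printf '%s\n' "$creatures" | sed -n '/^squirrel/,/^swift/p')

printf '%s\n' "$between"
```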
sed Commands (3)
• An example using !
– sed '/aardvark/!d' myCreaturesFile
– Delete any line that doesn't contain the text
"aardvark" in the file myCreaturesFile
• An example using line numbers:
– sed '5s/wombat/womble/g' myCreaturesFile
– Substitute all occurrences of wombat with
womble on line 5
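A sketch of both commands on a few invented lines standing in for myCreaturesFile.

```shell
creatures='wombat one
aardvark
wombat two
aardvark and wombat
wombat wombat'

# Keep only the lines containing "aardvark"; every other line is deleted from the output.
aardvarks=$(printf '%s\n' "$creatures" | sed '/aardvark/!d')

# Replace every "wombat" on line 5 only; lines 1-4 keep their wombats.
line5=$(printf '%s\n' "$creatures" | sed '5s/wombat/womble/g')
```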
Putting more than one sed Element in a Command
• An example of two elements together:
– Input file:
a, a, ants on my arm
a, a, ants on my arm
a, a, ants on my arm
they’re causing me alarm!
– sed -e 's/ant/flea/g' -e 's/alarm/to itch/g' myCreaturesFile
– Output:
a, a, fleas on my arm
a, a, fleas on my arm
a, a, fleas on my arm
they’re causing me to itch!
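The same example as a sketch, with a here-document standing in for myCreaturesFile. It assumes a POSIX shell. The quoted 'EOF' stops the shell from interpreting anything in the verse.

```shell
# Both -e commands run, in order, on every line of the here-document.
result=$(sed -e 's/ant/flea/g' -e 's/alarm/to itch/g' <<'EOF'
a, a, ants on my arm
a, a, ants on my arm
a, a, ants on my arm
they're causing me alarm!
EOF
)

printf '%s\n' "$result"
```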
Putting more than one sed Element in a Command (2)
• Input file:
At the top of the tree there were 4 parrots and 2 lizards
• sed -e 's/parrot/lizard/g' -e 's/lizard/koala/g' myCreaturesFile
• Output from sed:
At the top of the tree there were 4 koalas and 2 koalas
• Why???
…because
• sed read in the line in the file and executed:
– s/parrot/lizard/g
• …to produce the text:
At the top of the tree there were 4 lizards and 2 lizards
• sed then performed the command:
– s/lizard/koala/g
– …on this new edited line to produce:
At the top of the tree there were 4 koalas and 2 koalas
• REMEMBER from previously:
– If a sed command changes the input, the next
command will apply to this new (changed) line of
input, not the original one
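A sketch showing that the order of the commands matters. It runs the example line through the two commands as given, then with the commands swapped.

```shell
tree='At the top of the tree there were 4 parrots and 2 lizards'

# As on the slide: parrots become lizards, then ALL lizards become koalas.
as_given=$(printf '%s\n' "$tree" | sed -e 's/parrot/lizard/g' -e 's/lizard/koala/g')

# Swapped: the original lizards become koalas first, then parrots become lizards.
swapped=$(printf '%s\n' "$tree" | sed -e 's/lizard/koala/g' -e 's/parrot/lizard/g')

printf '%s\n' "$as_given" "$swapped"
```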
Summary of sed Commands (4)
Basic Editing
a\   append text after a line
c\   replace (change) the selected lines with text
i\   insert text before a line
d    delete lines
s    substitute
y    translate characters
Line Information
=    display line number of a line
p    display (print) the line
l    display the line, showing control characters in a visible (escaped) form
Input/Output Processing
n    print the current line (unless -n is used), then read the next line of input into the pattern space
r    read another file's contents into the output stream
w    write the pattern space to another file
q    quit the sed script
Yanking and Putting
h    copy the pattern space into the hold space, replacing what's there
H    append the pattern space to the hold space
g    copy the hold space into the pattern space, replacing what's there
G    append the hold space to the pattern space
x    exchange the contents of the hold space and the pattern space
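A sketch using a few of these commands on three invented lines. The first command is a well-known one-liner that uses the hold space to reverse the order of the lines. = and y are shown alongside it.

```shell
input='one
two
three'

# Reverse the lines using the hold space:
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the pattern space into the hold space, replacing what is there
#   $p   on the last line, print the accumulated (reversed) text
reversed=$(printf '%s\n' "$input" | sed -n '1!G;h;$p')

# = prints the current line number; restricted to $, this counts the lines.
count=$(printf '%s\n' "$input" | sed -n '$=')

# y translates characters one-for-one: o becomes 0 and e becomes 3.
translated=$(printf '%s\n' "$input" | sed 'y/oe/03/')
```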
Examples of commonly used sed Commands
s
• sed 's/dog/cat/' myfile
– substitute the first occurrence of dog with cat on each line of myfile
• sed 's/dog/cat/g' myfile
– substitute all occurrences of dog with cat in myfile
• sed 's/dog/cat/4' myfile
– on each line of myfile that contains at least 4 "dog" strings,
substitute the 4th occurrence of dog with cat
• sed '1,2s/dog/cat/g' myfile
– substitute all occurrences of dog with cat in the first 2 lines of myfile ONLY
• sed '/dog/,/cat/s/.*//' myfile
– find a line containing dog, then empty that line and every following
line up to and including the next line containing cat. Repeat until
the end of myfile.
– s/.*// means substitute all text found with an empty string, so the
lines stay in the output as blank lines (use d to remove them completely)
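A sketch of each substitution above on invented input. The last two commands show that s/.*// leaves empty lines behind, whereas d removes the lines completely.

```shell
text='dog dog dog dog dog
dog dog'

first=$(printf '%s\n' "$text" | sed 's/dog/cat/')    # first dog on each line
every=$(printf '%s\n' "$text" | sed 's/dog/cat/g')   # every dog
fourth=$(printf '%s\n' "$text" | sed 's/dog/cat/4')  # 4th dog; line 2 has only two

# Only lines 1 and 2 are edited.
ranged=$(printf 'dog\ndog\ndog\n' | sed '1,2s/dog/cat/g')

# From the "dog" line through the next "cat" line: emptied versus deleted.
blanked=$(printf 'a\ndog\nb\ncat\nc\n' | sed '/dog/,/cat/s/.*//')
deleted=$(printf 'a\ndog\nb\ncat\nc\n' | sed '/dog/,/cat/d')
```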
Examples of commonly used sed Commands (2)
d
• sed '1,2d' myfile
– delete lines 1 and 2 of myfile
• sed '5d' myfile
– delete the fifth line from myfile
• sed '/^#/d' myfile
– delete all lines starting with # in myfile
p
• sed -n '/BEGIN/,/END/p' myfile
– find a line containing BEGIN and print that line and all following
lines up to and including a line containing END. Note: if there is no
END, sed will still print all text after BEGIN due to its stream
oriented nature - it doesn't know there is no END until it gets to the
end of the file!
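A sketch of these d and p commands on invented lines. The last command has a BEGIN with no matching END, so everything from BEGIN to the end of the input is printed.

```shell
input='# comment
line two
BEGIN
body
END
after
# another comment'

no_first_two=$(printf '%s\n' "$input" | sed '1,2d')   # lines 1 and 2 removed
no_fifth=$(printf '%s\n' "$input" | sed '5d')         # the END line removed
no_comments=$(printf '%s\n' "$input" | sed '/^#/d')   # lines starting with # removed
block=$(printf '%s\n' "$input" | sed -n '/BEGIN/,/END/p')

# No END anywhere: the range simply runs to the end of the input.
unterminated=$(printf 'x\nBEGIN\ny\nz\n' | sed -n '/BEGIN/,/END/p')
```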
Flags to commands
• sed commands can be given flags. We
have already seen the substitute
command with the g flag:
– s/lizard/koala/g
– g is a flag to the s command. It tells s
to substitute ALL occurrences of…
• Other flags to s are:
– n - replace the nth occurrence of pattern
with replacement text
• e.g. sed 's/dog/cat/4' myfile
– p - print pattern space to stdout if
substitution successful
• e.g. sed -n 's/dog/cat/p' myfile
Flags to Commands (2)
– w filename - write the pattern space to the file
filename if the substitution is successful
• e.g. sed 's/dog/cat/w resultsfile' myfile
• NOTE: here there must be exactly ONE
SPACE between the w and the resultsfile
• resultsfile will contain only those lines that
sed applied the substitution to
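A sketch of the p and w flags. It assumes mktemp is available. myfile and resultsfile are created in a temporary directory with invented contents.

```shell
dir=$(mktemp -d)
printf 'dog\ncat\ndog dog\n' > "$dir/myfile"

# p flag with -n: only the changed lines reach stdout.
printed=$(sed -n 's/dog/cat/p' "$dir/myfile")

# w flag: stdout still gets every line, and the changed lines are also
# written to resultsfile (one space between w and the file name).
all=$(sed "s/dog/cat/w $dir/resultsfile" "$dir/myfile")
written=$(cat "$dir/resultsfile")

rm -r "$dir"
```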
Running sed from a Script
• sed commands can be put in a file
called a script
• E.g. script.sed:
# this is my sed script
s/horse/cow/g
s/chicken/duck/g
s/newt/lizard/g
– The first line, starting with #, is a comment in the sed script
• …and run from the command line:
$ sed -f script.sed myCreaturesFile
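A sketch that writes script.sed with a here-document and runs it with -f. It assumes mktemp is available. The creature line is invented.

```shell
dir=$(mktemp -d)

# The script file: a comment line, then one command per line.
cat > "$dir/script.sed" <<'EOF'
# this is my sed script
s/horse/cow/g
s/chicken/duck/g
s/newt/lizard/g
EOF

printf 'a horse, a chicken and a newt\n' > "$dir/myCreaturesFile"
result=$(sed -f "$dir/script.sed" "$dir/myCreaturesFile")

rm -r "$dir"
printf '%s\n' "$result"
```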
Piping to and from sed
(and a much more complicated example!)
• The UNIX who command gives an output:
$ who
zliybbs   pts/5    Apr  8 19:11
zliybsj2  pts/6    Apr  8 18:42
zliybyk2  pts/9    Apr  6 14:30
zliybyk2  pts/10   Apr  6 14:31
zliybbs   pts/11   Apr  8 19:15
zliybyy2  pts/12   Apr  8 20:10
zliybwj   pts/15   Apr  6 14:34
zuczpd    pts/17   Apr  6 14:44
zuczpd    pts/18   Apr  6 14:44
zuczpd    pts/19   Apr  6 14:44
zuczpd    pts/20   Apr  6 14:45
zlizmj    pts/1    Apr  9 08:49
• Where a user is logged in from another machine, who adds its
name or address in brackets at the end of the line; here these were:
(ss-226-host39.nottingham.edu.cn)
(10.20.50.15)
(ss-226-host67.nottingham.edu.cn)
(ss-226-host67.nottingham.edu.cn)
(10.20.10.85)
Piping to and from sed (2)
(and a much more complicated example!)
• If we wanted to extract only the
machine names from this output, we
could use the following command:
• who | sed -n 's/.*(\(.*\))/\1/p'
• What ON EARTH does this mean???? ☺
– who | sed … - take the output from the UNIX who command and
pipe it into sed
– .*( - take everything up to and including the open bracket "("
(.* matches as much as it can, so this is the last "(" on the line;
who prints only one)
– \( - this denotes the start of a region of interest
– .* - take everything after the open bracket up to, but not
including, the close bracket ")" and keep it for future
referencing in a region of interest
– \) - this denotes the end of the region of interest
– ) - the close bracket itself
– \1 - …and substitute all the matched text with the region of
interest that was saved earlier, referenced as number 1
(REMEMBER from earlier: \n means nth group)
– -n with the p flag - print only the lines where the substitution
succeeded, i.e. the lines that contain a machine name
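A sketch of the command on a few invented lines in the style of who output. The user and host names are hypothetical, and printf stands in for the real who command. The line without brackets produces no output because of -n.

```shell
# Hypothetical who-style lines; only the bracketed parts matter here.
who_output='alice    pts/5    Apr  8 19:11 (host39.example.edu)
bob      pts/6    Apr  8 18:42 (10.20.50.15)
carol    pts/1    Apr  9 08:49'

# -n suppresses normal output; p prints only lines where s/// succeeded,
# so carol's line (no brackets) produces nothing.
machines=$(printf '%s\n' "$who_output" | sed -n 's/.*(\(.*\))/\1/p')

printf '%s\n' "$machines"
```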
Piping to and from sed (3)
• If we then wanted to sort the result
into alphabetical order, we could
pipe it onto sort:
who | sed -n 's/.*(\(.*\))/\1/p' | sort
• We could then redirect the whole
output to a file:
who | sed -n 's/.*(\(.*\))/\1/p' | sort > machines.txt
An Example of Data Manipulation using sed
• Suppose we had a file names.txt in the form
forename:surname (with a colon in between):
Steve:Bradford
Saun:Higgins
Gail:Hopkins
Sara:Mead
Fred:Smith
Henry:Taylor
• …and we wanted to reverse the names so that
they were in the order surname,forename (with
a comma in between)…
An Example of Data Manipulation using sed (2)
• sed -e 's/\(.*\):\(.*\)/\2,\1/' names.txt
EXPLANATION:
This uses regions of interest. It
puts the forename in a region of
interest and then puts the
surname in another region of
interest. It then outputs the
second region of interest
followed by the first.
• …would produce the following output:
Bradford,Steve
Higgins,Saun
Hopkins,Gail
Mead,Sara
Smith,Fred
Taylor,Henry
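A sketch that runs the same command on the names from names.txt, supplied through printf instead of a file.

```shell
names='Steve:Bradford
Saun:Higgins
Gail:Hopkins
Sara:Mead
Fred:Smith
Henry:Taylor'

# Group 1 is the text before the colon (forename), group 2 the text after it (surname).
swapped=$(printf '%s\n' "$names" | sed -e 's/\(.*\):\(.*\)/\2,\1/')

printf '%s\n' "$swapped"
```

Because .* matches as much as it can, a line with more than one colon would put everything up to the last colon into the first group.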
Using Different Delimiters
• Often, / is used in sed scripts as a
delimiter
• However, other characters can be used
as delimiters instead
– sed uses the character immediately after the s
command as the delimiter
• All of these are therefore equally viable:
s/horse/cow/g
s,horse,cow,g
s:horse:cow:g
s$horse$cow$g
• Why would we want a different
delimiter?
Using Different Delimiters (2)
• Suppose we had an HTML file which we
wanted to convert to XHTML
– We therefore want to change
• all occurrences of <H1> to <h1>
• all occurrences of <H2> to <h2>
• all occurrences of </H1> to </h1>
• all occurrences of </H2> to </h2>
• and so on…
s/<H1>/<h1>/g
s/<H2>/<h2>/g
s:</H1>:</h1>:g
s:</H2>:</h2>:g
…
• Here we have used : as a delimiter because
there are slashes in the data
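A sketch comparing the : delimiter with the / delimiter on a made-up two-line HTML fragment. With /, every slash in the data has to be escaped as \/.

```shell
page='<H1>Title</H1>
<H2>Part</H2>'

# ":" as the delimiter for the closing tags, which contain slashes.
with_colon=$(printf '%s\n' "$page" |
  sed -e 's/<H1>/<h1>/g' -e 's/<H2>/<h2>/g' -e 's:</H1>:</h1>:g' -e 's:</H2>:</h2>:g')

# Same edits with "/" throughout: each slash in the data must be escaped.
with_slash=$(printf '%s\n' "$page" |
  sed -e 's/<H1>/<h1>/g' -e 's/<H2>/<h2>/g' -e 's/<\/H1>/<\/h1>/g' -e 's/<\/H2>/<\/h2>/g')
```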
sed Tries to Match the Longest Expression!
• Suppose we had an HTML file and we
wanted to remove all the markup:
<b>Welcome</b> to the <i>UST</i> website.
• We could instruct sed to find a '<'
character followed by zero or more
other characters until a '>' character:
• sed -e 's/<.*>//g' UST.html
• This would produce:
website.
Why??
sed Tries to Match the Longest Expression! (2)
• …because sed tries to find the longest
expression that matches:
• <b>Welcome</b> to the <i>UST</i>
• …instead, we need to specify that sed
looks for a '<' character followed by
zero or more non-'>' characters
followed by a '>' character:
• sed -e 's/<[^>]*>//g' UST.html
• sed will then match <b> and </b> and
<i>, and so on…
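A sketch of both commands on the example line, with printf standing in for UST.html. The greedy version leaves only " website." (with a leading space).

```shell
html='<b>Welcome</b> to the <i>UST</i> website.'

# Greedy: .* runs from the first "<" to the LAST ">" on the line.
greedy=$(printf '%s\n' "$html" | sed -e 's/<.*>//g')

# [^>]* cannot cross a ">", so each tag is matched (and removed) separately.
tags_only=$(printf '%s\n' "$html" | sed -e 's/<[^>]*>//g')
```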
Character Classes - POSIX Compliant sed
• Often in sed you want to specify a regular
expression that contains white space
(TABs, spaces, etc.)
• POSIX compliant sed offers a simple way
of doing this with a character class:
• sed 's/[[:space:]]//g' myfile
– removes all whitespace from each line of myfile
• Character classes give you a way of
specifying, within a regular expression,
types of characters to search for
Character Classes (2)
• [:alnum:] Alphanumeric [a-zA-Z0-9]
• [:alpha:] Alphabetic [a-zA-Z]
• [:blank:] Spaces or tabs
• [:cntrl:] Any control characters
• [:digit:] Numeric digits [0-9]
• [:graph:] Any visible characters (no whitespace)
• [:lower:] Lower-case [a-z]
• [:print:] Printable characters (visible characters plus space)
• [:punct:] Punctuation characters
• [:space:] Whitespace
• [:upper:] Upper-case [A-Z]
• [:xdigit:] Hex digits [0-9a-fA-F]
– NOTE: a class is used inside a bracket expression, e.g. [[:digit:]]
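A sketch using a few of these classes, each inside a bracket expression as sed requires. The inputs are invented, and printf produces the tab in the first example.

```shell
# [[:space:]]: the tab and the spaces are all removed.
no_space=$(printf 'a\tb  c\n' | sed 's/[[:space:]]//g')

# [[:digit:]]: each digit becomes #.
digits=$(printf 'Room 101, floor 3\n' | sed 's/[[:digit:]]/#/g')

# [[:upper:]]: each capital letter becomes _.
upper=$(printf 'Hello World\n' | sed 's/[[:upper:]]/_/g')

# [[:punct:]]: punctuation is deleted.
no_punct=$(printf 'Hi, there! Ok?\n' | sed 's/[[:punct:]]//g')
```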
Summary
• An introduction to sed
• Format of sed statements
• Addresses
• Types of command
• Putting sed inside a script
• Some more advanced examples of
sed