Working with Character Variables

SAS Functions
SAS has approximately 150 functions in the following general areas:
Arithmetic
Array
Character
Date and Time
Financial
Mathematical
Probability
Quantile
Random Numbers
Sample Statistics
Special Functions
State and ZIP Code
Trigonometric and Hyperbolic
Truncation
SAS functions perform a calculation or a transformation of the arguments given in parentheses following the
function name.
Function-name(argument, argument, ...)
All functions must have parentheses even if they don't require any arguments. Arguments are separated by commas
and can be variable names, constant values such as numbers or characters enclosed in quotes, or expressions.
birthday = MDY( monborn, dayborn, yearborn);
/* compute a SAS date value using MDY function */
newvalue = INT( LOG(10) );
/* obtain the integer portion of the natural log of 10 */
leftphra = LEFT(charstng);
/* left align a SAS character expression */
a = 'my date';
x = LENGTH(a);
/* LENGTH returns the length of character string, x = 7*/
avrg = MEAN(score1, score2, score3);
/* compute the average of three variables for each individual */
seed = 5
uninum = RANUNI(seed)
/* RANUNI returns a random number generated from the uniform distribution on the interval (0, 1) */
Syntax
ABS (argument)
Description
argument is numeric.
The ABS function returns a nonnegative number equal in magnitude to that of the argument.
Example:
x = abs(2.4);
x = abs(-5);
The values returned are 2.40000 and 5.00000, respectively.
Syntax
BETAINV(p,a,b)
Description
p
is a numeric probability, with 0<p<1
a
is a numeric shape parameter, with a>0
b
is a numeric shape parameter, with b>0
The BETAINV function returns the p-th quantile from the beta distribution with shape parameters a and b. The probability that
an observation from a beta distribution is less than or equal to the returned quantile is p. The BETAINV function is the inverse
of the PROBBETA function.
Example:
X=betainv(.001,2,4);
The returned value is 0.01010. The beta distribution is related to many other distributions.
Syntax
EXP (argument)
Description
argument is numeric.
The EXP function raises the constant e, approximately 2.71828, to the power supplied by the argument. The result is
limited by the largest floating-point value that can be represented on the computer.
Example:
x = exp(1);
x = exp(0);
The values returned are 2.71828 and 1.00000, respectively.
Syntax
FINV(p,ndf,ddf<,nc>)
Description
p
is a numeric probability, with 0<p<1
ndf
is a numeric numerator degrees of freedom parameter, with ndf>0
ddf
is a numeric denominator degrees of freedom parameter, with ddf>0
nc
is an optional numeric noncentrality parameter, with nc>=0.
The FINV function returns the p-th quantile from the F distribution with numerator degrees of freedom ndf, denominator
degrees of freedom ddf, and noncentrality parameter nc. The probability that an observation from the F distribution is less than
or equal to the quantile is p. This function accepts noninteger degrees of freedom parameters ndf and ddf. The FINV function is
the inverse of the PROBF function.
If the optional parameter nc is not specified or has the value 0, the quantile from the central F distribution is returned.
Example:
q1=finv(.95,2,10);
q2=finv(.95,2,10.3,2);
The values returned are 4.1028 and 7.5838, respectively.
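Because FINV is the inverse of PROBF, passing a quantile back into PROBF should recover the original probability. The DATA step below is a small sketch of that round trip using the two example calls above; the data set name finv_check is hypothetical.

```sas
/* Round trip: FINV gives a quantile, PROBF maps it back to the probability */
data finv_check;
   q    = finv(.95, 2, 10);          /* central F(2,10) quantile, about 4.1028 */
   p    = probf(q, 2, 10);           /* should be .95 again */
   q_nc = finv(.95, 2, 10.3, 2);     /* noncentral quantile, about 7.5838 */
   p_nc = probf(q_nc, 2, 10.3, 2);   /* should also be .95 */
run;

proc print data=finv_check;
run;
```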
Syntax
INT (argument)
Description
argument is numeric.
The INT function returns the integer portion of the argument. If the value of argument is positive, INT(argument) has the same
result as FLOOR(argument). If the value of argument is negative, INT(argument) has the same result as CEIL(argument).
Example:
X = int(2.4);
X = int(-5);
The values returned are 2 and -5, respectively.
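INT, FLOOR, and CEIL only disagree for negative, non-integer arguments. A minimal sketch with illustrative values (the data set name int_demo is hypothetical):

```sas
/* Compare INT, FLOOR and CEIL on a positive, a negative and an integer value */
data int_demo;
   input x @@;
   i = int(x);     /* drops the fractional part, i.e. truncates toward 0 */
   f = floor(x);   /* largest integer less than or equal to x */
   c = ceil(x);    /* smallest integer greater than or equal to x */
   datalines;
2.4 -2.4 -5
;
run;

proc print data=int_demo;
run;
```

For -2.4, INT returns -2 (like CEIL) while FLOOR returns -3, which is the case the description above singles out.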
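LOG(argument) returns the natural (base e) logarithm of the argument.
MAX(argument1,argument2, …) returns the largest value among the nonmissing arguments.
MIN(argument1,argument2, …) returns the smallest value among the nonmissing arguments.
MEAN(argument1,argument2, …) returns the arithmetic mean of the nonmissing arguments.
MOD(argument1,argument2) returns the remainder when argument1 is divided by argument2:
x=mod(6,3) returns 0, x=mod(10,3) returns 1.
MINUTE(<time | datetime>) returns the minute from a SAS time or datetime value: time='3:19:24't; m=minute(time); produces the value 19 for m.
MONTH(date) returns a number from 1 to 12 representing the month of a SAS date value.
SECOND(<time | datetime>) returns the second from a SAS time or datetime value.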
NORMAL(seed) returns a standard normal random number. "seed" is an integer. If seed<=0, the time of day is used to
initialize the seed stream.
POISSON(m,n) m is a numeric mean parameter and n is an integer. The POISSON function returns the probability
that an observation from a Poisson distribution, with mean m, is less than or equal to n.
GAMINV(p,a) returns the p-th quantile from the gamma distribution with shape parameter a.
PROBBETA(x,a,b) returns the probability that an observation from a beta distribution, with shape parameters a and b, is less than
or equal to x.
PROBBNML(p,n,m) returns the probability that an observation from a binomial distribution, with probability of success p,
number of trials n, and number of successes m, is less than or equal to m.
PROBCHI(x,df<,nc>) returns the probability that an observation from a chi-square distribution, with degrees of freedom df and
noncentrality parameter nc, is less than or equal to x.
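PROBF(x,ndf,ddf<,nc>) returns the probability that an observation from an F distribution, with numerator degrees of freedom ndf,
denominator degrees of freedom ddf, and noncentrality parameter nc, is less than or equal to x.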
PROBGAM(x,a) returns the probability that an observation from a gamma distribution, with shape parameter a, is less than or
equal to x.
PROBIT(p) returns the p-th quantile from the standard normal distribution. PROBIT is the inverse of PROBNORM(x).
PROBT(x,df<,nc>) returns the probability that an observation from a Student’s t distribution, with degrees of freedom df and
noncentrality parameter nc, is less than or equal to x.
RANBIN(seed,n,p) returns a variate generated from a binomial distribution with n trials and probability of success p, that is,
with mean np and variance np(1-p).
RANCAU(seed) returns a variate generated from a Cauchy distribution with location parameter 0 and scale parameter 1.
RANEXP(seed) returns a variate generated from an exponential distribution.
RANGAM(seed,a) returns a variate generated from a gamma distribution with parameter a.
RANK(x) returns an integer representing the position of a character in the ASCII or EBCDIC collating sequence.
RANNOR(seed) returns a variate generated from a standard normal distribution.
RANPOI(seed,m) returns a variate generated from a Poisson distribution with mean m.
RANUNI(seed) returns a number generated from the uniform distribution on the interval (0,1).
SIGN(argument) returns -1 if argument<0, 0 if argument=0, and 1 if argument>0.
SQRT(argument) returns the square root of the argument.
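STD(argument, argument, …) returns the standard deviation of the nonmissing arguments.
STDERR(argument, argument, …) returns the standard error of the mean of the nonmissing arguments.
TINV(p,df<,nc>) returns the p-th quantile from the Student’s t distribution with degrees of freedom df and an optional noncentrality
parameter nc.
TODAY() returns the current date as a SAS date value.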
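Several of the functions listed above skip missing arguments instead of returning a missing result. The DATA step below is a sketch exercising a few of them on made-up values; the data set name fn_demo and all variable names are hypothetical.

```sas
/* A few reference-list functions on illustrative values */
data fn_demo;
   a = 4; b = .; c = 8;          /* b is missing */
   avg   = mean(a, b, c);        /* (4 + 8) / 2 = 6: the missing b is ignored */
   big   = max(a, b, c);         /* 8 */
   small = min(a, b, c);         /* 4 */
   r     = mod(10, 3);           /* remainder 1 */
   m     = minute('3:19:24't);   /* 19 */
   s     = sign(-3.5);           /* -1 */
   root  = sqrt(16);             /* 4 */
run;

proc print data=fn_demo;
run;
```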
Working with Character Variables
DATA air.depart;
INPUT country $ 1-9 cities 11-12 usgate $ 14-26 othrgate $ 28-48;
CARDS;
Japan      5 San Francisco Tokyo, Osaka
Italy      8 New York      Rome, Naples
Australia 12 Honolulu      Sydney, Brisbane
;
DATA showchar;
LENGTH usairpt $ 10;
SET air.depart;
schedule = '3-4 tours per season';
remarks ="See last year's schedule";
IF usgate = 'San Francisco' THEN usairpt = 'SFO';
ELSE IF usgate = 'Honolulu' THEN usairpt = 'HNL';
ELSE IF usgate = 'New York' THEN usairpt = 'JFK or EWR';
PROC PRINT DATA=showchar;
VAR country schedule remarks usgate usairpt;
TITLE 'Examples of Some Character Variables';
RUN;
Examples of Some Character Variables

OBS  COUNTRY    SCHEDULE              REMARKS                   USGATE         USAIRPT
 1   Japan      3-4 tours per season  See last year's schedule  San Francisco  SFO
 2   Italy      3-4 tours per season  See last year's schedule  New York       JFK or EWR
 3   Australia  3-4 tours per season  See last year's schedule  Honolulu       HNL
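The LENGTH statement in the showchar step matters because SAS fixes the length of a character variable the first time it sees that variable at compile time. As a sketch (hypothetical data set length_demo, with two gateways read inline rather than from air.depart), leaving LENGTH out gives usairpt a length of 3 from 'SFO', so 'JFK or EWR' is stored as 'JFK':

```sas
/* Same IF/ELSE logic as showchar, but without LENGTH usairpt $ 10 */
data length_demo;
   infile datalines truncover;
   input usgate $ 1-13;
   /* 'SFO' is the first value SAS sees for usairpt, so its length becomes 3 */
   if usgate = 'San Francisco' then usairpt = 'SFO';
   else if usgate = 'New York' then usairpt = 'JFK or EWR';   /* truncated to 'JFK' */
   datalines;
San Francisco
New York
;
run;

proc print data=length_demo;
run;
```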
Extracting a Portion of a Character Value
SCAN (source,n<,list-of-delimiters>) /* if no delimiters are listed, SAS uses the blank and several special characters */
LEFT (source) /*left-alignment*/
DATA air.arvdept;
/*LENGTH can be used to assign variable length */
SET air.depart;
arvgate=SCAN(othrgate,1,' , ');
deptgate=LEFT(SCAN(othrgate,2,' , '));
PROC PRINT DATA = air.arvdept;
VAR country othrgate arvgate deptgate;
TITLE 'Dividing Character Values into Terms';
RUN;
Dividing Character Values into Terms

OBS  COUNTRY    OTHRGATE          ARVGATE  DEPTGATE
 1   Japan      Tokyo, Osaka      Tokyo    Osaka
 2   Italy      Rome, Naples      Rome     Naples
 3   Australia  Sydney, Brisbane  Sydney   Brisbane
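The delimiter list decides what counts as a word boundary. A minimal sketch on one value from air.depart (the data set scan_demo and its variable names are hypothetical); the negative word count assumes SAS 9 or later, where SCAN counts words from the right when n is negative:

```sas
/* How the delimiter list changes what SCAN returns */
data scan_demo;
   othrgate  = 'Sydney, Brisbane';
   word1     = scan(othrgate, 1, ', ');   /* blank and comma are delimiters: 'Sydney' */
   word2     = scan(othrgate, 2, ', ');   /* 'Brisbane' */
   word3     = scan(othrgate, 3, ', ');   /* only two words, so the result is blank */
   lastword  = scan(othrgate, -1, ', ');  /* counted from the right: 'Brisbane' */
   commaonly = scan(othrgate, 2, ',');    /* comma only: ' Brisbane' keeps its leading blank */
   trimmed   = left(commaonly);           /* LEFT moves that blank to the end: 'Brisbane' */
run;

proc print data=scan_demo;
run;
```

With a comma-only delimiter list the second word starts with a blank, which is the situation LEFT is designed for; with ' , ' as in air.arvdept the blank is already a delimiter.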
Combining Character Values: Concatenation, Trimming Blanks
DATA all;
SET air.depart;
allgate=TRIM(usgate) || ', ' || othrgate;
/* TRIM drops trailing blanks */
IF country = 'Brazil' THEN allgate=othrgate;
PROC PRINT DATA=all;
VAR country usgate othrgate allgate;
TITLE 'Readable Concatenated Values';
RUN;
Readable Concatenated Values

OBS  COUNTRY    USGATE         OTHRGATE          ALLGATE
 1   Japan      San Francisco  Tokyo, Osaka      San Francisco, Tokyo, Osaka
 2   Italy      New York       Rome, Naples      New York, Rome, Naples
 3   Australia  Honolulu       Sydney, Brisbane  Honolulu, Sydney, Brisbane
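Without TRIM, the trailing blanks that pad usgate to its full length stay inside the concatenated value. The sketch below (hypothetical data set concat_demo, lengths taken from the column ranges used for air.depart) compares both forms. It also shows CATX, a SAS 9 function not used in the original program, which strips leading and trailing blanks from each argument and inserts the separator between them.

```sas
/* Concatenation with and without TRIM, plus CATX for comparison */
data concat_demo;
   length usgate $ 13 othrgate $ 21;   /* same widths as columns 14-26 and 28-48 */
   usgate   = 'New York';              /* stored with 5 trailing blanks */
   othrgate = 'Rome, Naples';
   untrimmed    = usgate || ', ' || othrgate;         /* 'New York     , Rome, Naples' */
   trimmed      = trim(usgate) || ', ' || othrgate;   /* 'New York, Rome, Naples' */
   catx_version = catx(', ', usgate, othrgate);       /* 'New York, Rome, Naples' */
run;

proc print data=concat_demo;
run;
```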
Selection of Observations
SAS Output
Data Set ARTS.ARTTOUR

OBS  CITY   NIGHTS  LANDCOST  EVENTS  DESCRIBE      GUIDE    BACKUP
 1   Rome        3       750       7  4 M, 3 G      D'Amico  Torres
 2   Paris       8      1680       6  5 M, 1 other  Lucas    Lucas
......

DATA revise;
SET arts.arttour;
IF city='Rome' THEN landcost=landcost+30;
IF events>nights THEN calendar='Check schedule';
ELSE calendar='No Problem';
IF guide='Lucas' AND nights>7 THEN guide='Torres';
/* 'High' is padded with two blanks: the first value SAS sees at compile time sets the
   length of price (6 here), so 'Medium' is not truncated */
IF landcost>=1500 THEN price='High  ';
ELSE IF landcost>=700 THEN price='Medium';
ELSE price='Low';
RUN;
Using More than One Comparison in a Condition
(with AND, &, OR, |)
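/* In a SAS condition, AND has higher priority than OR, so this statement is evaluated as
   city='Paris' OR (city='Rome' AND guide='Lucas') OR guide="D'Amico" */
IF city='Paris' OR city='Rome' AND guide='Lucas' OR
guide="D'Amico"
THEN topic='Art history';
/* Parentheses force the intended grouping: the city test AND the guide test must both be true */
IF (city='Paris' OR city='Rome') AND
(guide='Lucas' OR guide="D'Amico")
THEN topic='Art history';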
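To see the precedence rule at work, the sketch below runs both conditions on one hypothetical tour: a Paris tour led by Torres. The unparenthesized condition is true because city='Paris' alone satisfies its first OR branch; the parenthesized one is false because the guide test fails. The data set precedence_demo, the ELSE branches, and the separate topic1/topic2 variables are added only for the comparison.

```sas
/* Same two conditions, evaluated on a single hypothetical record */
data precedence_demo;
   length city $ 8 guide $ 8 topic1 topic2 $ 11;
   city = 'Paris'; guide = 'Torres';
   /* AND binds first: true here because city='Paris' */
   if city='Paris' or city='Rome' and guide='Lucas' or guide="D'Amico"
      then topic1 = 'Art history';
   else topic1 = 'Other';
   /* grouped: false here because Torres is neither Lucas nor D'Amico */
   if (city='Paris' or city='Rome') and (guide='Lucas' or guide="D'Amico")
      then topic2 = 'Art history';
   else topic2 = 'Other';
run;
```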
/* In computing terms, a value of TRUE is 1 and a value of FALSE is 0. In the SAS System, any numeric value
   other than 0 or missing is true; a value of 0 or missing is false. */
IF landcost THEN remarks='Ready to budget';
/* is equivalent to */
IF landcost NE . AND landcost NE 0
THEN remarks='Ready to budget';
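A sketch of the numeric truth rule on a few illustrative landcost values, including a negative one, which also counts as true. The data set name truth_demo and the 'Not ready' branch are hypothetical.

```sas
/* IF landcost THEN ... treats 0 and missing as false, everything else as true */
data truth_demo;
   length remarks $ 15;
   input landcost @@;
   if landcost then remarks = 'Ready to budget';
   else remarks = 'Not ready';
   datalines;
750 0 . -20
;
run;

proc print data=truth_demo;
run;
```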
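/* The SAS System distinguishes between uppercase and lowercase letters in comparisons, so 'Madrid' is not
   equal to 'MADRID'. UPCASE(city) converts the value of city to uppercase before the comparison, so the
   condition below is true however Madrid was capitalized in the data. */
IF UPCASE(city) = 'MADRID' THEN guide='Duncan';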
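/* Comparing with a shorter character string: the colon modifier (=:) */
IF guide =: 'D' THEN chosen = 'Yes';
ELSE chosen = 'No';
/* Without the colon, guide='D' pads 'D' with blanks to the length of guide (for example 'D       ' when
   guide is 8 characters long), so it is true only when guide is exactly 'D'. With the colon, SAS compares
   only as many characters as the shorter value has, so guide =: 'D' selects every guide that begins with D. */
IF guide<=:'L' THEN group='A-L';
ELSE group ='M-Z';
/* In this example, guide<=:'L' compares only the first character of guide, so it selects records whose
   guide begins with a letter from A through L; all other records are assigned to group M-Z. */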
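A sketch comparing plain equality with the colon modifiers, using the three guide names from the ARTS.ARTTOUR listing as inline data. The data set name colon_demo and the variable exact are hypothetical.

```sas
/* Plain = versus =: and <=: on three guide names */
data colon_demo;
   length guide $ 8 chosen $ 3 group $ 3;
   input guide $ @@;
   exact = (guide = 'D');                   /* 1 only if guide is exactly 'D' */
   if guide =: 'D' then chosen = 'Yes';     /* compares the first character only */
   else chosen = 'No';
   if guide <=: 'L' then group = 'A-L';     /* first character at or before 'L' */
   else group = 'M-Z';
   datalines;
D'Amico Lucas Torres
;
run;

proc print data=colon_demo;
run;
```

Lucas lands in group A-L because its first character equals 'L', and exact is 0 for all three names.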
Finding a Value Anywhere within Another Character Value
INDEX(source, excerpt)
/* INDEX returns the position in source of the first character of the first occurrence of excerpt, which is
   a positive number. If excerpt does not occur in source, INDEX returns 0. */
Example:
IF INDEX(describe, 'other') THEN otherev='Yes';
ELSE otherev='No';
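A sketch of the INDEX example run against the two DESCRIBE values shown in the ARTS.ARTTOUR listing, read inline. The data set name index_demo and the pos variable are hypothetical. In '5 M, 1 other', the word 'other' starts at position 8.

```sas
/* INDEX returns a position (true when nonzero) or 0 (false) */
data index_demo;
   infile datalines truncover;
   length describe $ 12 otherev $ 3;
   input describe $ 1-12;
   pos = index(describe, 'other');   /* start of 'other', or 0 if absent */
   if pos then otherev = 'Yes';      /* any nonzero position counts as true */
   else otherev = 'No';
   datalines;
4 M, 3 G
5 M, 1 other
;
run;

proc print data=index_demo;
run;
```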