*LOCUS NUMBER;
*THIS PROGRAM CAN UTILIZE ANY TYPE OF AUTOTETRAPLOID MARKER DATA, WHETHER
DOMINANT, CO-DOMINANT, OR CO-DOMINANT WITH AMBIGUOUS DOSAGE. SSR IS THE TERM
USED IN THE CODE BUT APPLIES TO ANY TYPE OF MARKER USED;
%MACRO SSR ;
27
%MEND SSR;
*SETS MAXIMUM NUMBER OF ALLELES OBSERVED PER LOCUS, TAKEN AT THE LOCUS WITH
THE MOST ALLELES. THE PROGRAM CAN HANDLE A NUMBER UP TO 99;
%MACRO MAXALL ;
20
%MEND MAXALL;
*GENERAL NOTE: THIS PROGRAM IS CURRENTLY SET UP FOR AUTOTETRAPLOIDS WITH 27
SSR SCORED ON INDIVIDUALS;
*DATA SET A IS THE PROGENY DATA, MARKERS HAVE TO BE LABELED M + ALLELE +
LOCUS IN SEQUENTIAL ORDER FOR SAS MACROS TO PROCESS;
*EACH ALLELE PER LOCUS PER PROGENY SHOULD ONLY APPEAR ONCE IRRESPECTIVE OF
DOSAGE IF KNOWN;
*ALLELES HAVE TO BE RECORDED AS NUMBERS;
*DATA SET A CONTAINS AN EXAMPLE OF 4 GENOTYPES;
DATA A;
LENGTH FEMALE $ 16 ;
INPUT ID FEMALE $
M11 M21 M31 M41 M12 M22 M32 M42 M13 M23 M33 M43 M14 M24
M34 M44 M15 M25 M35 M45 M16 M26 M36 M46 M17 M27 M37 M47
M18 M28 M38 M48 M19 M29 M39 M49 M110 M210 M310 M410 M111 M211
M311 M411 M112 M212 M312 M412 M113 M213 M313 M413 M114 M214 M314 M414
M115 M215 M315 M415 M116 M216 M316 M416 M117 M217 M317 M417 M118 M218
M318 M418 M119 M219 M319 M419 M120 M220 M320 M420 M121 M221 M321 M421
M122 M222 M322 M422 M123 M223 M323 M423 M124 M224 M324 M424 M125 M225
M325 M425 M126 M226 M326 M426 M127 M227 M327 M427;
CARDS ;
1141 S1-P1 136 142 145 .
.
.
265
196
271
.
.
.
.
.
161 164 172 .
327 330 339 351 143 146 .
. . .
188 195 .
.
.
221 229 232
243
242
.
.
.
.
.
.
.
.
.
.
273 275 .
. . .
. .
374 .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1142 S1-P2 136 142 145 .
128
244
139
.
247
.
150 .
158 .
.
.
.
.
.
.
.
.
.
.
168 177 .
.
.
.
.
.
153 .
.
.
.
.
.
.
.
.
.
265 271 .
273 . .
278 286 .
.
192 196 199 .
.
.
161 164 172 . 221 225 .
327 330 348 351 143 154 .
231 . .
188 195 .
.
.
243
242
246
.
260
.
.
.
.
.
200 213 .
374 389 .
.
.
.
.
.
.
.
.
.
.
128 . .
244 247 .
.
.
.
.
158 .
.
.
.
.
.
.
.
.
177 184 .
. . .
144 .
. .
. . .
.
.
.
1143 S1-P2 136 142 145 .
. 265 268 271 .
161
327
164
330
169
.
.
.
221
143
229
146
232
149
.
.
.
.
.
.
.
.
192 196 .
273 . .
.
.
283 286 294 .
207 . . .
374 .
146 .
. .
.
.
.
.
.
.
1144 S2-P5 136 142 .
. 265 268 271
.
.
.
.
192
273
196
275
.
.
.
.
.
.
.
.
286 294 .
207 . .
374 389 .
146 . .
. . . .
.
*ETC WITH ONE LINE PER PROGENY;
;
.
.
.
.
.
.
.
.
.
.
.
.
.
158 .
.
.
128 . .
244 247 .
.
.
.
.
.
.
.
.
.
.
243 245 .
242 . .
177 180 184
. . .
144 .
153 .
. .
.
.
.
161 167 169 . 221 225 232
327 330 348 351 143 146 .
. .
188 .
.
.
.
.
158 .
.
.
128 . .
244 247 .
.
.
.
.
.
.
.
.
.
.
243
242
260
.
.
.
177 180 .
.
.
.
144 .
153 .
.
.
.
.
.
*DATA SET B ADDS F, A SEQUENTIAL NUMBER VARIABLE FOR FEMALES, TO THE PROGENY
DATA SET FOR SAS MACRO PROCESSING;
DATA B; SET A;
IF FEMALE = 'S1-P1' THEN F = 1 ;
IF FEMALE = 'S1-P2' THEN F = 2 ;
IF FEMALE = 'S1-P3' THEN F = 3 ;
IF FEMALE = 'S2-P5' THEN F = 4 ; *ETC;
IF F = . THEN DELETE; *THIS ELIMINATES ANY PROGENY THAT DO NOT HAVE A
SEQUENTIAL F NUMBER;
RUN ;
*DATA SET C1 CONTAINS PARENTAL GENOTYPES, MARKERS AGAIN LABELED P + ALLELE +
LOCUS IN SEQUENTIAL ORDER FOR SAS MACROS TO PROCESS;
DATA C1;
INPUT F ID
P11 P21 P31 P41 P12 P22 P32 P42 P13 P23 P33 P43 P14 P24
P34 P44 P15 P25 P35 P45 P16 P26 P36 P46 P17 P27 P37 P47
P18 P28 P38 P48 P19 P29 P39 P49 P110 P210 P310 P410 P111 P211
P311 P411 P112 P212 P312 P412 P113 P213 P313 P413 P114 P214 P314 P414
P115 P215 P315 P415 P116 P216 P316 P416 P117 P217 P317 P417 P118
P218 P318 P418 P119 P219 P319 P419 P120 P220 P320 P420 P121 P221 P321 P421
P122 P222 P322 P422 P123 P223 P323 P423 P124 P224 P324 P424 P125
P225 P325 P425 P126 P226 P326 P426 P127 P227 P327 P427;
CARDS ;
1 1.1 136 142 . .
232 265 268 271 .
.
.
192
273
196
.
.
.
.
.
161 164 169 172 221 225 229
327 330 348 351 143 146 .
. . .
188 195 .
.
.
243
242
246
.
260
.
.
.
.
.
278 286 292 294 128 .
. . . . 244 247
.
.
374 389 .
. . .
. . .
.
.
.
.
150
158
.
.
.
.
.
. .
.
2.1 136 142 145 .
. 265 271 . .
161 169 . .
327 330 339 .
.
.
.
.
.
177 184 .
. . .
144 .
153 .
. . .
.
.
221
140
229
143
.
146
.
.
.
.
149 192 199 .
. 265 273 .
.
.
278
207
283
.
286
.
.
.
.
.
374 379 389 .
146 . . .
155 . . .
. . . .
188 198 208 .
128 134 .
244 . .
.
.
.
.
158 .
.
.
.
.
.
.
.
.
243 246 249
240 242 .
177 184 .
. . .
.
.
.
.
.
.
.
.
.
3 3.1 136 142 .
. 263 265 268
.
271
167
327
169
330
172
.
.
.
.
.
189
.
192
.
196
.
.
.
231
188
.
195
.
.
.
.
221 229 232
140 143 .
243 260 261
242 255 .
.
.
.
.
286 294 .
202 . .
374 379 .
146 . .
. . .
.
.
.
.
.
128 143 .
244 247 .
. .
150 .
158 .
.
.
.
.
.
.
.
.
177 180 184
. . .
.
.
.
.
.
.
.
.
.
.
.
*ETC. WITH ONE LINE PER PARENT;
;
DATA C; SET C1;
*THIS DATA SET CAN BE USED TO ALTER C1. C NEEDS MARKER DATA AND A
SEQUENTIAL F NUMBER FOR TESTED MOTHERS SO THAT FALSE POSITIVE PROBABILITIES
CAN BE CALCULATED FOR EACH MOTHER BASED ON THE GENOTYPE OF THAT MOTHER;
DROP ID;
RUN ;
*ALLELE FREQUENCY CALCULATIONS;
*****IMPORTANT!!!**** BE SURE TO RECORD THE TOTAL NUMBER OF OBSERVATIONS IN
BX1 FROM THE LOG SO THIS NUMBER CAN BE MANUALLY ENTERED INTO THE SAS CODE IN
MACRO X4;
DATA BX1; SET B; *FIRST STEP USES PROGENY DATA FILE B;
%MACRO X1;
%DO A = 1 %TO %SSR %BY 1;
IF M1&A = . THEN FLAG&A = 1; *IDENTIFIES NULLIPLEX LOCI;
%END;
%MEND;
%X1;
MC = SUM(OF FLAG:); *DETERMINES TOTAL NUMBER OF NULLIPLEX LOCI PER
INDIVIDUAL;
IF MC > 10 THEN DELETE; *ALLOWS USER TO DELETE INDIVIDUALS DEEMED TO HAVE
TOO MANY NULLIPLEX LOCI, IN THIS EXAMPLE INDIVIDUALS HAVING > 10 NULLIPLEX
LOCI;
RUN ;
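*OPTIONAL SKETCH, NOT PART OF THE ORIGINAL PROGRAM. INSTEAD OF READING THE
OBSERVATION COUNT OF BX1 FROM THE LOG, THE COUNT CAN BE CAPTURED IN A MACRO
VARIABLE AND WRITTEN TO THE LOG. THE NAME NPROG IS A HYPOTHETICAL CHOICE. THE
1000 IN MACRO X4 WOULD STILL HAVE TO BE REPLACED, EITHER BY HAND OR BY A
REFERENCE TO THAT VARIABLE;
DATA _NULL_;
IF 0 THEN SET BX1 NOBS=N; *NOBS IS FILLED IN AT COMPILE TIME, NO ROWS ARE READ;
CALL SYMPUTX('NPROG', N); *STORES THE COUNT IN MACRO VARIABLE NPROG;
STOP;
RUN;
%PUT NOTE: BX1 CONTAINS &NPROG OBSERVATIONS;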
*STACKS ALL 4 ALLELE POSITIONS INTO ONE ALLELE POSITION PER MARKER, END
RESULT IS DATA SET B2;
PROC SORT DATA =BX1;
BY ID;
RUN ;
%MACRO X2;
%DO A = 1 %TO %SSR %BY 1;
M&A = M&B&A;
%END;
%MEND;
%MACRO X3;
%DO B = 1 %TO 4 %BY 1;
DATA BB1&B; SET BX1;
%X2;
RUN;
%END;
%MEND;
%X3;
DATA B2; SET BB11 BB12 BB13 BB14;
RUN ;
*FIRST ESTIMATES INDIVIDUAL DOSAGE-UNADJUSTED ALLELE FREQUENCIES AT EACH
LOCUS;
%MACRO X4;
%DO A = 1 %TO %SSR %BY 1;
PROC FREQ DATA=B2; *DOSAGE-UNADJUSTED ALLELE FREQUENCY ESTIMATION;
TABLE M&A/OUT=B3&A;
RUN;
DATA B4&A; SET B3&A; *ASSUMING RANDOM MATING EQUILIBRIUM THIS
STEP ESTIMATES ALLELE FREQUENCIES OF EACH ALLELE AT EACH LOCUS;
IF M&A = . THEN DELETE;
DROP PERCENT;
FC = 1 - ((1 - (COUNT/1000))**(1/4)); *****IMPORTANT!!!**** THE
1000 NEEDS TO BE MANUALLY CHANGED TO THE TOTAL NUMBER OF PROGENY EVALUATED. GET
THIS NUMBER FROM THE TOTAL NUMBER OF OBSERVATIONS IN DATA SET BX1;
IF FC = 1 THEN FC = 0;
IF M&A = 0 THEN FC = (COUNT/1000)**(1/4);
*****IMPORTANT!!!**** THE 1000 NEEDS TO BE MANUALLY CHANGED TO THE TOTAL NUMBER
OF PROGENY EVALUATED. GET THIS NUMBER FROM THE TOTAL NUMBER OF OBSERVATIONS IN
DATA SET BX1;
MARK = &A;
ALLELE = M&A;
DROP M&A;
RUN;
%END;
%MEND;
%X4;
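*ILLUSTRATIVE CHECK, NOT PART OF THE ORIGINAL PROGRAM. UNDER RANDOM MATING A
TETRAPLOID LACKS AN ALLELE OF FREQUENCY P WITH PROBABILITY (1-P)**4, SO THE
FRACTION OF INDIVIDUALS SHOWING THE ALLELE IS 1-(1-P)**4, AND SOLVING FOR P
GIVES THE FC FORMULA USED IN MACRO X4. THE VALUES 5904 AND 10000 ARE MADE-UP
NUMBERS CHOSEN SO THAT FC COMES OUT CLOSE TO 0.2;
DATA _NULL_;
COUNT = 5904; *HYPOTHETICAL NUMBER OF PROGENY SHOWING THE ALLELE;
N = 10000; *HYPOTHETICAL TOTAL NUMBER OF PROGENY EVALUATED;
FC = 1 - ((1 - (COUNT/N))**(1/4));
PUT FC=; *WRITES THE ESTIMATED ALLELE FREQUENCY TO THE LOG;
RUN;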
%MACRO X5;
%DO A = 1 %TO %SSR %BY 1;
B4&A
%END;
%MEND;
DATA B4; SET %X5; *THIS DATA SET CONTAINS ESTIMATED ALLELE FREQUENCIES OF
EACH ALLELE AT EACH LOCUS. CAUTION: ALLELE FREQUENCIES AT A LOCUS OFTEN DO NOT
ADD UP TO EXACTLY ONE DUE TO THE CALCULATION METHODOLOGY (DUE TO 1. PRESENCE OF
NULL ALLELES, 2. STOCHASTIC ERROR ASSOCIATED WITH THE SIZE OF THE SAMPLE DRAWN
FROM THE POPULATION, AND/OR 3. VIOLATIONS OF RANDOM MATING EQUILIBRIUM). IT IS
RECOMMENDED THAT AT LEAST 400 TO 500 PROGENY ARE SAMPLED FROM THE POPULATION
TO GET ACCEPTABLE POPULATION ALLELE FREQUENCY ESTIMATES, BECAUSE AUTOTETRAPLOID
GENETICS MAKES ALLELE FREQUENCIES HARDER TO DETERMINE;
RUN;
*PROCESSES ALLELE FREQUENCY DATA SET B4 TO MERGE TO THE PARENTAL GENOTYPE
FILE C;
%MACRO X7;
%DO M = 1 %TO %MAXALL %BY 1;
%IF &S < 10 %THEN %DO;
RENAME COL&M = AL0&S&M;
%END;
%ELSE %DO;
RENAME COL&M = AL&S&M;
%END;
%END;
%MEND;
%MACRO X8;
%DO M = 1 %TO %MAXALL %BY 1;
%IF &S < 10 %THEN %DO;
RENAME COL&M = FC0&S&M;
%END;
%ELSE %DO;
RENAME COL&M = FC&S&M;
%END;
%END;
%MEND;
%MACRO X12;
%DO S = 1 %TO %SSR %BY 1;
DATA B5&S; SET B4;
IF MARK NE &S THEN DELETE;
KEEP ALLELE;
RUN;
PROC TRANSPOSE OUT=B6&S;
DATA B7&S; SET B6&S;
DROP _NAME_;
%X7;
DUMB = 1;
RUN;
DATA B8&S; SET B4;
IF MARK NE &S THEN DELETE;
KEEP FC;
RUN;
PROC TRANSPOSE OUT=B9&S;
DATA BX&S; SET B9&S;
DROP _NAME_;
%X8;
DUMB = 1;
RUN;
%END;
%MEND;
%X12;
*MERGES POPULATION ALLELE FREQUENCY DATA TO PARENTAL GENOTYPE FILE C;
DATA B11; SET C;
DUMB = 1 ;
RUN ;
%MACRO X13;
%DO S = 1 %TO %SSR %BY 1;
BX&S B7&S
%END;
%MEND;
DATA B12; MERGE B11 %X13;
BY DUMB;
PRR = 1; *SETS THE OVERALL FALSE POSITIVE FREQUENCY TO ONE, THIS NUMBER
GETS ITERATIVELY SMALLER WITH EACH LOCUS ANALYZED;
%MACRO X15;
*THIS MACRO DETERMINES THE SUM OF ALLELE FREQUENCIES AT EACH
LOCUS TERMED "ALTOT";
%DO S = 1 %TO %SSR %BY 1;
ALTOT&S = 0;
%DO M = 1 %TO %MAXALL %BY 1;
%IF &S < 10 %THEN %DO;
IF FC0&S&M NE . THEN DO;
ALTOT&S = ALTOT&S + FC0&S&M;
END;
%END;
%ELSE %DO;
IF FC&S&M NE . THEN DO;
ALTOT&S = ALTOT&S + FC&S&M;
END;
%END;
%END;
NULL&S = 1 - ALTOT&S; *1 - ALLELE FREQUENCY TOTAL PER LOCUS
IS ASSIGNED AS THE NULL ALLELE FREQUENCY;
IF NULL&S < 0 THEN NULL&S = 0; *IF TOTAL ALLELE FREQUENCY IS
GREATER THAN ONE THEN NULL ALLELE FREQUENCY IS SET TO ZERO;
%END;
%MEND;
%X15;
%MACRO X14;
%DO S = 1 %TO %SSR %BY 1;
*MACROS X9 AND X10 CREATE FL VARIABLES WHICH INDICATE WHICH
OF ALL POSSIBLE ALLELES FOR A LOCUS ARE PRESENT IN THE MOTHER;
%MACRO X9;
%DO M = 1 %TO %MAXALL %BY 1;
%IF &S < 10 %THEN %DO;
FL0&S&M = 0;
%END;
%ELSE %DO;
FL&S&M = 0;
%END;
%END;
%MEND;
%X9;
%MACRO X10;
%DO P = 1 %TO 4 %BY 1;
%DO M = 1 %TO %MAXALL %BY 1;
%IF &S < 10 %THEN %DO;
IF P&P&S NE . THEN DO;
IF P&P&S = AL0&S&M THEN FL0&S&M = 1;
END;
%END;
%ELSE %DO;
IF P&P&S NE . THEN DO;
IF P&P&S = AL&S&M THEN FL&S&M = 1;
END;
%END;
%END;
%END;
%MEND;
%X10;
PR&S = 0;
*AT EACH LOCUS SUMS POPULATION ALLELE FREQUENCIES OF ALL
ALLELES OBSERVED IN THE MOTHER;
%MACRO X11;
%DO M = 1 %TO %MAXALL %BY 1;
%IF &S < 10 %THEN %DO;
IF FL0&S&M = 1 THEN PR&S = PR&S + FC0&S&M;
%END;
%ELSE %DO;
IF FL&S&M = 1 THEN PR&S = PR&S + FC&S&M;
%END;
%END;
%MEND;
%X11;
PR&S = PR&S + NULL&S; *FOR EACH LOCUS, SUM OF POPULATION FREQUENCIES
OF ALLELES OBSERVED IN THE MOTHER + NULL ALLELE FREQUENCY;
IF PR&S > 1 THEN PR&S = 1; *PR VALUES FROM LINE ABOVE CAN
HAVE A MAXIMUM OF 1;
PRR&S = (PR&S)**2; *SQUARING PR GIVES PRR[LOCUS NUMBER], THE
PROBABILITY AT THAT LOCUS OF OBSERVING THE GIVEN MATERNAL ALLELE
CONFIGURATION;
PRR = PRR&S * PRR; *GIVES THE PRODUCT OF ALL PRR VALUES FOR
THE GRAND PROBABILITY OF OBSERVING THE MATERNAL ALLELE CONFIGURATION ACROSS
ALL LOCI;
%END;
%MEND;
%X14;
RUN ;
*PRINTS THE OUTPUT OF PRR AND ALL PRR[LOCUS] VALUES;
%MACRO X16;
%DO S = 1 %TO %SSR %BY 1;
PRR&S
%END;
%MEND;
PROC PRINT DATA=B12;
VAR F PRR %X16;
RUN ;
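*ILLUSTRATIVE SKETCH, NOT PART OF THE ORIGINAL PROGRAM. IT REPEATS BY HAND THE
PR, PRR[LOCUS] AND PRR ARITHMETIC OF MACRO X14 FOR ONE HYPOTHETICAL MOTHER AT
TWO LOCI. ALL FREQUENCIES BELOW ARE MADE-UP VALUES AND THE VARIABLE NAMES ONLY
MIRROR THOSE USED ABOVE;
DATA _NULL_;
PR1 = 0.10 + 0.25 + 0.05; *TWO MATERNAL ALLELE FREQUENCIES PLUS NULL FREQUENCY;
PR2 = 0.30 + 0.20; *ONE MATERNAL ALLELE FREQUENCY PLUS NULL FREQUENCY;
PRR1 = PR1**2; *PER-LOCUS PROBABILITY AT LOCUS 1;
PRR2 = PR2**2; *PER-LOCUS PROBABILITY AT LOCUS 2;
PRR = PRR1 * PRR2; *PRODUCT ACROSS LOCI, ABOUT 0.16 TIMES 0.25;
PUT PRR1= PRR2= PRR=; *WRITES THE VALUES TO THE LOG;
RUN;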