Supplement 1

*LOCUS NUMBER;

*THIS PROGRAM CAN UTILIZE ANY TYPE OF AUTOTETRAPLOID MARKER DATA, WHETHER

DOMINANT, CO-DOMINANT, OR CO-DOMINANT WITH AMBIGUOUS DOSAGE. SSR IS THE TERM

USED IN THE CODE BUT APPLIES TO ANY TYPE OF MARKER USED;

%MACRO SSR ;

27

%MEND SSR;

*SETS MAXIMUM NUMBER OF ALLELES OBSERVED PER LOCUS AT THE LOCUS WITH THE MOST

ALLELES. THE PROGRAM CAN HANDLE A NUMBER UP TO 99;

%MACRO MAXALL ;

20

%MEND MAXALL;

*GENERAL NOTE THIS PROGRAM IS CURRENTLY SET UP FOR AUTOTETRAPLOIDS WITH 27

SSR SCORED ON INDIVIDUALS;

*DATA SET A IS THE PROGENY DATA, MARKERS HAVE TO BE LABELED M + ALLELE +

LOCUS IN SEQUENTIAL ORDER FOR SAS MACROS TO PROCESS;

*EACH ALLELE PER LOCUS PER PROGENY SHOULD ONLY APPEAR ONCE IRRESPECTIVE OF

DOSAGE IF KNOWN;

*ALLELES HAVE TO BE RECORDED AS NUMBERS;

*DATA SET A CONTAINS AN EXAMPLE OF 4 GENOTYPES;
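
*OPTIONAL SKETCH, NOT PART OF THE ORIGINAL PROGRAM: THE HYPOTHETICAL HELPER
MACRO MLIST BELOW ILLUSTRATES HOW THE MARKER NAMES ARE BUILT, ASSUMING THE
FIRST DIGIT IS THE ALLELE POSITION (1 TO 4) AND THE REMAINING DIGITS ARE THE
LOCUS NUMBER, SO THAT M310 IS ALLELE POSITION 3 AT LOCUS 10. THE NAMES
WRITTEN TO THE LOG SHOULD FOLLOW THE SAME ORDER AS THE INPUT STATEMENT OF
DATA SET A;
%MACRO MLIST ;
  %DO L = 1 %TO %SSR %BY 1 ;
    %DO P = 1 %TO 4 %BY 1 ;
      M&P&L
    %END ;
  %END ;
%MEND MLIST;
%PUT %MLIST ;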

DATA A;

LENGTH FEMALE $ 16 ;

INPUT ID FEMALE $

M11 M21 M31 M41 M12 M22 M32 M42 M13 M23 M33 M43 M14 M24

M34 M44 M15 M25 M35 M45 M16 M26 M36 M46 M17 M27 M37 M47

M18 M28 M38 M48 M19 M29 M39 M49 M110 M210 M310 M410 M111 M211

M311 M411 M112 M212 M312 M412 M113 M213 M313 M413 M114 M214 M314 M414

M115 M215 M315 M415 M116 M216 M316 M416 M117 M217 M317 M417 M118 M218

M318 M418 M119 M219 M319 M419 M120 M220 M320 M420 M121 M221 M321 M421

M122 M222 M322 M422 M123 M223 M323 M423 M124 M224 M324 M424 M125 M225

M325 M425 M126 M226 M326 M426 M127 M227 M327 M427;

CARDS ;

1141 S1-P1 136 142 145 .

.

.

265

196

271

.

.

.

.

.

161 164 172 .

327 330 339 351 143 146 .

. . .

188 195 .

.

.

221 229 232

243

242

.

.

.

.

.

.

.

.

.

.

273 275 .

. . .

. .

374 .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

1142 S1-P2 136 142 145 .

128

244

139

.

247

.

150 .

158 .

.

.

.

.

.

.

.

.

.

.

168 177 .

.

.

.

.

.

153 .

.

.

.

.

.

.

.

.

.

265 271 .

273 . .

278 286 .

.

192 196 199 .

.

.

161 164 172 . 221 225 .

327 330 348 351 143 154 .

231 . .

188 195 .

.

.

243

242

246

.

260

.

.

.

.

.

200 213 .

374 389 .

.

.

.

.

.

.

.

.

.

.

128 . .

244 247 .

.

.

.

.

158 .

.

.

.

.

.

.

.

.

177 184 .

. . .

144 .

. .

. . .

.

.

.

1143 S1-P2 136 142 145 .

. 265 268 271 .

161

327

164

330

169

.

.

.

221

143

229

146

232

149

.

.

.

.

.

.

.

.

192 196 .

273 . .

.

.

283 286 294 .

207 . . .

374 .

146 .

. .

.

.

.

.

.

.

1144 S2-P5 136 142 .

. 265 268 271

.

.

.

.

192

273

196

275

.

.

.

.

.

.

.

.

286 294 .

207 . .

374 389 .

146 . .

. . . .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

158 .

.

.

128 . .

244 247 .

.

.

.

.

.

.

.

.

.

.

243 245 .

242 . .

177 180 184

. . .

144 .

153 .

. .

.

.

.

161 167 169 . 221 225 232

327 330 348 351 143 146 .

. .

188 .

.

.

.

.

158 .

.

.

128 . .

244 247 .

.

.

.

.

.

.

.

.

.

.

243

242

260

.

.

.

177 180 .

.

.

.

144 .

153 .

.

.

.

.

.

*ETC WITH ONE LINE PER PROGENY;

;

*DATA SET B UPDATES THE PROGENY SET BY ADDING A SEQUENTIAL NUMBER VARIABLE F

FOR FEMALES FOR SAS MACRO PROCESSING;

DATA B; SET A;

IF FEMALE = 'S1-P1' THEN F = 1 ;

IF FEMALE = 'S1-P2' THEN F = 2 ;

IF FEMALE = 'S1-P3' THEN F = 3 ;

IF FEMALE = 'S2-P5' THEN F = 4 ; *ETC;

IF F = . THEN DELETE ; *THIS ELIMINATES ANY PROGENY THAT DO NOT HAVE A

SEQUENTIAL F NUMBER;

RUN ;

*DATA SET C1 CONTAINS PARENTAL GENOTYPES, MARKERS AGAIN LABELED P + ALLELE +

LOCUS IN SEQUENTIAL ORDER FOR SAS MACROS TO PROCESS;

DATA C1;

INPUT F ID

P11 P21 P31 P41 P12 P22 P32 P42 P13 P23 P33 P43 P14 P24

P34 P44 P15 P25 P35 P45 P16 P26 P36 P46 P17 P27 P37 P47

P18 P28 P38 P48 P19 P29 P39 P49 P110 P210 P310 P410 P111 P211

P311 P411 P112 P212 P312 P412 P113 P213 P313 P413 P114 P214 P314 P414

P115 P215 P315 P415 P116 P216 P316 P416 P117 P217 P317 P417 P118

P218 P318 P418 P119 P219 P319 P419 P120 P220 P320 P420 P121 P221 P321 P421

P122 P222 P322 P422 P123 P223 P323 P423 P124 P224 P324 P424 P125

P225 P325 P425 P126 P226 P326 P426 P127 P227 P327 P427;

CARDS ;

1 1.1 136 142 . .

232 265 268 271 .

.

.

192

273

196

.

.

.

.

.

161 164 169 172 221 225 229

327 330 348 351 143 146 .

. . .

188 195 .

.

.

243

242

246

.

260

.

.

.

.

.

278 286 292 294 128 .

. . . . 244 247

.

.

374 389 .

. . .

. . .

.

.

.

.

150

158

.

.

.

.

.

. .

.

2 2.1 136 142 145 .

. 265 271 . .

161 169 . .

327 330 339 .

.

.

.

.

.

177 184 .

. . .

144 .

153 .

. . .

.

.

221

140

229

143

.

146

.

.

.

.

149 192 199 .

. 265 273 .

.

.

278

207

283

.

286

.

.

.

.

.

374 379 389 .

146 . . .

155 . . .

. . . .

188 198 208 .

128 134 .

244 . .

.

.

.

.

158 .

.

.

.

.

.

.

.

.

243 246 249

240 242 .

177 184 .

. . .

.

.

.

.

.

.

.

.

.

3 3.1 136 142 .

. 263 265 268

.

271

167

327

169

330

172

.

.

.

.

.

189

.

192

.

196

.

.

.

231

188

.

195

.

.

.

.

221 229 232

140 143 .

243 260 261

242 255 .

.

.

.

.

286 294 .

202 . .

374 379 .

146 . .

. . .

.

.

.

.

.

128 143 .

244 247 .

. .

150 .

158 .

.

.

.

.

.

.

.

.

177 180 184

. . .

.

.

.

.

.

.

.

.

.

.

.

*ETC. WITH ONE LINE PER PARENT;

;

DATA C; SET C1;

* THIS DATA SET CAN BE USED TO ALTER C1. C NEEDS MARKER DATA AND A

SEQUENTIAL F NUMBER FOR TESTED MOTHERS SO THAT FALSE POSITIVE PROBABILITIES

CAN BE CALCULATED FOR EACH MOTHER BASED ON THE MOTHERS GENOTYPE;

DROP ID;

RUN ;

*ALLELE FREQUENCY CALCULATIONS;

*****IMPORTANT!!!**** BE SURE TO RECORD THE TOTAL NUMBER OF OBSERVATIONS IN
BX1 FROM THE LOG SO THIS NUMBER CAN BE MANUALLY ENTERED INTO THE SAS CODE IN
MACRO X4;

DATA BX1; SET B; *FIRST STEP USES PROGENY DATA FILE B;
%MACRO X1 ;
  %DO A = 1 %TO %SSR %BY 1 ;
    IF M1&A = . THEN FLAG&A = 1 ; *IDENTIFIES NULLIPLEX LOCI;
  %END ;
%MEND ;
%X1 ;
MC = SUM(OF FLAG:); *DETERMINES TOTAL NUMBER OF NULLIPLEX LOCI PER INDIVIDUAL;
IF MC > 10 THEN DELETE ; *ALLOWS USER TO DELETE INDIVIDUALS DEEMED TO HAVE
TOO MANY NULLIPLEX LOCI, IN THIS EXAMPLE CASE INDIVIDUALS HAVING > 10
NULLIPLEX LOCI;
RUN ;
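
*OPTIONAL SKETCH, NOT PART OF THE ORIGINAL PROGRAM: INSTEAD OF COPYING THE
OBSERVATION COUNT OF BX1 FROM THE LOG BY HAND, PROC SQL CAN STORE IT IN A
MACRO VARIABLE. NPROG IS A HYPOTHETICAL NAME INTRODUCED FOR ILLUSTRATION. THE
PROGRAM AS WRITTEN STILL USES THE HARD-CODED 1000 IN MACRO X4 AND DOES NOT
READ THIS VARIABLE;
PROC SQL NOPRINT;
  SELECT COUNT(*) INTO :NPROG FROM BX1;
QUIT;
%LET NPROG = &NPROG; *REMOVES LEADING BLANKS FROM THE STORED COUNT;
%PUT NOTE: BX1 CONTAINS &NPROG OBSERVATIONS;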

*STACKS ALL 4 ALLELE POSITIONS INTO ONE ALLELE POSITION PER MARKER, END
RESULT IS DATA SET B2;
PROC SORT DATA=BX1;
  BY ID;
RUN ;

%MACRO X2 ;
  %DO A = 1 %TO %SSR %BY 1 ;
    M&A = M&B&A;
  %END ;
%MEND ;

%MACRO X3 ;
  %DO B = 1 %TO 4 %BY 1 ;
    DATA BB1&B; SET BX1;
      %X2 ;
    RUN;
  %END ;
%MEND ;
%X3 ;

DATA B2; SET BB11 BB12 BB13 BB14;
RUN ;

*FIRST ESTIMATES INDIVIDUAL DOSAGE-UNADJUSTED ALLELE FREQUENCIES AT EACH
LOCUS;
%MACRO X4 ;
  %DO A = 1 %TO %SSR %BY 1 ;
    PROC FREQ DATA=B2; *DOSAGE-UNADJUSTED ALLELE FREQUENCY ESTIMATION;
      TABLE M&A/OUT=B3&A;
    RUN;
    DATA B4&A; SET B3&A; *ASSUMING RANDOM MATING EQUILIBRIUM THIS STEP
    ESTIMATES ALLELE FREQUENCIES OF EACH ALLELE AT EACH LOCUS;
      IF M&A = . THEN DELETE;
      DROP PERCENT;
      FC = 1 - ((1 - (COUNT/1000))**(1/4)); *****IMPORTANT!!!**** THE 1000
      NEEDS TO BE MANUALLY CHANGED TO THE TOTAL NUMBER OF PROGENY EVALUATED.
      GET THIS NUMBER FROM THE TOTAL NUMBER OF OBSERVATIONS IN DATA SET BX1;
      IF FC = 1 THEN FC = 0 ;
      IF M&A = 0 THEN FC = (COUNT/1000)**(1/4); *****IMPORTANT!!!**** THE
      1000 NEEDS TO BE MANUALLY CHANGED TO THE TOTAL NUMBER OF PROGENY
      EVALUATED. GET THIS NUMBER FROM THE TOTAL NUMBER OF OBSERVATIONS IN
      DATA SET BX1;
      MARK = &A;
      ALLELE = M&A;
      DROP M&A;
    RUN;
  %END ;
%MEND ;
%X4 ;
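
*WORKED EXAMPLE WITH ILLUSTRATIVE NUMBERS THAT ARE NOT FROM THE DATA: UNDER
RANDOM MATING AN ALLELE WITH FREQUENCY P IS ABSENT FROM ALL FOUR GENE COPIES
OF A TETRAPLOID WITH PROBABILITY (1 - P)**4, SO THE FRACTION OF PROGENY
SHOWING THE ALLELE IS 1 - (1 - P)**4, AND SOLVING FOR P GIVES THE FC FORMULA
USED IN MACRO X4. THE STAND-ALONE STEP BELOW IS ONLY A CHECK OF THAT
ARITHMETIC. IT ASSUMES AN ALLELE SEEN IN 3439 OF 10000 PROGENY, WHICH SHOULD
RETURN FC CLOSE TO 0.1;
DATA _NULL_;
  COUNT = 3439; *HYPOTHETICAL NUMBER OF PROGENY SHOWING THE ALLELE;
  N = 10000; *HYPOTHETICAL TOTAL NUMBER OF PROGENY;
  FC = 1 - ((1 - (COUNT/N))**(1/4));
  PUT FC=;
RUN;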

%MACRO X5 ;
  %DO A = 1 %TO %SSR %BY 1 ;
    B4&A
  %END ;
%MEND ;

DATA B4; SET %X5 ; *THIS DATA SET CONTAINS ESTIMATED ALLELE FREQUENCIES OF
EACH ALLELE AT EACH LOCUS. CAUTION: ALLELE FREQUENCIES AT A LOCUS OFTEN DO NOT
EXACTLY ADD UP TO ONE DUE TO THE CALCULATION METHODOLOGY (1. PRESENCE OF
NULL ALLELES, 2. STOCHASTIC ERROR ASSOCIATED WITH THE SAMPLE SIZE DRAWN FROM
THE POPULATION, AND/OR 3. VIOLATIONS OF RANDOM MATING EQUILIBRIUM). IT IS
RECOMMENDED THAT AT LEAST 400 TO 500 PROGENY ARE SAMPLED FROM THE POPULATION
TO GET ACCEPTABLE POPULATION ALLELE FREQUENCY ESTIMATES, BECAUSE
AUTOTETRAPLOID GENETICS MAKES IT HARDER TO DETERMINE ALLELE FREQUENCIES;
RUN ;

*PROCESSES ALLELE FREQUENCY DATA SET B4 TO MERGE TO THE PARENTAL GENOTYPE

FILE C;

%MACRO X7 ;
  %DO M = 1 %TO %MAXALL %BY 1 ;
    %IF &S < 10 %THEN %DO ;
      RENAME COL&M = AL0&S&M;
    %END ;
    %ELSE %DO ;
      RENAME COL&M = AL&S&M;
    %END ;
  %END ;
%MEND ;

%MACRO X8 ;
  %DO M = 1 %TO %MAXALL %BY 1 ;
    %IF &S < 10 %THEN %DO ;
      RENAME COL&M = FC0&S&M;
    %END ;
    %ELSE %DO ;
      RENAME COL&M = FC&S&M;
    %END ;
  %END ;
%MEND ;

%MACRO X12 ;
  %DO S = 1 %TO %SSR %BY 1 ;
    DATA B5&S; SET B4;
      IF MARK NE &S THEN DELETE;
      KEEP ALLELE;
    RUN;
    PROC TRANSPOSE OUT=B6&S;
    DATA B7&S; SET B6&S;
      DROP _NAME_;
      %X7 ;
      DUMB = 1 ;
    RUN;
    DATA B8&S; SET B4;
      IF MARK NE &S THEN DELETE;
      KEEP FC;
    RUN;
    PROC TRANSPOSE OUT=B9&S;
    DATA BX&S; SET B9&S;
      DROP _NAME_;
      %X8 ;
      DUMB = 1 ;
    RUN;
  %END ;
%MEND ;
%X12 ;

*MERGES POPULATION ALLELE FREQUENCY DATA TO PARENTAL GENOTYPE FILE C;

DATA B11; SET C;

DUMB = 1 ;

RUN ;

%MACRO X13 ;
  %DO S = 1 %TO %SSR %BY 1 ;
    BX&S B7&S
  %END ;
%MEND ;

DATA B12; MERGE B11 %X13 ;
BY DUMB;
PRR = 1 ; *SETS THE OVERALL FALSE POSITIVE FREQUENCY TO ONE, THIS NUMBER
GETS ITERATIVELY SMALLER WITH EACH LOCUS ANALYZED;

%MACRO X15 ; *THIS MACRO DETERMINES THE SUM OF ALLELE FREQUENCIES AT EACH
LOCUS TERMED "ALTOT";
  %DO S = 1 %TO %SSR %BY 1 ;
    ALTOT&S = 0 ;
    %DO M = 1 %TO %MAXALL %BY 1 ;
      %IF &S < 10 %THEN %DO ;
        IF FC0&S&M NE . THEN DO;
          ALTOT&S = ALTOT&S + FC0&S&M;
        END;
      %END ;
      %ELSE %DO ;
        IF FC&S&M NE . THEN DO;
          ALTOT&S = ALTOT&S + FC&S&M;
        END;
      %END ;
    %END ;
    NULL&S = 1 - ALTOT&S; *1 - ALLELE FREQUENCY TOTAL PER LOCUS
    IS ASSIGNED AS THE NULL ALLELE FREQUENCY;
    IF NULL&S < 0 THEN NULL&S = 0 ; *IF TOTAL ALLELE FREQUENCY IS
    GREATER THAN ONE THEN NULL ALLELE FREQUENCY IS SET TO ZERO;
  %END ;
%MEND ;
%X15 ;

%MACRO X14 ;
  %DO S = 1 %TO %SSR %BY 1 ;
    * MACROS X9 AND X10 CREATE FL VARIABLES WHICH INDICATE WHICH
    OF ALL POSSIBLE ALLELES FOR A LOCUS ARE PRESENT IN THE MOTHER;
    %MACRO X9 ;
      %DO M = 1 %TO %MAXALL %BY 1 ;
        %IF &S < 10 %THEN %DO ;
          FL0&S&M = 0 ;
        %END ;
        %ELSE %DO ;
          FL&S&M = 0 ;
        %END ;
      %END ;
    %MEND ;
    %X9 ;
    %MACRO X10 ;
      %DO P = 1 %TO 4 %BY 1 ;
        %DO M = 1 %TO %MAXALL %BY 1 ;
          %IF &S < 10 %THEN %DO ;
            IF P&P&S NE . THEN DO;
              IF P&P&S = AL0&S&M THEN FL0&S&M = 1 ;
            END;
          %END ;
          %ELSE %DO ;
            IF P&P&S NE . THEN DO;
              IF P&P&S = AL&S&M THEN FL&S&M = 1 ;
            END;
          %END ;
        %END ;
      %END ;
    %MEND ;
    %X10 ;
    PR&S = 0 ;
    *AT EACH LOCUS SUMS POPULATION ALLELE FREQUENCIES OF ALL
    ALLELES OBSERVED IN THE MOTHER;
    %MACRO X11 ;
      %DO M = 1 %TO %MAXALL %BY 1 ;
        %IF &S < 10 %THEN %DO ;
          IF FL0&S&M = 1 THEN PR&S = PR&S + FC0&S&M;
        %END ;
        %ELSE %DO ;
          IF FL&S&M = 1 THEN PR&S = PR&S + FC&S&M;
        %END ;
      %END ;
    %MEND ;
    %X11 ;
    PR&S = PR&S + NULL&S; *FOR EACH LOCUS SUM OF POPULATION
    ALLELE FREQUENCIES OBSERVED IN THE MOTHER + NULL FREQUENCIES;
    IF PR&S > 1 THEN PR&S = 1 ; *PR VALUES FROM LINE ABOVE CAN
    HAVE A MAXIMUM OF 1;
    PRR&S = (PR&S)**2 ; *PR VALUES SQUARED GIVES PRR[LOCUS
    NUMBER] = FOR THE LOCUS PROBABILITY OF OBSERVING THE GIVEN MATERNAL ALLELE
    CONFIGURATION;
    PRR = PRR&S * PRR; *GIVES THE PRODUCT OF ALL PRR VALUES FOR
    THE GRAND PROBABILITY OF OBSERVING THE MATERNAL ALLELE CONFIGURATION ACROSS
    ALL LOCI;
  %END ;
%MEND ;
%X14 ;
RUN ;

*PRINTS THE OUTPUT OF PRR AND ALL PRR[LOCUS] VALUES;

%MACRO X16 ;
  %DO S = 1 %TO %SSR %BY 1 ;
    PRR&S
  %END ;
%MEND ;

PROC PRINT DATA=B12;
  VAR F PRR %X16 ;
RUN ;
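
*WORKED EXAMPLE WITH ILLUSTRATIVE NUMBERS THAT ARE NOT FROM THE DATA: THE
STAND-ALONE STEP BELOW REPEATS THE PR, PRR[LOCUS] AND PRR ARITHMETIC OF MACRO
X14 FOR TWO HYPOTHETICAL LOCI. AT LOCUS 1 THE MOTHER IS ASSUMED TO CARRY TWO
ALLELES WITH POPULATION FREQUENCIES 0.30 AND 0.15 AND THE NULL FREQUENCY IS
0.05. AT LOCUS 2 SHE IS ASSUMED TO CARRY ONE ALLELE WITH FREQUENCY 0.40 AND
THE NULL FREQUENCY IS 0.20. PRR1 SHOULD BE CLOSE TO 0.25, PRR2 CLOSE TO 0.36
AND THE OVERALL PRR CLOSE TO 0.09;
DATA _NULL_;
  PRR = 1 ;
  PR1 = 0.30 + 0.15 + 0.05; *MATERNAL ALLELE FREQUENCIES PLUS NULL, LOCUS 1;
  PRR1 = (PR1)**2 ;
  PRR = PRR1 * PRR;
  PR2 = 0.40 + 0.20; *MATERNAL ALLELE FREQUENCY PLUS NULL, LOCUS 2;
  PRR2 = (PR2)**2 ;
  PRR = PRR2 * PRR;
  PUT PRR1= PRR2= PRR=;
RUN;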
