Yeast / Data analysis / Summary and conclusions

Hierarchical modelling of automated imaging data
Darren Wilkinson
http://tinyurl.com/darrenjw
School of Mathematics & Statistics, Newcastle University, UK
Theory of Big Data 2, UCL, London, 6th–8th January, 2016

Darren Wilkinson, UCL, 7/1/2016: Hierarchical modelling of automated imaging data

Overview
Background: Budding yeast as a model for genetics
High-throughput robotic genetic experiments
Image analysis and data processing
(Stochastic) modelling of growth curves
Hierarchical modelling of genetic interaction
Summary and conclusions
Joint work with Jonathan Heydari, Keith Newman, Conor Lawless and David Lydall (and others in the “Lydall lab”)

Budding yeast biology and genetics / Synthetic genetic array / Telomeres / HTP yeast SGA robotic screens

Saccharomyces cerevisiae
Saccharomyces cerevisiae, often known as budding yeast, and sometimes as brewer’s yeast or baker’s yeast, is a single-celled eukaryotic organism
Eukaryotic cells contain a nucleus (and typically other organelles, such as mitochondria)
It is useful as a model for higher eukaryotes, having a great deal of biological function conserved with humans
It is the most heavily studied and well-characterised model organism in biology (e.g. first fully sequenced eukaryote)

Synthetic Genetic Array (SGA)
Possible to obtain a library of around 4,500 mutant strains, each of which has one of the non-essential genes silenced through insertion of a (kanMX) antibiotic resistance cassette and tagged with a unique DNA barcode
These strains (stored frozen in 96-well plates) can be manipulated by robots in 96-well plates (8×12), or on solid agar in 96, 384 or 1536-spot format
Synthetic Genetic Array (SGA) is a clever genetic procedure using robots to systematically introduce an additional mutation into each strain in the library by starting from a specially constructed query strain containing the new mutation

Telomeres
The ends of linear chromosomes require special protection in order not to be targeted by DNA damage repair machinery (bacteria often avoid this problem by having just one chromosome arranged in a single loop)
Telomeres are the ends of the chromosomal DNA (which have a special sequence), bound with special telomere-capping proteins that protect the telomeres
CDC13 is an essential telomere-capping protein in yeast
cdc13-1 is a point-mutation of CDC13, encoding a temperature-sensitive protein which functions similarly to wild-type CDC13 below around 25 °C, and leads to “telomere-uncapping” above this temperature

Basic structure of an experiment
1. Introduce a mutation (such as cdc13-1) into an SGA query strain, and then use SGA technology (and a robot) to cross this strain with the single deletion library in order to obtain a new library of double mutants
2. Inoculate the strains into liquid media, grow up to saturation, then spot back on to solid agar 4 times
3. Incubate the 4 different copies at different temperatures (treatments), and image the plates multiple times to see how quickly the different strains are growing
4. Repeat steps 2 and 3 four times (to get some idea of experimental variation)
5. Repeat steps 2 to 4 with a “control” library that does not include the query mutation

Some numbers relating to an experiment
Initial SGA work (introducing mutations into the query and the library) takes around 1 month of calendar time, and several days of robot time
The inoculation, spotting and imaging of the 8 repeats takes 1 month of calendar time, and around 2 weeks of robot time
The experiment uses around £3,000 of consumables (plastics and media)
The library is distributed across 72 96-well plates or 18 solid agar plates (in 384 format, or 1536 in quadruplicate)
If each plate is imaged 30 times, there will be around 35k high-resolution photographs of plates in 384 format, corresponding to around 13 million colony growth measurements (400k time series)
This is big data!
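As a rough consistency check on the totals quoted on the "Some numbers" slide, the following Python sketch back-calculates the number of plates from the photograph count and then the number of time series and measurements. It uses only the rounded figures from the slide; the function name `experiment_counts` is made up for illustration, and the exact plate count of the real experiment is not given in the source.

```python
# Back-of-envelope check using only the rounded figures quoted on the slide.
IMAGES_PER_PLATE = 30    # "each plate is imaged 30 times"
TOTAL_PHOTOS = 35_000    # "around 35k high-resolution photographs"
SPOTS_PER_PLATE = 384    # plates photographed in 384-spot format


def experiment_counts(total_photos=TOTAL_PHOTOS,
                      images_per_plate=IMAGES_PER_PLATE,
                      spots_per_plate=SPOTS_PER_PLATE):
    """Return (plates, time_series, measurements) implied by the totals."""
    plates = total_photos / images_per_plate    # one photo per plate per time point
    time_series = plates * spots_per_plate      # one growth curve per spot
    measurements = time_series * images_per_plate
    return plates, time_series, measurements


plates, time_series, measurements = experiment_counts()
print(f"plates ~ {plates:.0f}")
print(f"time series ~ {time_series:.0f}")
print(f"measurements ~ {measurements:.0f}")
```

With these inputs the sketch gives roughly 1,170 plates, about 450k time series and about 13.4 million measurements, which is in line with the slide's "400k time series" and "around 13 million colony growth measurements".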
Data analysis pipeline / Growth curve modelling / Modelling genetic interaction

Data analysis pipeline
Image processing (from images to colony size measurements)
Fitness modelling (from colony size growth curves to strain fitness measures)
Modelling genetic interaction (from strain fitness measures to identification of genetically interacting strains, ranked by effect size)
Possible to carry out the three stages separately, but there are benefits to joint modelling through borrowed strength and proper propagation of uncertainty. Not practical to integrate the image processing step into the joint model, but possible to jointly model the second two stages.

Automated image analysis (Colonyzer)
[Figure]

Growth curve
[Figure: panel A, his3Δ and htz1Δ; panel B, normalised cell density (AU) against time since inoculation (h)]

Growth curve modelling
We want something between a simple smoothing of the data and a detailed model of yeast cell growth and division
Logistic growth models are ideal: simple semi-mechanistic models with interpretable parameters related to strain fitness
Basic deterministic model:
$$\frac{dx}{dt} = r x \left(1 - \frac{x}{K}\right),$$
subject to initial condition $x = P$ at $t = 0$
$r$ is the growth rate and $K$ is the carrying capacity
Analytic solution:
$$x(t) = \frac{K P e^{rt}}{K + P\left(e^{rt} - 1\right)}$$

Statistical model
Model observational measurements $\{Y_{t_1}, Y_{t_2}, \ldots\}$ with
$$Y_{t_i} = x_{t_i} + \varepsilon_{t_i}$$
Can fit to observed data $y_{t_i}$ using non-linear least squares or MCMC
Can fit all (400k) time courses simultaneously in a large hierarchical model which effectively borrows strength, especially across repeats, but also across genes

Growth curve model
[Figure: graphical model diagram with nested levels Population, orfΔ (index l), Repeat (index m) and Time Point (index n); symbols shown include ylmn, ŷlmn, rlm, Klm, rlo, Klo, νl, τlK, σK,o, σr,o, στ,K, στ,r, τK,p, τr,p, Kp, rp, P, σν, νp]

Colony fitness
The results of model fitting are estimates (or posterior distributions) of r and K for each yeast colony, and also the corresponding gene level parameters
Both r and K are indicative of colony fitness, so keep them separate where possible
Often useful to have a scalar measure of fitness. There are many possibilities, including rK, r log K, or MDR×MDP, where MDR is the maximal doubling rate and MDP is the maximal doubling potential
Statistical summaries can be fed in as data to the next level of analysis (or, ultimately, modelled jointly as a giant hierarchical model)

Epistasis
From Wikipedia: “Epistasis is the interaction between genes.
Epistasis takes place when the action of one gene is modified by one or several other genes, which are sometimes called modifier genes. The gene whose phenotype is expressed is said to be epistatic, while the phenotype altered or suppressed is said to be hypostatic.”
“Epistasis and genetic interaction refer to the same phenomenon; however, epistasis is widely used in population genetics and refers especially to the statistical properties of the phenomenon.”

Multiplicative model
Consider two genes with alleles a/A and b/B, with a and b representing “wild type” (note that A and B could potentially represent knock-outs of a and b)
Four genotypes: ab, Ab, aB, AB. Use [·] to denote some quantitative phenotypic measure (e.g. “fitness”) for each genotype
Multiplicative model of genetic independence:
$$[AB] \times [ab] = [Ab] \times [aB] \quad \text{no epistasis}$$
$$[AB] \times [ab] > [Ab] \times [aB] \quad \text{synergistic epistasis}$$
$$[AB] \times [ab] < [Ab] \times [aB] \quad \text{antagonistic epistasis}$$
Perhaps simpler if re-written in terms of relative fitness:
$$\frac{[AB]}{[ab]} = \frac{[Ab]}{[ab]} \times \frac{[aB]}{[ab]}$$

Genetic independence and HTP data
Suppose that we have scaled our data so that it is consistent with a multiplicative model. What do we expect to see?
The independence model $[AB] \times [ab] = [Ab] \times [aB]$ translates to
$$[\mathrm{query}{:}\mathrm{abc}\Delta] \times [\mathrm{wt}] = [\mathrm{query}] \times [\mathrm{abc}\Delta]$$
In other words
$$[\mathrm{query}{:}\mathrm{abc}\Delta] = \frac{[\mathrm{query}]}{[\mathrm{wt}]} \times [\mathrm{abc}\Delta]$$
That is, the double-mutant differs from the single-deletion by a constant multiplicative factor, $[\mathrm{query}]/[\mathrm{wt}]$, that is independent of the particular single-deletion
i.e. a scatter-plot of double against single will show them all lying along a straight line through the origin

Statistical modelling
Assume that $F_{clm}$ is the fitness measurement for repeat $m$ of gene deletion $l$ in condition $c$ ($c = 1$ for the single deletion and $c = 2$ for the corresponding double-mutant)
Model:
$$F_{clm} \sim N(\hat{F}_{cl},\, 1/\nu_{cl})$$
$$\log \hat{F}_{cl} = \alpha_c + Z_l + \delta_l \gamma_{cl}$$
$$\delta_l \sim \mathrm{Bern}(p)$$
$\delta_l$ is a variable selection indicator of genetic interaction
Then usual Bayesian hierarchical stuff...

Genetic interaction model
[Figure: graphical model diagram with nested levels Population, orfΔ, Condition and Repeat; symbols shown include αc, σcγ, Fclm, F̂clm, γcl, νcl, δl, Zl, σZ, Zp, σν, νp]

Genetic interaction results
[Figure: scatter plot of fitness (F) of orfΔ cdc13-1 double mutants at 27°C (doublings²/day) against fitness (F) of orfΔ ura3Δ double mutants at 27°C (doublings²/day), with points labelled by gene name (e.g. SAN1, NMD2, UPF3, EXO1, NAM7, RAD24, RAD17, DDC1)]

Joint modelling of growth curves and genetic interaction
We can integrate together the hierarchical growth curve model and the genetic interaction model into a combined joint model
This has usual advantages of properly borrowing strength, proper propagation of uncertainty, etc.
Also very convenient to avoid requiring a scalar measure of “fitness”
If $y_{clmn}$ is the colony size at time point $n$ in repeat $m$ of gene $l$ in condition $c$, then
$$y_{clmn} \sim N(\hat{y}_{clmn},\, 1/\nu_{cl})$$
$$\hat{y}_{clmn} = X(t_{clmn};\, K_{clm}, r_{clm}, P)$$
$$\log K_{clm} \sim N(\alpha_c + K^o_l + \delta_l \gamma_{cl},\, 1/\tau^K_{cl})$$
$$\log r_{clm} \sim N(\beta_c + r^o_l + \delta_l \omega_{cl},\, 1/\tau^r_{cl})$$

Joint model
[Figure: graphical model diagram with nested levels Population, orfΔ, Condition, Repeat and Time Point; symbols shown include yclmn, ŷclmn, rclm, Kclm, αc, βc, γcl, ωcl, σcγ, σcω, νcl, δl, Klo, rlo, τlK, τlr, σK,o, σr,o, στ,K, στ,r, τK,p, τr,p, Kp, rp, P, σν, νp]

Joint model results
[Figure: scatter plot of fitness (F) of orfΔ cdc13-1 double mutants at 27°C (doublings²/day) against fitness (F) of orfΔ ura3Δ double mutants at 27°C (doublings²/day), with points labelled by gene name (e.g. SAN1, NMD2, EXO1, UPF3, RAD24, NAM7, RAD17, DDC1)]

Summary / References

Big data issues
Understanding conflict between model and data in big data contexts: does more data demand more complex models?
Model simplifications and improvements to MCMC (block updates and proposals, reparameterisations, GVS, etc.)
Basic parallelisation strategies (parallel chains, parallelised single chain)
Investigation of data parallel strategies (consensus Monte Carlo, etc.)
Novel representations and implementations of Bayesian hierarchical models using functional programming languages

Summary
Modern bioscience is generating large, complex data sets which require sophisticated modelling in order to answer questions of scientific interest
Big data forces trade-offs between statistical accuracy and computational tractability
Stochastic dynamic models are much more flexible than deterministic models, but come at a computational cost. The linear noise approximation (LNA) can sometimes represent an excellent compromise
Notions of genetic interaction translate directly to statistical models of interaction
Big hierarchical variable selection models are useful in genomics, but can be computationally challenging

Funding acknowledgements
Major funders of this work: BBSRC, MRC, Wellcome Trust, Cancer Research UK

References
Addinall, S. G., Holstein, E., Lawless, C., Yu, M., Chapman, K., Taschuk, M., Young, A., Ciesiolka, A., Lister, A., Wipat, A., Wilkinson, D. J., Lydall, D. A. (2011) Quantitative fitness analysis shows that NMD proteins and many other protein complexes suppress or enhance distinct telomere cap defects, PLoS Genetics, 7:e1001362.
Heydari, J. J., Lawless, C., Lydall, D. A., Wilkinson, D. J. (2016) Bayesian hierarchical modelling for inferring genetic interactions in yeast, Journal of the Royal Statistical Society, Series C, early view.
Heydari, J. J., Lawless, C., Lydall, D. A., Wilkinson, D. J. (2014) Fast Bayesian parameter estimation for stochastic logistic growth models, BioSystems, 122:55-72.
Lawless, C., Wilkinson, D. J., Addinall, S. G., Lydall, D. A. (2010) Colonyzer: automated quantification of characteristics of microorganism colonies growing on solid agar, BMC Bioinformatics, 11:287.
Wilkinson, D. J. (2009) Stochastic modelling for quantitative description of heterogeneous biological systems, Nature Reviews Genetics, 10(2):122-133.
Wilkinson, D. J. (2011) Stochastic Modelling for Systems Biology, second edition. Chapman & Hall/CRC Press.
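
Supplementary sketch: logistic growth curve
As an illustration of the deterministic logistic model on the "Growth curve modelling" slide, the following Python sketch evaluates the analytic solution x(t) = K P e^{rt} / (K + P (e^{rt} − 1)) and compares it with a direct numerical integration of dx/dt = r x (1 − x/K). The parameter values (K = 0.15, r = 0.3, P = 1e-4) and the function names are hypothetical, chosen only for illustration; this is not the fitting code used in the talk.

```python
import math


def logistic(t, K, r, P):
    """Analytic logistic solution with x(0) = P, growth rate r, capacity K."""
    e = math.exp(r * t)
    return K * P * e / (K + P * (e - 1.0))


def logistic_rk4(t_end, K, r, P, steps=1000):
    """Integrate dx/dt = r x (1 - x/K) from x(0) = P using classical RK4."""
    def f(x):
        return r * x * (1.0 - x / K)

    h = t_end / steps
    x = P
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


# Hypothetical parameter values, for illustration only.
K, r, P = 0.15, 0.3, 1e-4
for t in (0.0, 10.0, 20.0, 40.0):
    print(t, logistic(t, K, r, P), logistic_rk4(t, K, r, P))
```

The two columns printed for each time should agree closely, the curve starts at P, and it levels off at the carrying capacity K for large t.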