Applications and the Grid

The European DataGrid Project Team
http://www.eu-datagrid.org
DataGrid is a project funded by the European Union
Grid Tutorial - 4/28/2003

Overview

• An applications view of the Grid: Use Cases
• High Energy Physics
    • Why we need to use GRIDs in HEP?
    • Brief mention of the MONARC model and its evolutions towards LCG
    • HEP data analysis and management, and HEP requirements
    • Processing patterns
    • Testbed 1 and 2 validation: what has already been done on the testbed?
    • Current GRID-based distributed computing model of the HEP experiments
• Earth Observation
    • Mission and plans
    • What do typical Earth Obs. applications do?
• Biology
    • dgBLAST

What all applications want from the Grid

• A homogeneous way of looking at a 'virtual computing lab' made up of heterogeneous resources as part of a VO (Virtual Organisation) which manages the allocation of resources to authenticated and authorised users
• A uniform way of 'logging on' to the Grid
• Basic functions for job submission, data management and monitoring
• Ability to obtain resources (services) satisfying user requirements for data, CPU, software, turnaround…

Common Applications Issues

• Applications are the end-users of the GRID: they are the ones finally making the difference
• All applications started modeling their usage of the GRID through USE CASES: a standard technique for gathering requirements in software development methodologies
• Use Cases are narrative documents that describe the sequence of events of an actor using a system [...] to complete processes
• What Use Cases are NOT:
    • the description of an architecture
    • the representation of an implementation

Why Use Cases?

[Figure: layered diagram. Applications domain at the top: ALICE, ATLAS, CMS, LHCb, other HEP, other apps. Below it: HEP Common Application Layer, then the domains interface, then DataGRID middleware (PPDG, GriPhyN, EU-DataGRID), then a bag of services (GLOBUS, Condor-G, …), then OS & Net services. The lower layers form the Computer Scientist domain.]

Use Cases (HEPCAL)

http://lhcgrid.web.cern.ch/LHCgrid/SC2/RTAG4/finalreport.doc

• Obtain Grid authorisation
• Ask for revocation of Grid authorisation
• Grid login
• Browse Grid resources
• DS metadata update
• DS metadata access
• Dataset registration
• Virtual dataset declaration
• Virtual dataset materialization
• Dataset upload
• User-defined catalogue creation
• Data set access
• Dataset transfer to non-Grid storage
• Dataset replica upload to the Grid
• Data set access cost evaluation
• Data set replication
• Physical data set instance deletion
• Data set deletion (complete)
• User defined catalogue deletion (complete)
• Data retrieval from remote Datasets
• Data set verification
• Data set browsing
• Browse condition database
• Job catalogue update
• Job catalogue query
• Job submission
• Job Output Access or Retrieval
• Error Recovery for Failing Production Jobs
• Job Control
• Steer job submission
• Job resource estimation
• Job environment modification
• Job splitting
• Production job
• Analysis 1
• Data set transformation
• Job monitoring
• Simulation Job
• Experiment software development for the Grid
• VO wide resource reservation
• VO wide resource allocation to users
• Condition publishing
• Software publishing

High Energy Physics applications

The LHC challenge

• HEP is carried out by a community of more than 10,000 users spread all over the world
• The Large Hadron Collider at CERN is the most challenging goal for the whole HEP community in the next years
• Test the standard model and the models beyond it (SUSY, GUTs) at an energy scale (7+7 TeV p-p) corresponding to the very first instants of the universe after the big bang (<10^-13 s), allowing the study of quark-gluon plasma
• LHC experiments will produce an unprecedented amount of data to be acquired, stored, analysed:
    • 10^10 collision events / year (+ same from simulation)
    • This corresponds to 3-4 PB data / year / experiment (ALICE, ATLAS, CMS, LHCb)
    • Data rate (input to data storage center): up to 1.2 GB/s per experiment
    • Collision event records are large: up to 25 MB (real data) and 2 GB (simulation)

The LHC detectors

[Figure: the CMS, ATLAS and LHCb detectors.]

• ~6-8 PetaBytes / year
• ~10^10 events / year
• ~10^3 batch and interactive users

Online system

• Multi-level trigger
• Filter out background
• Reduce data volume

    Detector output               40 MHz   (40 TB/sec)
    Level 1: special hardware     75 kHz   (75 GB/sec)
    Level 2: embedded processors   5 kHz   (5 GB/sec)
    Level 3: PCs                 100 Hz    (100 MB/sec)
    Data recording & offline analysis

Data Handling and Computation for Physics Analysis

[Figure: data flow diagram. The detector feeds the event filter (selection & reconstruction), which produces raw data. Reconstruction turns raw data into event summary data; event reprocessing and event simulation feed the same chain. Batch physics analysis turns processed data into analysis objects (extracted by physics topic), which are used for interactive physics analysis. Credit: les.robertson@cern.ch]

Deploying the LHC Global Grid Service

[Figure: tiered deployment of the LHC computing service. CERN Tier 0 (the LHC Computing Centre) at the centre; Tier 1 centres (UK, USA, France, Italy, Japan, Germany, …); Tier 2 centres (labs and universities); Tier 3 physics department resources; desktops. Subsets of sites form a "grid for a regional group" and a "grid for a physics study group". Credit: les.robertson@cern.ch]

HEP Data Analysis and Datasets

• Raw data (RAW), ~1 MByte: hits, pulse heights
• Reconstructed data (ESD), ~100 kByte: tracks, clusters…
• Analysis Objects (AOD), ~10 kByte: physics objects, summarized, organized by physics topic
• Reduced AODs (TAGs), ~1 kByte: histograms, statistical data on collections of events

[Figure: ATLAS Barrel Inner Detector, H→bb event.]

HEP Data Analysis: processing patterns

• Processing is fundamentally independent (embarrassingly parallel) due to the independent nature of 'events'
    • So we have the concepts of splitting and merging
    • Processing is organised into 'jobs' which process N events (e.g. a simulation job organised in groups of ~500 events, which takes ~1 day to complete on one node)
    • A processing for 10**6 events would then involve 2,000 jobs, merging into a total set of 2 Tbyte
• Production processing is planned by experiment and physics group data managers (this will vary from experiment to experiment)
    • Reconstruction processing (1-3 times a year of 10**9 events)
    • Physics group processing (? 1/month). Produce ~10**7 AOD+TAG
    • This may be distributed in several centres

Processing Patterns (2)

• Individual physics analysis is by definition 'chaotic' (according to the work patterns of individuals)
    • Hundreds of physicists distributed in an experiment may each want to access the central AOD+TAG and run their own selections
    • They will need very selective access to ESD+RAW data (for tuning algorithms, checking occasional events)
• Will need replication of AOD+TAG in the experiment, and selective replication of RAW+ESD
    • This will be a function of processing and physics group organisation in the experiment

Alice: AliEn-EDG integration

[Figure: AliEn-EDG interface. An AliEn CE and SE connect through a server and an EDG UI to the EDG RB, EDG CEs and EDG SEs. Labels: JDL translation, certificates, installation, Alice SE on EDG nodes, Alice Data Catalogue, data access by EDG Data Catalogue. Credit: Cerello, Barbera, Buncic, Saiz, et al.]

What have HEP experiments already done on the EDG testbeds 1.0 and 2.0

• The EDG User Community has actively contributed to the validation of the first and second EDG testbeds (Feb 2002 to Feb 2003)
• All four LHC experiments have run their software (at first in some preliminary version) to perform the basic operations supported by the testbed 1 features provided by the EDG middleware
• Validation included job submission (JDL), output retrieval, job status query, basic data management operations (file replication, registration into replica catalogs), and checks for possible s/w dependency or incompatibility problems (e.g. missing libs, rpms)
• ATLAS, CMS and Alice have run intense production data challenges and stress tests during 2002 and 2003

The CMS Stress Test

• CMS Monte Carlo production using BOSS and Impala tools
    • Originally designed for submitting and monitoring jobs on a 'local' farm (e.g. PBS)
    • Modified to treat the Grid as a 'local farm'
• December 2002 to January 2003
• 250,000 events generated by job submission at 4 separate UIs
• 2,147 event files produced
• 500 Gb data transferred using automated grid tools during production, including transfer to and from mass storage systems at CERN and Lyon
• Efficiency of 83% for (small) CMKIN jobs, 70% for (large) CMSIM jobs

The CMS Stress Test: sites

    Site                  CE           Number of CPUs   SE            Disk Space (GB)
    CERN                  lxshare0227  122              lxshare0393   100
                                                        lxshare0384   1000 (=100*10)*
    CNAF                  testbed008   40+              grid007g      1000*
    RAL                   gppce05      16               gppse05       360
    NIKHEF                tbn09        22               tbn03         35
    LYON                  ccgridli03   120              ccgridli08    400
                                                        ccgridli07    200
    Legnaro               cmsgrid001   50               cmsgrid002    513 (+513)
    Padova                grid001      12               grid005       680
    Ecole Polytechnique   polgrid1     4                polgrid2      220
    Imperial College      gw39         16               fb00          450
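
As a quick cross-check of the site table above, the short Python sketch below adds up the CPUs and the SE disk space that the CMS stress test could draw on. The numbers are the ones read from the slide; treating CNAF's "40+" as exactly 40 and ignoring the parenthetical and starred notes on disk sizes are simplifying assumptions, so the totals are indicative lower bounds rather than official figures.

```python
# Site table as read from the slide. Assumptions (not from the slide):
# CNAF's "40+" CPUs is counted as 40, and notes such as "(=100*10)*" or
# "(+513)" on disk sizes are ignored.
SITES = {
    "CERN": {"cpus": 122, "disk_gb": [100, 1000]},
    "CNAF": {"cpus": 40, "disk_gb": [1000]},
    "RAL": {"cpus": 16, "disk_gb": [360]},
    "NIKHEF": {"cpus": 22, "disk_gb": [35]},
    "LYON": {"cpus": 120, "disk_gb": [400, 200]},
    "Legnaro": {"cpus": 50, "disk_gb": [513]},
    "Padova": {"cpus": 12, "disk_gb": [680]},
    "Ecole Polytechnique": {"cpus": 4, "disk_gb": [220]},
    "Imperial College": {"cpus": 16, "disk_gb": [450]},
}


def totals(sites):
    """Return (total CPUs, total SE disk in GB) over all sites."""
    total_cpus = sum(site["cpus"] for site in sites.values())
    total_disk = sum(sum(site["disk_gb"]) for site in sites.values())
    return total_cpus, total_disk


if __name__ == "__main__":
    cpus, disk_gb = totals(SITES)
    print(f"{len(SITES)} sites, at least {cpus} CPUs, about {disk_gb} GB of SE disk")
```

Under these assumptions the nine sites together offer roughly 400 CPUs and about 5 TB of storage element disk.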
CMS Stress Test: Architecture of the system

[Figure: architecture diagram. On the user side: CMS RefDB (parameters), BOSS DB, and a UI running IMPALA/BOSS, which sends JDL to the EDG Workload Management System. On the grid side: several sites, each with an SE and a CE running CMS software on worker nodes (WN), plus the Replica Manager. Arrows show job output filtering, runtime monitoring, data registration, and push/pull of data or info.]

Main results and observations from CMS work

RESULTS
• Could distribute and run CMS s/w in the EDG environment
• Generated ~250K events for physics with ~10,000 jobs in a 3 week period

OBSERVATIONS
• Were able to quickly add new sites to provide extra resources
• Fast turnaround in bug fixing and installing new software
• Test was labour intensive (since the software was developing and the overall system was initially fragile)
• New release EDG 2.0 should fix the major problems, providing a system suitable for full integration in distributed production

Earth Observation applications (WP9)

• Global Ozone (GOME) Satellite Data Processing and Validation by KNMI, IPSL and ESA
• The DataGrid testbed provides a collaborative processing environment for 3 geographically distributed EO sites (Holland, France, Italy)

Earth Observation

• ESA missions:
    • about 100 Gbytes of data per day (ERS 1/2)
    • 500 Gbytes for the next ENVISAT mission (2002)
• DataGrid contribution to EO:
    • enhance the ability to access high level products
    • allow reprocessing of large historical archives
    • improve Earth science complex applications (data fusion, data mining, modelling …)

Source: L. Fusco, June 2001
Federico Carminati, EU review presentation, 1 March 2002

ENVISAT

• 3500 MEuro programme cost
• Launched on 28 February 2002
• 10 instruments on board
• 200 Mbps data rate to ground
• 400 Tbytes data archived / year
• ~100 "standard" products
• 10+ dedicated facilities in Europe
• ~700 approved science user projects

Earth Observation

• Two different GOME processing techniques will be investigated:
    • OPERA (Holland): tightly coupled, using MPI
    • NNO (Italy): loosely coupled, using Neural Networks
• The results are checked by VALIDATION (France). Satellite observations are compared against ground-based LIDAR measurements coincident in area and time.
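
To make "coincident in area and time" concrete, here is a minimal Python sketch of how satellite observations could be paired with ground-based LIDAR measurements. It is an illustration only: the record layout, the identifiers, and the 500 km and 12 hour thresholds are assumptions chosen for the example, not values taken from the GOME validation work.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def coincident_pairs(satellite, lidar, max_km=500.0, max_dt=timedelta(hours=12)):
    """Pair each satellite record with every LIDAR record close in space and time.

    The thresholds are hypothetical defaults, not the ones used by the project.
    """
    pairs = []
    for sat in satellite:
        for gnd in lidar:
            close = great_circle_km(sat["lat"], sat["lon"], gnd["lat"], gnd["lon"]) <= max_km
            near = abs(sat["time"] - gnd["time"]) <= max_dt
            if close and near:
                pairs.append((sat["id"], gnd["id"]))
    return pairs


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    satellite = [{"id": "gome-0001", "lat": 44.0, "lon": 5.7,
                  "time": datetime(2002, 10, 1, 10, 0)}]
    lidar = [
        {"id": "lidar-a", "lat": 43.9, "lon": 5.7, "time": datetime(2002, 10, 1, 20, 0)},
        {"id": "lidar-b", "lat": -21.0, "lon": 55.5, "time": datetime(2002, 10, 1, 11, 0)},
        {"id": "lidar-c", "lat": 43.9, "lon": 5.7, "time": datetime(2002, 10, 4, 10, 0)},
    ]
    print(coincident_pairs(satellite, lidar))
```

Each satellite profile is independent of the others, so this kind of matching splits naturally into separate grid jobs.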
GOME OZONE Data Processing Model

• Level-1 data (raw satellite measurements) are analysed to retrieve actual physical quantities: Level-2 data
• Level-2 data provides measurements of OZONE within a vertical column of atmosphere at a given lat/lon location above the Earth's surface
• Coincident data consists of Level-2 data co-registered with LIDAR data (ground-based observations) and compared using statistical methods

The EO Data challenge: Processing and validation of 1 year of GOME data

[Figure: processing chain. Raw satellite data from the GOME instrument (Level 1) is processed by ESA and KNMI into ozone profiles (Level 2) with OPERA and NNO. IPSL validates the GOME ozone profiles against ground-based LIDAR measurements. Results are visualized. All steps run on DataGrid.]

GOME Processing Steps (1-2)

Step 1: Transfer Level1 data to the Grid Storage Element
Step 2: Register Level1 data with the ReplicaManager; replicate to other SEs if necessary

[Figure: the user, through the User Interface, places input data on the SE of one site; the Replica Manager records it in the Replica Catalog and replicates it to the SEs of other sites (sites B to H, each with a CE and an SE).]

GOME Processing Steps (3-4)

Step 3: Submit jobs to process Level1 data, produce Level2 data
Step 4: Transfer Level2 data products to the Storage Element

[Figure: the user, holding a certificate from the Certificate Authorities, submits a job (JDL script and executable) through the User Interface to the Resource Broker. The broker searches the Information Index and the Replica Catalog (which maps each LFN to its PFNs), then dispatches the job to the CE of a suitable site. The user can request job status and retrieve the result. Legend: LFN = logical file name, PFN = physical file name.]

GOME Processing Steps (5-6)

Step 5: Produce Level-2 / LIDAR coincident data, perform VALIDATION
Step 6: Visualize results

[Figure: Level 2 data and LIDAR data are combined into COINCIDENT DATA, which feeds the validation and visualization.]

Biomedical Applications

• Genomics, post-genomics, and proteomics
    • Explore strategies that facilitate the sharing of genomic databases and test grid-aware algorithms for comparative genomics
• Medical images analysis
    • Process the huge amount of data produced by digital imagers in hospitals

Federico Carminati, EU review presentation, 1 March 2002
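
Returning to the GOME processing steps (1-4), the roles of registration, replication and LFN-to-PFN lookup can be illustrated with a toy in-memory catalogue in Python. This is a sketch of the idea only: the class, its method names and the example file names are hypothetical and do not reproduce the EDG Replica Manager or Replica Catalog interfaces.

```python
class ReplicaCatalog:
    """Toy catalogue mapping a logical file name (LFN) to physical file names (PFNs).

    Hypothetical illustration; not the EDG Replica Catalog API.
    """

    def __init__(self):
        self._replicas = {}  # LFN -> list of PFNs, in registration order

    def register(self, lfn, pfn):
        """Step 2: record that a physical copy of the logical file exists."""
        pfns = self._replicas.setdefault(lfn, [])
        if pfn not in pfns:
            pfns.append(pfn)

    def replicate(self, lfn, target_se):
        """Step 2 (optional): pretend to copy the file to another SE and register it."""
        if lfn not in self._replicas:
            raise KeyError(f"unknown logical file name: {lfn}")
        source_pfn = self._replicas[lfn][0]
        filename = source_pfn.rsplit("/", 1)[-1]
        new_pfn = f"{target_se}/{filename}"
        self.register(lfn, new_pfn)
        return new_pfn

    def lookup(self, lfn):
        """Steps 3-4: what a broker needs, namely all known copies of a logical file."""
        return list(self._replicas.get(lfn, []))


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    lfn = "lfn:gome/level1/orbit_0001.lv1"  # hypothetical names throughout
    catalog.register(lfn, "se.site-b.example/gome/orbit_0001.lv1")
    catalog.replicate(lfn, "se.site-f.example/gome")
    print(catalog.lookup(lfn))
```

A job description only needs to name the LFN; the broker can then pick a site whose SE already holds one of the PFNs.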
Biology and Bio-informatics applications

• The international community of Biologists has a keen interest in using bio-informatic algorithms to perform research on the mapping of the human genomic code
• Biologists make use of large, geographically distributed databases with already mapped, identified sequences of proteins belonging to sequences of human genetic code (DNA sequences)
• The typical goal of these algorithms is to analyse different databases, related to different samplings, to identify similarities or common parts
• dgBLAST (Basic Local Alignment Search Tool) is an example of such an application, seeking particular sequences of proteins or DNA in the genomic code

Grid technology for health

• Grid technology opens the perspective of large computational power and easy access to heterogeneous data sources
• A grid for health would provide a framework for sharing disk and computing resources, for promoting standards, and for fostering synergy between bio-informatics and medical informatics
• A first biomedical grid is being deployed by the DataGrid IST project
  http://dbs.cordis.lu/fep-cgi/srchidadb?ACTION=D&SESSION=221592002-10-18&DOC=27&TBL=EN_PROJ&RCN=EP_RCN_A:63345&CALLER=PROJ_IST

Biomedical requirements

• Large user community (thousands of users)
    • anonymous / group login
• Data management
    • data updates and data versioning
    • large volume management (a hospital can accumulate TBs of images in a year)
• Security
    • disk / network encryption
• Limited response time
    • fast queues
• High priority jobs
    • privileged users
• Interactivity
    • communication between user interface and computation
• Parallelization
    • MPI site-wide / grid-wide
    • thousands of images operated on by 10's of algorithms
• Pipeline processing
    • pipeline description language / scheduling

Diverse Users…

• Patient
    • has free access to own medical data
• Physician
    • has complete read access to patients' data
    • few persons have read/write access
• Researchers
    • may obtain read access to anonymous medical data for research purposes
    • nominative data should be blanked before transmission to these users
• Biologist
    • has free access to public databases
    • uses a web portal to access biology server services
• Chemical / Pharmacological manufacturer
    • owns private data
    • needs to control the possible targets for data storage
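
The access rules for medical data listed above can be sketched as a small role-based filter in Python. This is a minimal illustration under stated assumptions: the record fields, the role names as strings, the choice of which fields count as nominative, and the restriction of a physician to their own patients are all hypothetical. Write access and the biologist and manufacturer roles are left out.

```python
# Fields treated as nominative (identifying) in this sketch; an assumption,
# since the slides do not enumerate them.
NOMINATIVE_FIELDS = {"patient_id", "name", "birth_date"}


def readable_view(role, user_id, record):
    """Return the part of a medical record that a user may read, or None.

    patient    -> own record only, unmodified
    physician  -> records of their own patients, unmodified
    researcher -> any record, with nominative fields blanked
    """
    if role == "patient":
        return dict(record) if record["patient_id"] == user_id else None
    if role == "physician":
        return dict(record) if user_id in record["physician_ids"] else None
    if role == "researcher":
        return {key: (None if key in NOMINATIVE_FIELDS else value)
                for key, value in record.items()}
    return None  # any other role gets no access to medical records


if __name__ == "__main__":
    record = {  # hypothetical example record
        "patient_id": "p1",
        "physician_ids": {"d1"},
        "name": "Jane Doe",
        "birth_date": "1970-01-01",
        "image": "mri-0001",
    }
    print(readable_view("researcher", "r7", record))
```

Blanking happens before the record leaves the filter, which mirrors the requirement that nominative data be removed before transmission to researchers.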
…and data

• Biological data
    • Public and private databases
    • Very fast growth (doubles every 8-12 months)
    • Frequent updates (versioning)
    • Heterogeneous formats
• Medical data
    • Strong semantic
    • Distributed over imaging sites
    • Images and metadata

Web portals for biologists

• Biologist enters sequences through a web interface
• Pipelined execution of bio-informatics algorithms
    • Genomics comparative analysis (thousands of files of ~Gbyte)
    • Genome comparison takes days of CPU (~n**2)
    • Phylogenetics
    • 2D, 3D molecular structure of proteins…
• The algorithms are currently executed on a local cluster
    • Big labs have big clusters…
    • But growing pressure on resources: the Grid will help
    • More and more biologists compare larger and larger sequences (whole genomes) to more and more genomes, with fancier and fancier algorithms!

Example GRID application for Biology: dgBLAST

• dgBLAST requires as input a given sequence (protein or DNA) to be searched and a pointer to the set of databases to be queried
• Designed for high speed (trade-off vs sensitivity)
• A score is assigned to every candidate sequence found and the results are graphically presented
• Uses a heuristic algorithm
• Detects relationships among sequences which share only isolated regions of similarity
• Blastn: compares a nucleotide query sequence against a nucleotide sequence database
• Blastp: compares an amino acid query sequence against a protein sequence database

The Visual DataGrid Blast, a first genomics application on DataGrid

• A graphical interface to enter query sequences and select the reference database
• A script to execute the BLAST algorithm on the grid
• A graphical interface to analyze the result
• Accessible from the web portal genius.ct.infn.it

Other Medical Applications

• Complex modelling of anatomical structures
    • Anatomical and functional models, parallelization
• Surgery simulation
    • Realistic models, real-time constraints
• Simulation of MRIs
    • MRI modelling, artifacts modeling, parallel simulation
• Mammographies analysis
    • Automatic pathologies detection
• Shared and distributed data management
    • Data hierarchy, dynamic indices, optimization, caching

Summary (1/2)

• HEP, EO and Biology users have a deep interest in the deployment and the actual availability of the GRID, boosting their computer power and data storage capacities in an unprecedented way
• Currently evaluating the basic functionality of the tools and their integration into data processing schemes. Will move onto areas of interactive analysis, and more detailed interfacing via APIs
• Hopefully experiments will do common work on interfacing applications to the GRID under the umbrella of LCG
• HEPCAL (Common Use Cases for a HEP Common Application Layer) work will be used as a basis for the integration of Grid tools into the LHC prototype
  http://lcg.web.cern.ch/LCG/SC2/RTAG4
• There are many grid projects in the world and we must work together with them
    • e.g. in HEP we have DataGrid, Crossgrid, Nordugrid + US projects (GriPhyN, PPDG, iVDGL)

Summary (2/2)

• Many challenging issues are facing us:
    • strengthen effective massive productions on the EDG testbed
    • keep up the pace with next generation grid computing developments and implement new solutions, implementing them in or interfacing them to EDG
    • further develop middleware components for all EDG workpackages to address growing user demands. EDG 2.0 will implement much new functionality.

Acknowledgements and references

Thanks to the following who provided material and advice:
J Linford (WP9), V Breton (WP10), J Montagnat (WP10), F Carminati (Alice), JJ Blaising (Atlas), C Grandi (CMS), M Frank (LHCb), L Robertson (LCG), D Duellmann (LCG/POOL), T Doyle (UK GridPP), O Maroney, F Harris (WP8), I Augustin (WP8), N Brook (LHCb), P Hobson (CMS)

Some interesting WEB sites and documents:

• LHC Computing Review
    • http://lhc-computing-review-public.web.cern.ch/lhc-computing-review-public/Public/Report_final.PDF
• LCG
    • http://lcg.web.cern.ch/LCG
    • http://lcg.web.cern.ch/LCG/SC2/RTAG6 (model for regional centres)
    • http://lcg.web.cern.ch/LCG/SC2/RTAG4 (HEPCAL Grid use cases)
• GEANT
    • http://www.dante.net/geant/ (European Research Networks)
• POOL
    • http://lcgapp.cern.ch/project/persist/
• WP8
    • http://datagrid-wp8.web.cern.ch/DataGrid-WP8/
    • http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=332409 (Requirements)
• WP9
    • http://styx.srin.esa.it/grid
    • http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=332411 (Reqts)
• WP10
    • http://marianne.in2p3.fr/datagrid/wp10/
    • http://www.healthgrid.org
    • http://www.creatis.insa-lyon.fr/MEDIGRID/
    • http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=332412 (Reqts)