Information Technology in Plant Protection
Presentation: GIS Tools for Plant Protection
• Prepared by:
  – Dr. János Busznyák
TÁMOP-4.1.2.A/2-10/1-2010-0012

Digital Mapping Tools for Plant Protection
• Methods of obtaining spatial data
  – Manual
  – Geodesy
  – Global positioning
  – Photogrammetry
  – Remote sensing
  – Manual map digitization
  – Scanning maps
  – From digital files

Digital Map
• More than the contents of a map in a digital form ready to be used with a computer.
• No need for segmentation; the elements are of real size, fit accurately, have topology, and are often organized into layers and objects.
• Primary data acquisition methods
  – Measurements (GPS)
  – Existing reports
  Primary methods mostly yield vector data.
• Secondary sources
  – Digitization, followed by automatic or manual vectorization.
  If a secondary method includes georeferencing and vectorization, the result is also a vector map. If secondary data collection (scanning) is not followed by vectorization, the result is a digital raster map.

Raster-Vector Transformation
• Aim
  – New level of GIS analysis (vector)
  – New publication possibilities
  – Lower storage and transfer capacity needs
• Preparatory steps
  – Digitization of map sheets
  – Georeferencing, eliminating distortions, projection conversion (a lot of work)
• Pre-processing
• Vectorization
  – Of areas
  – Of line-like objects
  – Of objects
• Post-processing

Vectorization II.
• Vectorization
  – Manual
  – Semi-automatic
  – Automatic

Application of the Automatic Method
• Automatic vectorization of a soil map
  – Single bit
  – Low data density
• Automatic vectorization of a topographic map
  – 8-bit
  – High data density
Figure legend: black = converted to line; blue = segmented pixels

Data Input from Text File
• Coordinates of the shapefile vertex points
  site,lat,long,name,HOTLINK
  1,38.889,-77.035,Washington Monument,http://www.nps.gov/wamo
  2,38.889,-77.050,Lincoln Memorial,c:/ESRI/AEJEE/DATA/WASHDC/linc.jpg
  3,38.898,-77.036,White House,c:/ESRI/AEJEE/DATA/WASHDC/whse.txt
  4,38.889,-77.009,Capitol,c:/ESRI/AEJEE/DATA/WASHDC/cap.pdf
Source: ESRI ArcExplorer JEE tutorial

Hybrid Data Model, Mashup Map
• Hybrid systems allow raster and vector data to be used together.
  – Vector, raster and attribute data are stored separately, each in the way most suitable for its model.
  – Each operation is carried out in the model that is most suitable for it.
  – The systems apply a wide variety of vector-raster transformations before and after the operations.
  – The Google Maps service is based on a hybrid data model.

Data Quality
• Factors that most influence data quality:
  – Origin of data
  – Geometric accuracy
  – Accuracy of attribute data
  – Consistency of attribute data
  – Topological consistency
  – Completeness and validity of data

Georeferencing
• Georeferencing is the process of scaling, rotating, translating and deskewing the image to match a particular size and position. The word was originally used to describe the process of referencing a map image to a geographic location.
  Source: http://wintopo.com/help/html/georef.htm
• Usual ways:
  – World file
  – Header (GeoTIFF, GeoJP2…)

Header
• Certain image formats include georeferencing information in the header of the image file:
  – img
  – bsq
  – bil
  – bip
  – EXIF
  – ITT
  – GeoTIFF
  – grid

World File
• Georeferencing information is stored in a separate world file:
  – The world file contains the 6 parameters of an affine transformation that links the image coordinate system to the world coordinate system.
  – The images are stored as raster data, where each cell of the image is identified by a row and column number.
  – The world file must have the same base name as the image file and be in the same folder.

Georeferencing with the Help of 2 Reference Points

Graphic Georeferencing: Rubber Sheeting

Projection Systems, Conversion
• Projection, datum
• Geoid, geoid undulation
• Uniform National Projection (EOV)
• Transformation
• Base points, base point systems

Classification of Projections
• Based on the shape of the image surface
  – Cylindrical projection
  – Conic projection
  – Planar (azimuthal) projection
  – Other projection
• Based on the axis of the image surface
  – Polar (normal)
  – Transverse (equatorial)
  – Oblique
• Based on the contact between the image surface and the base surface
  – Tangent
  – Secant

Important Projection Systems
• Systems without projection
• Hungarian double-projection systems
• Stereographic projection systems (Budapest, Marosvásárhely)
• Oblique Mercator projection
• HÉR, HKR, HDR
• EOV
• Gauss-Krüger
• UTM (Universal Transverse Mercator)
• GEOREF (World Geographic Reference System)

Important Ellipsoids
• Reference ellipsoids approximate a region of the Earth's surface
• The centre of the ellipsoid is that of
  the Earth
• The axis of rotation is that of the Earth
  – Parameters
    • Semi-major axis (equatorial radius)
    • Flattening (relation between the equatorial and polar radius)
• If the centre of the ellipsoid is moved until it fits the examined area with the least error, we get the geodetic datum.
  – Bessel (stereographic)
  – Krasovsky (Gauss-Krüger)
  – Hayford (UTM)
  – WGS-84 (GPS)
  – IUGG-67 (EOV)

Some Interesting Projections
• Geographic projection – WGS 1984 datum
• Orthographic projection – SPHERE datum
• Eckert IV projection – WGS 1984 datum

Geoid Undulation
• A GPS measurement gives the height above the ellipsoid (h).
• When calculating the height above sea level (H), geoid undulation has to be taken into account.
• Geoid undulation is the separation between the equipotential surface that represents mean ocean level and a reference ellipsoid (h = H + N, where N is the geoid undulation at the point).
Geoid: the surface of the oceans and seas as if they were connected by small canals under the land (Listing, 1873).

Uniform National Projection
• The origin of the coordinates has been shifted 200 km to the south and 650 km to the west. Thus the X coordinates are always lower than 400 km and the Y coordinates are always higher than 400 km, which makes them easy to distinguish.

Uniform National Elevation Network (EOMA)
• The first levelling of Hungary was carried out from 1873 to 1913, based on the Adriatic (Mediterranean) base level.
  – Height of the Nadap main base point: 173.8385 m.
• Baltic base level after World War II.
  – Height of the Nadap main base point: 173.1638 m, which is 0.6747 m lower.
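
The relation h = H + N from the geoid undulation slide and the two Nadap heights quoted above can be sketched in a few lines of Python. This is a minimal illustration, not an official transformation: the function names are hypothetical, the sample undulation value is made up, and applying the Nadap difference as one constant offset everywhere is a simplifying assumption.

```python
# Sketch only: names and sample values are illustrative assumptions.

def height_above_sea_level(h_ellipsoidal_m: float, n_undulation_m: float) -> float:
    """Return H from h = H + N (H = h - N)."""
    return h_ellipsoidal_m - n_undulation_m


# Difference between the two heights of the Nadap main base point quoted on the slide.
NADAP_OLD_M = 173.8385     # older base level (1873-1913 levelling)
NADAP_BALTIC_M = 173.1638  # Baltic base level
OLD_TO_BALTIC_OFFSET_M = NADAP_OLD_M - NADAP_BALTIC_M  # about 0.6747 m


def old_to_baltic(height_old_m: float) -> float:
    """Shift a height from the older base level to the Baltic one.

    Assumption: the Nadap difference is applied as a constant offset.
    """
    return height_old_m - OLD_TO_BALTIC_OFFSET_M


if __name__ == "__main__":
    # Hypothetical GPS reading: h = 150.0 m with an assumed undulation N = 44.0 m.
    print(height_above_sea_level(150.0, 44.0))
    print(round(OLD_TO_BALTIC_OFFSET_M, 4))
```

In practice N varies from point to point, which is why Task I of Chapter 1 asks for the undulation at a specific location.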

Transformation
• ETRS89 (OGPSH) points are transformed into the Uniform National Projection (EOV) system and back
• The points for the transformation are chosen automatically
• Local transformation based on the common points of the OGPSH and EOV systems
• With 8 common points in Hungary
• With refined geoid undulation data
ETRS89-EOV Hivatalos Helyi Térbeli Transzformáció (official local spatial transformation between ETRS89 and EOV)

Base Points
• Database of vertical base points
• Database of horizontal base points
• Database of OGPS base points
• Points of the National GPS Network (Országos GPS Hálózat, OGPSH)

Videos and Animations for Chapter 1
• Video
  – Georeferencing (graphical)
• Animation
  – Georeferencing
  – Geoid undulation
  – Shape (create)

Tasks for Chapter 1
Question I: Identify the value of the geoid undulation at the Parliament Building, Budapest, Hungary with the help of the EHT (or any other) software.
Question II: Digitize any map sheet with a scanner. Georeference it with 3 reference points using the GEOREGARCVIEW software. The necessary coordinates can be obtained from map servers (e.g. Google Maps).
Question III: Digitize another map sheet that overlaps the previous one with a scanner. Georeference it with 3 reference points using the GEOREGARCVIEW software. Open it together with the georeferenced file of the previous task in ArcExplorer JEE (or any other software) and check its accuracy. The necessary coordinates can be obtained from map servers (e.g. Google Maps).

GNSS Device System
• Global positioning
  – The coordinates of 3 satellites at a given time are needed.
  – If time can be measured accurately, the signal propagation speed and the travel time give the distance from the satellite.
  – With 1 satellite, the possible positions form the surface of a sphere.

Global Positioning II.
• If there is a connection with 2 satellites, the receiver lies on the spheres of both satellites. The intersection of the two spheres is a circle.
• The intersection of the third satellite's sphere with this circle is two points, one of which can always be excluded (e.g. points far from the Earth's surface).

Differential Correction

Network RTK in Hungary (2010)
• GNSSNet
• NtripCaster IP address, port: 84.206.45.44:2101

Multi-Base System in Hungary (2010)
• Geotrade GNSS
  – Host: www.geotradegnss.hu
  – Port: 2101

Single-Base System (2010)
• Georgikon RTK coverage
• DGPS for the whole country of Hungary
  – http://gnss.georgikon.hu
  – 193.224.81.88:2101

Trimble European VRS System

Mobile Internet
• CSD (Circuit Switched Data)
  – Circuit-switched mobile internet, 9.6 kbit/s, 1G
• GPRS (General Packet Radio Service)
  – Packet-switched, 115 kbit/s, 2G
• EDGE (Enhanced Data Rates for GSM Evolution)
  – Enhancement of GPRS, 236 kbit/s (112-400), 2.5G
• 3G
  – 3G mobile network, video call, 384 kbit/s, 3G
• HSPA (High-Speed Downlink/Uplink Packet Access)
  – HSDPA theoretical data transfer speed, depending on device and coverage: up to 21 Mbit/s, 3.5G
• 4G LTE (Long Term Evolution)
  – 1 Gbit/s, 4G

Videos and Animations for Chapter 2
• Video
  – Trimble VRS system
• Animation
  – GNSSNet service
  – Geotrade GNSS
  – Georgikon GNSS Base

Tasks for Chapter 2
Question I: Find the data of the accessible satellites of the Galileo and BeiDou systems at a given time.
Question II: Find the ground control stations of the Navstar GPS system at a given time.
Question III: Find the worst measurement site on the Earth's surface with regard to the state of the ionosphere at a current time.
Use the 'space weather forecast' of Australia (or any other information source). http://www.ips.gov.au/Space_Weather

Field GNSS Measurement and Processing
• GNSS measurement
  – Planning (almanac)
  – Execution (with online correction, processing as well)
  – Data transfer (exchange formats, RINEX: Receiver Independent Exchange Format)
  – Processing (vectors, transformation, error correction)
  – Network adjustment (OGPSH: National GPS Network)

Aim of GNSS Measurement Planning
• Guarantee of integrity
  – GNSS
  – Way of correction
• Guarantee of the needed accuracy
  – Accuracy of the rover device
  – Way of correction
  – Satellite constellation
  – Minimization of other disturbing factors

Devices for Planning
• GNSS satellite data
  – Almanac
    • Trimble Planning
    • Leica Satellite Availability
    • Topcon Occupation Planning
• Receiving correction data
  – Mobile internet
    • GPRS coverage
    • Style, devices, realization

Almanac
• Timing
  – Forward in time
  – Back in time
• General
  – YUMA format, US Coast Guard Navigation Center
  – The connection between the date and the GPS week in the GPS calendar
• Trimble
• Leica
• Topcon

Trimble Planning

Channels of Correction Data
• Relative
  – Real time
  – Radio
  – Satellite
  – Internet
• Post-processed
  – Digital data transfer

Execution of Measurement
• Connection to satellites, controller
• Connection to correction service
• Setting the measurement style
• Starting the measurement
• Recording data

Preparation of Measurement
• Obtain, check and convert existing spatial data
• Set up a measurement plan
  – Need for accuracy
  – Available devices and services
  – Special features of the area
  – Select measurement method
•
  Places of measurement
• Conversion to the format of the field device
• Upload data to the field device

End of Measurement
• Check measurement data
• Inspection
• Delete, edit
• New recording
• Data export in the needed formats
• Turn off the field device

Processing Data
• Load data from the field device
  – Formats
  – Give the coordinate system and datum
  – Examine data loading errors (inspection)
  – Delete, edit
• Export to the processing format

GIS Processing and Analysis of Data
• Upload data to the GIS system
  – Conversions
  – Analyses
  – Interpolations
  – Model building
  – Simulation
  – Statistical analysis
  – Publication
• Online correction
  – Processing
• Offline correction
  – Time of measurement
  – Obtain correction data
  – Correction
  – Check

Checking Transformation
• EHT software
  – Data input
    • From file
    • Via keyboard
  – Set the format of data input
  – Set the direction of data conversion
  – Give coordinates

Typical Field Device System
• Data collector
  – Navigation accuracy
    • ArcPad / palmtop with GPS antenna
  – GPS accuracy
    • GPS Pathfinder Office / Trimble GeoXH
  – Geodetic accuracy
    • Trimble Survey Controller / Trimble 5800
• Data processing
  – GPS Analyst
  – GPS Pathfinder Office
  – Trimble Geomatics Office
  – ArcGIS

Description of Continuous Topographic GPS Survey Sample
• Aim of survey: automatic data collection for a 3D relief model
• Place of survey: the island of Kányavári, Hungary
• Time of survey: 21 December 2008,
  09:20-15:30
• Type of survey: RTK; format of message transfer: CMR+
• PDOP mask: 6; elevation cutoff: 10 degrees; antenna: Trimble 5800; antenna height (hant): 2 m
• Coordinate system: Hungary; zone: Hungarian EOV
• Project datum: HD72 (Hungary)
• Vertical datum: geoid model EGM96 (global)
• Coordinate units: meters; distance units: meters; height units: meters

  Name of point   DeltaX        DeltaY       DeltaZ        Slope distance   RMS
  25001           13189.539 m   1880.080 m   11396.001 m   17531.898 m      0.002 m

  Name of point   X            Y            H
  25001           142686.277   505893.164   109.042

Basic GPS Elements of Precision Farming
• Sample taking
• Yield mapping
• Sensors
• Autopilot system
• Mass flow or sprayer control
• Row control
• Seeder control

Precision Management System (IKR)
1. GPS survey of field blocks, soil sampling plan
2. Take soil samples according to the plan every 3-5 acres
3. Soil examination (extended and holistic)
4. Make nutrient content maps
5. Information, services for professional advice, analyses
6. Agrochemical service
7. Differentiated fertiliser plan
8. Differentiated nutrient output, plant number plan
9. Seeding with base station
10. Precision herbicide plan (based on Hu, KA, pH map and weed uptake)
11. Fertiliser quantity, upload into the professional advice system
12. Download data from the Internet

Precision Management System
• IKR

Evaluation of Tillage Experiments
• Spreadsheet

Evaluation of Tillage Experiments II.
• Spreadsheet
• GIS software (weed density)

3D Model

Videos and Animations for Chapter 3
• Video
  – GNSSNet OGPSH
• Animation

Tasks for Chapter 3 (questions I-III)
• Question: Create a forecast for tomorrow at 12:00 and 12:15, above a 10 degree elevation cutoff, for the area of the Helikon strand, Keszthely, Hungary (φ = 46 degrees 45 minutes, λ = 17 degrees 15 minutes, h = 150 m).
  GDOP =
  PDOP =
  HDOP =
  VDOP =
  TDOP =
  Number of GPS satellites =
  Number of Glonass satellites =
  Number of Galileo satellites =
  Number of Compass satellites =
• Question: In the IKR precision management system, which service(s) can use GNSS base correction data?
• Question: Is soil sampling in the IKR precision management system carried out with a yield map or a grid?

Remote Sensing Device System, 3D Modelling
• Remote sensing
  – With remote sensing, objects can be examined that are not in direct contact with the sensor.
  – In a narrow sense, the concept of remote sensing is usually used for aerial and space images. In a wider sense, it also covers e.g. remote measurements or medical applications.
  – Remote sensing is the acquisition of information about an object or phenomenon without making physical contact with the object. In modern usage, the term generally refers to the use of aerial sensor technologies to detect and classify objects on Earth (both on the surface, and in the atmosphere and oceans) by means of propagated signals (e.g. electromagnetic radiation emitted from aircraft or satellites).

Characteristics of Remote Sensing
• The measurement does not influence the examined object or change its state.
• It can be used at wavelengths outside the visible range. The result can be examined in the visible spectrum.
• Objective, exact data can be obtained.
• Spatial, multi-dimensional data can be obtained.
• Large amounts of data can be obtained from big areas in a short time.
• Areas that cannot be reached or examined with other methods can be examined.

Classification of Sensors
• Active sensors – sense the reflection of their own radiation
• Passive sensors – have no emission
• One or more wavelength ranges
• Images with more than one band are called (depending on the number of bands) multispectral or hyperspectral.

Information from Sensors
• Geometric – pixel: the area on the Earth's surface covered by one point of the image, its real extent.
• Spectral – the value of the radiation from the object
• Radiometric – characterises the colour depth of the pixels
• Temporal – the time interval between the images

Electromagnetic Spectrum
• Wavelength, frequency
  – Visible light (0.4-0.7 µm)
  – Infrared (above 0.7 µm)
  – Ultraviolet (below 0.4 µm)

Atmospheric Effects
• Scatter (multipath scattering)
• Absorption
  – Influencing factors
    • Distance travelled
    • Radiation energy
    • Composition of the atmosphere
    • Size of particles
    • Wavelength

Visible and Infrared Range
• Chlorophyll absorbs the energy of wavelengths between 0.45 and 0.67 µm, mostly blue and red, thus the colour of a healthy plant is green.
• In an unhealthy plant, a yellow colour alongside the green can be caused by increased red reflection due to the decrease of chlorophyll.
• Reflection in the range 0.7-1.3 µm increases dramatically and depends strongly on leaf structure (species-specific).
• Effect of stratification; water absorption bands above 1.3 µm.
• Above 1.3 µm, reflection is inversely proportional to the total water content of the leaf.

Visible and Infrared Range II.
• The reflection curves of plant species are identifiable.
• Image correction (atmospheric distortion)
• Sample points
• Spectrum

Spectral Bands and Resolution of Landsat TM
  Band   Wavelength (µm)   Range              Resolution
  TM 1   0.45-0.52         blue               30 m
  TM 2   0.52-0.60         green              30 m
  TM 3   0.63-0.69         red                30 m
  TM 4   0.76-0.90         near infrared      30 m
  TM 5   1.55-1.75         mid-infrared       30 m
  TM 6   10.42-12.50       thermal infrared   120 m
  TM 7   2.08-2.35         mid-infrared       30 m

Planned Objects of Satellite Sensing
• ASPRS (ASPRS satellite database)

Hyperspectral Imaging in Hungary
• 2002: DLR DAIS, 79-band system
• 2006: an aerial data collection service using the AISA DUAL hyperspectral camera was launched by the University of Debrecen (Hungary) and the Ministry of Rural Development.
  – Senses in a maximum of 498 bands, at wavelengths of 0.45-2.45 micrometres.

LANDSAT 5 TM
• National Aeronautics and Space Administration (NASA) and U.S. Geological Survey (USGS) (1999)
• Images in 7 bands (6 bands at 30 m, thermal infrared at 60 m ground resolution)
• Sun-synchronous orbit (the satellite passes over a given site at the same local time)
• Orbits at an altitude of 705 km
• Can take images of an area of 185x170 km every 16 days

Application of Landsat Images
• TM 1 (0.45-0.52 µm): differentiation of soil from plants, mapping of artificial surfaces.
• TM 2 (0.52-0.60 µm): mapping plant cover, identification of artificial surfaces.
• TM 3 (0.63-0.69 µm): differentiation of vegetated surfaces from bare surfaces, identification of artificial surfaces.
• TM 4 (0.76-0.90 µm): identification of plant species, determination of green mass, survey of plant vitality, mapping water surfaces, mapping soil water content.
• TM 5 (1.55-1.75 µm): examination of soil and plant water content, differentiation of clouds from snow cover.
• TM 6 (10.42-12.50 µm): mapping heat emission (plant stress, heat pollution).
• TM 7 (2.08-2.35 µm): differentiation between rock types, mapping plant water content.

Orthophoto
• Imaging: central perspective
• Photogrammetry: determines the extent of real objects from sizes measured on the image
  – The resulting orthophoto (image data of the Earth's surface with a geographic reference, obtained by satellite or aerial data collectors) can be used comprehensively with GPS systems.
  – During the planning and execution of imaging, a GPS device system and adequate relief data are needed.

Photogrammetry
• Photogrammetric evaluation is based on stereoscopy, using the perspective mapping between aerial and space images taken with central projection.
  – The essence of stereoscopy is that given terrain objects are mapped in different ways in images taken from different positions. The task of photogrammetry is to measure the difference between parallaxes and calculate spatial coordinates.

Remote Sensing Data in Agriculture
• Differentiation of types of vegetation
• Cover and yield calculation
• Productivity of biomass
• Vitality and disease of flora
• State of soil
  – IMG files
    • View
    • Select bands
    • Colour bands
  – Erdas ViewFinder 2.1
  – http://rst.gsfc.nasa.gov/Front/overview.html
  – FÖMI tutorial (Institute of Geodesy, Cartography and Remote Sensing, Hungary)

Application of 3D Models
• Model of objects
• Relief model
• Terrain model
• Elevation model
  – A digital elevation model (DEM) is the topographic representation of the Earth's surface. It is usually used for relief maps, 3D visualisation, water flow modelling and aerial image correction. It is built from remote sensing data or traditional land surveying data.
  – Raster-based elevation model
  – Vector-based elevation model

Raster and Vector Models
• Raster model: the source elevation data form regular grid cells. The cell size is constant within the model. The height of the corresponding geographic area is considered constant within a grid cell.
• Vector model: divides space into non-overlapping triangles.
  – The vertices of every triangle are data points with x, y, z values.
  – The points are connected with lines, which gives Delaunay triangles.
  – A TIN (Triangulated Irregular Network) is a complete graph that keeps the topological relations between its elements (node, edge and triangle).
  – Input data fit directly into the model.

Global Relief Model
• SRTM (Shuttle Radar Topography Mission, 2000) program
  – Digital relief of about 80% of the Earth's surface, surveyed with a radar system (Endeavour, 11 days)
  – Radar interferometry, with two receivers 60 m from one another
  – Mapped area: from 60 degrees north to 57 degrees south
  – Resolution: 3 arcsec (USA: 1 arcsec)

Global Relief Model II.
• TanDEM-X 2010 (TerraSAR-X)
  – Mapping of the whole surface of the Earth
  – Horizontal resolution: 12 m; vertical resolution: 2 m.
  – Two-radar remote sensing satellite system with stereo microwave radar, at an altitude of 514 km
  – Polar sun-synchronous orbit
  – Radio waves emitted from a satellite using the synthetic-aperture radar (SAR) technique and reflected from the surface are received by the antenna on the satellite; alternatively, the same surface is imaged from two different points.

3D Relief Model
• The digital relief model of Hungary, 5 m resolution
  – The 1:10 000 scale EOTR database was used
  – A grid derived from vectorized contour lines.
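
The raster and vector (TIN) elevation models described on the "Raster and Vector Models" slide can be illustrated with a short Python sketch. The raster lookup returns the constant height of the cell that contains a point; the TIN lookup interpolates linearly inside one triangle using barycentric weights. The function names, the lower-left grid origin and the linear interpolation rule are assumptions made for this illustration; the slides only state that the triangle vertices carry x, y, z values.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def grid_height(grid: List[List[float]], x0: float, y0: float,
                cell: float, x: float, y: float) -> float:
    """Raster model: height is constant inside each square cell.

    Assumption: (x0, y0) is the lower-left corner and grid[0] is the
    southernmost row.
    """
    col = int((x - x0) // cell)
    row = int((y - y0) // cell)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the grid")
    return grid[row][col]


def tin_height(triangle: Sequence[Point3], x: float, y: float) -> float:
    """Vector (TIN) model: linear interpolation inside one triangle.

    The barycentric weights are assumed here as the interpolation rule.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3


if __name__ == "__main__":
    demo_grid = [[100.0, 101.0], [102.0, 103.0]]  # hypothetical 2x2 grid, 5 m cells
    print(grid_height(demo_grid, 0.0, 0.0, 5.0, 7.0, 2.0))
    demo_triangle = [(0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0)]
    print(tin_height(demo_triangle, 5.0, 5.0))
```

The grid lookup needs no search, while a full TIN would first have to locate the triangle that contains the query point; that step is omitted here.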

3D Relief Map
• Generated from several sources
  – Contour line digitization
  – Digitization of elevation points
  – Import of GPS survey points
  – Correction (aerial photo)
  – Model generation
  – Publication
• Generation from direct GNSS measurement

Videos and Animations for Chapter 4
• Video
• Animation
  – Elevation model

Tasks for Chapter 4
Question I: Find an aerial image of the place where you live from internet sources.
Question II: Find a space image of the place where you live from internet sources.
Question III: Measure the area of Kányavári Island (Kányavári-sziget), Hungary on the photos of 1990, 1992 and 2002. Use Erdas ViewFinder (or any other IMG viewer). The images can be found on the remote sensing tutorial website of FÖMI: http://www.fomi.hu/taverzekeles_oktatoanyag

Spatial Data Databases
• Types of map servers
  – Static webmaps
  – Dynamically created webmaps
  – Animated webmaps
  – Personalized webmaps
  – Open, reusable webmaps
  – Interactive webmaps
  – Webmaps suitable for analysis
  – Collaborative webmaps

Types of Webmaps II.
• Static webmaps
  – No animation or interactivity
  – Created only once, infrequently updated
  – Mostly scanned paper-based maps
• Dynamically created webmaps
  – Created on demand, often from dynamic data sources
  – Created by a server (ArcIMS, ArcSDE)
  – WMS protocol

Types of Webmaps III.
• Animated webmaps
  – Show changes in the map over time (water currents, wind patterns, traffic info)
  – Real time, data from sensors
  – Updated regularly or on demand
• Personalized webmaps
  – Allow the user to apply their own data filtering and selective content
  – Personal styling and symbolization
  – OGC SLD (Styled Layer Descriptor), a uniform styling system for WMS

Types of Webmaps IV.
• Open, reusable webmaps
  – Complex systems, open API (Google Maps, Yahoo Maps, Bing Maps)
  – API compatible with the standards of the Open Geospatial Consortium and the W3C
• Interactive webmaps
  – Changeable parameters
  – Easy navigation
  – Events, descriptions, DOM manipulations

Types of Webmaps V.
• Analytic webmaps
  – Offer GIS analysis
    • Geodata uploaded by the user
    • Geodata provided by the server
    • The analysis is carried out by a server-side GIS; the results are displayed by the client.
• Collaborative webmaps
  – Geometric features being edited by one person cannot be changed by anyone else at the same time.
  – A quality check is needed before publication (OpenStreetMap, Google Earth, WikiMapia…).

'FÖMI'
• 'Institute of Geodesy, Cartography and Remote Sensing' (Földmérési és Távérzékelési Intézet), Hungary
• Important databases of the Institute of Geodesy, Cartography and Remote Sensing

Hungarian National Rural Network (AIR)
• To continuously inform farmers and experts, and to provide professional background knowledge for tenders and developments.
• Its knowledge base consists of professional news, events, articles, studies and publications, published in an organised, updated system.
• A further aim of the site is to prepare for online data services (logbooks, electronic submission of data by farmers working in vulnerable areas), to provide information on data connected with agri-environmental management, to publish relevant thematic maps and to provide agricultural forecasts.

AIR Public Map Library
• 1:200 000 scale genetic soil map of Hungary
• 40 soil types, 80 subtypes, shown with colours and colour shades
• Physical soil types (9 categories), shown with striping
• Soil-forming rock (28 categories), shown with letter codes

MARS (Monitoring Agriculture by Remote Sensing) Crop Yield Forecasting System
• Obtain, process and store weather data
• Apply weather data in the agrometeorological model of the Crop Growth Monitoring System (CGMS)
• Process NOAA-AVHRR and SPOT-VEGETATION satellite images using CORINE Land Cover (CLC) data
• Joint Research Centre
  – Statistical analysis of data
  – Quantity forecast
  – Short-term crop yield forecast

MARS
• Monitoring Agriculture by Remote Sensing

'FÖMI NÖVMON' (Plant Monitoring)

IKR Precision Map Server

Soil Data Publication (Georgikon Mapserver, Hungary)

INSPIRE Geoportal
• (Infrastructure for Spatial Information in the European Community, INSPIRE)
• 'The INSPIRE Geoportal provides the means to search for spatial data sets and spatial data services, and subject to access restrictions, view and download spatial data sets from the EU Member States within the framework of the Infrastructure for Spatial Information in the European Community (INSPIRE) Directive.
• Aims at making available relevant, harmonised and quality geographic information to support formulation, implementation, monitoring and evaluation of policies and activities which have a direct impact on the environment.'
• (www.inspire-geoportal.eu)

Spatial Data Directive
• INSPIRE should be based on the infrastructures for spatial information that are created by the Member States and that are made compatible with common implementing rules and are supplemented with measures at Community level. These measures should ensure that the infrastructures for spatial information created by the Member States are compatible and usable in a Community and transboundary context.

INSPIRE 2008 Metadata
• Member States shall ensure that metadata are created for the spatial data sets and services corresponding to the themes listed in Annexes I, II and III, and that those metadata are kept up to date.

INSPIRE Geoportal
• Online access to a collection of geographic data and services
• Does not store or maintain data
• Metadata and catalogues can be accessed with several search options
• With the help of a map server service, maps and metadata can be searched for and browsed.
• Personal maps can be created from existing data sources.

INSPIRE Geoportal Viewer
• INSPIRE Geoportal

Mashup Mapserver Service
• ArcExplorer JEE: Corine Land Cover mashup map from several sources
• http://vektor.georgikon.hu kvsz
• http://geo.kvvm.hu clc (80% transparency)
Mashup map: a map that includes another map (through an API), made from several internet sources.

WebMap and Publication
Diagram: picture, video, MapServer, web service, HTML, website

Steps of Realization
• Steps of realization:
  – 1. Choose a topic
  – 2. Create a map, upload data
    • a. Create a web album, upload photos
    • b. Upload video
  – 3.
Create a website, embed the map
– 4. Publish the website

Videos and Animations for Chapter 5
• Video
– Institute of Geodesy, Cartography and Remote Sensing
– Hungarian National Rural Network
– INSPIRE Geoportal
– GoogleMaps service
• Animation

Tasks for Chapter 5
Question I. Measure the length of the Belső-tó (‘Inner lake’) of Tihany, Hungary, with the help of the topographic map service of the Georgikon Mapserver (or any other map server).
Question II. Create a GoogleMaps map on any agricultural topic with at least 5 objects and inserted images, and embed it into a website on the same topic.
Question III. Embed further map server services (Bingmaps, YahooMaps…) into the website you have created.

Plant protection database
• Prepared by:
– Dr. Máté Csák

Plant Protection Information: Plant protection database

Plant protection databases: Topics
• Database management theory
– Information, data
– Database models, databases
– Database Management Systems
• Relation model
– Theoretical basis
– Normalized database
– Catalog, data dictionary
• Plant protection databases
– Practical problems and their solutions

Database management – Information
• Information is a basic concept of information technology. The word is of Latin origin and means intelligence, news, message or notification.
• Definitions:
1) In general, information is data or news that we consider relevant and that reduces our lack of knowledge.
2) A gain in knowledge: the growth of knowledge, which means a reduction of uncertainty.
3) Information is new data or news that removes uncertainty and its consequences.
Sources: Wikipedia, SH Atlas, Kalamár-Csák

Theoretical – Information
Information is as much a physical reality of the universe as matter and energy.
• Pure information → information processing → meaningful information
– DNA molecule → protein
– Computer data input → calculation results

Theoretical – Information
Manifestations:
• Explicit (clearly pronounced)
– The information is completely clear to everyone and needs no explanation.
– For example: the water of Lake Balaton is at 28 °C
• Implicit (hidden)
– The information can only be revealed by applying some method to the relationships between the data.
– For example: statistical calculation (average)

Theoretical – Data
• A datum is the specific value (character state, realized form) that a variable (property, attribute, characteristic) takes for an object (anything the data refers to).
– A datum is therefore defined only when we specify which object, which variable and which value it concerns. For numerical values, the unit of measurement always belongs to the value.
• For example: Name: Arvalin LR; Agent: Zinc phosphide; Volume: 4 %

Theoretical – Data model
• A collection of concepts that clearly describes the structure of a database.
– The structure comprises the data types, their relationships and the constraints on the data.
– It describes the logical structure of the database at the conceptual level.

Basic elements of the Entity-Relationship (ER) data model
• Entities
• Attributes
• Relationships

Data model – ER – Entities
• Entities are the principal data objects: things that can be distinguished from all other things and about which information is to be collected.
– The things and persons in question, about which we want to store data.
– For example: Citizens, Workers, Patients, Customers; Plants, Agents, Phenological phases, Pests; Cars, Goods, Accounts ...
– A specific value of an entity is called an occurrence (instance).

Data model – ER – Attributes
Attributes:
• form the internal structure of the entities,
• are characteristics of entities that provide descriptive detail about them.
– For example, the entity Plants has attributes such as name, Latin name, ...
• An entity occurrence is determined by the actual values of its attributes.
• For example: Peach, Prunus persica, …

Data model – ER – Attributes – Key
• If an attribute, or a group of attributes, unambiguously determines which entity occurrence is meant, it is called a key.
– For example: name in Plants

Data model – ER – Relationships
Relationships:
• form the external structure of entities,
• represent real-world associations among one or more entities,
• are described in terms of degree, connectivity and existence.
– For example: Plants–Pests, Accounts–Goods, ...
• A particular occurrence of a relationship is called a relationship instance.

Data model – ER – Relationships – Types
The types of relationships:
• Independent connectivity
• 1:1 connectivity
• 1:N connectivity
• N:M connectivity

Data model – ER – Relationships 1
1. Independent connectivity
– Two entities are independent of each other if no element of one entity's set of instances is linked to any element of the other's.
• For example: Agent’s ID : Employee’s account

Data model – ER – Relationships 2
2.
One-to-one connectivity (1:1):
• Each element of one entity's set of instances is linked to exactly one element of the other entity's set of instances.
– For example: Agent’s ID : Agent’s name

Data model – ER – Relationships – 1:1 connectivity

Data model – ER – Relationships 3
3. One-to-many connectivity (1:N):
• Each element of the set of instances of A may be linked to several elements of the set of instances of B, while each element of B is linked to only one element of A.
– For example: Aetiologies : Diseases

Data model – ER – Relationships – 1:N connectivity

Data model – ER – Relationships 4
4. Many-to-many connectivity (N:M):
• Each element of the set of instances of A may be linked to several elements of the set of instances of B, and vice versa.
– For example: Plants : Diseases

Data model – ER – Relationships – N:M connectivity

Data model – ER definition
• A data model consists of a finite set of entities, a finite set of their properties and the set of relationships among them.

Data model – Types
Data models are distinguished by which of the three basic elements they store physically.
Model | Entity | Property | Connectivity
Network, hierarchical | + | - | +
Relational | + | + | -
Object-oriented | + | + | +
• In addition: object-relational (mixed data model)

Databases
• Database: a structured set of interrelated data, stored (typically in digital form) so that multiple users can access it.
• A database is the combination of a finite number of entity occurrences, a finite number of their property values and their relationship occurrences, organized according to a data model.
• Benefit: many users can work with it at once, and each datum is stored only once.
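As a minimal sketch of the connectivity types above, the connectivity of a relationship can be read off its instances. The function name `connectivity` and the sample pairs are illustrative assumptions, not part of the original material. Note that a sample of instances only shows what the stored data exhibit; the rule itself is a design decision of the data model.

```python
from collections import defaultdict

def connectivity(pairs):
    """Classify a relationship, given as (a, b) instance pairs, as 1:1, 1:N, N:1 or N:M.

    Hypothetical helper for illustration; an empty set of pairs is reported
    as independent entities.
    """
    pairs = set(pairs)
    if not pairs:
        return "independent"
    a_to_b = defaultdict(set)  # B instances linked to each A instance
    b_to_a = defaultdict(set)  # A instances linked to each B instance
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in a_to_b.values())
    b_has_many = any(len(as_) > 1 for as_ in b_to_a.values())
    if a_has_many and b_has_many:
        return "N:M"
    if a_has_many:
        return "1:N"
    if b_has_many:
        return "N:1"
    return "1:1"

# Sample instances (assumed for illustration)
aetiology_disease = [("virus", "Apple mosaic virus"), ("fungus", "Blight"),
                     ("fungus", "Apple powdery mildew")]
plant_disease = [("Apple", "Mosaic virus"), ("Cucumber", "Mosaic virus"),
                 ("Apple", "Apple powdery mildew")]

print(connectivity(aetiology_disease))  # Aetiologies : Diseases
print(connectivity(plant_disease))      # Plants : Diseases
```

In a relational database, an N:M result such as Plants : Diseases is usually stored as a separate linking table that holds one row per relationship instance.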
Integrated database
• Links together all the data that different users use in different groupings.
• The data are physically stored centrally, without redundancy or with minimal, controlled redundancy.
• Centrally controlled:
– data protection,
– entry of new data,
– modification of existing data.

Database Management System (DBMS)
• Software that provides the connection to the database.
• For databases it allows
– creation,
– querying of the data,
– modification,
– maintenance,
– safe long-term storage of large amounts of data.

Database Management System (DBMS)
• Grouping
– By number of users
• Single-user
• Multi-user
– By division of work
• Single-task
• Client-server
– By number of storage locations
• Stored in one place
• Distributed (split/shared)

Database Management System (DBMS)
• System components
– Data Definition Language (DDL)
• User level
• Conceptual level
• Physical storage level
– Data Manipulation Language (DML)
– Data Control Language (DCL)

Database Management System (DBMS) – Operating concept

DBMS – Operating concept – Explanation
1. The application program requests information from the database.
2. The DBMS interprets and analyses the request (syntax, existence of the data, access rights).
3a. If the request can be executed, it is passed to the operating system.
3b. If it cannot be executed, control returns to the program.
4. The operating system accesses the external storage.
5. The operating system transfers the requested data from storage into a buffer.
6. The data and status feedback are passed to the program.
7. The program receives the data.
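The request flow above can be observed with any DBMS. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in DBMS; this choice, the helper `run_request` and the table and column names are assumptions made for illustration. A valid request returns data to the program, while a request for a table that does not exist is refused by the DBMS during analysis (step 3b). SQLite has no DCL statements such as GRANT, so only DDL and DML appear here.

```python
import sqlite3

def run_request(conn, sql, params=()):
    """Send one request to the DBMS and report what comes back to the program."""
    try:
        rows = conn.execute(sql, params).fetchall()  # request, analysis, data transfer
        return ("ok", rows)                          # the program receives the data
    except sqlite3.Error as exc:                     # the DBMS refuses the request
        return ("error", str(exc))

conn = sqlite3.connect(":memory:")

# DDL: define the structure (hypothetical table)
conn.execute("CREATE TABLE plants (name TEXT PRIMARY KEY, latin_name TEXT)")

# DML: enter and query data
conn.execute("INSERT INTO plants VALUES (?, ?)", ("Apple", "Malus domestica"))

good = run_request(conn, "SELECT latin_name FROM plants WHERE name = ?", ("Apple",))
bad = run_request(conn, "SELECT * FROM pests")  # this table was never created

print(good)
print(bad[0])
```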
Database Management System (DBMS)
• Two types:
– With their own (autonomous) language
• Oracle (1977)
• DB2 (1983)
• Sybase (1987)
• Informix (1981)
• Ingres (1980)
– Embedded (plug-in) type
• IDMS (1983)
• SQL (1986)

Relational database model – Theoretical basis
• In 1970 Dr. Edgar F. Codd (IBM) created the relational database model.
• The data model describes the various types of data, their relationships and connections, and the procedures that protect them.
• The collected data are separated logically into entity types (tables). We determine how each entity can be identified unambiguously, and what additional features (attributes) it has.

VirKor program – Relation diagram
The VirKor database has seven tables; the diagram shows them with their properties and relations.

VirKor program – Relational mode of representation
• Entities are shown as relations (special tables).
• They describe the different entities of the real world and their properties.
• Example: the Plants table

VirKor program – Relational mode of representation
• The connections between entities can also be represented as relations.
• Data management is carried out with relational operations.
• Example: the Plants – Pests relation

Relation model – Benefits and disadvantages
Benefits:
• Based on a mathematical (set-theoretic) model,
• Very close to everyday thinking,
• The most flexible to modify,
• The three levels are well separated and can be made independent of one another.
Disadvantages:
• Performance is less efficient.
– Today this is no longer a serious problem.
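The relational operations mentioned above can be illustrated without any DBMS. The sketch below, in plain Python, treats a relation as a list of rows and implements selection, projection and natural join; the function names and the sample rows are assumptions for illustration, not the VirKor implementation.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """Projection: keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared column names."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                result.append({**l_row, **r_row})
    return result

# Sample rows (assumed for illustration)
plants = [
    {"plant": "Apple", "latin_name": "Malus domestica"},
    {"plant": "Potato", "latin_name": "Solanum tuberosum"},
]
plant_diseases = [
    {"plant": "Apple", "disease": "Apple powdery mildew"},
    {"plant": "Potato", "disease": "Blight"},
    {"plant": "Potato", "disease": "Black stem rotting"},
]

joined = natural_join(plants, plant_diseases)
potato = select(joined, lambda row: row["plant"] == "Potato")
result = project(potato, ["latin_name", "disease"])
print(result)
```

Each operation takes relations and returns a relation, which is why the operations can be chained freely.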
Relation model – The properties of relations
• Each relation has a unique name in the database;
• Rows represent entity occurrences, columns represent entity properties;
• Every row has the same number of columns;
• Column names are unique within a relation;
• Each column of a row holds a single value (NULL if there is no value);
• The order of columns is arbitrary;
• No two rows are identical;
• There is at least one combination of columns that uniquely identifies each row. This is the primary key.
• Any datum can be identified by:
– relation name
– + column name
– + value of the primary key

Relation model – Tables, views
Not every property value of every entity is stored physically.
• Base relation (table): physically stored.
• Virtual relation (view): contains no data. It is created from tables with relational operations.
• Materialized view: physically stored. It is created from tables with relational operations and changes when the base tables change.
• Snapshot: physically stored. It holds the values of tables or views at a certain moment.
• Query (selection) result: not a true relation; it exists only temporarily.
• Temporary table: needed only temporarily, for a given operation or task.

Relation model – Keys
Keys ensure data integrity, consistency and freedom from contradictions.
The system automatically checks:
• The primary key and foreign key relations between entities (e.g. the key of Plants in Plants’ diseases)
– matching
– cascading update
– cascading delete

Relation model – Keys
PRIMARY KEY
• Uniquely identifies the rows of a relation.
• The primary key (or any part of it) may not be NULL, and it should not contain unnecessary columns.
• If there are several candidates, it is important to decide which one should be the primary key (e.g. for a person: identity card number, tax identification number, social insurance number).

Relation model – Integrity
Additional integrity options:
• Define a unique index (the indexed columns cannot hold the same value in two rows).
• Define conditions that a field must satisfy (e.g. a check that allows only numeric values).

Relation model – Indexes
• Indexes speed up direct and sequential access by the indexed column.
– They are maintained automatically.
– They can be created and deleted at any time.
– They slow down data modification.
– They require storage space.

Relation model – Keys
FOREIGN KEY
• A column (or combination of columns) of the referencing relation that may only take a value that is either NULL or equal to one of the primary key values of the referenced table.
• It establishes 1:N relationships. It must remain valid through all changes: data entry, modification and deletion.

Relation model – Foreign key relationship

Relation model – Normalization
• Normalization is a formal, algorithmic process in which the initial, poorly structured data are transformed, by the consistent and successive application of appropriate rules, into a logically more transparent and better form.

Relation model – Normalization
• It brings the entities obtained in the earlier design steps into a manageable, standard form.
• It can be carried out algorithmically.
• Results:
– The data need less storage space;
– Elementary data can be changed faster and with fewer errors;
– The database becomes logically clearer.
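The key and integrity rules above (primary key, foreign key with cascading update and delete, unique index, field condition) can be tried out in a few lines. This is a minimal sketch using Python's built-in `sqlite3` module; the table names, the `severity` column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE plants (
    name       TEXT PRIMARY KEY,
    latin_name TEXT NOT NULL
);
CREATE TABLE plant_diseases (
    plant    TEXT NOT NULL REFERENCES plants(name)
                 ON UPDATE CASCADE ON DELETE CASCADE,
    disease  TEXT NOT NULL,
    severity INTEGER CHECK (severity BETWEEN 1 AND 5),  -- hypothetical field condition
    PRIMARY KEY (plant, disease)
);
CREATE UNIQUE INDEX idx_plants_latin ON plants(latin_name);
""")

conn.execute("INSERT INTO plants VALUES ('Apple', 'Malus domestica')")
conn.execute("INSERT INTO plant_diseases VALUES ('Apple', 'Apple powdery mildew', 3)")

def violates(sql):
    """Return True if the DBMS rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

orphan = violates("INSERT INTO plant_diseases VALUES ('Pear', 'Blight', 2)")      # no such plant
duplicate = violates("INSERT INTO plants VALUES ('Pear', 'Malus domestica')")      # unique index
bad_value = violates("INSERT INTO plant_diseases VALUES ('Apple', 'Blight', 9)")   # CHECK fails

conn.execute("UPDATE plants SET name = 'Apple tree' WHERE name = 'Apple'")  # cascading update
renamed = conn.execute("SELECT plant FROM plant_diseases").fetchall()
conn.execute("DELETE FROM plants WHERE name = 'Apple tree'")                # cascading delete
remaining = conn.execute("SELECT COUNT(*) FROM plant_diseases").fetchone()[0]

print(orphan, duplicate, bad_value, renamed, remaining)
```

The cascading options are a design choice: without them, the DBMS would refuse to rename or delete a plant that still has diseases recorded.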
Relation model – Normalization – Functional dependence
• In a relation, the values of some attributes depend on the values of other attributes.
• If, in relation R, each value of an attribute X (the independent variable) unambiguously determines the value of another attribute Y (the dependent variable), we say that Y is functionally dependent on X in R.
• Naturally, this must hold not only for the current content of R but, independently of time, for the whole lifetime of the database, as a constraint.
• Both X and Y can be composite, i.e. they can consist of several columns.
• Functional dependence is usually written R.X → R.Y. It can also be shown in a functional diagram, also called a dependency diagram.

Relation model – Representation of functional dependence
• The arrow points from the independent attribute to the dependent attribute.
• In the diagram, Y and Z are functionally dependent on X.

Relation model – Full functional dependence
• In general, in relation R the attribute Y is fully functionally dependent on the (composite) attribute X if and only if it is functionally dependent on X but not on any proper part of X.
– If X is not composite, functional dependence and full functional dependence coincide.
– Strong
– Weak

Relation model – Full dependence
• Let P, Q ⊆ A and P → Q.
• Q is fully (functionally) dependent on P only if Q does not depend on any proper subset of P.
• Otherwise, the dependence is partial.
– For example:
– ORDERITEM {order_id, goods_id, piece}
– REPAYMENTS {deptor_id, month, amount, date}
– VISIT {visitor_id, date, time, subject, period}

Relation model – Transitive dependence
• S depends transitively on P if there exists Q ⊆ A such that P → Q and Q → S, but the reverse dependencies do not hold.
– Examples (P → Q → S):
– WORKER {perid, name, class_code, class_name}
– ORDERHEADER {order_id, custcode, custname, custaddres, date, deadline, totalvalue}
– VISITOR {id, name, firm, firmname, firmaddres, …}

Relation model – Normal forms
• The structural state of the entity: 0NF, 1NF, 2NF, 3NF, 4NF, …
Example (unnormalized table):
Plant | Latin name | Diseases
Apple | Malus domestica | Apple mosaic, Impetigo, Apple powdery mildew, …
Potato | Solanum tuberosum | Staining virus reticulated, Blight, Black rotting, …

Relation model – First normal form
• A relation is in first normal form (1NF) if
– each column contains one and only one attribute,
– every row is different,
– the order of attributes is the same in every row,
– there are no repeating fields,
– each row has (at least) one unique key on which all other attributes are functionally dependent.

Relation model – Normal forms – 1NF – Example
Plant | Latin name | Disease | Aetiology
Apple | Malus domestica | Apple mosaic | virus
Apple | Malus domestica | Impetigo | fungus
Apple | Malus domestica | Apple powdery mildew | fungus
Potato | Solanum tuberosum | Staining virus reticulated | virus
Potato | Solanum tuberosum | Blight | fungus
Potato | Solanum tuberosum | Black stem rotting | bacterium
Wheat | Triticum vulgare | Pitch staining | fungus
…

Relation model – Normal forms – 1NF – Anomalies
The table contains a lot of redundancy (e.g. Plant and Latin name).
Hidden sources of error (update anomalies):
• Deletion anomaly:
– If we delete the wheat disease Pitch staining, the data about wheat itself are lost as well.
• Modification anomaly:
– If the potato disease Blight is given a new name, the change has to be made in every row where it occurs; otherwise the data become inconsistent.
• Insertion anomaly:
– A new disease can be entered only if some plant already has it (no part of the primary key may be NULL).

Relation model – Normal forms – 2NF
• A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.
– 1NF relations with a simple (single-column) primary key are automatically in 2NF. Relations with a composite key, however, must be brought into 2NF to eliminate update anomalies. (This does not remove all anomalies, but it can reduce their number significantly.) This is done by decomposing the relation.
– In the decomposition, projections of the 1NF relation produce 2NF relations whose primary keys are the primary key of the original relation or parts of it, and which contain only those columns that are fully dependent on the new primary key.

Relation model – Normal forms – 2NF – Example
Plant | Latin name
Apple | Malus domestica
Potato | Solanum tuberosum
Wheat | Triticum vulgare

Plant | Disease | Latin name | Picture
Apple | Apple mosaic virus | Apple mosaic virus |
Apple | Impetigo | Venturia inaequalis |
Apple | Apple powdery mildew | Podosphaera leucotricha |
Potato | Virus networking staining | Potato leafroll |
Potato | Black stem rotting | Erwinia carotovora subsp. atroseptica |
Potato | Blight | Phytophthora infestans |
Wheat | Pitch staining | Lidophia graminis |

Relation model – Normal forms – Decomposition (1NF → 2NF)
• R(A,B,C,D) before decomposition (1NF)
– PRIMARY KEY (A,B)
– R.A → R.D
• After decomposition (2NF)
– R1(A,D)
– PRIMARY KEY (A)
• and
– R2(A,B,C)
– PRIMARY KEY (A,B)
– FOREIGN KEY (A), refers to R1

Relation model – Normal forms – 3NF
• A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
• In other words, in 3NF functional dependencies can start only from the primary key and the alternate keys.
• The 2NF relation Employee is not in 3NF because, for example, CLASS is neither the primary key nor an alternate key, yet other columns (CLASS-NAME, BOSS) are functionally dependent on it.

Relation model – Normal forms – Decomposition (2NF → 3NF)
• In the decomposition we take the projection of the 2NF relation that contains only those attributes which depend exclusively on the primary key. Its primary key remains the same. In the other new relation (or relations, if there are several such dependencies), the primary key is the independent attribute of the removed dependency, and the other columns are the attributes that depend on it.

Relation model – 3NF – Example
Disease | Latin name | Aetiology
Apple mosaic virus | Apple mosaic virus | virus
Impetigo | Venturia inaequalis | fungus
Apple powdery mildew | Podosphaera leucotricha | fungus
Virus networking staining | Potato leafroll | virus
Black stem rotting | Erwinia carotovora subsp. atroseptica | bacterium
Blight | Phytophthora infestans | fungus
Pitch staining | Lidophia graminis | fungus

Plant | Latin name
Apple | Malus domestica
Potato | Solanum tuberosum
Wheat | Triticum vulgare

Plant | Disease | Picture
Apple | Apple mosaic virus |
Apple | Impetigo |
Apple | Apple powdery mildew |
Potato | Virus networking staining |
Potato | Black stem rotting |
Potato | Blight |
Wheat | Pitch staining |
…

Relation model – Normal forms – Decomposition (2NF → 3NF)
• In general, let A, B, C, D be the columns (any of which may be composite) of a 2NF relation
– R(A,B,C,D)
• PRIMARY KEY (A)
• R.B → R.C
• The decomposition into 3NF produces the following relations:
– R1(B,C)
• PRIMARY KEY (B)
and
– R2(A,B,D)
• PRIMARY KEY (A)
• FOREIGN KEY (B), refers to R1
• The relation R can be restored unambiguously at any time by joining R1 and R2 on B. This holds, however, only if the splitting is done according to the principle set out above.

Relation model – Normal forms – Decomposition – 3NF
Notes:
• Bringing a relation into 3NF is not always appropriate (e.g. address and zip code).
• For most database management systems 1NF is enough, and even a primary key is not required!

Catalog, data dictionary
• Maintenance tables that describe the database:
– definition,
– relationships,
– storage,
– usage,
– all views.
• Used to carry out system administration tasks.

Plant protection database
• Pesticides Register
• VirKor – teaching aid

Pesticides Register – Records of pesticides
• Aim of the database: to plan a register of the pesticides manufactured by ER-chemistry Co.
• The register should include:
– the origin of each pesticide,
– the components needed to produce it,
– its possible areas of application.
• It is assumed that:
– a pesticide may be single-component or multi-component,
– a pesticide may be used against several pests,
– one component can come from multiple suppliers.

Pesticides Register – Entity types
• Pesticides (id, name, degree of hazard, price)
• Factories (factory code, name)
• Fields of application of the pesticides (pest, type)
• Components of the pesticides (component name)
• Transporters (transporter code, date, name, address)

Pesticides Register – Entity types of the ER model
• Pesticides(id, name, hazard, price)
• Factories(id, name)
• Pests(pestname, type)
• Components(name)
• Transporters(id, date, name, address)

Pesticides Register – ER diagram

Pesticides Register – Relations
• Where is it produced? Factories : Pesticides (1:N)
– A pesticide is produced by only one factory, but a factory can produce several pesticides.
• What is it applied against? Pesticides : Pests (M:N)
– A pesticide may be used against several pests, and a pest can be destroyed by several pesticides.
• What are the ingredients? Pesticides : Components (M:N)
– A pesticide consists of several components, and a component may be part of several pesticides.
• Where did it come from?
Components : Transporters (M:N)
– A component may be delivered by several suppliers, and a supplier may deliver several components.

Pesticides Register – Relation model

Pesticides Register – Relations
Primary keys only
• Pests(pname, type)
• Factories(fid, name)
• Components(cname)
• Transports(tid, tdate, tname, taddress)
Primary keys and foreign keys
• Pesticides(pid, name, hazard, price, fid)
• Applies(pid, pname, term)
• Elements(pid, cname, volume%)
• Origines(cname, tid, tdate, quantity)

Pesticides Register – Pest table

Pesticides Register – Pesticide form

Pesticides Register – Pests form

Pesticides Register – Factories form

VirKor program – teaching aid
VirKor is an educational program that helps students learn to recognize the diseases of different plants.
• A modern version of the demonstration boards.
• An educational resource for plant protection (plant doctor) students.

VirKor program – How does it work?
• Stored in the database:
– plants,
– diseases,
– the relations between them.
• The displayed images are stored in a folder (locally or on a server).
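Returning to the Pesticides Register relations listed above, the following is a minimal sketch of how they could be declared in SQL, here through Python's built-in `sqlite3` module. The relation and column names follow the list; everything else is an assumption for illustration: the column types, the choice of primary keys (the key markings are not visible in the list), the renaming of `volume%` to `volume_pct`, and the sample rows, which are hypothetical.

```python
import sqlite3

# Assumed keys: (tid, tdate) for transports, and the foreign-key pairs for the linking tables.
SCHEMA = """
CREATE TABLE pests      (pname TEXT PRIMARY KEY, type TEXT);
CREATE TABLE factories  (fid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE components (cname TEXT PRIMARY KEY);
CREATE TABLE transports (tid INTEGER, tdate TEXT, tname TEXT, taddress TEXT,
                         PRIMARY KEY (tid, tdate));
CREATE TABLE pesticides (pid INTEGER PRIMARY KEY, name TEXT, hazard TEXT, price REAL,
                         fid INTEGER REFERENCES factories(fid));
CREATE TABLE applies    (pid INTEGER REFERENCES pesticides(pid),
                         pname TEXT REFERENCES pests(pname),
                         term TEXT,
                         PRIMARY KEY (pid, pname));
CREATE TABLE elements   (pid INTEGER REFERENCES pesticides(pid),
                         cname TEXT REFERENCES components(cname),
                         volume_pct REAL,
                         PRIMARY KEY (pid, cname));
CREATE TABLE origines   (cname TEXT REFERENCES components(cname),
                         tid INTEGER, tdate TEXT, quantity REAL,
                         PRIMARY KEY (cname, tid, tdate),
                         FOREIGN KEY (tid, tdate) REFERENCES transports(tid, tdate));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Hypothetical sample rows
conn.execute("INSERT INTO factories VALUES (1, 'ER-chemistry Co.')")
conn.execute("INSERT INTO pesticides VALUES (10, 'Pesticide A', 'low', 12.5, 1)")
conn.executemany("INSERT INTO components VALUES (?)", [("Component 1",), ("Component 2",)])
conn.executemany("INSERT INTO elements VALUES (?, ?, ?)",
                 [(10, "Component 1", 4.0), (10, "Component 2", 96.0)])

# Where is it produced, and what are the ingredients?
rows = conn.execute("""
    SELECT p.name, f.name, e.cname, e.volume_pct
    FROM pesticides p
    JOIN factories f ON f.fid = p.fid
    JOIN elements  e ON e.pid = p.pid
    ORDER BY e.cname
""").fetchall()
print(rows)
```

The 1:N relationship (Factories : Pesticides) needs only a foreign key column, while each M:N relationship becomes its own linking table (`applies`, `elements`, `origines`).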
VirKor program – Diseases of a plant: Apple
• Apple proliferation phytoplasma
• Podosphaera leucotricha

VirKor program – Diseases of a plant: Apple
• Apple mosaic virus
• Monilinia fructigena

VirKor program – One disease on different plants: mosaic virus on cucumber and apple

VirKor program – Symptoms: necrosis of tissue

VirKor program – How was it made?
• The hand-made demonstration boards were photographed and the images were then cleaned.
• The data on the boards were recorded in an Excel spreadsheet.
• The relational database model was created.
• The application was developed.

VirKor program – How was it made? – Data collection – Digitization
Digitization of the demonstration boards (original and cleaned image)

VirKor program – How was it made? – Data collection – Storage
The data are stored in a worksheet (1NF).

VirKor program – How was it made? – Modifying and correcting the data structure
In this case we supplemented the data with some other properties, for example plant parts.

VirKor program – How was it made? – Data collection – Analysing the relationships between data
• Functional dependencies:
– The Latin names of the plants depend on the Hungarian names.
– The same applies to the diseases, the symptoms and the aetiology (e.g. virus).

VirKor program – How was it made? – Creating a relational database model

VirKor program – How was it made?
– Table – Entity: Plants
• Apple and its diseases

VirKor program – How was it made? – Table – Entity: Diseases

VirKor program – How was it made? – Table – Entity: Plants’ diseases

VirKor program – How was it made? – Developing the tutor program
• Form of Plants
• Setting the properties of each control
• Programming each event
– For example
• Load an image file into the picture box
• Change the status of checkboxes
• Etc.

Thank you for your kind attention.
Made by: Máté Csák PhD.
The presentation can be downloaded from: -

Bioinformatics
• Prepared by:
– Sándor Nagy

Bioinformatics: Databases and homology searching

Contents
• What does Bioinformatics mean?
• Structure and operation of DNA
• Bioinformatical databases
• Using databases
• Exercise

Definition
• Bioinformatics derives knowledge from computer analysis of biological data.
These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine.

Fields of Bioinformatics
• Superindividual bioinformatics uses systematic modelling to understand biological systems.
• Molecular bioinformatics deals with protein and nucleotide analysis and design.
• Computing bioinformatics focuses on the utilization of biological systems.

Aim of Bioinformatics
To decipher the genetically encoded information, which tells us about the following:
• 3D structure,
• Function,
• Evolutionary relations.
DNA → Protein → Function

Questions answered by Bioinformatics
• In which other organisms can we find the given sequence? (→ ortholog searching)
• What kinds of variants can occur within a certain organism? (→ paralog searching)
• What is the rate of heterogeneity in a certain paralog? (→ searching for polymorphism)
• Which positions are important in a given sequence? (→ evolutionarily conserved positions)
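The last question above can be made concrete with a toy example. The sketch below assumes that the sequences have already been aligned to the same length, and it marks a position as conserved only when every sequence carries the same residue there. The function name and the sequences are hypothetical; real analyses rely on alignment tools and scoring schemes rather than strict identity.

```python
def conserved_positions(aligned):
    """Return the 0-based column indices where all aligned sequences share the same residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i in range(length)
        if aligned[0][i] != "-" and all(seq[i] == aligned[0][i] for seq in aligned)
    ]

# Hypothetical, already aligned nucleotide fragments ('-' marks a gap)
alignment = [
    "ATGCC-TA",
    "ATGAC-TA",
    "ATGTCGTA",
]
positions = conserved_positions(alignment)
print(positions)
```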

Basics: Structure of the DNA
• Double helix in which the nucleotide bases on the two strands are connected by hydrogen bonds: A:T by 2, G:C by 3 H-bonds
• Base pairing: the complementary nucleotide bases within the long polymer are A:T and G:C (the basis of replication)
• Genetic code: not monotonous
• Two helical chains, each coiled round the same axis, each with a pitch of 34 Ångströms (3.4 nanometres) and a radius of 10 Ångströms
• The two strands run in opposite directions to each other and are therefore anti-parallel, with 5′ (five prime) and 3′ (three prime) ends
• It contains four bases: adenine (A), cytosine (C), guanine (G), thymine (T)
• Structure of DNA:
– http://www.youtube.com/watch?v=qy8dk5iS1f0&feature=player_embedded

Basics: Replication of DNA
• Following from the rule of base pairing: the hydrogen bonds within the double helix can be pulled apart, and both strands serve as templates for the synthesis of a new strand.
Result of this process: two double helices with the same structure
• The genetically coded information is ensured by the order of the nucleotides
• DNA replication:
– http://www.youtube.com/watch?v=E8NHcQesYl8&feature=related

Historical overview
• Early 1950s: the insulin sequence is published
• 1953 Watson and Crick: structure of DNA
• From the early 1970s: algorithms for sequence analysis are created:
– Dot matrix (1970)
– Global sequence alignment (Needleman-Wunsch, 1970) and local sequence alignment (Smith-Waterman, 1981)
– BLAST algorithm (1990)
• 1972: first computer-stored databases of protein sequences
• 1979: GenBank prototype

Bioinformatics databases
• GenBank: NCBI (National Center for Biotechnology Information)
– http://www.ncbi.nlm.nih.gov/
• European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI)
– http://www.ebi.ac.uk/
• DNA Data Bank of Japan (DDBJ)
– http://www.ddbj.nig.ac.jp/

Choosing database
Searching keywords
http://www.ncbi.nlm.nih.gov/

Navigation in Database - GenBank
• Choosing database:
– PubMed – database of publications
– Protein – database of proteins
– Nucleotide – database of nucleic acids
– Genome – database of whole genomes
– Gene – database of genes

Navigation in Database - GenBank
• Exercise:
– Look for information on Phytophthora infestans
• Database: Taxonomy, Keyword: Phytophthora infestans
• Result of our search (next slide)
– Taxonomic classification
– Database results from GenBank
– References to other resources
• By choosing the following reference link on the result page we can reach all sequences in the database: Nucleotide – Direct
• Look for the INF2A gene within the results

Navigation in Database - GenBank
• Result table:

Identifiers
Information on publishing
Information on structure
Gene identification

Amino acid sequence
Nucleotide sequence

Navigation in Database - GenBank
• Data formats:
– Summary – short description, important information
– GenBank – GenBank's own format, detailed data
– FASTA – name + identifiers + sequence, the most commonly used format
– ASN.1 – international format

Navigation in Database - GenBank
• FASTA format:
– Advantages:
• commonly used
• simple
• small
– Disadvantage:
• less information

Navigation in Database - GenBank
• Accession number:
– AY693804 - Phytophthora infestans INF2A (Inf2A) gene, complete cds
– Internationally accepted identifier for nucleic acid and protein sequences
• GI (GenBank Identification) number:
– GI:51832280 - Phytophthora infestans INF2A (Inf2A) gene, complete cds
– Identifier used only by GenBank

What is it used for?
• To look up the genetic information of a given organism
• To compare and check our own results
• As a basis for comparing experiments
• As a collection of papers and publications
• As a basis for research

Exercise:
– Look for the genetic code and protein sequence of a chosen important causative agent and examine the availability of its genome.
– Save the result in FASTA format as a text file. Keep the saved file; it is required for the exercise next time.
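
For working with the file saved in the exercise, the FASTA layout described above (a header line starting with ">" followed by sequence lines) can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_fasta` and the toy record are assumptions made for illustration, and the toy sequence is not the real INF2A sequence.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted text lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            # sequence lines of one record are simply concatenated
            records[header] += line.upper()
    return records


# Hypothetical toy record (made-up sequence, for illustration only).
example = """>toy_record made-up sequence for illustration
ATGAACTTCC
GCGCTCTG
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    # print the identifier (first word of the header) and the sequence length
    print(name.split()[0], len(seq))
```

To read a saved file instead of the toy text, the same function can be called on an open file object, for example `parse_fasta(open("my_sequence.fasta"))`, where the file name is again hypothetical.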

Contents
• Comparing two sequences
• Searching for homologous sequences – BLAST
• Nucleotide BLAST – BLASTN
• Protein BLAST – BLASTP
• What is it used for?
• Exercise

How to compare two sequences – Dot matrix
• Dot matrix method (Gibbs and McIntyre, 1970): it compares two amino acid or nucleotide sequences by placing one sequence along the vertical and the other along the horizontal axis of a matrix and drawing a dot wherever the two residues are identical.
• Very well suited to the visual demonstration of mutations, deletions and insertions.

How to compare two sequences – Global Sequence Alignment
• Another well-known analytical method is global sequence alignment, which uses dynamic programming.
• Essence of the process: the similarity of the sequences is examined with the help of a scoring system over their whole length.

How to compare two sequences – Local Sequence Alignment
• It also uses dynamic programming.
• Essence of the process: the similarity of the sequences is examined with the help of a scoring system. It looks for the best-scoring alignment between parts of the two sequences.
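
The three comparison methods above can be sketched in Python as follows. The scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for illustration; real tools use substitution matrices and more refined gap penalties. The global variant follows the Needleman-Wunsch scheme and the local variant the Smith-Waterman scheme, and only the best score is returned, not the alignment itself.

```python
def dot_matrix(a, b):
    """One text row per residue of `a`; '*' marks identity with a residue of `b`."""
    return ["".join("*" if x == y else "." for y in b) for x in a]


def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score by dynamic programming.

    local=False: global alignment over the whole length of both sequences.
    local=True:  local alignment, best-scoring pair of subsequences.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # in a global alignment, leading gaps are penalised
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in sequence b
            left = score[i][j - 1] + gap    # gap in sequence a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


for row in dot_matrix("ACGT", "ACGT"):
    print(row)
print(alignment_score("ACGT", "AGT"))                    # global score
print(alignment_score("AAACGT", "CGTTTT", local=True))   # local score
```

In the dot matrix output an unbroken diagonal of `*` characters indicates identical stretches, while shifts of the diagonal point to insertions or deletions.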

Pairwise Sequence Similarity Search
• Basic Local Alignment Search Tool (BLAST): the most effective and most common method of similarity searching
– Characteristics:
• Fast
• Good sensitivity
– Types:
• Blastn – for nucleotide sequences
• Blastp – for proteins
• Blastx – for translated nucleotide sequences
• http://www.ncbi.nlm.nih.gov/blast

Pairwise Sequence Similarity Search
• Types:
BLAST      Query sequence                    Database
Blastn     Nucleotide                        Nucleotide
Blastp     Protein                           Protein
Blastx     6-frame translated nucleotide     Protein
Tblastn    Protein                           6-frame translated nucleotide
Tblastx    6-frame translated nucleotide     6-frame translated nucleotide

Entering sequences
- copy
- upload

Most important settings: Blastn – Search databases
• The most commonly used one is the non-redundant nucleotide database (the default choice).
• The search can be narrowed by entering taxonomic data in the „Organism” field.

Most important settings: Blastn – program optimisation
• Megablast: searches for matches with 95% or higher similarity, very fast.
• Discontiguous megablast: very well suited to comparisons between species, a bit slower.
• Blastn: suitable for comparing any sequences, also detects weak similarities, slow.

Most important settings: Blastp – Search databases
• The most commonly used one is the non-redundant protein database (the default choice).
• The search can be narrowed by entering taxonomic data in the „Organism” field.
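
The "6-frame translated nucleotide" entries in the table of BLAST types refer to reading a nucleotide sequence in three offsets on each of the two strands. The following Python sketch only splits a DNA string into the codons of these six reading frames; the function names are assumptions for illustration, and the actual translation into amino acids (which needs a codon table) is left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Codons of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            # index after the last complete codon for this offset
            usable = len(seq) - (len(seq) - offset) % 3
            codons = [seq[i:i + 3] for i in range(offset, usable, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


for name, codons in six_frames("ATGGCCATTGTA").items():
    print(name, codons)
```

Searching in all six frames is useful because for an unknown nucleotide sequence it is usually not known which strand and which offset encode the protein.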

Most important settings: Blastp – program optimisation
• Blastp: simple search in a protein database.
• PSI-BLAST: search algorithm with position-specific scoring.
• PHI-BLAST: search with a pattern-specific scoring system.

BLAST result evaluation
• Look for nucleotide sequences that are similar to AY693804 - Phytophthora infestans INF2A gene.

BLAST result evaluation
Search parameters
Graphical demonstration of the result

BLAST result evaluation
• Result summary table
– Max score
• A higher value means higher similarity
– Query coverage: the percentage of the query sequence covered by the alignment
• A higher value means higher similarity

BLAST result evaluation
• Result summary table
– E value – expected value: the number of hits with at least this score expected by chance
• A lower value means a more significant similarity
– Max ident – the highest percentage of identical positions in the alignment
• A higher value means higher similarity

BLAST result evaluation
• Detailed sequence alignment:

Running BLAST locally
• The BLAST program can also be run in a local environment.
– It is useful in the following cases:
• Comparing sequences to local databases
• Operations requiring a large number of calculations
• ftp://ftp.ncbi.nih.gov/blast/
– Runs from the command line with parameter inputs.
– Supports many operating systems (both 32-bit and 64-bit architectures).
– Detailed help.
– The first step is formatting the database, the next step is the similarity analysis.
– It can produce many output formats.

What can we do with it?
• We have an unknown sequence from an unknown source.
– What can be the source?
– Which gene is it similar to?
– What can be the function of the protein coded by this sequence?
• Use the sequence in FASTA format as the query in the BLAST program. From the result we can answer the questions above.

Exercise
– Using the FASTA format sequence saved in the previous presentation, search for relatives with similarity analysis (BLAST).

Contents
• Multiple sequence alignments
• Examining protein sequences
• Protein 3D models
• What is it used for?
• Exercise

Multiple Sequence Alignment
• Essence:
– Trying to fit several sequences to each other at the same time. Possible differences are scored with penalties.
• Use:
– Searching for common characteristics and parameters
– Inserting new sequences - taxonomy
– Protein structures
– Phylogenetic analysis

Multiple Sequence Alignment - An example
TTGACATG CCGGGG---A AACCG
TTGACATG CCGGTG--GT AAGCC
TTGACATG -CTAGG---A ACGCG
TTGACATG -CTAGGGAAC ACGCG
TTGACATC -CTCTG---A ACGCG
******** ?????????? *****
• What is the consensus sequence?
• Where the sequences differ, it is difficult to detect common patterns. That is why we use alignment software.

Multiple sequence alignment
• Types of alignment:
– Manual: done by hand, a laborious and long process. Human errors can occur.
– Automatic: faster, but sometimes it does not take biological requirements into account.
– Combined: the best result is gained by using the manual and the automatic processes together. First use the computer, then refine the result by hand.
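
The consensus question raised for the example alignment can be explored with a short Python sketch. It applies a simple majority rule per column, which is only one possible convention and is an assumption here (gaps are counted like any other symbol, and ties are resolved in favour of the symbol seen first). The rows are those of the example, with the spaces between the blocks removed.

```python
from collections import Counter

# Rows of the example alignment, block spacing removed.
alignment = [
    "TTGACATGCCGGGG---AAACCG",
    "TTGACATGCCGGTG--GTAAGCC",
    "TTGACATG-CTAGG---AACGCG",
    "TTGACATG-CTAGGGAACACGCG",
    "TTGACATC-CTCTG---AACGCG",
]


def consensus(rows):
    """Most frequent symbol in every column (majority rule)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def conserved(rows):
    """'*' under columns where all rows agree, ' ' elsewhere."""
    return "".join("*" if len(set(col)) == 1 else " " for col in zip(*rows))


for row in alignment:
    print(row)
print(conserved(alignment))
print(consensus(alignment))
```

Printing the conservation line under the rows makes it easy to check by eye which columns are identical in all sequences and which ones only have a majority symbol.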

Multiple sequence alignment
• Most widely used program for multiple sequence alignment: CLUSTAL W
– Downloadable local version: http://www.clustal.org
– WWW version: http://www.ebi.ac.uk/clustalw
– Uses the progressive alignment method
– Fast application with low memory usage
– Effective for aligning many sequences
– Able to draw simple phylogenetic trees

Multiple sequence alignment
• Exercise: Do the multiple alignment on the following sequences:
– Sequence file
http://align.bmr.kyushu-u.ac.jp/mafft/online/server/

Multiple sequence alignment
• Example:
– Examine the catalase sequences of the plants below:
• Paprika (Capsicum annuum)
• Tobacco (Nicotiana tabacum)
• Tomato (Solanum lycopersicum)
• Potato (Solanum tuberosum)
• Sequence file
• http://align.genome.jp/

Multiple sequence alignment
The result with the Jalview program.

Phylogenetics
• The study of evolutionary relatedness among various groups of organisms through molecular sequencing data and morphological data matrices.

Phylogenetics
• Relation of phylogenetic analysis and sequence alignments.
– Sequence alignment determines the similarities and differences of the aligned sequences.
– In the case of ortholog sequences: the differences between sequences from different species arise from the mutations accumulated during their separate evolution.
– The number of mutations, that is, the degree of difference between the sequences, is related to the evolutionary distance between the two species: the longer ago the two species separated, the larger the sequence difference between their ortholog genes.

Phylogenetics
• Steps of phylogenetic analysis:
– 1. Sequence alignment
– 2. Definition of the evolutionary model
– 3. Tree building
– 4. Examination of the tree(s)

Phylogenetics
• Example:
– Do the multiple sequence alignment with the sequence file below and draw the phylogenetic tree.
– Sequence file
http://align.bmr.kyushu-u.ac.jp/mafft/online/server/

Multiple sequence alignment
• 1. Sequence alignment

Multiple sequence alignment
• 4. Examination of the tree

Phylogenetics

What is it used for?
• Function prediction
• Finding relatives
• Identification

Exercise
– Look for resistance genes with known sequences. Do multiple sequence alignments on them. Evaluate the similarities and create a phylogenetic tree.
– What conclusions can be drawn from the result?
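
The link between sequence difference and evolutionary distance described above can be illustrated with the simplest distance measure, the proportion of differing positions between two aligned sequences (often called p-distance). The sketch below is written under clearly stated assumptions: the four aligned sequences and their names are hypothetical toy data, and columns containing a gap are simply skipped. Real analyses correct these raw values with an evolutionary model (step 2 above) before a tree is built.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions; columns with a gap in either row are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical toy alignment, not real data.
aligned = {
    "species_A": "ATGCCGTA-A",
    "species_B": "ATGCCGTAGA",
    "species_C": "ATGACGTTGA",
    "species_D": "TTGACCTTGA",
}

distances = {
    (p, q): p_distance(aligned[p], aligned[q])
    for p, q in combinations(aligned, 2)
}

# List the pairs from most similar to most different.
for (p, q), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{p} - {q}: {d:.2f}")
```

Pairs with small distances would end up close to each other in a tree, while pairs with large distances would be joined only near the root.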

Bioinformatics
Multiple sequence alignment
Protein sequence analysis

Contents
• Characteristics of proteins
• Protein sequence analysis
• Protein 3D models
• Practical task

Characteristics of proteins
Proteins are biochemical compounds consisting of one or more polypeptides, typically folded into a globular or fibrous form, facilitating a biological function.
Properties:
• 20 amino acids are coded by triplets
• Four types of bases: 4³ = 64 types of triplets, enough to code for 20 amino acids
• 61 triplets code for amino acids
• 3 stop signals (stop codons): UAA, UAG, UGA
• 1 codon signals the start of translation (start codon: AUG, methionine)
• One triplet codes for only one type of amino acid, but the same amino acid can be specified by several triplets (degeneracy)
• Synonymous triplets: triplets coding for the same amino acid
• The gene and the protein chain it codes for are colinear
• The code is non-overlapping
• The genetic code is practically universal among living organisms.

Characteristics of proteins
DNA nucleotide sequence → RNA nucleotide sequence → Protein amino acid sequence

Characteristics of proteins
The code table. (Griffiths et al., An Introduction to Genetic Analysis, 8th Ed. Fig. 9-8.)
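
The numbers listed among the properties of the genetic code (64 triplets, 61 coding triplets, 3 stop codons) can be checked with a small Python sketch. The table below is the standard genetic code written in a compact form, with the bases in the order U, C, A, G; the names `CODON_TABLE` and `translate` are assumptions for illustration. The translation function simply starts at the first base and stops at the first stop codon, which is a simplification.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; '*' marks a stop codon. Codons run UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate an mRNA string codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")

print(len(CODON_TABLE), "triplets,", len(CODON_TABLE) - len(stop_codons), "coding")
print("stop codons:", stop_codons)
print("synonymous triplets for leucine:", leucine_codons)
print(translate("AUGGCUUAA"))
```

The list of leucine codons is an example of degeneracy: several synonymous triplets specify the same amino acid.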

Groups of proteins on the basis of biological activity
• Enzymes (pepsin)
• Protective proteins (immunoglobulin)
• Transport proteins (hemoglobin, myoglobin, transferrin)
• Hormones (insulin, ACTH)
• Structural proteins (collagen, elastin, keratin)
• Toxins (snake venom)

Characteristics of protein sequences
• Protein synthesis video:
– http://www.youtube.com/watch?v=NJxobgkPEAo
• Protein structure video:
– http://www.youtube.com/watch?v=lijQ3a8yUYQ

Characteristics of protein sequences
• Structures:
– Primary structure: the order of the amino acids
• „MSASSSSALPPLVPALYRWK”
– Secondary structure: regularly repeating local spatial structures. The most common examples are:
• Alpha helix and beta sheet
– Tertiary structure: the overall shape of a single protein molecule; the spatial relationship of the secondary structures to one another.
– Quaternary structure: several protein molecules (polypeptide chains), usually called protein subunits in this context, which function as a single protein complex.

Characteristics of protein sequences
• Protein databases:
– Protein database: UniProt
http://www.uniprot.org/
– Protein structure database: Protein Data Bank
http://www.rcsb.org/pdb/home/home.do
– Protein interaction database: STRING
http://string.embl.de/

Protein sequence analysis
• Analysis of the primary structure
– First of all we examine the distribution and the physico-chemical properties of the amino acids.
• Example: HCA - Hydrophobic Cluster Analysis of an acetyltransferase protein sequence from Fusarium.
– http://mobyle.rpbs.univ-paris-diderot.fr/cgi-bin/portal.py?form=HCA

Protein sequence analysis

Protein sequence analysis
• Used for general similarity studies.
• We can gain information from general physical and chemical examinations.
• Determining the structure and the function of a protein from the DNA sequence is a big challenge for today's technology.
• For instance, several diseases could be defeated if we were able to solve this problem.

Protein sequence analysis
• Prediction according to dimension:
– 1D: amino acid properties that can be written as a 1D string. Examples: sequence, secondary structure, hydrophobicity
– 2D: distances and contacts between amino acid pairs
– 3D: conformation prediction on the basis of all atom coordinates

Protein sequence analysis
• Common analyses of protein structures:
– 3D structure visualization → we can see some important properties.
– 3D structure alignment → similar structure, similar function.
– 3D structure classification → lineage, similar function.
– 3D structure prediction → prediction of secondary, tertiary and quaternary structure.
– Docking of small molecules → drug candidate molecules for a molecule of known structure.
– Protein structure behavior → molecular dynamics simulation.

Protein 3D models
• In order to understand the function of a protein, we have to know its 3D structure and quaternary structure.
Then we are able to draw conclusions about its possibilities for binding to other molecules, proteins and enzymes.
• Let's see an example for an acetyltransferase protein sequence from Fusarium:
– http://www.rcsb.org/pdb/explore/explore.do?structureId=3FP0

Analysis of protein interactions
• Database of Interacting Proteins
– It collects data, based on experimental results, about proteins that interact (bind) with each other.
– It describes about 11 000 interactions of about 6200 proteins.
– Specific details of one interaction: one protein, the other protein, interacting regions, experimental methods, dissociation constant, references.
– Example: we can show how interaction network graphs are built (the nodes are clickable).

Analysis of protein interactions
• http://dip.doe-mbi.ucla.edu/

Exercise
– Look for the protein sequence and (if it exists) the 3D structure of an important causative agent in a sequence database of your choice. Examine the possible binding sites.

Bioinformatics
Genomes

Contents
• What is a genome?
• Genome projects
• Genome browser software
• What is it used for?
• Exercise

Definition of genome
• The genome: „The genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA.
The genome includes both the genes and the non-coding sequences of the DNA/RNA.” (Wikipedia)

Genomics: the study of genomes
• Genomics: encompasses a broader scope of scientific inquiry and associated technologies than when genomics was initially considered. A genome is the sum total of all an individual organism's genes. Thus, genomics is the study of all the genes of a cell, or tissue, at the DNA (genotype), mRNA (transcriptome), or protein (proteome) levels.
– Functional genomics: attempts to make use of the vast wealth of data produced by genomic projects
– Structural genomics: attempts to determine the structure of every protein encoded by the genome

Tools of genomics
• Microarray (or chip): a 2D array on a solid substrate (usually a glass slide or silicon thin-film cell) that assays large amounts of biological material using high-throughput screening methods.
• Types:
– DNA microarrays (oligonucleotides or cDNA)
– Protein microarrays
– Cellular microarrays
– Tissue microarrays
– Antibody microarrays

Genome projects
• In the last decade several genome projects have started around the world. The aim of these projects is to determine the complete genetic code of more and more organisms. As of August 2010, the genomes of nearly 2500 species were known entirely or partly.
• Important projects:
– http://genome.ucsc.edu

Genome projects
• The first successful genome project was that of Haemophilus influenzae in 1995, carried out by Fleischmann et al. The first plant genome was that of Arabidopsis thaliana in 2000. The complete sequencing of the human genome was finished in 2003.
• The tools of bioinformatics play an important role in the analytical work following genome sequencing. This is when the genetic code is assembled.

Genome browsers
• Through genome browsers we can view genomic data in a clear and well-arranged format.
• In some cases the genome is the starting point of genetic analysis.
• Some genome browsers:
– http://genome.ucsc.edu
– http://www.ensembl.org
– http://ecrbrowser.dcode.org
– http://www.ncbi.nlm.nih.gov/mapview/

Genome browsers
• Properties of the UCSC Genome Browser
– News and information on the start page
– Genomes grouped by class and genome
– Sequence assembly selectable by date (changes can be tracked)
– Searching by text

Position on chromosome
Chromosome number
Base position
Known genes
Mammalian conservation

Genome browsers
• Options under the graphical interface:
– Gene and gene prediction options
– mRNA and EST options
– Expression and regulation
– Comparison options
– Variations and duplicates

Genome browsers
• NCBI MapViewer properties
– Four-level system
• Start page
• Genome View
• Map View
• Sequence View
– Keyword searching
– Access to the genomes of 800 species

Genome browsers
• NCBI MapViewer

Exercise
– With the help of a genome browser, look for the genome of an important causative agent in a sequence database of your choice. Examine what kinds of genes occur in the neighbourhood of the one-millionth nucleotide on the second chromosome of this organism.
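
What a genome browser does when it shows the genes around a given position can be imitated with a simple interval query. The Python sketch below is an illustration under stated assumptions only: the gene names, chromosome labels and coordinates are hypothetical, the size of the neighbourhood (`flank`) is an arbitrary choice, and a real analysis would read the annotations from a downloaded file instead of a hand-written list.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # first base of the gene, 1-based
    end: int    # last base of the gene, inclusive


def genes_near(genes: List[Gene], chrom: str, position: int, flank: int = 50_000) -> List[Gene]:
    """Genes on `chrom` that overlap the window position - flank .. position + flank."""
    low, high = position - flank, position + flank
    return [g for g in genes if g.chrom == chrom and g.start <= high and g.end >= low]


# Hypothetical annotations, invented for this example.
annotations = [
    Gene("geneA", "chr2", 940_000, 948_500),
    Gene("geneB", "chr2", 995_200, 1_003_700),
    Gene("geneC", "chr2", 1_120_000, 1_131_000),
    Gene("geneD", "chr1", 998_000, 1_002_000),
]

hits = genes_near(annotations, "chr2", 1_000_000)
for gene in hits:
    print(gene.name, gene.start, gene.end)
```

Widening the window, for example with `flank=150_000`, returns more of the neighbouring genes, which corresponds to zooming out in the browser.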

Information Technology in Plant Protection
Prepared by: Dr. János Busznyák - Dr. Máté Csák - Sándor Nagy
Georgikon Faculty
PRESENTATION - CAN BE DOWNLOADED FROM: