Information Technology in Plant Protection Presentation

GIS tools for Plant Protection
• Prepared by:
– Dr. János Busznyák
TÁMOP-4.1.2.A/2-10/1-2010-0012
Digital Mapping Tools for Plant Protection
• Methods of Obtaining Spatial Data
– Manual
– Geodesy
– With the help of Global Positioning
– Photogrammetry
– Remote Sensing
– Manual Map Digitization
– Scanning Maps
– From Digital Files
Digital Map
• More than the digital copy of a map's contents made ready for
use on a computer.
• No need for segmentation; the elements are stored at real size, fit
together accurately, carry topology, and are often organised into
layers and objects.
• Primary Data Acquisition Methods
– Measurements (GPS)
– Existing Reports
Primary data acquisition methods mostly yield vector data.
• From Secondary Sources
– By digitization, followed by automatic or manual vectorization.
When secondary data collection includes georeferencing and
vectorization, the result is also a vector map. If secondary data
collection (scanning) is not followed by vectorization, the result is a
digital raster map.
Raster-Vector Transformation
• Aim
– New level of GIS analysis (vector)
– New publication possibilities
– Lower storage and transfer capacity needs
• Preparatory steps
– Digitization of map sheets
– Georeferencing, eliminating distortions, projection conversion
(lots of work)
• Pre-processing
• Vectorization
– Of areas
– Of line-like objects
– Of objects
• Post-processing
Vectorization II.
• Vectorization
– Manual
– Semi-automatic
– Automatic
Application of the Automatic Method
• Automatic vectorization
of a soil map
– Single bit
– Low data density
• Automatic vectorization
of a topographic map
– 8-bit
– High data density
Black: convert to line
Blue: segmented pixels
Data Input from Text File
• Coordinates of the shapefile vertex points
– site,lat,long,name,HOTLINK
– 1,38.889,-77.035,Washington Monument,http://www.nps.gov/wamo
– 2,38.889,-77.050,Lincoln Memorial,c:/ESRI/AEJEE/DATA/WASHDC/linc.jpg
– 3,38.898,-77.036,White House,c:/ESRI/AEJEE/DATA/WASHDC/whse.txt
– 4,38.889,-77.009,Capitol,c:/ESRI/AEJEE/DATA/WASHDC/cap.pdf
Source: ESRI ArcExplorer JEE tutorial
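A text file of this kind can be read with a few lines of code. The sketch below is only an illustration in Python, not part of the ArcExplorer tutorial: the function name read_points and the two retyped sample rows are assumptions made for the example.

```python
import csv
import io

# Two rows retyped from the slide; the header names follow the slide.
SAMPLE = """site,lat,long,name,HOTLINK
1,38.889,-77.035,Washington Monument,http://www.nps.gov/wamo
2,38.889,-77.050,Lincoln Memorial,c:/ESRI/AEJEE/DATA/WASHDC/linc.jpg
"""


def read_points(text):
    """Parse comma-separated point records into a list of dictionaries."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        points.append({
            "site": int(row["site"]),
            "lat": float(row["lat"]),    # latitude in decimal degrees
            "lon": float(row["long"]),   # longitude in decimal degrees
            "name": row["name"],
            "hotlink": row["HOTLINK"],   # URL or file path shown on click
        })
    return points


if __name__ == "__main__":
    for p in read_points(SAMPLE):
        print(p["site"], p["name"], p["lat"], p["lon"])
```

Each record becomes one point feature; the HOTLINK column is kept as plain text because it may hold either a URL or a local path.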
Hybrid Data Model, Mashup Map
• With the help of hybrid systems, raster and vector data can
be used together.
– Vector, raster and attribute data are stored separately, in
the most suitable way for the model.
– The operations are carried out by these systems in the
model that is most suitable for the operation in question.
– The systems apply a wide variety of vector-raster
transformations before and after the operations.
– The GoogleMaps service is based on a hybrid data model.
Data Quality
• Factors that most influence data quality:
– Origin of data
– Geometric accuracy
– Accuracy of attribute data
– Consistency of attribute data
– Topological consistency
– Completeness and validity of data
Georeferencing
• Georeferencing is the process of scaling, rotating, translating
and deskewing the image to match a particular size and
position.
The word was originally used to describe the process of
referencing a map image to a geographic location. Source:
http://wintopo.com/help/html/georef.htm
• Usual ways:
– World file
– Header (GeoTIFF, GeoJP2…)
Header
• Certain image formats include georeferencing information in
the header of the image file:
– img,
– bsq,
– bil,
– bip,
– EXIF
– ITT
– GeoTIFF
– grid
World File
• Georeferencing information is stored in a separate world file:
– The world file contains the 6 parameters of an affine
transformation that links the image coordinate system to
the world coordinate system.
– The images are stored as raster data, where each cell of the
image is identified by a row and column number.
– The world file must have the same base name as the image
file and be in the same folder.
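The six-parameter affine transformation can be sketched in a few lines. The example below assumes the common ESRI world file layout, in which the six lines hold A, D, B, E, C, F so that x = A·col + B·row + C and y = D·col + E·row + F. The function names and the sample values are made up for illustration.

```python
def parse_world_file(text):
    """Read the six parameters in world file order: A, D, B, E, C, F."""
    values = [float(v) for v in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly 6 numbers")
    return tuple(values)


def pixel_to_world(params, col, row):
    """Apply the affine transformation to a column/row position."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c   # A: pixel width, B: rotation term, C: x of upper-left pixel centre
    y = d * col + e * row + f   # D: rotation term, E: pixel height (negative), F: y of upper-left pixel centre
    return x, y


if __name__ == "__main__":
    # Hypothetical 2 m pixels, no rotation, upper-left pixel centre at (500000, 150000).
    sample = "2.0\n0.0\n0.0\n-2.0\n500000.0\n150000.0\n"
    print(pixel_to_world(parse_world_file(sample), 10, 20))
```

The negative E value reflects that row numbers grow downwards while the world y coordinate grows upwards.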
Georeferencing with the Help of 2 Reference Points
Graphic Georeferencing - Rubber sheeting
Projection Systems, Conversion
• Projection, datum
• Geoid, geoid undulation
• Uniform National Projection (UNP - EOV)
• Transformation
• Base points, base point systems
Classification of Projection
• Based on the shape of the image surface
– Cylindrical projection
– Conic projection
– Azimuthal (planar) projection
– Other projections
• Based on the axis of the image surface
– Polar (normal)
– Transverse (equatorial)
– Oblique (neither normal nor transverse)
• Based on the contact of the image and base surface
– Tangent
– Secant
Important Projection Systems
• Systems without projection
• Dual projection Hungarian systems
• Stereographic projection systems (BUDAPESTI,
MAROSVÁSÁRHELYI)
• Oblique Mercator Projection
• HÉR, HKR, HDR
• EOV
• Gauss-Krüger
• UTM (Universal Transverse Mercator)
• GEOREF (World Geographic Reference System)
Important Ellipsoids
• Reference ellipsoids approximate an area of the Earth’s surface
• The centre of the ellipsoid is that of the Earth
• The axis of rotation is that of the Earth
– Parameters
• Semi-major axis (equatorial radius)
• Flattening (relation between the equatorial and polar radii)
• If the centre of the ellipsoid is moved until it fits the
examined area with the least error, we get the geodetic
datum.
– Bessel (stereographic)
– Krasovsky (Gauss-Krüger)
– Hayford (UTM)
– WGS-84 (GPS)
– IUGG-67 (EOV)
Some Interesting Projections
• Geographic Projection
– WGS 1984 Datum
• Orthographic Projection
– SPHERE Datum
• Eckert IV. Projection
– WGS 1984 Datum
Geoid Undulation
• GPS measurement gives the height above the ellipsoid (h). When
calculating the height above sea level (H), geoid undulation has
to be taken into consideration.
• Geoid undulation is the separation between the equipotential
surface that represents the mean ocean surface and a reference
ellipsoid (h = H + N, where N is the geoid undulation at the
point).
• Geoid: the surface of the oceans and seas, if they were connected
by small canals under the land (Listing, 1873).
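The relation h = H + N can be rearranged to obtain the height above sea level from a GPS height. The short Python sketch below only illustrates this arithmetic; the function name and the numeric values are invented for the example and are not measured data.

```python
def orthometric_height(h_ellipsoidal, n_undulation):
    """Return H (height above sea level) from h = H + N, i.e. H = h - N."""
    return h_ellipsoidal - n_undulation


if __name__ == "__main__":
    h = 150.0   # hypothetical ellipsoidal height from GPS, in metres
    n = 44.5    # hypothetical geoid undulation at the point, in metres
    print(orthometric_height(h, n))
```

In practice N comes from a geoid model or from transformation software, so the accuracy of H depends on the quality of that model.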
Uniform National Projection
• The origin of the coordinates has been shifted 200 km to the
South and 650 km to the West. Thus, the X coordinates are always
lower than 400 km and the Y coordinates are always higher than
400 km, which makes them easy to distinguish.
Uniform National Elevation Network (EOMA)
• The first levelling of Hungary was carried out in 1873-1913,
based on the Mediterranean (Adriatic) base level.
– Height of the Nadap main base point: 173.8385 m.
• Baltic base level after World War II.
– Height of the Nadap main base point: 173.1638 m,
which is 0.6747 m lower.
Transformation
• ETRS89 (OGPSH) points
transformed into the Uniform
National Projection (EOV)
system and back
• The points for the
transformation are chosen
automatically
• Local transformation based on
the common points of the
OGPSH and EOV systems
• With 8 common points in
Hungary
• With refined Geoidundulation
data
EHT: Etrs89-Eov Hivatalos Helyi Térbeli Transzformáció (official local
spatial transformation between ETRS89 and EOV)
Base Points
• Database of Altitudinal Base Points
• Database of Horizontal Base Points
• Database of OGPS Base Points
• Points of the National GPS Network (Országos GPS Hálózat,
OGPSH)
Videos and Animations for Chapter 1.
• Video
– Georeferencing (graphical)
• Animation
– Georeferencing
– Geoidundulation
– Shape (create)
Tasks for Chapter 1.
I. question
Identify the value of the geoid undulation at the Parliament Building,
Budapest, Hungary with the help of the EHT (or any other) software.
II. question
Digitize any map sheet with the help of a scanner. Georeference it with
3 reference points with the help of the GEOREGARCVIEW software.
The necessary coordinates can be obtained from map servers (e.g.
Google Maps).
III. question
Digitize another map sheet overlapping the previous one with the
help of a scanner. Georeference it with 3 reference points with the help
of the GEOREGARCVIEW software. Open it together with the georeferenced
file of the previous task in ArcExplorer JEE (or any other software) and
check its accuracy.
The necessary coordinates can be obtained from map servers (e.g.
Google Maps).
GNSS Device System
• Global Positioning
– The coordinates of 3
satellites at a given time
are needed.
– If time can be measured
accurately, the signal
propagation speed and the
travel time give the
distance to the
satellite.
– With 1 satellite, the possible
positions form the surface of a sphere.
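The distance calculation described here is a single multiplication. The Python sketch below is a simplified illustration that assumes the signal travels at the speed of light in vacuum and ignores clock and atmospheric errors; the function name and the sample travel time are made up.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in vacuum


def range_from_travel_time(travel_time_s):
    """Distance to the satellite from the measured signal travel time."""
    return SPEED_OF_LIGHT * travel_time_s


if __name__ == "__main__":
    # A hypothetical travel time of 0.07 s corresponds to roughly 21 000 km.
    print(round(range_from_travel_time(0.07) / 1000.0), "km")
    # A timing error of one microsecond already shifts the range by about 300 m.
    print(round(range_from_travel_time(1e-6), 1), "m")
```

The second line of output shows why accurate time measurement matters so much for positioning.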
Global Positioning II.
• If there is a connection with 2 satellites, we are on the
spheres of both satellites. The intersection of the two
spheres is a circle.
• The intersection of the sphere of the third satellite and
the circle is two points, one of which can always be
excluded (e.g. points far from the Earth’s surface).
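The intersection of three spheres can be computed in closed form. The following Python sketch is an idealised illustration with exact ranges and made-up satellite coordinates; real receivers also have to solve for their clock error, which is not modelled here. The function name trilaterate is an assumption of this example.

```python
import math


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Return the two points lying at distances r1, r2, r3 from p1, p2, p3."""
    sub = lambda a, b: tuple(x - y for x, y in zip(a, b))
    add = lambda a, b: tuple(x + y for x, y in zip(a, b))
    scale = lambda a, s: tuple(x * s for x in a)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    norm = lambda a: math.sqrt(dot(a, a))

    # Local frame: origin at p1, x axis towards p2, p3 in the x-y plane.
    d = norm(sub(p2, p1))
    ex = scale(sub(p2, p1), 1.0 / d)
    i = dot(ex, sub(p3, p1))
    ey_raw = sub(sub(p3, p1), scale(ex, i))
    ey = scale(ey_raw, 1.0 / norm(ey_raw))
    ez = (ex[1] * ey[2] - ex[2] * ey[1],
          ex[2] * ey[0] - ex[0] * ey[2],
          ex[0] * ey[1] - ex[1] * ey[0])
    j = dot(ey, sub(p3, p1))

    # Two spheres give a circle (fixed x); the third sphere fixes y and +/- z.
    x = (r1 ** 2 - r2 ** 2 + d ** 2) / (2.0 * d)
    y = (r1 ** 2 - r3 ** 2 + i ** 2 + j ** 2) / (2.0 * j) - (i / j) * x
    z_squared = r1 ** 2 - x ** 2 - y ** 2
    if z_squared < 0:
        raise ValueError("the three spheres do not intersect")
    z = math.sqrt(z_squared)

    base = add(p1, add(scale(ex, x), scale(ey, y)))
    return add(base, scale(ez, z)), sub(base, scale(ez, z))


if __name__ == "__main__":
    sats = [(0.0, 0.0, 20000.0), (15000.0, 0.0, 18000.0), (0.0, 15000.0, 18000.0)]
    receiver = (1000.0, 2000.0, 100.0)           # hypothetical true position, km
    ranges = [math.dist(s, receiver) for s in sats]
    a, b = trilaterate(sats[0], ranges[0], sats[1], ranges[1], sats[2], ranges[2])
    # Keep the candidate closer to the origin; the other lies far out in space.
    print(min((a, b), key=lambda p: math.dist(p, (0.0, 0.0, 0.0))))
```

The two returned candidates are mirror images across the plane of the three satellites, which is why one of them can be discarded.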
Differential Correction
Network RTK in Hungary (2010)
• GNSSNet
• NtripCaster IP address,
port: 84.206.45.44:2101
Multi-Base System in Hungary (2010)
• Geotrade GNSS
– Host:
www.geotradegnss.hu
– Port: 2101
Single-Base System (2010)
• Georgikon RTK coverage
• DGPS for the whole
country of Hungary
– http://gnss.georgikon.hu
– 193.224.81.88:2101
Trimble European VRS System
Mobile Internet
• CSD (Circuit Switched Data)
– Circuit-switched mobile internet - 9.6 kbit/s - 1G
• GPRS (General Packet Radio Service)
– Packet-switched - 115 kbit/s - 2G
• EDGE (Enhanced Data Rates for GSM Evolution)
– GPRS enhancement - 236 kbit/s (112-400) - 2.5G
• 3G
– 3G mobile network, video calls - 384 kbit/s - 3G
• HSPA (High-Speed Downlink/Uplink Packet Access)
– HSDPA theoretical data transfer speed, depending on device
and coverage: up to 21 Mbit/s - 3.5G
• 4G LTE (Long Term Evolution)
– 1 Gbit/s - 4G
Videos and Animations for Chapter 2.
• Video
– Trimble VRS system
• Animation
– GNSSNet service
– Geotrade GNSS
– Georgikon GNSS Base
Tasks for Chapter 2.
I. question
Find the data of the accessible satellites of the Galileo and
BeiDou systems at a given time.
II. question
Find the ground control stations of the Navstar GPS system
at a given time.
III. question
Find the worst measurement site on the Earth’s surface
with regard to the state of the ionosphere at the current time. Use the
‘space weather forecast’ of Australia (or any other information
source).
http://www.ips.gov.au/Space_Weather
Field GNSS Measurement and Processing
• GNSS Measurement
– Planning (almanac)
– Execution (with online correction, processing as well)
– Data transfer (exchange formats, RINEX - Receiver
Independent Exchange Format)
– Processing (vectors, transformation, error correction)
– Network adjustment (OGPSH – National GPS Network)
Aim of GNSS Measurement Planning
• Guarantee of integrity
– GNSS
– Way of correction
• Guarantee of needed accuracy
– Accuracy of the Rover device
– Way of correction
– Satellite constellation
– Minimisation of other disturbing factors
Tools for Planning
• GNSS satellite data
– Almanac
• Trimble Planning
• Leica Satellite Availability
• Topcon Occupation Planning
• Receiving correction data
– Mobile internet
• GPRS coverage
• Style, devices, realization
Almanac
• Timing
– Forward in time
– Back in time
• General
– YUMA format, US Coast Guard Navigation Center
– The connection between date and GPS week in the GPS
calendar
• Trimble
• Leica
• Topcon
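The connection between a calendar date and the GPS week mentioned above can be sketched as follows. GPS weeks are counted from 6 January 1980; the Python example below works on whole dates only and ignores leap seconds, and the function name is an assumption made for this illustration.

```python
from datetime import date

GPS_EPOCH = date(1980, 1, 6)  # start of GPS week 0 (a Sunday)


def gps_week_and_day(d):
    """Return (GPS week number, day of week with 0 = Sunday) for a date."""
    days = (d - GPS_EPOCH).days
    if days < 0:
        raise ValueError("date is before the GPS epoch")
    return divmod(days, 7)


if __name__ == "__main__":
    print(gps_week_and_day(date(2008, 12, 21)))
```

The returned week is the continuous count since 1980; some file formats store a truncated week number, so check the format description before comparing values.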
Trimble Planning
Channels of Correction Data
• Relative
– Real time
– Radio
– Satellite
– Internet
• Post-processed
– Digital data transfer
Realisation of Measurement
• Connection to satellites, controller
• Connection to correction service
• Setting measurement style
• Starting measurement
• Recording data
Preparation of Measurement
• Obtain, check and convert
existing spatial data
• Set up a measurement plan
– Need for accuracy
– Available devices and services
– Specialities of the area
– Select measurement method
• Places of measurement
• Conversion to the format of
the field device
• Upload data to the field
device
End of Measurement
• Check measurement data
• Inspection
• Delete, edit
• New recording
• Data
• Export in needed formats
• Turn off field device
Processing Data
• Load data from field device
– Formats
– Give coordinate system and datum
– Examine data load mistakes
– Inspection
– Delete, edit
• Export to the processing format
GIS Processing and Analysis of Data
• Upload data to GIS system
– Conversions
– Analyses
– Interpolations
– Model building
– Simulation
– Statistical analysis
– Publication
• Online correction
– Processing
• Offline correction
– Time of measurement
– Obtain correction data
– Correction
– Check
Checking Transformation
• EHT software
– Data input
• From file
• Via keyboard
– Set format of data input
– Set data conversion direction
– Give coordinates
Typical Field Device System
• Data collector
– Navigation accuracy
• ArcPad / palmtop with GPS antenna
– GPS accuracy
• GPS Pathfinder Office / Trimble GeoXH
– Geodetic accuracy
• Trimble Survey Controller / Trimble 5800
• Data processing
– GPS Analyst
– GPS Pathfinder Office
– Trimble Geomatics Office
– ArcGIS
Description of Continuous Topographic GPS Survey
Sample
• Aim of survey: automatic data collection for 3D relief model
• Place of survey: the island of Kányavári, Hungary
• Time of survey: 21 December 2008, 09:20-15:30
• Type of survey: RTK; Format of message transfer: CMR+
• PDOP mask: 6, elevation cutoff: 10 degrees, antenna: Trimble 5800,
antenna height: 2 m
• Coordinate System: Hungary; Zone: Hungarian EOV
• Project Datum: HD72 (Hungary)
• Vertical Datum: Geoid Model EGM96 (Global)
• Coordinate Units: Meters; Distance Units: Meters; Height Units: Meters
• Name of point | DeltaX | DeltaY | DeltaZ | Slope Distance | RMS
• 25001 | 13189.539 m | 1880.080 m | 11396.001 m | 17531.898 m | 0.002 m
• Name of point | X | Y | H
• 25001 | 142686.277 | 505893.164 | 109.042
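The slope distance reported for point 25001 is the length of the baseline vector given by DeltaX, DeltaY and DeltaZ. The Python sketch below simply checks this with the values from the sample; the function name is chosen for the illustration.

```python
import math


def slope_distance(dx, dy, dz):
    """Length of the 3D baseline vector (dx, dy, dz), in the same unit."""
    return math.sqrt(dx * dx + dy * dy + dz * dz)


if __name__ == "__main__":
    # Baseline components of point 25001 from the survey sample, in metres.
    print(round(slope_distance(13189.539, 1880.080, 11396.001), 3))
```

The result agrees with the 17531.898 m slope distance in the sample to within a few millimetres, the residual coming from rounding of the listed components.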
Basic GPS Elements of Precision Farming
• Sampling
• Yield mapping
• Sensors
• Auto pilot system
• Mass flow or sprayer control
• Row control
• Seeder control
Precision Management System (IKR)
• 1. GPS survey of field blocks, soil sampling plan
• 2. Take soil samples according to plan every 3-5 acres
• 3. Soil examination (extended and holistic)
• 4. Make nutrient content maps
• 5. Information, services for professional advice, analyses
• 6. Agrochemical service
• 7. Differentiated fertiliser plan
• 8. Differentiated nutrient output, plant number plan
• 9. Seeding with base station
• 10. Precision herbicide plan (based on Hu, KA, pH map and weed
uptake)
• 11. Fertiliser quantity, upload into professional advice system
• 12. Download data from the Internet
Precision Management System
• IKR
Evaluation of Tillage Experiments
• Spreadsheet
Evaluation of Tillage Experiments II.
• Spreadsheet
• GIS software (weed density)
• GIS software (weed density)
3D Model
Videos and Animations for Chapter 3.
• Video
– GNSSNet OGPSH
• Animation
Tasks for Chapter 3.
I. question
Create a forecast for tomorrow at 12:00 and 12:15, above a 10 degree
elevation cutoff, for the area of the Helikon strand, Keszthely, Hungary
(φ = 46 degrees 45 minutes, λ = 17 degrees 15 minutes, h = 150 m).
GDOP=
PDOP=
HDOP=
VDOP=
TDOP=
Number of GPS satellites =
Number of Glonass satellites=
Number of Galileo satellites=
Number of Compass satellites=
II. question
In the IKR precision management system, which service(s) can use GNSS
base correction data?
III. question
Is soil sampling in the IKR precision management system carried out using
a yield map or a grid?
Remote Sensing Device System, 3D Modelling
• Remote Sensing
– With the help of remote sensing, objects can be examined that
are not in a direct connection with the sensor.
– In a narrow sense, the concept of remote sensing is usually used
for aerial and space images. In a wider sense, it can also cover,
for example, remote measurements or medical applications.
– Remote sensing is the acquisition of information about an object
or phenomenon, without making physical contact with the
object. In modern usage, the term generally refers to the use of
aerial sensor technologies to detect and classify objects on Earth
(both on the surface, and in the atmosphere and oceans) by
means of propagated signals (e.g. electromagnetic radiation
emitted from aircraft or satellites).
Characteristics of Remote Sensing
• The measurement does not influence the examined object,
or change its state.
• It can be used at wavelengths outside the visible range. The
result can be examined in the visible spectrum.
• Objective, exact data can be obtained.
• Spatial, multi-dimensional data can be obtained.
• Large amounts of data can be obtained from big areas in a short time.
• Areas that cannot be reached or examined with other
methods can be examined.
Classification of Sensors
• Active sensors
– sense the reflection of their own radiation
• Passive sensors
– have no emission
• One or more wavelength ranges
• Images with more than one band are called (depending on the
number of bands) multispectral or hyperspectral.
Information from Sensors
• Geometric
– pixel: the ground area covered by one point of the image, i.e.
its real extent on the Earth’s surface.
• Spectral
– the value of radiation from the object
• Radiometric
– characterises the colour depth of the pixels
• Temporal
– the time interval between the images
Electromagnetic Spectrum
• Wavelength, frequency
– Visible light (0.4 - 0.7 µm)
– Infrared (above 0.7 µm)
– Ultraviolet (below 0.4 µm)
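Wavelength and frequency are linked by the speed of light (frequency = c / wavelength). The Python sketch below illustrates this relation together with the three ranges named on the slide; the function names are made up for the example, and the ranges are simplified to the limits given above.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in vacuum


def frequency_hz(wavelength_um):
    """Frequency in hertz for a wavelength given in micrometres."""
    return SPEED_OF_LIGHT / (wavelength_um * 1e-6)


def spectral_range(wavelength_um):
    """Classify a wavelength using the limits listed on the slide."""
    if wavelength_um < 0.4:
        return "ultraviolet"
    if wavelength_um <= 0.7:
        return "visible"
    return "infrared"


if __name__ == "__main__":
    for wl in (0.3, 0.55, 0.9):
        print(wl, "µm:", spectral_range(wl), f"{frequency_hz(wl):.3e} Hz")
```

Shorter wavelengths correspond to higher frequencies, so ultraviolet light has a higher frequency than visible or infrared light.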
Atmospheric Effects
• Scatter - Multi path scattering
• Absorption
– Influencing factors
• Travelled distance
• Radiation energy
• Composition of the atmosphere
• Size of particles
• Wavelength
Visible and Infrared Range
• Chlorophyll absorbs the energy of wavelengths between
0.45 and 0.67 µm, mostly blue and red light, so a
healthy plant looks green.
• In an unhealthy plant, a decrease in chlorophyll increases red
reflection, which together with the green makes the plant look
yellow.
• Reflection in the range 0.7 to 1.3 µm depends strongly on
leaf structure (species-specific) and increases dramatically.
• Effect of stratification, water absorption bands above 1.3 µm.
• Above 1.3 µm, reflection is inversely proportional to the
total water content of the leaf.
Visible and Infrared Range II.
• The reflection curves of
plant species are
identifiable.
• Image correction
(atmospheric
distortion)
• Sample points
• Spectrum
Spectral Bands and Resolution of Landsat TM
• TM 1: 0.45 – 0.52 µm (blue), 30 m
• TM 2: 0.52 – 0.60 µm (green), 30 m
• TM 3: 0.63 – 0.69 µm (red), 30 m
• TM 4: 0.76 – 0.90 µm (near infrared), 30 m
• TM 5: 1.55 – 1.75 µm (medium infrared), 30 m
• TM 6: 10.42 – 12.50 µm (thermal infrared), 120 m
• TM 7: 2.08 – 2.35 µm (middle infrared), 30 m
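The band limits of this slide can be kept in a small lookup table, for example to find which band records a given wavelength. The Python sketch below uses the values from the slide; the dictionary and function names are assumptions made for the illustration.

```python
# Band limits in micrometres and ground resolution in metres, from the slide.
TM_BANDS = {
    "TM 1": (0.45, 0.52, 30),
    "TM 2": (0.52, 0.60, 30),
    "TM 3": (0.63, 0.69, 30),
    "TM 4": (0.76, 0.90, 30),
    "TM 5": (1.55, 1.75, 30),
    "TM 6": (10.42, 12.50, 120),
    "TM 7": (2.08, 2.35, 30),
}


def bands_covering(wavelength_um):
    """Return the names of the bands whose range contains the wavelength."""
    return [name for name, (low, high, _res) in TM_BANDS.items()
            if low <= wavelength_um <= high]


if __name__ == "__main__":
    print(bands_covering(0.65))   # a red wavelength
    print(bands_covering(1.0))    # falls into a gap between bands
```

Wavelengths that fall between two bands return an empty list, which reflects the gaps in the sensor's spectral coverage.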
Planned Objects of Satellite Sensing
• ASPRS (ASPRS satellite
database)
Hyperspectral Imaging in Hungary
• 2002: DLR DAIS, 79-band system
• 2006: an aerial data collection service using the AISA DUAL
hyperspectral camera was launched by the University
of Debrecen (Hungary) and the Ministry of Rural
Development.
– Senses in a maximum of 498 bands, at wavelengths of
0.45–2.45 micrometres.
LANDSAT 5 TM
• National Aeronautics and Space Administration (NASA) and
U.S. Geological Survey (USGS) (1999)
• Images in 7 bands (6 bands at 30 m, thermal infrared at 120 m ground
resolution)
• Sun-synchronous orbit (the satellite passes over a given site
at the same local time)
• Orbits at an altitude of 705 km
• Can take images of an area of 185x170 km every 16 days
Application of Landsat Images
• TM 1 (0.45 – 0.52 µm): differentiation of land from plants,
mapping of artificial surfaces.
• TM 2 (0.52 – 0.60 µm): mapping plant cover, identification of
artificial surfaces.
• TM 3 (0.63 – 0.69 µm): differentiation of planted surfaces from
plantless surfaces, identification of artificial surfaces.
• TM 4 (0.76 – 0.90 µm): identification of plant species, determination
of green mass, survey of plant vitality, mapping water surfaces,
mapping soil water content.
• TM 5 (1.55 – 1.75 µm): examination of soil and plant water
content, differentiation of clouds from snow cover.
• TM 6 (10.42 – 12.50 µm): mapping heat emission (plant
stress, heat pollution)
• TM 7 (2.08 – 2.35 µm): differentiation between rock types,
mapping plant water content
Orthophoto
• Imaging: central perspective
• Photogrammetry: determines the extent of real objects from
the sizes measured on the image
– The resulting orthophoto (georeferenced image data of the Earth’s
surface obtained by satellite or aerial data collectors) can be
used comprehensively with GPS systems
– During the planning and realisation of imaging, a GPS
device system and adequate relief data are needed.
Photogrammetry
• Photogrammetric evaluation is based on stereoscopy, using the
perspective mapping between aerial and space images taken
with central projection.
– The essence of stereoscopy is that given terrain objects are
mapped in different ways in images from different sources.
The task of photogrammetry is to measure the difference
between parallaxes, and calculate spatial coordinates.
Remote Sensing Data in Agriculture
• Differentiation of types of vegetation
• Cover and yield
• Calculation
• Productivity of biomass
• Vitality and disease of flora
• State of soil
– IMG files
• View
• Select bands
• Colour bands
– Erdas ViewFinder 2.1
– http://rst.gsfc.nasa.gov/Front/overview.html
– FÖMI tutorial (Institute of Geodesy, Cartography
and Remote Sensing, Hungary)
Application of 3D Models
• Model of objects
• Relief model
• Terrain model
• Elevation model
– A digital elevation model (DEM) is the topographic
representation of the Earth’s surface. It is usually used for relief
maps, 3D visualisation, water-flow modelling and aerial image
correction. It is built from remote sensing data or traditional
land surveying data.
– Raster based elevation model
– Vector based elevation model
Raster and Vector Models
• Raster model: source elevation data form regular grid cells. The
size of the cell is constant within the model. The height of the
relevant geographic area is considered constant within a grid cell.
• Vector model: divides space into non-overlapping triangles.
– The vertices of every triangle are data points with x, y, z values.
– The points are connected with lines, which gives Delaunay triangles.
– A TIN (Triangulated Irregular Network) is a graph that keeps the
topological relations between its elements (nodes, edges and
triangles).
– Input data fit directly into the model.
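The difference between the two models shows up when a height is queried. The Python sketch below is a simplified illustration: the raster lookup returns the constant value of the cell, while the TIN lookup interpolates linearly inside one triangle. The function names, the grid origin convention (row 0 at the lowest y) and all sample values are assumptions of this example.

```python
def grid_height(grid, x0, y0, cell_size, x, y):
    """Raster model: the height is constant within each grid cell."""
    col = int((x - x0) // cell_size)
    row = int((y - y0) // cell_size)   # assumes row 0 starts at y0
    return grid[row][col]


def tin_height(triangle, x, y):
    """Vector model: linear interpolation inside one TIN triangle."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


if __name__ == "__main__":
    grid = [[100.0, 101.0], [102.0, 103.0]]          # hypothetical 2 x 2 DEM
    print(grid_height(grid, 0.0, 0.0, 5.0, 7.0, 2.0))
    tri = ((0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0))
    print(tin_height(tri, 2.0, 3.0))
```

At a triangle vertex the TIN returns the measured value itself, which is the sense in which input data fit directly into the model.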
Global Relief Model
• SRTM (Shuttle Radar
Topography Mission 2000)
program
– Digital relief of about 80%
of the Earth’s surface, acquired with
a radar system
(Endeavour, 11 days)
– Radar interferometry, with
two receivers 60 m from
one another
– Mapped area: from 60 degrees
North to 57 degrees South
– Resolution: 3 arc seconds (1 arc second for the USA)
Global Relief Model II.
• TanDEM-X 2010, (TerraSAR-X)
– Mapping of the whole surface of the Earth
– Horizontal resolution: 12 m, vertical resolution: 2 m.
– Two-radar remote sensing satellite with stereo microwave
radar device, at an altitude of 514 km
– Polar sun-synchronous orbit
– Using the synthetic-aperture radar (SAR) technique, radio waves
emitted from a satellite and reflected from the surface are
received by the antenna on the satellite, or the same surface is
imaged from two different points.
3D Relief Model
• The digital relief model
of Hungary, 5m
resolution
– 1:10 000 scale EOTR
database was used
– A GRID derived from
vectorized contour lines.
3D Relief Map
• Generated from several
sources
– Contour-line digitization
– Digitization of elevation
points
– Import GPS survey points
– Correction (aerial photo)
– Model generation
– Publication
• Generation from direct
GNSS measurement
Videos and animations for Chapter 4.
• Video
• Animation
– Elevation Model
Tasks for Chapter 4.
I. question
Find an aerial image of your place of residence from internet sources.
II. question
Find a space image of your place of residence from internet
sources.
III. question
Measure the area of Kányavári Island (Kányavári-sziget),
Hungary on the photos of 1990, 1992 and 2002. Use Erdas
ViewFinder (or any other IMG viewer). The images can be found
on the remote sensing tutorial website of FÖMI:
http://www.fomi.hu/taverzekeles_oktatoanyag
Spatial Data Databases
• Types of Webmaps
– Static webmaps
– Dynamically created webmaps
– Animated webmaps
– Personalized webmaps
– Open, reusable webmaps
– Interactive webmaps
– Webmaps suitable for analysis
– Collaborative webmaps
Types of Webmaps II.
• Static webmaps
– No animation and interactivity
– Only created once, infrequently updated
– Mostly scanned paper based maps
• Dynamically created webmaps
– Created on demand, often from dynamic data sources
– Created by server (ArcIMS –ArcSDE)
– WMS protocol
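A dynamically created map is typically requested through a WMS GetMap call. The Python sketch below only builds such a request URL using the standard WMS 1.1.1 parameter names; the server address, the layer name and the bounding box are hypothetical placeholders, not a real service.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap request URL (no request is sent)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                                  # default style
        "SRS": srs,                                    # coordinate reference system
        "BBOX": ",".join(str(v) for v in bbox),        # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical server and layer; the bounding box roughly covers Hungary.
    print(wms_getmap_url("http://example.org/wms", "soil",
                         (16.0, 45.7, 23.0, 48.6), 800, 400))
```

The server renders the requested extent on demand and returns an image, which is what distinguishes this type from a static, pre-scanned webmap.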
Types of Webmaps III.
• Animated webmaps
– Show changes in the map over time (water currents, wind
patterns, traffic info)
– Real time, data from sensors
– Updated regularly or on demand
• Personalized webmaps
– Allow user to apply own data filtering, selective content
– Personal styling and symbolization
– OGC SLD WMS uniform system (Styled Layer Descriptor)
Types of Webmaps IV.
• Open, reusable webmaps
– Complex systems, open API (Google Maps, Yahoo Maps,
Bing Maps)
– APIs compatible with Open Geospatial Consortium (OGC) and W3C
standards
• Interactive webmaps
– Changeable parameters
– Easy navigation
– Events, descriptions, DOM manipulations
Types of Webmaps V
• Analytic webmaps
– Offer GIS-analysis
• Geodata uploaded by user
• Geodata provided by server
• Analysis is carried out by a serverside GIS, results of analysis
are displayed by the client.
• Collaborative webmaps
– Geometric features being edited by one person cannot be
changed by anyone else at the same time.
– Quality check is needed before publication
(OpenStreetMap, Google Earth, WikiMapia…).
‘FÖMI’
• ‘Institute of Geodesy,
Cartography and
Remote Sensing’,
Hungary
• Important databases of
the Institute of Geodesy,
Cartography and
Remote Sensing
(Földmérési és
Távérzékelési Intézet)
Hungarian National Rural Network (AIR)
• To continuously inform farmers and experts, to provide
professional background knowledge for tenders and
developments.
• Its knowledge base consists of professional news, events,
articles, studies and publications, published in an organised,
regularly updated system.
• A further aim of the site is to prepare for online data services
(logbooks, electronic submission of data by farmers working
in vulnerable areas), to provide information on data related to
agri-environmental management, to publish relevant
thematic maps and to provide agricultural forecasts.
AIR Public Map Library
• 1:200 000 scale
genetic soil map of
Hungary
• 40 soil types, 80 sub
types, shown with colours and
colour shades
• Physical soil types (9
categories), shown with striping
• Soil-forming rock (28
categories), shown with letter codes
MARS (Monitoring Agriculture by Remote Sensing)
crop yield forecasting system
• Obtain, process and store weather data
• Apply weather data in the agrometeorological model of the
Crop Growth Monitoring System (CGMS)
• Process NOAA-AVHRR and SPOT-VEGETATION satellite images
using CORINE Land Cover (CLC) data
• Joint Research Centre
– Statistical analysis of data
– Quantity forecast
– Short-term crop yield forecast
MARS
• Monitoring Agriculture by Remote
Sensing
‘FÖMI NÖVMON’ (Plant Monitoring)
IKR Precision Map Server
Soil Data Publication (Georgikon Mapserver, Hungary)
INSPIRE Geoportal
• (Infrastructure for Spatial Information in the European
Community-INSPIRE)
• ‘The INSPIRE Geoportal provide the means to search for
spatial data sets and spatial data services, and subject to
access restrictions, view and download spatial data sets from
the EU Member States within the framework of the
Infrastructure for Spatial Information in the European
Community (INSPIRE) Directive.
• Aims at making available relevant, harmonised and quality
geographic information to support formulation,
implementation, monitoring and evaluation of policies and
activities which have a direct impact on the environment.’
• (www.inspire-geoportal.eu)
Spatial Data Directive
• Inspire should be based on the infrastructures for spatial
information that are created by the Member States and that
are made compatible with common implementing rules and
are supplemented with measures at Community level. These
measures should ensure that the infrastructures for spatial
information created by the Member States are compatible
and usable in a Community and transboundary context.
Inspire2008 metadata
• Member States shall ensure that metadata are created for
the spatial data sets and services corresponding to the
themes listed in Annexes I, II and III, and that those metadata
are kept up to date.
INSPIRE Geoportal
• Online access to a collection of geographic data and services
• Does not store or maintain data
• Metadata, catalogues can be accessed with several search
options
• With the help of a map server service, maps and metadata
can be searched for and browsed.
• Personal maps can be created from existing data sources.
INSPIRE Geoportal Viewer
• INSPIRE Geoportal
Mashup Mapserver Service
• ArcExplorer JEE: Corine Land Cover mashup map from several sources
• http://vektor.georgikon.hu (kvsz)
• http://geo.kvvm.hu (clc, 80% transparency)
Mashup map: a map that embeds another map (through an API) and is made
from several internet sources.
WebMap and publication
Picture
MapServer
Video
Website
Web service
HTML
Steps of Realization
• Steps of realization:
– 1. choose a topic
– 2. create a map, upload data
• a. create a web album, upload photos
• b. upload a video
– 3. create a website, embed the map
– 4. publish the website
Videos and Animations for Chapter 5.
• Video
– Institute of Geodesy Cartography and Remote Sensing
– Hungarian National Rural Network
– Inspire Geoportal
– GoogleMaps service
• Animation
Tasks for Chapter 5.
I. question
Measure the length of the Belső-tó (‘Inner lake’) of Tihany,
Hungary with the help of the topographic map service of the
Georgikon Mapserver (or any other mapserver).
II. question
Create a GoogleMaps map on any agricultural topic with at least 5
objects and inserted images, and embed it into a website on the same
topic.
III. question
Embed further mapserver services (Bingmaps, YahooMaps…) into
the website you have created.
Plant protection database
• Prepared by:
– Dr. Máté Csák
Plant Protection Information
Plant protection database
Plant protection databases: Topics
• Database management theory
– Information, data
– Database models, databases
– Database Management Systems
• Relational model
– Theoretical basis
– Normalized database
– Catalog, data dictionary
• Plant protection databases
– Practical problems and their solutions
Database management - Information
• Information is a concept of information technology. The word is of
Latin origin and means intelligence, news, message or notification.
• Definitions:
1) In general, information is data or news that we consider relevant
and that reduces our lack of knowledge.
2) A gain in knowledge: the growth of knowledge, which means a
reduction of uncertainty.
3) Information is new data or news that removes uncertainty and its
consequences.
Wikipedia
SH Atlas
Kalamár-Csák
Theoretical - Information
Information is as much a physical reality of the universe as
matter and energy.
[Diagram: pure information (DNA molecule; computer data input) → information processing → meaningful information (protein; calculation results)]
Theoretical - Information
Manifestations:
• Clearly expressed – Explicit
– The information is completely clear to everyone and needs no
explanation.
– For example: the water of Lake Balaton is 28 °C
• Hidden – Implicit
– The relationship between the data can be revealed only by applying
some method.
– For example: statistical calculation (average)
Theoretical - Data
• A datum assigns to an object (any thing to which the data relate) a
specific value (character state, completed form) of a variable
(property, attribute, characteristic, character).
– A datum is therefore considered specified only when we define which
object it belongs to, which variable it describes and what value it
takes. A value represented by a number is always accompanied by its
unit.
• For example: Name: Arvalin LR; Agent: Zinc phosphide; Volume: 4 %
Theoretical – Data model
• A collection of concepts that clearly describes the structure of a
database.
– The structure includes the data types, their relationships and the
constraints on the data.
– It is the conceptual-level description of the logical structure of
the database.
Basic elements of the Entity-Relationship (ER) data model
ENTITIES
ATTRIBUTES
RELATIONSHIPS
Data model – ER - Entities
• Entities:
are the principal data objects: things that can be distinguished from
all other things and about which information is to be collected.
– The things or persons about which we want to store data.
– For example: Citizens, Workers, Patients, Customers; Plants, Agents,
Phenological phases, Pests; Cars, Goods, Accounts ...
– A specific value of an entity is called an occurrence.
Data model – ER - Attributes
Attributes:
• describe the internal structure of the entities,
• are characteristics of entities that provide descriptive detail
about them.
– For the entity Plants such characteristics are: name, Latin name, ...
• The actual occurrence of an entity is determined by the values of its
properties.
– For example: Peach, Prunus persica, …
Data model – ER – Attributes - Key
• If a property, or a group of properties, uniquely determines which
occurrence of the entity is meant, it is called a key.
– For example: name in Plants
Data model – ER - Relationship
The relationships:
• are the external structure of the entities,
• represent real-world associations among one or more entities,
• are described in terms of degree, connectivity, and existence.
– For example: Plants-Pests, Accounts-Goods, ...
• A particular occurrence of a relationship is called a relationship
instance.
Data model – ER – Relationship - Types
The types of relationships:
• Independent connectivity
• 1:1 connectivity
• 1:N connectivity
• N:M connectivity
Data model – ER – Relationships 1.
1. Independent connectivity
– Two entities are independent of each other if no element of the
instance set of one is linked to any element of the instance set of
the other.
– For example: Agent’s Id : Employee’s account
Data model – ER – Relationships 2.
2. One-to-one connectivity (1:1):
• Each element of one instance set is linked to exactly one element of
the other instance set.
– For example: Agent’s Id : Agent’s name
Data model – ER – Relationship - 1:1 connectivity
Data model – ER – Relationships 3.
3. One-to-many connectivity (1:N):
• Each element of instance set A may be linked to several elements of
instance set B, while each element of B is linked to only one element
of A.
– For example: Aetiologies : Diseases
Data model – ER – Relationships - 1:N connectivity
Data model – ER – Relationships 4.
4. Many-to-many connectivity (N:M):
• Each element of instance set A may be linked to several elements of
instance set B, and vice versa.
– For example: Plants : Diseases
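The 1:N and N:M connectivities can be sketched as tables with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table and column names are hypothetical and are not taken from the VirKor database, and the rows are placeholder data. The N:M relationship is resolved with a separate linking table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aetiologies (aetiology TEXT PRIMARY KEY);
-- 1:N: one aetiology belongs to many diseases
CREATE TABLE diseases (
    disease   TEXT PRIMARY KEY,
    aetiology TEXT REFERENCES aetiologies(aetiology)
);
CREATE TABLE plants (plant TEXT PRIMARY KEY, latin_name TEXT);
-- N:M: resolved by a linking table (hypothetical name)
CREATE TABLE plant_diseases (
    plant   TEXT REFERENCES plants(plant),
    disease TEXT REFERENCES diseases(disease),
    PRIMARY KEY (plant, disease)
);
""")
conn.executemany("INSERT INTO aetiologies VALUES (?)", [("virus",), ("fungus",)])
conn.executemany("INSERT INTO diseases VALUES (?, ?)", [
    ("Apple mosaic virus", "virus"),
    ("Apple powdery mildew", "fungus"),
    ("Blight", "fungus"),
])
conn.executemany("INSERT INTO plants VALUES (?, ?)", [
    ("Apple", "Malus domestica"),
    ("Potato", "Solanum tuberosum"),
])
conn.executemany("INSERT INTO plant_diseases VALUES (?, ?)", [
    ("Apple", "Apple mosaic virus"),
    ("Apple", "Apple powdery mildew"),
    ("Potato", "Blight"),
])

# N:M direction plant -> diseases
apple_diseases = [row[0] for row in conn.execute(
    "SELECT disease FROM plant_diseases WHERE plant = ? ORDER BY disease",
    ("Apple",))]
# 1:N direction aetiology -> diseases
fungal_diseases = [row[0] for row in conn.execute(
    "SELECT disease FROM diseases WHERE aetiology = ? ORDER BY disease",
    ("fungus",))]
```

The linking table carries only the two keys, so a plant can appear with many diseases and a disease with many plants without repeating any descriptive data.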
Data model – ER – Relationship - N:M connectivity
Data model - ER definition
• The data model is
a finite set of entities,
the finite set of their properties
and the set of their relationships.
Data model - Types
Depending on which of the three core elements is physically stored,
the following data models exist:
Model                      entity   property   connectivity
• network, hierarchical    +        -          +
• relational               +        +          -
• object-oriented          +        +          +
• + object-relational (mixed data model)
Databases
• Database: a structured set of data that are related to each other,
stored, typically in digital form, so that multiple users can access
it.
• A database is the set of occurrences of a finite number of entities,
the values of their finite number of properties and the occurrences of
their relationships, organized according to the data model.
• Benefit: many users can use it at once. Each datum is stored only
once.
Integrated database
• Contains, linked together, all the data that different users use in
different groupings.
• The data are physically stored centrally, either free of redundancy
or with minimal, controlled redundancy.
• Centrally controlled:
– data protection,
– entry of new data, and
– changes to existing data.
Database Management System (DBMS)
• Software that provides the connection to the database.
• It allows
– creating databases,
– querying the data,
– modifying the data,
– maintaining the database,
– storing large amounts of data safely in the long term.
Database Management System (DBMS)
• Grouping
– By the number of users
• Single-user
• Multi-user
– By the division of work
• Single-task
• Client-server
– By the number of storage locations
• Stored in one place
• Distributed (shared)
Database Management System (DBMS)
• The system components
– Data Definition Language (DDL)
• User level
• Conceptual level
• Physical storage level
– Data Manipulation Language (DML)
– Data Control Language (DCL)
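The difference between the language components can be illustrated with a short sketch using Python's built-in sqlite3 module. The table and its rows are assumptions made for the example. SQLite has no user management, so the DCL statement appears only as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure of the data
conn.execute("CREATE TABLE plants (name TEXT PRIMARY KEY, latin_name TEXT)")

# DML: inserts, modifies and queries the data
conn.execute("INSERT INTO plants VALUES (?, ?)", ("Apple", "Malus"))
conn.execute("INSERT INTO plants VALUES (?, ?)", ("Peach", "Prunus persica"))
conn.execute("UPDATE plants SET latin_name = ? WHERE name = ?",
             ("Malus domestica", "Apple"))
rows = conn.execute(
    "SELECT name, latin_name FROM plants ORDER BY name").fetchall()

# DCL: controls access rights, for example in a multi-user DBMS:
#   GRANT SELECT ON plants TO some_user;
# SQLite does not support GRANT, so it is not executed here.
```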
Database Management System (DBMS) – Operating concept
DBMS – Operating concept - Explanation
1. Request for information from the database (application program)
2. Interpretation and analysis of the request (DBMS: syntax,
existence, rights)
3a. Executable → passed to the operating system
3b. Cannot be executed → returned to the program
4. Access to the external storage (operating system)
5. Transfer of the requested data (OS: from storage into the buffer)
6. Passing the data and feedback to the program
7. Receipt of the data in the program.
Database Management System (DBMS)
• Two types:
– With its own (autonomous) language
• Oracle (1977)
• DB2 (1983)
• Sybase (1987)
• Informix (1981)
• Ingres (1980)
– Embedded (plug-in) type
• IDMS (1983)
• SQL (1986)
Relational database model – Theoretical basis
• In 1970 Dr. Edgar F. Codd (IBM) created the relational database
model.
• The data model describes the various types of data, their relations
and connections, and the procedures for protecting them.
• The collected data are logically separated into entity types
(tables). For each entity we determine what identifies it uniquely and
what additional features (attributes) it has.
VirKor program – Relation diagram
The VirKor database has seven tables, with their properties and
relations.
VirKor program – Relational mode of representation
• Entities are represented by relations (special tables).
• They describe the different entities of the real world and their
properties.
• Plants table
VirKor program - Relational mode of representation
• The connections between the entities can also be depicted as
relations.
• Data management is carried out with relational operations.
• Plants – Pests relation
Relational model – Benefits and disadvantages
Benefits:
• Based on a mathematical (set-theoretical) model,
• Very close to everyday thinking,
• The most flexibly modifiable,
• The three levels are well separable and can be made independent of
each other.
Disadvantages:
• Performance is less efficient.
– Today this is no longer a big problem.
Relational model – The properties of relations
• Every relation has a unique name in the database;
• The rows represent the entity occurrences, the columns the entity properties;
• Every row has the same number of columns;
• Every column has a unique name within the relation;
• Every column of a row holds exactly one value (NULL if there is no value);
• The columns may be in any order;
• No two rows are identical;
• There is at least one combination of columns that uniquely identifies the row.
This is the primary key.
• Any data item can be identified by:
– the relation name
– + the column name
– + the value of the primary key
Relational model – Tables, views
We do not physically store every value of every property of every entity.
• Base relation (table)
Physically stored.
• Virtual relation (view)
Contains no data. It is created from tables with relational operations.
• Materialized view
Physically stored. It is created from tables with relational operations and changes
when the underlying tables change.
• Snapshot
Physically stored. The values of tables or views at a certain moment.
• Query (selection) result
Not a true relation; it exists only temporarily.
• Temporary tables
Needed only temporarily, for an operation or task.
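The difference between a view and a snapshot can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table, the view and the snapshot names are hypothetical, and the snapshot is imitated with CREATE TABLE ... AS SELECT because SQLite offers no materialized views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diseases (disease TEXT PRIMARY KEY, aetiology TEXT)")
conn.executemany("INSERT INTO diseases VALUES (?, ?)", [
    ("Blight", "fungus"),
    ("Apple mosaic virus", "virus"),
])

# View: stores only the query, no data
conn.execute("""CREATE VIEW fungal_diseases AS
                SELECT disease FROM diseases WHERE aetiology = 'fungus'""")
# Snapshot: physically stores the result as it is at this moment
conn.execute("""CREATE TABLE fungal_snapshot AS
                SELECT disease FROM diseases WHERE aetiology = 'fungus'""")

# The base table changes afterwards
conn.execute("INSERT INTO diseases VALUES (?, ?)",
             ("Apple powdery mildew", "fungus"))

view_count = conn.execute("SELECT COUNT(*) FROM fungal_diseases").fetchone()[0]
snapshot_count = conn.execute("SELECT COUNT(*) FROM fungal_snapshot").fetchone()[0]
```

The view follows the change of the base table, while the snapshot keeps the earlier state.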
Relational model – Keys
Keys ensure data integrity and consistency and keep the relations free
of contradictions.
The system automatically checks:
• The primary key – foreign key relations between entities (e.g. the
key of Plants in Diseases of plants)
– matching
– cascading change
– cascading delete
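The three automatic checks can be observed in a small sketch with Python's built-in sqlite3 module. The table names and rows are assumptions for illustration. In SQLite, foreign key checking has to be switched on for each connection with PRAGMA foreign_keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE plants (plant TEXT PRIMARY KEY);
CREATE TABLE plant_diseases (
    plant   TEXT NOT NULL REFERENCES plants(plant)
            ON UPDATE CASCADE ON DELETE CASCADE,
    disease TEXT NOT NULL,
    PRIMARY KEY (plant, disease)
);
INSERT INTO plants VALUES ('Apple'), ('Wheat');
INSERT INTO plant_diseases VALUES
    ('Apple', 'Apple mosaic virus'), ('Wheat', 'Pitch staining');
""")

# Matching: a foreign key value must exist as a primary key value
try:
    conn.execute("INSERT INTO plant_diseases VALUES ('Maize', 'Smut')")
    matching_enforced = False
except sqlite3.IntegrityError:
    matching_enforced = True

# Cascading change: renaming the plant is carried over to the linked rows
conn.execute("UPDATE plants SET plant = 'Apple tree' WHERE plant = 'Apple'")
# Cascading delete: deleting the plant deletes its linked rows as well
conn.execute("DELETE FROM plants WHERE plant = 'Wheat'")

remaining = conn.execute(
    "SELECT plant, disease FROM plant_diseases ORDER BY plant").fetchall()
```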
Relational model – Keys
PRIMARY KEY
• Uniquely identifies the rows of a relation.
• The primary key (or any part of it) may not be NULL, and it should
not contain unnecessary columns.
• If there are several candidates, it is important to decide which one
should be the primary key (e.g. identifying a person: identity card
number, tax identification number, social insurance number).
Relational model – Integrity
Additional integrity options:
• Defining a unique index (the same value cannot be entered into these
columns in two rows)
• Conditions that the value of a given field must satisfy (e.g. a
check that allows only numeric values)
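Both options can be sketched with Python's built-in sqlite3 module. The table, the index name and the range condition (a percentage between 0 and 100) are assumptions chosen for the example, not rules taken from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE pesticides (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    -- field condition: the value must be a valid percentage
    volume_percent REAL CHECK (volume_percent BETWEEN 0 AND 100)
)""")
# Unique index: the same name cannot occur in two rows
conn.execute("CREATE UNIQUE INDEX idx_pesticide_name ON pesticides(name)")
conn.execute(
    "INSERT INTO pesticides (name, volume_percent) VALUES ('Arvalin LR', 4)")


def rejected(sql):
    """Return True if the DBMS refuses the statement for integrity reasons."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_rejected = rejected(
    "INSERT INTO pesticides (name, volume_percent) VALUES ('Arvalin LR', 5)")
out_of_range_rejected = rejected(
    "INSERT INTO pesticides (name, volume_percent) VALUES ('Other', 140)")
```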
Relational model – Indexes
• Speed up direct and sequential access by the indexed column.
– Maintained automatically,
– Can be created and deleted at any time,
– Slow down modifications,
– Need storage space.
Relational model – Keys
FOREIGN KEY
• A column (or combination of columns) of a relation that serves as a
link: its value may only be NULL or equal to one of the primary key
values of the referenced table.
• Establishes the 1:N relationships. The link must remain valid through
all changes: data input, modification and deletion.
Relational model – Foreign key relationship
Relational model – Normalization
• Normalization is a formal, algorithmic process in which the initial,
poorly structured data are brought into a logically more transparent,
better form by consistently applying the appropriate rules in
succession.
Relational model – Normalization
• The entities designed in the previous steps are brought into
well-manageable standard forms.
• Can be performed algorithmically.
• Results:
– The data need less storage;
– Elementary data can be changed faster and with less risk of error;
– The database becomes logically clearer.
Relational model – Normalization – Functional dependence
• In any relation, the values of some attributes depend on the values of other attributes.
• If the value of one attribute (X, the independent variable) of relation R uniquely
determines the value of another attribute (Y, the dependent variable), then we say that
Y is functionally dependent on X in relation R.
• Naturally, this must hold not only for the current content of R but, independently of
time, as a constraint for the whole existence of the database.
• Both X and Y can be composite, that is, they can consist of several columns.
• Functional dependence is usually denoted by
R.X → R.Y
• It can also be represented by a functional diagram, also called a dependency diagram.
Relational model – Representation of functional dependence
The arrow points from the independent attribute to the dependent
attribute.
In the diagram, Y and Z are functionally dependent on X.
Relational model – Full functional dependence
• In general, in relation R attribute Y is fully functionally dependent
on the (composite) attribute X if and only if it is functionally
dependent on X but not on any proper part of X.
– If X is not composite, functional dependence and full functional
dependence are the same.
– strong
– weak
Relational model – Full dependence
• Let P, Q ⊆ A and P → Q.
• Q is fully (functionally) dependent on P only if Q does not depend
on any proper subset of P.
• Otherwise, the dependence is partial.
– For example:
– ORDERITEM {order_id, goods_id, piece}
– REPAYMENTS{deptor_id, month, amount, date}
– VISIT{visitor_id, date, time, subject, period}
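Functional and full functional dependence can be tested on sample rows with a few lines of Python. This is a sketch: the function names and the rows are hypothetical. Such a test on current data can only refute a dependency; it cannot prove it, because a dependency is a constraint for the whole life of the database.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not violated by rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand value must always give the same right-hand value
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """Return True if rhs depends on lhs but on no proper subset of lhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False  # partial dependence
    return True


order_items = [
    {"order_id": 1, "goods_id": "A", "piece": 2},
    {"order_id": 1, "goods_id": "B", "piece": 5},
    {"order_id": 2, "goods_id": "A", "piece": 7},
]
plant_rows = [
    {"plant": "Apple", "disease": "Apple mosaic virus",
     "latin_name": "Malus domestica"},
    {"plant": "Apple", "disease": "Apple powdery mildew",
     "latin_name": "Malus domestica"},
    {"plant": "Potato", "disease": "Blight",
     "latin_name": "Solanum tuberosum"},
]

piece_is_full = fully_dependent(
    order_items, ("order_id", "goods_id"), ("piece",))
latin_is_full = fully_dependent(
    plant_rows, ("plant", "disease"), ("latin_name",))
```

In the second case the Latin name already depends on the plant alone, so its dependence on the composite key (plant, disease) is only partial.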
Relational model – Transitive dependence
• S depends transitively on P if there exists Q ⊆ A such that P → Q
and Q → S, but the reverse dependencies do not hold.
– Examples: P → Q → S
– WORKER {perid, name, class_code, class_name}
– ORDERHEADER {order_id, custcode, custname,
custaddres, date, deadline, totalvalue}
– VISITOR{id, name, firm, firmname, firmaddres, …}
Relational model – Normal forms
• The entity’s structural state: 0NF → 1NF → 2NF → 3NF → 4NF → …
Plant    Latin name          Diseases
Apple    Malus domestica     Apple mosaic, Impetigo, Apple powdery mildew, …
Potato   Solanum tuberosum   Staining virus reticulated, Blight, Black rotting, …
Relational model – First normal form
• A relation is in first normal form (1NF) if
– each column holds one and only one attribute,
– each row is different,
– the order of the attributes is the same in every row,
– there are no repeating fields,
– each row has (at least) one unique key on which all the other
attributes are functionally dependent.
Relational model – Normal forms - 1NF - Example
Plant    Latin name          Disease                      Aetiology
Apple    Malus domestica     Apple mosaic                 virus
Apple    Malus domestica     Impetigo                     fungus
Apple    Malus domestica     Apple powdery mildew         fungus
Potato   Solanum tuberosum   Staining virus reticulated   virus
Potato   Solanum tuberosum   Blight                       fungus
Potato   Solanum tuberosum   Black stem rotting           bacterium
Wheat    Triticum vulgare    Pitch staining               fungus
…
Relational model – Normal forms – 1NF - Anomalies
There is a lot of redundancy (e.g. Plant and Latin name).
Hidden sources of error (update anomalies):
• Deletion anomaly:
– If we delete the wheat disease Pitch staining, the data of wheat
(its only row) are lost as well.
• Modification anomaly:
– If the potato disease Blight is given a new name, the name has to be
modified in every row where it occurs.
• Insertion anomaly:
– A new disease can be entered only if some plant already has it (no
part of the primary key can be NULL).
Relational model – Normal forms – 2NF
• A relation R is in second normal form (2NF) if and only if it
is in 1NF and every non-key attribute is fully dependent
on the primary key.
– 1NF relations with an elementary (single-column) primary key are
automatically in 2NF. Relations with a composite key, however, have to
be brought into 2NF to eliminate the update anomalies. (This does not
remove all the anomalies, but it can significantly reduce their
number.) This is done by decomposing the relation.
– Decomposition: from the 1NF relation we produce 2NF relations by
projection. Their primary keys are the primary key of the original
relation or parts of it, and each new relation contains only those
columns that are fully dependent on its new primary key.
Relational model – Normal forms – 2NF - Example
Plant    Latin name
Apple    Malus domestica
Potato   Solanum tuberosum
Wheat    Triticum vulgare

Plant    Disease                     Latin name                              Picture
Apple    Apple mosaic virus          Apple mosaic virus
Apple    Impetigo                    Venturia inaequalis
Apple    Apple powdery mildew        Podosphaera leucotricha
Potato   Virus networking staining   Potato leafroll
Potato   Black stem rotting          Erwinia carotovora subsp. atroseptica
Potato   Blight                      Phytophthora infestans
Wheat    Pitch staining              Lidophia graminis
Relational model – Normal forms – Decomposition (1NF → 2NF)
• R(A,B,C,D) before decomposition (1NF)
– PRIMARY KEY(A,B)
– R.A → R.D
• After decomposition (2NF)
– R1(A,D)
– PRIMARY KEY(A)
• and
– R2(A,B,C)
– PRIMARY KEY(A,B)
– FOREIGN KEY(A), refers to R1
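The decomposition by projection can be followed in a short Python sketch. The helper functions and the sample rows are hypothetical; plant stands for A, disease for B, picture for C and latin_name for D. Joining the two projections gives back the original relation, so no information is lost.

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate rows."""
    seen = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.append(values)
    return [dict(zip(attrs, values)) for values in seen]


def natural_join(left, right):
    """Join two relations on their common attributes."""
    common = [a for a in left[0] if a in right[0]]
    return [{**x, **y} for x in left for y in right
            if all(x[a] == y[a] for a in common)]


# R(A,B,C,D) in 1NF, PRIMARY KEY(A,B), and A -> D
r = [
    {"plant": "Apple", "disease": "Apple mosaic virus",
     "picture": "apple_mosaic.jpg", "latin_name": "Malus domestica"},
    {"plant": "Apple", "disease": "Apple powdery mildew",
     "picture": "apple_mildew.jpg", "latin_name": "Malus domestica"},
    {"plant": "Potato", "disease": "Blight",
     "picture": "potato_blight.jpg", "latin_name": "Solanum tuberosum"},
]

r1 = project(r, ["plant", "latin_name"])          # R1(A,D)
r2 = project(r, ["plant", "disease", "picture"])  # R2(A,B,C)
rejoined = natural_join(r2, r1)
```

After the decomposition the Latin name of each plant is stored only once, in R1.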
Relational model – Normal forms – 3NF
• A relation R is in third normal form (3NF) if and
only if it is in 2NF and every non-key attribute is
non-transitively dependent on the primary key.
• In other words, 3NF means that functional dependencies can start
only from the primary key and the alternate keys.
• The 2NF relation Employee is not in 3NF because, for example, the
class (CLASS) is neither the primary key nor an alternate key, yet
other columns (CLASS-NAME, BOSS) are functionally dependent on it.
Relational model – Normal forms – Decomposition (2NF → 3NF)
• Decomposition: we take the projection of the 2NF relation that
contains only those attributes that depend exclusively on the primary
key. Its primary key remains the same. In the other new relation (or
relations, if there is more than one such dependency), the primary key
is the independent attribute that was split off, and the other columns
are the attributes dependent on it.
Relational model – 3NF - Example
Disease                     Latin name                              Aetiology
Apple mosaic virus          Apple mosaic virus                      virus
Impetigo                    Venturia inaequalis                     fungus
Apple powdery mildew        Podosphaera leucotricha                 fungus
Virus networking staining   Potato leafroll                         virus
Black stem rotting          Erwinia carotovora subsp. atroseptica   bacteria
Blight                      Phytophthora infestans                  fungus
Pitch staining              Lidophia graminis                       fungus

Plant    Latin name
Apple    Malus domestica
Potato   Solanum tuberosum
Wheat    Triticum vulgare

Plant    Disease                     Picture
Apple    Apple mosaic virus
Apple    Impetigo
Apple    Apple powdery mildew
Potato   Virus networking staining
Potato   Black stem rotting
Potato   Blight
Wheat    Pitch staining
…
Relational model – Normal forms – Decomposition (2NF → 3NF)
• In general, if A, B, C, D are the columns (any of them may be composite) of the
2NF relation
– R(A,B,C,D)
• PRIMARY KEY(A)
• R.B → R.C
• then its decomposition into 3NF produces the following relations:
– R1(B,C)
• PRIMARY KEY(B)
and
– R2(A,B,D)
• PRIMARY KEY(A)
• FOREIGN KEY(B), refers to R1
• The relation R can be restored unambiguously at any time by joining R1 and R2 (on B),
provided that the splitting is done according to the principle set out above.
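The same decomposition can be tried on the WORKER example with Python's built-in sqlite3 module. This is a sketch with invented rows; the names of the two new tables are assumptions. The class code links the two new relations, and a join restores the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- 2NF relation with the transitive dependence
-- perid -> class_code -> class_name
CREATE TABLE worker_2nf (
    perid INTEGER PRIMARY KEY, name TEXT,
    class_code TEXT, class_name TEXT
);
INSERT INTO worker_2nf VALUES
    (1, 'Anna',  'L', 'Laboratory'),
    (2, 'Bence', 'L', 'Laboratory'),
    (3, 'Csaba', 'F', 'Field service');

-- The split-off dependence becomes its own relation
CREATE TABLE class (class_code TEXT PRIMARY KEY, class_name TEXT);
-- The remaining attributes keep the original primary key
CREATE TABLE worker (
    perid INTEGER PRIMARY KEY, name TEXT,
    class_code TEXT REFERENCES class(class_code)
);
INSERT INTO class SELECT DISTINCT class_code, class_name FROM worker_2nf;
INSERT INTO worker SELECT perid, name, class_code FROM worker_2nf;
""")

original = conn.execute(
    "SELECT perid, name, class_code, class_name "
    "FROM worker_2nf ORDER BY perid").fetchall()
restored = conn.execute(
    "SELECT w.perid, w.name, w.class_code, c.class_name "
    "FROM worker w JOIN class c ON c.class_code = w.class_code "
    "ORDER BY w.perid").fetchall()
class_count = conn.execute("SELECT COUNT(*) FROM class").fetchone()[0]
```

Each class name is now stored once, so renaming a class means changing a single row.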
Relational model – Normal forms – Decomposition - 3NF
Notes:
• Bringing a relation into 3NF is not always appropriate (e.g. address
and zip code).
• Most database management systems are satisfied with 1NF, and do not
even require a primary key!
Catalog, data dictionary
• Stores about the database, in tables maintained by the system:
– its definition,
– its relationships,
– its storage,
– its usage,
• and the definitions of all views.
• Used to carry out system administration tasks.
Plant protection database
• Pesticides Register
• VirKor – teaching aid
Pesticides Register - Records of pesticides
• The aim of the database
The task is to plan the register of the pesticides manufactured by
ER-chemistry Co.
• The register should include:
– the origin of each pesticide,
– the components needed to produce the pesticide,
– the possible areas of application.
• It is assumed that:
– a pesticide may be single- or multi-component,
– a pesticide may be used against several pests,
– one component can come from multiple suppliers.
Pesticides Register - Entity types
• Pesticides (id, name, degree of hazard, price)
• Factories (factory code, name)
• Fields of application of the pesticides (pest, type)
• Components of the pesticides (component name)
• Transporters (transporter code, date, name, address)
Pesticides Register - Entity types of the ER-model
• Pesticides(id, name, hazard, price)
• Factories(id, name)
• Pests(pestname, type)
• Components(name)
• Transporters(id, date, name, address)
Pesticides Register - ER-diagram
Pesticides Register - Relations
• Where is it produced?
Factories:Pesticides (1:N)
– A pesticide is produced by only one factory, but a factory can
produce several pesticides.
• What is it applied against?
Pesticides:Pests (M:N)
– A pesticide may be used against several pests, and a pest can be
destroyed by several pesticides.
• What are the ingredients?
Pesticides:Components (M:N)
– A pesticide consists of several components, and a component may be
an ingredient of other pesticides as well.
• Where did it come from?
Components:Transporters (M:N)
– A component can be delivered by several suppliers, and a supplier
can deliver several components.
Pesticides Register - Relation model
Pesticides Register - Relations
Only primary keys
• Pests(pname, type)
• Factories(fid, name)
• Components(cname)
• Transports(tid, tdate, tname, taddress)
Primary keys and foreign keys
• Pesticides(pid, name, hazard, price, fid)
• Applies(pid, pname, term)
• Elements(pid, cname, volume%)
• Origins(cname, tid, tdate, quantity)
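The relations listed above could be declared, for example, as follows with Python's built-in sqlite3 module. This is a sketch under assumptions: the data types, the composite primary keys of the linking tables and the key (tid, tdate) of Transports are guesses based on the attribute lists, volume% is written as volume_percent, and the inserted rows are placeholders.

```python
import sqlite3

schema = """
CREATE TABLE Pests (pname TEXT PRIMARY KEY, type TEXT);
CREATE TABLE Factories (fid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Components (cname TEXT PRIMARY KEY);
CREATE TABLE Transports (
    tid INTEGER, tdate TEXT, tname TEXT, taddress TEXT,
    PRIMARY KEY (tid, tdate)
);
-- 1:N Factories:Pesticides through the foreign key fid
CREATE TABLE Pesticides (
    pid INTEGER PRIMARY KEY, name TEXT NOT NULL, hazard TEXT, price REAL,
    fid INTEGER NOT NULL REFERENCES Factories(fid)
);
-- M:N relationships become linking tables
CREATE TABLE Applies (
    pid INTEGER REFERENCES Pesticides(pid),
    pname TEXT REFERENCES Pests(pname),
    term TEXT,
    PRIMARY KEY (pid, pname)
);
CREATE TABLE Elements (
    pid INTEGER REFERENCES Pesticides(pid),
    cname TEXT REFERENCES Components(cname),
    volume_percent REAL,
    PRIMARY KEY (pid, cname)
);
CREATE TABLE Origins (
    cname TEXT REFERENCES Components(cname),
    tid INTEGER, tdate TEXT, quantity REAL,
    PRIMARY KEY (cname, tid, tdate),
    FOREIGN KEY (tid, tdate) REFERENCES Transports(tid, tdate)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(schema)

# Placeholder rows, only to show the tables working together
conn.execute("INSERT INTO Factories VALUES (1, 'Sample factory')")
conn.execute("INSERT INTO Pesticides VALUES (1, 'Sample pesticide', 'low', 10.0, 1)")
conn.execute("INSERT INTO Components VALUES ('Component A')")
conn.execute("INSERT INTO Elements VALUES (1, 'Component A', 4.0)")

composition = conn.execute(
    "SELECT p.name, e.cname, e.volume_percent "
    "FROM Pesticides p JOIN Elements e ON e.pid = p.pid").fetchall()

# A pesticide cannot refer to a factory that does not exist
try:
    conn.execute("INSERT INTO Pesticides VALUES (2, 'Orphan', 'low', 1.0, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```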
Pesticides Register - Pest table
Pesticides Register - Pesticide form
Pesticides Register - Pests form
Pesticides Register - Factories form
VirKor program - teaching aid
VirKor is an educational program that helps students learn to
recognize the diseases of different plants.
• A modern form of the demonstration boards.
• An educational resource for plant doctor students.
VirKor program – How does it work?
• Stored in the database:
– plants,
– diseases,
– their relations.
• The displayed images are stored in a folder (locally or on a
server).
VirKor program – Diseases of plant: Apple
Apple proliferation phytoplasma
Podosphaera leucotricha
VirKor program - Diseases of plant: Apple
Apple mosaic virus
Monilinia fructigena
VirKor program – One disease on different plants:
Mosaic virus on cucumber and apple.
VirKor program - Symptoms: Necrosis of tissue
VirKor program – How was it made?
• The demonstration boards were photographed by hand and the images
were then cleaned.
• The data on the boards were recorded in an Excel spreadsheet.
• The relational database model was created.
• The application was developed.
VirKor program – How was it made? - Data collection - Digitization
Digitization of the demonstration boards
Original
Cleaned
VirKor program – How was it made? - Data collection - Storage
The data are stored in a worksheet (1NF)
VirKor program – How was it made? - Modifying and correcting the data structure
In this case we supplemented the data with some other properties, for
example plant parts.
VirKor program – How was it made? - Data collection - Analyzing the relationships between the data
• Functional dependencies:
– The Latin names of the plants depend on the Hungarian names.
– The same applies to the diseases, the symptoms and the aetiology
(e.g. virus).
VirKor program – How was it made? – Creating the relational database model
VirKor program - How was it made? - Table – Entity: Plants
• Apple and its diseases
VirKor program - How was it made? - Table – Entity: Diseases
VirKor program - How was it made? - Table – Entity: Plants’ diseases
VirKor program - How was it made? - Developing the tutor program
• Form of Plants
• Setting the properties of each control
• Programming each event
– For example
• Loading an image file into the picture box
• Changing the status of checkboxes
• Etc.
Thank you for your kind
attention.
Made by:
Máté Csák PhD.
THE PRESENTATION CAN BE DOWNLOADED FROM:
-
Bibliography
• Quittner P. - Baksa-Haskó G. (2008): Adatbázisok,
Adatbázis-kezelő rendszerek, DE ATC AVK
• KUPCSIKNÉ FITUS I. (2004): Adatbáziskezelés, AIFSZ képzés
tananyaga
• TÍMÁR L. ET AL. (2007): Építsünk könnyen és lassan
adatmodellt!, Pannon Egyetemi Kiadó, 46/2007, pp. 23-99.
• HERNANDEZ, M. J. – Viescas, J. L. (2009): SQL-lekérdezések
földi halandóknak, Kiskapu.
• ULLMAN, J. D. – Widom, J. (2008): Adatbázisrendszerek
Alapvetés 2. átdolgozott kiadás, Panem Kiadó.
• CZENKY M. (2005): Adatmodellezés - SQL és ACCESS
alkalmazás - SQL Server és ADO, ComputerBooks.
Bioinformatics
• Prepared by:
– Sándor Nagy
Information Technology in Plant
Protection
Bioinformatics
Bioinformatics Databases and homology
searching
Contents
• What does Bioinformatics mean?
• Structure and operation of DNA
• Bioinformatics databases
• Using databases
• Exercise
Definition
• Bioinformatics derives knowledge from computer
analysis of biological data. These can consist of
the information stored in the genetic code, but
also experimental results from various sources,
patient statistics, and scientific literature.
Research in bioinformatics includes method
development for storage, retrieval, and analysis
of the data. Bioinformatics is a rapidly
developing branch of biology and is highly
interdisciplinary, using techniques and concepts
from informatics, statistics, mathematics,
chemistry, biochemistry, physics, and linguistics.
It has many practical applications in different
areas of biology and medicine.
Fields of Bioinformatics
• Superindividual bioinformatics uses systematic modelling to understand biological systems.
• Molecular bioinformatics deals with the analysis and design of proteins and nucleotide sequences.
• Computing bioinformatics focuses on the utilization of biological systems.
Aim of Bioinformatics
The aim is to decipher genetically encoded information, which tells us about the following:
• 3D structure,
• Function,
• Evolutionary relations.
DNA → Protein → Function
Questions answered by Bioinformatics
• In which other organisms can the given sequence be found? (→ ortholog search)
• What variants of it occur within a given organism? (→ paralog search)
• How heterogeneous is a given paralog? (→ polymorphism search)
• Which positions are important in a given sequence? (→ evolutionarily conserved positions)
Basics: Structure of the DNA
• Double helix in which the nucleotide bases of the two strands are connected by hydrogen bonds: A:T with 2 and G:C with 3 H-bonds
• Base pairing: the complementary nucleotide bases within the long polymer are A:T and G:C
• Replication
• Genetic code: the base sequence is not monotonous
• Two helical chains, each coiled round the same axis, each with a pitch of 34 Ångströms (3.4 nanometres) and a radius of 10 Ångströms
• The two strands run in opposite directions and are therefore anti-parallel, with 5′ (five prime) and 3′ (three prime) ends
• It contains four bases: adenine (A), cytosine (C), guanine (G), thymine (T)
• Structure of DNA:
– http://www.youtube.com/watch?v=qy8dk5iS1f0&feature=player_embedded
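The base-pairing rule and the anti-parallel orientation of the strands can be illustrated with a short sketch. The function names and the sample sequence are illustrative assumptions and do not come from the slides; the hydrogen-bond counts are the ones listed above.

```python
# Watson-Crick pairs: A pairs with T (2 H-bonds), G pairs with C (3 H-bonds).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    # Reversing accounts for the anti-parallel orientation of the strands.
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq):
    """Count the hydrogen bonds that hold a duplex of this sequence together."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    strand = "ATGCGT"  # hypothetical example sequence
    print(reverse_complement(strand))
    print(hydrogen_bonds(strand))
```

Applying `reverse_complement` twice returns the original strand, which is the property that lets each strand serve as a template during replication.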
Basics: Replication of DNA
• Following from the rule of base pairing: the hydrogen bonds within the double helix can be pulled apart, and both strands serve as templates for the synthesis of a new strand. The result of this process is two double helices with the same structure.
• The genetically coded information is carried by the order of the nucleotides.
• DNA replication:
– http://www.youtube.com/watch?v=E8NHcQesYl8&feature=related
Historical overview
• Early 1950s: the insulin sequence is published
• 1953 Watson-Crick: structure of DNA
• From the early 1970s: algorithms for sequence analysis:
– Dot matrix
– Local and global sequence alignment
– BLAST algorithm (published later, in 1990)
• 1972: first computer-stored databases of protein sequences
• 1979: GenBank prototype
Bioinformatics databases
• GenBank: NCBI (National Center for Biotechnology Information)
– http://www.ncbi.nlm.nih.gov/
• European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI)
– http://www.ebi.ac.uk/
• DNA Data Bank of Japan (DDBJ)
– http://www.ddbj.nig.ac.jp/
Choosing database
Searching keywords
http://www.ncbi.nlm.nih.gov/
Navigation in Database - GenBank
• Choosing a database:
– PubMed – database of publications
– Protein – database of proteins
– Nucleotide – database of nucleic acids
– Genome – database of whole genomes
– Gene – database of genes
Navigation in Database - GenBank
• Exercise:
– Look for information on Phytophthora infestans
• Database: Taxonomy, Keyword: Phytophthora infestans
• Result of the search (next slide)
– Taxonomic classification
– Database results from GenBank
– References to other resources
• Following the reference link Nucleotide – Direct on the result page, we reach all sequences in the database
• Look for the INF2A gene within the results
Navigation in Database - GenBank
• Result table:
Identifiers
Information on publishing
Information on structure
Gene identification
Amino acid sequence
Nucleotide sequence
Navigation in Database - GenBank
• Data formats:
– Summary – short description, important information
– GenBank – GenBank’s own format, detailed data
– FASTA – name + identifiers + sequence; the most commonly used format
– ASN.1 – international format
Navigation in Database - GenBank
• FASTA format:
– Advantages:
• commonly used
• simple
• small
– Disadvantage:
• less information
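Because a FASTA record is only a header line starting with ">" followed by sequence lines, it can be read with a few lines of code. The sketch below is one possible minimal reader; the function name `parse_fasta` and the two demo records are hypothetical and are not real GenBank entries.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical demo data, not taken from any database.
EXAMPLE = """>demo1 hypothetical record
ATGGCC
TAA
>demo2 second hypothetical record
GGGCCC
"""

if __name__ == "__main__":
    for name, sequence in parse_fasta(EXAMPLE):
        print(name, len(sequence))
```

The header keeps the name and identifiers on one line, which is why the format carries less information than the full GenBank format.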
Navigation in Database - GenBank
• Accession number:
– AY693804 - Phytophthora infestans INF2A (Inf2A) gene, complete cds
– Internationally accepted identifier for nucleic acid and protein sequences
• GI (GenInfo Identifier) number:
– GI:51832280 - Phytophthora infestans INF2A (Inf2A) gene, complete cds
– Identifier used only by GenBank
Use for … ?
• Looking up the genetic information of a given organism
• Comparing and checking our own results
• Basis for comparative experiments
• Collection of papers and publications
• Foundation for research
Exercise:
– Look for the gene sequence and the protein sequence of an important pathogen of your choice and check whether its genome is available.
– Save the result as a text file in FASTA format. Keep the saved file; it is required for the next exercise.
Content
• Comparing two sequences
• Searching for homologous sequences – BLAST
• Nucleotide BLAST – BLASTN
• Protein BLAST – BLASTP
• Use for what?
• Exercise
How to compare two sequences – Dot matrix
• Dot matrix method (Gibbs and McIntyre, 1970): it compares two amino acid or nucleotide sequences by placing one sequence along the horizontal and the other along the vertical axis of a matrix and drawing a dot wherever the two residues match.
• Very well suited to the visual demonstration of mutations, deletions and insertions.
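The dot matrix idea can be tried out in a few lines. This is a minimal text-based sketch without the window filtering that real tools usually apply; the function name and the sample sequences are illustrative assumptions.

```python
def dot_matrix(seq_a, seq_b):
    """Return text rows: seq_b along the top, seq_a down the side, '*' on a match."""
    rows = ["  " + seq_b]
    for a in seq_a:
        marks = "".join("*" if a == b else "." for b in seq_b)
        rows.append(a + " " + marks)
    return rows


if __name__ == "__main__":
    # Hypothetical sequences; the second one carries a three-base insertion.
    for row in dot_matrix("GATTACA", "GATGGGTACA"):
        print(row)
```

Matching regions appear as diagonal runs of marks, and an insertion or deletion shows up as a shift from one diagonal to another.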
How to compare two sequences – Global Sequence Alignment
• Another well-known analytical method is global sequence alignment, which uses dynamic programming.
• Essence of the process: the similarity of the sequences is examined over their whole length with the help of a scoring system.
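A global alignment score can be computed with the classic Needleman-Wunsch dynamic programming recurrence. The slides do not name a specific algorithm or scoring system, so both the choice of Needleman-Wunsch and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for this sketch.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score for aligning a and b over their full length."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACT"))
```

Only the score is returned here; recovering the alignment itself would additionally require a traceback through the matrix.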
How to compare two sequences – Local Sequence Alignment
• It also uses dynamic programming.
• Essence of the process: the similarity of the sequences is examined with the help of a scoring system, looking for the best-matching regions rather than aligning the whole length.
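Local alignment is commonly implemented with the Smith-Waterman recurrence, which differs from the global case in that scores are never allowed to drop below zero. The slides do not name the algorithm, so this choice and the scoring values (match +2, mismatch -1, gap -2) are assumptions made for this sketch.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of subsequences of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical sequences that share only the short region "ACG".
    print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))
```

Two otherwise unrelated sequences therefore still receive a positive score when they share a short conserved region.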
Pairwise Sequence Similarity Search
• Basic Local Alignment Search Tool (BLAST): the most effective and most common method of similarity searching
– Characteristics:
• Fast
• Good sensitivity
– Types:
• Blastn – for nucleotide sequences
• Blastp – for proteins
• Blastx – for translated nucleotide sequences
• http://www.ncbi.nlm.nih.gov/blast
Pairwise Sequence Similarity Search
• Types:
BLAST     Query sequence                   Database
Blastn    Nucleotide                       Nucleotide
Blastp    Protein                          Protein
Blastx    6-frame translated nucleotide    Protein
Tblastn   Protein                          6-frame translated nucleotide
Tblastx   6-frame translated nucleotide    6-frame translated nucleotide
Entering sequences
- copy and paste
- upload
Most important settings: Blastn – Searching Databases
• The most commonly used one is the non-redundant nucleotide database (the selected one).
• The search can be narrowed by adding taxonomic data in the “Organism” section.
Most important settings: Blastn – program optimisation
• Megablast: searches for matches with 95% or higher similarity; very fast.
• Discontiguous megablast: very well suited to comparisons between species; a bit slower.
• Blastn: suitable for comparing any sequences; it also reports weak similarities; slow.
Most important settings: Blastp – Searching Databases
• The most commonly used one is the non-redundant protein database (the selected one).
• The search can be narrowed by adding taxonomic data in the “Organism” section.
Most important settings: Blastp – program optimisation
• Blastp: simple search in a protein database.
• PSI-BLAST: search algorithm with position-specific scoring.
• PHI-BLAST: search with a pattern-specific scoring system.
BLAST result evaluation
• Look for nucleotide sequences that are similar to AY693804 - Phytophthora infestans INF2A gene.
BLAST result evaluation
Search parameters
Graphical representation of the result
BLAST result evaluation
• Result summary table
– Max score
• A higher value means higher similarity
– Query coverage
• A higher value means that a larger part of the query is covered by the alignment
– E value – expected value
• The number of hits with at least this score that is expected by chance; a lower value means a more significant match
– Max ident – maximal percent identity of the aligned segments
• A higher value means higher similarity
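Two of the summary columns can be recomputed by hand from a single gapped alignment, which helps when interpreting the table. The sketch below assumes that identity is counted over the alignment length including gap columns and that coverage is the share of query residues inside the alignment; the function name and the sample alignment are hypothetical.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Percent identity and query coverage for one gapped pairwise alignment."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    # Identical, non-gap positions.
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    # Query residues that take part in the alignment.
    aligned_query_residues = sum(1 for q in aligned_query if q != "-")
    percent_identity = 100.0 * identical / len(aligned_query)
    query_coverage = 100.0 * aligned_query_residues / query_length
    return percent_identity, query_coverage


if __name__ == "__main__":
    # Hypothetical alignment of part of a 10-base query.
    identity, coverage = alignment_stats("ACGT-A", "ACCTTA", query_length=10)
    print(round(identity, 1), round(coverage, 1))
```

A hit can have a high identity and still a low coverage, so the two columns are best read together.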
BLAST result evaluation
• Detailed sequence alignment:
BLAST in a local environment
• The BLAST program can also be run locally.
– This is useful in the following cases:
• Comparing sequences to local databases
• Operations requiring a large number of calculations
• ftp://ftp.ncbi.nih.gov/blast/
– Runs from the command line with input parameters.
– Supports many operating systems (32- and 64-bit architectures as well)
– Detailed help
– The first step is formatting the database, the next step is the similarity analysis.
– It can produce many output formats
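The two steps (database formatting, then similarity analysis) might look as follows with the BLAST+ command-line programs `makeblastdb` and `blastn`. This is a sketch under assumptions: the slides do not say which BLAST release is used, older releases use different program names, and all file and database names below are hypothetical. The code only builds and prints the command lines; it does not run them.

```python
import shlex


def blast_commands(db_fasta, query_fasta, db_name="local_db", out_file="hits.tsv"):
    """Build the two BLAST+ command lines: format the database, then search it."""
    # Step 1: turn a nucleotide FASTA file into a searchable database.
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", "nucl", "-out", db_name]
    # Step 2: search the query against it; -outfmt 6 requests tabular output.
    search = [
        "blastn", "-query", query_fasta, "-db", db_name,
        "-outfmt", "6", "-out", out_file,
    ]
    return make_db, search


if __name__ == "__main__":
    # Hypothetical file names; with BLAST+ installed, each list could be
    # passed to subprocess.run(cmd, check=True) to execute it.
    for cmd in blast_commands("sequences.fasta", "query.fasta"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids quoting problems when file names contain spaces.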
What can we do with it?
• We have an unknown sequence from an unknown source.
– What can the source be?
– Which gene is it similar to?
– What can be the function of the protein encoded by this sequence?
• Use the sequence in FASTA format as the query in the BLAST program. From the result we can answer the questions above.
Exercise
– Using the FASTA-format sequence saved in the previous exercise, search for related sequences with a similarity analysis (BLAST).
Contents
• Multiple sequence alignments
• Examining protein sequences
• Protein 3D models
• Use for what?
• Exercise
Multiple Sequence Alignment
• Essence:
– Trying to align several sequences at the same time. Differences (mismatches and gaps) are scored with penalties.
• Use:
– Searching for common features and parameters
– Placing new sequences - taxonomy
– Protein structures
– Phylogenetic analysis
Multiple Sequence Alignment - An example
TTGACATG CCGGGG---A AACCG
TTGACATG CCGGTG--GT AAGCC
TTGACATG -CTAGG---A ACGCG
TTGACATG -CTAGGGAAC ACGCG
TTGACATC -CTCTG---A ACGCG
******** ?????????? *****
• What is the consensus sequence?
• In case of differences it is difficult to
detect common patterns. That’s why we
use alignment software.
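Once the sequences are aligned, a simple consensus can be read off column by column. The sketch below uses a plain majority rule and marks tied columns with "?"; this rule and the function name are assumptions, and alignment programs may define the consensus differently.

```python
from collections import Counter


def consensus(aligned):
    """Majority residue per column; '?' when the two top counts are tied."""
    result = []
    for column in zip(*aligned):
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append("?")
        else:
            result.append(counts[0][0])
    return "".join(result)


if __name__ == "__main__":
    # First and last blocks of the example alignment above.
    first_block = ["TTGACATG", "TTGACATG", "TTGACATG", "TTGACATG", "TTGACATC"]
    last_block = ["AACCG", "AAGCC", "ACGCG", "ACGCG", "ACGCG"]
    print(consensus(first_block), consensus(last_block))
```

The middle block of the example contains gaps of different lengths, which is where a simple column count stops being reliable and alignment software is needed.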
Multiple sequence alignment
• Types of alignment:
– Manual: done by hand; a laborious and long process in which human errors can occur.
– Automatic: faster, but sometimes it does not take biological requirements into account.
– Combined: the best result is gained by using the manual and the automatic methods together. First use the computer, then refine the result by hand.
Multiple sequence alignment
• Most widely used program for multiple sequence alignment: CLUSTAL W
– Downloadable local version: http://www.clustal.org
– WWW version: http://www.ebi.ac.uk/clustalw
– Uses the progressive alignment method
– Fast, with low memory usage
– Effective for aligning many sequences
– Able to draw simple phylogenetic trees
Multiple sequence alignment
• Exercise: Perform a multiple alignment on the following sequences:
– Sequence file
http://align.bmr.kyushu-u.ac.jp/mafft/online/server/
Multiple sequence alignment
• Example:
– Examine the catalase sequences of the plants below:
• Paprika (Capsicum annuum)
• Tobacco (Nicotiana tabacum)
• Tomato (Solanum lycopersicum)
• Potato (Solanum tuberosum)
• Sequence file
• http://align.genome.jp/
Multiple sequence alignment
The result in the Jalview program.
Phylogenetics
• The study of evolutionary relatedness
among various groups of organisms
through molecular sequencing data
and morphological data matrices.
Phylogenetics
• Relation between phylogenetic analysis and sequence alignment:
– Sequence alignment determines the similarities and differences of the aligned sequences.
– In the case of orthologous sequences, the differences between sequences from different species arise from the mutations accumulated during their separate evolution.
– The number of mutations, i.e. the degree of difference between the sequences, is related to the evolutionary distance between the two species: the longer ago the two species separated, the greater the sequence difference within their orthologous genes.
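The simplest measure of the difference between two aligned sequences is the proportion of differing sites, often called the p-distance. The sketch below computes it and builds a pairwise distance table; the slides do not prescribe a particular distance measure, and the species labels and sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {(x, y): p_distance(aligned[x], aligned[y]) for x in names for y in names}


if __name__ == "__main__":
    # Hypothetical aligned orthologous fragments.
    demo = {"species_1": "ACGTACGT", "species_2": "ACGTACGA", "species_3": "ACCTTCGA"}
    for pair, dist in distance_matrix(demo).items():
        print(pair, round(dist, 3))
```

Such a distance table is a typical input for distance-based tree building; more realistic evolutionary models correct it for multiple substitutions at the same site.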
Phylogenetics
• Steps of phylogenetic analysis:
– 1. Sequence alignment
– 2. Definition of the evolutionary model
– 3. Tree building
– 4. Examination of the tree(s)
Phylogenetics
• Example:
– Perform the multiple sequence alignment with the sequence file below and draw the phylogenetic tree.
– Sequence file
http://align.bmr.kyushu-u.ac.jp/mafft/online/server/
Multiple sequence alignment
• 1. Sequence alignment
• 4. Examination of the tree
Use for what?
• Function prediction
• Finding relatives
• Identification
Exercise
– Look for resistance genes with known sequences. Perform multiple sequence alignments on them. Evaluate the similarities and create a phylogenetic tree.
– What conclusions can be drawn from the result?
Information Technology in Plant Protection
Bioinformatics: multiple sequence alignment
Protein sequence analysis
Content
• Characteristics of the proteins
• Protein sequence analysis
• Protein 3D models
• Practical task
Characteristics of the proteins
Proteins are biochemical compounds consisting of one or more polypeptides, typically folded into a globular or fibrous form, facilitating a biological function.
Properties of the genetic code:
• The 20 amino acids are coded by triplets (codons)
• Four types of bases give 4^3 = 64 possible triplets, which is enough to code 20 amino acids
• 61 triplets code amino acids
• 3 triplets are stop signals (stop codons): UAA, UAG, UGA
• 1 codon marks the start of translation (start codon AUG, methionine)
• One triplet codes only one type of amino acid, but the same amino acid can be coded by several triplets → degeneracy
• Synonymous triplets: triplets coding the same amino acid
• The gene and the protein chain it codes are colinear
• The code is non-overlapping
• The genetic code is universal across living organisms.
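The numbers above (64 triplets, 61 coding, 3 stop) can be checked by building the standard code table programmatically. The sketch below writes codons in DNA letters, so T stands in place of the U used on the slide; the compact amino acid string follows the usual T, C, A, G ordering of the standard table, and the function name `translate` and the sample sequence are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acids for the 64 codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
    print(len(CODON_TABLE), stops)
    print(translate("ATGGCCTAA"))  # hypothetical short coding sequence
```

Because the triplets are read one after another without sharing bases, the loop advances in steps of three, which reflects the non-overlapping property of the code.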
Characteristics of the proteins
DNA nucleotide order → RNA nucleotide order → Protein amino acid order
Characteristics of the proteins
The code table. (Griffiths et al., An Introduction to Genetic Analysis, 8th Ed., Fig. 9-8.)
Groups of proteins:
Basis of biological activity
• Enzymes (pepsin)
• Protective proteins (immunoglobulin)
• Transport proteins (hemoglobin, myoglobin, transferrin)
• Hormones (insulin, ACTH)
• Structural proteins (collagen, elastin, keratin)
• Toxins (snake venom)
Characteristics of protein sequences
• Protein synthesis video:
– http://www.youtube.com/watch?v=NJxobgkPEAo
• Protein structure video:
– http://www.youtube.com/watch?v=lijQ3a8yUYQ
Characteristics of protein sequences
• Structures:
– Primary structure: the order of amino acids
• “MSASSSSALPPLVPALYRWK”
– Secondary structure: regularly repeating local spatial structures. The most common examples are:
• Alpha helix and beta sheet
– Tertiary structure: the overall shape of a single protein molecule; the spatial relationship of the secondary structures to one another.
– Quaternary structure: several protein molecules (polypeptide chains), usually called protein subunits in this context, which function as a single protein complex
Characteristics of protein sequences
• Protein databases:
– Protein database: UniProt http://www.uniprot.org/
– Protein structure database: ProteinDataBank
http://www.rcsb.org/pdb/home/home.do
– Protein interaction database: String http://string.embl.de/
Protein sequence analysis
• Analysis of the primary structure
– First of all, we examine the distribution and the physico-chemical properties of the amino acids.
• Example: HCA - Hydrophobic Cluster Analysis of an acetyltransferase protein sequence from Fusarium.
– http://mobyle.rpbs.univ-paris-diderot.fr/cgi-bin/portal.py?form=HCA
Protein sequence analysis
Protein sequence analysis
• For general similarity studies
• We can gain information from general physical and chemical examinations.
• Determining the structure and function of a protein from the DNA sequence is a big challenge for today’s technology.
• Solving this problem could, for instance, help to defeat several diseases.
Protein sequence analysis
• Prediction according to dimension:
– 1D: amino acid properties that can be written as a 1D string. Examples: sequence, secondary structure, hydrophobicity
– 2D: distances and contacts between amino acid pairs
– 3D: conformation prediction on the basis of all atom coordinates
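Hydrophobicity is a typical 1D property: each residue gets a number, and a sliding-window average gives a profile along the sequence. The sketch below uses the Kyte-Doolittle hydropathy scale, which is an assumption made here (it is not the HCA method mentioned earlier), and a window of 5 chosen only for illustration; the example sequence is the primary-structure string shown on the structure slide.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=5):
    """Average hydropathy of every window of the given length along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    profile = hydropathy_profile("MSASSSSALPPLVPALYRWK")
    print([round(v, 2) for v in profile])
```

Stretches with high positive averages point to hydrophobic regions, while strongly negative values suggest segments likely to be exposed to water.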
Protein sequence analysis
• Common analyses of protein structures:
– 3D structure visualization → some important properties become visible.
– 3D structure alignment → similar structure, similar function.
– 3D structure classification → lineage relation, similar function.
– 3D structure prediction → secondary, tertiary, quaternary structure prediction.
– Small-molecule docking → drug candidate molecules for a molecule of known structure.
– Protein structure behaviour → molecular dynamics simulation.
Protein 3D models
• In order to understand the function of a protein, we have to know its 3D structure and quaternary structure. From these we can infer how it can bind to other molecules, proteins and enzymes.
• Let’s see an example for an acetyltransferase protein from Fusarium:
– http://www.rcsb.org/pdb/explore/explore.do?structureId=3FP0
Analysis of protein interactions
• Database of Interacting Proteins (DIP)
– It collects experimentally determined data about proteins that interact (bind) with each other.
– It describes about 11 000 interactions of about 6200 proteins.
– Details given for each interaction: the two proteins, the interacting regions, the experimental methods, the dissociation constant, references.
– Example: interaction network graphs can be displayed (the nodes are clickable).
• http://dip.doe-mbi.ucla.edu/
Exercise
– Look for the protein sequence and (if it exists) the 3D structure of an important pathogen in a sequence database of your choice. Examine the possible binding sites.
Information Technology in Plant Protection
Bioinformatics: Genomes
Content
• What is a genome?
• Genome projects
• Genome browser software
• Use for what?
• Exercise
Definition of genome
• The genome: “The genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA.” (Wikipedia)
Genomics is the discipline of the genome
• Genomics: today it encompasses a broader scope of scientific inquiry and associated technologies than when genomics was initially considered. A genome is the sum total of all of an individual organism's genes. Thus, genomics is the study of all the genes of a cell or tissue at the DNA (genotype), mRNA (transcriptome) or protein (proteome) levels.
– Functional genomics: attempts to make use of the vast wealth of data produced by genomic projects
– Structural genomics: attempts to determine the structure of every protein encoded by the genome
Tools of genomics
• Microarray (or chip): a 2D array on a solid substrate (usually a glass slide or a silicon thin-film cell) that assays large amounts of biological material using high-throughput screening methods.
• Types:
– DNA microarrays (oligonucleotides or cDNA)
– Protein microarrays
– Cellular microarrays
– Tissue microarrays
– Antibody microarrays
Genome projects
• In the last decade several genome projects have started around the world. The aim of these projects is to determine the complete genetic code of more and more organisms. As of August 2010, the genomes of nearly 2500 species were known entirely or partly.
• Important projects:
– http://genome.ucsc.edu
Genome projects
• The first successful genome project was that of Haemophilus influenzae in 1995, by Fleischmann et al. The first plant genome was that of Arabidopsis thaliana in 2000. The sequencing of the human genome was completed in 2003.
• Bioinformatics tools play an important role in the analytical work that follows genome sequencing. This is when the genetic code is assembled.
Genome browsers
• Through genome browsers we can view genomic data in a clear and well-arranged format.
• In some cases the genome is the starting point of genetic analysis.
• Some genome browsers:
– http://genome.ucsc.edu
– http://www.ensembl.org
– http://ecrbrowser.dcode.org
– http://www.ncbi.nlm.nih.gov/mapview/
Genome browsers
• Properties of the UCSC Genome Browser
– News and information on the start page
– Genomes grouped by clade and genome
– Sequence assembly selectable by date (trackable)
– Searching by text
Position on chromosome
Chromosome number
Base position
Known genes
Mammalian conservation
Genome browsers
• Options under the graphical interface:
– Gene and gene prediction options
– mRNA and EST options
– Expression and regulation
– Comparison options
– Variations and repeats
Genome browsers
• NCBI MapViewer properties
– Four-level system
• Start page
• Genome View
• Map View
• Sequence View
– Keyword searching
– Access to the genomes of 800 species
Genome browsers
• NCBI MapViewer
Exercise
– With the help of a genome browser, look for the genome of an important pathogen in a sequence database of your choice. Examine which genes occur in the neighbourhood of the one-millionth nucleotide on the second chromosome of this organism.
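If the gene annotation of the chosen genome is exported from the browser as a table, the same question can be answered with a simple interval filter. The sketch below assumes records of the form (name, chromosome, start, end) and a flanking region of 10 000 bases on each side; the record layout, the flank size and all gene names are hypothetical.

```python
def genes_near(genes, chromosome, position, flank=10000):
    """Return names of genes overlapping [position - flank, position + flank]."""
    lo, hi = position - flank, position + flank
    return [
        name
        for name, chrom, start, end in genes
        if chrom == chromosome and start <= hi and end >= lo
    ]


# Hypothetical annotation records: (name, chromosome, start, end).
GENES = [
    ("geneA", "2", 985000, 992500),
    ("geneB", "2", 998000, 1003000),
    ("geneC", "2", 1200000, 1204000),
    ("geneD", "1", 999000, 1001000),
]

if __name__ == "__main__":
    print(genes_near(GENES, "2", 1_000_000))
```

A gene is reported as soon as any part of it falls inside the window, so genes that only partly overlap the region around the one-millionth nucleotide are included as well.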
Information Technology in Plant Protection
Prepared by:
Dr. János Busznyák - Dr. Máté Csák - Sándor Nagy
Georgikon Faculty
THE PRESENTATION CAN BE DOWNLOADED FROM:
-