cabin_training SQL

INTRODUCTION TO SQL
• SQL stands for "Structured Query Language"
• Programming language for databases, closer to natural English than other languages (based on "sentences" instead of "procedures")
• Aim: make it easier for humans to query data and for programmers to build interfaces
• Powerful functions for text pattern matching
• Powerful extensions for GIS (PostGIS, Oracle Spatial)
• Standardized and supported by most recent relational databases
BUT
1) minor syntax differences between vendors and vendor-specific enhanced functions prevent easy interoperability between products
2) SQL databases often imply that the development of the interfaces is distinct from the development of the core of the database
Interoperability problem between vendors
– possible solutions
• use an intermediate layer between the database and the interface
– ODBC/JDBC (standard connectors through which other software accesses the database: ODBC on Windows, JDBC in Java)
– use ORM (Object Relational Mapper) software, which lets the programmer use the same syntax whatever the database when developing interfaces
e.g.: Doctrine (open source, PHP)
SQL and NoSQL
• SQL is very useful for normalized databases where controlling data integrity is important (scientific value)
• ...but it is harder to scale: performance can drop with huge amounts of data (e.g. > 300,000 records)
• Over the last 4-5 years, with the explosion of the Internet, there has been a trend towards NoSQL databases: faster databases that can handle huge amounts of data rapidly, e.g. Solr (to index Word and PDF documents), MongoDB, Cassandra, etc.
• NoSQL offers speed, fast replication between locations and a flexible structure, but no control of integrity. It doesn't replace SQL but complements it:
– SQL: control of the integrity and completeness of data matters more than speed, plus good interaction with GIS
– NoSQL: high availability of data on the Internet, but no schema to validate integrity and no GIS plug-in yet
Problem: a scientific information network requires both quality control and high availability
SQL: 4 parts
• Data Query Language (DQL)
– search and display data matching specific criteria
• Data Manipulation Language (DML):
– modify data (insert, update, delete)
– lock data (concurrency control: two users cannot modify the same data in parallel)
– use transactions (roll back to the previous state of the database if a modification fails)
• Data Definition Language (DDL)
– create the schema of the database (the normalized structure, the indexes): you can define yourself how the integrity of the database is checked
• Data Control Language (DCL)
– create authorizations and access rules for users
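The transaction behaviour described under DML can be tried without a database server. This minimal sketch uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption: SQLite has no user rights, so DCL is not shown); the `localities` table and its values are hypothetical. The second group of inserts breaks a NOT NULL rule, so the whole transaction is rolled back.

```python
import sqlite3

def demo_transaction():
    conn = sqlite3.connect(":memory:")
    # DDL: define the schema and an integrity rule (locality may not be NULL)
    conn.execute("""CREATE TABLE localities (
        pk_locality INTEGER PRIMARY KEY,
        locality TEXT NOT NULL)""")
    # DML: insert one record and make it permanent
    conn.execute("INSERT INTO localities (locality) VALUES ('Tienen')")
    conn.commit()
    try:
        # 'with conn' commits if the block succeeds and rolls back if it raises
        with conn:
            conn.execute("INSERT INTO localities (locality) VALUES ('Bunsbeek')")
            conn.execute("INSERT INTO localities (locality) VALUES (NULL)")  # violates NOT NULL
    except sqlite3.IntegrityError:
        pass  # the whole transaction, including 'Bunsbeek', has been rolled back
    # DQL: read back what is actually stored
    rows = conn.execute("SELECT locality FROM localities ORDER BY pk_locality").fetchall()
    conn.close()
    return [r[0] for r in rows]

print(demo_transaction())  # ['Tienen']
```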
Vocabulary
• Table
• Field: each field has a field name and a field type
• Record (or tuple): one row of the table
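To see this vocabulary on a live table, the sketch below (Python's `sqlite3`, hypothetical `localities` table and values) lists each field's name and type with SQLite's `PRAGMA table_info` and reads back one record.

```python
import sqlite3

def describe_localities():
    conn = sqlite3.connect(":memory:")
    # Table with three fields; each field has a name and a declared type
    conn.execute("""CREATE TABLE localities (
        pk_locality INTEGER PRIMARY KEY,
        locality TEXT,
        latitude_decimals REAL)""")
    # One record (tuple): one value per field (made-up latitude)
    conn.execute("INSERT INTO localities (locality, latitude_decimals) VALUES ('Tienen', 50.81)")
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per field
    fields = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(localities)")]
    records = conn.execute("SELECT * FROM localities").fetchall()
    conn.close()
    return fields, records

fields, records = describe_localities()
print(fields)   # [('pk_locality', 'INTEGER'), ('locality', 'TEXT'), ('latitude_decimals', 'REAL')]
print(records)  # [(1, 'Tienen', 50.81)]
```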
Recommendations
To make SQL easier to use, when creating a database:
– avoid uppercase letters in field names
– avoid accented characters in field names (but keep them in the content, of course!)
– replace white spaces with underscores
– avoid at all costs any other non-alphanumeric characters
– avoid giving the same name to two fields in different tables (not always possible...)
– table names in plural
– field names in singular
– use descriptive field names (e.g. not 'dc' but 'date_collected')
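Why these naming rules help: in the sketch below (Python's `sqlite3`; table names, field names and the date are hypothetical), a name containing a white space must be double-quoted every time it is used, while a name following the recommendations needs no quoting. PostgreSQL requires the same quoting and additionally folds unquoted names to lowercase, which is one reason to avoid uppercase letters in names.

```python
import sqlite3

def compare_naming():
    conn = sqlite3.connect(":memory:")
    # Hypothetical names breaking the recommendations: white spaces force double quotes
    conn.execute('CREATE TABLE "Specimen Table" ("Date Collected" TEXT)')
    conn.execute('INSERT INTO "Specimen Table" ("Date Collected") VALUES (?)', ("2013-05-01",))
    quoted = conn.execute('SELECT "Date Collected" FROM "Specimen Table"').fetchall()
    try:
        # Unquoted, this is read as field 'Date' with alias 'Collected': no such field
        conn.execute('SELECT Date Collected FROM "Specimen Table"')
        unquoted_works = True
    except sqlite3.OperationalError:
        unquoted_works = False
    # Recommended naming: lowercase, underscore, plural table, descriptive field
    conn.execute("CREATE TABLE specimens (date_collected TEXT)")
    conn.execute("INSERT INTO specimens (date_collected) VALUES (?)", ("2013-05-01",))
    plain = conn.execute("SELECT date_collected FROM specimens").fetchall()
    conn.close()
    return quoted, unquoted_works, plain

print(compare_naming())  # ([('2013-05-01',)], False, [('2013-05-01',)])
```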
Querying
Pattern: SELECT <comma-separated list of fields> FROM <name of table> ;
e.g.
SELECT locality FROM localities;
SELECT locality, country FROM localities;
SELECT * FROM localities;
'*' => all fields (wildcard)
Querying II
Pattern: SELECT <comma-separated list of fields> FROM <name of table> WHERE [condition] ;
e.g.
SELECT pk_locality, latitude_decimals, longitude_decimals
FROM localities
WHERE locality = 'Tienen';
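When the value in the WHERE condition comes from a user interface, it is safer to pass it as a parameter than to paste it into the SQL text. A minimal sketch with Python's `sqlite3`, whose placeholder is `?` (PostgreSQL drivers use other placeholder styles); the coordinates are made up.

```python
import sqlite3

def find_locality(name):
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY,
        locality TEXT, latitude_decimals REAL, longitude_decimals REAL)""")
    # Made-up coordinates, for illustration only
    conn.executemany(
        "INSERT INTO localities (locality, latitude_decimals, longitude_decimals) VALUES (?, ?, ?)",
        [("Tienen", 50.81, 4.94), ("Bunsbeek", 50.84, 4.88)])
    # The '?' placeholder passes the value separately from the SQL text
    rows = conn.execute(
        "SELECT pk_locality, latitude_decimals, longitude_decimals "
        "FROM localities WHERE locality = ?", (name,)).fetchall()
    conn.close()
    return rows

print(find_locality("Tienen"))    # [(1, 50.81, 4.94)]
print(find_locality("L'Ecluse"))  # []: the apostrophe does not break the query
```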
e.g.
SELECT * FROM localities
WHERE
latitude_decimals > 50.80
AND latitude_decimals < 50.85;
Querying III (boolean)
Compare the results
SELECT * FROM localities
WHERE
latitude_decimals > 50.80
AND latitude_decimals < 50.85;
SELECT * FROM localities
WHERE
latitude_decimals > 50.80
OR latitude_decimals < 50.85;
Querying IV (boolean)
Compare the results
SELECT * FROM localities
WHERE
locality = 'Tienen'
AND locality = 'Bunsbeek';
SELECT * FROM localities
WHERE
locality = 'Tienen'
OR locality = 'Bunsbeek';
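The two "compare the results" slides can be checked on a toy table. In this sketch (Python's `sqlite3`, made-up latitudes), OR on the two latitude bounds returns every row, because any number is either above 50.80 or below 50.85, while AND on two different values of the same field returns nothing, because one field cannot hold both values at once.

```python
import sqlite3

def boolean_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE localities (locality TEXT, latitude_decimals REAL)")
    # Made-up latitudes, chosen only to illustrate the logic
    conn.executemany("INSERT INTO localities VALUES (?, ?)",
                     [("Tienen", 50.81), ("Bunsbeek", 50.84), ("Hensberg", 50.95)])

    def names(where):
        # Conditions are fixed strings here, not user input
        sql = f"SELECT locality FROM localities WHERE {where} ORDER BY locality"
        return [row[0] for row in conn.execute(sql)]

    results = {
        "lat AND": names("latitude_decimals > 50.80 AND latitude_decimals < 50.85"),
        "lat OR": names("latitude_decimals > 50.80 OR latitude_decimals < 50.85"),
        "name AND": names("locality = 'Tienen' AND locality = 'Bunsbeek'"),
        "name OR": names("locality = 'Tienen' OR locality = 'Bunsbeek'"),
    }
    conn.close()
    return results

for label, rows in boolean_demo().items():
    print(label, rows)
```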
Querying V (inequality and NULL)
Pattern: SELECT <comma-separated list of fields>
FROM <name of table> WHERE [condition] ;
e.g.
SELECT * FROM localities
WHERE
locality <> 'Hensberg';
SELECT * FROM localities
WHERE
locality IS NULL;
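A detail worth knowing: `locality <> 'Hensberg'` does not return rows whose locality is NULL, because any comparison with NULL is unknown; only `IS NULL` finds them, and `= NULL` finds nothing. Checked here with Python's `sqlite3` (SQLite and PostgreSQL follow the same standard three-valued logic); the rows are hypothetical.

```python
import sqlite3

def null_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY, locality TEXT)")
    # Hypothetical rows; the third one has no locality (NULL)
    conn.executemany("INSERT INTO localities (locality) VALUES (?)",
                     [("Tienen",), ("Hensberg",), (None,)])

    def ids(where):
        return [r[0] for r in conn.execute(
            f"SELECT pk_locality FROM localities WHERE {where} ORDER BY pk_locality")]

    result = {
        "<> 'Hensberg'": ids("locality <> 'Hensberg'"),  # the NULL row is NOT returned
        "IS NULL": ids("locality IS NULL"),               # the only way to find the NULL row
        "= NULL": ids("locality = NULL"),                 # always unknown: nothing returned
    }
    conn.close()
    return result

print(null_demo())  # {"<> 'Hensberg'": [1], 'IS NULL': [3], '= NULL': []}
```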
Joining (I)
SELECT *
FROM specimens
JOIN scientific_names
ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
[+ WHERE condition] ;
Joining II
• Exercise
– Find the collectors of 'Agrostis'
SELECT collector_name, genus
FROM specimens
JOIN
scientific_names ON
specimens.fk_scientific_name=
scientific_names.pk_scientific_name
where genus='Agrostis';
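The Agrostis exercise can be run end to end on a toy schema. In this sketch (Python's `sqlite3`; collectors and rows are hypothetical, and a `?` parameter replaces the literal 'Agrostis'), note that JOIN is an inner join: a specimen without a scientific name (`fk_scientific_name` is NULL) is silently left out of the result.

```python
import sqlite3

def collectors_of(genus):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE scientific_names (
            pk_scientific_name INTEGER PRIMARY KEY, genus TEXT, scientific_name TEXT);
        CREATE TABLE specimens (
            pk_specimen INTEGER PRIMARY KEY, collector_name TEXT,
            fk_scientific_name INTEGER REFERENCES scientific_names(pk_scientific_name));
        INSERT INTO scientific_names VALUES
            (1, 'Agrostis', 'Agrostis capillaris'), (2, 'Bellis', 'Bellis perennis');
        INSERT INTO specimens VALUES
            (1, 'Collector A', 1), (2, 'Collector B', 2), (3, 'Collector C', 1),
            (4, 'Collector D', NULL);
    """)
    join = """FROM specimens
        JOIN scientific_names
          ON specimens.fk_scientific_name = scientific_names.pk_scientific_name"""
    rows = conn.execute(
        f"SELECT collector_name, genus {join} WHERE genus = ? ORDER BY collector_name",
        (genus,)).fetchall()
    # Without WHERE: 3 joined rows, because specimen 4 has no matching name
    total = conn.execute(f"SELECT COUNT(*) {join}").fetchone()[0]
    conn.close()
    return rows, total

print(collectors_of("Agrostis"))
# ([('Collector A', 'Agrostis'), ('Collector C', 'Agrostis')], 3)
```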
Joining III
• Exercise
– Find the scientific names collected in Tienen
SELECT scientific_name
FROM specimens
JOIN
scientific_names ON
specimens.fk_scientific_name=
scientific_names.pk_scientific_name
JOIN
localities ON
specimens.fk_locality=localities.pk_locality
where locality='Tienen';
Joining III (ordering)
• Exercise: find the scientific names collected in Tienen, in alphabetical order
SELECT scientific_name
FROM specimens
JOIN
scientific_names ON
specimens.fk_scientific_name=
scientific_names.pk_scientific_name
JOIN
localities ON
specimens.fk_locality=localities.pk_locality
where locality='Tienen'
ORDER BY scientific_name;
Joining IV
• Exercise
– Find the collectors of 'Balsaminaceae'
SELECT collector_name
FROM specimens
JOIN
scientific_names ON
specimens.fk_scientific_name=
scientific_names.pk_scientific_name
JOIN
families ON
scientific_names.fk_family=families.pk_family
where family='Balsaminaceae'
;
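The Tienen and Balsaminaceae exercises chain several joins. This sketch runs both on hypothetical data with Python's `sqlite3` (collector names are placeholders); an ORDER BY is added to the Balsaminaceae query only so that the output is deterministic.

```python
import sqlite3

SETUP = """
CREATE TABLE families (pk_family INTEGER PRIMARY KEY, family TEXT);
CREATE TABLE scientific_names (pk_scientific_name INTEGER PRIMARY KEY,
    scientific_name TEXT, fk_family INTEGER REFERENCES families(pk_family));
CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY, locality TEXT);
CREATE TABLE specimens (pk_specimen INTEGER PRIMARY KEY, collector_name TEXT,
    fk_scientific_name INTEGER REFERENCES scientific_names(pk_scientific_name),
    fk_locality INTEGER REFERENCES localities(pk_locality));
INSERT INTO families VALUES (1, 'Balsaminaceae'), (2, 'Poaceae');
INSERT INTO scientific_names VALUES (1, 'Impatiens noli-tangere', 1),
    (2, 'Agrostis capillaris', 2), (3, 'Impatiens parviflora', 1);
INSERT INTO localities VALUES (1, 'Tienen'), (2, 'Bunsbeek');
INSERT INTO specimens VALUES (1, 'Collector A', 2, 1), (2, 'Collector B', 1, 1),
    (3, 'Collector C', 3, 2), (4, 'Collector A', 3, 1);
"""

def run_exercises():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Names collected in Tienen: specimens joined to scientific_names and localities
    in_tienen = [r[0] for r in conn.execute("""
        SELECT scientific_name FROM specimens
        JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
        JOIN localities ON specimens.fk_locality = localities.pk_locality
        WHERE locality = 'Tienen'
        ORDER BY scientific_name""")]
    # Collectors of Balsaminaceae: specimens -> scientific_names -> families
    balsam = [r[0] for r in conn.execute("""
        SELECT collector_name FROM specimens
        JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
        JOIN families ON scientific_names.fk_family = families.pk_family
        WHERE family = 'Balsaminaceae'
        ORDER BY collector_name""")]
    conn.close()
    return in_tienen, balsam

print(run_exercises())
```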
Views
'Save' complex queries and make them permanent in the database
(useful when programming filters in an interface)
CREATE VIEW v_specimen_names_localities
AS
SELECT scientific_name, locality
FROM specimens
JOIN
scientific_names ON
specimens.fk_scientific_name=
scientific_names.pk_scientific_name
JOIN
localities ON
specimens.fk_locality=localities.pk_locality;
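A view can then be filtered like an ordinary table. This sketch (Python's `sqlite3`, hypothetical rows) defines a view in the spirit of the one above that exposes both `scientific_name` and `locality`, and lets the "interface" filter it by locality.

```python
import sqlite3

def query_view(locality):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE scientific_names (pk_scientific_name INTEGER PRIMARY KEY, scientific_name TEXT);
        CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY, locality TEXT);
        CREATE TABLE specimens (pk_specimen INTEGER PRIMARY KEY,
                                fk_scientific_name INTEGER, fk_locality INTEGER);
        INSERT INTO scientific_names VALUES (1, 'Bellis perennis'), (2, 'Agrostis capillaris');
        INSERT INTO localities VALUES (1, 'Tienen'), (2, 'Bunsbeek');
        INSERT INTO specimens VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2);
        -- The view stores the joined query once; it holds no copy of the data
        CREATE VIEW v_specimen_names_localities AS
        SELECT scientific_name, locality
        FROM specimens
        JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
        JOIN localities ON specimens.fk_locality = localities.pk_locality;
    """)
    # The interface only needs a simple query on the view
    rows = conn.execute(
        "SELECT scientific_name FROM v_specimen_names_localities "
        "WHERE locality = ? ORDER BY scientific_name", (locality,)).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(query_view("Tienen"))    # ['Agrostis capillaris', 'Bellis perennis']
print(query_view("Bunsbeek"))  # ['Bellis perennis']
```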
Search on Text Patterns (I)
a) match exactly one position: '_'
'_' means any single character, present exactly once
b) match several positions: '%'
'%' means any sequence of zero or more characters
Note: a white space counts as one character
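The two wildcards can be tested directly by asking the database to evaluate `text LIKE pattern`. A minimal sketch with Python's `sqlite3`; note one assumption-sensitive difference: SQLite's LIKE ignores case for ASCII letters, whereas PostgreSQL's LIKE is case-sensitive.

```python
import sqlite3

def like(text, pattern):
    conn = sqlite3.connect(":memory:")
    # 'text LIKE pattern' evaluates to 1 (match) or 0 (no match)
    result = conn.execute("SELECT ? LIKE ?", (text, pattern)).fetchone()[0]
    conn.close()
    return bool(result)

print(like("Bellis", "Bell_s"))       # True: '_' matches exactly one character
print(like("Bellis", "Bel_s"))        # False: '_' cannot match two characters
print(like("Betula", "B%"))           # True: '%' matches any sequence
print(like("B", "B%"))                # True: '%' also matches zero characters
print(like("Poa annua", "Poa_annua")) # True: the white space is one character
```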
Search on Text Patterns (II)
• SQL Syntax
SELECT ...WHERE field LIKE 'pattern';
• PostgreSQL syntax
SELECT ...WHERE field SIMILAR TO 'pattern';
Search on Text Patterns (III)
Example:
find the scientific names having 'e' as the second letter of the genus:
SELECT scientific_name
FROM scientific_names
WHERE genus SIMILAR TO '_e%';
Search on Text Patterns (IV)
Example:
Pattern: '_e%';
Response:
'Aegopodium'
'Aethusa'
'Bellis'
'Betula'
...
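The same pattern can be reproduced with standard LIKE, which accepts `_` and `%` as well. In this sketch (Python's `sqlite3`, a hypothetical list of genera), the genera whose second letter is 'e' are returned; keep in mind that SQLite's LIKE ignores letter case, unlike PostgreSQL's SIMILAR TO.

```python
import sqlite3

def genera_with_e_second(genera):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scientific_names (genus TEXT)")
    conn.executemany("INSERT INTO scientific_names VALUES (?)", [(g,) for g in genera])
    # '_' = exactly one character, then 'e', then '%' = anything (including nothing)
    rows = conn.execute(
        "SELECT genus FROM scientific_names WHERE genus LIKE '_e%' ORDER BY genus").fetchall()
    conn.close()
    return [r[0] for r in rows]

print(genera_with_e_second(["Aegopodium", "Aethusa", "Agrostis", "Bellis", "Betula", "Impatiens"]))
# ['Aegopodium', 'Aethusa', 'Bellis', 'Betula']
```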
Search on text pattern (VI)
• Interval of characters
• Use brackets
[a-z]: any lower case letter
[A-Z]: any uppercase letter
[0-9]: any digit
[aA]: ‘a’ or ‘A’
Search on text pattern (VII)
• Useful to control nomenclature!
• Exercise: search the species containing uppercase characters:
SELECT *
FROM scientific_names
WHERE species SIMILAR TO '%[A-Z]%';
Search on text pattern (VIII)
• Useful to control nomenclature
• Exercise: search the genera containing uppercase letters after the first letter:
SELECT *
FROM scientific_names
WHERE genus SIMILAR TO '_%[A-Z]%';
Search on text pattern (IX)
• Useful to control nomenclature
• Exercise: search the genera containing more than one word:
SELECT *
FROM scientific_names
WHERE genus SIMILAR TO '%[a-z]% %[a-z]%';
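SQLite has no SIMILAR TO, but its case-sensitive GLOB operator accepts the same bracket intervals, with `*` in place of `%` and `?` in place of `_`. This sketch swaps GLOB in for SIMILAR TO (a substitution, not the PostgreSQL syntax) and runs the three nomenclature checks on hypothetical, deliberately faulty rows.

```python
import sqlite3

ROWS = [
    (1, "Agrostis", "capillaris"),      # clean entry
    (2, "Bellis", "Perennis"),          # uppercase in the species epithet
    (3, "BetuLa", "pendula"),           # uppercase after the first letter of the genus
    (4, "Impatiens parviflora", None),  # two words typed in the genus field
]

def check_nomenclature():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scientific_names (pk INTEGER PRIMARY KEY, genus TEXT, species TEXT)")
    conn.executemany("INSERT INTO scientific_names VALUES (?, ?, ?)", ROWS)
    checks = {
        # PostgreSQL: species SIMILAR TO '%[A-Z]%'
        "species_uppercase": "species GLOB '*[A-Z]*'",
        # PostgreSQL: genus SIMILAR TO '_%[A-Z]%'
        "genus_uppercase_after_first": "genus GLOB '?*[A-Z]*'",
        # PostgreSQL: genus SIMILAR TO '%[a-z]% %[a-z]%'
        "genus_several_words": "genus GLOB '*[a-z]* *[a-z]*'",
    }
    result = {
        name: [r[0] for r in conn.execute(
            f"SELECT pk FROM scientific_names WHERE {cond} ORDER BY pk")]
        for name, cond in checks.items()
    }
    conn.close()
    return result

print(check_nomenclature())
# {'species_uppercase': [2], 'genus_uppercase_after_first': [3], 'genus_several_words': [4]}
```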
Search on text pattern (X)
• PostgreSQL also supports an even more powerful mechanism called "regular expressions"
– a standard syntax shared by several programming languages
– allows matching complex patterns
– can perform replacements and extractions
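PostgreSQL writes a regular-expression match as `genus ~ 'pattern'` and offers `regexp_replace` for replacements. SQLite only reserves the `REGEXP` keyword and its core provides no implementation, so this sketch (Python's `sqlite3`, hypothetical genus values) plugs in Python's `re` module; the patterns reproduce two of the nomenclature checks above.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates 'value REGEXP pattern' as regexp(pattern, value)
    return value is not None and re.search(pattern, value) is not None

def genera_matching(pattern):
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, regexp)  # supply the missing REGEXP implementation
    conn.execute("CREATE TABLE scientific_names (genus TEXT)")
    conn.executemany("INSERT INTO scientific_names VALUES (?)",
                     [("Agrostis",), ("BetuLa",), ("Impatiens parviflora",)])
    rows = conn.execute(
        "SELECT genus FROM scientific_names WHERE genus REGEXP ? ORDER BY genus",
        (pattern,)).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(genera_matching(r"^.+[A-Z]"))  # ['BetuLa']: uppercase after the first character
print(genera_matching(r"\w\s+\w"))   # ['Impatiens parviflora']: more than one word
# Replacement, done here in Python: collapse repeated white spaces
print(re.sub(r"\s+", " ", "Impatiens   parviflora"))  # Impatiens parviflora
```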
<optional, if somebody asks how to group information in one row>
Group the specimens collected in Tienen per collector
SELECT array_to_string(array_agg(scientific_name), ','), collector_name
FROM specimens
JOIN
scientific_names ON
specimens.fk_scientific_name=
scientific_names.pk_scientific_name
JOIN
localities ON
specimens.fk_locality=localities.pk_locality
where locality='Tienen'
GROUP BY collector_name
ORDER BY collector_name;
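SQLite has no `array_agg`/`array_to_string`; its `group_concat(X, ',')` aggregate plays the same role. A sketch of the Tienen grouping on hypothetical data with Python's `sqlite3`; since the order of names inside a group is not guaranteed, the names are sorted in Python before display.

```python
import sqlite3

def names_per_collector():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE scientific_names (pk_scientific_name INTEGER PRIMARY KEY, scientific_name TEXT);
        CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY, locality TEXT);
        CREATE TABLE specimens (pk_specimen INTEGER PRIMARY KEY, collector_name TEXT,
                                fk_scientific_name INTEGER, fk_locality INTEGER);
        INSERT INTO scientific_names VALUES (1, 'Agrostis capillaris'), (2, 'Bellis perennis');
        INSERT INTO localities VALUES (1, 'Tienen'), (2, 'Bunsbeek');
        INSERT INTO specimens VALUES
            (1, 'Collector A', 1, 1), (2, 'Collector A', 2, 1),
            (3, 'Collector B', 2, 1), (4, 'Collector B', 1, 2);
    """)
    rows = conn.execute("""
        SELECT group_concat(scientific_name, ','), collector_name
        FROM specimens
        JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
        JOIN localities ON specimens.fk_locality = localities.pk_locality
        WHERE locality = 'Tienen'
        GROUP BY collector_name
        ORDER BY collector_name""").fetchall()
    conn.close()
    # One row per collector; split the concatenated names and sort them
    return {collector: sorted(names.split(",")) for names, collector in rows}

print(names_per_collector())
# {'Collector A': ['Agrostis capillaris', 'Bellis perennis'], 'Collector B': ['Bellis perennis']}
```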
<optional, if somebody asks how to group information in one row>
Group localities per collector
SELECT array_to_string(array_agg(locality), ','), collector_name
FROM specimens
JOIN
scientific_names ON
specimens.fk_scientific_name=
scientific_names.pk_scientific_name
JOIN
localities ON
specimens.fk_locality=localities.pk_locality
GROUP BY collector_name
ORDER BY collector_name;