Eco (Bio)informatics
Website Development 101
A primer on Creating Biologically-based Websites
1
Introduction
Modern, biologically oriented websites
have evolved rapidly in the last ten years
and will continue to evolve at least as
rapidly for the foreseeable future.
Web 2.0: wikis, crowdsourcing, blogs, Flickr, YouTube
Interactive queries and graphing
Static tables and ‘click here to download’ interactivity
Flash- and Java-powered interactivity and effects
The next big thing…?
2
Fundamentals
• Know your site’s mission
• Know your audience
• Taxonomy
• Discovery
• Web 2.0
• Copyright and ownership issues
• Getting Started
3
What is Your Mission?
• What are the purposes of your website?
• Outreach
• Research
• Communication
• Data portal
• Analytical tool
• Education
• Which leads us to the next question…
4
Who is your Audience?
• Identify your audience(s)
• Major audience groups include:
• Elementary school
• High school
• University
• Researcher
• General public
• Decision makers
5
Who is your Audience?
• Each group requires different vocabulary,
different assumptions about prior knowledge,
and different font/color styles and tools
• Targeting multiple audiences with one
website is possible, but difficult to do well
• Multiple entrance paths and templates can be
used
6
Taxonomy
• The single most difficult issue for
managing biological data sets
• Species name ≠ species concept
• A species name is a particular label that
someone has applied to a particular species
concept
• A given species concept may have many
names (synonyms)
7
Taxonomy
A species name is defined by a Latin binomial and an
authority.
The authority is critical as it
defines who originally
described the species.
Currently Accepted name:
Incilius melanochlorus Cope, 1877
Synonyms:
Bufo melanochlora Cope, 1877
Ollotis melanochlora Cope, 1877
8
Taxonomy
A species name may also include lower rank names
that define a variety and/or sub-species.
The lower rank name(s) also have authorities
Swartzia simplex (Sw.) Spreng. var. continentalis Urb.
9
Taxonomy
Ideally, there should be a one-to-one match between
name and concept; unfortunately, the world is not
ideal…
• Many names are being revised due to misspellings
of the original Latin name
• Disagreements about what constitutes a species
(lumpers vs. splitters, geneticists vs. naturalists,
ecologists vs. taxonomists)
• Disagreements among major players such as ITIS,
GBIF, Species 2000 and group-specific sites
10
Taxonomy
DNA and the new Taxonomy
• DNA analysis has led to major revisions of all
major kingdoms.
• Changes are being made at all taxonomic levels,
from phylum on down
• The Angiosperm Phylogeny Group (APG) aims to
reorganize the entire angiosperm group
• Next 10 years will see major shake-ups of many
major taxonomic groups
11
Taxonomy
Out of Chaos, order…
• Rule #1. Accept that there is no universal
agreement and move on
• Rule #2. Pick a source or sources and stick with
them
• Rule #3. Manage taxonomy separately from
metadata
• Rule #4. Use species catalog numbers rather
than names to link objects together
12
Taxonomy
Synonyms can be handled using four variables:
• Spnumber: species name catalog #
• Taxstat: taxonomic status of the name
  • Accepted
  • Synonym
  • Excluded
  • Incomplete
• Synof: Spnumber of the accepted name for this species synonym
• Synonyms: synonym(s) for this accepted name
13
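A minimal sketch of the four-variable synonym scheme in Python. The class name `TaxonRecord`, the helper functions and the status codes EXC and INC are hypothetical (the deck only shows ACC and SYN), and the sketch assumes every synonym points to exactly one accepted name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaxonRecord:
    spnumber: int                    # Spnumber: species name catalog number
    name: str                        # Latin name (with its authority in a real catalog)
    taxstat: str                     # Taxstat: "ACC", "SYN", "EXC" or "INC" (assumed codes)
    synof: Optional[int] = None      # Synof: Spnumber of the accepted name (synonyms only)
    synonyms: List[int] = field(default_factory=list)  # Synonyms: accepted names only


def accepted_spnumber(catalog: Dict[int, TaxonRecord], spnumber: int) -> int:
    """Map any Spnumber to the Spnumber of its accepted name."""
    record = catalog[spnumber]
    if record.taxstat == "SYN":
        if record.synof is None:
            raise ValueError(f"synonym {spnumber} has no Synof value")
        return record.synof
    return record.spnumber


def inconsistent_synonyms(catalog: Dict[int, TaxonRecord]) -> List[int]:
    """List synonyms whose Synof does not point back to an accepted name that lists them."""
    bad = []
    for rec in catalog.values():
        if rec.taxstat != "SYN":
            continue
        target = catalog.get(rec.synof)
        if target is None or target.taxstat != "ACC" or rec.spnumber not in target.synonyms:
            bad.append(rec.spnumber)
    return bad


# The deck's fictional example: Algus grenus is a synonym of Algus verdus.
catalog = {
    1234: TaxonRecord(1234, "Algus verdus", "ACC", synonyms=[5212]),
    5212: TaxonRecord(5212, "Algus grenus", "SYN", synof=1234),
}
print(accepted_spnumber(catalog, 5212))  # 1234
```

Storing the link in both directions (Synof and Synonyms) makes lookups cheap, but the two can drift apart, which is why a consistency check like `inconsistent_synonyms` is worth running after every taxonomy update.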
Taxonomy
I want a picture of Algus grenus
→ Taxonomic Database
Name | Spnumber | Taxstat | Synof | Synonyms
Algus grenus | 5212 | SYN | 1234 |
…
Algus verdus | 1234 | ACC | | 5212
→ Spnumber = 5212, 1234
→ Photo Database
14
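The Algus grenus lookup can be sketched with Python's built-in `sqlite3` module standing in for MySQL. The table and column names and the photo filenames are made up for illustration, and the Synonyms list is derived here from Synof rather than stored as its own column. The point is that photos are linked by Spnumber (Rule #4), never by name.

```python
import sqlite3
from typing import List

# In-memory stand-in for the two databases; schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxa (
        spnumber INTEGER PRIMARY KEY,  -- Spnumber
        name     TEXT NOT NULL,
        taxstat  TEXT NOT NULL,        -- ACC, SYN, ...
        synof    INTEGER               -- Spnumber of the accepted name, synonyms only
    );
    CREATE TABLE photos (
        photo_id INTEGER PRIMARY KEY,
        spnumber INTEGER NOT NULL,     -- linked by catalog number, never by name
        filename TEXT NOT NULL
    );
    INSERT INTO taxa VALUES (1234, 'Algus verdus', 'ACC', NULL), (5212, 'Algus grenus', 'SYN', 1234);
    INSERT INTO photos VALUES (1, 1234, 'verdus_01.jpg'), (2, 5212, 'grenus_old_label.jpg');
""")


def spnumbers_for(conn: sqlite3.Connection, name: str) -> List[int]:
    """Spnumbers of the accepted name plus all of its synonyms."""
    row = conn.execute(
        "SELECT spnumber, taxstat, synof FROM taxa WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return []
    spnumber, taxstat, synof = row
    accepted = synof if taxstat == "SYN" else spnumber
    rows = conn.execute(
        "SELECT spnumber FROM taxa WHERE spnumber = ? OR synof = ? ORDER BY spnumber",
        (accepted, accepted),
    ).fetchall()
    return [r[0] for r in rows]


def photos_for(conn: sqlite3.Connection, name: str) -> List[str]:
    """Every photo filed under the name or any of its synonyms."""
    ids = spnumbers_for(conn, name)
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))  # one "?" per Spnumber
    rows = conn.execute(
        f"SELECT filename FROM photos WHERE spnumber IN ({placeholders}) ORDER BY photo_id",
        ids,
    ).fetchall()
    return [r[0] for r in rows]


print(spnumbers_for(conn, "Algus grenus"))  # [1234, 5212]
print(photos_for(conn, "Algus grenus"))
```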
Discovery
Build for Discovery…
Putting something on the web has little value if no
one can find it
• Metadata
• Optimizing Site Navigation
• Understand how search engines work
• Understand how your audience thinks
15
Discovery
Metadata
It’s more than just information about photos…
• it’s information about every object that you want people
to know about in the future
• it’s the primary method to rigorously document the who,
what, when, where and how of an object and to make it
machine searchable
• The best metadata takes the available information and
atomizes it as much as possible
16
Discovery
This thing found here doing this on this date
by this person
&
verified by
17
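One way to atomize the "this thing, found here, doing this, on this date, by this person, verified by" record is one field per fact. A sketch in Python; the `Observation` fields and the sample coordinates, dates and names are invented for illustration and do not follow any particular metadata standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    spnumber: int                      # this thing: taxon catalog number, not its name
    latitude: float                    # found here: decimal degrees
    longitude: float
    behavior: str                      # doing this
    observed_on: date                  # on this date
    observer: str                      # by this person
    verified_by: Optional[str] = None  # & verified by (None until someone checks it)
    comments: str = ""                 # free text: displayed, but not searched


def find(observations: List[Observation], spnumber: Optional[int] = None,
         start: Optional[date] = None, end: Optional[date] = None,
         verified_only: bool = False) -> List[Observation]:
    """Return the observations that match every criterion that was given."""
    hits = []
    for obs in observations:
        if spnumber is not None and obs.spnumber != spnumber:
            continue
        if start is not None and obs.observed_on < start:
            continue
        if end is not None and obs.observed_on > end:
            continue
        if verified_only and obs.verified_by is None:
            continue
        hits.append(obs)
    return hits


# Invented sample records.
observations = [
    Observation(1234, 9.93, -84.08, "calling", date(2008, 5, 14), "A. Observer",
                verified_by="B. Curator"),
    Observation(1234, 9.95, -84.10, "foraging", date(2008, 7, 2), "C. Student",
                comments="seen after heavy rain"),
]
may_records = find(observations, spnumber=1234,
                   start=date(2008, 5, 1), end=date(2008, 5, 31))
```

The same question ("verified records of species 1234 from May") would need fragile text parsing if the record were a single sentence such as "Toad calling near the river in May, seen by A. Observer".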
Discovery
Only atomized information can be searched efficiently.
Reserve free text information for unsearched titles and
comments.
Control the vocabulary of the information used in databases.
Any spelling difference, no matter how minor, will be
interpreted as a different value.
Controlled vocabulary makes it easier for the user to search for
and discover information.
18
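A sketch of enforcing a controlled vocabulary at data-entry time. The habitat term list is hypothetical, and normalizing only case and whitespace is an assumed policy: a true misspelling such as "cloudforest" is rejected rather than silently guessed.

```python
# Hypothetical controlled vocabulary for a "habitat" field.
HABITAT_TERMS = {"cloud forest", "lowland rainforest", "grassland", "riparian"}


def normalize(raw: str) -> str:
    """Lower-case and collapse runs of whitespace; spelling itself is not changed."""
    return " ".join(raw.lower().split())


def is_controlled(raw: str) -> bool:
    """True if the entry maps onto a term in the vocabulary."""
    return normalize(raw) in HABITAT_TERMS


def validate_habitat(raw: str) -> str:
    """Return the controlled term for `raw`, or raise so the entry is fixed at input time."""
    term = normalize(raw)
    if term not in HABITAT_TERMS:
        raise ValueError(f"{raw!r} is not in the habitat vocabulary: {sorted(HABITAT_TERMS)}")
    return term


# Three entries a person reads as the same habitat;
# a database comparing strings sees three different values.
entries = ["Cloud Forest", "cloud  forest", "cloudforest"]
for raw in entries:
    print(repr(raw), "->", normalize(raw) if is_controlled(raw) else "REJECTED")
```

In a web form the same idea is usually implemented as a drop-down list built from the vocabulary table, so free typing never reaches the database.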
Discovery
Navigation: Multiple access routes
Traditional Linear Navigation Design
[Diagram: Home Page, Project 1, Project 2, Project 3, Project 4]
"He's intelligent, but not experienced. His pattern indicates
two-dimensional thinking…"
19
Discovery
Navigation: Multiple access routes
Multidimensional Navigation Design
• Use persistent tabs and menus
• Search boxes
• Embedded hyperlinks
• Anticipate user navigation behavior
20
Discovery
Navigation: Minimizing Clicks
Always aim to minimize the average number of clicks that a
user should need to go from any page on your web site to
any other.
Ideally, a user should not need more than 3-4 clicks to go
from anywhere to anywhere else.
21
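The number of clicks between two pages is a shortest-path problem on the site's link graph, so a breadth-first search from every page gives both the average and the worst case. A Python sketch; the two five-page sites are hypothetical: a chain where each page links only to its neighbours, and a design with persistent tabs on every page.

```python
from collections import deque
from typing import Dict, List, Tuple


def clicks_from(site: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: fewest clicks from `start` to every reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def click_stats(site: Dict[str, List[str]]) -> Tuple[float, int]:
    """Average and maximum clicks over all ordered pairs of distinct pages."""
    total = worst = pairs = 0
    for src in site:
        depth = clicks_from(site, src)
        for dst in site:
            if dst == src:
                continue
            if dst not in depth:
                raise ValueError(f"{dst!r} cannot be reached from {src!r}")
            total += depth[dst]
            worst = max(worst, depth[dst])
            pairs += 1
    return total / pairs, worst


pages = ["home", "p1", "p2", "p3", "p4"]
# Hypothetical chain: each page links only to the previous and next page.
chain = {
    "home": ["p1"],
    "p1": ["home", "p2"],
    "p2": ["p1", "p3"],
    "p3": ["p2", "p4"],
    "p4": ["p3"],
}
# Hypothetical persistent tabs: every page links to every other page.
tabs = {p: [q for q in pages if q != p] for p in pages}

print(click_stats(chain))  # (2.0, 4)
print(click_stats(tabs))   # (1.0, 1)
```

For the chain, reaching the last page from the home page takes 4 clicks; persistent tabs bring every page within 1 click. Real sites fall between these extremes, and the model assumes every link is equally easy for the user to find.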
Web 2.0
What is this Web
2.0 thing?
The answer
depends on
who you ask
22
Web 2.0
“Web 2.0” refers to the second generation of
web development and web design that
facilitates information sharing and collaboration
on the World Wide Web.
Examples include social-networking sites, video-sharing sites, wikis, blogs, mashups and
folksonomies.
(Wikipedia)
23
Web 2.0
• Web 2.0 websites allow users to do more than just
retrieve information.
• Users can own the data on a Web 2.0 site and exercise
control over that data.
• These sites may have an "Architecture of participation"
that encourages users to add value to the application as
they use it. This stands in contrast to traditional websites,
the sort that limited visitors to viewing and whose content
only the site's owner could modify.
• Web 2.0 sites often feature richer, user-friendly interfaces
24
Web 2.0
Popular examples of Web 2.0 websites include:
Wikipedia
Flickr
YouTube
eBuddy
Digg
TravBuddy
25
Web 2.0
• Search. The ease of finding information through keyword
search.
• Links. Ad-hoc guides to other relevant information.
• Authoring. The ability to create constantly updated
content on a platform that shifts content from being the
creation of a few to being constantly updated, interlinked
work. In wikis, the content is iterative in the sense that
users undo and redo each other's work. In blogs, content
is cumulative in that posts and comments of individuals
are accumulated over time.
26
Web 2.0
• Tags. Categorization of content by creating tags: simple,
one-word user-determined descriptions to facilitate
searching and avoid rigid, pre-made categories.
• Extensions. Powerful algorithms that leverage the Web as
an application platform as well as a document server.
• Signals. The use of RSS* technology to rapidly notify users
of content changes.
*(most commonly translated as "Really Simple Syndication," but sometimes
"Rich Site Summary")
27
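An RSS "signal" is an XML file that the site regenerates whenever content changes. A minimal RSS 2.0 sketch using Python's standard `xml.etree.ElementTree`; the feed title, URLs and item are placeholders, and a production feed would usually carry more channel elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Return a minimal RSS 2.0 document as a string.

    Each item is a dict with "title", "link", "description" and a
    timezone-aware "pub_date" datetime.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["description"]
        ET.SubElement(node, "guid").text = item["link"]  # link doubles as a unique id
        # RSS 2.0 uses RFC 822 dates, e.g. "Wed, 14 May 2008 12:00:00 +0000"
        ET.SubElement(node, "pubDate").text = format_datetime(item["pub_date"])
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Field Station News",
    "https://example.org/",
    "New species pages and photos",
    [{
        "title": "New photos of Algus verdus",
        "link": "https://example.org/species/1234",
        "description": "Twelve new photos added to the species page.",
        "pub_date": datetime(2008, 5, 14, 12, 0, tzinfo=timezone.utc),
    }],
)
print(feed)
```

Feed readers poll this file and notify users when new `<item>` entries appear, so the site only has to rewrite the file, not contact anyone.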
Copyright & Ownership
• Copyright is an important issue
• Copyright law is complex, often vague, and
varies considerably between countries
• Ignorance is not an excuse – get informed
28
Copyright & Ownership
Who owns this file?
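• Works created by US Federal government employees as part
of their official duties are in the Public Domain and not subject
to copyright. Work produced by others with US Federal funds
(grants, contracts) is not automatically Public Domain; ownership
depends on the terms of the award, so check with the funding agency.
• Otherwise, copyright is automatic (under US law) as soon as an
original work is fixed in a tangible form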
29
Copyright & Ownership
Objects can be re-copyrighted by
others only if and when
‘significant new’ artistic content
has been added
Contrast enhancement, color
corrections, sharpening, etc., do
NOT constitute new artistic
content
30
Copyright & Ownership
Creative Commons Licenses
Creative Commons is a nonprofit corporation
dedicated to making it easier for people to share
and build upon the work of others, consistent
with the rules of copyright.
CC provides free licenses and other legal tools to
mark creative work with the freedom the creator
wants it to carry, so others can share, remix, use
commercially, or any combination thereof.
31
Copyright & Ownership
There are six current license agreements:
1. Attribution
2. Attribution, No derivatives
3. Attribution, Non-commercial, No derivatives
4. Attribution, Non-commercial
5. Attribution, Non-commercial, Share-alike
6. Attribution, Share-alike
32
Copyright & Ownership
Attribution:
You let others copy, distribute, display, and perform your copyrighted work and derivative works based upon it - but only if they give you credit.
Noncommercial:
You let others copy, distribute, display, and perform your work - and
derivative works based upon it - but for noncommercial purposes only.
No Derivative Works:
You let others copy, distribute, display, and perform only verbatim copies of
your work, not derivative works based upon it.
Share Alike:
You allow others to distribute derivative works only under a license identical
to the license that governs your work.
33
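The six licenses follow from the four elements: Attribution is always included, Non-commercial is optional, and No Derivatives and Share-alike cannot be combined (Share-alike governs derivative works, which No Derivatives forbids). That gives 2 × 3 = 6 combinations. A short Python sketch that enumerates them using the Creative Commons short codes BY, NC, ND and SA:

```python
from itertools import product


def cc_licenses():
    """Enumerate valid CC license element combinations as short codes."""
    licenses = []
    # NC is on or off; the derivatives clause is none, ND, or SA (never both).
    for noncommercial, derivatives in product([False, True], [None, "ND", "SA"]):
        parts = ["BY"]  # Attribution is part of every license
        if noncommercial:
            parts.append("NC")
        if derivatives:
            parts.append(derivatives)
        licenses.append("-".join(parts))
    return licenses


print(cc_licenses())
# ['BY', 'BY-ND', 'BY-SA', 'BY-NC', 'BY-NC-ND', 'BY-NC-SA']
```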
Copyright & Ownership
Fair use is a doctrine in United States copyright law that
allows limited use of copyrighted material without requiring
permission from the rights holders, such as use for scholarship
or review. It provides for the legal, non-licensed citation or
incorporation of copyrighted material in another author's work
under a four-factor balancing test.
The term "fair use" originated in the United States, but has
been added to Israeli law as well; a similar principle, fair dealing,
exists in some other common law jurisdictions. Civil law
jurisdictions have other limitations and exceptions to copyright.
(Wikipedia)
34
Copyright & Ownership
In determining whether the use made of a work in any
particular case is a fair use, the factors to be considered include:
1. the purpose and character of the use, including whether such use is of a
commercial nature or is for nonprofit educational purposes;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the
copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the
copyrighted work.
35
Getting Started
Once you know your site’s mission & have an idea about who
your audience is, do the following:
1. Sketch out the major logical blocks of your site
2. Search the web for similar sites and make a list of what
works and what doesn’t
3. Imitate and copy the good stuff (people usually like it when
you ‘steal’ their design ideas)
4. Avoid their mistakes
5. If you are creating a website for someone else, frequently
check with the PI regarding design and programming
decisions
36
Getting Started
If you are going to make many websites:
1. Invest time in creating tools that can be shared between sites
2. Invest time in adopting a Content Management System (CMS)
3. Design your websites so that they can share data
4. Select a style and stick with it (CSS)
37
Getting Started
No matter how many websites you have:
1. Document what you are doing
• Internal programming comments (you can never have too many)
• External programming documentation listing all major program
blocks, procedural calls, parameters passed, etc.
• Database documentation: variables (types & definitions) and general
content
2. Back up often or bad things will happen to you
3. For really big projects, consider implementing roll-back
technology
38
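A minimal sketch of "back up often" in Python: zip the site folder into a timestamped archive and keep only the newest few. The paths, the `keep` count and the `site-backup-` naming are assumptions; it covers files only (a MySQL database would need its own dump), and archives are written outside the site folder so no archive ever contains earlier archives.

```python
import shutil
import tempfile
import zipfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup(source: str, dest_dir: str, keep: int = 7,
           now: Optional[datetime] = None) -> Path:
    """Zip `source` into a timestamped archive in `dest_dir`; keep the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # Two backups within the same second share a name; fine for a scheduled job.
    archive = Path(shutil.make_archive(str(dest / f"site-backup-{stamp}"), "zip",
                                       root_dir=source))
    # Fixed-width timestamps sort chronologically as strings, oldest first.
    for old in sorted(dest.glob("site-backup-*.zip"))[:-keep]:
        old.unlink()
    return archive


def demo():
    """Back up a throwaway site on three days with keep=2; report what survives."""
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp, "site")
        site.mkdir()
        (site / "index.html").write_text("<h1>Field station</h1>")
        backups = Path(tmp, "backups")  # outside the site folder
        for day in (1, 2, 3):
            last = backup(str(site), str(backups), keep=2, now=datetime(2008, 5, day))
        kept = sorted(p.name for p in backups.glob("site-backup-*.zip"))
        with zipfile.ZipFile(last) as zf:
            return kept, zf.namelist()


print(demo())
```

Run from a scheduler (for example cron on the Linux server), this gives a rolling set of restore points; copying the archives to a second machine protects against losing the server itself.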
Getting Started
What tools should you use?
The most common suite of tools for low-budget, non-commercial operations
includes:
• MySQL databases
• PHP programming language (and/or Perl)
• Flash and/or JavaScript
• Linux operating system
• Apache server
39
Getting Started
What other tools might you use?
There are many good tools available, both commercial and non-commercial
(open source):
• ArcGIS by ESRI, GRASS, Minnesota MapServer
• Drupal CMS
• Wiki software
• Blogging software
• Graphing applications
There are lots of arguments for and against commercial and open
source software. There is also the possibility of creating your own
software tools. Mixed models often work well.
40
Questions and Comments
41