Full-text index

SQL Server 2008 for Developers
UTS Short Course
Course Website

Course Timetable & Materials

Resources

http://sharepoint.ssw.com.au/Training/UTSSQL/
Course Overview
Session  Date                 Time           Topic
1        Tuesday 06-03-2012   18:00 - 21:00  SQL Server 2008 Management Studio
2        Tuesday 13-03-2012   18:00 - 21:00  T-SQL Enhancements
3        Tuesday 20-03-2012   18:00 - 21:00  High Availability
4        Tuesday 27-03-2012   18:00 - 21:00  CLR Integration
5        Tuesday 03-04-2012   18:00 - 21:00  Full-Text Search
What we did last week
CLR Integration

.NET

.NET FX

CLR
What we did last week
CLR Integration

Stored Proc

Functions

Triggers

Bottom Line


Use T-SQL for all data operations
Use CLR assemblies for any complex calculations and
transformations
Homework?








Find all products that have a productnumber starting with BK
Find all products with "Road" in the name that are Silver
Find a list of products that have no review
Find the list price ([listprice]) of all products in our shop
What is the sum of the list price of all our products
Find the product with the maximum and minimum listprice
Find a list of products with their discount sale (hint see
Sales.SalesOrderDetail)
Find the sum of prices of the products in each subcategory
Session 5
SQL Server Full-Text Search
using Full-Text search in SQL Server 2008
Agenda

What is Full text search

The old way 2005

The new way 2008

How to

Querying
What is Fulltext search

SELECT *
FROM [Northwind].[dbo].[Employees]
WHERE Notes LIKE '%grad%'
What is REAL Fulltext search

Allows searching for text/words in columns
  Similar words
  Plural of words

Based on special index
  Full-text index (Full text catalog)

SELECT *
FROM [Northwind].[dbo].[Employees]
WHERE FREETEXT(*, 'grad')
Theory
Full-Text Search Terminology 1/3


Full-text index

Information about words and their location in columns

Used in full text queries
Full-text catalog


Group of full text indexes (Container)
Word breaker

Tokenizes text based on language
Full-Text Search Terminology 2/3

Token
  Word identified by word breaker

Stemmer
  Generate inflectional forms of a word (language specific)

Filter
  Extract text from files stored in a varbinary(max) or image column

Population or Crawl
  Creating and maintaining a full-text index.

Full-Text Search Terminology 3/3

Stopwords/Stoplists
  Words that are not relevant to a search
  e.g. ‘and’, ‘a’, ‘is’ and ‘the’ in English

Accent insensitivity
  café = cafe
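The terms above can be observed directly on a server. The following is a minimal sketch, assuming SQL Server 2008 or later and a login in the sysadmin role (which sys.dm_fts_parser requires). It shows how the English word breaker (LCID 1033) tokenizes a string, which tokens the system stoplist flags as noise words, and which inflectional forms the stemmer generates. The sample strings are illustrative only.

```sql
-- Tokens produced by the English (1033) word breaker.
-- Third argument 0 = system stoplist; fourth argument 0 = accent-insensitive.
-- special_term reports 'Noise Word' for stopwords and 'Exact Match' otherwise.
SELECT keyword, display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"The bikes and the riders"', 1033, 0, 0);

-- Inflectional forms generated by the stemmer for "ride".
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "ride")', 1033, 0, 0);
```

Because the function only parses the string and never touches a table, it is a cheap way to check what a search expression will expand to before running it against an index.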
Fulltext search – Under the hood
The old way! SQL 2005
The new way! SQL 2008
How to
Administration
Administering Full-Text Search
Full-text administration can be separated into three main
tasks:

Creating/altering/dropping full-text catalogs

Creating/altering/dropping full-text indexes

Scheduling and maintaining index population.
Administering Full-Text Search
sp_fulltext_catalog
sp_help_fulltext_catalogs_cursor
sp_fulltext_column
sp_help_fulltext_columns
sp_fulltext_database
sp_help_fulltext_columns_cursor
sp_fulltext_service
sp_help_fulltext_tables
sp_fulltext_table
sp_help_fulltext_tables_cursor
sp_help_fulltext_catalogs
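In SQL Server 2008 much of the same information is also exposed through catalog views, and Microsoft's documentation steers new work towards the DDL statements and views instead of these procedures. A minimal sketch, assuming it is run in a database that already has at least one full-text catalog:

```sql
-- List every full-text catalog and the tables indexed in it.
SELECT c.name                          AS catalog_name,
       c.is_default,
       c.is_accent_sensitivity_on,
       OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
       OBJECT_NAME(i.object_id)        AS table_name,
       i.change_tracking_state_desc
FROM sys.fulltext_catalogs AS c
LEFT JOIN sys.fulltext_indexes AS i
       ON i.fulltext_catalog_id = c.fulltext_catalog_id
ORDER BY c.name, table_name;
```

The LEFT JOIN keeps catalogs that do not contain any index yet; their table columns come back as NULL.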
Index vs. Full-text index

Full-text indexes
  Stored in the file system, but administered through the database (SQL 2005)
  Stored under the control of the database in which they are defined (SQL 2008)
  Only 1 full-text index allowed per table
  Addition of data to full-text indexes, called population, can be requested through either a schedule or a specific request, or can occur automatically with the addition of new data

Regular SQL Server indexes
  Stored under the control of the database in which they are defined
  Several regular indexes allowed per table
  Updated automatically when the data upon which they are based is inserted, updated, or deleted
Administering Full-Text Search

Automatic update of index
  Slows down database performance

Manually repopulate full text index
  Time consuming
  Asynchronous process in the background
    Periods of low activity
    Index not up to date
How to
Creating a Full Text Catalog

SQL 2005 Only

SQL 2008 is smart
SQL 2005
Creating a Full-Text Catalog (SQL 2005)
Syntax
CREATE FULLTEXT CATALOG catalog_name
[ON FILEGROUP filegroup ]
[IN PATH 'rootpath']
[WITH <catalog_option>]
[AS DEFAULT]
[AUTHORIZATION owner_name ]
<catalog_option>::=
ACCENT_SENSITIVITY = {ON|OFF}
Example
USE AdventureWorks_FulllText
CREATE FULLTEXT CATALOG AdventureWorks_FullTextCatalog
ON FILEGROUP FullTextCatalog_FG
WITH ACCENT_SENSITIVITY = ON AS DEFAULT
AUTHORIZATION dbo
Creating a Full-Text Catalog
Step by step
1.
Create a directory on the operating system named C:\test
2.
Launch SSMS, connect to your instance,
and open a new query window
3.
Add a new filegroup to the AdventureWorks_FulllText
USE master
GO
ALTER DATABASE AdventureWorks_FulllText ADD FILEGROUP FTFG1
GO
ALTER DATABASE AdventureWorks_FulllText ADD FILE (NAME = N'AdventureWorks_FulllText_data', FILENAME = N'C:\TEST\AdventureWorks_FulllText_data.ndf', SIZE = 2048KB, FILEGROWTH = 1024KB) TO FILEGROUP [FTFG1]
GO
4.
Create a full-text catalog on the FTFG1 filegroup by executing the following
command:
USE AdventureWorks_FulllText
GO
CREATE FULLTEXT CATALOG AWCatalog ON FILEGROUP FTFG1 IN PATH 'C:\TEST' AS DEFAULT;
GO
SQL 2008
How to
Creating Full Text Indexes
Property of column
Full-text Index property window
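The same result as the SSMS dialogs can be scripted. This is a minimal sketch and not necessarily the exact script the dialogs generate. It assumes the AdventureWorks_FullTextCatalog catalog from the earlier example exists, and that the primary key index on Production.ProductDescription has the default AdventureWorks name used here (check with sp_helpindex if unsure).

```sql
USE AdventureWorks_FulllText;
GO
-- KEY INDEX must name a unique, single-column, non-nullable index on the table.
CREATE FULLTEXT INDEX ON Production.ProductDescription
(
    Description LANGUAGE 1033                          -- 1033 = English (US) word breaker
)
KEY INDEX PK_ProductDescription_ProductDescriptionID   -- assumed index name
ON AdventureWorks_FullTextCatalog                      -- catalog from the earlier example
WITH CHANGE_TRACKING AUTO;                             -- keep the index up to date automatically
GO
```

Only one full-text index is allowed per table, so the statement fails if the table is already indexed. In that case, ALTER FULLTEXT INDEX ... ADD (column) is the way to include further columns.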
How to
Index and Catalog Population
Populating a Full-Text Index
Because of the external structure for storing full-text indexes, changes
to underlying data columns are not immediately reflected in the full-text
index. Instead, a background process enlists the word breakers, filters
and noise word filters to build the tokens for each column, which are
then merged back into the main index either automatically or manually.
This update process is called population or a crawl. To keep your full-text indexes up to date, you must periodically populate them.
Populating a Full-Text Index
You can choose from three modes for full-text population:

Full

Incremental

Update
Populating a Full-Text Index

Full
  Read and process all rows
  Very resource-intensive

Incremental
  Automatically populates the index for rows that were modified since the last population
  Requires timestamp column

Update
  Uses change tracking from SQL Server (inserts, updates, and deletes)
  Specify how you want to propagate the changes to the index
    • AUTO automatic processing
    • MANUAL implement a manual method for processing changes
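The incremental and update modes map onto ALTER FULLTEXT INDEX options. A hedged sketch, assuming a full-text index already exists on Production.ProductDescription. Run the statements one at a time, not as a single batch, because a new population cannot start while another is still running.

```sql
-- Incremental: relies on a timestamp (rowversion) column on the table;
-- without one, SQL Server performs a full population instead.
ALTER FULLTEXT INDEX ON Production.ProductDescription
    START INCREMENTAL POPULATION;

-- Update with MANUAL propagation: changes are tracked,
-- but only pushed into the index when requested.
ALTER FULLTEXT INDEX ON Production.ProductDescription
    SET CHANGE_TRACKING MANUAL;
ALTER FULLTEXT INDEX ON Production.ProductDescription
    START UPDATE POPULATION;

-- Update with AUTO propagation: tracked changes are applied in the background.
ALTER FULLTEXT INDEX ON Production.ProductDescription
    SET CHANGE_TRACKING AUTO;
```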
Populating a Full-Text Index
Example
ALTER FULLTEXT INDEX ON
Production.ProductDescription
START FULL POPULATION;
ALTER FULLTEXT INDEX ON Production.Document
START FULL POPULATION;
Populating a Full-Text Catalog
Syntax
ALTER FULLTEXT CATALOG catalog_name
{ REBUILD [ WITH ACCENT_SENSITIVITY = { ON | OFF } ] |
REORGANIZE | AS DEFAULT }

REBUILD deletes and rebuilds the catalog
  ACCENT_SENSITIVITY change

REORGANIZE merges all changes
  Performance
  Frees up disk and memory
Populating a Full-Text Catalog
Example
USE AdventureWorks_FulllText;
ALTER FULLTEXT CATALOG AdventureWorks_FullTextCatalog
REBUILD WITH ACCENT_SENSITIVITY=OFF;
-- Check Accentsensitivity
SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog',
'accentsensitivity');
Managing Population Schedules

In SQL 2000, full text catalogs could only be populated on
specified schedules

SQL 2005/2008 can track database changes and keep the
catalog up to date, with a minor performance hit
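Whether change tracking is keeping up can be checked from T-SQL. A minimal sketch, assuming the AdventureWorks_FullTextCatalog catalog from the earlier examples:

```sql
-- PopulateStatus: 0 = idle, 1 = full population in progress,
-- 9 = change tracking (other values are listed in the documentation).
SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'PopulateStatus') AS populate_status,
       FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'ItemCount')      AS indexed_items;

-- Per-index view: tracking mode and the timing of the last crawl.
SELECT OBJECT_NAME(object_id) AS table_name,
       change_tracking_state_desc,
       crawl_type_desc,
       crawl_start_date,
       crawl_end_date
FROM sys.fulltext_indexes;
```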
How to
Querying SQL Server Using Full-Text Search
Full-Text query keywords

FREETEXT

FREETEXTTABLE

CONTAINS

CONTAINSTABLE
FREETEXT

Fuzzy search (less precise)
  Inflectional forms (Stemming)
  Related words (Thesaurus)

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE [Description] LIKE N'%bike%';

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE FREETEXT(Description, N'bike');
FREETEXTTABLE

+ rank column
  Value between 0 and 1,000
  Relative number, how well the row matches the search criteria

SELECT
    PD.ProductDescriptionID,
    PD.Description,
    KEYTBL.[KEY],
    KEYTBL.RANK
FROM
    Production.ProductDescription AS PD
    INNER JOIN FREETEXTTABLE(Production.ProductDescription, Description, N'bike')
        AS KEYTBL ON PD.ProductDescriptionID = KEYTBL.[KEY]
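Since the rank is relative, it is mainly useful for ordering. A small sketch of the usual pattern, assuming the same full-text index on Production.ProductDescription: the optional fourth argument of FREETEXTTABLE limits the result to the top N matches by rank, and the outer query sorts them best first. The value 10 is arbitrary.

```sql
SELECT PD.ProductDescriptionID,
       PD.Description,
       KEYTBL.RANK
FROM Production.ProductDescription AS PD
INNER JOIN FREETEXTTABLE(Production.ProductDescription, Description, N'bike', 10) AS KEYTBL
        ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.RANK DESC;   -- best matches first
```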
CONTAINS
• Lets you specify which fuzzy matching algorithm to use

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'bike');

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"bike*"');
FORMSOF
• Lets you specify which fuzzy matching algorithm to use

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N' FORMSOF (INFLECTIONAL, ride) ');

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N' FORMSOF (THESAURUS, ride) ');

INFLECTIONAL
  Consider word stems in search
  "ride" → "riding", "ridden", ...
THESAURUS
  Return synonyms
  "metal" → "gold", "aluminium", "steel", ...
Word proximity
NEAR ( ~ )
How near words are in the text/document
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N'mountain NEAR bike');
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N'mountain ~ bike');
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, 'ISABOUT (mountain weight(.8), bikes
weight (.2) )');
Querying SQL Server Using Full-Text Search

Full-text search much more powerful than LIKE
  More specific, relevant results
  Better performance
    • LIKE for small amounts of text
    • Full-text search scales to huge documents
  Provides ranking of results

Common uses
  Search through the content in a text-intensive, database driven website, e.g. a knowledge base
  Search the contents of documents stored in BLOB fields
  Perform advanced searches
    • e.g. with exact phrases - "to be or not to be" (however needs care!)
    • e.g. Boolean operators - AND, OR, NOT, NEAR
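The advanced searches listed above can be sketched against the AdventureWorks table used earlier. The search words are illustrative assumptions, not taken from the course lab. One caveat on exact phrases: a phrase made up entirely of stopwords, such as "to be or not to be", may return nothing under the default stoplist, which may be the reason for the "needs care" warning.

```sql
-- Exact phrase: the two words must appear next to each other, in this order.
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"mountain bike"');

-- Boolean operators: rows mentioning "mountain" but not "bike".
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"mountain" AND NOT "bike"');

-- OR combined with a prefix term.
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"aluminum" OR "alloy*"');
```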
Writing FTS terms

The power of FTS is in the expression which is passed to the
CONTAINS or CONTAINSTABLE function

Several different types of terms:





Simple terms
Prefix terms
Generation terms
Proximity terms
Weighted terms
Simple terms

Either words or phrases

Quotes are optional for single words, but required for phrases

Matches columns which contain the exact words or phrases specified

Case insensitive

Punctuation is ignored

e.g.

CONTAINS(Column, 'SQL')
CONTAINS(Column, ' "SQL" ')
CONTAINS(Column, 'Microsoft SQL Server')       -- syntax error: a phrase must be quoted
CONTAINS(Column, ' "Microsoft SQL Server" ')
Prefix terms

Matches words beginning with the specified text

e.g.


CONTAINS(Column, ' "local*" ')
• matches local, locally, locality
CONTAINS(Column, ' "local wine*" ')
• matches "local winery", "locally wined"
Generation terms

Inflectional
  FORMSOF(INFLECTIONAL, "expression")
  "drive" → "drove", "driven", ... (share the same stem)
  When vague words such as "best" are used, "good" is OK

Thesaurus
  FORMSOF(THESAURUS, "expression")
  "metal" → "gold", "aluminium", "steel", ...

Both return variants of the specified word, but variants are determined differently
Thesaurus

Supposed to match synonyms of search terms – but the
thesaurus seems to be very limited

Does not match plurals

Not particularly useful
http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506231
Proximity terms

Syntax
CONTAINS(Column, 'local NEAR winery')
CONTAINS(Column, ' "local" NEAR "winery" ')

Important for ranking

Both words must be in the column, like AND

Terms on either side of NEAR must be either simple or proximity
terms
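The effect on ranking only shows up with CONTAINSTABLE, because CONTAINS returns just a match or no match. A minimal sketch using the AdventureWorks table from the earlier slides in place of the generic Column: rows where the two words sit closer together are expected to receive a higher rank.

```sql
SELECT PD.ProductDescriptionID,
       PD.Description,
       KEYTBL.RANK
FROM Production.ProductDescription AS PD
INNER JOIN CONTAINSTABLE(Production.ProductDescription, Description,
                         N'mountain NEAR bike') AS KEYTBL
        ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.RANK DESC;   -- closest pairs first
```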
Weighted terms

Each word can be given a weight (0.0 to 1.0) that influences the rank

Can be combined with simple, prefix, generation and proximity terms

e.g.


CONTAINS(Column, 'ISABOUT(
performance weight(.8),
comfortable weight(.4)
)')
CONTAINS(Column, 'ISABOUT(
FORMSOF(INFLECTIONAL, "performance") weight (.8),
FORMSOF(INFLECTIONAL, "comfortable") weight (.4)
)')
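Weights do not change which rows a CONTAINS predicate matches; they only influence the rank that CONTAINSTABLE reports. A hedged sketch against the AdventureWorks table from the earlier slides, used here in place of the generic Column:

```sql
SELECT PD.ProductDescriptionID,
       PD.Description,
       KEYTBL.RANK
FROM Production.ProductDescription AS PD
INNER JOIN CONTAINSTABLE(Production.ProductDescription, Description,
                         N'ISABOUT (performance WEIGHT (.8), comfortable WEIGHT (.4))') AS KEYTBL
        ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.RANK DESC;   -- rows about "performance" should rank above rows about "comfortable"
```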
Pro
Contra
Pros?
Cons?
Disadvantages

Full text catalogs
  Disk space
  Up-to-date
  Continuous updating → performance hit

Queries
  Complicated to generate
  Generated as a string
  Generated on the client
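One way to soften the query-generation problem is to pass the raw user input as a parameter and build the search condition in T-SQL, since CONTAINS accepts a variable for its search condition. A minimal sketch: the variable names and the quote-stripping rule are assumptions for illustration, not a complete sanitizer.

```sql
DECLARE @SearchTerm nvarchar(100) = N'bike';   -- hypothetical user input

-- Strip double quotes so the input cannot break out of the quoted term,
-- then wrap it as a prefix term: "bike*"
DECLARE @Condition nvarchar(200) = N'"' + REPLACE(@SearchTerm, N'"', N'') + N'*"';

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, @Condition);
```

In a real application @SearchTerm would be a stored procedure parameter, so the client only sends the words the user typed.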
Advantages

Backing up full text catalogs

SQL 2005
  Included in SQL backups by default
  Retained on detach and re-attach
  Option in detach dialog to keep the full text catalog

In SQL 2008 you don’t have to worry about this
Advantages

Much more powerful than LIKE
  Specific
  Ranking

Performance
  Pre-computed ranking (FREETEXTTABLE)

Configurable Population Schedule
  Continuously track changes, or index when the CPU is idle
Quick tips - Podcasts
Pluralcast - SQL Server Under the Covers
http://shrinkster.com/1ff4
Dotnetrocks - Search for SQL Server
http://www.dotnetrocks.com/archives.aspx
RunAsRadio - Search for SQL Server
http://www.runasradio.com/archives.aspx
Session 5 Lab

Full text search
Download from Course Materials Site (to copy/paste scripts) or
type manually:
http://sharepoint.ssw.com.au/Training/UTSSQL/
3 things…

EricPhan@ssw.com.au

http://ericphan.info

twitter.com/ericphan
3 things…

mehmet@ssw.com.au

http://blog.ozdemir.id.au

twitter.com/mozdemir_au
Thank You!
Gateway Court Suite 10
81 - 91 Military Road
Neutral Bay, Sydney NSW 2089
AUSTRALIA
ABN: 21 069 371 900
Phone: + 61 2 9953 3000
Fax: + 61 2 9953 3105
info@ssw.com.au
www.ssw.com.au