Distributed Relational Database Service Product Introduction

DRDS Summary
Stand-alone databases are the most common software in today's services and can easily meet users'
requirements for relational queries. However, most applications will eventually hit the performance
ceiling of a single machine: a stand-alone database is limited in TPS, QPS, memory capacity, disk
capacity, and other system resources.
The primary objective of DRDS is to help you resolve these problems. It has two main functions:
read/write splitting and database sharding.
Read/write splitting lets you write data on one machine and read it from several machines, which
relieves the system bottleneck at very low cost for applications that read far more than they write.
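The idea of read/write splitting can be illustrated with a minimal sketch. It assumes one primary node and a round-robin choice among read replicas; the class name, the node names, and the round-robin policy are assumptions made for illustration and do not describe how DRDS itself selects nodes.

```python
import itertools


class ReadWriteRouter:
    """Route writes to one primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one read replica is required")
        self.primary = primary
        # Round-robin iterator over the read replicas (assumed policy).
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        """Return the name of the node that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._replicas)
        # INSERT, UPDATE, DELETE and everything else go to the primary.
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM users WHERE id = 1"),
    router.route("UPDATE users SET name = 'a' WHERE id = 1"),
    router.route("select count(*) from users"),
]
print(targets)
```

Adding a replica to the list increases read capacity without touching the write path, which is why this approach suits read-heavy applications.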
Database sharding is the ultimate solution to the storage bottleneck of a system. Its core concept is
simply divide and conquer: by dispersing data across several machines and ensuring that requests are
distributed evenly among them, the various bottlenecks of a service can be resolved at very low
cost.
Of course, sharding has a cost. The most obvious one is that a distributed database restricts some
usage scenarios that a stand-alone database supports. In a distributed environment, such operations
have high latency or low efficiency; even when they can be implemented, their performance makes
them impractical.
To address these problems, the Alibaba middleware team has helped nearly 300 business application
systems implement database sharding. We have accumulated considerable experience and mature
products in this field, and we hope these products can serve you equally well, so that you do not
have to worry about the performance of your database.
Horizontal Data Sharding
DRDS helps you implement sub-databases and sub-tables, so that SQL that could be executed only on
a single node is transformed into SQL executed on multiple nodes, with an experience similar to that
of a single database. Dynamic horizontal expansion of data storage is implemented by the supporting
system. At present, DRDS has been used internally by hundreds of application systems for more than
5 years, efficiently, safely, and stably.
However, a distributed database and a stand-alone database inherently differ in how they are used;
for example, distributed transactions and distributed joins are inefficient. For these problems,
DRDS prioritizes performance and stability first, and software compatibility second.
Compared with the open-source database sharding tools available on the market, DRDS analyzes
your SQL more intelligently. We provide mature solutions for result-set merging, distributed join
optimization, and other key areas, and can help you solve most problems in distributed database
scenarios.
Smooth Expansion
DRDS helps users expand databases smoothly online, so users can add or remove databases as
needed and use database clusters elastically.
"Online" is the key to online database expansion: users can add new RDS nodes to a cluster directly,
without stopping the service system for a cutover operation, thus achieving seamless expansion.
DRDS divides the entire expansion process into several phases, including full migration, incremental
synchronization, and database switchover. The data is migrated in advance and then kept up to date
through parallel incremental synchronization for a period, so the final expansion and switchover of
the database completes in a very short time (seconds) and does not affect your service.
Table Broadcasting
After large service tables are split, there are always some tables with little data, as well as
original information tables that are rarely updated. These tables are frequently joined with the
sharded large tables. Physically, such a join is a distributed join query with very low overall
efficiency.
For such distributed join scenarios, we developed a special tool, OETL, to help you broadcast small
tables. All data in the original information table (including incremental updates) is broadcast
automatically to the machines that hold the large table. Hence, the original distributed query
becomes a stand-alone local query.
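The effect of broadcasting can be sketched as follows. The shard names reuse the group names shown elsewhere in this document, while the table names, the rows, and the helper functions are hypothetical; the sketch only shows why a join becomes local once every shard holds a full copy of the small table, not how OETL works internally.

```python
# Two shards, each holding part of a large "orders" table (hypothetical data).
# Each order row is (order_id, region_code).
shards = {
    "DRDS_00_RDS": {"orders": [(1, "cn"), (3, "us")]},
    "DRDS_01_RDS": {"orders": [(2, "us")]},
}
# The small, rarely updated table to broadcast: region code -> region name.
regions = {"cn": "China", "us": "United States"}


def broadcast(shards, table_name, rows):
    """Copy the full small table to every shard."""
    for tables in shards.values():
        tables[table_name] = dict(rows)


def local_join(tables):
    """Join orders with regions using only the data stored on one shard."""
    return [(order_id, tables["regions"][code]) for order_id, code in tables["orders"]]


broadcast(shards, "regions", regions)
joined = sorted(row for tables in shards.values() for row in local_join(tables))
print(joined)
```

After the broadcast, no shard needs rows from another machine to answer the join; the per-shard results only have to be concatenated.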
Globally Unique ID
In a distributed environment, the original MySQL sequence generation mechanism cannot efficiently
generate globally unique sequences. Following the example of Oracle sequence generation, we
implemented an efficient sequence generator based on MySQL. It has no single-point performance
bottleneck and offers high concurrency and low latency.
The objective of the DRDS sequence function is to guarantee the global uniqueness of data. Although
sequence values are obtained in time order, they are not globally ordered.
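One way to obtain values that are globally unique but not globally ordered is block allocation, sketched below. This is an assumption made for illustration: the text does not describe the internal mechanism of the DRDS sequence generator, and the class names and block size here are hypothetical. Each client reserves a block of consecutive values from a shared counter and then serves IDs from its local block.

```python
import threading


class SequenceStore:
    """Stands in for a shared counter kept in MySQL (hypothetical)."""

    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()

    def reserve(self, size):
        """Atomically reserve `size` consecutive values; return the first."""
        with self._lock:
            start = self._next
            self._next += size
            return start


class SequenceClient:
    """Hands out IDs from a locally cached block, refilling when it runs out."""

    def __init__(self, store, block_size=3):
        self._store = store
        self._block_size = block_size
        self._current = 0
        self._end = 0  # exclusive upper bound of the cached block

    def next_id(self):
        if self._current >= self._end:
            self._current = self._store.reserve(self._block_size)
            self._end = self._current + self._block_size
        value = self._current
        self._current += 1
        return value


store = SequenceStore()
a, b = SequenceClient(store), SequenceClient(store)
# Interleave requests from two clients, as two application servers might.
issued = [a.next_id(), b.next_id(), a.next_id(), b.next_id()]
print(issued)
```

Because each client touches the shared counter only once per block, contention stays low; the price is that values issued by different clients interleave out of order even though no value is ever repeated.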
DRDS User Guide
Traversal Operation Of Full Table
DRDS supports aggregate functions for statistical summaries in full table scans. At present, full
table scan is disabled by default, and you can explicitly enable it through configuration for the
tables that need it. We believe that disabling it by default gives you better control over
performance. For the specific configuration method, access our operation management platform
and find the corresponding table.
1) If the target table is not split into sub-databases and sub-tables, DRDS supports any aggregate
function, because DRDS in fact passes the original SQL directly to the backend MySQL for
execution.
2) Non-full table scan: after DRDS routes the SQL statement, it is sent directly to a single backend
MySQL database for execution. This is the common case when the WHERE condition has an = relation
on the split key. In a non-full table scan, any aggregate function is supported as well.
3) Full table scan: the aggregate functions currently supported are COUNT, MAX, MIN, and SUM. The
LIKE, ORDER BY, and LIMIT syntaxes are also supported, while GROUP BY is not.
4) Concurrent full table scan: sometimes you may wish to dump data from all databases directly to
another place. We provide a method that lets you find out how many databases are in the
underlying layer and operate on them yourself.
Step 1: Get the current databases and tables (the table topology)
mysql> show topology from tddl_users;
+------+--------------------------+---------------------+
| ID   | GROUP_NAME               | TABLE_NAME          |
+------+--------------------------+---------------------+
| 0    | DRDS_00_RDS              | drds_users          |
| 1    | DRDS_01_RDS              | drds_users          |
+------+--------------------------+---------------------+
Step 2: Traverse the tables one by one
For example, to query the first table, run:
/!+TDDL({'type':'direct','dbid':'DRDS_00_RDS'})*/ select * from drds_users;
This runs the SQL in the first database.
/!+TDDL({'type':'direct','dbid':'DRDS_01_RDS'})*/ select * from drds_users;
This runs the SQL in the second database.
Please note that the number of databases may change with expansion and other events, and we
cannot guarantee that the names of these groups never change. Therefore, run show topology from
table first to acquire the latest table topology.
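The two steps above can be scripted. The sketch below assumes the topology rows have already been fetched as (ID, GROUP_NAME, TABLE_NAME) tuples; it builds one directly routed statement per group using the hint format shown above, and shows how per-database COUNT, SUM, MIN, and MAX results could be combined on the client. The function names and the partial results are hypothetical, and executing the statements against a real connection is left out.

```python
def direct_statements(topology, sql):
    """Prefix `sql` with a direct hint for every group in the topology rows."""
    groups = []
    for _id, group_name, _table in topology:
        if group_name not in groups:  # a group may appear in several rows
            groups.append(group_name)
    return ["/!+TDDL({'type':'direct','dbid':'%s'})*/ %s" % (g, sql) for g in groups]


def merge_partials(partials):
    """Combine per-database aggregate results into global values."""
    return {
        "count": sum(p["count"] for p in partials),
        "sum": sum(p["sum"] for p in partials),
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
    }


# Rows as returned by "show topology" in Step 1.
topology = [(0, "DRDS_00_RDS", "drds_users"), (1, "DRDS_01_RDS", "drds_users")]
statements = direct_statements(topology, "select * from drds_users;")
for statement in statements:
    print(statement)

# Hypothetical aggregate results, one dict per backend database.
merged = merge_partials([
    {"count": 2, "sum": 30, "min": 10, "max": 20},
    {"count": 1, "sum": 5, "min": 5, "max": 5},
])
print(merged)
```

Re-reading the topology before each run keeps the generated statements valid after an expansion changes the group list.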
DRDS hint support
Manually specify the split key of SQL statements.
Example 1: specify a single field value; the split key is id.
/!drds: $partitionOperand=('id'=['2']),$table='table'*/
SELECT * FROM table WHERE date > '2014/04/25';
Example 2: specify multiple field values; the split keys are id and type.
/!drds: $partitionOperand=(['id','type']=['2','a']),
$table='table'*/
SELECT * FROM table WHERE date > '2014/04/25';
Use a placeholder (?) to pass the hint value as a parameter.
Example 1: specify a single field value; the split key is id.
/!drds: $partitionOperand=('id'=[?]),$table='table'*/
SELECT * FROM table WHERE date > ?;
Example 2: specify multiple field values; the split keys are id and type.
/!drds: $partitionOperand=(['id','type']=[?,?]),
$table='table'*/
SELECT * FROM table WHERE date > ?;
ORDER BY in DRDS
When a statement contains both ORDER BY and LIMIT paging, DRDS automatically sorts the results by
merging (merge sort). The detailed implementation is as follows:
DRDS keeps one record per database in memory and completes multi-page processing with a merge
sort. The advantage of this approach is that memory is not exhausted. However, it is not magic, so
the computation is not very efficient. Therefore, use this method when you need to merge multiple
pages from multiple machines and M in LIMIT M,N is not large. In our experience, if M > 10000, the
query usually cannot return its result.
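The merge described above can be sketched with a k-way merge over per-database result streams. The sketch assumes each backend database returns its rows already sorted by the ORDER BY key; the function name and the sample data are hypothetical, and plain integers stand in for rows.

```python
import heapq
from itertools import islice


def merged_page(shard_results, offset, count):
    """Return `count` rows starting at `offset` of the globally sorted result.

    Each element of `shard_results` is an iterable that is already sorted
    by the ORDER BY key.
    """
    # heapq.merge holds only one pending row per input at a time.
    merged = heapq.merge(*shard_results)
    # The first `offset` rows still have to be read and discarded, which is
    # why a large M in LIMIT M,N is expensive.
    return list(islice(merged, offset, offset + count))


shard_a = iter([1, 4, 7, 10])
shard_b = iter([2, 5, 8])
shard_c = iter([3, 6, 9])
# Equivalent of ORDER BY ... LIMIT 4,3 over the three databases.
page = merged_page([shard_a, shard_b, shard_c], offset=4, count=3)
print(page)
```

Memory use depends on the number of databases rather than on M, but the work grows with M + N because every skipped row must still be fetched and compared.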
Shard-Database and Shard-Table
Shard-database and shard-table are significant concepts in DRDS. DRDS horizontally splits data
across the backend RDS databases. These databases are shard-databases, and the corresponding
tables are shard-tables. Each shard-database takes charge of the read/write operations on its own
data, so the overall access pressure is dispersed efficiently. When the system is expanded, adding
shard-databases horizontally and migrating the relevant data increases the overall capacity of the
DRDS system.
Relevant concepts of shard-database and shard-table
Split Key
The split key is the field in a table that is chosen for sharding. DRDS horizontally splits the data
across the backend RDS shard-databases according to the value of the split key. In other words, data
with the same key value must be in the same RDS database. Each shard-table in DRDS can define its
own split key. The split key may be a single field or a combination of multiple fields.
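The property that rows with the same key value land in the same database can be shown with a small routing sketch. It assumes a hash-modulo rule over the split key; the text does not state which routing rule DRDS applies, so the hash function, the separator, and the function name are assumptions made for illustration.

```python
import zlib

# Shard names reuse the group names shown in the topology example.
SHARDS = ["DRDS_00_RDS", "DRDS_01_RDS"]


def shard_for(*key_values):
    """Map a split key (one or more field values) to a shard name."""
    # Join the field values so a multi-field key hashes as one unit.
    raw = "\x1f".join(str(v) for v in key_values).encode("utf-8")
    # crc32 is stable across processes, unlike Python's built-in hash().
    return SHARDS[zlib.crc32(raw) % len(SHARDS)]


first = shard_for(42)
again = shard_for(42)
composite = shard_for(42, "2014/04/25")
print(first, again, composite)
```

The same single-field key always maps to the same shard, while a multi-field key is routed only once all of its field values are known.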
Full Table Scan
Complicated SQL statements are distributed to all databases for execution, and the results are
compared and merged in DRDS. A full table scan consumes considerable performance, so avoid it in
your service as far as practical.
Single-Field sharding
1) An INSERT/REPLACE statement must include the shard-database and shard-table field (the split
key). For example, the sharding field is id.
INSERT INTO table VALUES ('name1', 'value2')
Error
INSERT INTO table (id, name, value) VALUES (1, 'name1', 'value2')
Correct
2) If there is no split key in the WHERE condition of a SELECT/UPDATE/DELETE statement, a full
table scan is run.
Multi-Field sharding
1) If the WHERE condition of a SELECT/UPDATE/DELETE statement contains no split field, or only
some of the split fields, a full table scan is run; route computing is performed only when the SQL
includes all split keys.
For example: split key is id + date.
SELECT * FROM table WHERE id = 1 AND date > 3;
Route computing
SELECT * FROM table WHERE id = 1;
Needs full table scan
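The decision in rule 1 can be written as a small check. The sketch assumes the fields used in the WHERE condition have already been extracted from the statement; the function name and the returned labels are hypothetical.

```python
def plan(split_keys, where_fields):
    """Return 'route' when every split field is constrained, else 'full scan'."""
    missing = set(split_keys) - set(where_fields)
    return "route" if not missing else "full scan"


split_keys = ("id", "date")
both = plan(split_keys, {"id", "date"})  # WHERE id = 1 AND date > 3
only_id = plan(split_keys, {"id"})       # WHERE id = 1
neither = plan(split_keys, set())        # no split field in WHERE
print(both, only_id, neither)
```

A statement that constrains only part of a multi-field split key is treated the same as one that constrains none of it.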
2) If all split fields are in the WHERE condition of a SELECT/UPDATE/DELETE statement, the logical
relationship among different split fields must be AND, not OR. Conditions on the same split field
may be combined with either AND or OR.
For example: split key is id + date.
SELECT * FROM table WHERE id = 1 AND date > '2014/1/30';
AND -- correct combination
SELECT * FROM table WHERE id = 1 OR date > '2014/1/30';
OR -- wrong combination
SELECT * FROM table WHERE id > 1 AND date > '2014/1/30' AND date < '2014/3/1';
AND -- correct combination
SELECT * FROM table WHERE id > 1 AND (date < '2014/2/1' OR date > '2014/2/28');
AND -- correct combination. The conditions on the date field are combined with OR
3) For the same split field, at most 2 conditions can be connected by AND, while the number of
conditions connected by OR is not limited.
For example: split key is date.
SELECT * FROM table WHERE date > '2014/1/30' AND date < '2014/3/1';
Correct: the number of conditions connected by AND on the date field is 2.
SELECT * FROM table WHERE date > '2014/1/30' AND date < '2014/3/1' AND date < '2014/2/28';
Wrong: the number of conditions connected by AND on the date field is 3.
4) The conditions on the same split field can contain more than one value, but each value may
correspond to only one comparison.
For example: split key is date.
SELECT * FROM table WHERE date > '2014/1/30' AND date < '2014/3/1';
Correct: the values of the two conditions on the date field are '2014/1/30' and '2014/3/1'.
SELECT * FROM table WHERE date > '2014/1/30' AND date < '2014/1/30';
Wrong: the value '2014/1/30' on the date field is used in both a greater-than and a less-than comparison.
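Rules 3 and 4 can be checked before a statement is sent. The sketch below assumes the AND-connected conditions on one split field have already been extracted as (operator, value) pairs; the function name and the messages are hypothetical, and the check mirrors only the two rules stated above.

```python
def check_and_conditions(conditions):
    """Check AND-connected conditions on one split field.

    `conditions` is a list of (operator, value) pairs, for example
    [(">", "2014/1/30"), ("<", "2014/3/1")].
    Returns a list of rule violations; an empty list means acceptable.
    """
    problems = []
    # Rule 3: at most 2 conditions connected by AND.
    if len(conditions) > 2:
        problems.append("more than 2 conditions connected by AND")
    # Rule 4: each value may appear in only one comparison.
    values = [value for _op, value in conditions]
    if len(set(values)) < len(values):
        problems.append("a value is used in more than one comparison")
    return problems


ok = check_and_conditions([(">", "2014/1/30"), ("<", "2014/3/1")])
too_many = check_and_conditions(
    [(">", "2014/1/30"), ("<", "2014/3/1"), ("<", "2014/2/28")]
)
same_value = check_and_conditions([(">", "2014/1/30"), ("<", "2014/1/30")])
print(ok, too_many, same_value)
```

The three calls correspond to the correct and wrong examples given for rules 3 and 4.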