Distributed Relational Database Service
Product Introduction

DRDS Summary

A stand-alone database is the most common piece of software in today's services and easily meets users' needs for relational queries. For most applications, however, a stand-alone database eventually hits the performance ceiling of a single machine: it is limited in TPS, QPS, memory capacity, disk capacity, and other system resources. The primary objective of DRDS is to help you solve these problems. It has two main functions: read/write splitting and database sharding.

Read/write splitting lets you write data on one machine and read it from several machines, which removes the system bottleneck at very low cost for applications that read much more than they write.

Database sharding is the ultimate solution to the storage bottleneck of the system. Its core concept is simply divide and conquer: by spreading data across several machines and making sure that requests are distributed evenly among them, the various bottlenecks of a service can be resolved at very low cost.

Sharding does have a cost. The most obvious one is that a distributed database restricts some operations that work well on a stand-alone database. In a distributed environment such operations suffer from high latency or low efficiency, so even where they can be implemented, their performance makes them impractical.

The Alibaba middleware team has helped nearly 300 business application systems implement database sharding. We have therefore accumulated a great deal of experience and mature products in this field, and we hope these products can serve you equally well, so that you no longer need to worry about the performance of your database.

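The two functions described above can be pictured with a small sketch. It is illustrative only: the CRC32-modulo rule, the round-robin replica choice, and the names shard_for, Shard, and route are assumptions made for this example, not a description of how DRDS routes requests internally.

```python
import itertools
import zlib


def shard_for(split_key_value, shard_count):
    """Map a key value to a shard index (assumed rule: CRC32 modulo shard count)."""
    digest = zlib.crc32(str(split_key_value).encode("utf-8"))
    return digest % shard_count


class Shard:
    """One hypothetical shard: a single writable primary plus read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        # Round-robin over replicas; fall back to the primary if there are none.
        self._cycle = itertools.cycle(self.replicas or [self.primary])

    def node_for(self, is_write):
        # Read/write splitting: writes go to the primary, reads to a replica.
        return self.primary if is_write else next(self._cycle)


def route(shards, split_key_value, is_write):
    """Pick the shard by key (sharding), then the node by operation type."""
    shard = shards[shard_for(split_key_value, len(shards))]
    return shard.node_for(is_write)


shards = [
    Shard("db0-primary", ["db0-replica-a", "db0-replica-b"]),
    Shard("db1-primary", ["db1-replica-a"]),
]
print(route(shards, 42, is_write=True))   # a primary node
print(route(shards, 42, is_write=False))  # a replica of the same shard
```

Because the shard is chosen from the key alone, rows with the same key always land on the same shard, while reads within that shard are spread over its replicas.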
Horizontal Data Sharding

DRDS helps you split data into sub-databases and sub-tables, so that SQL that could previously run only on a single node is transformed into SQL that runs on multiple nodes, while the experience stays similar to that of a single database. Dynamic horizontal expansion of data storage is handled by the supporting system. DRDS has been used internally by hundreds of application systems for more than 5 years, efficiently, safely, and stably.

However, a distributed database and a stand-alone database differ in how they are used. For example, distributed transactions and distributed joins are inefficient. For these problems, DRDS puts performance and stability first and software compatibility second.

Compared with the open-source database sharding tools available on the market, DRDS analyzes your SQL more intelligently. We provide mature solutions for result-set merging, distributed join optimization, and other key areas, and can help you solve most problems in distributed database scenarios.

Smooth Expansion

DRDS helps you expand databases smoothly online, so you can add or remove databases as needed and use database clusters elastically. "Online" is the key to online database expansion: you can add new RDS nodes directly to a cluster without stopping the service system for a cutover, which makes expansion seamless.

DRDS divides the expansion process into several phases, including full migration, incremental synchronization, and database switchover. Data is migrated in advance and then kept up to date through a period of parallel incremental synchronization, so the final expansion and switchover of the database complete in a very short time (seconds) without affecting your service.

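The three phases named above can be walked through with a toy simulation. This is a sketch under stated assumptions, not DRDS's migration code: the databases are plain dictionaries, the change log is a list of (key, value) pairs in which None marks a delete, and the function names full_migration and incremental_sync are hypothetical.

```python
def full_migration(source):
    """Phase 1: copy a snapshot of the source while it keeps serving traffic."""
    return dict(source)


def incremental_sync(target, change_log, start):
    """Phase 2: replay changes recorded since `start`; return the new position."""
    for key, value in change_log[start:]:
        if value is None:
            target.pop(key, None)  # None marks a delete in this sketch
        else:
            target[key] = value
    return len(change_log)


# Simulated old node, and the log of writes made after the snapshot was taken.
source = {1: "a", 2: "b"}
change_log = []

target = full_migration(source)  # phase 1: bulk copy

# Writes keep arriving on the old node while the copy is in progress.
for key, value in [(3, "c"), (1, "a2")]:
    source[key] = value
    change_log.append((key, value))
position = incremental_sync(target, change_log, 0)  # phase 2: catch up

# One more write lands just before the switchover.
source.pop(2)
change_log.append((2, None))

# Phase 3: switchover. Only the short tail of the log is left to replay,
# which is why this last step can be brief.
position = incremental_sync(target, change_log, position)
active = target

print(active == source)  # True
```

The point of the sketch is the ordering: the bulk copy and most of the catch-up happen while the old node is still serving, so the switchover itself only has to replay the few changes that arrived last.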
Table Broadcasting

After large service tables are split, there are always some tables that hold little data, such as original information tables that are rarely updated. These small tables are frequently joined with the large sharded tables, and such a join physically becomes a distributed join query with very low overall efficiency. For these distributed join scenarios, we developed the special OETL tools to help you broadcast small tables. All the data in the original information table (including incremental updates) is broadcast automatically to the machines that hold the large table, so the original distributed query becomes a local, stand-alone query.

Globally Unique ID

In a distributed environment, the original MySQL sequence generation mechanism cannot efficiently generate globally unique sequences. Following the sequence generation model of Oracle, we implemented an efficient sequence generator based on MySQL. It has no single-point performance bottleneck and supports highly concurrent access with low latency. The objective of the DRDS sequence function is to guarantee global uniqueness of data. Although values are obtained in time order, they are not guaranteed to be globally ordered.

DRDS User Guide

Traversal Operation of Full Table

DRDS supports statistical summaries with aggregate functions in full table scans. Full table scan is currently disabled by default, and you must enable it explicitly through configuration where you need it. We believe that keeping it off by default gives you better control over performance. To configure it, go to our operation management platform and find the corresponding table.

1) No sub-database or sub-table for the target table: DRDS supports any aggregate function, because it simply passes the original SQL to the backend MySQL for execution.

2) Non-full table scan: after DRDS routes the SQL statement, it is sent directly to a single backend MySQL database for execution. This is typically the case when the WHERE condition contains an equality (=) condition on the split key. Any aggregate function is supported in this case as well.

3) Full table scan: the aggregate functions currently supported are COUNT, MAX, MIN, and SUM. LIKE, ORDER BY, and LIMIT are also supported, while GROUP BY is not.

4) Concurrent full table scan: sometimes you may want to dump the data of all databases directly to another location. We provide a way to find out how many databases currently sit in the underlying layer and to operate on each of them yourself.

Step 1: Get the tables currently in each database.

    mysql> show topology from tddl_users;
    +------+--------------------------+---------------------+
    | ID   | GROUP_NAME               | TABLE_NAME          |
    +------+--------------------------+---------------------+
    | 0    | DRDS_00_RDS              | drds_users          |
    | 1    | DRDS_01_RDS              | drds_users          |
    +------+--------------------------+---------------------+

Step 2: Traverse the tables one by one.

For example, to query the first table, run:

    /!+TDDL({'type':'direct','dbid':'DRDS_00_RDS'})*/ select * from drds_users;

This runs the SQL on the first database.

    /!+TDDL({'type':'direct','dbid':'DRDS_01_RDS'})*/ select * from drds_users;

This runs the SQL on the second database.

Please note that the number of databases may change with expansion and other events, and we cannot guarantee that the names of these groups never change.
Therefore, please run the show topology from table statement first to get the latest table topology.

DRDS hint support

Manually specify the split key of SQL statements.

Example 1: specify a single field value. The split key is id.

    /!drds: $partitionOperand=('id'=['2']),$table='table'*/ SELECT * FROM table WHERE date > '2014/04/25';

Example 2: specify multiple field values. The split keys are id and type.

    /!drds: $partitionOperand=(['id','type']=['2','a']), $table='table'*/ SELECT * FROM table WHERE date > '2014/04/25';

Use placeholders (?) as hint parameters.

Example 1: specify a single field value. The split key is id.

    /!drds: $partitionOperand=('id'=[?]),$table='table'*/ SELECT * FROM table WHERE date > ?;

Example 2: specify multiple field values. The split keys are id and type.

    /!drds: $partitionOperand=(['id','type']=[?,?]), $table='table'*/ SELECT * FROM table WHERE date > ?;

ORDER BY in DRDS

When a statement contains both ORDER BY and LIMIT paging, DRDS automatically sorts by merging. In detail, DRDS keeps one record per database in memory and then processes the pages with an ORDER BY merge. The advantage of this approach is that memory is not exhausted. It is not magic, however, and the computation is not very efficient. You can therefore use this method when you need to merge pages from multiple machines and M in LIMIT M,N is not large. In our experience, if M > 10000 the query usually fails to return a result.

Shard-Database and Shard-Table

Shard-database and shard-table are important concepts in DRDS. DRDS horizontally splits data across the backend RDS databases. Those databases are the shard-databases, and the corresponding tables are the shard-tables.
DRDS lets each shard-database handle the reads and writes of its own data, so the overall access pressure is spread out effectively. When the system needs to expand, adding shard-databases horizontally and migrating the relevant data increases the overall capacity of the DRDS system.

Relevant concepts of shard-database and shard-table

Split Key

The split key is the field of a table that is chosen for sharding. DRDS horizontally splits data across the backend RDS shard-databases according to the value of the split key. In other words, rows with the same key value are always in the same RDS database. Each shard-table in DRDS can define its own split key, which may be a single field or a combination of multiple fields.

Full Table Scan

Complicated SQL statements are distributed to all databases for execution, and the results are compared and merged in DRDS. A full table scan consumes considerable resources, so avoid it in your service wherever practical.

Single-Field sharding

1) INSERT/REPLACE statements must include the shard-database and shard-table field (the split key). For example, with sharding field id:

    INSERT INTO table VALUES ('name1', 'value2');
        Error

    INSERT INTO table (id, name, value) VALUES (1, 'name1', 'value2');
        Correct

2) If the WHERE condition of a SELECT/UPDATE/DELETE statement contains no split key, a full table scan is run.

Multi-Field sharding

1) If the WHERE condition of a SELECT/UPDATE/DELETE statement contains no split field, or only some of the split fields, a full table scan is run. Route computing is possible only when the SQL includes all split keys. For example, with split key id + date:

    SELECT * FROM table WHERE id = 1 AND date > 3;
        Route computing

    SELECT * FROM table WHERE id = 1;
        Needs a full table scan

2) If all split fields are in the WHERE condition of a SELECT/UPDATE/DELETE statement, the logical relationship among the split fields must be AND, not OR.
Conditions on the same split field may be combined with either AND or OR. For example, with split key id + date:

    SELECT * FROM table WHERE id = 1 AND date > '2014/1/30';
        AND: correct combination

    SELECT * FROM table WHERE id = 1 OR date > '2014/1/30';
        OR: wrong combination

    SELECT * FROM table WHERE id > 1 AND date > '2014/1/30' AND date < '2014/3/1';
        AND: correct combination

    SELECT * FROM table WHERE id > 1 AND (date < '2014/2/1' OR date > '2014/2/28');
        AND: correct combination. The conditions on the date field are combined with OR.

3) For the same split field, at most 2 conditions can be connected by AND, while the number of conditions connected by OR is not limited. For example, with split key date:

    SELECT * FROM table WHERE date > '2014/1/30' AND date < '2014/3/1';
        Correct: 2 conditions on the date field are connected by AND.

    SELECT * FROM table WHERE date > '2014/1/30' AND date < '2014/3/1' AND date < '2014/2/28';
        Wrong: 3 conditions on the date field are connected by AND.

4) The conditions on the same split field can contain more than one value, but each value may appear in only one comparison. For example, with split key date:

    SELECT * FROM table WHERE date > '2014/1/30' AND date < '2014/3/1';
        Correct: the two conditions on the date field use the values '2014/1/30' and '2014/3/1'.

    SELECT * FROM table WHERE date > '2014/1/30' AND date < '2014/1/30';
        Wrong: the value '2014/1/30' on the date field is used in both a greater-than and a less-than comparison.
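
The multi-field rules above can be expressed as a few small checks. The sketch below is a hypothetical validator written for illustration, not DRDS's SQL parser: conditions are passed in as (field, operator, value) tuples that are assumed to be joined by AND, and the function names are invented for this example.

```python
from collections import Counter


def needs_full_table_scan(split_keys, where_fields):
    """Rule 1: routing is possible only if every split key appears in WHERE."""
    return not set(split_keys) <= set(where_fields)


def check_and_conditions(conditions, split_field):
    """Check rules 3 and 4 for AND-connected conditions on one split field.

    `conditions` is a list of (field, operator, value) tuples joined by AND.
    """
    on_field = [(op, value) for field, op, value in conditions if field == split_field]
    if len(on_field) > 2:
        return "wrong: more than 2 AND conditions on the split field"
    value_counts = Counter(value for _, value in on_field)
    if any(count > 1 for count in value_counts.values()):
        return "wrong: one value is used in more than one comparison"
    return "correct"


# Rule 1, split key id + date.
print(needs_full_table_scan(["id", "date"], ["id", "date"]))  # False
print(needs_full_table_scan(["id", "date"], ["id"]))          # True

# Rules 3 and 4, split key date, using the statements from the examples above.
two = [("date", ">", "2014/1/30"), ("date", "<", "2014/3/1")]
three = two + [("date", "<", "2014/2/28")]
same_value = [("date", ">", "2014/1/30"), ("date", "<", "2014/1/30")]
print(check_and_conditions(two, "date"))         # correct
print(check_and_conditions(three, "date"))       # wrong: more than 2 AND ...
print(check_and_conditions(same_value, "date"))  # wrong: one value is used ...
```

Running checks like these in application code before a statement is sent can catch a query that would otherwise be rejected or fall back to a full table scan.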