Postgres-XC/XL Scale-out Approach in PostgreSQL

Postgres Conference
HangZhou, China
Postgres-XC/XL
Scale-out Approach in PostgreSQL
July 25th, 2015
NTT DATA INTELLILINK Corporation
Koichi Suzuki
Copyright © 2015 NTT DATA INTELLILINK Corporation
Introduction
About the Speaker
● Fellow at NTT DATA Intellilink Corporation
● Principal, Technology Professionals at NTT DATA Group
In Charge Of
● General Database Technology
● Database in huge data warehouse and its design
● PostgreSQL and its cluster technology
In The Past
● Character Set Standard (Extended Unix Code, Unicode, etc.)
● Heisei-font development (Technical Committee)
● Oracle Porting
● Object-Relational Database
Motivation
● Growing database workload in both OLTP (OnLine Transaction Processing) and OLAP (OnLine Analytical Processing) applications
● Shared-nothing approach
  ● Performance with commodity hardware/software
● Extension to existing PostgreSQL
● Transparent API
  ● Internal API could be different
  ● Transparent libpq interface
  ● No significant restriction on transaction ACID properties or the SQL language
Scale-out approach
● Distribution/Replication of table rows among different database “nodes”
● Parallelism
● Local join operation
● SQL planning for row distribution/replication
● Consistent and synchronous transaction management among “nodes”
● Performance with commodity hardware/software
Read Scale-out in PostgreSQL Master/Slave
[Figure: read/write transactions go to the Master; read-only transactions go to the Slave, which receives WAL (or redo log) from the Master with a possible time delay.]
Scaling Out in Postgres XL/XC
[Figure: read/write transactions run on every node, each with its own local disk; backend transaction synchronization means no delay in update visibility.]
OLTP Workload Scalability and Table Design
DBT-1 Workload Scalability
[Figure: DBT-1 (Rev) scalability chart]
Table Design in DBT-1 Benchmark
CUSTOMER: C_ID, C_UNAME, C_PASSWD, C_FNAME, C_LNAME, C_ADDR_ID, C_PHONE, C_EMAIL, C_SINCE, C_LAST_VISIT, C_LOGIN, C_EXPIRATION, C_DISCOUNT, C_BALANCE, C_YTD_PMT, C_BIRTHDATE, C_DATA
ORDERS: O_ID, O_C_ID, O_DATE, O_SUB_TOTAL, O_TAX, O_TOTAL, O_SHIP_TYPE, O_BILL_ADDR_ID, O_SHIP_ADDR_ID, O_STATUS
ORDER_LINE: OL_ID, OL_O_ID, OL_I_ID, OL_QTY, OL_DISCOUNT, OL_COMMENTS, OL_C_ID
ITEM: I_ID, I_TITLE, I_A_ID, I_PUB_DATE, I_PUBLISHER, I_SUBJECT, I_DESC, I_RELATED1, I_RELATED2, I_RELATED3, I_RELATED4, I_RELATED5, I_THUMBNAIL, I_IMAGE, I_SRP, I_COST, I_AVAIL, I_ISBN, I_PAGE, I_BACKING, I_DIMENSIONS
SHOPPING_CART: SC_ID, SC_C_ID, SC_DATE, SC_SUB_TOTAL, SC_TAX, SC_SHIPPING_COST, SC_TOTAL, SC_C_FNAME, SC_C_LNAME, SC_C_DISCOUNT
CC_XACTS: CX_I_ID, CX_TYPE, CX_NUM, CX_NAME, CX_EXPIRY, CX_AUTH_ID, CX_XACT_AMT, CX_XACT_DATE, CX_CO_ID, CX_C_ID
SHOPPING_CART_LINE: SCL_SC_ID, SCL_I_ID, SCL_QTY, SCL_COST, SCL_SRP, SCL_TITLE, SCL_BACKING, SCL_C_ID
ADDRESS: ADDR_ID, ADDR_STREET1, ADDR_STREET2, ADDR_CITY, ADDR_STATE, ADDR_ZIP, ADDR_CO_ID, ADDR_C_ID
STOCK: ST_I_ID, ST_STOCK
COUNTRY: CO_ID, CO_NAME, CO_EXCHANGE, CO_CURRENCY
AUTHOR
Table groups in the figure: Distributed with Customer ID; Distributed with Shopping Cart ID; Distributed with Item ID; Replicated.
MPP Performance – DBT-3 (TPC-H)
By courtesy of Mason Sharp, Postgres-XL leader
Scale Out Approach (1): Table Distribution/Replication
Categorize tables into two groups:
Large and frequently-updated tables
→ Distribute rows among nodes (Distributed Tables)
→ Based on a column value (distribution key)
→ Hash, modulo or round-robin
→ Parallelism among transactions (OLTP) or in SQL processing (OLAP)
Smaller and stable tables
→ Replicate among nodes (Replicated Tables)
→ Join Pushdown
As far as possible, avoid joins between Distributed Tables whose join keys differ from the distribution key.
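The three row-placement strategies named above (hash, modulo, round-robin) and replication can be sketched in a few lines of Python. This is only an illustration of the idea: the node names are hypothetical, and CRC32 is a stand-in hash, not the hash function Postgres-XC/XL actually uses.

```python
import itertools
import zlib

# Hypothetical datanode names, for illustration only.
NODES = ["datanode1", "datanode2", "datanode3", "datanode4"]


def by_hash(key, nodes=NODES):
    """Hash distribution: hash the distribution key, then map it onto a node."""
    digest = zlib.crc32(str(key).encode("utf-8"))  # stand-in hash (assumption)
    return nodes[digest % len(nodes)]


def by_modulo(key, nodes=NODES):
    """Modulo distribution: integer distribution key modulo the node count."""
    return nodes[key % len(nodes)]


def round_robin(nodes=NODES):
    """Round-robin: no distribution key; rows are dealt out to nodes in turn."""
    return itertools.cycle(nodes)


def replicated(nodes=NODES):
    """Replicated table: every node stores every row."""
    return list(nodes)
```

With hash or modulo placement, all rows sharing a distribution-key value land on the same node, which is what makes a join on that key a local operation.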
Scale Out Table Design in DBT-1
Three distribution keys:
● Customer ID
● Shopping Cart ID
● Item ID
Some transactions involve joins across distributed tables with non-distribution
join keys.
Some More in XL/XC Node Configuration
Node Configuration: Two-Tier Approach
Coordinator:
● Maintains global catalog information
● Builds the global SQL plan and the SQL statements for datanodes
● Interacts with datanodes to execute local SQL statements and accumulates the results
Datanode:
● Maintains actual data (local data)
● Runs local SQL statements from the Coordinator
(In XL, a datanode may ask other datanodes for their local data)
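The division of labour between the two tiers can be sketched as a toy scatter-gather in Python. The `Datanode` and `Coordinator` classes below are hypothetical illustrations, not the XC/XL implementation: each datanode filters only its local rows, and the coordinator fans the same local query out to all datanodes in parallel and accumulates the results.

```python
from concurrent.futures import ThreadPoolExecutor


class Datanode:
    """Holds local rows only and runs a local query on them."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def run_local(self, predicate):
        return [row for row in self.rows if predicate(row)]


class Coordinator:
    """Knows every datanode (a stand-in for the global catalog)."""

    def __init__(self, datanodes):
        self.datanodes = datanodes

    def select(self, predicate):
        # Send the local query to every datanode in parallel.
        workers = max(1, len(self.datanodes))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(
                pool.map(lambda dn: dn.run_local(predicate), self.datanodes)
            )
        # Accumulate the per-datanode results into one result set.
        return [row for part in partials for row in part]
```

A real coordinator would also plan which datanodes need to be contacted at all; here every datanode is always asked.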
Coordinator and Datanode
[Figure: read/write transactions arrive at the Coordinators, which work with the Datanodes.]
Node Configuration: Yet Another Node: GTM
GTM: Global Transaction Manager
Synchronizes each node's transaction status
Why GTM? Two-Phase Commit Protocol doesn't work?
Two-Phase Commit Protocol does:
● Maintain database consistency in transactions updating more than one node.
Two-Phase Commit Protocol doesn't:
● Maintain atomic visibility of updates to other transactions (next slide)
Atomic Visibility and GTM
[Figure: timeline on Node A and Node B. TXN 1 updates A and B, prepares A and B, then commits A and B. During the commit, TXN 2 reads B and gets the old value, then reads A and gets the new value: an inconsistent read! GTM monitors transaction activity and makes the new value available at a single point in time.]
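One way to picture atomic visibility is a toy model in which a central manager hands out transaction IDs and snapshots, and every node decides visibility from the snapshot rather than from its own local commit state. The Python sketch below makes that assumption; the class names `GTM` and `Node` and their methods are hypothetical, and the model ignores aborts and concurrent writers.

```python
class GTM:
    """Toy global transaction manager: issues IDs and snapshots."""

    def __init__(self):
        self.next_gxid = 1
        self.active = set()

    def begin(self):
        gxid = self.next_gxid
        self.next_gxid += 1
        self.active.add(gxid)
        return gxid

    def commit(self, gxid):
        # The single point at which the transaction becomes visible everywhere.
        self.active.discard(gxid)

    def snapshot(self):
        # (first unassigned ID, IDs still in progress when the snapshot was taken)
        return (self.next_gxid, frozenset(self.active))


class Node:
    """Toy datanode storing one value as a list of versions."""

    def __init__(self, initial):
        self.versions = [(0, initial)]  # (writer ID, value); ID 0 = initial data

    def write(self, gxid, value):
        self.versions.append((gxid, value))

    def read(self, snapshot):
        upper, active = snapshot
        # A version is visible only if its writer had finished at snapshot time.
        visible = [v for g, v in self.versions if g < upper and g not in active]
        return visible[-1]
```

In this model a reader that took its snapshot before the commit sees the old value on both nodes, and a reader that took it afterwards sees the new value on both, so the mixed old/new read cannot occur.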
Final Configuration: GTM, Coordinator and Datanode
[Figure: read/write transactions arrive at the Coordinators; Coordinators and Datanodes are connected to the GTM.]
Configuration in Practice
Just like configuring many database servers to talk to each other
● Many pitfalls
● Pgxc_ctl provides a simpler way to configure the whole cluster
  ● Provide only the needed parameters
  ● Pgxc_ctl will do the rest, issuing the needed commands and SQL statements.
Visit
http://sourceforge.net/p/postgres-xc/xc-wiki/PGOpen2013_Postgres_Open_2013/
Scalability in OLTP Workloads
OLTP Workload Characteristics
Number of Transactions: Many
Number of Involved Table Rows: Small
Locality of Row Allocation: High
Update Frequency: High
Scaling Out OLTP Workload
[Figure: read/write transactions run in parallel across Coordinators and Datanodes; GTM: high workload.]
Scalability in OLAP (Analytic) Workloads
OLAP Workload Characteristics
Number of Transactions: Small
Number of Involved Table Rows: Huge
Locality of Row Allocation: Low
Update Frequency: Low
Scaling Out OLAP Workload
[Figure: an SQL statement arrives at a Coordinator, which performs the top-level aggregation (fewer coordinators may be needed); GTM: low workload; small local SQL statements run on each Datanode in parallel.]
Join Offloading
Join Offloading: When row allocation is available
● Replicated Table and Partitioned Table
  – The coordinator can determine which datanode to go to from the WHERE clause
Join Offloading: When row allocation is available
● Replicated Table and Partitioned Table
  – When the coordinator cannot determine which datanode to go to from the WHERE clause
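The two cases, where the coordinator can or cannot tell from the WHERE clause which datanode to visit, can be sketched as a small routing function in Python. Everything here is a simplifying assumption: the node names are hypothetical, only modulo distribution is modelled, and the WHERE clause is reduced to a dictionary of column = value equalities.

```python
# Hypothetical datanode names, for illustration only.
NODES = ["datanode1", "datanode2", "datanode3"]


def target_nodes(distribution, where, nodes=NODES):
    """Pick the datanodes a statement has to visit.

    distribution: ("replicated", None) or ("modulo", distribution_column)
    where:        dict of column -> value equality conditions
    """
    kind, column = distribution
    if kind == "replicated":
        # Every node holds all rows, so any single node can answer.
        return [nodes[0]]
    if column in where:
        # The distribution key is pinned by the WHERE clause: one node suffices.
        return [nodes[where[column] % len(nodes)]]
    # Row location unknown: the statement must go to every datanode.
    return list(nodes)
```

When a replicated table is joined with a distributed one, the join can run on whichever datanodes this function returns for the distributed table, since each of them also holds the full replicated table.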
Parallel Aggregation
Aggregate Functions in PostgreSQL
[Figure: rows feed the State Transition Function; its final state goes to the Finalize Function.]
Aggregate Functions in Postgres-XC/XL
[Figure: each Datanode runs the State Transition Function and produces (Sum, Count); on the Coordinator, the Collector Function combines the (Sum, Count) pairs and the Finalize Function computes AVG ← (Sum, Count).]
Similar to Map Reduce!
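The AVG example can be written out as a minimal Python sketch of the three functions. The function names are illustrative, not the names used in the XC/XL catalogs: the state transition function runs per row on each datanode, and the collector and finalize functions run on the coordinator.

```python
def transition(state, value):
    """State transition function: fold one row into the (sum, count) state."""
    total, count = state
    return (total + value, count + 1)


def collect(left, right):
    """Collector function: merge two (sum, count) states from datanodes."""
    return (left[0] + right[0], left[1] + right[1])


def finalize(state):
    """Finalize function: turn the merged (sum, count) state into AVG."""
    total, count = state
    return total / count if count else None


def datanode_partial(rows):
    """Runs on each datanode over its local rows."""
    state = (0, 0)
    for value in rows:
        state = transition(state, value)
    return state


def coordinator_avg(partials):
    """Runs on the coordinator over the per-datanode states."""
    merged = (0, 0)
    for partial in partials:
        merged = collect(merged, partial)
    return finalize(merged)
```

Averaging the per-datanode averages would give the wrong answer when nodes hold different numbers of rows, which is why the state shipped to the coordinator is the (sum, count) pair.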
Specific statements
● CREATE BARRIER
  – Synchronizes all nodes' WAL for restoration.
● CREATE|ALTER|DROP NODE
  – Maintenance of cluster nodes
    ● Caution! Not automatically propagated. Issue to each coordinator.
● CREATE/DROP NODE GROUP
  – Alias for a group of nodes
● EXECUTE DIRECT
  – Runs SQL locally
  – Read operations only
    ● If you are a superuser, turn xc_maintenance_mode on with a SET statement to allow write operations.
    ● You are responsible for any inconsistencies and side effects!
Specific catalogs
● pgxc_class
  – Definition of table distribution
● pgxc_node
  – Postgres-XC node information
● pgxc_group
  – Node group
Specific functions
● pgxc_version()
  – Show version
● pgxc_pool_check()
  – Check if connection pooler is consistent with pgxc_node catalog.
● pgxc_pool_reload
  – Reload cached connection data and synchronize pooler connection information with pgxc_node.
● pgxc_lock_for_backup
  – Only for adding new nodes.
  – Locks DDL execution to make catalog stable for backup and copy to new node.
Specific statements, catalogues, functions and parameters
See http://postgres-x2.github.io/reference/1.2/html/sql-commands.html for details
Specific parameters (planner parameters not included)
● gtm_backup_barrier (bool)
  – Enable CREATE BARRIER statement.
● persistent_datanode_connections (bool)
  – If “true”, session never releases connections.
● xc_maintenance_mode
  – Enable write operation in “EXECUTE DIRECT” statement.
  – Only allowed to superusers.
● min_pool_size
  – Threshold for pooler to create new connection.
● max_pool_size
  – Max pooled connection size.
● pooler_port
  – Port number for the pooler (pgxc_ctl takes care of it)
● gtm_port
  – GTM port number (pgxc_ctl takes care of it)
Specific parameters (cont.)
● max_datanodes
● max_coordinators
● pgxcnode_cancel_delay
  – Timeout to wait for a cancel operation, in milliseconds.
  – Mainly for automatic test.
● gtm_host
  – GTM host name/IP address. Pgxc_ctl takes care of this.
● pgxc_node_name
  – Name of the node itself. Pgxc_ctl takes care of this.
Community status and future
XC and XL community
● Postgres-XC is the original community
  – Based upon PostgreSQL 9.3
  – Tested more for OLTP workload
  – Community activity now continues as Postgres-X2
  – Stabilization
    ● Many Chinese engineers participate
    ● The next minor release is planned for this August
● Postgres-XL became a separate community, aiming to be more product-oriented with better stability
  – Based upon PostgreSQL 9.2
  – Shares most of the XC code base
  – Tested more for OLAP workload
    ● Direct data capture between datanodes
  – Provides many fixes. Most of them apply to XL as well
  – Just finished merge with PostgreSQL 9.5 alpha
● Unified again?
Product status
● Source code inherits the whole PostgreSQL repository (as of some point)
● Fundamental features are all available
  – Global transaction management
  – SQL statements
  – Utilities
● Further challenges
  – Subtransaction (needed for full function support)
  – Catching up with PostgreSQL (needed?)
XC and XL community
● Both communities need many more resources to move forward
  – Developers
  – Testers
  – Real workloads
● Several Chinese firms are now working together.
  – Many more active members are welcome!
XC and XL community sites
Postgres-XC
https://github.com/postgres-x2
https://postgres-x2.github.io
https://groups.google.com/forum/#!forum/postgres-x2-dev
https://groups.google.com/forum/#!forum/postgres-x2-general
koichi.dbms@gmail.com
galylee@gmail.com
Postgres-XL
http://www.postgres-xl.org/
Configuring Postgres-XC
Pgxc_ctl
● Postgres-XC contrib module
● Postgres-XC configuration and operation tool
  – A kind of Postgres-XC shell
  – Builtin commands
  – Can invoke any bash commands
    ● Does not expand $(variable).
● Simple configuration
● Avoids many pitfalls in manual configuration and operation
● Bash-based configuration file
● You can write your favorite bash script for your configuration
Pgxc_ctl builtin commands (major ones)
● prepare
  – Creates configuration file template
● deploy
  – Deploys postgres-xc binaries to necessary nodes
● Init [all]
  – Initialize postgres-xc cluster
    ● Run initdb and initgtm at necessary nodes
    ● Do additional configuration
    ● Initialize node configuration
● Start/stop
  – Cluster and node start/stop
● Clean
  – Clean up existing resources
● Monitor
  – See what node is running
Pgxc_ctl builtin commands (major ones)
● Createdb
  – Similar to createdb, but selects one coordinator to do it.
● Psql
  – Similar to psql, but selects one coordinator, or you can specify a coordinator name to connect to.
● Add
  – Add gtm_proxy, coordinator and datanode (master and slave)
● Remove
  – Remove gtm_proxy, coordinator and datanode (master and slave)
Demonstration