Query Processing and Optimization in the Distributed

advertisement
International Journal of Advanced Computer Engineering and Communication Technology (IJACECT)
________________________________________________________________________________________________
Query Processing and Optimization in the Distributed Architecture
using NoSql: a Banking System
1
Piyush Gupta, 2Chandelkar Kashinath K.
Dept. of Computer Science & Engineering, Birla Institute of Technology, Mesra, Jaipur Campus
Email: 1piyush.bitjpr@gmail.com, 2Kashinath45@gmail.com
Abstract -The banking sector is one of the best examples
who create Big Data with increasing transactions per
second across the globe. Each bank is administered by the
DBA (Database Administrator) as a technical in charge.
DBA makes an arrangement in the branch to divert query
from authorized user to the appropriate data repository.
The SQL core Engine accepts input stream, processes and
optimized to increase efficiency across the network. QO
(Query Optimiser) and QE (Query Executor) is an
important module that helps in the process. Due to the
existing limitations of the system, a cloud based distributed
architecture comprising of IaaS (Infrastructure as a
Service), PaaS (Platform as a Service) and SaaS (Software
as a Service) is used with NoSql. Hadoop and MangoDB
are an example that still retains ACID (Atomicity,
Consistency, Isolation, Durability) features of cloud
environment.
Keywords- DBA, NoSql, QO, QE, ACID, IaaS, PaaS, SaaS,
MangoDB, Hadoop.
I. INTRODUCTION
the repository, it is filtered and cleaned before delivery
in specified format. The data repository is a collection of
relational database having tables and their relations that
result into problem specified is section 4.
II. PROBLEM
The banking system carries financial information across
the channel having approximately thousand transactions
per minutes per branch. Each session has data stream is
diverted to main server. The server denies the request if
it has insufficient amount of both physical and logical
resources like RAM and Buffer Size. The increasing
structured and unstructured data lead to a new concept
called “Big Data” [3]. This huge chunk of data in
network consumes resources and cost if not utilized for
decision making.
Given system is said to be efficient if it scales with
demand [4]. In case of banking, query processing and
optimization is a challenge with increasing requests
from dynamic nodes for information retrieval on a voice
based system.
III. PROPOSED SOLUTION
Fig-1: Banking System
Considering the banking system as shown in above Fig1, each branch has its own servers connected to different
users with deployed policies managed by system
administrators [1] [10]. Each authorized bank employee
communicates to the server and data repository as per
the need of a transaction. Every time the query is sent to
server in need of information or verification, the data
stream has to pass through the server core engine that
optimizes and executes the query as per the user
requirement [11]. Once the information is collected from
Query processing and optimization shall be made
efficient by using 3 layers. At layer first all physical and
logical components are identified. The data input stream
in traditional system is given as input to the SQL core
engine [11] as shown in Fig-2. This is one of the
identified areas for query processing and optimization
that needs to explore to enhance.
The DBA (Database Administrator) or an end user who
provides query as an input is in layer second. Minor
changes in the query input shall save large amount
resources in the long run.
Instead of using traditional database [5], using instances
of NoSql [1] [3] database system in distributed
architecture resolves the issues if left any in layer third.
________________________________________________________________________________________________
ISSN (Print): 2319-2526, Volume-3, Issue-4, 2014
25
International Journal of Advanced Computer Engineering and Communication Technology (IJACECT)
________________________________________________________________________________________________
Fig-2: SQL Core Engine Query processing
Layer 1
The organization or banking sector that needs time to
migrate to new system needs to focus on the SQL core
engine for the time being. The core engine consists of a
traditional storage engine sandwiched between SQL
operating system and LPE (Language Processing
Execution). It is supported by QE (Query Optimiser) and
QE (Query Executor) in the process. Query optimisers
being the responsible module for efficiency, the
following components are identified.





Rewriter – Applies transformation to the given
query and produces equivalent query that is more
efficient.
Planner – it is a main module of ordering stage. It
examines all possible executions for each query
produced in the previous stage and selects the
overall cheapest one to generate answers of query.
The searching strategy is employed. The
determined cost is the running time of the
optimizer itself, which should be as low as
possible.
Algebraic space – it determines the execution
order. A series of action produces the same query
answer which differs in performance.
Method Structure- it is used to join available
methods like loop, hash join, universal join etc.
Cost Model- used to estimate the cost of the
execution plan for every distinct step. Formula to

estimate the cost is dependent on buffer
management, Disk, CPU overlaps, sequential and
random I/O and size of the buffer pool.
Size distribution Estimator- specifies how the size
(frequency distribution and attributes value) of
database relations and (sub) query results are
estimated. The DBA has to concentrate on
minimizing the optimizer running time by
increasing the size of the buffer pool.
Layer 2
The query optimization is possible by writing efficient
queries as shown in table-1.
General Tip
Specific Column
Names instead of
* in SELECT
Query
Try to avoid
HAVING Clause
in
Select
statements
Write query
SELECT col_1,
col_2,
col_3,
FROM
table_name;
SELECT Col_1,
count (Col_1)
FROM
table_name
WHERE col_1!
=
testvalue1‘AND
col_1!
=
=testvalue1‘
GROUP
BY
Instead of
SELECT
FROM
table_name;
*
SELECT Col_1,
count
(Col_1)
FROM
table_name
GROUP
BY
Col_1 HAVING
Col_1!
=
‗testvalue1‘
AND Col_1! =
‗testvalue2‘;
________________________________________________________________________________________________
ISSN (Print): 2319-2526, Volume-3, Issue-4, 2014
26
International Journal of Advanced Computer Engineering and Communication Technology (IJACECT)
________________________________________________________________________________________________
We should try to
carefully
use
conditions
in
WHERE clause.
We should try to
use
NONColumn
expression
on
one side of the
SQL query as it
will be processed
before any other
clause.
col_1;
SELECT
id,
col1,
col2
FROM
table
WHERE col2 >
10;
SELECT
id,
Col1,
Col2
FROM
table
WHERE Col2 <
25000;
SELECT
id,
col1,
col2
FROM
table
WHERE col2!=
10;
SELECT
id,
Col1,
Col2
FROM
Table
WHERE Col2 +
10000 < 35000;
Cloud computing [1] [2] [4] is an upcoming technology
that allows migration of software using virtualization.
Cloud support IaaS (Infrastructure as a Service), PaaS
(Platform as a Service) and SaaS (Software as a
Service). The existing software shall be migrated in
cloud environment which is more secure and efficient
than traditional systems. The instance of software is a
virtual image placed inside the virtual box or VMware
that is managed by the hypervisor. Public, private or
hybrid cloud supports running of multiple virtual
instances as software. Since the service is pay per use,
the end user does not have to invest in physical
resources.
For a banking sector, the available system shall be
migrated using NoSql instance like Hadoop, MangoDB
or similar software as a service. The following features
about the NoSql instance.
Table-1: Efficient Query Writing
Layer 3
IV. VALIDATION OF PROPOSED SOLUTION
Fig 3 : Hadoop Instance running in the cloud
Fig-3 shows an instance of Hadoop running in Bitnami
cloud [16], which is one of the examples of NoSql. After
migrating Hadoop in the cloud, a set of queries was
given as an input. The above figure discloses an output
for a single query Running multiple Hadoop instances is
also possible. Running NoSql not only increa efficiency
and security in the banking sector, it becomes more
economical supporting ACID (Atomicity, Consistency,
Isolation, Durability) factors.






Does not use SQL as a query language.
Cloud friendly
Load balancing is by distributing itself over lots
of ordinary chips and Intel based servers.
It is not a relational database.
Economic
Supports key value store, document store, big
table and graph dat

Uses an array that reduces the need of joining.
V. CONCLUSION
Query processing and optimization issues for banking
sector are dealt with using NoSql, due to the limitations
of existing database systems. Query optimiser being the
key module in the existing system is elaborated to
enhance efficiency. A 3 layered solution having physical
and logical resources in first, human query input as
second and using instance of NoSql Software in
distributed architecture shall definitely overcome
existing banking limitations.
REFERENCES
[1]
Rahul Prasad Kanu , Shabeera T P , S D Madhu
Kumar (2014) Dynamic Cluster Configuration
Algorithm in MapReduce Cloud, International
________________________________________________________________________________________________
ISSN (Print): 2319-2526, Volume-3, Issue-4, 2014
27
International Journal of Advanced Computer Engineering and Communication Technology (IJACECT)
________________________________________________________________________________________________
Journal of Computer Science and Information
Technologies, Vol. 5 (3), 2014, 4028-4033.
[9]
Gautam
Vemuganti
(2013)
Metadata
Management in Big Data, Infosys lab Briefing.
[2]
Mr. Kulkarni, N. N., Dr. Pawar V. P., Dr. K.K
Deshmukh (2014) Evaluation of Information
Retrieval in Cloud computing based services,
Asian Journal of Management Sciences 02 (03
(Special Issue))
[10]
N. Samatha, K. Vijay Chandu, P. Raja Sekhar
Reddy (2012) Query Optimization Issues for
Data Retrieval in Cloud Computing, International
Journal Of Computational Engineering Research
(ijceronline.com) Vol. 2 Issue. 5.
[3]
Brian Hellig, Stephen Turner, rich color, long
Zheng-2014- beyond map educe: the next
generation of big data analytics HAMR.Eti.com.
[11]
[4]
Ismail Hmeidi, Maryan Yatim, Ala’ Ibrahim, Mai
Abujazouh, 2014 - Survey of Cloud Computing,
Web Services for Healthcare Information
Retrieval Systems, International conference on
Computing
Technology and
Information
Management, Dubai, UAE.
Navita Kumari (2012) SQL Server Query
Optimization Techniques - Tips for Writing
Efficient and Faster Queries, International
Journal of Scientific and Research Publications,
Volume 2, Issue 6, June 2012.
[12]
Yannis E. Ioannidis (2012) Query Optimization,
yannis@cs.wisc.edu
[13]
Pawel Jurczyk and Li Xiong (2009) Dynamic
Query Processing for P2P Data Services in the
Cloud, Springer-Verlag Berlin Heidelberg.
[14]
William Hersh -2008 Future perspectives
Ubiquitous but unfinished: grand challenges for
information retrieval, Department of Medical
Informatics and Clinical Epidemiology, Oregon
Health and Science University, Portland, Oregon,
USA.
[15]
Ravishankar Ramamurthy David J. DeWitt
(2005) Buffer Pool Aware Query Optimization,
Department of Computer Sciences, University of
Wisconsin- Madison, USA.
[16]
www.Bitnami.com
[5]
Oracle.com (2014) Query
optimization in DBMS.
[6]
Dr. Sanjay Mishra, Dr. Arun Tiwari 2013- A
Novel Technique for Information Retrieval Based
on Cloud Computing, International Journal of
information technology.
[7]
Nicolas Bruno Microsoft Corp., Sapna Jain IIT
Bombay, Jingren Zhou Microsoft Corp. (2013)
Continuous Cloud- Scale Query Optimization
and Processing, the 39th International
Conference on Very Large Data Bases, August
26th - 30th 2013, Italy
[8]
processing
and
Michael L. Rupley,(2013) Introduction to Query
Processing and Optimization, mrupleyj@iusb.edu

________________________________________________________________________________________________
ISSN (Print): 2319-2526, Volume-3, Issue-4, 2014
28
Download