International Journal of Advanced Computer Engineering and Communication Technology (IJACECT) ________________________________________________________________________________________________ Query Processing and Optimization in the Distributed Architecture using NoSql: a Banking System 1 Piyush Gupta, 2Chandelkar Kashinath K. Dept. of Computer Science & Engineering, Birla Institute of Technology, Mesra, Jaipur Campus Email: 1piyush.bitjpr@gmail.com, 2Kashinath45@gmail.com Abstract -The banking sector is one of the best examples who create Big Data with increasing transactions per second across the globe. Each bank is administered by the DBA (Database Administrator) as a technical in charge. DBA makes an arrangement in the branch to divert query from authorized user to the appropriate data repository. The SQL core Engine accepts input stream, processes and optimized to increase efficiency across the network. QO (Query Optimiser) and QE (Query Executor) is an important module that helps in the process. Due to the existing limitations of the system, a cloud based distributed architecture comprising of IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and SaaS (Software as a Service) is used with NoSql. Hadoop and MangoDB are an example that still retains ACID (Atomicity, Consistency, Isolation, Durability) features of cloud environment. Keywords- DBA, NoSql, QO, QE, ACID, IaaS, PaaS, SaaS, MangoDB, Hadoop. I. INTRODUCTION the repository, it is filtered and cleaned before delivery in specified format. The data repository is a collection of relational database having tables and their relations that result into problem specified is section 4. II. PROBLEM The banking system carries financial information across the channel having approximately thousand transactions per minutes per branch. Each session has data stream is diverted to main server. The server denies the request if it has insufficient amount of both physical and logical resources like RAM and Buffer Size. The increasing structured and unstructured data lead to a new concept called “Big Data” [3]. This huge chunk of data in network consumes resources and cost if not utilized for decision making. Given system is said to be efficient if it scales with demand [4]. In case of banking, query processing and optimization is a challenge with increasing requests from dynamic nodes for information retrieval on a voice based system. III. PROPOSED SOLUTION Fig-1: Banking System Considering the banking system as shown in above Fig1, each branch has its own servers connected to different users with deployed policies managed by system administrators [1] [10]. Each authorized bank employee communicates to the server and data repository as per the need of a transaction. Every time the query is sent to server in need of information or verification, the data stream has to pass through the server core engine that optimizes and executes the query as per the user requirement [11]. Once the information is collected from Query processing and optimization shall be made efficient by using 3 layers. At layer first all physical and logical components are identified. The data input stream in traditional system is given as input to the SQL core engine [11] as shown in Fig-2. This is one of the identified areas for query processing and optimization that needs to explore to enhance. The DBA (Database Administrator) or an end user who provides query as an input is in layer second. Minor changes in the query input shall save large amount resources in the long run. Instead of using traditional database [5], using instances of NoSql [1] [3] database system in distributed architecture resolves the issues if left any in layer third. ________________________________________________________________________________________________ ISSN (Print): 2319-2526, Volume-3, Issue-4, 2014 25 International Journal of Advanced Computer Engineering and Communication Technology (IJACECT) ________________________________________________________________________________________________ Fig-2: SQL Core Engine Query processing Layer 1 The organization or banking sector that needs time to migrate to new system needs to focus on the SQL core engine for the time being. The core engine consists of a traditional storage engine sandwiched between SQL operating system and LPE (Language Processing Execution). It is supported by QE (Query Optimiser) and QE (Query Executor) in the process. Query optimisers being the responsible module for efficiency, the following components are identified. Rewriter – Applies transformation to the given query and produces equivalent query that is more efficient. Planner – it is a main module of ordering stage. It examines all possible executions for each query produced in the previous stage and selects the overall cheapest one to generate answers of query. The searching strategy is employed. The determined cost is the running time of the optimizer itself, which should be as low as possible. Algebraic space – it determines the execution order. A series of action produces the same query answer which differs in performance. Method Structure- it is used to join available methods like loop, hash join, universal join etc. Cost Model- used to estimate the cost of the execution plan for every distinct step. Formula to estimate the cost is dependent on buffer management, Disk, CPU overlaps, sequential and random I/O and size of the buffer pool. Size distribution Estimator- specifies how the size (frequency distribution and attributes value) of database relations and (sub) query results are estimated. The DBA has to concentrate on minimizing the optimizer running time by increasing the size of the buffer pool. Layer 2 The query optimization is possible by writing efficient queries as shown in table-1. General Tip Specific Column Names instead of * in SELECT Query Try to avoid HAVING Clause in Select statements Write query SELECT col_1, col_2, col_3, FROM table_name; SELECT Col_1, count (Col_1) FROM table_name WHERE col_1! = testvalue1‘AND col_1! = =testvalue1‘ GROUP BY Instead of SELECT FROM table_name; * SELECT Col_1, count (Col_1) FROM table_name GROUP BY Col_1 HAVING Col_1! = ‗testvalue1‘ AND Col_1! = ‗testvalue2‘; ________________________________________________________________________________________________ ISSN (Print): 2319-2526, Volume-3, Issue-4, 2014 26 International Journal of Advanced Computer Engineering and Communication Technology (IJACECT) ________________________________________________________________________________________________ We should try to carefully use conditions in WHERE clause. We should try to use NONColumn expression on one side of the SQL query as it will be processed before any other clause. col_1; SELECT id, col1, col2 FROM table WHERE col2 > 10; SELECT id, Col1, Col2 FROM table WHERE Col2 < 25000; SELECT id, col1, col2 FROM table WHERE col2!= 10; SELECT id, Col1, Col2 FROM Table WHERE Col2 + 10000 < 35000; Cloud computing [1] [2] [4] is an upcoming technology that allows migration of software using virtualization. Cloud support IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and SaaS (Software as a Service). The existing software shall be migrated in cloud environment which is more secure and efficient than traditional systems. The instance of software is a virtual image placed inside the virtual box or VMware that is managed by the hypervisor. Public, private or hybrid cloud supports running of multiple virtual instances as software. Since the service is pay per use, the end user does not have to invest in physical resources. For a banking sector, the available system shall be migrated using NoSql instance like Hadoop, MangoDB or similar software as a service. The following features about the NoSql instance. Table-1: Efficient Query Writing Layer 3 IV. VALIDATION OF PROPOSED SOLUTION Fig 3 : Hadoop Instance running in the cloud Fig-3 shows an instance of Hadoop running in Bitnami cloud [16], which is one of the examples of NoSql. After migrating Hadoop in the cloud, a set of queries was given as an input. The above figure discloses an output for a single query Running multiple Hadoop instances is also possible. Running NoSql not only increa efficiency and security in the banking sector, it becomes more economical supporting ACID (Atomicity, Consistency, Isolation, Durability) factors. Does not use SQL as a query language. Cloud friendly Load balancing is by distributing itself over lots of ordinary chips and Intel based servers. It is not a relational database. Economic Supports key value store, document store, big table and graph dat Uses an array that reduces the need of joining. V. CONCLUSION Query processing and optimization issues for banking sector are dealt with using NoSql, due to the limitations of existing database systems. Query optimiser being the key module in the existing system is elaborated to enhance efficiency. A 3 layered solution having physical and logical resources in first, human query input as second and using instance of NoSql Software in distributed architecture shall definitely overcome existing banking limitations. REFERENCES [1] Rahul Prasad Kanu , Shabeera T P , S D Madhu Kumar (2014) Dynamic Cluster Configuration Algorithm in MapReduce Cloud, International ________________________________________________________________________________________________ ISSN (Print): 2319-2526, Volume-3, Issue-4, 2014 27 International Journal of Advanced Computer Engineering and Communication Technology (IJACECT) ________________________________________________________________________________________________ Journal of Computer Science and Information Technologies, Vol. 5 (3), 2014, 4028-4033. [9] Gautam Vemuganti (2013) Metadata Management in Big Data, Infosys lab Briefing. [2] Mr. Kulkarni, N. N., Dr. Pawar V. P., Dr. K.K Deshmukh (2014) Evaluation of Information Retrieval in Cloud computing based services, Asian Journal of Management Sciences 02 (03 (Special Issue)) [10] N. Samatha, K. Vijay Chandu, P. Raja Sekhar Reddy (2012) Query Optimization Issues for Data Retrieval in Cloud Computing, International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 5. [3] Brian Hellig, Stephen Turner, rich color, long Zheng-2014- beyond map educe: the next generation of big data analytics HAMR.Eti.com. [11] [4] Ismail Hmeidi, Maryan Yatim, Ala’ Ibrahim, Mai Abujazouh, 2014 - Survey of Cloud Computing, Web Services for Healthcare Information Retrieval Systems, International conference on Computing Technology and Information Management, Dubai, UAE. Navita Kumari (2012) SQL Server Query Optimization Techniques - Tips for Writing Efficient and Faster Queries, International Journal of Scientific and Research Publications, Volume 2, Issue 6, June 2012. [12] Yannis E. Ioannidis (2012) Query Optimization, yannis@cs.wisc.edu [13] Pawel Jurczyk and Li Xiong (2009) Dynamic Query Processing for P2P Data Services in the Cloud, Springer-Verlag Berlin Heidelberg. [14] William Hersh -2008 Future perspectives Ubiquitous but unfinished: grand challenges for information retrieval, Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA. [15] Ravishankar Ramamurthy David J. DeWitt (2005) Buffer Pool Aware Query Optimization, Department of Computer Sciences, University of Wisconsin- Madison, USA. [16] www.Bitnami.com [5] Oracle.com (2014) Query optimization in DBMS. [6] Dr. Sanjay Mishra, Dr. Arun Tiwari 2013- A Novel Technique for Information Retrieval Based on Cloud Computing, International Journal of information technology. [7] Nicolas Bruno Microsoft Corp., Sapna Jain IIT Bombay, Jingren Zhou Microsoft Corp. (2013) Continuous Cloud- Scale Query Optimization and Processing, the 39th International Conference on Very Large Data Bases, August 26th - 30th 2013, Italy [8] processing and Michael L. Rupley,(2013) Introduction to Query Processing and Optimization, mrupleyj@iusb.edu ________________________________________________________________________________________________ ISSN (Print): 2319-2526, Volume-3, Issue-4, 2014 28