I. Visualization and SQL Answer multiple choice questions below. Index 1 2 Question Which chart is suitable for this type of data. In Detail: BOP_Bal: Before of Period Balance EOP_Bal: End of Period Balance In_Bal: Money In-flow movement Out_Bal: Money Out-flow movement Region managers want to track their performance over months. What kind of chart that we should use? Multiple choices A. B. C. D. Column Chart Line Chart Waterfall Chart Pie Chart A. B. C. D. Pie chart Line chart Scatter chart Water fall chart 3 What is the normal use of histogram and the normal use of bar chart? A. Histogram is for count of data while bar chart is for categorical data B. Histogram is for categorical data while bar chart is for count of data C. Histogram is for continuous data while bar chart is for categorical data D. Histogram is for categorical data while bar chart is for continuous data E. None is correct Answer Index 4 Question Which method shows hierarchical data in a nested format? 5 Which of the following graph present the relationship between 2 quantitative variables: How to create table 3 from table 1 and table 2 ? 6 7 8 9 What is right result for this query: select dateadd(month, -1, day) from X where day = '20210106' If the SQL query shows an error, where the database system holds error diagnostics in In SQL server, how to find out if duplicates happen or not in column C of table T ? Multiple choices A. B. C. D. Treemaps Scatter plots Population pyramids Area charts A. B. C. D. A. Histogram Bar chart Pie chart Scatter chart Select * from table1 left join table2 on table1.contract_no =table2.contract_no B. Select * from table1 right join table2 on table1.contract_no =table2.contract_no C. Select * from table1 left join table2 on table1.contract_no like '%'||table2.contract_no ||'%' D. Select * from table1 left join table2 on table2.contract_no like '%'||table1.contract_no ||'%' A. B. C. D. A. B. C. D. 202101 20210105 20200106 20201206 Communication area variables Connection area variables SQL area variables Programming area variables A. Select T, count(*) from C group by T having count(*) > 1 Select T, count(distinct T) from C group by T having count(distinct T) > 1 B. Answer Index Question Multiple choices C. D. 10 When you want to execute commit work transaction, its effects cannot be undone by A. B. C. D. Select C, count(*) from T group by C having count(*) > 1 Select C, count(distinct C) from T group by C having count(distinct C) > 1 Trace work Transmit work Rollback work Traceback work Answer II. Analytic and modelling: Please produce python code which can be run on this exact data with reproducible results; standard packages such as pandas, numpy, etc. allowed; The code will show your python coding level and also your data science understanding. The below table contains ratings for restaurants from different testers. You will be given 2 tasks dealing with analysis and prediction related to this data. For all challenges please produce python code which can be run on this exact data with reproducible results. Restaurant Scores out of ten Critic A Asia Grand Critic B Critic C Critic D 4 7 9 6 5 6 10 5 Cathay Restaurant 3 Cherry Garden 3 Crystal Jade 3 Hua Ting 5 5 Imperial Treasure 8 8 Jade Palace 5 Jiang Nan Chun 3 New Majestic 10 5 Critic E Critic F 9 6 7 7 7 5 7 4 5 6 5 3 5 6 4 7 7 4 7 8 Peach Garden 4 6 Summer Palace Wah Lok 8 6 4 7 8 5 3 7 4 2 7 Analytics: 1. Please compute average rating for each restaurant Take into consideration that different testers are more severe than others Explain clearly what your method is and why you chose it 2. Please create 1 slide to summarize your finding from the data Simple model: 1. Fill in the missing values using expectation maximization by following step: Normalize the values for the testers and transform values to z-scores Compute the mean and covariance matrix for the critics using normalized values Fill in the missing values by maximizing log likelihood of the joint distribution 2. Suggest the best restaurant, and provide the rationale 6 4 III. Designing: Keep it very simple, clean [architecture and code], don’t forget about testing, documentation, instructions, focus on a full deliverable and strictly on what matters, nothing more nothing less; you will be evaluated on these. Your solution should be sent in a source code file. Deliverable: Write a REST API with two POST endpoints First POST endpoint receives a JSON in the form of a document with two fields: a pool-id (numeric) and a pool-values (array of values) and is meant to append (if pool already exists) or insert (new pool) the values to the appropriate pool (as per the id): e.g. { "poolId": 123546, "poolValues": [ 1, 7, 2, 6 ] } Second POST is meant to query a pool, the two fields are pool-id (numeric) identifying the queried pool, and a quantile (in percentile form) e.g. { "poolId": 123546, "percentile":99.5 } The response from the append is a status field confirming "appended" or "inserted". The response from the query has two fields: the calculated quantile and the total count of elements in the pool Please do not use a library for the quantile calculation if a pool contains less than 100 values. Focus on high performance if possible (time permitting) and resiliency Reasoning about high-availability and scalability is a nice-to-have No database; no connection to anything needed. Keep it simple. Your preferred language. The programming language does not need to be a systems language (that performs by definition), so no C/C++/Rust needed (unless this is your preference), really up to you (Python, Go, Java, Scala, ...).