Uploaded by Michael An

Technical test TFG Data

I. Visualization and SQL:
Answer the multiple choice questions below.
1. Which chart is suitable for this type of data?
   In detail:
   BOP_Bal: Beginning of Period Balance
   EOP_Bal: End of Period Balance
   In_Bal: Money in-flow movement
   Out_Bal: Money out-flow movement
   A. Column chart
   B. Line chart
   C. Waterfall chart
   D. Pie chart
   Answer:

2. Region managers want to track their performance over the months. Which kind of chart should we use?
   A. Pie chart
   B. Line chart
   C. Scatter chart
   D. Waterfall chart
   Answer:

3. What is the normal use of a histogram and the normal use of a bar chart?
   A. A histogram is for counts of data, while a bar chart is for categorical data
   B. A histogram is for categorical data, while a bar chart is for counts of data
   C. A histogram is for continuous data, while a bar chart is for categorical data
   D. A histogram is for categorical data, while a bar chart is for continuous data
   E. None is correct
   Answer:
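To make the distinction in question 3 concrete, here is a minimal sketch of the two counting operations that usually sit behind each chart type: binning continuous values (histogram) and counting occurrences per category (bar chart). The function name `histogram_counts`, the bin edges, and both data lists are hypothetical and chosen only for illustration.

```python
from collections import Counter


def histogram_counts(values, bin_edges):
    """Count continuous values per bin [edge_i, edge_{i+1}).

    The last bin also includes its right edge, so the maximum is not dropped.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break
    return counts


# Hypothetical continuous measurements: a histogram groups them into ranges.
balances = [1.2, 2.5, 2.7, 3.1, 4.8, 4.9, 5.0]
hist = histogram_counts(balances, [0, 2, 4, 6])

# Hypothetical categorical labels: a bar chart shows one bar per category.
regions = ["North", "South", "North", "East", "North"]
bars = Counter(regions)
```

The bins have a natural order and width because the underlying variable is numeric, whereas the categories have no intrinsic spacing.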

4. Which method shows hierarchical data in a nested format?
   A. Treemaps
   B. Scatter plots
   C. Population pyramids
   D. Area charts
   Answer:

5. Which of the following graphs presents the relationship between 2 quantitative variables?
   A. Histogram
   B. Bar chart
   C. Pie chart
   D. Scatter chart
   Answer:

6. How do you create table 3 from table 1 and table 2?
   A. Select * from table1 left join table2 on table1.contract_no = table2.contract_no
   B. Select * from table1 right join table2 on table1.contract_no = table2.contract_no
   C. Select * from table1 left join table2 on table1.contract_no like '%'||table2.contract_no||'%'
   D. Select * from table1 left join table2 on table2.contract_no like '%'||table1.contract_no||'%'
   Answer:

7. What is the right result for this query:
   select dateadd(month, -1, day) from X where day = '20210106'
   A. 202101
   B. 20210105
   C. 20200106
   D. 20201206
   Answer:

8. If the SQL query shows an error, where does the database system hold the error diagnostics?
   A. Communication area variables
   B. Connection area variables
   C. SQL area variables
   D. Programming area variables
   Answer:

9. In SQL Server, how do you find out whether duplicates occur in column C of table T?
   A. Select T, count(*) from C
      group by T
      having count(*) > 1
   B. Select T, count(distinct T) from C
      group by T
      having count(distinct T) > 1
   C. Select C, count(*) from T
      group by C
      having count(*) > 1
   D. Select C, count(distinct C) from T
      group by C
      having count(distinct C) > 1
   Answer:

10. When you execute a commit work transaction, its effects cannot be undone by
   A. Trace work
   B. Transmit work
   C. Rollback work
   D. Traceback work
   Answer:
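
As an illustration of the duplicate check asked about in question 9, the sketch below runs a GROUP BY / HAVING query against an in-memory SQLite database. SQLite stands in for SQL Server here only so the example is self-contained; the helper name `find_duplicates` and the sample rows are hypothetical.

```python
import sqlite3


def find_duplicates(conn, table, column):
    """Return (value, occurrences) for every value that appears more than once.

    Identifiers are interpolated directly, so they are assumed to be trusted.
    """
    query = (
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1 ORDER BY {column}"
    )
    return conn.execute(query).fetchall()


# Hypothetical table T with a single column C containing repeated values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (C TEXT)")
conn.executemany(
    "INSERT INTO T (C) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)
duplicates = find_duplicates(conn, "T", "C")
```

Note that counting distinct values of the grouping column inside each group can never exceed 1, so a `having count(distinct ...) > 1` filter on that same column returns no rows.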
II. Analytic and modelling:
Please produce Python code which can be run on this exact data with reproducible results; standard packages such as pandas, numpy, etc. are allowed.
The code will show your Python coding level and also your data science understanding.
The table below contains ratings for restaurants from different testers.
You will be given 2 tasks dealing with analysis and prediction related to this data.
Restaurant
Scores out of ten
Critic A
Asia Grand
Critic B
Critic C
Critic D
4
7
9
6
5
6
10
5
Cathay Restaurant
3
Cherry Garden
3
Crystal Jade
3
Hua Ting
5
5
Imperial Treasure
8
8
Jade Palace
5
Jiang Nan Chun
3
New Majestic
10
5
Critic E
Critic F
9
6
7
7
7
5
7
4
5
6
5
3
5
6
4
7
7
4
7
8
Peach Garden
4
6
Summer Palace
Wah Lok
8
6
4
7
8
5
3
7
4
2
7
Analytics:
1. Please compute the average rating for each restaurant.
   - Take into consideration that some testers are more severe than others.
   - Explain clearly what your method is and why you chose it.
2. Please create 1 slide to summarize your findings from the data.
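
One possible way to account for critic severity in task 1 is to standardize each critic's scores before averaging. The sketch below is only an assumption about a reasonable method, not the expected answer: it converts every score to a z-score within its critic and then averages the available z-scores per restaurant. The ratings shown are hypothetical toy values, not the table from this test, and the function names are illustrative.

```python
from statistics import mean, pstdev

# Hypothetical toy ratings (NOT the test data); None marks a missing score.
ratings = {
    "Restaurant X": {"Critic A": 4, "Critic B": 8, "Critic C": None},
    "Restaurant Y": {"Critic A": 6, "Critic B": 10, "Critic C": 5},
    "Restaurant Z": {"Critic A": 5, "Critic B": None, "Critic C": 7},
}


def critic_stats(ratings):
    """Mean and population standard deviation of each critic's observed scores."""
    by_critic = {}
    for scores in ratings.values():
        for critic, score in scores.items():
            if score is not None:
                by_critic.setdefault(critic, []).append(score)
    return {c: (mean(v), pstdev(v)) for c, v in by_critic.items()}


def adjusted_averages(ratings):
    """Average of per-critic z-scores for each restaurant, skipping missing scores."""
    stats = critic_stats(ratings)
    result = {}
    for restaurant, scores in ratings.items():
        z_scores = []
        for critic, score in scores.items():
            if score is None:
                continue
            mu, sigma = stats[critic]
            # A critic who gives identical scores carries no ranking information.
            z_scores.append((score - mu) / sigma if sigma > 0 else 0.0)
        result[restaurant] = mean(z_scores) if z_scores else None
    return result


adjusted = adjusted_averages(ratings)
best = max(adjusted, key=adjusted.get)
```

In this toy data the raw average favours Restaurant Y, but after removing each critic's personal offset and spread Restaurant Z comes out ahead, which is the kind of effect the severity adjustment is meant to expose.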
Simple model:
1. Fill in the missing values using expectation maximization with the following steps:
   - Normalize the values for the testers and transform the values to z-scores.
   - Compute the mean and covariance matrix for the critics using the normalized values.
   - Fill in the missing values by maximizing the log likelihood of the joint distribution.
2. Suggest the best restaurant, and provide the rationale.
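
The three steps above can be sketched as follows, assuming the critics' z-scores are modelled as jointly Gaussian. This is a minimal illustration rather than a reference solution: the function name `em_impute`, the small ridge term added to the covariance for numerical stability, the iteration limits, and the toy matrix are all assumptions. Each missing entry is replaced by its conditional mean given the observed entries in the same row, which is the value that maximizes the conditional Gaussian density under the current parameters.

```python
import numpy as np


def em_impute(X, n_iter=100, tol=1e-6, ridge=1e-6):
    """Fill NaNs in X (rows = restaurants, columns = critics) with EM."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, p = X.shape

    # Step 1: z-score each critic using only that critic's observed scores.
    col_mean = np.nanmean(X, axis=0)
    col_std = np.nanstd(X, axis=0)
    col_std[col_std == 0] = 1.0
    Z = np.where(missing, 0.0, (X - col_mean) / col_std)  # initial fill = mean

    # Step 2: initial mean and covariance (ridge is a stability assumption).
    mu = Z.mean(axis=0)
    sigma = np.cov(Z, rowvar=False, bias=True) + ridge * np.eye(p)

    for _ in range(n_iter):
        previous = Z.copy()
        cond_cov_sum = np.zeros((p, p))

        # E-step: conditional mean (and covariance) of the missing entries.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                Z[i] = mu
                cond_cov_sum += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            w = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            Z[i, m] = mu[m] + w @ (Z[i, o] - mu[o])
            cond_cov_sum[np.ix_(m, m)] += sigma[np.ix_(m, m)] - w @ s_mo.T

        # M-step: re-estimate the mean and covariance from the completed data.
        mu = Z.mean(axis=0)
        centered = Z - mu
        sigma = (centered.T @ centered + cond_cov_sum) / n + ridge * np.eye(p)

        if np.max(np.abs(Z - previous)) < tol:
            break

    # Map z-scores back to the original rating scale.
    return Z * col_std + col_mean


# Hypothetical toy matrix: the second column is twice the first, one value missing.
toy = [[1, 2], [2, 4], [3, 6], [4, np.nan]]
filled = em_impute(toy)
```

Observed entries are left untouched; only the NaN positions change between iterations.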
6
4
III. Designing:
Keep it very simple, clean [architecture and code], don’t forget about testing, documentation,
instructions, focus on a full deliverable and strictly on what matters, nothing more nothing less; you
will be evaluated on these.
Your solution should be sent in a source code file.
Deliverable:
- Write a REST API with two POST endpoints.
- The first POST endpoint receives a JSON document with two fields, a pool-id (numeric) and pool-values (an array of values). It appends the values to the appropriate pool (as per the id) if the pool already exists, or inserts a new pool otherwise:
e.g.
{
  "poolId": 123546,
  "poolValues": [1, 7, 2, 6]
}
- The second POST endpoint queries a pool. Its two fields are the pool-id (numeric) identifying the queried pool and a quantile (in percentile form):
e.g.
{
  "poolId": 123546,
  "percentile": 99.5
}
- The response from the append is a status field confirming "appended" or "inserted".
- The response from the query has two fields: the calculated quantile and the total count of elements in the pool.
- Please do not use a library for the quantile calculation if a pool contains fewer than 100 values.
- Focus on high performance if possible (time permitting) and resiliency.
- Reasoning about high availability and scalability is a nice-to-have.
- No database; no connection to anything needed. Keep it simple.
- Use your preferred language. It does not need to be a systems language (which performs well by definition), so no C/C++/Rust is needed unless that is your preference; it is really up to you (Python, Go, Java, Scala, ...).
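
The core logic behind the two endpoints can be sketched without any web framework, as below. This is an illustrative outline under stated assumptions, not a complete deliverable: the class name `PoolStore`, the response keys `quantile` and `count`, and the choice of linear interpolation between the two closest ranks are all assumptions, since the task does not fix a quantile definition or exact response field names.

```python
import threading


def quantile(sorted_values, percentile):
    """Percentile of an already sorted list, using linear interpolation.

    Assumption: the rank is percentile / 100 * (n - 1), interpolated between
    the two neighbouring elements. No library is used for the calculation.
    """
    if not sorted_values:
        raise ValueError("pool is empty")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    rank = (percentile / 100) * (len(sorted_values) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


class PoolStore:
    """In-memory pools keyed by id; a lock keeps concurrent requests consistent."""

    def __init__(self):
        self._pools = {}
        self._lock = threading.Lock()

    def append(self, pool_id, values):
        """Insert a new pool or append to an existing one."""
        with self._lock:
            if pool_id in self._pools:
                self._pools[pool_id].extend(values)
                return {"status": "appended"}
            self._pools[pool_id] = list(values)
            return {"status": "inserted"}

    def query(self, pool_id, percentile):
        """Return the quantile and element count; raises KeyError if unknown."""
        with self._lock:
            values = sorted(self._pools[pool_id])
        return {"quantile": quantile(values, percentile), "count": len(values)}


store = PoolStore()
first = store.append(123546, [1, 7, 2, 6])
second = store.append(123546, [3])
median = store.query(123546, 50)
```

A REST layer would only need to parse the JSON body, call `append` or `query`, and map `KeyError` and `ValueError` to suitable HTTP error responses. Sorting on every query keeps the sketch short; keeping each pool sorted on insert is one option if queries dominate.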