Hadoop Hands On: Teaching MapReduce to Business Students through Analogy
BDA EdCon 2015
Colin Conrad (1), Hossam Ali-Hassan (2), and Michael Bliemel (2)
(1) Dalhousie University, Faculty of Computer Science/Faculty of Management, Halifax, Canada
(2) Dalhousie University, Faculty of Management, Rowe School of Business, Halifax, Canada
Context: courses teaching BI&A
• Undergraduate course (commerce and management) – COMM4512: Business Intelligence
• Graduate course (MBA and MEC) – BUSI 6513: Business Analytics
• Non-technical business students
• Focus on end-user analytics
• Conceptual (e.g. Big Data) and practical (e.g. IBM Cognos Insight, SAP Predictive Analysis…)
• Need to simplify complex or very technical concepts
BDA EdCon– Puerto Rico, August 12, 2015
Introduction
• MapReduce paradigm
  – Processing very large datasets (“big data”)
  – Uses computer clusters
  – Introduced by Google (early 2000s)
• Apache Hadoop
  – Popular open source implementation of MapReduce
  – “The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing” (https://hadoop.apache.org/)
  – Increasing popularity and adoption
• Need to teach Hadoop/MapReduce to a broad audience
Problem: complexity…and attention span!
“When a user calls the MapReduce function, the user program triggers a multi-step process invoking the nodes of the cluster. The program begins by splitting the input files into manageable sizes, which are then assigned to various “worker” machines by a special “master” node. The master then assigns map and reduce tasks to the workers. Workers assigned with map tasks proceed to identify data. As the map workers make progress, the master notifies reduce workers of the location and nature of the processed data. The reduce workers iterate over the sorted data and eventually pass the results of the reduce function to the master node, completing the MapReduce call.” (Conrad et al., 2015)
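The steps in this description can be sketched in a few lines of Python. This is a minimal single-machine illustration, not Hadoop's API: the names `split_input`, `run_mapreduce`, `word_map`, and `word_reduce` are hypothetical, the "workers" are simulated by list chunks, and word counting is used only because it is the customary first example.

```python
from collections import defaultdict


def split_input(records, n_workers):
    """Master step: split the input into one chunk per worker (round-robin)."""
    return [records[i::n_workers] for i in range(n_workers)]


def run_mapreduce(records, map_fn, reduce_fn, n_workers=3):
    """Simulate one MapReduce call in memory (hypothetical helper, not Hadoop)."""
    chunks = split_input(records, n_workers)

    # Map phase: each simulated worker turns its records into (key, value) pairs.
    mapped = [[pair for rec in chunk for pair in map_fn(rec)] for chunk in chunks]

    # Shuffle: group all values by key so each key ends up in one place.
    grouped = defaultdict(list)
    for worker_output in mapped:
        for key, value in worker_output:
            grouped[key].append(value)

    # Reduce phase: iterate over the sorted keys and combine their values.
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


def word_map(line):
    """Emit (word, 1) for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def word_reduce(word, counts):
    """Sum the counts collected for one word."""
    return sum(counts)


lines = ["big data big clusters", "data moves between nodes", "big nodes"]
word_counts = run_mapreduce(lines, word_map, word_reduce)
print(word_counts)
```

Even in this toy form, the description and the code ask the audience to hold several abstractions at once, which is the difficulty the slide points to.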
Problem: still complex…
Source: http://opensource.com/life/14/8/intro-apache-hadoop-big-data
Solution
• Analogy
  – Used to describe a complex or abstract subject by drawing on students’ prior knowledge of a different subject matter
  – Significant role in the teaching and learning of science (Treagust & Duit, 2015)
• Engagement
  – Absorption: heightened attention (Tellegen & Atkinson, 1974)
  – Flow: losing track of time (Csikszentmihalyi, 2014)
• Games and simulations: cognitive absorption and active learning (Agarwal & Karahanna, 2000)
Using playing cards to explain MapReduce
• Students are computers or nodes
• Groups of students/nodes make up clusters
• Cards are the data
• Stickers to assign “Task Tracker” and “Job Tracker” roles
• Multiple decks of cards depending on class size and exercise (we used 6 in class)
Data Meaning
The multiple decks of cards represent raw data, only some of which is useful.
For example, the cards could represent product reviews on webpages, where the number is the star rating, the suit is the product, and the other cards (A, J, Q, K, Jokers) are text on the page that is not useful.
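One way to make this mapping concrete is to model a card as a small record and translate it into a review. The sketch below is an illustration under stated assumptions: the `Card` class, `build_deck`, and `as_review` are hypothetical names, number cards are taken to be 2 through 10, and two jokers per deck is a guess (the slides only say jokers are present). Six decks matches the number used in class.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import random

SUITS = ["Hearts", "Spades", "Clubs", "Diamonds"]   # the four "products"
NUMBER_RANKS = [str(n) for n in range(2, 11)]       # assumed star ratings 2..10
FACE_RANKS = ["A", "J", "Q", "K"]                   # treated as noise, per the slide


@dataclass(frozen=True)
class Card:
    rank: str
    suit: Optional[str]  # None for jokers


def build_deck(jokers: int = 2) -> list:
    """One deck of raw data; the joker count is an assumption."""
    deck = [Card(rank, suit) for suit in SUITS for rank in NUMBER_RANKS + FACE_RANKS]
    deck += [Card("Joker", None)] * jokers
    return deck


def as_review(card: Card) -> Optional[Tuple[str, int]]:
    """Return (product, stars) for a number card, or None for noise."""
    if card.rank in NUMBER_RANKS:
        return (card.suit, int(card.rank))
    return None


raw_data = build_deck() * 6              # six decks, as used in class
random.Random(0).shuffle(raw_data)       # fixed seed so the run is repeatable
reviews = [r for r in map(as_review, raw_data) if r is not None]
print(len(raw_data), "cards,", len(reviews), "usable reviews")
```

Under these assumptions a third of the raw data is noise, which gives students a feel for why filtering is part of the job.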
Exercise 1: Which product has the best reviews?
Randomly remove 10-20 cards from the pile
Which product has the best reviews? Hearts, Spades, Clubs or Diamonds?
You need to sum the points of all the cards of each suit
The Hadoop Distributed File System
One student is the “Job Tracker”
Each student at the end of a row is a “Task Tracker”
The rest of the class are “Worker Nodes” that will do the data processing
Job Tracker
The Job Tracker will fairly distribute the work (cards) to each Task Tracker (student at the end of the row). Each row in class represents a cluster of nodes.
The Task Tracker then distributes the data to the Worker Nodes in their row so that the workload is evenly balanced (the Task Tracker can also do work in this instance, but some HDFS implementations have trackers only tracking).
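The two-level hand-out can be sketched as dealing cards round-robin, first across rows and then within each row. This is an illustration of the classroom procedure, not of how Hadoop schedules work; `deal` and `distribute` are hypothetical names, the cards are stand-in integers, and the row sizes are made up.

```python
def deal(items, n_hands):
    """Deal items round-robin so hand sizes differ by at most one."""
    return [items[i::n_hands] for i in range(n_hands)]


def distribute(cards, workers_per_row):
    """Job Tracker deals to rows; each Task Tracker deals within its row.

    workers_per_row lists how many Worker Nodes sit in each row
    (hypothetical classroom layout).
    """
    rows = deal(cards, len(workers_per_row))  # Job Tracker -> Task Trackers
    return [deal(row_cards, n) for row_cards, n in zip(rows, workers_per_row)]


cards = list(range(100))  # stand-in for 100 shuffled cards
assignment = distribute(cards, workers_per_row=[4, 4, 3])
for row_number, row in enumerate(assignment):
    print("row", row_number, "hand sizes:", [len(hand) for hand in row])
```

With rows of unequal size, the workers in the shorter row receive larger hands even though each row got a fair share, which makes a useful discussion point about load balancing.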
Map Process
Each Worker Node now maps the data by Product and Review
– Sort the cards into piles by suit and discard the non-number cards
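What a single Worker Node does in this step can be sketched as a small function. The representation is an assumption made for illustration: a card is a `(rank, suit)` tuple, number cards are 2 through 10, and `map_hand` is a hypothetical name.

```python
from collections import defaultdict

NUMBER_RANKS = {str(n) for n in range(2, 11)}  # assumed star ratings 2..10


def map_hand(hand):
    """Sort one worker's hand into piles by suit, discarding non-number cards."""
    piles = defaultdict(list)
    for rank, suit in hand:
        if rank in NUMBER_RANKS:           # A, J, Q, K and Jokers are dropped
            piles[suit].append(int(rank))
    return dict(piles)


hand = [
    ("7", "Hearts"), ("K", "Hearts"), ("3", "Spades"),
    ("Joker", None), ("10", "Hearts"), ("A", "Clubs"),
]
piles = map_hand(hand)
print(piles)  # {'Hearts': [7, 10], 'Spades': [3]}
```

Each worker runs the same function on its own hand without talking to anyone else, which is what lets the map step run in parallel.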
[Diagram: Map Process, showing the Job Tracker, Task Trackers, and Worker Nodes]
Reduce Process
Now the Task Tracker moves cards between nodes to recombine them by suit, so that each Worker Node holds one or two suits and all the cards of those suits
Worker Nodes now sum up the total points and report them to the Task Tracker
The Job Tracker asks each Task Tracker for their totals and then determines which product (suit) ranks highest
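The recombine-then-sum sequence can be sketched as follows. The input is assumed to be one dictionary of suit piles per Worker Node (the values here are made up), and `shuffle`, `reduce_totals`, and `best_product` are hypothetical names for the three actions on the slide.

```python
from collections import defaultdict


def shuffle(worker_piles):
    """Task Tracker step: gather every pile of the same suit in one place."""
    by_suit = defaultdict(list)
    for piles in worker_piles:
        for suit, values in piles.items():
            by_suit[suit].extend(values)
    return dict(by_suit)


def reduce_totals(by_suit):
    """Worker Node step: sum the points for each suit."""
    return {suit: sum(values) for suit, values in by_suit.items()}


def best_product(totals):
    """Job Tracker step: pick the suit with the highest total."""
    return max(totals, key=totals.get)


# Made-up map output from three Worker Nodes.
worker_piles = [
    {"Hearts": [7, 10], "Spades": [3]},
    {"Hearts": [2], "Clubs": [9, 9], "Diamonds": [5]},
    {"Spades": [8, 6], "Diamonds": [4]},
]
totals = reduce_totals(shuffle(worker_piles))
winner = best_product(totals)
print(totals, "->", winner)
```

Note that summing points favours products with more reviews; an average per suit would be a natural variation to raise in the debrief.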
[Diagram: Reduce Process, with point totals passing from Worker Nodes to Task Trackers to the Job Tracker]
Exercise 2 (Timed Challenge): What is the missing card?
Reshuffle all the cards
Pick one card and do not reveal it to the class
Give the data (cards) to the Job Tracker
The Hadoop Cluster now has the job to find the missing card
The Job Tracker delegates the work to Task Trackers
Job Tracker balances the work between Task Tracker nodes
Job is clear and specific – i.e. sort these by suit, then by number
Data can move between Task Tracker nodes
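A computational counterpart to this exercise is sketched below. It swaps the sort used in class for counting: each worker counts the cards in its hand, the counts are merged, and the card seen one time too few is the missing one. The assumptions are that six complete 52-card decks (no jokers) are in play and that five workers share the cards; `full_deck`, `count_hand`, and `find_missing` are hypothetical names.

```python
from collections import Counter
import random

SUITS = ["Hearts", "Spades", "Clubs", "Diamonds"]
RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]


def full_deck():
    """All 52 (rank, suit) cards; jokers are left out in this sketch."""
    return [(rank, suit) for suit in SUITS for rank in RANKS]


def count_hand(hand):
    """Map step: one worker counts how often each card appears in its hand."""
    return Counter(hand)


def find_missing(hands, n_decks):
    """Reduce step: merge the counts and list cards seen fewer than n_decks times."""
    total = Counter()
    for hand in hands:
        total.update(count_hand(hand))
    return [card for card in full_deck() if total[card] < n_decks]


n_decks = 6
cards = full_deck() * n_decks
random.Random(42).shuffle(cards)            # fixed seed so the run is repeatable
hidden = cards.pop()                        # the card kept from the class
hands = [cards[i::5] for i in range(5)]     # five workers share the rest
missing = find_missing(hands, n_decks)
print("missing:", missing, "hidden:", hidden)
```

Sorting by suit and then by number, as the students do, reaches the same answer; counting simply avoids moving cards between nodes.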
Conclusion
• Learning Outcomes
  – MapReduce and HDFS
  – Distributed computing
  – Open source software and adaptability
  – Under-performing nodes and reassignment of tasks
• Importance of debrief at the end of any game or simulation
• Student feedback
  – “engaging”, “exciting”, “challenging”, “interactive”, “immersive” and “memorable”
• Pedagogical value of analogy and games
Thank You