San Jose State University Creates ‘Data Wranglers’ in Partnership with Cloudera

Overview
San Jose State University (SJSU) is the oldest public institution of higher education
on the United States’ West Coast. The school was founded in 1857 to train teachers for
the “developing frontier.”1 A lot has changed since then, but SJSU’s present-day tagline,
“powering Silicon Valley,” demonstrates a consistent goal: to arm students with the
knowledge and experience that will help them thrive in today’s science- and technology-driven market.
One of SJSU’s adjunct professors, Peter Zadrozny, educates students on big data analytics. Zadrozny brings a wealth of real-world experience to the courses he teaches, with a
background as a software architect and developer at companies ranging from start-ups
to the Fortune 500. His goal is to offer students hands-on experience with big data
technologies that hiring managers are looking for.
The Challenge
In describing his motivation to put together a practical big data course, Zadrozny
explained, “I wanted to really teach students what big data is all about, and to help them
get beyond the buzzword. Hiring managers want people that have gone through the
steepest part of the learning curve. That’s the objective of what we’re trying to do here.”
His goal is to create “data wranglers” – people who may not have deep domain expertise,
but can apply technology and big data experience to a wide range of industry-specific
challenges. According to Zadrozny, two main forces have driven today’s
need for big data:
• The human digital footprint: Society’s movement of activities, interactions, and
transactions from in-person settings to the web has left a trail of digital
breadcrumbs that can be ingested and analyzed to better understand our
needs, preferences, and behaviors. “We have Facebook, we email, we tweet, we check
in with Foursquare – that is what I call the human digital footprint,” said Zadrozny.
• Machine data: The proliferation of mobile and digitally enabled devices also generates
big data that can be ingested and acted upon to improve manufacturing, supply
chain optimization, and other operational processes. Zadrozny explained, “All the
servers, data centers, cloud services – those machines are producing a ton of log files
that have to be processed.”
CUSTOMER SUCCESS STORY
Key Highlights
Industry
• Education
Location
• San Jose, CA, USA
Objectives
• Combine lecture-based education with practical experience
• Maximize marketability of students
Technologies In Use
• Hadoop Platform: Cloudera University License
• Hadoop Components: Cloudera Manager, Hive
• Server: GoGrid
• Analytic Tool: Splunk
The Solution
In developing the curriculum for SJSU’s Big Data Analytics course, Zadrozny decided
the logical approach would be to teach Apache Hadoop and Splunk. Within the Hadoop
curriculum, students learn Hive, which lets them apply existing analytical skills, including SQL,
to the big data sets at the core of the emerging data economy.
A key part of the course is having students deliver a big data project that demonstrates
they know how to work with the tools. In addition to partnering with Cloudera and Splunk,
SJSU has established a partnership with GoGrid to give students a cloud-based environment on which to build their big data projects.
Zadrozny led SJSU’s participation in the Cloudera Academic Partnership (CAP) program to
streamline and accelerate the Hadoop curriculum development. He noted, “When people
think Hadoop, they think Cloudera. I have to give students something that makes them
marketable. If I don’t teach Hadoop on Cloudera, their chances of getting a job are slimmer.”
As part of the CAP program, Cloudera provides SJSU with:
• Course materials along with exercises that allow students to correlate theory with
hands-on experience
• Discounts on Cloudera University training and certification exams
• A 12-month University License to Cloudera Enterprise for the university’s research
staff
• Unlimited access to Cloudera Express or the Cloudera Quickstart VM for both
professors and students
For their projects, students gain experience working with live data from sources such as
the Federal Aviation Administration, Foursquare, IMDb, Twitter, and Yelp. They learn how
to set up a Hadoop cluster, load data, query it using Hive, verify that their queries are
running properly, and then visualize and communicate the results of their analyses.
“We encourage students to tell a story with the data,” explained Zadrozny. “As you start
digging into it, you find interesting things, unusual facts, things that you wouldn’t have
anticipated or that are historically relevant.”
Impact: Improved Marketability Through Practical Education
The Big Data Analytics course at SJSU is very popular, largely due to its integration of
hands-on exercises. “The support of Cloudera with the Cloudera Academic Partnership
has been incredible,” commented Zadrozny. “It provides slides and courseware that we
can use for teaching the theory, and also allows students to get the experience they
need. It’s not only theoretical; it is also very practical.”
“Whenever I go to job fairs, if I have something on my resume about Hive, Hadoop, or big
data, that’s what hiring managers ask about,” said Tanuvir Singh, a student of the course
pursuing his master’s degree in computer science.
Another student, Jaideep Katkar, reflected, “Before taking this course, the only thing I
knew about big data was the three Vs. After the course, I know all about big data – how
Hadoop handles it, how to use Hive, all of its features. Big data is a hot topic right now
and it will be very useful in my job.”
The students particularly value having access to Cloudera Manager, which simplifies
cluster configuration and administration. “If we had done it manually, it would have taken
a very long time. But with Cloudera Manager, it was very quick – it didn’t take more than
five minutes to set up our cluster,” said graduate student Nikitha Ganesh.
Rohit Vobbilisetty, another Big Data Analytics student, enthusiastically summarized,
“Hadoop and Hive have great importance in my resume. It’s great that I learned this.”
About Cloudera
Cloudera is revolutionizing enterprise data management by offering the first unified
Platform for Big Data, an enterprise data hub built on Apache Hadoop. Cloudera offers
enterprises one place to store, process and analyze all their data, empowering them to
extend the value of existing investments while enabling fundamentally new ways to derive
value from their data. Only Cloudera offers everything needed on a journey to an enterprise data hub, including software for business-critical data challenges such as storage,
access, management, analysis, security and search. As the leading educator of Hadoop
professionals, Cloudera has trained over 40,000 individuals worldwide. Over 1,400
partners and a seasoned professional services team help deliver faster time to value.
Finally, only Cloudera provides proactive and predictive support to run an enterprise
data hub with confidence. Leading organizations in every industry plus top public sector
organizations globally run Cloudera in production. www.cloudera.com.
1-888-789-1488 or 1-650-362-0488
Cloudera, Inc. 1001 Page Mill Road, Palo Alto, CA 94304, USA
© 2015 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA
and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.
cloudera-casestudy-sjsu-102