Conference Session: C7
Paper # 2338
ADVANCEMENTS IN 3-D SENSING TECHNOLOGY IMPLEMENTED BY THE KINECT
Thomas Forsythe (tpf9@pitt.edu), Marvin Green (meg89@pitt.edu)
University of Pittsburgh, Swanson School of Engineering
March 1st, 2012
Abstract— On November 4th, 2010, Microsoft released a brand new toy for its Xbox 360 system called the “Kinect.” To the average consumer, the Kinect was nothing more than a special camera that could detect your motions and reflect them on the television in the form of a game. In reality, the Kinect is a module that houses a color camera, an infrared light camera, an infrared light projector, and an electric motor. What really makes the Kinect flourish, however, is the software written for it, which enables it to capture and interpret the 3-D environment around it. This paper describes and evaluates Microsoft’s innovative Kinect, and explains how the employment of 3-D sensing technology in the Kinect offers innovative new ways to interact with a variety of situations ranging from the hospital to the living room.
The Kinect, besides being used to interact with one’s Xbox 360 console, has been used by surgeons to map bone structures on patients, has allowed doctors to flick through patient images without touching an unsterile screen, and has been used to monitor elderly patients to detect when they are in pain in their house. The purpose of this paper is to expand upon those subjects so the reader may get a grasp of the different hardware and software components of the Kinect, and who will use them.
Key Words— Xbox Kinect, PrimeSense, OpenNI, medical uses, motion sensing technology, Microsoft
INTRODUCTION
The Kinect itself is a structured-light camera with a normal camera attached. A structured-light camera works by emitting infrared light through the infrared projector; then, through mathematical computations that will be expanded upon later, the reflected pattern is interpreted by the infrared camera as a 3-dimensional room. When a subject moves, the infrared camera sees the infrared light move and translates that as motion. While the Kinect is a complex camera, it is the software that enables it to see a room as a 3-dimensional environment the way a human does. Microsoft worked with the company PrimeSense to develop the software for the Kinect.
OpenNI
Primarily, PrimeSense’s “NITE” software is what allows the Kinect to view its environment as 3-dimensional, but PrimeSense’s “OpenNI” software is important too, as it allows developers to write code to improve the Kinect on their own [1]. It is very important that OpenNI allows developers to write their own code for the Kinect, because there have already been innovative uses of it. The Kinect, besides being used to interact with one’s Xbox 360 console, has been used by surgeons to map bone structures on patients, has allowed doctors to flick through patient images without touching an unsterile screen, and is being researched as a way to detect when elderly patients are in pain in their house [2].
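As a taste of what this openness looks like in practice, the sketch below reads a single depth frame through OpenNI. It assumes the community-maintained “primesense” Python bindings for OpenNI 2 and an installed OpenNI runtime; the package and calls shown are that binding’s API as best understood, not something prescribed by this paper.

```python
# A minimal sketch of reading one Kinect depth frame through OpenNI,
# assuming the community "primesense" Python bindings for OpenNI 2
# (pip install primesense) and an installed OpenNI 2 runtime.
from primesense import openni2

openni2.initialize()                    # load the OpenNI 2 runtime
dev = openni2.Device.open_any()         # open the first attached sensor
depth_stream = dev.create_depth_stream()
depth_stream.start()

frame = depth_stream.read_frame()       # grab one depth frame
buf = frame.get_buffer_as_uint16()      # per-pixel depths in millimeters
center = buf[240 * 640 + 320]           # assuming the default 640x480 mode
print("depth at image center: %d mm" % center)

depth_stream.stop()
openni2.unload()
```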
Figure 1 [7]
HARDWARE WITH SERIOUS POWER
Prior to the Kinect’s release, any device with comparable hardware was considered highly specialized equipment. Since the release of the Kinect, this often-costly equipment is now simply an add-on to the Xbox 360. The Xbox Kinect uses a powerful combination of color and infrared (IR) cameras that creates a new approach to 3-D sensing technology. In addition to the cameras, the Kinect employs a stereo pair of microphones to capture the user’s voice with a high level of clarity. Working in conjunction, these few elements form the foundation of the Kinect. From afar this may not seem impressive at all, but each element represents an impressive advancement in household technology.
Building a 3D Environment
Behind the sleek black design of the Kinect lie three separate optical devices: a color camera, an infrared camera, and an infrared light projector. The color camera has a resolution of 640x480 and is finished with an IR lens filter [3].
Figure 2 [3]
Both the color and IR cameras rely on CMOS (complementary metal oxide semiconductor) image sensors to create information that is then passed to the computer chips in the Kinect. CMOS sensors detect and capture light, then convert it into electrical signals [3]. These electrical signals created by the sensors allow the cameras to communicate with the innovative PS1080 chip in the device. The PS1080 is the high-tech chip powering the firmware on the Kinect and will be discussed further below.
The IR camera operates in conjunction with the infrared light projector in a way that was previously found only in very specialized 3D equipment. The IR projector is mounted on the metal frame of the Kinect, closest to the edge of the device [3]. When the Kinect is in use, the projector casts a highly dense array of infrared dots throughout the environment in front of the device. This dot arrangement essentially creates a mesh grid that “constantly changes based on the objects that reflect light” [4]. These dots will “change size and position based on how far the objects are away” [4]. The changes are detected by the IR camera’s CMOS sensor and sent to the PS1080 chip. With this information, the chip “builds a basic shape of the room it sees through the camera” [4], called a depth map, and then begins processing this information. The color camera also passes data to the PS1080 that is processed and used to construct a 3D model of the environment. The cameras on the Kinect are very powerful and pass very valuable information on to be processed by the software.
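The following sketch shows the similar-triangles relationship at the heart of this approach: the farther a dot’s apparent position shifts between where the projector drew it and where the IR camera sees it, the closer the surface. The baseline and focal length below are stand-in values for illustration, not Kinect calibration data.

```python
# Illustrative triangulation behind a structured-light depth map:
# a dot's sideways shift (disparity) between projector and IR camera
# encodes depth. Baseline and focal length are assumed values.
import numpy as np

BASELINE_M = 0.075   # assumed projector-to-camera spacing (meters)
FOCAL_PX = 580.0     # assumed IR camera focal length (pixels)

def depth_from_disparity(disparity_px: np.ndarray) -> np.ndarray:
    """Similar triangles: depth = baseline * focal / disparity."""
    disparity_px = np.where(disparity_px == 0, np.nan, disparity_px)
    return BASELINE_M * FOCAL_PX / disparity_px

# Dots that shifted 20 px lie roughly twice as far as dots at 40 px.
print(depth_from_disparity(np.array([40.0, 20.0, 10.0])))  # ~[1.09 2.18 4.35] m
```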
Audio Capturing
Part of the Xbox Kinect’s novelty comes from allowing the user to interact with the Xbox through voice commands. The Kinect’s ability to capture the sounds of the environment is impressive; it has a “wide-field, conic audio capture” [3]. This accuracy is due to the pair of microphones that lie on the body of the Kinect, capturing a large range of sounds throughout the room. Using two microphones together in stereo is not a new technology, but it is very new “living-room technology.” The microphones on the Kinect accurately distinguish the ambient sounds throughout the entire room [3]. This is important because the Kinect would otherwise only be able to pick up the sounds of the television. In addition, the microphones’ precision allows the device to distinguish between multiple human voices simultaneously. This requires very specialized noise cancellation that is not found in most devices. After capturing voices, the microphones convert this information into electrical signals that are then passed on to the PS1080 chip.
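The sketch below illustrates one classic trick a stereo microphone pair enables: estimating the direction a sound came from using the tiny arrival-time difference between the two channels. The microphone spacing and sample rate are assumed values for illustration, not Kinect specifications.

```python
# Toy direction-of-arrival estimate from a two-microphone array:
# cross-correlation finds the inter-channel delay, which maps to a
# bearing. Geometry and sample rate are assumptions, not Kinect specs.
import numpy as np

FS = 16_000            # sample rate (Hz), assumed
MIC_SPACING = 0.14     # distance between the two microphones (m), assumed
SPEED_OF_SOUND = 343.0

def direction_of_arrival(left: np.ndarray, right: np.ndarray) -> float:
    """Return the bearing (degrees from broadside) of the dominant source."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # inter-mic delay in samples
    delay_s = lag / FS
    sin_theta = np.clip(delay_s * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Simulate a click that reaches the left microphone 3 samples early.
click = np.r_[np.zeros(50), 1.0, np.zeros(50)]
print(direction_of_arrival(np.roll(click, -3), click))
# ~ -27 degrees: the negative sign means the sound hit the left mic first.
```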
At this point, the data from the cameras and microphones is synchronized as it is processed in the PS1080. From a hardware perspective, it may seem like the cameras and microphones do the hard work, but it is the PS1080 chip that actually does the heavy lifting [4]. These devices essentially just capture and transmit data to the chip, where it is then processed by the software components of the Kinect. After transferring the data to the chip, the hardware has completed its job. The PS1080 is still technically hardware in the Kinect, but it certainly deserves to be evaluated on its incredible processing power.
THE SOFTWARE SIDE
The process used by the Kinect to generate a depth map is not only different from, but also more accurate than, most 3D detecting devices. In the past, 3D detection commonly relied on the time-of-flight method: “infrared light (or the equivalent: invisible frequencies of light) were sent out into a 3D space, and then the time and wavelengths of light that returned to the specially-tuned cameras would be able to figure out what the space looked like from that” [3]. The Kinect instead uses infrared light projected by the IR projector to create a 3D model of the environment. This method analyzes the curves and changes in the map, which is much more accurate than the time-based calculations used in other devices. However, this new method requires significantly more computing power and necessitates a powerful processor. Thus the PS1080 was developed, an essential part of the Xbox Kinect.
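A quick numeric contrast hints at why the timing-based approach is so demanding: light covers living-room distances in nanoseconds, so centimeter accuracy requires extraordinarily fine timing hardware. The numbers below are illustrative only.

```python
# Time-of-flight ranging, for contrast with the Kinect's triangulation:
# depth is inferred from the round trip of a light pulse.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_depth(round_trip_seconds: float) -> float:
    """The pulse travels out and back, so halve the round trip."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# A 2 m target returns light in about 13.3 nanoseconds; resolving
# centimeters therefore demands picosecond-scale timing, one reason a
# geometry-based method is attractive for a living-room device.
print(tof_depth(13.3e-9))  # ~1.99 m
```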
THE PS1080 ON-BOARD PROCESSOR
The PS1080 is an on-board processor that sits inside the Kinect [4]. Its algorithms process the data from the CMOS sensors and begin deciphering the image. Although the infrared camera picks up changes in the infrared dot map, it is the on-board chip that actually turns this raw data into usable information. The PS1080 processes this data and is then able to determine changes in the size and location of the projected IR dots. When the chip computes this data, it is able to create a depth map, which can then determine the “location and position of an object with respect to the sensors” [4].
The CMOS sensors on the visible and infrared light cameras sit next to each other, which allows the chip to easily merge a depth image and a color image [4]. To ensure these two separate images are properly stitched together, the chip performs a registration process that joins the color image (RGB) and depth map (D) to produce RGBD information. The PS1080 chip simultaneously and separately handles information from four external digital audio sources, which is then further processed by the “host.” The host refers to the firmware running on the PS1080 that “handles all higher-level object and action recognition” [4]. OpenNI is the host for the Kinect and is responsible for turning advanced 3D detection into an interactive gaming experience.
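The sketch below illustrates the flavor of this registration step under a simple pinhole-camera assumption: each depth pixel is back-projected into 3-D and paired with a color value. The intrinsic parameters are stand-in values, and the real chip also corrects for the physical offset between the two cameras and for lens distortion.

```python
# Simplified RGBD registration: back-project depth pixels into 3-D
# and attach a color sample. Intrinsics are assumed stand-in values.
import numpy as np

FX = FY = 580.0          # assumed focal lengths (pixels)
CX, CY = 320.0, 240.0    # assumed principal point for a 640x480 image

def register_rgbd(depth_mm: np.ndarray, color: np.ndarray) -> np.ndarray:
    """Return an HxWx4 array: X, Y, Z in meters plus a gray value."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm / 1000.0                      # millimeters -> meters
    x = (u - CX) * z / FX                      # pinhole back-projection
    y = (v - CY) * z / FY
    gray = color.mean(axis=2)                  # toy stand-in for RGB
    return np.stack([x, y, z, gray], axis=-1)  # the registered RGBD grid

depth = np.full((480, 640), 2000, dtype=np.float64)  # flat wall at 2 m
rgb = np.random.randint(0, 255, (480, 640, 3)).astype(np.float64)
print(register_rgbd(depth, rgb).shape)  # (480, 640, 4)
```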
Natural Interaction
Thus far, the Kinect has sensed light, converted it to electrical signals, and passed this raw data to the PS1080 chip. After reaching the chip, data from the IR camera is used to determine the location and depth of objects, and is then stitched together to form visible video. At this point, all the information collected by the PS1080 is evaluated by OpenNI. Natural Interaction is comprised of various middleware components that work in conjunction to translate real-life actions into events or actions in a game. With Natural Interaction, the Kinect is able to track hand gestures, interpret body motions, and receive vocal commands. Middleware is a component of the firmware that is used to accurately distinguish “human body parts and joints as well as distinguish individual human faces from one another” [1]. From the RGBD information provided by the cameras, higher-end middleware can track the motion of the body, or even the movement of an object. Natural Interaction can therefore precisely track movements in each part of the body, while “knowing” exactly who is using the device [7].
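As a simplified illustration of what such middleware output enables, the sketch below tests one body-level predicate on a single frame of joint positions. The joint dictionary is a hypothetical stand-in for whatever skeleton data the tracking middleware reports; none of these names come from the OpenNI or NITE APIs.

```python
# Hedged sketch of middleware-level "natural interaction" logic once
# skeleton joints are available. The joint dictionary is hypothetical;
# only the geometry test is shown.
from typing import Dict, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) in meters, y pointing up

def hand_raised(joints: Dict[str, Joint], margin_m: float = 0.10) -> bool:
    """True when either hand is held clearly above the head."""
    head_y = joints["head"][1]
    return (joints["left_hand"][1] > head_y + margin_m or
            joints["right_hand"][1] > head_y + margin_m)

frame = {"head": (0.0, 1.60, 2.0),
         "left_hand": (-0.30, 1.05, 2.0),
         "right_hand": (0.35, 1.78, 2.0)}
print(hand_raised(frame))  # True: the right hand is ~18 cm above the head
```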
NITE
NITE middleware, an open source framework for programmers, is “what allows computers or digital devices to perceive the world in 3D” [7]. NITE is the middleware that powers OpenNI, and it is the foundation for the whole of Natural Interaction. Most importantly, NITE is an open source framework that can be used and modified by any programmer [7]. Any programmer working with 3D technology can now work with, and improve on, the most up-to-date motion tracking middleware. This will certainly encourage people to find new ways to develop and use this technology. For this same reason, many people have already begun modifying the Kinect to perform different tasks or actions based on the user’s “natural interaction” with the device.
KINECT AND THE ELDERLY
With advances in the medical field, the average life expectancy of an American is now around 80 years. This means that there are more and more elderly people in America who need to be supervised in the event they should fall in their house. One way this can be done is by having elderly people live in a nursing home where professionals can watch over them at all times. The problems with this are that abuse can occur in nursing homes without the elderly patient’s family knowing, and that it is a solution requiring several medical professionals to work at the nursing home. Another solution is having elderly people live in their own homes, but wear a device around their neck or wrist that they can press in the case of an emergency. When they press this device, an ambulance is dispatched to their house where they can get help. This is great, but not helpful if the elderly person becomes unconscious as they fall, a stroke renders them paralyzed, or they forget to press the button [Rexit].
Figure 3 [5]
These are exactly the reasons why the Kinect is the next generation in health security for elderly people. The Kinect, when placed in the homes of these elderly people, will allow for constant, automatic monitoring of the patient. When the Kinect system detects that a person is in trouble, it will alert emergency staff, as well as provide the person’s location and details of the emergency [Rexit].
For now, this Kinect system is just a concept that has only begun to be researched. When it was researched, it was only tested with one Kinect camera in one room. Possible future tests involving multiple Kinects in multiple rooms could further prove the Kinect’s worth. The fact that it is more reliable than the necklaces and wristbands, and that it can give medical staff a lighter workload, already proves that its potential is great [Rexit].
PAIN DETECTION
Natural Gestures
When in an emergency situation, like a heart attack or a fall, people generally tend to perform specific gestures. These are called “natural gestures,” and they are the basis for the automated Kinect home monitoring system. A natural gesture would be, for example, a person clutching their heart during a heart attack, or a person stumbling to the ground during a fall. The person does not need to be trained to perform a natural gesture, unlike pressing a button, for a natural gesture is encoded in the human brain from birth. With the Kinect able to recognize the severity and location of pain, emergency services can be automatically dispatched with better information about how to help the patient before they even leave the hospital [Rexit].
The purpose of this section is to explain how exactly the Kinect interprets a video feed of a person into natural gestures, and how it determines the location and severity of a patient’s pain as described above. This is essential if emergency services are to monitor several patients at the same time with only one medical staffer [Dapkus].
The Problem
The process of monitoring several patients is going to involve visualization, which is a way to interpret patient information at the medical staffer’s workstation. Visualization is the process of interpreting the patient’s body in three dimensions as well as determining the severity and location of the patient’s pain. The hard part is presenting this information to the medical staffer in a way that allows him to keep tabs on dozens of patients at once [Dapkus].
The System Explained
The way the system performs the above is simple in concept, but complex in design. The Kinect system works by taking many images of the person every second and checking their position against a database of pre-defined natural gestures. When the Kinect system sees that a person has been in a harmful natural gesture for a pre-determined amount of time, it alerts the authorities with information concerning the severity and location of the pain. Some natural gestures do not take as long as others. For example, if a person falls to the ground and is clutching their chest, the Kinect system will alert the medical team immediately. If a person has been clutching their arm for a long time, the Kinect might instead send a message so the authorities can call the patient to check up on them [Rexit].
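A minimal sketch of that timing rule follows. The gesture labels and hold times are illustrative assumptions, not thresholds from the cited system; as the next paragraphs note, the real values would be tuned by medical professionals.

```python
# Sketch of the alerting rule described above: a harmful pose must
# persist for a gesture-specific hold time before staff are notified.
# Gesture names and hold times are illustrative assumptions.
import time

HOLD_SECONDS = {
    "fall_clutching_chest": 0.0,   # alert immediately
    "clutching_arm": 30.0,         # check in after half a minute
}

class GestureMonitor:
    def __init__(self) -> None:
        self.started: dict[str, float] = {}   # gesture -> first-seen time

    def update(self, detected: set[str]) -> list[str]:
        """Feed the gestures matched in the current frame; return alerts."""
        now, alerts = time.monotonic(), []
        # Keep first-seen times only for gestures still being detected.
        self.started = {g: self.started.get(g, now) for g in detected}
        for gesture, t0 in self.started.items():
            if now - t0 >= HOLD_SECONDS.get(gesture, 10.0):
                alerts.append(f"ALERT: {gesture} held {now - t0:.0f}s")
        return alerts

monitor = GestureMonitor()
print(monitor.update({"fall_clutching_chest"}))  # alerts on the first frame
```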
The process through which the Kinect monitors pain is also very important. Several times a second, the Kinect sends information about the person’s body position, their location, and whether any parts of their body are in pain to an outside monitoring system. This is useful because it allows one person to monitor dozens of patients from a single workstation, which solves the problem of not enough professional staff being available to monitor patients [Rexit].
The Kinect’s different responses to situations, and the amount of time before it performs a response, depend on how medical professionals tweak the Kinect code. This is important because it allows the Kinect to be personalized to a specific patient who may have a unique condition [Rexit].
The Visualization Solution
There are different types of three-dimensional human body models that can be used to display a specific patient. However, the free “.vtk” format has been selected for this system because it supports a wide variety of visualization algorithms, and the fact that it is free means programmers won’t have to waste time getting permission from a company. The Kinect system visualizes pain location and intensity by using spheres. These “pain spheres” are very efficient because they represent different pain levels by changing the size and color of the sphere, making it easy for a medical staffer to notice when a patient is in trouble. The medical staffer is also able to determine the location of the pain, since the Kinect system places the sphere on the corresponding location of the patient’s three-dimensional body representation. The figures below show how the .vtk format visualizes body location tags and pain levels [Dapkus].
Figure 4 [Dapkus]
Figure 5 [Dapkus]
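To make the pain-sphere idea concrete, here is a minimal sketch using VTK’s Python bindings. The size and color mappings are assumptions for illustration; the cited system’s actual scales are not specified in this paper.

```python
# Minimal "pain sphere" sketch with VTK's Python bindings
# (pip install vtk): radius tracks pain intensity, color shifts toward
# red. Mapping constants are assumed, not from the cited system.
import vtk

def pain_sphere(location, intensity):
    """Return a VTK actor: bigger and redder as intensity (0-1) grows."""
    src = vtk.vtkSphereSource()
    src.SetCenter(*location)
    src.SetRadius(0.02 + 0.08 * intensity)     # assumed size scale (m)
    mapper = vtk.vtkPolyDataMapper()
    mapper.SetInputConnection(src.GetOutputPort())
    actor = vtk.vtkActor()
    actor.SetMapper(mapper)
    actor.GetProperty().SetColor(intensity, 1.0 - intensity, 0.0)
    return actor

renderer = vtk.vtkRenderer()
renderer.AddActor(pain_sphere((0.0, 1.2, 0.0), 0.8))  # severe chest pain
renderer.AddActor(pain_sphere((0.3, 0.9, 0.0), 0.2))  # mild arm pain
window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
window.Render()
```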
Patient Interaction
When the Kinect system has determined that a patient is in pain, a unique message is sent over the Internet to the medical staffer’s workstation. The workstation screen then updates with the pain location and severity shown on the patient’s three-dimensional body. The .vtk format is very convenient because it allows the staffer to explore the patient’s three-dimensional body and use the Kinect’s camera to see the patient’s surroundings and determine what may have caused the pain. From there, the medical staffer has the option of communicating with the patient through the Kinect’s built-in microphone and speaker, or calling emergency services with specific information about the patient’s injury [Dapkus].
ETHICS OF THE KINECT
So far this paper has made the Kinect look like a miracle in a box. With every new engineering feat, however, the pros and cons must be carefully weighed before it is released to the public. This section, therefore, explains the ethical concern of invading the privacy of the very patients the system is designed to protect.
The Kinect, as described before, has the potential to be set up in every room of a patient’s house. Since the Kinect has a three-dimensional camera, a regular camera, and a microphone on it, the information it collects must not fall into the wrong hands. First, this means that the medical staffer who monitors the patients should be trusted and should pass a background check before being allowed such a job. A possible alternative is to allow only the three-dimensional model of the person’s body to be accessed by the medical staffer. This would prevent the medical staffer from observing the patient’s house and keep him focused on what is important: the patient. Second, this means that if three-dimensional camera, regular camera, and microphone data is sent to a medical staffer’s workstation, it needs to be securely encrypted as it travels over the Internet. If not, this could give rise to a new social problem where hackers and thieves could peer into a person’s home and personal life. All in all, if these precautions are taken into account, the Kinect has the potential to revolutionize personal healthcare [Rexit].
KINECT, THE FUTURE IN 3D
Kinect was initially released as simply an addition to the Xbox 360. Since then, the engineering community has taken this living-room luxury and turned it into a device that can potentially save lives. The hardware in the Kinect is no breakthrough in technology, but when these components work in conjunction with the OpenNI software, the Kinect becomes a highly innovative device. In the past, using IR light to detect changes in motion was found only in highly specialized equipment. Today, science has reached the point where this specialized equipment can be offered and tailored for any unique medical condition. It is often overlooked how much technology has developed in the last few years, but innovations such as the Kinect serve as a reminder of science’s great success.
REFERENCES
[1] (2011). “OpenNI.” PrimeSense. [Online]. Available: http://www.primesense.com/en/openni
[2] L. Gallo, A. P. Placitelli, and M. Ciampi. (2011, August 30). “Controller-free Exploration of Medical Image Data: Experiencing the Kinect.” Computer-Based Medical Systems, 2011 24th International Symposium. [Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5999138&tag=1
[3] Gil. (2010, November 16). “How does the Kinect really work?” [Online]. Available: http://gilotopia.blogspot.com/2010/11/how-does-kinect-really-work.html
[4] (2011, April 7). “How Microsoft’s PrimeSense-Based Kinect Really Works.” Electronic Design. [Online]. Available: http://web.ebscohost.com/ehost/pdfviewer/pdfviewer?sid=2d4a3ee0-8ebc-48e8-bea8-080e3e2bfcd6%40sessionmgr110&vid=4&hid=110, pp. 28-30
[5] H. Fairhead. (2011, March 30). “Kinect’s AI breakthrough explained.” I Programmer. [Online]. Available: http://www.i-programmer.info/news/105-artificial-intelligence/2176-kinects-ai-breakthrough-explained.html
[6] J. Huang. (2011, October 24). “Kinerehab: A Kinect-based System for Physical Rehabilitation – A Pilot Study for Young Adults with Motor Disabilities.” ACM ASSETS’11. [Online]. Available: http://delivery.acm.org/10.1145/2050000/2049627/p319-huang.pdf?ip=130.49.97.207&acc=ACTIVE%20SERVICE&CFID=63879703&CFTOKEN=75141256&__acm__=1327719365_5a0b35cf291d995e27b5f765e7140a2f
[7] (2012). “Introducing Kinect for Xbox 360.” Xbox 360 + Kinect. [Online]. Available: http://www.xbox.com/en-US/kinect
[8] E. E. Stone and M. Skubic. (2011, May 23). “Evaluation of an inexpensive depth camera for passive in-home fall risk assessment.” PervasiveHealth 2011. [Online]. Available: http://www.engineeringvillage2.org/controller/servlet/Controller?SEARCHID=1eb566613510d760912f31prod3data1&CID=quickSearchAbstractFormat&DOCINDEX=3&database=7&format=quickSearchAbstractFormat
ADDITIONAL RESOURCES
R. Rexit. (2011, December 15). “Visualization of Posture (for Kinect Pain Recognizer).” [Online]. Available: https://docs.google.com/viewer?url=http%3A%2F%2Fwww.cs.pitt.edu%2F~chang%2F231%2Fy11%2Fproj11%2Ffinalruh.pdf
M. Dapkus. (2011, December 15). “Natural Gesture Recognition using the Microsoft Kinect System.” [Online]. Available: https://docs.google.com/viewer?url=http%3A%2F%2Fveryoldwww.cs.pitt.edu%2F~chang%2F231%2Fy11%2Fproj11%2Ffinalmyko.pdf
ACKNOWLEDGEMENTS
We want to thank Bill Neiczpiel for helping us come up with a topic we enjoyed writing about. We really had no idea the Kinect could be used in so many different ways, and we truly appreciate his help along the way.