Resolving Automated Perception System Failures in Bin-Picking Tasks
Using Assistance from Remote Human Operators
Krishnanand N. Kaipa, Srudeep Somnaath Thevendria-Karthic, Shaurya Shriyam, Ariyan M. Kabir,
Joshua D. Langsfeld, and Satyandra K. Gupta
Maryland Robotics Center
University of Maryland, College Park, MD 20742
Email: skgupta@umd.edu
Abstract— We present an approach to resolve automated perception failures during bin-picking operations in hybrid assembly cells. Our model exploits the complementary strengths of humans and robots. Whereas the robot performs bin-picking and proceeds to the subsequent operation, such as kitting or assembly, a remotely located human assists the robot in critical situations by resolving any automated perception problems encountered during bin-picking. We present the design details of our overall system, comprising an automated part recognition system and a remote user interface that allows effective information exchange between the human and the robot, geared toward solutions that minimize the human operator's time in resolving detected perception failures. We use illustrative real-robot experiments to show that human-robot information exchange leads to improved bin-picking performance.
I. INTRODUCTION
The National Association of Manufacturers estimates that the United States has close to 300,000 small and medium manufacturers (SMMs), representing a very important segment of the manufacturing sector. Currently, many manufacturing operations at SMMs are largely manual. Examples include machine loading/unloading, part inspection, part cleaning, bin-picking, and assembly. In contrast, these manual operations are often performed by robots on mass production lines. This clearly shows the potential of robots in manufacturing. However, current industrial robots are not considered useful in small-production-volume operations. Hence, SMMs have largely refrained from using them. As we move towards shorter product life cycles and customized products, the future of manufacturing in the US will depend upon the SMMs' ability to remain cost competitive. High labor costs make it difficult for SMMs to remain cost competitive in high-wage markets. However, setting up purely robotic cells is not a viable option for most SMMs.
Recently, several advances have been made in industrial robots that make them safer for humans [1], [2], [3], thereby presenting an opportunity for creating hybrid work cells where humans and robots can collaborate in close physical proximity [4], [5], [6]. The underlying idea behind
such cells is to decompose assembly operations into tasks
such that humans and robots can collaborate by performing
tasks that are especially suitable for them. Several new low-cost robots have been introduced in the market over the last three years, making them cost effective in many manufacturing applications where utilization may not be very high. This makes the idea of hybrid cells economically viable for small-volume production.
In this paper, we present an approach to address perception
failures during bin-picking tasks. The bin-picking operation
involves identifying, locating, and picking a desired part
from a container of randomly scattered parts. Usually, this
operation is followed by either a kitting operation or an
assembly operation. Many research groups have addressed
the problem of enabling robots, guided by machine-vision
and other sensor modalities, to carry out bin-picking tasks
[7], [8], [9]. The problem is very challenging and still not
fully solved due to severe conditions commonly found in
factory environments [10], [11]. In particular, unstructured
bins present diverse scenarios affording varying degrees of
part recognition accuracy: 1) parts may assume widely
different postures, 2) parts may overlap with other parts,
and 3) parts may be either partially or completely occluded.
The problem is compounded due to factors like background
clutter, shadows, complex reflectance properties of parts
made of various materials, and poorly lit conditions.
Our approach to address this problem primarily exploits the fact that humans and robots have complementary strengths in performing tasks. Whereas robots can repetitively perform routine pick-and-place operations without any fatigue, humans excel at perception and prediction in unstructured environments. They are able to recognize and locate a part in a bin of miscellaneous parts, and their sensory and mental-rehearsal capabilities enable them to respond to unexpected situations. Accordingly, a deficit-compensation model can be designed as follows: the robot performs bin-picking under normal conditions and subsequently proceeds to assembly, while the human bails out the robot in critical situations by resolving any perception problems encountered during bin-picking. Figure 1 shows
a schematic of the envisioned hybrid work cell for a kitting
operation consisting of four robots and two human operators.
In this paper, we restrict ourselves to only one robot in the
work cell and only one remotely located human operator.
The collaboration is achieved by developing techniques
for effective information exchange between the human and
the robot. We assume that human operators will not have
any programming expertise and hence they will need to
exchange information with robots without writing code. We will need to determine the least time-consuming way to elicit the required information from human operators and the least confusing way to deliver information to them. We mainly focus on the structure of the information and the best mode to obtain and deliver it.

Fig. 1. Hybrid work cell for kitting operations with four robots and two human operators.

Primary
research issues in this context include:
• What is the most convenient way for robots to seek
assistance from human operators in the assembly cell?
• What is the most convenient way for humans to provide
information to robots when robots need assistance from
humans in completing a task?
• What is the most convenient way for robots to assist humans in recovering from an error?
Primary means by which information can be delivered
to human operators include speech, text, graphics [12],
[13], [14], [15], virtual 3D environments [16], [17], and
augmented reality [18], [19], [20]. Examples of augmented reality systems include a tracked head-worn display that augments a human operator's view with text, labels, arrows, and animations [19], and a laser pointer mounted on a robot that highlights where a cable must be inserted [18]. Humans
usually deliver task specific information to the robot either
by teleoperation or a graphical user interface [21].
II. APPROACH
The robot uses an automated part recognition system to recognize a part and estimate its posture, and then plans its motion to grasp and transfer the part from the bin to the
assembly area. However, if the robot determines that the
part recognition is uncertain from the current scene, then
it initiates a collaboration with a remotely located human
operator. The particular bin scenario determines the specific
nature of collaboration between the robot and the human.
In particular, we address the problem of how the remote
human can extract the relevant information that can be
effectively used to resolve issues of part recognition and
posture determination. For this purpose, we have developed a user interface with controls that allow a human operator to provide approximate postural information. A 3D-matching
algorithm uses the solution provided by the human as an
initial seed and generates better estimates. A flowchart of
the information exchange scheme is shown in Fig. 2. A brief
description of each subsystem follows.
A. Automated Perception System
The baseline automated perception system used in this paper is built using Ensenso [22], a 3D stereo camera that provides point clouds of observed scenes. The Ensenso camera works on the "projected texture stereo vision" principle. It has two integrated CMOS sensors and a projector that casts a random point pattern onto objects in the scene. This pattern enables capturing images of surfaces without any texture. The camera is interfaced to Halcon [23], a machine vision software package that compares these point clouds with the CAD model of a target part to find part instances and the corresponding postures in the scene. The success rate of this system for easy-to-perceive and difficult-to-perceive parts is around 90% and 60%, respectively. Perception failures by the automated system are mainly due to uncertainty in the sensed point cloud owing to several factors such as background clutter, occlusions, shadows, and complex reflectance properties. Moreover, different postures may result in point clouds of varying quality, especially for parts with arbitrary geometries (some illustrative examples are shown in Section III). These issues make it difficult for 3D-registration algorithms to find good matches.
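As a concrete illustration of how such failures can trigger the human-in-the-loop step described above, the escalation logic reduces to a threshold test on the matcher's confidence score. The MATLAB fragment below is a minimal, hypothetical sketch; the variable names, the example scores, and the threshold value are assumptions for illustration and are not taken from the deployed Halcon pipeline.

    % Hypothetical escalation check: request remote operator assistance when the
    % best automated match score falls below a part-specific threshold.
    matchScores    = [0.41 0.83 0.37];   % assumed scores for candidate matches in the scene
    scoreThreshold = 0.75;               % assumed minimum acceptable match quality

    [bestScore, bestIdx] = max(matchScores);
    if bestScore >= scoreThreshold
        fprintf('Using automated match %d (score %.2f)\n', bestIdx, bestScore);
    else
        fprintf('Match uncertain (%.2f); sending scene data to remote operator\n', bestScore);
    end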
B. Remote User Interface
When the automated perception fails, the robot seeks help
from the human operator by sending the raw camera image
of the scene, the corresponding point cloud, and the index of
the desired part to be picked. Accordingly, the user interface
consists of the following display fields:
1) Raw camera image of the scene comprising the bin of
parts (sent by automated perception system)
2) Point cloud of the scene obtained using a 3D camera
3) Perspective view of 3D CAD model of the target part
4) Display field to visualize the match between the CAD
model and the point cloud
Fig. 2. Flowchart of information exchange scheme between robot and
remote human operator
The human operator primarily provides two inputs:
1) Selecting the region of interest. The human initially crops the raw image around the region containing the desired part. This information is used by the interface to remove as many points as possible from the point cloud that do not correspond to the desired part.
2) Generating an initial seed for the matching algorithm. The human adjusts the posture of the 3D model until it lies in the vicinity of the reduced point cloud of the selected region. This posture is used to initialize the matching algorithm.
The above actions are enabled by the following user controls: (1) cursor-based initialization of the position of the CAD model, (2) icon-based selection of the initial orientation of the CAD model, (3) a joystick interface to control the roll, pitch, and yaw of the CAD model, (4) cursor-based region-of-interest selection, and (5) a trigger button to initiate the back-end matching algorithm. The user interface was coded in MATLAB.
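To make the two operator inputs concrete, the following MATLAB sketch shows one way the interface could turn a selected region of interest into a reduced point cloud and compose the operator's coarse posture into a 4x4 seed transform for the matcher. The variable names, units, and the roll-pitch-yaw convention are illustrative assumptions rather than the exact implementation.

    % Crop the scene point cloud to the operator-selected region of interest (ROI).
    scenePts = rand(5000, 3);                 % placeholder Nx3 scene point cloud (meters)
    roiMin   = [-0.05 -0.10 0.60];            % assumed lower ROI corner [x y z]
    roiMax   = [ 0.10  0.05 0.80];            % assumed upper ROI corner [x y z]
    inRoi      = all(scenePts >= roiMin & scenePts <= roiMax, 2);
    croppedPts = scenePts(inRoi, :);          % reduced point cloud passed to the matcher

    % Compose the operator's coarse posture (position + roll/pitch/yaw) into a seed.
    p   = [0.02 -0.03 0.70];                  % assumed position from cursor placement (m)
    rpy = deg2rad([30 35 100]);               % assumed roll/pitch/yaw from joystick (deg)
    Rx  = [1 0 0; 0 cos(rpy(1)) -sin(rpy(1)); 0 sin(rpy(1)) cos(rpy(1))];
    Ry  = [cos(rpy(2)) 0 sin(rpy(2)); 0 1 0; -sin(rpy(2)) 0 cos(rpy(2))];
    Rz  = [cos(rpy(3)) -sin(rpy(3)) 0; sin(rpy(3)) cos(rpy(3)) 0; 0 0 1];
    seedPose = [Rz*Ry*Rx, p'; 0 0 0 1];       % 4x4 homogeneous seed transform for ICP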
C. 3D-Matching Algorithm
We use a variant of Iterative Closest Point (ICP) [24] as
the matching algorithm that runs in the back end of the
user interface. The ICP implementation [25] available at
MATLAB Central file exchange was used for this purpose.
Variants of ICP are usually achieved by making modifications to different stages of the algorithm including selection
of points in one or both meshes, matching type (e.g., brute
force, Delaunay, k-d tree, etc.), weighting of pairs, rejecting
certain pairs, assigning an error metric, and minimizing the
error metric [26].
The standard version of the ICP algorithm takes the full point cloud sets of the reference model and the observed scene as arguments. However, in the ICP variant that we use in this paper, we create different subsets of the CAD model of the target part corresponding to different (maximally separated) views of the part and compare the cropped point cloud with each of these subsets to find the best match. We use the k-d tree matching type for faster computation. Currently, we use a constant weight for all point pairs. Certain pairs are rejected based on Euclidean distance in order to remove outliers. The error metric of sum-of-squared distances between corresponding points, along with singular value decomposition, is used to find the transformation that minimizes the error. The extrapolation option is used, in which the iteration direction is evaluated and extrapolated if possible, using the method described in [24].
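For reference, a single point-to-point ICP iteration with distance-based pair rejection and an SVD solution of the sum-of-squared-distances metric can be sketched in MATLAB as below. This is a simplified, brute-force illustration of the idea, not the MATLAB Central implementation [25] actually used (which employs a k-d tree and the extrapolation option); the function name and threshold are assumptions.

    function [R, t] = icp_step(model, scene, rejectDist)
    % One illustrative point-to-point ICP iteration: nearest-neighbour matching,
    % Euclidean-distance pair rejection, and a closed-form SVD fit that minimizes
    % the sum of squared distances between corresponding points.
    %   model, scene : Nx3 and Mx3 point sets; rejectDist : outlier threshold (m).

        % Brute-force nearest neighbour (a k-d tree is used in practice for speed).
        D = sqrt(sum((permute(scene, [1 3 2]) - permute(model, [3 1 2])).^2, 3));
        [dmin, idx] = min(D, [], 2);          % closest model point for each scene point
        keep = dmin < rejectDist;             % reject pairs that are too far apart
        p = model(idx(keep), :);              % matched model points
        q = scene(keep, :);                   % matched scene points

        % Closed-form rigid transform (SVD / Kabsch) mapping model onto scene.
        pc = p - mean(p, 1);
        qc = q - mean(q, 1);
        [U, ~, V] = svd(pc' * qc);
        S = diag([1 1 sign(det(V * U'))]);    % guard against reflections
        R = V * S * U';
        t = mean(q, 1)' - R * mean(p, 1)';
    end

In practice, such a step is iterated until the alignment error stops improving, and it is run once per model-view subset starting from the operator-provided seed posture.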
D. Accuracy/Time Tradeoff

There is a tradeoff between the accuracy of the extracted postural data and the time needed to extract it. Orientation accuracy, in turn, impacts
grasping performance. The accuracy needed to successfully
grasp a part depends on its shape complexity and its particular posture. This information is pre-determined for each
part and conveyed to the human operator so that he/she can
stop the estimation process once a good enough orientation
accuracy is obtained. For this purpose, we placed a single
instance of the target part on a tripod and used a digital
inclination meter to set the orientation of the part at a
known posture. In one sample experiment, we used a nominal
orientation of 30 degrees about the longitudinal axis of the
part and 35 degrees about the lateral axis of the part. We then manually introduced perception error in 2-degree increments about each axis and observed its impact on grasping performance. For the part shown in Fig. 4(b), we noticed that the robot was able to grasp successfully up to an error of ±8 degrees about the longitudinal axis. We observed a high asymmetry about the lateral axis, with successful grasping up to 8 degrees in the clockwise direction but only 2 degrees in the counterclockwise direction.
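The resulting part-specific tolerance can be encoded as a simple check that the interface evaluates on the current orientation error, letting the operator stop refining as soon as the estimate is good enough to grasp. The MATLAB sketch below uses the tolerances measured above for the part in Fig. 4(b), taking clockwise errors as positive; the function name and sign convention are illustrative assumptions.

    function ok = within_grasp_tolerance(errLongDeg, errLatDeg)
    % Check whether the current orientation error (degrees) is small enough for
    % the robot to grasp the part in Fig. 4(b), using the tolerances measured above.
        okLong = abs(errLongDeg) <= 8;                  % symmetric +/-8 deg about longitudinal axis
        okLat  = (errLatDeg <= 8) && (errLatDeg >= -2); % asymmetric: +8 deg CW, -2 deg CCW
        ok = okLong && okLat;
    end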
III. ILLUSTRATIVE EXPERIMENTS
The experimental setup consists of a Baxter robot, an automated perception system built using the Ensenso 3D camera and Halcon software, and a user interface that communicates with the perception system remotely via the Internet (Figs. 3(a) and 3(b)). We considered representative industrial parts (Fig. 4(a)) that afford different recognition and grasping complexities to illustrate various challenges encountered during the bin-picking task. In this paper, we focused our experiments on the part shown in Fig. 4(b). This part presents both recognition and grasping complexities. In particular, the quality of the point cloud corresponding to this part is heavily influenced by its orientation relative to the 3D camera. Whereas the part is symmetric about its longitudinal axis, it is asymmetric about its lateral axis, making the grasping problem nontrivial. We consider the following two bin scenarios.
Fig. 3.
(a) Baxter robot equipped with an automated perception system built using Ensenso 3D camera and Halcon software. (b) Remote user interface.
Fig. 5. (a) Uniform bin: scene 1. (b) Part match found by automated perception system. Pose [x,y,z,roll,pitch,yaw] = (0.141415, -0.116390, 0.720805,
273.484056, 315.422986, 104.475811). (c) Robot uses the detected postural
information to pick up the target part.
Fig. 4. (a) Set of industrial parts used in bin-picking experiments. (b) CAD
model of the target part to be picked by the robot.
A. Uniform Bins
In this regime, bins contain only one type of part. Figure 5(a)
shows one such example. The automated perception system
succeeds in detecting one instance of the desired part to be
picked and its postural information (Fig. 5(b)). The Baxter
robot uses this information to find a motion plan and pick
the detected part (Fig. 5(c)). Figure 6(a) shows an example
in which the automated system fails to find a match. This
triggers the sending of the relevant data to the remote human
operator. Figures 7(a) and 7(b) show snapshots of the part match found by the human using manual adjustment and invocation of the ICP algorithm. This information is relayed in real time to the robot. Next, the robot proceeds to pick up the part (Fig. 6(c)).

B. Mixed Bins
In this regime, bins contain different types of parts. Figure
9(a) shows one such example. The automated perception
system succeeds in detecting one instance of the desired
part to be picked and its postural information (Fig. 9(b)).
The Baxter robot uses this information to find a motion
plan and pick the detected part (Fig. 9(c)). Figure 10(a)
shows an example in this regime in which the automated system fails to find a match. This triggers the sending of the relevant data to the remote human operator. Figures 8(a) and 8(b) show snapshots of the part match found by the human operator using manual adjustment and invocation of the ICP algorithm. This information is relayed in real time to the robot. Subsequently, the robot proceeds to pick up the part (Fig. 10(c)). We observed that in both failure cases the human operator was able to find part matches and postural information in a matter of a few seconds.
Fig. 6. (a) Uniform bin: scene 2. (b) Perception failure by automated
perception system. (c) Robot uses the postural information relayed by the
remote human operator to pick up the target part.
Fig. 8. Failure case of mixed bin scenario resolved by remote human
operator using the user interface: (a) Snapshot of initial display. (b) Snapshot
of scene point cloud and CAD model after match is found. (c) Point cloud
of the cropped scene. (d) Posture of the matched CAD model. (e) Display
of the final posture values of the CAD model.
Fig. 7. Failure case of uniform bin scenario resolved by remote human operator using the user interface: (a) Snapshot of initial display. (b)
Snapshot of scene point cloud and CAD model after match is found. (c)
Point cloud of the cropped scene. (d) Posture of the matched CAD model.
(e) Display of the final posture values of the CAD model.
IV. CONCLUSIONS
We presented design details of our approach for resolving automated perception failures in bin-picking tasks using assistance from remote human operators. We used illustrative experiments to present different regimes in which human-robot information exchange can take place to resolve perception problems encountered during bin-picking. In this paper, we considered bin-picking used for assembly tasks. However, our approach can be extended to the general problem of bin-picking as applied to other industrial tasks such as packaging. More extensive empirical evaluations are needed to systematically test the ideas presented in the paper. The human-robot collaboration-based bin-picking
described in this paper is one of the key modules required to
achieve hybrid work cells for industrial tasks. In our previous
work, we have developed other related modules including
sequence planning for complex assemblies [27], instruction
generation for human operations [28], ensuring human safety
[29], and a framework for replanning to recover from errors
[30]. As part of the future work, we plan to integrate these
individual modules in order to realize the overall operation
of the envisioned hybrid cell.
REFERENCES
[1] Baxter, "Baxter - Rethink Robotics". [Online: 2012]. http://www.rethinkrobotics.com/products/baxter/.
[2] Kuka, "Kuka LBR IV". [Online: 2013]. http://www.kukalabs.com/en/medical robotics/lightweight robotics/.
[3] ABB, "ABB Friendly Robot for Industrial Dual-Arm FRIDA". [Online: 2013]. http://www.abb.us/cawp/abbzh254/8657f5e05ede6ac5c1257861002c8ed2.aspx.
[4] Krüger, J., Lien, T., and Verl, A., 2009. “Cooperation of human and
machines in assembly lines”. CIRP Annals - Manufacturing Technology,
58(2), pp. 628 – 646.
[5] Shi, J., Jimmerson, G., Pearson, T., and Menassa, R., 2012. “Levels of
human and robot collaboration for automotive manufacturing”. In Proc.
Workshop on Performance Metrics for Intelligent Systems, pp. 95–100.
[6] Shi, J., and Menassa, R., 2012. “Transitional or partnership human and
robot collaboration for automotive assembly”. In Proc. Workshop on
Performance Metrics for Intelligent Systems, pp. 187–194.
[7] Buchholz, D., Winkelbach, S., and Wahl, F.M., 2010. “RANSAM for
industrial bin-picking”. In Proc. International Symposium on Robotics
and German Conference on Robotics.
Fig. 9.
(a) Mixed bin: scene 1. (b) Part match found by automated
perception system. Pose = (0.141415, -0.116390, 0.720805, 273.484056,
315.422986, 104.475811). (c) Robot uses the detected postural information
to pick up the target part.
Fig. 10. (a) Mixed bin: scene 2. (b) Perception failure by automated
perception system. (c) Robot uses the postural information relayed by the
remote human operator to pick up the target part.
[8] Balakirsky, S., Kootbally, Z., Schlenoff, C., Kramer, T., and Gupta, S.K.,
2012 “An industrial robotic knowledge representation for kit building
applications”. In Proceedings of IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2012), pp. 1365–1370.
[9] Schyja, A., Hypki, A., and Kuhlenkotter, B., 2012. "A modular and
extensible framework for real and virtual bin-picking environments”.
In Proceedings of IEEE International Conference on Robotics and
Automation, pp. 5246–5251.
[10] Liu, M-Y., Tuzel, O., Veeraraghavan, A., Taguchi, Y., Marks, T.K.,
and Chellappa, R., 2012. “Fast object localization and pose estimation
in heavy clutter for robotic bin picking”. The International Journal of
Robotics Research, 31(8), pp. 951–973.
[11] Marvel, J.A., Saidi, K., Eastman, R., Hong, T., Cheok, G., and
Messina, E., 2012. “Technology Readiness Levels for Randomized Bin
Picking”, In Proceedings of the Workshop on Performance Metrics for
Intelligent Systems, pp. 109–113.
[12] J. Heiser, D. Phan, M. Agrawala, B. Tversky, and P. Hanrahan, “Identification and validation of cognitive design principles for automated
generation of assembly instructions,” in Proceedings of the Working
Conference on Advanced Visual Interfaces, ser. AVI ’04. New York,
NY, USA: ACM, 2004, pp. 311–319.
[13] M. Dalal, S. Feiner, K. McKeown, S. Pan, M. Zhou, T. Höllerer,
J. Shaw, Y. Feng, and J. Fromer, “Negotiation for automated generation
of temporal multimedia presentations,” in Proceedings of the Fourth
ACM International Conference on Multimedia, ser. MULTIMEDIA ’96.
New York, NY, USA: ACM, 1996, pp. 55–64.
[14] G. Zimmerman, J. Barnes, and L. Leventhal, “A comparison of the
usability and effectiveness of web-based delivery of instructions for
inherently-3d construction tasks on handheld and desktop computers,”
in Proc. International Conference on 3D Web Technology, New York,
NY, USA: ACM, 2003, pp. 49–54.
[15] S. Kim, I. Woo, R. Maciejewski, D. S. Ebert, T. D. Ropp, and
K. Thomas, “Evaluating the effectiveness of visualization techniques
for schematic diagrams in maintenance tasks,” in Proceedings of the
7th Symposium on Applied Perception in Graphics and Visualization,
ser. APGV ’10. New York, NY, USA: ACM, 2010, pp. 33–40.
[16] D. Dionne, S. de la Puente, C. León, R. Hervás, and P. Gervás,
“A model for human readable instruction generation using level-based
discourse planning and dynamic inference of attributes disambiguation,”
in Proceedings of the 12th European Workshop on Natural Language
Generation, ser. ENLG ’09. Stroudsburg, PA, USA: Association for
Computational Linguistics, 2009, pp. 66–73.
[17] J. E. Brough, M. Schwartz, S. K. Gupta, D. K. Anand, R. Kavetsky,
and R. Pettersen, "Towards the development of a virtual environment-based training system for mechanical assembly operations," Virtual Reality, vol. 11, no. 4, pp. 189–206, 2007.
[18] F. Duan, J. Tan, J. G. Tong, R. Kato, and T. Arai, “Application of
the assembly skill transfer system in an actual cellular manufacturing
system,” Automation Science and Engineering, IEEE Transactions on,
vol. 9, no. 1, pp. 31–41, Jan 2012.
[19] S. Henderson and S. Feiner, “Exploring the benefits of augmented
reality documentation for maintenance and repair,” Visualization and
Computer Graphics, IEEE Transactions on, vol. 17, no. 10, pp. 1355–
1368, Oct 2011.
[20] D. Kalkofen, M. Tatzgern, and D. Schmalstieg, “Explosion diagrams
in augmented reality,” in Virtual Reality Conference, 2009. VR 2009.
IEEE, March 2009, pp. 71–78.
[21] A. Pichler and C. Wogerer, “Towards robot systems for small batch
manufacturing,” in Assembly and Manufacturing (ISAM), 2011 IEEE
International Symposium on, May 2011, pp. 1–6.
[22] Ensenso 3D Camera, "Ensenso N10 3D Camera - IDS Imaging Development Systems GmbH". https://en.idsimaging.com/store/produkte/kameras/ensenso-n10-3d-usb-2-0.html.
[23] Halcon Software, "Halcon 12.0 - MvTec Software GmbH". http://www.halcon.com/.
[24] Besl, P.J. and McKay, Neil D., 1992. “A method for registration of
3-D shapes”. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 14(2). pp. 239–256.
[25] Wilm, J. "ICP code - MATLAB Central File Exchange". http://www.mathworks.com/matlabcentral/fileexchange/27804-iterativeclosest-point/content//icp.m
[26] Rusinkiewicz, S., and Levoy, M., 2001. “Efficient variants of the ICP
algorithm”. In Proceedings of the Third International Conference on
3D Digital Imaging and Modeling, pp. 145-152.
[27] Morato, C., Kaipa, K. N., and Gupta, S. K., 2013. “Improving Assembly Precedence Constraint Generation by Utilizing Motion Planning
and Part Interaction Clusters”. Journal of Computer-Aided Design, 45
(11), pp. 1349–1364.
[28] Kaipa, K. N., Morato, C., Zhao, B., and Gupta, S. K. “Instruction
generation for assembly operation performed by humans”. In ASME
Computers and Information in Engineering Conference, Chicago, IL,
August 2012.
[29] Morato, C., Kaipa, K. N., and Gupta, S. K., 2014. “Toward Safe
Human Robot Collaboration by using Multiple Kinects based Real-time
Human Tracking”. Journal of Computing and Information Science in
Engineering, 14(1), pp. 011006.
[30] Morato, C., Kaipa, K. N., Liu, J., and Gupta, S. K., 2014. “A
framework for hybrid cells that support safe and efficient human-robot
collaboration in assembly operations”. ASME International Design
Engineering Technical Conferences & Computers and Information in
Engineering Conference, Buffalo, New York.