Research article
First published online August 29, 2019

Learning task-oriented grasping for tool manipulation from simulated self-supervision

Abstract

Tool manipulation is vital for enabling robots to accomplish challenging task goals. It requires reasoning about the desired effect of the task and, accordingly, grasping and manipulating the tool so as to achieve that effect. Most work in robotics has focused on task-agnostic grasping, which optimizes grasp robustness alone without considering the subsequent manipulation task. In this article, we propose the Task-Oriented Grasping Network (TOG-Net), which jointly optimizes task-oriented grasping of a tool and the manipulation policy for that tool. The model is trained with large-scale simulated self-supervision using procedurally generated tool objects. We perform both simulated and real-world experiments on two tool-based manipulation tasks, sweeping and hammering, on which our model achieves overall task success rates of 71.1% and 80.0%, respectively.
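To make the joint-optimization idea concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' released code) of the kind of two-headed model the abstract describes: a shared encoder over a depth crop around a candidate grasp, one head scoring task-agnostic grasp robustness and one scoring task-oriented success, so grasp selection can weigh both. The network sizes, the (x, y, theta) grasp parameterization, and the task index are all illustrative assumptions.

```python
# Hypothetical sketch of a TOG-Net-style scorer (illustrative, not the paper's model).
import torch
import torch.nn as nn

class TaskOrientedGraspNet(nn.Module):
    def __init__(self, num_tasks: int = 2):
        super().__init__()
        # Shared convolutional encoder over a single-channel depth crop.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Each candidate grasp is an assumed planar pose (x, y, theta).
        self.grasp_fc = nn.Linear(3, 64)
        # Head 1: task-agnostic grasp robustness (probability of a stable grasp).
        self.robustness_head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
        # Head 2: task-oriented success, one probability per task (e.g., sweep, hammer).
        self.task_head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, num_tasks))

    def forward(self, depth: torch.Tensor, grasp: torch.Tensor):
        feat = torch.cat([self.encoder(depth), self.grasp_fc(grasp)], dim=-1)
        return torch.sigmoid(self.robustness_head(feat)), torch.sigmoid(self.task_head(feat))

# Score a batch of candidate grasps for a hammering task (index 1, an assumption)
# and pick the candidate that is both stable and suited to the task.
model = TaskOrientedGraspNet()
depth = torch.rand(8, 1, 64, 64)   # depth crops around 8 candidate grasps
grasps = torch.rand(8, 3)          # candidate planar grasps (x, y, theta)
robust, task_success = model(depth, grasps)
scores = robust.squeeze(-1) * task_success[:, 1]
best = scores.argmax().item()
```

In a self-supervised setup like the one the abstract outlines, both heads would be trained on outcome labels generated automatically in simulation: grasp attempts label the robustness head, and task executions with the grasped tool label the task head.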

Published In

The International Journal of Robotics Research
Article first published online: August 29, 2019
Issue published: March 2020

Keywords

  1. Grasping
  2. Manipulation
  3. Learning and adaptive systems

Rights and permissions

© The Author(s) 2019.

Authors

Affiliations

Kuan Fang
Stanford University, Stanford, CA, USA
Yuke Zhu
Stanford University, Stanford, CA, USA
Animesh Garg
Stanford University, Stanford, CA, USA
Nvidia, Santa Clara, CA, USA
Andrey Kurenkov
Stanford University, Stanford, CA, USA
Viraj Mehta
Stanford University, Stanford, CA, USA
Li Fei-Fei
Stanford University, Stanford, CA, USA
Silvio Savarese
Stanford University, Stanford, CA, USA

Notes

Kuan Fang, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, USA. Email: [email protected]
