Incremental-learning for robot control

I-Jen Chiang, Jane Yung-jen Hsu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A robot can learn to act by trial and error in the world. A robot continues to obtain information about the environment from its sensors and to choose a suitable action to take. Having executed an action, the robot receives a reinforcement signal from the world indicating how well the action performed in that situation. The evaluation is used to adjust the robot's action selection policy for the given state. The process of learning the state-action function has been addressed by Watkins' Q-learning, Sutton's temporal-difference method, and Kaelbling's interval estimation method. One common problem with these reinforcement learning methods is that the convergence can be very slow due to the large state space. State clustering by least-square-error or Hamming distance, hierarchical learning architecture, and prioritized swapping can reduce the number of states, but a large portion of the space still has to be considered. This paper presents a new solution to this problem. A state is taken to be a combination of the robot's sensor status. Each sensor is viewed as an independent component. The importance of each sensor status relative to each action is computed based on the frequency of its occurrences. Not all sensors are needed for every action. For example, the forward sensors play the most important roles when the robot is moving forward.
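The abstract's core idea, treating each sensor as an independent component and weighting its readings per action by how often they co-occur with that action, can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal, hypothetical example (the class name SensorSelectiveQ, the methods relevance/choose/update, and all parameter values are assumptions) that combines a standard tabular Q-learning update with frequency-based per-sensor importance counts.

```python
import random
from collections import defaultdict


class SensorSelectiveQ:
    """Tabular Q-learning plus per-sensor co-occurrence counts (illustrative sketch only)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)            # Q-value table keyed by (state, action)
        self.count = defaultdict(int)          # (sensor index, reading, action) -> co-occurrence count
        self.action_count = defaultdict(int)   # action -> total updates seen

    def relevance(self, i, reading, action):
        # Importance of sensor i showing `reading` for `action`,
        # estimated as a relative co-occurrence frequency (an assumption here).
        total = self.action_count[action]
        return self.count[(i, reading, action)] / total if total else 0.0

    def choose(self, state):
        # Epsilon-greedy action selection over the current Q estimates.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step Q-learning backup (Watkins' rule).
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        # Record how often each sensor reading co-occurs with the chosen action.
        self.action_count[action] += 1
        for i, reading in enumerate(state):
            self.count[(i, reading, action)] += 1


# Example use: a state is a tuple of binary sensor readings, e.g. (front, left, right).
agent = SensorSelectiveQ(actions=["forward", "turn_left", "turn_right"])
agent.update((1, 0, 0), "forward", reward=1.0, next_state=(0, 0, 0))
```

Once such relevance estimates stabilize, sensors whose readings carry little weight for a given action could be dropped from that action's state description, which is the mechanism the abstract suggests for shrinking the effective state space.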

Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics
Editors: Anon
Publisher: IEEE
Pages: 4331-4336
Number of pages: 6
Volume: 5
Publication status: Published - 1995
Externally published: Yes
Event: Proceedings of the 1995 IEEE International Conference on Systems, Man and Cybernetics. Part 2 (of 5) - Vancouver, BC, Canada
Duration: Oct 22, 1995 - Oct 25, 1995

Other

Other: Proceedings of the 1995 IEEE International Conference on Systems, Man and Cybernetics. Part 2 (of 5)
City: Vancouver, BC, Canada
Period: 10/22/95 - 10/25/95

Fingerprint

Robots
Sensors
Hamming distance
Reinforcement learning
Reinforcement

ASJC Scopus subject areas

  • Hardware and Architecture
  • Control and Systems Engineering

Cite this

Chiang, I-J., & Hsu, J. Y. J. (1995). Incremental-learning for robot control. In Anon (Ed.), Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (Vol. 5, pp. 4331-4336). IEEE.

Incremental-learning for robot control. / Chiang, I-Jen; Hsu, Jane Yung-jen.

Proceedings of the IEEE International Conference on Systems, Man and Cybernetics. ed. / Anon. Vol. 5 IEEE, 1995. p. 4331-4336.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Chiang, I-J & Hsu, JYJ 1995, Incremental-learning for robot control. in Anon (ed.), Proceedings of the IEEE International Conference on Systems, Man and Cybernetics. vol. 5, IEEE, pp. 4331-4336, Proceedings of the 1995 IEEE International Conference on Systems, Man and Cybernetics. Part 2 (of 5), Vancouver, BC, Can, 10/22/95.
Chiang I-J, Hsu JYJ. Incremental-learning for robot control. In Anon, editor, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics. Vol. 5. IEEE. 1995. p. 4331-4336
Chiang, I-Jen ; Hsu, Jane Yung-jen. / Incremental-learning for robot control. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics. editor / Anon. Vol. 5 IEEE, 1995. pp. 4331-4336
@inproceedings{9a91923a9a294057b2abe74de68ba635,
title = "Incremental-learning for robot control",
abstract = "A robot can learn to act by trial and error in the world. A robot continues to obtain information about the environment from its sensors and to choose a suitable action to take. Having executed an action, the robot receives a reinforcement signal from the world indicating how well the action performed in that situation. The evaluation is used to adjust the robot's action selection policy for the given state. The process of learning the state-action function has been addressed by Watkins' Q-learning, Sutton's temporal-difference method, and Kaelbling's interval estimation method. One common problem with these reinforcement learning methods is that the convergence can be very slow due to the large state space. State clustering by least-square-error or Hamming distance, hierarchical learning architecture, and prioritized swapping can reduce the number of states, but a large portion of the space still has to be considered. This paper presents a new solution to this problem. A state is taken to be a combination of the robot's sensor status. Each sensor is viewed as an independent component. The importance of each sensor status relative to each action is computed based on the frequency of its occurrences. Not all sensors are needed for every action. For example, the forward sensors play the most important roles when the robot is moving forward.",
author = "I-Jen Chiang and Hsu, {Jane Yung-jen}",
year = "1995",
language = "English",
volume = "5",
pages = "4331--4336",
editor = "Anon",
booktitle = "Proceedings of the IEEE International Conference on Systems, Man and Cybernetics",
publisher = "IEEE",

}

TY - GEN

T1 - Incremental-learning for robot control

AU - Chiang, I-Jen

AU - Hsu, Jane Yung-jen

PY - 1995

Y1 - 1995

N2 - A robot can learn to act by trial and error in the world. A robot continues to obtain information about the environment from its sensors and to choose a suitable action to take. Having executed an action, the robot receives a reinforcement signal from the world indicating how well the action performed in that situation. The evaluation is used to adjust the robot's action selection policy for the given state. The process of learning the state-action function has been addressed by Watkins' Q-learning, Sutton's temporal-difference method, and Kaelbling's interval estimation method. One common problem with these reinforcement learning methods is that the convergence can be very slow due to the large state space. State clustering by least-square-error or Hamming distance, hierarchical learning architecture, and prioritized swapping can reduce the number of states, but a large portion of the space still has to be considered. This paper presents a new solution to this problem. A state is taken to be a combination of the robot's sensor status. Each sensor is viewed as an independent component. The importance of each sensor status relative to each action is computed based on the frequency of its occurrences. Not all sensors are needed for every action. For example, the forward sensors play the most important roles when the robot is moving forward.

AB - A robot can learn to act by trial and error in the world. A robot continues to obtain information about the environment from its sensors and to choose a suitable action to take. Having executed an action, the robot receives a reinforcement signal from the world indicating how well the action performed in that situation. The evaluation is used to adjust the robot's action selection policy for the given state. The process of learning the state-action function has been addressed by Watkins' Q-learning, Sutton's temporal-difference method, and Kaelbling's interval estimation method. One common problem with these reinforcement learning methods is that the convergence can be very slow due to the large state space. State clustering by least-square-error or Hamming distance, hierarchical learning architecture, and prioritized swapping can reduce the number of states, but a large portion of the space still has to be considered. This paper presents a new solution to this problem. A state is taken to be a combination of the robot's sensor status. Each sensor is viewed as an independent component. The importance of each sensor status relative to each action is computed based on the frequency of its occurrences. Not all sensors are needed for every action. For example, the forward sensors play the most important roles when the robot is moving forward.

UR - http://www.scopus.com/inward/record.url?scp=0029531281&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029531281&partnerID=8YFLogxK

M3 - Conference contribution

VL - 5

SP - 4331

EP - 4336

BT - Proceedings of the IEEE International Conference on Systems, Man and Cybernetics

A2 - Anon

PB - IEEE

ER -