Core¶
This module defines abstract classes for different concepts, e.g. Environment, Agent, Robot. These abstract classes are useful for defining the interfaces used in this package.
- class clt_core.core.Env(name='', params={})¶
Base class for creating an Environment for a Markov Decision Process. An environment could be a PyBullet simulation, a MuJoCo simulation or a real robotic environment. The interfaces that an environment should provide are:
reset(): Resets the environment for a new episode, e.g. randomly creates objects on a table for singulation.
step(action): Moves one step in time given an action.
- Parameters
name (str) – A string with the name of the environment.
params (dict) – A dictionary with parameters for the environment.
- reset()¶
Resets the environment. Basically it creates the initial state of the MDP.
- Returns
The observation after the reset.
- Return type
dict
- seed(seed=None)¶
Implement it to seed the random generators of the environment, if it involves any stochasticity.
- Parameters
seed (int) – The seed
- step(action)¶
Moves the environment one step forward in time.
- Parameters
action – An action to perform.
- Returns
The observation after the step.
- Return type
dict
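For illustration, a minimal sketch of an Env subclass is shown below. The toy dynamics and the observation keys are hypothetical; only the reset()/step()/seed() interface comes from the class above.

    import numpy as np

    from clt_core.core import Env


    class ToyTableEnv(Env):
        """A hypothetical 2D table world illustrating the Env interface."""

        def __init__(self, params={}):
            super(ToyTableEnv, self).__init__(name='toy_table', params=params)
            self.rng = np.random.RandomState()
            self.object_pos = None

        def seed(self, seed=None):
            # Seed the generator that randomizes the initial object placement.
            self.rng = np.random.RandomState(seed)

        def reset(self):
            # Create the initial state of the MDP: one object placed randomly on the table.
            self.object_pos = self.rng.uniform(-0.2, 0.2, size=2)
            return {'object_pos': self.object_pos.copy()}

        def step(self, action):
            # Move one step forward in time; here the action is a 2D displacement.
            self.object_pos = np.clip(self.object_pos + action, -0.25, 0.25)
            return {'object_pos': self.object_pos.copy()}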
- class clt_core.core.MDP(name='', params={})¶
- Parameters
name (str) – A string with the name of the MDP.
params (dict) – A dictionary with parameters for the MDP.
- action(agent_action)¶
Receives an action produced by an agent (e.g. the output of a network in the [-1, 1] range) and transforms it into an action for the environment (e.g. coordinates to be reached by the robot).
- reset_goal()¶
Implement it for goal-conditioned MDPs.
- reward(env_state, next_env_state, action)¶
Calculates the reward for the given environment state, next environment state and action.
- state_representation(env_state)¶
Receives a raw state from the environment (RGB-D, mask, simulation state) and transforms it into a feature representation to be fed to an agent.
- terminal(env_state, next_env_state, action)¶
- Returns an integer indicating the id of the terminal state. There are different types of ids:
id = 0: Not a terminal state; the episode does not terminate.
id > 0: A typical terminal state; the episode terminates.
-10 < id < 0: A state that the learning algorithm treats as terminal, but the episode continues in order to collect more data. This is useful when an action does not change the state of the environment.
id <= -10: An invalid terminal state. The episode terminates without logging the last transition and without calling agent.learn() for this transition.
- Parameters
env_state – The environment state before the action.
next_env_state – The environment state after the action.
action – The action performed.
- Returns
The id of the terminal state.
- Return type
int
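A minimal sketch of how these methods fit together is shown below. The state layout, the thresholds and the reward shaping are hypothetical; only the method names and the terminal-id convention come from the documentation above.

    import numpy as np

    from clt_core.core import MDP


    class ToyPushMDP(MDP):
        """A hypothetical MDP illustrating the action/reward/terminal interface."""

        def action(self, agent_action):
            # Map the agent's output in [-1, 1] to workspace coordinates in meters.
            return 0.25 * np.asarray(agent_action)

        def state_representation(self, env_state):
            # Turn the raw environment state into a flat feature vector.
            return np.asarray(env_state['object_pos'], dtype=np.float32)

        def reward(self, env_state, next_env_state, action):
            # Hypothetical shaping: reward progress of the object towards the origin.
            return float(np.linalg.norm(env_state['object_pos'])
                         - np.linalg.norm(next_env_state['object_pos']))

        def terminal(self, env_state, next_env_state, action):
            if np.linalg.norm(next_env_state['object_pos']) < 0.01:
                return 1   # id > 0: typical terminal state, the episode terminates
            if np.allclose(env_state['object_pos'], next_env_state['object_pos']):
                return -1  # -10 < id < 0: terminal for the learner, episode continues
            return 0       # id = 0: not a terminal state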
- class clt_core.core.Push(p1=None, p2=None)¶
A pushing action defined by two 3D points, the initial and the final position. Every pushing action should inherit from this class.
- rotate(quat)¶
Rotates the pushing points by the rotation that the given quaternion represents.
- transform(pos, quat)¶
Transforms the pushing points to the reference frame defined by the given position and orientation.
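A short sketch of a concrete pushing primitive built on this class is shown below. The PushTarget name and its geometry are hypothetical; only Push(p1, p2) and the rotate()/transform() methods come from the documentation above.

    import numpy as np

    from clt_core.core import Push


    class PushTarget(Push):
        """A hypothetical push through a target object along a given direction."""

        def __init__(self, theta, push_distance):
            # Both points are expressed in the object's frame: start 5 cm behind
            # the object and end push_distance along the pushing direction.
            direction = np.array([np.cos(theta), np.sin(theta), 0.0])
            super(PushTarget, self).__init__(p1=-0.05 * direction,
                                             p2=push_distance * direction)

Calling push.transform(object_pos, object_quat) on such an instance would then express the two points in the world frame, assuming transform() maps the points to the frame given by pos and quat.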
- class clt_core.core.Robot(name='')¶
Base class for a robot, which can be a simulated or a real one. Each robot should provide interfaces for measuring joint states and sending joint commands. A Robot is used by an Env.
- Parameters
name (str) – A string with the name of the robot.
- get_joint_position()¶
Returns the positions of the robot’s joints.
- Returns
A list of floats in rad for the joint positions of the robot.
- Return type
list
- get_joint_velocity()¶
Returns the velocities of the robot’s joints.
- Returns
A list of floats in rad/sec for the joint velocities of the robot.
- Return type
list
- get_task_pose()¶
Returns the Cartesian pose of the end effector with respect to the base frame.
- Returns
list – The position of the end-effector as a list with 3 elements
Quaternion – The orientation of the end-effector as quaternion
- reset_joint_position(joint_position)¶
Resets the joint positions of the robot by teleporting it. This is of course for a simulated robot, not a real one :). For real robots, just call set_joint_trajectory to smoothly move the robot.
- Parameters
joint_position (list) – A list of floats in rad for the commanded joint positions of the robot.
- reset_task_pose(pos, quat)¶
Resets/Teleports the robot to the desired pose. This is of course for a simulated robot, not a real one :). For real robots just call set_task_pose_trajectory to smoothly move the robot.
- Parameters
pos (list) – The desired position of the end-effector as a list with 3 elements
quat (Quaternion) – The desired orientation of the end-effector as quaternion
- set_joint_position(joint_position)¶
Commands the robot with desired joint positions.
- Parameters
joint_position (list) – A list of floats in rad for the commanded joint positions of the robot.
- set_joint_trajectory(joint, duration)¶
Commands the robot to reach a desired joint configuration with a 5th-order spline trajectory. The reaching is performed in a straight line in joint space; a generic sketch of such a spline follows the parameters below.
- Parameters
joint (list) – The joint configuration to be reached in rad.
duration (float) – The duration of the motion
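For reference, a 5th-order polynomial with zero boundary velocity and acceleration reduces to the closed form below. This is a generic sketch of such a trajectory, not necessarily the exact implementation used by this method.

    import numpy as np


    def quintic_point(q0, qf, duration, t):
        # Joint position at time t for a 5th-order spline from q0 to qf with
        # zero velocity and acceleration at both ends, which yields
        # q(s) = q0 + (qf - q0) * (10 s^3 - 15 s^4 + 6 s^5), s = t / duration.
        s = np.clip(t / duration, 0.0, 1.0)
        return np.asarray(q0) + (np.asarray(qf) - np.asarray(q0)) * (
            10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5)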
- set_joint_velocity(joint_velocity)¶
Commands the robot with desired joint velocities.
- Parameters
joint_velocity (list) – A list of floats in rad/sec for the commanded joint velocities of the robot.
- set_task_pose(pos, quat)¶
Commands the robot with a desired Cartesian pose. This should solve the inverse kinematics of the robot for the desired pose and call set_joint_position. Returns the Cartesian pose of the end effector with respect to the base frame.
- Parameters
pos (list) – The position of the end-effector as a list with 3 elements
quat (Quaternion) – The orientation of the end-effector as quaternion
- set_task_pose_trajectory(pos, quat, duration)¶
Commands the robot with a desired Cartesian pose to be reached. The reaching is performed in a straight line.
- Parameters
pos (list) – The position of the end-effector as a list with 3 elements
quat (Quaternion) – The orientation of the end-effector as quaternion
duration (float) – The duration of the motion
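Putting the interface together, a typical motion helper might look like the sketch below. The helper itself and the 5 cm offset are hypothetical; all robot.* calls come from the documentation above, assuming get_task_pose() returns the position and orientation as a pair.

    from clt_core.core import Robot


    def lift_end_effector(robot: Robot, dz=0.05, duration=3.0):
        # Hypothetical helper: lift the end-effector by dz meters, moving in a
        # straight line over `duration` seconds. Works for any concrete Robot.
        pos, quat = robot.get_task_pose()  # pose w.r.t. the base frame
        pos[2] += dz                       # raise the z coordinate
        robot.set_task_pose_trajectory(pos, quat, duration)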
- clt_core.core.run_episode(env, agent, mdp, max_steps=50, train=False, seed=0)¶
Runs an episode. Useful for training or evaluating RL agents.
- Parameters
env (Env) – An object of an environment.
agent (Agent) – An object of an Agent.
mdp (MDP) – An object of an MDP.
max_steps (int) – The maximum number of steps to run the episode for.
train (bool) – Set True for a training episode and False for an evaluation episode. In particular, if True, agent.explore() will be called instead of agent.predict(), and agent.learn() will be called to update the agent’s policy.
seed (int) – The seed for the episode’s random generators.
- Returns
A list of N dictionaries, where N is the total number of timesteps in the episode. Each dictionary contains data for each step, such as the Q-value, the reward and the terminal state’s id.
- Return type
list
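A sketch of a small training loop built on run_episode is shown below. RandomAgent and the reuse of the hypothetical ToyTableEnv and ToyPushMDP classes sketched earlier on this page are assumptions; only run_episode itself and the explore()/predict()/learn() interface come from the documentation above.

    import numpy as np

    from clt_core.core import run_episode


    class RandomAgent:
        # Hypothetical agent exposing the interface run_episode expects.
        def explore(self, state):
            return np.random.uniform(-1, 1, size=2)

        def predict(self, state):
            return np.zeros(2)

        def learn(self, transition):
            pass  # a real agent would update its policy here


    env = ToyTableEnv()     # hypothetical Env sketched earlier on this page
    mdp = ToyPushMDP()      # hypothetical MDP sketched earlier on this page
    agent = RandomAgent()

    for i in range(10):
        data = run_episode(env, agent, mdp, max_steps=50, train=True, seed=i)
        print('episode %d finished after %d steps' % (i, len(data)))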