Core

This module defines abstract classes for different concepts, e.g. Environment, Agent, Robot. These abstract classes are useful for defining the interfaces used in this package.

class clt_core.core.Env(name='', params={})

Base class for creating an Environment for a Markov Decision Process. An environment could be a PyBullet simulation, a MuJoCo simulation or a real robotic environment. The interfaces that an environment should provide are:

  • reset(): Resets the environment for a new episode, e.g. randomly creates objects on a table for a singulation task.

  • step(action): Moves one step in time given an action.

Parameters
  • name (str) – A string with the name of the environment.

  • params (dict) – A dictionary with parameters for the environment.
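
As an illustration, a minimal environment subclass could look like the following sketch. It assumes only the reset()/step() interface described above; the class name, the table setup and the observation keys are illustrative, not part of the package:

    import numpy as np

    from clt_core.core import Env


    class TabletopEnv(Env):
        """Illustrative environment: random objects scattered on a table."""

        def __init__(self, params={}):
            super(TabletopEnv, self).__init__(name='tabletop', params=params)
            self.rng = np.random.RandomState()
            self.object_poses = []

        def seed(self, seed=None):
            # Seed the random generator used by the stochastic reset().
            self.rng = np.random.RandomState(seed)

        def reset(self):
            # Create the initial state of the MDP: random object positions.
            self.object_poses = self.rng.uniform(-0.2, 0.2, size=(5, 3))
            return {'object_poses': self.object_poses}

        def step(self, action):
            # Move one step forward in time: execute the action (e.g. a push)
            # in the simulation or on the real robot, then observe.
            return {'object_poses': self.object_poses}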

reset()

Resets the environment, i.e. it creates the initial state of the MDP.

Returns

The observation after the reset.

Return type

dict

seed(seed=None)

Implement this to seed the random generators of the environment, if the environment implements any stochasticity.

Parameters

seed (int) – The seed

step(action)

Moves the environment one step forward in time.

Parameters

action – An action to perform.

Returns

The observation after the step.

Return type

dict

class clt_core.core.MDP(name='', params={})

Base class for defining a Markov Decision Process on top of an Env. It maps agent actions to environment actions (action()), transforms raw environment states into agent features (state_representation()) and computes rewards (reward()) and terminal conditions (terminal()).

Parameters
  • name (str) – A string with the name of the MDP.

  • params (dict) – A dictionary with parameters for the MDP.

action(agent_action)

Receives an action produced by an agent (e.g. the output of a network in the [-1, 1] range) and transforms it into an action for the environment (e.g. coordinates to be reached by the robot).
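
For example, a component-wise mapping from the agent’s [-1, 1] range to workspace coordinates could look like this sketch (scale_action is an illustrative helper, and the workspace bounds are made-up values, not the package’s defaults):

    import numpy as np

    def scale_action(agent_action, low, high):
        # Map each component of the agent's output from [-1, 1] to the
        # environment's range [low, high], e.g. table coordinates in meters.
        agent_action = np.clip(agent_action, -1.0, 1.0)
        return low + (agent_action + 1.0) * 0.5 * (high - low)

    # A network output of [0.0, -1.0] becomes the table point (0.0, -0.25):
    scale_action(np.array([0.0, -1.0]),
                 np.array([-0.25, -0.25]), np.array([0.25, 0.25]))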

reset_goal()

Implement this for goal-based MDPs to reset the goal.

reward(env_state, next_env_state, action)

Calculates the reward for the given environment state, next environment state and action.

state_representation(env_state)

Receives a raw state from the environment (RGBD, mask, simulation state) and transforms it into a feature representation to be fed to an agent.

terminal(env_state, next_env_state, action)

Returns an integer indicating the id of the terminal state. There are different types of ids:

  • id = 0: Not a terminal state (the episode does not terminate).

  • id > 0: Typical terminal state; the episode terminates.

  • -10 < id < 0: A state that the learning algorithm treats as terminal, but the episode continues in order to collect more data. This is useful when an action does not change the state of the environment.

  • id <= -10: Invalid terminal state. The episode terminates without logging the last transition and without calling agent.learn() for this transition.

Parameters

action – An action to perform.

Returns

The id of the terminal state.

Return type

int
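
A sketch of how an implementation might follow this id convention; how the booleans solved and valid are computed from the states is task-specific and left to the caller here:

    def terminal_id(env_state, next_env_state, solved, valid):
        # Illustrative mapping of episode outcomes to terminal ids.
        if not valid:
            return -10  # id <= -10: invalid; transition neither logged nor learned
        if solved:
            return 1    # id > 0: typical terminal state, the episode terminates
        if next_env_state == env_state:
            return -1   # -10 < id < 0: terminal for the learner, episode continues
        return 0        # id = 0: not a terminal state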

class clt_core.core.Push(p1=None, p2=None)

A pushing action defined by two 3D points: the initial position p1 and the final position p2. Every pushing action should inherit from this class.

rotate(quat)

Rotates the pushing points p1 and p2 by the rotation encoded in the quaternion quat.

transform(pos, quat)

Transforms the pushing points p1 and p2 with respect to the reference frame defined by the position pos and the orientation quat.
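
A hedged sketch of what these operations compute, using scipy’s rotation utilities in place of the package’s Quaternion type (quaternions in [x, y, z, w] order); the direction of the transform is an assumption:

    import numpy as np
    from scipy.spatial.transform import Rotation

    def rotate_push(p1, p2, quat_xyzw):
        # Rotate both 3D pushing points by the quaternion's rotation matrix.
        rot = Rotation.from_quat(quat_xyzw).as_matrix()
        return rot.dot(np.asarray(p1)), rot.dot(np.asarray(p2))

    def transform_push(p1, p2, pos, quat_xyzw):
        # Express both pushing points in the frame given by (pos, quat),
        # i.e. apply the inverse rigid transform (an assumption about
        # which direction the package's transform() uses).
        rot = Rotation.from_quat(quat_xyzw).as_matrix()
        return (rot.T.dot(np.asarray(p1) - np.asarray(pos)),
                rot.T.dot(np.asarray(p2) - np.asarray(pos)))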

class clt_core.core.Robot(name='')

Base class for a robot, which can be a simulated robot or a real one. Each robot should provide interfaces for measuring joint states and sending joint commands. A Robot is used by an Env.

Parameters

name (str) – A string with the name of the robot.

get_joint_position()

Returns the positions of the robot’s joints.

Returns

A list of floats in rad for the joint positions of the robot.

Return type

list

get_joint_velocity()

Returns the velocities of the robot’s joints.

Returns

A list of floats in rad/sec for the joint velocities of the robot.

Return type

list

get_task_pose()

Returns the Cartesian pose of the end effector with respect to the base frame.

Returns

  • list – The position of the end-effector as a list with 3 elements

  • Quaternion – The orientation of the end-effector as quaternion

reset_joint_position(joint_position)

Resets the joint positions of the robot by teleporting it. This is of course for a simulated robot, not a real one :). For real robots just call set_joint_trajectory to smoothly move the robot.

Parameters

joint_position (list) – A list of floats in rad for the commanded joint positions of the robot.

reset_task_pose(pos, quat)

Resets/Teleports the robot to the desired pose. This is of course for a simulated robot, not a real one :). For real robots just call set_task_pose_trajectory to smoothly move the robot.

Parameters
  • pos (list) – The desired position of the end-effector as a list with 3 elements

  • quat (Quaternion) – The desired orientation of the end-effector as quaternion

set_joint_position(joint_position)

Commands the robot with desired joint positions.

Parameters

joint_position (list) – A list of floats in rad for the commanded joint positions of the robot.

set_joint_trajectory(joint, duration)

Commands the robot to reach a desired joint configuration with a 5th-order spline. The reaching is performed in a straight line in joint space.

Parameters
  • joint (list) – The joint configuration to be reached in rad.

  • duration (float) – The duration of the motion
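
The 5th-order spline here is presumably the standard quintic time scaling with zero velocity and acceleration at both endpoints; a sketch of that interpolation (not necessarily the package’s exact implementation):

    import numpy as np

    def quintic_point(q0, qf, duration, t):
        # Quintic time scaling: s(t) = 10s^3 - 15s^4 + 6s^5, which has zero
        # velocity and acceleration at t = 0 and t = duration.
        s = np.clip(t / duration, 0.0, 1.0)
        scale = 10 * s**3 - 15 * s**4 + 6 * s**5
        return np.asarray(q0) + scale * (np.asarray(qf) - np.asarray(q0))

    # Halfway through the motion the scaling is exactly 0.5:
    quintic_point(np.zeros(7), np.ones(7), duration=2.0, t=1.0)  # -> 0.5 * ones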

set_joint_velocity(joint_velocity)

Commands the robot with desired joint velocities.

Parameters

joint_velocity (list) – A list of floats in rad/sec for the commanded joint velocities of the robot.

set_task_pose(pos, quat)

Commands the robot with a desired Cartesian pose. This should solve the inverse kinematics of the robot for the desired pose and call set_joint_position.

Parameters
  • pos (list) – The position of the end-effector as a list with 3 elements

  • quat (Quaternion) – The orientation of the end-effector as quaternion
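
A minimal sketch of this flow in a subclass; solve_ik is a hypothetical helper standing in for whatever IK solver the concrete robot uses (e.g. PyBullet’s calculateInverseKinematics in a PyBullet-backed robot):

    from clt_core.core import Robot

    class MySimRobot(Robot):
        def set_task_pose(self, pos, quat):
            # Solve the inverse kinematics for the desired Cartesian pose
            # (solve_ik is a hypothetical helper of this subclass), then
            # command the resulting joint configuration.
            joint_position = self.solve_ik(pos, quat)
            self.set_joint_position(joint_position)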

set_task_pose_trajectory(pos, quat, duration)

Commands the robot with a desired Cartesian pose to be reached. The reaching is performed in a straight line.

Parameters
  • pos (list) – The position of the end-effector as a list with 3 elements

  • quat (Quaternion) – The orientation of the end-effector as quaternion

  • duration (float) – The duration of the motion

clt_core.core.run_episode(env, agent, mdp, max_steps=50, train=False, seed=0)

Runs an episode. Useful for training or evaluating RL agents.

Parameters
  • env (Env) – An object of an environment.

  • agent (Agent) – An object of an Agent.

  • mdp (MDP) – An object of an MDP.

  • max_steps (int) – The maximum number of steps to run the episode.

  • train (bool) – Set True for a training episode and False for an evaluation episode. In particular, if True, agent.explore() is called instead of agent.predict(), and agent.learn() is called to update the agent’s policy.

  • seed (int) – The seed for the episode.

Returns

A list of N dictionaries, where N is the total number of timesteps performed in the episode. Each dictionary contains data for each step, such as the Q-value, the reward and the terminal state’s id.

Return type

list
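
A hedged usage sketch: the Env/Agent/MDP subclasses below are placeholders, and the 'reward' key is assumed from the step-data description above:

    # Placeholder subclasses; substitute concrete implementations.
    env = TabletopEnv(params={})
    mdp = PushingMDP(params={})
    agent = DQNAgent(params={})

    for i in range(100):
        data = run_episode(env, agent, mdp, max_steps=50, train=True, seed=i)
        # 'reward' is an assumed key name for the per-step reward.
        episode_return = sum(step['reward'] for step in data)
        print('episode %d: %d steps, return %.3f'
              % (i, len(data), episode_return))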