Q-functions
Q-function interfaces
Q-function implementations
class pfrl.q_functions.DuelingDQN(n_actions, n_input_channels=4, activation=<function relu>, bias=0.1)
Dueling Q-Network.
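A dueling head splits the network into a state-value stream V(s) and an advantage stream A(s, a), then recombines them with the advantages re-centered around their mean. The following is a minimal pure-Python sketch of that aggregation step (the function name is illustrative, not part of the pfrl API):

```python
def dueling_aggregate(v, advantages):
    """Combine a scalar state value with per-action advantages.

    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), the identifiability
    trick used by dueling architectures.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [v + adv - mean_adv for adv in advantages]


# A state worth 1.0 where action 0 looks better than average:
q = dueling_aggregate(1.0, [0.2, -0.1, -0.1])  # [1.2, 0.9, 0.9]
```

Subtracting the mean advantage makes the V/A decomposition unique; without it, any constant could be shifted between the two streams.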
class pfrl.q_functions.DistributionalDuelingDQN(n_actions, n_atoms, v_min, v_max, n_input_channels=4, activation=<built-in method relu of type object>, bias=0.1)
Distributional dueling fully-connected Q-function with discrete actions.
class pfrl.q_functions.SingleModelStateQFunctionWithDiscreteAction(model)
Q-function with discrete actions.
Parameters: model (nn.Module) – Model that is callable and outputs action values.
class pfrl.q_functions.FCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected state-input Q-function with discrete actions.
Parameters: - ndim_obs (int) – Number of dimensions of observation space.
- n_actions (int) – Number of actions in action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- nonlinearity (callable) – Nonlinearity applied after each hidden layer.
- last_wscale (float) – Weight scale of the last layer.
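A discrete-action Q-function like this outputs one value per action, so acting greedily means taking the argmax over those outputs. A small pure-Python sketch of epsilon-greedy selection over such outputs (illustrative only; pfrl ships its own explorer classes for this):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # With probability epsilon pick a uniformly random action,
    # otherwise pick argmax_a Q(s, a).
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

action = epsilon_greedy([0.1, 2.0, -1.0], epsilon=0.0)  # greedy: action 1
```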
class pfrl.q_functions.DistributionalSingleModelStateQFunctionWithDiscreteAction(model, z_values)
Distributional Q-function with discrete actions.
Parameters: - model (nn.Module) – Model that is callable and outputs atoms for each action.
- z_values (ndarray) – Returns represented by atoms. Its shape must be (n_atoms,).
class pfrl.q_functions.DistributionalFCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_atoms, v_min, v_max, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Distributional fully-connected Q-function with discrete actions.
Parameters: - ndim_obs (int) – Number of dimensions of observation space.
- n_actions (int) – Number of actions in action space.
- n_atoms (int) – Number of atoms of return distribution.
- v_min (float) – Minimum value this model can approximate.
- v_max (float) – Maximum value this model can approximate.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- nonlinearity (callable) – Nonlinearity applied after each hidden layer.
- last_wscale (float) – Weight scale of the last layer.
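In this categorical (C51-style) parameterization, each action's return distribution is a probability vector over n_atoms fixed support points spaced evenly between v_min and v_max, and the scalar Q-value is the expectation under that distribution. A minimal sketch of the support and the expectation (helper names are illustrative, not pfrl API):

```python
def make_z_values(v_min, v_max, n_atoms):
    # Evenly spaced atom support, as in categorical DQN (C51).
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]

def expected_q(probs, z_values):
    # Q(s, a) = sum_i p_i(s, a) * z_i
    return sum(p * z for p, z in zip(probs, z_values))

z = make_z_values(-10.0, 10.0, 5)   # [-10.0, -5.0, 0.0, 5.0, 10.0]
q = expected_q([0.1, 0.2, 0.4, 0.2, 0.1], z)  # symmetric dist -> 0.0
```

This is why v_min and v_max bound what the model can approximate: no probability mass can be placed outside the atom support.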
class pfrl.q_functions.FCQuadraticStateQFunction(n_input_channels, n_dim_action, n_hidden_channels, n_hidden_layers, action_space, scale_mu=True)
Fully-connected state-input continuous Q-function.
See: https://arxiv.org/abs/1603.00748
Parameters: - n_input_channels (int) – Number of input channels.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- action_space – Action space of the environment.
- scale_mu (bool) – If True, scale mu by applying tanh.
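The quadratic form in this class follows Normalized Advantage Functions (NAF, the paper linked above): Q(s, a) = V(s) + A(s, a) with A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)), where P = L L^T is positive semidefinite, so Q is maximized exactly at a = mu(s). A pure-Python sketch of the advantage term (illustrative, not pfrl's implementation):

```python
def quadratic_advantage(a, mu, L):
    """A(s, a) = -1/2 (a - mu)^T L L^T (a - mu).

    L is a lower-triangular matrix; P = L L^T is positive
    semidefinite, so A <= 0 with equality exactly at a == mu.
    """
    n = len(a)
    d = [ai - mi for ai, mi in zip(a, mu)]
    # y = L^T d, so that d^T L L^T d = ||y||^2
    y = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    return -0.5 * sum(yj * yj for yj in y)

adv = quadratic_advantage([1.0, 0.0], [0.0, 0.0],
                          [[1.0, 0.0], [0.5, 1.0]])  # -0.5
```

Because the maximizing action is available in closed form (a = mu), this parameterization makes greedy continuous control tractable, which is the point of NAF.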
class pfrl.q_functions.SingleModelStateActionQFunction(model)
(s,a)-input Q-function with a single model.
Parameters: model (nn.Module) – Module that is callable and outputs action values.
class pfrl.q_functions.FCSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected (s,a)-input Q-function.
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
- last_wscale (float) – Scale of weight initialization of the last layer.
class pfrl.q_functions.FCLSTMSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + LSTM (s,a)-input Q-function.
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
- last_wscale (float) – Scale of weight initialization of the last layer.
class pfrl.q_functions.FCBNSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + BN (s,a)-input Q-function.
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- normalize_input (bool) – If set to True, Batch Normalization is applied to both observations and actions.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
- last_wscale (float) – Scale of weight initialization of the last layer.
class pfrl.q_functions.FCBNLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + BN (s,a)-input Q-function with late action input.
Actions are not included until the second hidden layer and are not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
- normalize_input (bool) – If set to True, Batch Normalization is applied to observations.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
- last_wscale (float) – Scale of weight initialization of the last layer.
class pfrl.q_functions.FCLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected (s,a)-input Q-function with late action input.
Actions are not included until the second hidden layer and are not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
- last_wscale (float) – Scale of weight initialization of the last layer.
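"Late action input" in the two classes above means the first hidden layer sees only the observation, and the action vector is concatenated in at the second hidden layer, as in the DDPG critic. A minimal pure-Python sketch of that forward pass (all weights and helper names here are illustrative):

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(W, b, x):
    # Plain affine layer: one output per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def late_action_q(obs, action, W1, b1, W2, b2, w_out, b_out):
    h = relu(linear(W1, b1, obs))          # first layer: observation only
    h = relu(linear(W2, b2, h + action))   # action concatenated here
    return sum(w * hi for w, hi in zip(w_out, h)) + b_out  # scalar Q(s, a)

q = late_action_q(
    obs=[1.0], action=[2.0],
    W1=[[1.0]], b1=[0.0],
    W2=[[1.0, 1.0]], b2=[0.0],
    w_out=[1.0], b_out=0.0,
)  # 3.0
```

Note the second layer's weight matrix must accept n_hidden_channels + n_dim_action inputs, which is why these classes require at least one hidden layer.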