Q-functions
Q-function interfaces
Q-function implementations
class pfrl.q_functions.DuelingDQN(n_actions, n_input_channels=4, activation=<function relu>, bias=0.1)
Dueling Q-Network.
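A dueling head splits the network into a state-value stream V(s) and an advantage stream A(s, a), then recombines them with the advantages re-centered around their mean. The following is a minimal pure-Python sketch of that aggregation step (the function name is illustrative, not part of the pfrl API):

```python
def dueling_aggregate(v, advantages):
    """Combine a scalar state value with per-action advantages.

    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), the identifiability
    trick used by dueling architectures.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [v + adv - mean_adv for adv in advantages]


# A state worth 1.0 where action 0 looks better than average:
q = dueling_aggregate(1.0, [0.2, -0.1, -0.1])  # [1.2, 0.9, 0.9]
```

Subtracting the mean advantage makes the V/A decomposition unique; without it, any constant could be shifted between the two streams.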
class pfrl.q_functions.DistributionalDuelingDQN(n_actions, n_atoms, v_min, v_max, n_input_channels=4, activation=<built-in method relu of type object>, bias=0.1)
Distributional dueling fully-connected Q-function with discrete actions.
class pfrl.q_functions.SingleModelStateQFunctionWithDiscreteAction(model)
Q-function with discrete actions.
Parameters: model (nn.Module) – Model that is callable and outputs action values.
class pfrl.q_functions.FCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected state-input Q-function with discrete actions.
Parameters: - ndim_obs (int) – Number of dimensions of observation space.
- n_actions (int) – Number of actions in action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- nonlinearity (callable) – Nonlinearity applied after each hidden layer.
- last_wscale (float) – Weight scale of the last layer.
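A discrete-action Q-function like this outputs one value per action, so acting greedily means taking the argmax over those outputs. A small pure-Python sketch of epsilon-greedy selection over such outputs (illustrative only; pfrl ships its own explorer classes for this):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # With probability epsilon pick a uniformly random action,
    # otherwise pick argmax_a Q(s, a).
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

action = epsilon_greedy([0.1, 2.0, -1.0], epsilon=0.0)  # greedy: action 1
```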
class pfrl.q_functions.DistributionalSingleModelStateQFunctionWithDiscreteAction(model, z_values)
Distributional Q-function with discrete actions.
Parameters: - model (nn.Module) – Model that is callable and outputs atoms for each action.
- z_values (ndarray) – Returns represented by atoms. Its shape must be (n_atoms,).
class pfrl.q_functions.DistributionalFCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_atoms, v_min, v_max, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Distributional fully-connected Q-function with discrete actions.
Parameters: - ndim_obs (int) – Number of dimensions of observation space.
- n_actions (int) – Number of actions in action space.
- n_atoms (int) – Number of atoms of return distribution.
- v_min (float) – Minimum value this model can approximate.
- v_max (float) – Maximum value this model can approximate.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- nonlinearity (callable) – Nonlinearity applied after each hidden layer.
- last_wscale (float) – Weight scale of the last layer.
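In this categorical (C51-style) parameterization, each action's return distribution is a probability vector over n_atoms fixed support points spaced evenly between v_min and v_max, and the scalar Q-value is the expectation under that distribution. A minimal sketch of the support and the expectation (helper names are illustrative, not pfrl API):

```python
def make_z_values(v_min, v_max, n_atoms):
    # Evenly spaced atom support, as in categorical DQN (C51).
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]

def expected_q(probs, z_values):
    # Q(s, a) = sum_i p_i(s, a) * z_i
    return sum(p * z for p, z in zip(probs, z_values))

z = make_z_values(-10.0, 10.0, 5)   # [-10.0, -5.0, 0.0, 5.0, 10.0]
q = expected_q([0.1, 0.2, 0.4, 0.2, 0.1], z)  # symmetric dist -> 0.0
```

This is why v_min and v_max bound what the model can approximate: no probability mass can be placed outside the atom support.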
class pfrl.q_functions.FCQuadraticStateQFunction(n_input_channels, n_dim_action, n_hidden_channels, n_hidden_layers, action_space, scale_mu=True)
Fully-connected state-input continuous Q-function.
See: https://arxiv.org/abs/1603.00748
Parameters: - n_input_channels (int) – Number of input channels.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- action_space – Action space of the environment.
- scale_mu (bool) – If True, scale mu by applying tanh.
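The quadratic form in this class follows Normalized Advantage Functions (NAF, the paper linked above): Q(s, a) = V(s) + A(s, a) with A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)), where P = L L^T is positive semidefinite, so Q is maximized exactly at a = mu(s). A pure-Python sketch of the advantage term (illustrative, not pfrl's implementation):

```python
def quadratic_advantage(a, mu, L):
    """A(s, a) = -1/2 (a - mu)^T L L^T (a - mu).

    L is a lower-triangular matrix; P = L L^T is positive
    semidefinite, so A <= 0 with equality exactly at a == mu.
    """
    n = len(a)
    d = [ai - mi for ai, mi in zip(a, mu)]
    # y = L^T d, so that d^T L L^T d = ||y||^2
    y = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    return -0.5 * sum(yj * yj for yj in y)

adv = quadratic_advantage([1.0, 0.0], [0.0, 0.0],
                          [[1.0, 0.0], [0.5, 1.0]])  # -0.5
```

Because the maximizing action is available in closed form (a = mu), this parameterization makes greedy continuous control tractable, which is the point of NAF.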
class pfrl.q_functions.SingleModelStateActionQFunction(model)
(s,a)-input Q-function with a single model.
Parameters: model (nn.Module) – Module that is callable and outputs action values.
class pfrl.q_functions.FCSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected (s,a)-input Q-function.
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
- last_wscale (float) – Scale of weight initialization of the last layer.
class pfrl.q_functions.FCLSTMSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + LSTM (s,a)-input Q-function.
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
- last_wscale (float) – Scale of weight initialization of the last layer.
class pfrl.q_functions.FCBNSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + BN (s,a)-input Q-function.
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers.
- normalize_input (bool) – If set to True, Batch Normalization is applied to both observations and actions.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
- last_wscale (float) – Scale of weight initialization of the last layer.
class pfrl.q_functions.FCBNLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + BN (s,a)-input Q-function with late action input.
Actions are not included until the second hidden layer and are not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
- normalize_input (bool) – If set to True, Batch Normalization is applied to observations.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
- last_wscale (float) – Scale of weight initialization of the last layer.
class pfrl.q_functions.FCLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected (s,a)-input Q-function with late action input.
Actions are not included until the second hidden layer and are not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971
Parameters: - n_dim_obs (int) – Number of dimensions of observation space.
- n_dim_action (int) – Number of dimensions of action space.
- n_hidden_channels (int) – Number of hidden channels.
- n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
- nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
- last_wscale (float) – Scale of weight initialization of the last layer.
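"Late action input" in the two classes above means the first hidden layer sees only the observation, and the action vector is concatenated in at the second hidden layer, as in the DDPG critic. A minimal pure-Python sketch of that forward pass (all weights and helper names here are illustrative):

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(W, b, x):
    # Plain affine layer: one output per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def late_action_q(obs, action, W1, b1, W2, b2, w_out, b_out):
    h = relu(linear(W1, b1, obs))          # first layer: observation only
    h = relu(linear(W2, b2, h + action))   # action concatenated here
    return sum(w * hi for w, hi in zip(w_out, h)) + b_out  # scalar Q(s, a)

q = late_action_q(
    obs=[1.0], action=[2.0],
    W1=[[1.0]], b1=[0.0],
    W2=[[1.0, 1.0]], b2=[0.0],
    w_out=[1.0], b_out=0.0,
)  # 3.0
```

Note the second layer's weight matrix must accept n_hidden_channels + n_dim_action inputs, which is why these classes require at least one hidden layer.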