gym_anm.envs.anm_env.ANMEnv
- class gym_anm.envs.anm_env.ANMEnv(network, observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=None, seed=None)[source]
Bases: Env
The base class for gym-anm environments.
- K
The number of auxiliary variables.
The number of auxiliary variables.
- Type:
int
- gamma
The fixed discount factor in [0, 1].
- Type:
float
- lamb
The factor multiplying the penalty associated with violating operational constraints (used in the reward signal).
- Type:
int or float
- delta_t
The interval of time between two consecutive time steps (fraction of hour).
- Type:
float
- simulator
The electricity distribution network simulator.
- state_values
The electrical quantities to include in the state vectors. Each tuple (x, y, z) refers to quantity x at nodes/devices/branches y, using units z.
- Type:
list of tuple of str
- state_N
The number of state variables.
- Type:
int
- action_space
The action space from which the agent can select actions.
- Type:
gym.spaces.Box
- obs_values
Similarly to state_values, the values to include in the observation vectors. If a customized observation() function is provided, obs_values is None.
- Type:
list of str or None
- observation_space
The observation space from which observation vectors are constructed.
- Type:
gym.spaces.Box
- observation_N
The number of observation variables.
- Type:
int
- done
True if a terminal state has been reached (if the network collapsed); False otherwise.
- Type:
bool
- timestep
The current timestep.
- Type:
int
- state
The current state vector \(s_t\).
- Type:
numpy.ndarray
- e_loss
The energy loss during the last transition (part of the reward signal).
- Type:
float
- penalty
The penalty associated with violating operational constraints during the last transition (part of the reward signal).
- Type:
float
- costs_clipping
The clipping values for the costs (negated rewards), where costs_clipping[0] is the clipping value for the absolute energy loss and costs_clipping[1] is the clipping value for the constraint-violation penalty.
- Type:
tuple of float
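How the two clipping values act on the reward can be sketched as follows. The split into an energy-loss cost and a constraint-violation penalty follows the attribute descriptions above; the exact computation lives in the simulator, so treat this as an illustration only:

```python
def clipped_costs(e_loss, penalty, costs_clipping=(1, 100)):
    """Clip the two cost components of the reward independently.

    e_loss  : energy loss over the last transition (its absolute
              value is clipped by costs_clipping[0]).
    penalty : constraint-violation penalty, clipped by costs_clipping[1].
    Returns the reward r_t = -(clipped loss + clipped penalty).
    """
    e_cost = min(abs(e_loss), costs_clipping[0])
    p_cost = min(penalty, costs_clipping[1])
    return -(e_cost + p_cost)

r = clipped_costs(e_loss=0.4, penalty=250.0)  # penalty clipped at 100
```

Clipping keeps a single catastrophic transition (e.g., a huge penalty right before the network collapses) from dominating the return.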
- pfe_converged
True if the last transition converged to a load flow solution (i.e., the network is stable); False otherwise.
- Type:
bool
- np_random
The random state/seed of the environment.
- Type:
numpy.random.RandomState
- __init__(network, observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=None, seed=None)[source]
- Parameters:
network (dict of {str : numpy.ndarray}) – The network input dictionary describing the power grid.
observation (callable or list or str) – The observation space. It can be specified as “state” to construct a fully observable environment (\(o_t = s_t\)); as a callable function such that \(o_t = observation(s_t)\); or as a list of tuples (x, y, z) that refer to the electrical quantities x (str) at the nodes/branches/devices y (list or ‘all’) in unit z (str, optional).
K (int) – The number of auxiliary variables.
delta_t (float) – The interval of time between two consecutive time steps (fraction of hour).
gamma (float) – The discount factor in [0, 1].
lamb (int or float) – The factor multiplying the penalty associated with violating operational constraints (used in the reward signal).
aux_bounds (numpy.ndarray, optional) – The bounds on the auxiliary internal variables, given as a 2D array where the \(k^{th}\) auxiliary variable (0-based indexing) is bounded by [aux_bounds[k, 0], aux_bounds[k, 1]]. This can be useful if auxiliary variables are to be included in the observation vectors and a bounded observation space is desired.
costs_clipping (tuple of float, optional) – The clipping values for the costs in the reward signal, where element 0 is the clipping value for the energy loss cost and element 1 is the clipping value for the constraint-violation penalty (e.g., (1, 100)).
seed (int, optional) – A random seed.
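The three accepted forms of the observation argument can be sketched as plain Python values. The quantity names and device IDs below are made up for illustration, not a list of supported identifiers:

```python
import numpy as np

# 1. Fully observable environment: o_t = s_t.
observation = "state"

# 2. A list of (quantity, nodes/devices/branches, unit) tuples;
#    the names here are hypothetical.
observation = [
    ("bus_p", "all", "MW"),
    ("dev_q", [1, 3], "MVAr"),
]

# 3. A callable such that o_t = observation(s_t), e.g. one that
#    exposes only the first two state variables.
observation = lambda s_t: s_t[:2]

o_t = observation(np.array([1.0, 2.0, 3.0]))
```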
Methods
- __init__(network, observation, K, delta_t, ...) – Initialize the environment.
- close() – Close the rendering of the environment (to be overwritten).
- init_state() – Sample an initial state \(s_0\).
- next_vars(s_t) – Sample internal variables.
- observation(s_t) – Returns the observation vector corresponding to the current state \(s_t\).
- observation_bounds() – Builds the observation space of the environment.
- render([mode]) – Update the rendering of the environment (to be overwritten).
- reset() – Reset the environment.
- seed([seed]) – Seed the random number generator.
- step(action) – Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).
Attributes
- metadata
- np_random – Returns the environment's internal _np_random, which is initialised with a random seed if not already set.
- reward_range
- spec
- unwrapped – Returns the base non-wrapped environment.
- close()[source]
Close the rendering of the environment (to be overwritten).
- Raises:
NotImplementedError
- init_state()[source]
Sample an initial state \(s_0\).
For reproducibility, the RandomState self.np_random should be used to generate random numbers.
- Returns:
An initial state vector \(s_0\).
- Return type:
numpy.ndarray
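A minimal sketch of an init_state() override that respects the reproducibility note above; the state size and bounds are made up for illustration:

```python
import numpy as np

class SketchEnv:
    """Stand-in showing only the pieces init_state() relies on."""

    def __init__(self, seed=None):
        self.state_N = 5                              # hypothetical state size
        self.np_random = np.random.RandomState(seed)  # seeded RNG, as in ANMEnv

    def init_state(self):
        # Draw s_0 from self.np_random (not the global np.random) so
        # that seeding the environment makes initial states reproducible.
        return self.np_random.uniform(-1.0, 1.0, size=self.state_N)

s0_a = SketchEnv(seed=42).init_state()
s0_b = SketchEnv(seed=42).init_state()
# Same seed => identical initial states.
```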
- next_vars(s_t)[source]
Sample internal variables.
- Parameters:
s_t (numpy.ndarray) – The current state vector \(s_t\).
- Returns:
The internal variables for the next timestep, following the structure \([P_l, P_g^{(max)}, aux^{(k)}]\), where \(P_l\) contains the load injections (ordered by device ID), \(P_g^{(max)}\) the maximum generation from non-slack generators (ordered by device ID), and \(aux^{(k)}\) the auxiliary variables. The vector shape should be (N_load + (N_generators-1) + K,).
- Return type:
numpy.ndarray
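The structure of the returned vector can be illustrated with made-up sizes (3 loads, a slack generator plus 2 non-slack generators, and K = 1 auxiliary variable):

```python
import numpy as np

N_load, K = 3, 1
N_generators = 3                    # slack + 2 non-slack generators

P_l = np.array([-1.5, -0.8, -2.1])  # load injections, ordered by device ID
P_g_max = np.array([4.0, 3.5])      # max generation of non-slack generators
aux = np.array([0.3])               # K auxiliary variables

next_vars = np.concatenate([P_l, P_g_max, aux])
# Shape is (N_load + (N_generators - 1) + K,) = (6,).
```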
- observation(s_t)[source]
Returns the observation vector corresponding to the current state \(s_t\).
Alternatively, this function can be overwritten in customized environments.
- Parameters:
s_t (numpy.ndarray) – The current state vector \(s_t\).
- Returns:
The corresponding observation vector \(o_t\).
- Return type:
numpy.ndarray
- observation_bounds()[source]
Builds the observation space of the environment.
If the observation space is specified as a callable object, then its bounds are set to (-np.inf, np.inf)^{N_o} by default (this is done during the reset() call, as the size of observation vectors is not known before then). Alternatively, the user can specify their own bounds by overwriting this function in a subclass.
- Returns:
The bounds of the observation space.
- Return type:
gym.spaces.Box or None
- render(mode='human')[source]
Update the rendering of the environment (to be overwritten).
- Raises:
NotImplementedError
- reset()[source]
Reset the environment.
If the observation space is provided as a callable object but the observation_bounds() method is not overwritten, then the bounds on the observation space are set to (-np.inf, np.inf) here (after the size of the observation vectors is known).
- Returns:
obs – The initial observation vector.
- Return type:
numpy.ndarray
- step(action)[source]
Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).
- Parameters:
action (numpy.ndarray) – The action vector \(a_t\) taken by the agent.
- Returns:
obs (numpy.ndarray) – The observation vector \(o_{t+1}\).
reward (float) – The reward associated with the transition \(r_t\).
done (bool) – True if a terminal state has been reached; False otherwise.
info (dict) – A dictionary with further information (used for debugging).
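A typical interaction loop built on this reset()/step() interface looks like the sketch below. StubEnv is a minimal stand-in with the same signatures, so the loop runs without a real power-grid network; an actual script would construct a concrete gym-anm environment instead:

```python
import numpy as np

class StubEnv:
    """Toy environment mirroring ANMEnv's reset()/step() signatures."""

    def reset(self):
        self.timestep = 0
        return np.zeros(3)              # initial observation o_0

    def step(self, action):
        self.timestep += 1
        obs = np.zeros(3)               # next observation o_{t+1}
        reward = -0.1                   # negated transition cost
        done = self.timestep >= 5       # toy terminal condition
        info = {}
        return obs, reward, done, info

env = StubEnv()
obs, done, ret = env.reset(), False, 0.0
while not done:
    action = np.zeros(2)                # an agent would choose actions here
    obs, reward, done, info = env.step(action)
    ret += reward                       # undiscounted return
```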
- property np_random: Generator
Returns the environment’s internal _np_random, which is initialised with a random seed if not already set.
- property unwrapped: Env
Returns the base non-wrapped environment.
- Returns:
Env – The base non-wrapped gym.Env instance.