gym_anm.envs.anm_env.ANMEnv
- class gym_anm.envs.anm_env.ANMEnv(network, observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=None, seed=None)
Bases: Env
The base class for gym-anm environments.
- K
The number of auxiliary variables.
- Type:
int
- gamma
The fixed discount factor in [0, 1].
- Type:
float
- lamb
The factor multiplying the penalty associated with violating operational constraints (used in the reward signal).
- Type:
int or float
- delta_t
The interval of time between two consecutive time steps (as a fraction of an hour).
- Type:
float
- simulator
The electricity distribution network simulator.
- state_values
The electrical quantities to include in the state vectors. Each tuple (x, y, z) refers to quantity x at nodes/devices/branches y, using units z.
- Type:
list of tuple of str
- state_N
The number of state variables.
- Type:
int
- action_space
The action space from which the agent can select actions.
- Type:
gym.spaces.Box
- obs_values
Similarly to state_values, the values to include in the observation vectors. If a customized observation() function is provided, obs_values is None.
- Type:
list of str or None
- observation_space
The observation space from which observation vectors are constructed.
- Type:
gym.spaces.Box
- observation_N
The number of observation variables.
- Type:
int
- terminated
True if a terminal state has been reached (i.e., if the network collapsed); False otherwise.
- Type:
bool
- timestep
The current timestep.
- Type:
int
- state
The current state vector \(s_t\).
- Type:
numpy.ndarray
- e_loss
The energy loss during the last transition (part of the reward signal).
- Type:
float
- penalty
The penalty associated with violating operational constraints during the last transition (part of the reward signal).
- Type:
float
- costs_clipping
The clipping values for the costs (i.e., negative rewards), where costs_clipping[0] is the clipping value for the absolute energy loss and costs_clipping[1] is the clipping value for the constraint-violation penalty.
- Type:
tuple of float
- pfe_converged
True if the last transition converged to a load flow solution (i.e., the network is stable); False otherwise.
- Type:
bool
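To make the roles of delta_t, lamb and costs_clipping concrete, here is a minimal sketch of how a reward could be assembled from e_loss and penalty. It assumes the reward takes the form r_t = -(clipped energy loss + clipped lamb * penalty); both helper names are hypothetical and not part of gym-anm, and the exact formula used inside ANMEnv.step() may differ.

```python
def energy_from_power(p_loss_mw: float, delta_t: float) -> float:
    """Energy (MWh) dissipated when a power loss of p_loss_mw (MW) lasts delta_t hours."""
    return p_loss_mw * delta_t


def clipped_reward(e_loss, penalty, lamb, costs_clipping=None):
    """Hypothetical helper (not gym-anm API): r_t = -(clip(e_loss) + clip(lamb * penalty)).

    Assumes costs_clipping[0] bounds |e_loss| (keeping its sign) and
    costs_clipping[1] caps the weighted penalty lamb * penalty.
    """
    max_loss, max_penalty = costs_clipping if costs_clipping is not None else (None, None)

    loss_cost = e_loss
    if max_loss is not None:
        # Clip the absolute energy loss to max_loss, preserving its sign.
        loss_cost = max(-max_loss, min(e_loss, max_loss))

    penalty_cost = lamb * penalty
    if max_penalty is not None:
        # The constraint-violation penalty is non-negative, so only cap it from above.
        penalty_cost = min(penalty_cost, max_penalty)

    return -(loss_cost + penalty_cost)
```

For example, with 15-minute time steps (delta_t = 0.25), a 2 MW loss gives e_loss = 0.5 MWh; with lamb = 100 and costs_clipping = (1, 100), a penalty of 2.0 is capped at 100, so the reward is -100.5.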
- __init__(network, observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=None, seed=None)
- Parameters:
network (dict of {str : numpy.ndarray}) – The network input dictionary describing the power grid.
observation (callable or list or str) – The observation space. It can be specified as “state” to construct a fully observable environment (\(o_t = s_t\)); as a callable function such that \(o_t = observation(s_t)\); or as a list of tuples (x, y, z) that refer to the electrical quantities x (str) at the nodes/branches/devices y (list or ‘all’) in unit z (str, optional).
K (int) – The number of auxiliary variables.
delta_t (float) – The interval of time between two consecutive time steps (as a fraction of an hour).
gamma (float) – The discount factor in [0, 1].
lamb (int or float) – The factor multiplying the penalty associated with violating operational constraints (used in the reward signal).
aux_bounds (numpy.ndarray, optional) – The bounds on the auxiliary internal variables as a 2D array of shape (K, 2), where the auxiliary variable with (0-based) index k is bounded by [aux_bounds[k, 0], aux_bounds[k, 1]]. This can be useful if auxiliary variables are to be included in the observation vectors and a bounded observation space is desired.
costs_clipping (tuple of float, optional) – The clipping values for the costs in the reward signal, where element 0 is the clipping value for the energy loss cost and element 1 is the clipping value for the constraint-violation penalty (e.g., (1, 100)).
seed (int, optional) – A random seed.
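The observation parameter accepts three forms. The following sketch is a hypothetical validator (not part of gym-anm) that classifies an argument according to the rules described above, which can help catch malformed specifications before constructing an environment. The quantity names and units in the usage example ("bus_p", "dev_q", "MW") are assumptions modeled on gym-anm naming; check the library documentation for the accepted keys.

```python
def describe_observation_spec(observation):
    """Hypothetical helper: classify an `observation` argument as documented for ANMEnv."""
    if isinstance(observation, str) and observation == "state":
        return "state"  # fully observable: o_t = s_t
    if callable(observation):
        return "callable"  # o_t = observation(s_t)
    if isinstance(observation, list):
        for entry in observation:
            # Each entry is (x, y) or (x, y, z); the unit z is optional.
            if not isinstance(entry, tuple) or len(entry) not in (2, 3):
                raise ValueError(f"expected (x, y) or (x, y, z) tuple, got {entry!r}")
            x, y = entry[0], entry[1]
            if not isinstance(x, str):
                raise ValueError(f"quantity name must be a str, got {x!r}")
            if not (y == "all" or isinstance(y, list)):
                raise ValueError(f"y must be a list of IDs or 'all', got {y!r}")
            if len(entry) == 3 and not isinstance(entry[2], str):
                raise ValueError(f"unit must be a str, got {entry[2]!r}")
        return "list"
    raise TypeError(f"unsupported observation specification: {observation!r}")


spec = [("bus_p", [0, 1], "MW"), ("dev_q", "all")]  # assumed key names
print(describe_observation_spec(spec))  # list
```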
Methods
__init__(network, observation, K, delta_t, ...)
close(): Close the rendering of the environment (to be overridden).
get_wrapper_attr(name): Gets the attribute name from the environment.
has_wrapper_attr(name): Checks if the attribute name exists in the environment.
init_state(): Sample an initial state \(s_0\).
next_vars(s_t): Sample internal variables.
observation(s_t): Returns the observation vector corresponding to the current state \(s_t\).
observation_bounds(): Builds the observation space of the environment.
render([mode]): Update the rendering of the environment (to be overridden).
reset(*[, seed, options]): Reset the environment.
set_wrapper_attr(name, value): Sets the attribute name on the environment with value.
step(action): Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).
Attributes
metadata
np_random: Returns the environment's internal _np_random that, if not set, will initialise with a random seed.
np_random_seed: Returns the environment's internal _np_random_seed that, if not set, will first initialise with a random int as seed.
spec
unwrapped: Returns the base non-wrapped environment.
- close()
Close the rendering of the environment (to be overridden).
- Raises:
NotImplementedError –
- get_wrapper_attr(name: str) Any
Gets the attribute name from the environment.
- has_wrapper_attr(name: str) bool
Checks if the attribute name exists in the environment.
- init_state()
Sample an initial state \(s_0\).
For reproducibility, the random number generator self._np_random (a numpy.random.Generator, also exposed through the np_random property) should be used to generate random numbers.
- Returns:
An initial state vector \(s_0\).
- Return type:
numpy.ndarray
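The reproducibility advice above amounts to drawing every random number from the environment's own generator. The sketch below illustrates the idea with the standard library's random.Random standing in for self._np_random, so it runs without gym-anm or numpy. The function name and the uniform [-1, 1] range are hypothetical; a real init_state() must return a state that is physically consistent with the network.

```python
import random


def sample_initial_state(rng: random.Random, state_n: int,
                         low: float = -1.0, high: float = 1.0) -> list:
    """Hypothetical init_state() body: draw s_0 using only the supplied generator."""
    # Drawing every random number from `rng` (and never from the global
    # `random` module) is what makes s_0 reproducible for a given seed.
    return [rng.uniform(low, high) for _ in range(state_n)]


s0_a = sample_initial_state(random.Random(42), state_n=5)
s0_b = sample_initial_state(random.Random(42), state_n=5)
print(s0_a == s0_b)  # True: same seed, same initial state
```

In an actual subclass, the equivalent draw with a numpy Generator would be along the lines of self.np_random.uniform(low, high, size=self.state_N).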
- next_vars(s_t)
Sample internal variables.
- Parameters:
s_t (numpy.ndarray) – The current state vector \(s_t\).
- Returns:
The internal variables for the next timestep, following the structure \([P_l, P_g^{(max)}, aux^{(k)}]\), where \(P_l\) contains the load injections (ordered by device ID), \(P_g^{(max)}\) the maximum generation from non-slack generators (ordered by device ID), and \(aux^{(k)}\) the auxiliary variables. The vector shape should be (N_load + (N_generators-1) + K,).
- Return type:
numpy.ndarray
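The layout of the returned vector is easy to get wrong, so here is a minimal sketch of a next_vars()-style sampler that concatenates the three blocks in the documented order. It is hypothetical: the function name, the sampling ranges, and the assumption that load injections are non-positive are illustrative only, it ignores s_t (a real implementation may, for example, derive the time of day from it), and it returns a plain list rather than a numpy.ndarray to stay dependency-free.

```python
import random


def sample_next_vars(rng, n_load, n_generators, aux_bounds,
                     load_range=(-10.0, 0.0), gen_max_range=(0.0, 30.0)):
    """Hypothetical next_vars() body returning [P_l, P_g^(max), aux^(k)]."""
    # Load injections, one per load, ordered by device ID (assumed non-positive here).
    p_load = [rng.uniform(*load_range) for _ in range(n_load)]
    # Maximum generation for every generator except the slack generator.
    p_gen_max = [rng.uniform(*gen_max_range) for _ in range(n_generators - 1)]
    # One auxiliary variable per row of aux_bounds, drawn within [low, high].
    aux = [rng.uniform(low, high) for low, high in aux_bounds]
    return p_load + p_gen_max + aux


v = sample_next_vars(random.Random(0), n_load=3, n_generators=3, aux_bounds=[(0, 95)])
print(len(v))  # 6 == 3 + (3 - 1) + 1
```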
- observation(s_t)
Returns the observation vector corresponding to the current state \(s_t\).
Alternatively, this method can be overridden in customized environments.
- Parameters:
s_t (numpy.ndarray) – The current state vector \(s_t\).
- Returns:
The corresponding observation vector \(o_t\).
- Return type:
numpy.ndarray
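Whether passed as the observation argument or written as an override of this method, the mapping is simply \(o_t = observation(s_t)\). The sketch below builds a hypothetical partially observable mapping that reveals only selected state entries, optionally rescaled. The factory name and scale parameter are illustrative; lists are used instead of numpy arrays, where s_t[indices] * scale would do the same.

```python
def make_partial_observation(indices, scale=1.0):
    """Hypothetical factory for a callable `observation` argument: o_t = observation(s_t)."""
    def observation(s_t):
        # Expose only the selected state entries, optionally rescaled.
        return [scale * s_t[i] for i in indices]
    return observation


obs_fn = make_partial_observation([0, 2], scale=0.5)
print(obs_fn([4.0, 1.0, -2.0]))  # [2.0, -1.0]
```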
- observation_bounds()
Builds the observation space of the environment.
If the observation space is specified as a callable object, then its bounds are set to \((-\infty, \infty)^{N_o}\) by default, where \(N_o\) is the number of observation variables (this is done during the reset() call, as the size of the observation vectors is not known before then). Alternatively, the user can specify their own bounds by overriding this method in a subclass.
- Returns:
The bounds of the observation space.
- Return type:
gym.spaces.Box or None
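The default behaviour for a callable observation can be sketched as follows: the bounds can only be built once a first observation reveals \(N_o\). This hypothetical helper returns plain lists of infinite bounds; with Gymnasium installed, the corresponding space would be gymnasium.spaces.Box(low=-np.inf, high=np.inf, shape=(n_o,)).

```python
import math


def default_observation_bounds(first_obs):
    """Hypothetical sketch of the default (unbounded) bounds for a callable observation."""
    # N_o is only known once a first observation vector has been produced,
    # which is why the default bounds are built during reset().
    n_o = len(first_obs)
    low = [-math.inf] * n_o
    high = [math.inf] * n_o
    return low, high


low, high = default_observation_bounds([0.1, 0.2, 0.3])
print(len(low), low[0], high[0])  # 3 -inf inf
```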
- render(mode='human')
Update the rendering of the environment (to be overridden).
- Raises:
NotImplementedError –
- reset(*, seed: int | None = None, options: dict | None = None)
Reset the environment.
If the observation space is provided as a callable object but the observation_bounds() method is not overridden, then the bounds on the observation space are set to (-np.inf, np.inf) here (after the size of the observation vectors is known).
- Parameters:
seed (int, optional) – A random seed for reproducibility.
options (dict, optional) – A dictionary of options to pass to the environment.
- Returns:
obs – The initial observation vector.
- Return type:
numpy.ndarray
- set_wrapper_attr(name: str, value: Any)
Sets the attribute name on the environment with value.
- step(action)
Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).
- Parameters:
action (numpy.ndarray) – The action vector \(a_t\) taken by the agent.
- Returns:
obs (numpy.ndarray) – The observation vector \(o_{t+1}\).
reward (float) – The reward associated with the transition \(r_t\).
terminated (bool) – True if a terminal state has been reached; False otherwise.
truncated (bool) – True if the episode was truncated; False otherwise. Always False here.
info (dict) – A dictionary with further information (used for debugging).
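Because truncated is always False, the caller is responsible for capping the episode length; terminated only becomes True if the network collapses. The sketch below shows a rollout loop that consumes the documented 5-tuple and accumulates the discounted return \(\sum_t \gamma^t r_t\). ToyEnv is a hypothetical stand-in with the same reset()/step() signatures (its constant reward is for illustration only), and reset() is assumed to return only the observation, as documented above; if your version returns (obs, info), unpack accordingly.

```python
class ToyEnv:
    """Hypothetical stand-in exposing the same reset()/step() signatures as ANMEnv."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.timestep = 0

    def reset(self, *, seed=None, options=None):
        self.timestep = 0
        return [0.0]  # initial observation o_0

    def step(self, action):
        self.timestep += 1
        obs = [float(self.timestep)]
        reward = -1.0  # constant cost per step, for illustration only
        terminated = self.timestep >= self.horizon
        truncated = False
        return obs, reward, terminated, truncated, {}


def discounted_return(env, policy, gamma, max_steps=1000):
    """Roll out one episode and accumulate sum_t gamma^t * r_t."""
    obs = env.reset(seed=0)
    total, discount = 0.0, 1.0
    for _ in range(max_steps):  # explicit cap, since truncated is never True
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += discount * reward
        discount *= gamma
        if terminated or truncated:
            break
    return total


print(discounted_return(ToyEnv(horizon=3), policy=lambda o: [0.0], gamma=0.5))  # -1.75
```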
- property np_random: Generator
Returns the environment’s internal _np_random that, if not set, will initialise with a random seed.
- Returns:
Instances of np.random.Generator
- property np_random_seed: int
Returns the environment’s internal _np_random_seed that, if not set, will first initialise with a random int as seed.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
int: the seed of the current np_random or -1, if the seed of the rng is unknown
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped gymnasium.Env instance