gym_anm.envs.anm6_env.anm6.ANM6
- class gym_anm.envs.anm6_env.anm6.ANM6(observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=(None, None), seed=None)
Bases: ANMEnv
The base class for a 6-bus and 7-device gym-anm environment.
The structure of the electricity distribution network used for this environment is shown below:
    Slack ----------------------------
            |            |           |
          -----       -------      -----
         |     |     |       |    |     |
        House  PV  Factory  Wind  EV   DES
This environment supports web-based rendering through the functions render() and close().
- __init__(observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=(None, None), seed=None)
- Parameters:
network (dict of {str : numpy.ndarray}) – The network input dictionary describing the power grid.
observation (callable or list or str) – The observation space. It can be specified as “state” to construct a fully observable environment (\(o_t = s_t\)); as a callable function such that \(o_t = observation(s_t)\); or as a list of tuples (x, y, z) that refer to the electrical quantities x (str) at the nodes/branches/devices y (list or ‘all’) in unit z (str, optional).
K (int) – The number of auxiliary variables.
delta_t (float) – The interval of time between two consecutive time steps (fraction of hour).
gamma (float) – The discount factor in [0, 1].
lamb (int or float) – The factor multiplying the penalty associated with violating operational constraints (used in the reward signal).
aux_bounds (numpy.ndarray, optional) – The bounds on the auxiliary internal variables as a 2D array in which row k holds the lower and upper bounds of one auxiliary variable, [aux_bounds[k, 0], aux_bounds[k, 1]]. This can be useful if auxiliary variables are to be included in the observation vectors and a bounded observation space is desired.
costs_clipping (tuple of float, optional) – The clipping values for the costs in the reward signal, where element 0 is the clipping value for the energy loss cost and element 1 is the clipping value for the constraint-violation penalty (e.g., (1, 100)).
seed (int, optional) – A random seed.
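The roles of lamb and costs_clipping in the reward signal can be illustrated with a small library-free sketch. It is not the gym-anm implementation: the function names, the symmetric clipping of the energy loss, and the choice to apply lamb before clipping the penalty are all assumptions made for illustration.

```python
def clip_magnitude(value, limit):
    """Clip value to [-limit, limit]; a limit of None disables clipping."""
    if limit is None:
        return value
    return max(-limit, min(limit, value))


def sketch_reward(energy_loss, penalty, lamb, costs_clipping=(None, None)):
    """Hypothetical reward: the negative sum of the two clipped cost terms."""
    # Element 0 of costs_clipping bounds the energy loss cost.
    loss_cost = clip_magnitude(energy_loss, costs_clipping[0])
    # Assumption: lamb scales the penalty first, then element 1 bounds it.
    penalty_cost = clip_magnitude(lamb * penalty, costs_clipping[1])
    return -(loss_cost + penalty_cost)


# No constraint violation and no clipping: only the energy loss is charged.
print(sketch_reward(0.4, 0.0, lamb=100))
# A large violation whose scaled penalty is capped by costs_clipping[1].
print(sketch_reward(0.4, 2.5, lamb=100, costs_clipping=(1, 100)))
```

With costs_clipping=(None, None), the default, neither term is bounded in this sketch, so a single large constraint violation can dominate the reward.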
Methods
__init__(observation, K, delta_t, gamma, lamb)
close() – Close the rendering.
get_wrapper_attr(name) – Gets the attribute name from the environment.
has_wrapper_attr(name) – Checks if the attribute name exists in the environment.
init_state() – Sample an initial state \(s_0\).
next_vars(s_t) – Sample internal variables.
observation(s_t) – Returns the observation vector corresponding to the current state \(s_t\).
observation_bounds() – Builds the observation space of the environment.
render([mode, skip_frames]) – Render the current state of the environment.
reset(*[, seed, options]) – Reset the environment.
reset_date(date_init) – Reset the date displayed in the visualization (and the year count).
set_wrapper_attr(name, value) – Sets the attribute name on the environment with value.
step(action) – Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).
Attributes
metadata
np_random – Returns the environment's internal _np_random that if not set will initialise with a random seed.
np_random_seed – Returns the environment's internal _np_random_seed that if not set will first initialise with a random int as seed.
render_mode
spec
unwrapped – Returns the base non-wrapped environment.
action_space
observation_space
- get_wrapper_attr(name: str) → Any
Gets the attribute name from the environment.
- has_wrapper_attr(name: str) → bool
Checks if the attribute name exists in the environment.
- init_state()
Sample an initial state \(s_0\).
For reproducibility, the RandomState self._np_random should be used to generate random numbers.
- Returns:
An initial state vector \(s_0\).
- Return type:
numpy.ndarray
- next_vars(s_t)
Sample internal variables.
- Parameters:
s_t (numpy.ndarray) – The current state vector \(s_t\).
- Returns:
The internal variables for the next timestep, following the structure \([P_l, P_g^{(max)}, aux^{(k)}]\), where \(P_l\) contains the load injections (ordered by device ID), \(P_g^{(max)}\) the maximum generation from non-slack generators (ordered by device ID), and \(aux^{(k)}\) the auxiliary variables. The vector shape should be (N_load + (N_generators-1) + K,).
- Return type:
numpy.ndarray
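The layout of the vector returned by next_vars() can be sketched as follows. This is a library-free illustration using plain lists in place of numpy.ndarray; the helper names, the device counts, and the numeric values (including the sign convention for loads) are hypothetical.

```python
def assemble_next_vars(p_load, p_gen_max, aux):
    """Concatenate [P_l, P_g^(max), aux] in the documented order."""
    return list(p_load) + list(p_gen_max) + list(aux)


def expected_length(n_load, n_generators, k):
    """Length N_load + (N_generators - 1) + K; the slack generator is excluded."""
    return n_load + (n_generators - 1) + k


# Hypothetical values: 3 loads, 3 generators (one of them the slack), K = 1.
p_load = [-1.0, -4.0, -2.5]   # load injections, ordered by device ID
p_gen_max = [3.0, 12.0]       # maximum generation of the non-slack generators
aux = [0.0]                   # one auxiliary variable

next_vars = assemble_next_vars(p_load, p_gen_max, aux)
assert len(next_vars) == expected_length(n_load=3, n_generators=3, k=1)
print(next_vars)
```

A length check of this kind inside a custom next_vars() catches ordering or counting mistakes before they reach the simulator.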
- observation(s_t)
Returns the observation vector corresponding to the current state \(s_t\).
This function can be overridden in customized environments.
- Parameters:
s_t (numpy.ndarray) – The current state vector \(s_t\).
- Returns:
The corresponding observation vector \(o_t\).
- Return type:
numpy.ndarray
- observation_bounds()
Builds the observation space of the environment.
If the observation space is specified as a callable object, then its bounds are set to (-np.inf, np.inf)^{N_o} by default (this is done during the reset() call, as the size of observation vectors is not known before then). Alternatively, the user can specify their own bounds by overwriting this function in a subclass.
- Returns:
The bounds of the observation space.
- Return type:
gym.spaces.Box or None
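The default bounds described above amount to one unbounded interval per observation component. A minimal sketch, using plain lists and math.inf rather than numpy and gym.spaces.Box; the function names are hypothetical and not part of gym-anm:

```python
import math


def default_observation_bounds(n_obs):
    """Return (low, high) lists describing (-inf, inf)^{n_obs}."""
    low = [-math.inf] * n_obs
    high = [math.inf] * n_obs
    return low, high


def within_bounds(obs, low, high):
    """Check that every component of obs lies inside its interval."""
    return all(lo <= x <= hi for x, lo, hi in zip(obs, low, high))


# The size is only known once a first observation vector exists,
# which is why the documented default is applied during reset().
first_obs = [0.2, -3.1, 41.0]
low, high = default_observation_bounds(len(first_obs))
print(within_bounds(first_obs, low, high))
```

A subclass that knows tighter limits would return finite values instead, which matters for agents that normalize observations using the space bounds.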
- render(mode='human', skip_frames=0)
Render the current state of the environment.
Rendering every agent-environment interaction in real time (e.g., during training) is hard to follow and of limited use, because the state of the distribution network changes too quickly (you can try with mode='human' and skip_frames=0). Setting skip_frames>0 instead updates the rendering only every skip_frames+1 steps (assuming render(skip_frames) is called after every step), which is much easier to follow.
- Parameters:
mode ({'human'}, optional) – The mode of rendering. If ‘human’, the environment is rendered while the agent interacts with it.
skip_frames (int, optional) – The number of frames (steps) to skip when rendering the environment. For example, skip_frames=3 will update the rendering of the environment every 4 calls to render().
- Raises:
NotImplementedError – If a non-valid mode is specified.
Notes
The use of mode='human' and skip_frames>0 assumes that render() is called after each step the agent takes in the environment. The same behavior can be achieved with skip_frames=0 and calling render() less frequently.
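The frame-skipping behavior can be sketched with a simple call counter. This is an illustration of the documented rule (one update every skip_frames+1 calls), not the gym-anm rendering code; the class name is hypothetical, and having the very first call trigger an update is an assumption.

```python
class FrameSkipper:
    """Decide which render() calls actually refresh the display."""

    def __init__(self, skip_frames=0):
        if skip_frames < 0:
            raise ValueError("skip_frames must be non-negative")
        self.skip_frames = skip_frames
        self._calls = 0

    def should_update(self):
        """Return True once every skip_frames + 1 calls, starting with the first."""
        update = self._calls % (self.skip_frames + 1) == 0
        self._calls += 1
        return update


# With skip_frames=3, one call in four refreshes the display.
skipper = FrameSkipper(skip_frames=3)
updates = [skipper.should_update() for _ in range(8)]
print(updates)
```

With skip_frames=0 every call updates, which matches the alternative mentioned in the note: keep skip_frames=0 and call render() less often.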
- reset(*, seed: int | None = None, options: dict | None = None)
Reset the environment.
If the observation space is provided as a callable object but the observation_bounds() method is not overwritten, then the bounds on the observation space are set to (-np.inf, np.inf) here (after the size of the observation vectors is known).
- Parameters:
seed (int, optional) – A random seed for reproducibility.
options (dict, optional) – A dictionary of options to pass to the environment.
- Returns:
obs – The initial observation vector.
- Return type:
numpy.ndarray
- set_wrapper_attr(name: str, value: Any)
Sets the attribute name on the environment with value.
- step(action)
Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).
- Parameters:
action (numpy.ndarray) – The action vector \(a_t\) taken by the agent.
- Returns:
obs (numpy.ndarray) – The observation vector \(o_{t+1}\).
reward (float) – The reward associated with the transition \(r_t\).
terminated (bool) – True if a terminal state has been reached; False otherwise.
truncated (bool) – True if the episode was truncated; False otherwise. Always False here.
info (dict) – A dictionary with further information (used for debugging).
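The five return values of step() fit into a standard interaction loop. The sketch below uses a stand-in environment so that it runs without gym-anm installed; StubEnv, its dynamics, its gamma attribute, and discounted_return are hypothetical. Only the shapes of the reset() and step() return values follow the descriptions on this page.

```python
import random


class StubEnv:
    """Stand-in exposing reset()/step() with the documented return shapes (not ANM6)."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self._rng = random.Random()

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self._rng = random.Random(seed)  # reseed for reproducibility
        return [0.0]                         # initial observation vector

    def step(self, action):
        obs = [self._rng.random()]
        reward = -abs(action - obs[0])       # toy reward, unrelated to ANM6
        terminated = False
        truncated = False                    # documented as always False
        info = {}
        return obs, reward, terminated, truncated, info


def discounted_return(env, policy, n_steps, seed=None):
    """Run one rollout and accumulate the sum of gamma^t * r_t."""
    obs = env.reset(seed=seed)
    total, discount = 0.0, 1.0
    for _ in range(n_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += discount * reward
        discount *= env.gamma
        if terminated or truncated:
            break
    return total


print(discounted_return(StubEnv(gamma=0.5), lambda obs: 0.5, n_steps=10, seed=1))
```

Checking both terminated and truncated keeps the loop correct even though truncated is documented as always False for this environment.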
- property np_random: Generator
Returns the environment’s internal _np_random that if not set will initialise with a random seed.
- Returns:
Instances of np.random.Generator
- property np_random_seed: int
Returns the environment’s internal _np_random_seed that if not set will first initialise with a random int as seed.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
int: the seed of the current np_random or -1, if the seed of the rng is unknown
- property unwrapped: Env[ObsType, ActType]
Returns the base non-wrapped environment.
- Returns:
Env: The base non-wrapped gymnasium.Env instance