gym_anm.envs.anm_env.ANMEnv

class gym_anm.envs.anm_env.ANMEnv(network, observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=None, seed=None)[source]

Bases: Env

The base class for gym-anm environments.

K

The number of auxiliary variables.

Type:

int

gamma

The fixed discount factor in [0, 1].

Type:

float

lamb

The factor multiplying the penalty associated with violating operational constraints (used in the reward signal).

Type:

int or float

delta_t

The interval of time between two consecutive time steps (fraction of hour).

Type:

float

simulator

The electricity distribution network simulator.

Type:

gym_anm.simulator.simulator.Simulator

state_values

The electrical quantities to include in the state vectors. Each tuple (x, y, z) refers to quantity x at nodes/devices/branches y, using units z.

Type:

list of tuple of str

state_N

The number of state variables.

Type:

int

action_space

The action space from which the agent can select actions.

Type:

gym.spaces.Box

obs_values

Similarly to state_values, the values to include in the observation vectors. If a customized observation() function is provided, obs_values is None.

Type:

list of str or None

observation_space

The observation space from which observation vectors are constructed.

Type:

gym.spaces.Box

observation_N

The number of observation variables.

Type:

int

terminated

True if a terminal state has been reached (i.e., the network collapsed); False otherwise.

Type:

bool

render_mode

The rendering mode. See render().

Type:

str

timestep

The current timestep.

Type:

int

state

The current state vector \(s_t\).

Type:

numpy.ndarray

e_loss

The energy loss during the last transition (part of the reward signal).

Type:

float

penalty

The penalty associated with violating operational constraints during the last transition (part of the reward signal).

Type:

float

costs_clipping

The clipping values for the costs (i.e., negative rewards), where costs_clipping[0] is the clipping value for the absolute energy loss and costs_clipping[1] is the clipping value for the constraint violation penalty.

Type:

tuple of float

pfe_converged

True if the last transition converged to a load flow solution (i.e., the network is stable); False otherwise.

Type:

bool
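
The attributes e_loss, penalty, lamb, and costs_clipping all feed into the reward signal. The following is a minimal sketch of how they might combine, assuming the reward is the negative sum of the clipped energy loss and the clipped, lamb-scaled penalty. This page does not state the exact formula, so the helper clipped_reward and its arithmetic are illustrative assumptions, not the library's implementation.

```python
import numpy as np


def clipped_reward(e_loss, penalty, lamb, costs_clipping):
    """Hypothetical helper: combine energy loss and penalty into a reward."""
    # Clip the magnitude of the energy loss but keep its sign (assumption).
    e_loss_clipped = float(np.sign(e_loss)) * min(abs(e_loss), costs_clipping[0])
    # Scale the penalty by lamb, then clip it to [0, costs_clipping[1]].
    penalty_clipped = float(np.clip(lamb * penalty, 0.0, costs_clipping[1]))
    # Costs are negative rewards.
    return -(e_loss_clipped + penalty_clipped)


reward = clipped_reward(e_loss=0.4, penalty=0.5, lamb=100, costs_clipping=(1, 100))
saturated = clipped_reward(e_loss=5.0, penalty=3.0, lamb=100, costs_clipping=(1, 100))
```

With costs_clipping=(1, 100), both cost terms saturate in the second call, so the reward cannot fall below -101 under this assumed formula.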

__init__(network, observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=None, seed=None)[source]
Parameters:
  • network (dict of {str : numpy.ndarray}) – The network input dictionary describing the power grid.

  • observation (callable or list or str) – The observation space. It can be specified as “state” to construct a fully observable environment (\(o_t = s_t\)); as a callable function such that \(o_t = observation(s_t)\); or as a list of tuples (x, y, z) that refer to the electrical quantities x (str) at the nodes/branches/devices y (list or ‘all’) in unit z (str, optional).

  • K (int) – The number of auxiliary variables.

  • delta_t (float) – The interval of time between two consecutive time steps (fraction of hour).

  • gamma (float) – The discount factor in [0, 1].

  • lamb (int or float) – The factor multiplying the penalty associated with violating operational constraints (used in the reward signal).

  • aux_bounds (numpy.ndarray, optional) – The bounds on the auxiliary internal variables as a 2D array of shape (K, 2), where the auxiliary variable with index k is bounded by [aux_bounds[k, 0], aux_bounds[k, 1]]. This can be useful if auxiliary variables are to be included in the observation vectors and a bounded observation space is desired.

  • costs_clipping (tuple of float, optional) – The clipping values for the costs in the reward signal, where element 0 is the clipping value for the energy loss cost and element 1 is the clipping value for the constraint-violation penalty (e.g., (1, 100)).

  • seed (int, optional) – A random seed.
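
The constructor arguments above can be prepared as in the following sketch. It shows the three documented forms of the observation argument and an aux_bounds array with one [lower, upper] row per auxiliary variable. The quantity names ("bus_v_magn", "dev_p"), the units, and all numeric values are assumptions chosen for illustration and should be checked against the gym-anm documentation. The network dictionary is omitted because its structure is not described on this page.

```python
import numpy as np

K = 2  # number of auxiliary variables (illustrative)

# Row k holds [lower, upper] for the auxiliary variable with index k.
aux_bounds = np.array([[0.0, 95.0],
                       [-1.0, 1.0]])

# Form 1: fully observable environment, o_t = s_t.
observation_full = "state"

# Form 2: list of (quantity, nodes/devices/branches, unit) tuples.
# The quantity and unit names are assumed, not taken from this page.
observation_list = [("bus_v_magn", "all", "pu"), ("dev_p", [0, 1], "MW")]


# Form 3: callable such that o_t = observation(s_t).
def observation_fn(s_t):
    return np.asarray(s_t)[:3]


# Keyword arguments for the constructor, without the `network` dictionary.
env_kwargs = dict(
    observation=observation_list,
    K=K,
    delta_t=0.25,           # 15 minutes expressed as a fraction of an hour
    gamma=0.995,            # discount factor in [0, 1]
    lamb=100,               # weight of the constraint-violation penalty
    aux_bounds=aux_bounds,
    costs_clipping=(1, 100),
    seed=0,
)
```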

Methods

__init__(network, observation, K, delta_t, ...)

close()

Close the rendering of the environment (to be overwritten).

get_wrapper_attr(name)

Gets the attribute name from the environment.

has_wrapper_attr(name)

Checks if the attribute name exists in the environment.

init_state()

Sample an initial state \(s_0\).

next_vars(s_t)

Sample internal variables.

observation(s_t)

Returns the observation vector corresponding to the current state \(s_t\).

observation_bounds()

Builds the observation space of the environment.

render([mode])

Update the rendering of the environment (to be overwritten).

reset(*[, seed, options])

Reset the environment.

set_wrapper_attr(name, value)

Sets the attribute name on the environment with value.

step(action)

Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).

Attributes

metadata

np_random

Returns the environment's internal _np_random; if it has not been set, it is initialised with a random seed.

np_random_seed

Returns the environment's internal _np_random_seed; if it has not been set, it is first initialised with a random int as seed.

render_mode

spec

unwrapped

Returns the base non-wrapped environment.

action_space

observation_space

close()[source]

Close the rendering of the environment (to be overwritten).

Raises:

NotImplementedError

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

has_wrapper_attr(name: str) bool

Checks if the attribute name exists in the environment.

init_state()[source]

Sample an initial state \(s_0\).

For reproducibility, the environment's random number generator (self.np_random, a numpy.random.Generator) should be used to generate random numbers.

Returns:

An initial state vector \(s_0\).

Return type:

numpy.ndarray
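
As a minimal sketch of the reproducibility point above, the following draws an initial state from a seeded NumPy generator. The function sample_initial_state and the uniform sampling between low and high are hypothetical. A real init_state() in a subclass would use the environment's own generator and return a state with the structure the simulator expects.

```python
import numpy as np


def sample_initial_state(rng, low, high):
    """Hypothetical sampler: draw s_0 uniformly between low and high."""
    return rng.uniform(low, high)


low = np.array([0.0, -1.0, 0.9])
high = np.array([10.0, 1.0, 1.1])

# Two generators created with the same seed yield the same initial state.
s0_a = sample_initial_state(np.random.default_rng(42), low, high)
s0_b = sample_initial_state(np.random.default_rng(42), low, high)
```

Routing all randomness through a single seeded generator is what makes two runs with the same seed produce identical trajectories.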

next_vars(s_t)[source]

Sample internal variables.

Parameters:

s_t (numpy.ndarray) – The current state vector \(s_t\).

Returns:

The internal variables for the next timestep, following the structure \([P_l, P_g^{(max)}, aux^{(k)}]\), where \(P_l\) contains the load injections (ordered by device ID), \(P_g^{(max)}\) the maximum generation from non-slack generators (ordered by device ID), and \(aux^{(k)}\) the auxiliary variables. The vector shape should be (N_load + (N_generators-1) + K,).

Return type:

numpy.ndarray
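
The layout of the returned vector can be sketched as follows. The helper pack_next_vars and the numeric values are hypothetical; only the ordering and the total length follow the description above.

```python
import numpy as np


def pack_next_vars(p_load, p_gen_max, aux):
    """Hypothetical helper: concatenate [P_l, P_g^(max), aux^(k)]."""
    return np.concatenate([
        np.asarray(p_load, dtype=float),
        np.asarray(p_gen_max, dtype=float),
        np.asarray(aux, dtype=float),
    ])


N_load, N_generators, K = 3, 3, 1
p_load = [5.0, 10.0, 2.0]   # one entry per load, ordered by device ID
p_gen_max = [30.0, 40.0]    # one entry per non-slack generator
aux = [12.0]                # K auxiliary variables

next_vars = pack_next_vars(p_load, p_gen_max, aux)
```

The slack generator contributes no entry, which is why the generator block has N_generators-1 elements.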

observation(s_t)[source]

Returns the observation vector corresponding to the current state \(s_t\).

Alternatively, this function can be overwritten in customized environments.

Parameters:

s_t (numpy.ndarray) – The current state vector \(s_t\).

Returns:

The corresponding observation vector \(o_t\).

Return type:

numpy.ndarray
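
A customized observation function maps a state vector to an observation vector. The sketch below builds one that keeps only selected state entries, giving a partially observable environment. The factory make_index_observation and the chosen indices are assumptions for illustration.

```python
import numpy as np


def make_index_observation(indices):
    """Return a callable o_t = observation(s_t) that keeps selected entries."""
    idx = np.asarray(indices, dtype=int)

    def observation(s_t):
        return np.asarray(s_t, dtype=float)[idx]

    return observation


observe = make_index_observation([0, 2])
s_t = np.array([1.0, 2.0, 3.0, 4.0])
o_t = observe(s_t)  # keeps the entries at positions 0 and 2
```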

observation_bounds()[source]

Builds the observation space of the environment.

If the observation space is specified as a callable object, then its bounds are set to \((-\infty, \infty)^{N_o}\) by default (this is done during the reset() call, as the size of observation vectors is not known before then). Alternatively, the user can specify their own bounds by overwriting this function in a subclass.

Returns:

The bounds of the observation space.

Return type:

gym.spaces.Box or None
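
The default unbounded case can be sketched with plain NumPy arrays, sized from a sample observation as described above. The helper unbounded_limits is hypothetical; arrays like these are the kind of low and high values a Box space takes, but the construction of the space itself is left out here.

```python
import numpy as np


def unbounded_limits(sample_observation):
    """Hypothetical helper: (-inf, inf) bounds matching an observation's size."""
    n_obs = np.asarray(sample_observation).shape[0]
    low = np.full(n_obs, -np.inf)
    high = np.full(n_obs, np.inf)
    return low, high


# The size is only known once an observation vector is available.
low, high = unbounded_limits(np.array([0.1, 0.2, 0.3]))
```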

render(mode='human')[source]

Update the rendering of the environment (to be overwritten).

Raises:

NotImplementedError

reset(*, seed: int | None = None, options: dict | None = None)[source]

Reset the environment.

If the observation space is provided as a callable object but the observation_bounds() method is not overwritten, then the bounds on the observation space are set to (-np.inf, np.inf) here (after the size of the observation vectors is known).

Parameters:
  • seed (int, optional) – A random seed for reproducibility.

  • options (dict, optional) – A dictionary of options to pass to the environment.

Returns:

obs – The initial observation vector.

Return type:

numpy.ndarray

set_wrapper_attr(name: str, value: Any)

Sets the attribute name on the environment with value.

step(action)[source]

Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).

Parameters:

action (numpy.ndarray) – The action vector \(a_t\) taken by the agent.

Returns:

  • obs (numpy.ndarray) – The observation vector \(o_{t+1}\).

  • reward (float) – The reward associated with the transition \(r_t\).

  • terminated (bool) – True if a terminal state has been reached; False otherwise.

  • truncated (bool) – True if the episode was truncated; False otherwise. Always False here.

  • info (dict) – A dictionary with further information (used for debugging).
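
The return values of reset() and step() described on this page suggest the interaction loop sketched below. StubEnv is a hypothetical stand-in that only mimics the documented return shapes (a single observation from reset(), a five-element tuple from step()). It is not a gym-anm environment, and a real environment may return additional values from reset().

```python
class StubEnv:
    """Hypothetical stand-in mimicking the documented reset()/step() returns."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # initial observation

    def step(self, action):
        self.t += 1
        obs = [float(self.t)]
        reward = -1.0
        terminated = self.t >= self.horizon
        truncated = False  # never truncated in this stub
        return obs, reward, terminated, truncated, {}


def discounted_return(env, policy, gamma, max_steps=100):
    """Run one episode and accumulate the sum of gamma**t * r_t."""
    obs = env.reset()
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += discount * reward
        discount *= gamma
        if terminated or truncated:
            break
    return total


ret = discounted_return(StubEnv(horizon=3), policy=lambda obs: 0, gamma=0.5)
```

Checking both terminated and truncated keeps the loop correct even for environments where truncated can become True.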

property np_random: Generator

Returns the environment’s internal _np_random; if it has not been set, it is initialised with a random seed.

Returns:

An instance of np.random.Generator

property np_random_seed: int

Returns the environment’s internal _np_random_seed; if it has not been set, it is first initialised with a random int as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

int: the seed of the current np_random or -1, if the seed of the rng is unknown

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance