gym_anm.envs.anm6_env.anm6.ANM6
- class gym_anm.envs.anm6_env.anm6.ANM6(observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=(None, None), seed=None)[source]
Bases:
ANMEnv
The base class for a 6-bus and 7-device
gym-anm
environment. The structure of the electricity distribution network used for this environment is shown below:
- [Network diagram: a slack bus supplies three feeders, connecting respectively a residential load (House) with a PV generator, an industrial load (Factory) with a wind generator, and an EV charging station with a distributed energy storage (DES) unit.]
This environment supports web-based rendering through the functions
render()
and close().
- __init__(observation, K, delta_t, gamma, lamb, aux_bounds=None, costs_clipping=(None, None), seed=None)[source]
- Parameters:
network (dict of {str : numpy.ndarray}) – The network input dictionary describing the power grid.
observation (callable or list or str) – The observation space. It can be specified as “state” to construct a fully observable environment (\(o_t = s_t\)); as a callable function such that \(o_t = observation(s_t)\); or as a list of tuples (x, y, z) that refer to the electrical quantities x (str) at the nodes/branches/devices y (list or ‘all’) in unit z (str, optional).
K (int) – The number of auxiliary variables.
delta_t (float) – The interval of time between two consecutive time steps (fraction of hour).
gamma (float) – The discount factor in [0, 1].
lamb (int or float) – The factor multiplying the penalty associated with violating operational constraints (used in the reward signal).
aux_bounds (numpy.ndarray, optional) – The bounds on the auxiliary internal variables, given as a 2D array where the \(k^{th}\) auxiliary variable (0-indexed) is bounded by
[aux_bounds[k, 0], aux_bounds[k, 1]]
. This can be useful if auxiliary variables are to be included in the observation vectors and a bounded observation space is desired.
costs_clipping (tuple of float, optional) – The clipping values for the costs in the reward signal, where element 0 is the clipping value for the energy loss cost and element 1 is the clipping value for the constraint-violation penalty (e.g., (1, 100)).
seed (int, optional) – A random seed.
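For illustration, the list-of-tuples form of the observation parameter could look as follows. This is a sketch: the quantity names and device/bus IDs below are assumptions chosen for the example, not values taken from this page.

```python
# A hypothetical observation specification: each tuple (x, y, z) selects an
# electrical quantity x at the nodes/branches/devices y, with an optional
# unit z as the third element.
observation_spec = [
    ("dev_p", "all", "MW"),           # active power injection of every device
    ("bus_v_magn", [1, 2, 3], "pu"),  # voltage magnitude at buses 1-3
    ("des_soc", [6], "MWh"),          # state of charge of a storage unit
]

def describe(spec):
    """Return human-readable descriptions of an observation spec."""
    out = []
    for item in spec:
        quantity, where, *unit = item
        unit = unit[0] if unit else "default unit"
        out.append(f"{quantity} at {where} in {unit}")
    return out
```

Each tuple maps directly onto the "(x, y, z)" convention described above; passing "state" instead would make the environment fully observable.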
Methods
- __init__(observation, K, delta_t, gamma, lamb[, aux_bounds, costs_clipping, seed]) – Initialize the environment.
- close() – Close the rendering.
- init_state() – Sample an initial state \(s_0\).
- next_vars(s_t) – Sample internal variables.
- observation(s_t) – Returns the observation vector corresponding to the current state \(s_t\).
- observation_bounds() – Builds the observation space of the environment.
- render([mode, skip_frames]) – Render the current state of the environment.
- reset([date_init]) – Reset the environment.
- reset_date(date_init) – Reset the date displayed in the visualization (and the year count).
- seed([seed]) – Seed the random number generator.
- step(action) – Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).
Attributes
- metadata
- np_random – Returns the environment's internal _np_random; if not set, it is initialised with a random seed.
- render_mode
- reward_range
- spec
- unwrapped – Returns the base non-wrapped environment.
- action_space
- observation_space
- init_state()
Sample an initial state \(s_0\).
For reproducibility, the RandomState
self.np_random
should be used to generate random numbers.
- Returns:
An initial state vector \(s_0\).
- Return type:
numpy.ndarray
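A minimal sketch of the seeding pattern used by init_state() implementations. The class below is a stand-in for an ANM6 subclass, and its state dimension and bounds are placeholders, not the actual ANM6 state structure.

```python
import numpy as np

class MyEnvSketch:
    """Stand-in for an ANM6 subclass, showing only the seeding pattern."""
    def __init__(self, seed=None):
        # gym-anm docs say self.np_random (a RandomState) should be used
        # so that sampled initial states are reproducible.
        self.np_random = np.random.RandomState(seed)
        self.state_size = 18  # placeholder dimension, not the real one

    def init_state(self):
        # Draw s_0 from self.np_random so identically-seeded envs agree.
        return self.np_random.uniform(-1.0, 1.0, size=self.state_size)

env_a, env_b = MyEnvSketch(seed=0), MyEnvSketch(seed=0)
```

Because both environments share the seed, their sampled \(s_0\) vectors are identical.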
- next_vars(s_t)
Sample internal variables.
- Parameters:
s_t (numpy.ndarray) – The current state vector \(s_t\).
- Returns:
The internal variables for the next timestep, following the structure \([P_l, P_g^{(max)}, aux^{(k)}]\), where \(P_l\) contains the load injections (ordered by device ID), \(P_g^{(max)}\) the maximum generation from non-slack generators (ordered by device ID), and \(aux^{(k)}\) the auxiliary variables. The vector shape should be (N_load + (N_generators-1) + K,).
- Return type:
numpy.ndarray
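The documented vector layout can be checked with a quick sketch. The device counts and the load/generation values below are made up for illustration; only the concatenation order and the resulting shape follow the description above.

```python
import numpy as np

N_load, N_generators, K = 3, 3, 1  # hypothetical counts (generators incl. slack)

def next_vars_sketch(s_t):
    """Build a vector [P_l, P_g_max, aux] with the documented layout."""
    P_l = -np.abs(np.sin(s_t[:N_load]))         # load injections (non-positive)
    P_g_max = np.full(N_generators - 1, 30.0)   # non-slack max generation
    aux = np.zeros(K)                           # auxiliary variables
    return np.concatenate([P_l, P_g_max, aux])

vars_t = next_vars_sketch(np.arange(6, dtype=float))
```

The resulting vector has shape (N_load + (N_generators-1) + K,), i.e. (6,) with these placeholder counts.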
- observation(s_t)
Returns the observation vector corresponding to the current state \(s_t\).
Alternatively, this function can be overwritten in customized environments.
- Parameters:
s_t (numpy.ndarray) – The current state vector \(s_t\).
- Returns:
The corresponding observation vector \(o_t\).
- Return type:
numpy.ndarray
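When the observation space is specified as a callable, that callable maps the full state \(s_t\) to \(o_t\). A sketch, in which the choice of which entries to expose is arbitrary:

```python
import numpy as np

def partial_observation(s_t):
    """Hypothetical observation callable: expose only the first four entries."""
    return s_t[:4].copy()

s_t = np.linspace(0.0, 1.0, 10)  # placeholder 10-dimensional state
o_t = partial_observation(s_t)
```

Passing such a function as the observation argument yields a partially observable environment, in contrast to observation="state", for which \(o_t = s_t\).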
- observation_bounds()
Builds the observation space of the environment.
If the observation space is specified as a callable object, then its bounds are set to
\((-\infty, \infty)^{N_o}\)
by default (this is done during the reset()
call, as the size of observation vectors is not known before then). Alternatively, the user can specify their own bounds by overwriting this function in a subclass.
- Returns:
The bounds of the observation space.
- Return type:
gym.spaces.Box or None
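The default unbounded box can be reproduced as follows. To keep the sketch dependency-free, plain numpy arrays stand in for the gym.spaces.Box that the real method returns.

```python
import numpy as np

def default_observation_bounds(n_obs):
    """Mimic the default (-inf, inf)^{N_o} bounds on the observation space."""
    low = np.full(n_obs, -np.inf)   # lower bound of every observation entry
    high = np.full(n_obs, np.inf)   # upper bound of every observation entry
    return low, high

low, high = default_observation_bounds(5)  # n_obs = 5 is a placeholder
```

A subclass overriding observation_bounds() would instead return finite per-entry bounds, which is what makes auxiliary variables usable inside a bounded observation space.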
- render(mode='human', skip_frames=0)[source]
Render the current state of the environment.
Visualizing the agent-environment interactions in real time (e.g., during training) is hard to follow and not very useful, as the state of the distribution network changes too quickly (you can try with
mode='human'
and skip_frames=0). Instead, setting skip_frames > 0
will update the rendering of the environment only every skip_frames + 1 steps (assuming render(skip_frames) is called after every step), which makes it much easier to follow for the human eye.
- Parameters:
mode ({'human'}, optional) – The mode of rendering. If ‘human’, the environment is rendered while the agent interacts with it.
skip_frames (int, optional) – The number of frames (steps) to skip when rendering the environment. For example,
skip_frames=3
will update the rendering of the environment every 4 calls to render().
- Raises:
NotImplementedError – If a non-valid mode is specified.
Notes
The use of
mode='human'
and skip_frames > 0
assumes that render() is called after each step the agent takes in the environment. The same behavior can be achieved with skip_frames=0
and calling render() less frequently.
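The skip_frames cadence can be illustrated with a simple counter. This is a sketch of the bookkeeping implied by the description above, not the library's actual rendering code.

```python
class RenderCadence:
    """Count how often render() with skip_frames > 0 actually redraws."""
    def __init__(self, skip_frames=0):
        self.skip_frames = skip_frames
        self.calls = 0
        self.redraws = 0

    def render(self):
        # Redraw on the first call and then every (skip_frames + 1) calls,
        # matching "skip_frames=3 updates the rendering every 4 calls".
        if self.calls % (self.skip_frames + 1) == 0:
            self.redraws += 1
        self.calls += 1

cadence = RenderCadence(skip_frames=3)
for _ in range(12):   # render() called once after each of 12 steps
    cadence.render()
```

With skip_frames=3, twelve calls trigger a redraw on calls 1, 5, and 9 only, i.e. one redraw every 4 calls.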
- reset(date_init=None)[source]
Reset the environment.
If the observation space is provided as a callable object but the
observation_bounds()
method is not overwritten, then the bounds on the observation space are set to \((-\infty, \infty)\)
here (after the size of the observation vectors is known).
- Returns:
obs – The initial observation vector.
- Return type:
numpy.ndarray
- seed(seed=None)
Seed the random number generator.
- step(action)[source]
Take a control action and transition from state \(s_t\) to state \(s_{t+1}\).
- Parameters:
action (numpy.ndarray) – The action vector \(a_t\) taken by the agent.
- Returns:
obs (numpy.ndarray) – The observation vector \(o_{t+1}\).
reward (float) – The reward associated with the transition \(r_t\).
done (bool) – True if a terminal state has been reached; False otherwise.
info (dict) – A dictionary with further information (used for debugging).
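The (obs, reward, done, info) contract of step() implies the standard interaction loop sketched below. MockEnv is a made-up stand-in whose dynamics, horizon, and reward are placeholders; only the step() return signature mirrors the documentation above.

```python
import numpy as np

class MockEnv:
    """Minimal environment honoring the (obs, reward, done, info) contract."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(2)

    def step(self, action):
        self.t += 1
        obs = np.full(2, float(self.t))              # placeholder o_{t+1}
        reward = -float(np.sum(np.abs(action)))      # placeholder r_t
        done = self.t >= 5                           # placeholder horizon
        info = {"t": self.t}                         # debugging info
        return obs, reward, done, info

env = MockEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = np.array([0.1, -0.1])                   # fixed dummy action a_t
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

With a real gym-anm environment, the loop body is identical; only the construction of env and the action-selection policy change.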
- property np_random: Generator
Returns the environment's internal
_np_random
generator; if it is not set, it will be initialised with a random seed.
- property unwrapped: Env
Returns the base non-wrapped environment.
- Returns:
The base non-wrapped gym.Env instance.