Designing New Environments

The gym-anm framework was specifically designed to make it easy for users to design their own environments and ANM tasks. This page describes in detail how to do so.

Template

New environments are created by subclassing gym_anm.envs.anm_env.ANMEnv. The general template to follow is shown below.

"""
This file gives the template to follow when creating new gym-anm environments.

For more information, see https://gym-anm.readthedocs.io/en/latest/topics/design_new_env.html.
"""
from gym_anm import ANMEnv


class CustomEnvironment(ANMEnv):
    def __init__(self):
        network = {"baseMVA": ..., "bus": ..., "device": ..., "branch": ...}  # power grid specs
        observation = ...  # observation space
        K = ...  # number of auxiliary variables
        delta_t = ...  # time interval between timesteps
        gamma = ...  # discount factor
        lamb = ...  # penalty weighting hyperparameter
        aux_bounds = ...  # bounds on auxiliary variables (optional)
        costs_clipping = ...  # reward clipping parameters (optional)
        seed = ...  # random seed (optional)

        super().__init__(network, observation, K, delta_t, gamma, lamb, aux_bounds, costs_clipping, seed)

    def init_state(self):
        ...

    def next_vars(self, s_t):
        ...

    def observation_bounds(self):  # optional
        ...

    def render(self, mode="human"):  # optional
        ...

    def close(self):  # optional
        ...

where:

  • network is the network dictionary that describes the characteristics of the power grid considered (see Appendix D of the paper),

  • observation defines the observation space (see Appendix C of the paper),

  • K is the number of auxiliary variables \(K\),

  • delta_t is the time interval (in hours) between subsequent timesteps \(\Delta t\),

  • gamma is the discount factor \(\gamma \in [0, 1]\),

  • lamb is the penalty weighting hyperparameter \(\lambda\) in the reward function,

  • aux_bounds are the bounds on the auxiliary variables, specified as a 2D array (column 1 = lower bounds, column 2 = upper bounds),

  • costs_clipping is a tuple of (clip value for \(\Delta E_{t:t+1}\), clip value for \(\lambda \phi(s_{t+1})\)), with \(r_{clip}\) equal to the sum of the two clip values,

  • seed is a random seed,

  • init_state() is a method that must be overridden to return an initial state vector \(s_0 \sim p_0(\cdot)\) (it is called from env.reset()),

  • next_vars() is a method that must be overridden to return the next vector of stochastic variables, which includes (in that order):

    • the active demand \(P_{l,t+1}^{(dev)}\) of each load \(l \in \mathcal D_L\) (ordered by their device ID \(l\)),

    • the maximum generation \(P_{g,t+1}^{(max)}\) of each non-slack generator \(g \in \mathcal D_G - \{g^{slack}\}\) (ordered by their device ID \(g\)),

    • the value of each auxiliary variable \(aux^{(k)}_{t+1}\) for \(k=0,\ldots,K-1\) (ordered by their auxiliary variable ID \(k\)),

  • observation_bounds() is an optional method that can be implemented to make the observation space finite when observation is provided as a callable. In this case, gym-anm has no way to infer the bounds of the observation vectors \(o_t\), and observation_bounds() can be used to specify them (see the sketch after this list).

  • render() and close() are optional methods that can be implemented to support rendering of the environment. For more information, see the official Gym documentation.
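For example, instead of full observability (observation = "state"), the observation space can be restricted. The sketch below is illustrative only: the variable names used in the tuple format are assumptions (refer to the gym-anm documentation on observation spaces for the exact keys supported), and the callable is assumed to receive the full state vector \(s_t\) as a NumPy array.

from gym_anm import ANMEnv


class PartiallyObservableEnv(ANMEnv):
    def __init__(self):
        network = ...  # power grid specs, as in the template above

        # Option A: a list of (variable, IDs, unit) tuples (illustrative names).
        observation = [("dev_p", "all", "MW"), ("bus_v_magn", [0, 1], "pu")]

        # Option B: a callable mapping state vectors to observation vectors
        # (assumed to receive the state vector s_t).
        # observation = lambda s_t: s_t[:2]

        super().__init__(network, observation, 1, 0.25, 0.9, 100)

    def observation_bounds(self):
        # Only needed with Option B: gym-anm cannot infer the bounds of o_t
        # from a callable. The exact return type expected by ANMEnv should be
        # checked against its source code.
        ...

When observation is the string "state" or a list of tuples, gym-anm can infer the bounds itself and observation_bounds() can be omitted.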

Example

A concrete example is shown below, in which the environment SimpleEnvironment is defined for a 2-bus power grid with a single load connected at bus 1.

"""
This file contains an example of a custom gym-anm environment.

Features:
* it uses a 2-bus power grid: Slack generator (bus 0) --- Load (bus 1),
* the initial state s0 is randomly generated (see `init_state()`),
* load demands are randomly generated in [-10, 0] (see `next_vars()`),
* a random auxiliary variable is added to illustrate how auxiliary
  variables are used (it serves no purpose in this example) (see `next_vars()`).

For more information, see https://gym-anm.readthedocs.io/en/latest/topics/design_new_env.html.
"""
import numpy as np
from gym_anm import ANMEnv

"""
A 2-bus power grid with topology:
    Slack (bus 0) ---- Load (bus 1)
"""
network = {
    "baseMVA": 100,
    "bus": np.array([[0, 0, 132, 1.0, 1.0], [1, 1, 33, 1.1, 0.9]]),
    "device": np.array(
        [
            [0, 0, 0, None, 200, -200, 200, -200, None, None, None, None, None, None, None],
            [1, 1, -1, 0.2, 0, -10, None, None, None, None, None, None, None, None, None],
        ]
    ),
    "branch": np.array([[0, 1, 0.01, 0.1, 0.0, 3, 1, 0]]),
}


class SimpleEnvironment(ANMEnv):
    """An example of a simple 2-bus custom gym-anm environment."""

    def __init__(self):
        observation = "state"  # fully observable environment
        K = 1  # 1 auxiliary variable
        delta_t = 0.25  # 15min intervals
        gamma = 0.9  # discount factor
        lamb = 100  # penalty weighting hyperparameter
        aux_bounds = np.array([[0, 10]])  # bounds on auxiliary variable
        costs_clipping = (1, 100)  # reward clipping parameters
        seed = 1  # random seed

        super().__init__(network, observation, K, delta_t, gamma, lamb, aux_bounds, costs_clipping, seed)

    def init_state(self):
        """Return a state vector with random values in [0, 1]."""
        n_dev = self.simulator.N_device  # number of devices
        n_des = self.simulator.N_des  # number of DES units
        n_gen = self.simulator.N_non_slack_gen  # number of non-slack generators
        N_vars = 2 * n_dev + n_des + n_gen + self.K  # size of state vectors

        return np.random.rand(N_vars)  # random state vector

    def next_vars(self, s_t):
        """Return a random load injection in [-10, 0] and a random aux variable in [0,10]."""
        P_load = -10 * np.random.rand(1)[0]  # Random demand in [-10, 0]
        aux = np.random.randint(0, 11)  # Random auxiliary variable in [0, 10]

        return np.array([P_load, aux])


if __name__ == "__main__":
    env = SimpleEnvironment()
    env.reset()

    for t in range(10):
        a = env.action_space.sample()
        o, r, done, _ = env.step(a)
        print(f"t={t}, r_t={r:.3}")
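In environments with more than one load or non-slack generator, next_vars() must return the stochastic variables concatenated in the order listed earlier: load demands first, then maximum generations, then auxiliary variables. A minimal sketch for a hypothetical grid with two loads, one non-slack generator, and K = 1:

def next_vars(self, s_t):
    # Hypothetical grid: 2 loads, 1 non-slack generator, K = 1.
    P_loads = -10 * np.random.rand(2)  # demands (negative), ordered by device ID
    P_gen_max = 30 * np.random.rand(1)  # maximum generations, ordered by device ID
    aux = np.random.rand(1)  # auxiliary variables, ordered by their ID

    # The concatenation order matters: loads, then generators, then aux variables.
    return np.concatenate((P_loads, P_gen_max, aux))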

Notes

Frequent mistakes when designing new gym-anm environments include:

  1. Failing to specify load power injections as negative injections. This is particularly important in the next_vars() method, since the demand \(P_l^{(dev)}\) it returns will first get clipped to \([\underline P_l, 0]\) before being applied to the environment. This means that any value \(>0\) will always get clipped to 0, as illustrated in the sketch below.
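The following sketch contrasts an incorrect and a correct next_vars() for the 2-bus example above:

def next_vars(self, s_t):
    # Incorrect: a positive demand would be clipped to 0 (i.e., no load).
    # P_load = 10 * np.random.rand(1)[0]

    # Correct: load demands are specified as negative power injections.
    P_load = -10 * np.random.rand(1)[0]

    aux = np.random.randint(0, 11)  # random auxiliary variable in [0, 10]
    return np.array([P_load, aux])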