Designing New Environments
The `gym-anm` framework was specifically designed to make it easy for users to design their own environments and ANM tasks. This page describes in detail how to do so.
Template
New environments are created by subclassing `gym_anm.envs.anm_env.ANMEnv`. The general template to follow is shown below.
"""
This file gives the template to follow when creating new gym-anm environments.
For more information, see https://gym-anm.readthedocs.io/en/latest/topics/design_new_env.html.
"""
from gym_anm import ANMEnv
class CustomEnvironment(ANMEnv):
def __init__(self):
network = {"baseMVA": ..., "bus": ..., "device": ..., "branch": ...} # power grid specs
observation = ... # observation space
K = ... # number of auxiliary variables
delta_t = ... # time interval between timesteps
gamma = ... # discount factor
lamb = ... # penalty weighting hyperparameter
aux_bounds = ... # bounds on auxiliary variable (optional)
costs_clipping = ... # reward clipping parameters (optional)
seed = ... # random seed (optional)
super().__init__(network, observation, K, delta_t, gamma, lamb, aux_bounds, costs_clipping, seed)
def init_state(self):
...
def next_vars(self, s_t):
...
def observation_bounds(self): # optional
...
def render(self, mode="human"): # optional
...
def close(self): # optional
...
where:

- `network` is the network dictionary that describes the characteristics of the power grid considered (see Appendix D of the paper),
- `observation` defines the observation space (see Appendix C of the paper),
- `K` is the number of auxiliary variables \(K\),
- `delta_t` is the time interval (in hours) between subsequent timesteps \(\Delta t\),
- `gamma` is the discount factor \(\gamma \in [0, 1]\),
- `lamb` is the penalty weighting hyperparameter \(\lambda\) in the reward function,
- `aux_bounds` are the bounds on the auxiliary variables, specified as a 2D array (column 1 = lower bounds, column 2 = upper bounds),
- `costs_clipping` is a tuple of (clip value for \(\Delta E_{t:t+1}\), clip value for \(\lambda \phi(s_{t+1})\)), with \(r_{clip} = \text{sum}(costs\_clipping)\),
- `seed` is a random seed,
- `init_state()` is a method that must be overridden to return an initial state vector \(s_0 \sim p_0(\cdot)\) (it gets called from `env.reset()`),
- `next_vars()` is a method that must be overridden to return the next vector of stochastic variables, which includes (in that order):
  - the active demand \(P_{l,t+1}^{(dev)}\) of each load \(l \in \mathcal D_L\) (ordered by their device ID \(l\)),
  - the maximum generation \(P_{g,t+1}^{(max)}\) of each non-slack generator \(g \in \mathcal D_G - \{g^{slack}\}\) (ordered by their device ID \(g\)),
  - the value of each auxiliary variable \(aux^{(k)}_{t+1}\) for \(k=0,\ldots,K-1\) (ordered by their auxiliary variable ID \(k\)),
- `observation_bounds()` is an optional method that can be implemented to make the observation space finite when `observation` is provided as a callable. In that case, `gym-anm` has no way to infer the bounds of observation vectors \(o_t\), and `observation_bounds()` can be used to specify them,
- `render()` and `close()` are optional methods that can be implemented to support rendering of the environment. For more information, see the official Gym documentation.
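To make the ordering expected from `next_vars()` concrete, here is a minimal sketch assembling such a vector with plain NumPy. The sizes (2 loads, 1 non-slack generator, \(K=1\)) and the value ranges are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical sizes for illustration: 2 loads, 1 non-slack generator, K = 1.
rng = np.random.default_rng(0)

# 1) Active demand of each load (negative injections), ordered by device ID.
P_loads = -10 * rng.random(2)    # e.g. demands in [-10, 0]

# 2) Maximum generation of each non-slack generator, ordered by device ID.
P_max_gens = 50 * rng.random(1)  # e.g. generation limits in [0, 50]

# 3) Value of each auxiliary variable, for k = 0, ..., K-1.
aux = rng.random(1)              # e.g. auxiliary variable in [0, 1]

# next_vars() must return these concatenated in exactly this order:
# loads first, then non-slack generators, then auxiliary variables.
next_vars_vector = np.concatenate([P_loads, P_max_gens, aux])
assert next_vars_vector.shape == (2 + 1 + 1,)
```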
Example
A concrete example is shown below, where the environment `SimpleEnvironment` is defined for a 2-bus power grid with a single load connected at bus 1.
"""
This file contains an example of a custom gym-anm environment.
Features:
* it uses a 2-bus power grid: Slack generator (bus 0) --- Load (bus 1),
* the initial state s0 is randomly generated (see `init_state()`),
* load demands are randomly generated in [-10, 0] (see `next_vars()`),
* a random auxiliary variable is added for illustrating the process of
using them (it is useless in this case) (see `next_vars()`).
For more information, see https://gym-anm.readthedocs.io/en/latest/topics/design_new_env.html.
"""
import numpy as np
from gym_anm import ANMEnv
"""
A 2-bus power grid with topology:
Slack (bus 0) ---- Load (bus 1)
"""
network = {
"baseMVA": 100,
"bus": np.array([[0, 0, 132, 1.0, 1.0], [1, 1, 33, 1.1, 0.9]]),
"device": np.array(
[
[0, 0, 0, None, 200, -200, 200, -200, None, None, None, None, None, None, None],
[1, 1, -1, 0.2, 0, -10, None, None, None, None, None, None, None, None, None],
]
),
"branch": np.array([[0, 1, 0.01, 0.1, 0.0, 3, 1, 0]]),
}
class SimpleEnvironment(ANMEnv):
"""An example of a simple 2-bus custom gym-anm environment."""
def __init__(self):
observation = "state" # fully observable environment
K = 1 # 1 auxiliary variable
delta_t = 0.25 # 15min intervals
gamma = 0.9 # discount factor
lamb = 100 # penalty weighting hyperparameter
aux_bounds = np.array([[0, 10]]) # bounds on auxiliary variable
costs_clipping = (1, 100) # reward clipping parameters
seed = 1 # random seed
super().__init__(network, observation, K, delta_t, gamma, lamb, aux_bounds, costs_clipping, seed)
def init_state(self):
"""Return a state vector with random values in [0, 1]."""
n_dev = self.simulator.N_device # number of devices
n_des = self.simulator.N_des # number of DES units
n_gen = self.simulator.N_non_slack_gen # number of non-slack generators
N_vars = 2 * n_dev + n_des + n_gen + self.K # size of state vectors
return np.random.rand(N_vars) # random state vector
def next_vars(self, s_t):
"""Return a random load injection in [-10, 0] and a random aux variable in [0,10]."""
P_load = -10 * np.random.rand(1)[0] # Random demand in [-10, 0]
aux = np.random.randint(0, 10) # Random auxiliary variable in [0, 10]
return np.array([P_load, aux])
if __name__ == "__main__":
env = SimpleEnvironment()
env.reset()
for t in range(10):
a = env.action_space.sample()
o, r, done, _ = env.step(a)
print(f"t={t}, r_t={r:.3}")
Notes
Frequent mistakes when designing new `gym-anm` environments include:

- Failing to specify load power injections as negative injections. This is particularly important in the `next_vars()` method, since the demand \(P_l^{(dev)}\) returned will first get clipped to \([\underline P_l, 0]\) before being applied to the environment. This means that if a value \(>0\) is returned, it will always get clipped to 0.
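This clipping behaviour can be sketched with plain NumPy. The bound \(\underline P_l = -10\) below matches the load in the example above; the demand values themselves are made up for illustration:

```python
import numpy as np

P_min = -10.0  # lower bound on the load injection (from the network specs above)

# Demands returned by a buggy next_vars() that uses positive values:
returned = np.array([5.0, 2.5, 8.0])

# gym-anm first clips each demand to [P_min, 0] before applying it,
# so every positive value collapses to 0 and the load draws no power:
applied = np.clip(returned, P_min, 0.0)
print(applied)  # → [0. 0. 0.]

# Correct: return negative injections, e.g. a demand of 5 MW is -5.0.
correct = np.clip(np.array([-5.0, -2.5, -8.0]), P_min, 0.0)
# These pass through the clipping unchanged.
```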