Class RLEnvironment
java.lang.Object
neqsim.process.ml.RLEnvironment
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
SeparatorLevelControlEnv
Reinforcement Learning environment wrapper for NeqSim process systems.
Provides a Gym-compatible interface for RL training on process control tasks. Key features:
- Standardized observation and action spaces
- Physics-grounded reward computation
- Safe action projection via constraint manager
- Episode management with reset capability
Usage Example:

  ProcessSystem process = new ProcessSystem();
  // ... build process ...
  RLEnvironment env = new RLEnvironment(process);
  env.addControlledEquipment("valve1", valve, actionSpace);
  // Weights: energy, setpointError, constraintViolation, throughput
  env.setRewardWeights(1.0, 10.0, 100.0, 1.0);

  StateVector obs = env.reset();
  boolean done = false;
  while (!done) {
    ActionVector action = agent.selectAction(obs); // agent is the RL policy
    StepResult result = env.step(action);
    obs = result.observation;
    done = result.done;
  }
- Version:
- 1.0
- Author:
- ESOL
Nested Class Summary
Nested Classes
- static class RLEnvironment.StepInfo
  Additional info from a step.
- static class RLEnvironment.StepResult
  Result of a simulation step.
Field Summary
Fields
- private final ActionVector actionSpace
- private final ConstraintManager constraintManager
- private double currentTime
- private boolean done
- private double maxEpisodeTime
- private final ProcessSystem process
- private static final long serialVersionUID
- private double simulationTimeStep
- private int stepCount
- private double weightConstraintViolation
- private double weightEnergy
- private double weightSetpointError
- private double weightThroughput
Constructor Summary
Constructors
- RLEnvironment(ProcessSystem process)
  Create an RL environment wrapping a process system.
Method Summary
Methods
- public RLEnvironment addConstraint(String name, String variableName, double minValue, double maxValue, String unit)
  Add a hard constraint.
- protected void applyAction(ActionVector action)
  Apply action to process equipment.
- protected double computeReward(StateVector state, ActionVector action, RLEnvironment.StepInfo info)
  Compute reward for current state and action.
- public RLEnvironment defineAction(String name, double lowerBound, double upperBound, String unit)
  Define an action dimension.
- public ActionVector getActionSpace()
  Get the action space specification.
- public ConstraintManager getConstraintManager()
  Get the constraint manager.
- public double getCurrentTime()
  Get current simulation time.
- protected StateVector getObservation()
  Get current observation.
- public ProcessSystem getProcess()
  Get the underlying process system.
- public int getStepCount()
  Get step count in current episode.
- public boolean isDone()
  Check if episode is done.
- public StateVector reset()
  Reset the environment to initial state.
- public RLEnvironment setMaxEpisodeTime(double maxTime)
  Set maximum episode time.
- public RLEnvironment setRewardWeights(double energy, double setpointError, double constraintViolation, double throughput)
  Set reward weights.
- public RLEnvironment setTimeStep(double dt)
  Set simulation time step.
- public RLEnvironment.StepResult step(ActionVector action)
  Execute one simulation step with given action.
Field Details
- serialVersionUID
  private static final long serialVersionUID
  - See Also:
    - Constant Field Values
- process
  private final ProcessSystem process
- constraintManager
  private final ConstraintManager constraintManager
- actionSpace
  private final ActionVector actionSpace
- simulationTimeStep
  private double simulationTimeStep
- currentTime
  private double currentTime
- maxEpisodeTime
  private double maxEpisodeTime
- weightEnergy
  private double weightEnergy
- weightSetpointError
  private double weightSetpointError
- weightConstraintViolation
  private double weightConstraintViolation
- weightThroughput
  private double weightThroughput
- done
  private boolean done
- stepCount
  private int stepCount
Constructor Details
- RLEnvironment
  public RLEnvironment(ProcessSystem process)
  Create an RL environment wrapping a process system.
  - Parameters:
    - process - the process system to control
Method Details
- defineAction
  public RLEnvironment defineAction(String name, double lowerBound, double upperBound, String unit)
  Define an action dimension.
  - Parameters:
    - name - action name
    - lowerBound - minimum value
    - upperBound - maximum value
    - unit - physical unit
  - Returns:
    - this environment for chaining
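  Example: a minimal sketch of chaining defineAction calls to build a two-dimensional action space; the action names, bounds, and units are hypothetical:

    env.defineAction("valveOpening", 0.0, 1.0, "fraction")
       .defineAction("compressorSpeed", 2000.0, 5000.0, "rpm");

  Because each call returns the environment itself, the full action space can be declared fluently before the first reset().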
- addConstraint
  public RLEnvironment addConstraint(String name, String variableName, double minValue, double maxValue, String unit)
  Add a hard constraint.
  - Parameters:
    - name - constraint name
    - variableName - state variable to constrain
    - minValue - minimum allowed
    - maxValue - maximum allowed
    - unit - physical unit
  - Returns:
    - this environment for chaining
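  Example: a sketch of bounding a state variable; the variable name and limits are hypothetical. Per the class-level "safe action projection" feature, the constraint manager is expected to keep actions within such bounds:

    env.addConstraint("levelLimit", "separatorLevel", 0.1, 0.9, "fraction");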
- setRewardWeights
  public RLEnvironment setRewardWeights(double energy, double setpointError, double constraintViolation, double throughput)
  Set reward weights.
  - Parameters:
    - energy - weight for energy consumption (negative reward)
    - setpointError - weight for setpoint deviation (negative reward)
    - constraintViolation - weight for constraint violations (negative reward)
    - throughput - weight for production throughput (positive reward)
  - Returns:
    - this environment for chaining
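  Example: a configuration sketch with illustrative magnitudes (not recommended values). Per the parameter docs, the first three weights scale penalty terms and the last scales a production bonus:

    env.setRewardWeights(
        0.01,   // energy: penalty per unit of energy consumed
        10.0,   // setpointError: penalty for deviation from setpoint
        100.0,  // constraintViolation: large penalty for breaching hard constraints
        1.0);   // throughput: bonus for production throughput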
- setTimeStep
  public RLEnvironment setTimeStep(double dt)
  Set simulation time step.
  - Parameters:
    - dt - time step in seconds
  - Returns:
    - this environment for chaining
- setMaxEpisodeTime
  public RLEnvironment setMaxEpisodeTime(double maxTime)
  Set maximum episode time.
  - Parameters:
    - maxTime - maximum time in seconds
  - Returns:
    - this environment for chaining
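  Example: together with setTimeStep, this bounds episode length. With the illustrative values below, a 10 s step and a 3600 s horizon give at most 360 steps per episode:

    env.setTimeStep(10.0)          // advance the simulation 10 s per step()
       .setMaxEpisodeTime(3600.0); // end the episode after one simulated hour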
- reset
  public StateVector reset()
  Reset the environment to initial state.
  - Returns:
    - initial observation
- step
  public RLEnvironment.StepResult step(ActionVector action)
  Execute one simulation step with given action.
  - Parameters:
    - action - control action to apply
  - Returns:
    - step result with observation, reward, done flag
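  Example: a sketch of consuming the step result. The observation and done fields appear in the class usage example; a reward field is implied by the Returns description, but its name here is an assumption:

    StepResult result = env.step(action);
    obs = result.observation;       // next state vector
    double reward = result.reward;  // assumed field holding the scalar reward
    if (result.done) {
      obs = env.reset();            // episode finished; start a new one
    }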
- applyAction
  protected void applyAction(ActionVector action)
  Apply action to process equipment. Override in subclass to implement specific control logic.
  - Parameters:
    - action - the action to apply
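  Example: a minimal subclass override sketch, assuming the first action dimension drives a throttling valve. The getValue accessor, the valve field, and the valve setter are assumptions for illustration; check the actual ActionVector and valve APIs:

    @Override
    protected void applyAction(ActionVector action) {
      // Assumed accessor: first action dimension is the valve opening (0-1).
      double opening = action.getValue(0);
      valve.setPercentValveOpening(opening * 100.0); // scale to percent
    }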
- getObservation
  protected StateVector getObservation()
  Get current observation. Override in subclass to include equipment-specific states.
  - Returns:
    - current state vector
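  Example: an override sketch that extends the base observation with an equipment-specific state. The StateVector add method and the separator accessor are assumptions to be adapted to the real APIs:

    @Override
    protected StateVector getObservation() {
      StateVector obs = super.getObservation();
      // Assumed API: append the separator liquid level to the observation.
      obs.add("separatorLevel", separator.getLiquidLevel());
      return obs;
    }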
- computeReward
  protected double computeReward(StateVector state, ActionVector action, RLEnvironment.StepInfo info)
  Compute reward for current state and action.
  - Parameters:
    - state - current state
    - action - applied action
    - info - info object to fill with details
  - Returns:
    - scalar reward
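  Example: an override sketch that keeps the weighted base reward and adds a shaping term penalizing abrupt control moves. previousAction is a hypothetical subclass field and getValue an assumed accessor:

    @Override
    protected double computeReward(StateVector state, ActionVector action,
        RLEnvironment.StepInfo info) {
      double reward = super.computeReward(state, action, info);
      // Shaping: small penalty on large changes in the first action dimension.
      reward -= 0.1 * Math.abs(action.getValue(0) - previousAction);
      previousAction = action.getValue(0);
      return reward;
    }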
- getActionSpace
  public ActionVector getActionSpace()
  Get the action space specification.
  - Returns:
    - the action space specification
- getConstraintManager
  public ConstraintManager getConstraintManager()
  Get the constraint manager.
  - Returns:
    - constraint manager
- getProcess
  public ProcessSystem getProcess()
  Get the underlying process system.
  - Returns:
    - the process system
- getCurrentTime
  public double getCurrentTime()
  Get current simulation time.
  - Returns:
    - time in seconds
- getStepCount
  public int getStepCount()
  Get step count in current episode.
  - Returns:
    - number of steps taken
- isDone
  public boolean isDone()
  Check if episode is done.
  - Returns:
    - true if episode finished