Class TrainingDataCollector

java.lang.Object
neqsim.process.ml.TrainingDataCollector
All Implemented Interfaces:
Serializable

public class TrainingDataCollector extends Object implements Serializable
Training data collector for surrogate model development.

Collects input-output pairs from NeqSim simulations for training neural network surrogates. Supports:

  • CSV export for scikit-learn, PyTorch, TensorFlow
  • JSON export for flexible data handling
  • Feature normalization statistics
  • Train/validation/test split suggestions
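
The split suggestions listed above are not detailed on this page. As a rough illustration only, a seeded shuffle-and-slice over sample indices could look like the sketch below. The class `DatasetSplitSketch`, its `split` method, and the 70/15/15 fractions used in the usage note are assumptions made for this example and are not part of the NeqSim API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of NeqSim: splits sample indices into three sets. */
class DatasetSplitSketch {
  /**
   * Shuffles the indices 0..sampleCount-1 with a fixed seed and slices them.
   * Returns the train, validation, and test index lists, in that order.
   */
  static List<List<Integer>> split(int sampleCount, double trainFrac, double valFrac, long seed) {
    List<Integer> indices = new ArrayList<>();
    for (int i = 0; i < sampleCount; i++) {
      indices.add(i);
    }
    // A fixed seed keeps the split reproducible between runs.
    Collections.shuffle(indices, new Random(seed));

    // Round to the nearest whole sample and never run past the end of the list.
    int trainEnd = Math.min(sampleCount, (int) Math.round(sampleCount * trainFrac));
    int valEnd = Math.min(sampleCount, trainEnd + (int) Math.round(sampleCount * valFrac));

    List<List<Integer>> parts = new ArrayList<>();
    parts.add(new ArrayList<>(indices.subList(0, trainEnd)));
    parts.add(new ArrayList<>(indices.subList(trainEnd, valEnd)));
    parts.add(new ArrayList<>(indices.subList(valEnd, sampleCount)));
    return parts;
  }
}
```

For example, `DatasetSplitSketch.split(collector.getSampleCount(), 0.7, 0.15, 42L)` would assign whatever remains after the train and validation slices to the test set.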

Usage Example:


TrainingDataCollector collector = new TrainingDataCollector("flash_surrogate");
collector.defineInput("temperature", "K", 200.0, 500.0);
collector.defineInput("pressure", "bar", 1.0, 100.0);
collector.defineOutput("vapor_fraction", "mole_frac", 0.0, 1.0);

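// Run many simulations; T, P, and result are placeholders for each run's
// sampled temperature, sampled pressure, and computed vapor fraction
for (...) {
  collector.startSample();
  collector.recordInput("temperature", T);
  collector.recordInput("pressure", P);
  // Run the flash calculation at (T, P) and store the vapor fraction in result
  collector.recordOutput("vapor_fraction", result);
  collector.endSample();
}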

collector.exportCSV("training_data.csv");
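
The loop in the example leaves open how `T` and `P` are chosen. One simple option, sketched here under the assumption that uniform random sampling inside the declared bounds is acceptable, is shown below. `InputSamplerSketch` is a hypothetical helper for illustration and is not part of NeqSim.

```java
import java.util.Random;

/** Hypothetical helper, not part of NeqSim: draws operating points inside declared bounds. */
class InputSamplerSketch {
  private final Random random;

  /** A fixed seed makes the generated operating points reproducible. */
  InputSamplerSketch(long seed) {
    this.random = new Random(seed);
  }

  /** Returns a value drawn uniformly between minBound and maxBound. */
  double sample(double minBound, double maxBound) {
    if (minBound > maxBound) {
      throw new IllegalArgumentException("minBound must not exceed maxBound");
    }
    // nextDouble() lies in [0, 1), so the result stays within the bounds.
    return minBound + random.nextDouble() * (maxBound - minBound);
  }
}
```

Inside the loop, `T` could then be drawn as `sampler.sample(200.0, 500.0)` and `P` as `sampler.sample(1.0, 100.0)`, matching the bounds passed to `defineInput` in the example.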

Version:
1.0
Author:
ESOL
  • Constructor Details

    • TrainingDataCollector

      public TrainingDataCollector(String name)
      Create a training data collector.
      Parameters:
      name - identifier for this dataset
  • Method Details

    • defineInput

      public TrainingDataCollector defineInput(String name, String unit, double minBound, double maxBound)
      Define an input feature.
      Parameters:
      name - feature name
      unit - physical unit
      minBound - expected minimum value
      maxBound - expected maximum value
      Returns:
      this collector for chaining
    • defineOutput

      public TrainingDataCollector defineOutput(String name, String unit, double minBound, double maxBound)
      Define an output feature.
      Parameters:
      name - feature name
      unit - physical unit
      minBound - expected minimum value
      maxBound - expected maximum value
      Returns:
      this collector for chaining
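
The page does not say how the collector itself uses `minBound` and `maxBound`. One common use of such expected bounds in surrogate training is min-max scaling to the range [0, 1], sketched below. `BoundsScalingSketch` is a hypothetical helper written for this illustration and is not part of NeqSim.

```java
/** Hypothetical helper, not part of NeqSim: min-max scaling with declared feature bounds. */
class BoundsScalingSketch {
  /** Maps value from [minBound, maxBound] onto [0, 1]. */
  static double scale(double value, double minBound, double maxBound) {
    if (maxBound <= minBound) {
      throw new IllegalArgumentException("maxBound must be greater than minBound");
    }
    return (value - minBound) / (maxBound - minBound);
  }

  /** Inverse of scale: maps a value in [0, 1] back to physical units. */
  static double unscale(double scaled, double minBound, double maxBound) {
    return minBound + scaled * (maxBound - minBound);
  }
}
```

With the bounds from the usage example, a temperature of 350 K scales to 0.5, and values outside the expected range fall outside [0, 1], which makes out-of-range samples easy to spot.
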
    • startSample

      public void startSample()
      Start recording a new sample.
    • recordInput

      public void recordInput(String name, double value)
      Record an input value for the current sample.
      Parameters:
      name - input feature name
      value - value to record
    • recordOutput

      public void recordOutput(String name, double value)
      Record an output value for the current sample.
      Parameters:
      name - output feature name
      value - value to record
    • recordStateAsInputs

      public void recordStateAsInputs(StateVector state)
      Record the values of a state vector as inputs.
      Parameters:
      state - state vector
    • recordStateAsOutputs

      public void recordStateAsOutputs(StateVector state)
      Record the values of a state vector as outputs.
      Parameters:
      state - state vector
    • endSample

      public void endSample()
      End the current sample and add it to the dataset.
    • getSampleCount

      public int getSampleCount()
      Get the number of samples collected.
      Returns:
      sample count
    • getName

      public String getName()
      Get dataset name.
      Returns:
      name
    • exportCSV

      public void exportCSV(String filePath) throws IOException
      Export to CSV format.
      Parameters:
      filePath - path to output file
      Throws:
      IOException - if writing fails
    • toCSV

      public String toCSV()
      Export to CSV string.
      Returns:
      CSV formatted string
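
The exact column layout of the CSV string is not specified on this page. The sketch below assumes a single comma-separated header row of feature names followed by purely numeric rows without quoting. That layout, and the `CsvColumnsSketch` class itself, are assumptions for illustration rather than documented NeqSim behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of NeqSim: reads a simple numeric CSV into named columns. */
class CsvColumnsSketch {
  /** Assumes one header row, then numeric rows; no quoting or escaping is handled. */
  static Map<String, List<Double>> parse(String csv) {
    Map<String, List<Double>> columns = new LinkedHashMap<>();
    String[] lines = csv.split("\\R");
    if (lines.length == 0 || lines[0].trim().isEmpty()) {
      return columns;
    }
    String[] header = lines[0].trim().split(",");
    for (String name : header) {
      columns.put(name.trim(), new ArrayList<>());
    }
    for (int i = 1; i < lines.length; i++) {
      String line = lines[i].trim();
      if (line.isEmpty()) {
        continue; // skip blank lines
      }
      String[] cells = line.split(",");
      if (cells.length != header.length) {
        throw new IllegalArgumentException("Row " + i + " has " + cells.length + " cells");
      }
      for (int j = 0; j < cells.length; j++) {
        columns.get(header[j].trim()).add(Double.parseDouble(cells[j].trim()));
      }
    }
    return columns;
  }
}
```

If the assumed layout holds, `CsvColumnsSketch.parse(collector.toCSV())` would give one list of values per feature, in header order.
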
    • getInputStatistics

      public Map<String, Map<String,Double>> getInputStatistics()
      Get normalization statistics for inputs.
      Returns:
      map of feature name to stats (mean, std, min, max)
    • getOutputStatistics

      public Map<String, Map<String,Double>> getOutputStatistics()
      Get normalization statistics for outputs.
      Returns:
      map of feature name to stats (mean, std, min, max)
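
As an illustration of how the per-feature statistics might be applied, the sketch below standardizes a value with its mean and standard deviation (z-score). The key names `"mean"` and `"std"` are assumptions based on the return description above, and `ZScoreSketch` is a hypothetical helper, not part of NeqSim.

```java
import java.util.Map;

/** Hypothetical helper, not part of NeqSim: z-score normalization from a stats map. */
class ZScoreSketch {
  /** The keys "mean" and "std" are assumed; check the actual map before relying on them. */
  static double normalize(double value, Map<String, Double> stats) {
    Double mean = stats.get("mean");
    Double std = stats.get("std");
    if (mean == null || std == null) {
      throw new IllegalArgumentException("stats must contain 'mean' and 'std'");
    }
    if (std == 0.0) {
      return 0.0; // a constant feature carries no information; avoid dividing by zero
    }
    return (value - mean) / std;
  }
}
```

A call such as `ZScoreSketch.normalize(T, collector.getInputStatistics().get("temperature"))` would then standardize a temperature, provided the map uses the assumed keys.
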
    • clear

      public void clear()
      Clear all collected samples.
    • getSummary

      public String getSummary()
      Get summary statistics as a formatted string.
      Returns:
      summary string
    • toString

      public String toString()
      Overrides:
      toString in class Object