Class AgentBenchmarkSuite
java.lang.Object
neqsim.util.agentic.AgentBenchmarkSuite
- All Implemented Interfaces:
Serializable
Defines and evaluates standardized engineering benchmark problems for agent performance measurement.
Inspired by the Simona dataset used by Tian et al. (2026) for evaluating multi-agent chemical process design workflows, this class provides a curated set of engineering problems with known reference solutions. Each benchmark problem specifies inputs, expected outputs with tolerances, and pass/fail criteria. Agent systems can run the full suite to measure convergence rate, accuracy, and completeness across diverse engineering tasks.
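The pass/fail criterion described above amounts to comparing a computed value against a reference value within a tolerance. The sketch below is a minimal illustration only: the class name `ToleranceCheck`, the use of a relative tolerance, and the absolute-tolerance fallback when the reference is zero are assumptions, not NeqSim's actual evaluation logic.

```java
/**
 * Hypothetical helper (not part of NeqSim): decides whether a computed value matches a
 * reference value. Assumes a relative tolerance, with an absolute fallback at zero.
 */
final class ToleranceCheck {
  private ToleranceCheck() {}

  /** Relative deviation |computed - reference| / |reference|. */
  static double relativeError(double computed, double reference) {
    return Math.abs(computed - reference) / Math.abs(reference);
  }

  /** Returns true if the computed value lies within the tolerance of the reference. */
  static boolean passes(double computed, double reference, double relTol, double absTol) {
    if (Double.isNaN(computed) || Double.isInfinite(computed)) {
      return false; // a non-finite answer never passes
    }
    if (reference == 0.0) {
      return Math.abs(computed) <= absTol; // relative error is undefined at zero
    }
    return relativeError(computed, reference) <= relTol;
  }
}

class ToleranceCheckDemo {
  public static void main(String[] args) {
    // Illustrative numbers only: a 2 % relative tolerance around a reference of 34.0
    System.out.println(ToleranceCheck.passes(34.05, 34.0, 0.02, 1e-9)); // true
    System.out.println(ToleranceCheck.passes(36.0, 34.0, 0.02, 1e-9)); // false
  }
}
```

A relative tolerance keeps the criterion meaningful across quantities of very different magnitude (densities, pressures, NPVs), which is why the zero-reference case needs separate handling.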
Problem Categories:
- THERMO — Pure component and mixture thermodynamic properties
- FLASH — Phase equilibrium calculations (TP, PH, PS flash)
- PROCESS — Process equipment and flowsheet simulations
- PIPELINE — Multiphase pipe flow and pressure drop
- ECONOMICS — Field development NPV and cost estimation
- SAFETY — Depressurization, relief valve sizing, safety envelopes
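Because every problem belongs to one of these categories, results can be broken down per category to show where an agent is weak. The sketch below mirrors the six categories in a hypothetical `Category` enum (the real nested enum's name is not shown on this page) and tallies outcomes with an `EnumMap`; `CategoryTally` is likewise an illustrative name, not a NeqSim class.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical mirror of the six problem categories listed above. */
enum Category { THERMO, FLASH, PROCESS, PIPELINE, ECONOMICS, SAFETY }

/** Hypothetical tally of passed/total problems per category. */
final class CategoryTally {
  private final Map<Category, int[]> counts = new EnumMap<>(Category.class);

  /** Records one evaluated problem; index 0 counts passes, index 1 counts all problems. */
  void record(Category category, boolean passed) {
    int[] c = counts.computeIfAbsent(category, k -> new int[2]);
    if (passed) {
      c[0]++;
    }
    c[1]++;
  }

  /** Pass rate in [0, 1] for one category, or NaN if no problems were recorded. */
  double passRate(Category category) {
    int[] c = counts.get(category);
    return (c == null || c[1] == 0) ? Double.NaN : (double) c[0] / c[1];
  }
}

class CategoryTallyDemo {
  public static void main(String[] args) {
    CategoryTally tally = new CategoryTally();
    tally.record(Category.THERMO, true);
    tally.record(Category.THERMO, false);
    tally.record(Category.FLASH, true);
    System.out.println(tally.passRate(Category.THERMO)); // 0.5
    System.out.println(tally.passRate(Category.SAFETY)); // NaN
  }
}
```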
Usage:
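import neqsim.util.agentic.AgentBenchmarkSuite;
import neqsim.util.agentic.AgentBenchmarkSuite.BenchmarkReport;

AgentBenchmarkSuite suite = AgentBenchmarkSuite.createStandardSuite();
// Submit the agent's computed value for one benchmark problem ID
suite.addResult("methane_density_300K_50bar", 34.05);
BenchmarkReport report = suite.evaluate();
System.out.println("Pass rate: " + report.getPassRate());
System.out.println("Failed: " + report.getFailedProblems());
String json = report.toJson();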
- Version:
- 1.0
- Author:
- Even Solbraa
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | AgentBenchmarkSuite.BenchmarkProblem | A single benchmark problem with expected reference solution.
static class | AgentBenchmarkSuite.BenchmarkReport | Aggregate report for the full benchmark suite evaluation.
static enum | | Difficulty level of the benchmark problem.
static enum | | Category of engineering benchmark problem.
static class | | Result of evaluating a single benchmark problem.
Field Summary
Fields
Modifier and Type | Field
private final List<AgentBenchmarkSuite.BenchmarkProblem> | problems
private static final long | serialVersionUID
private final String | suiteName
Constructor Summary
Constructors
Constructor | Description
AgentBenchmarkSuite(String suiteName) | Creates a new benchmark suite with the given name.
Method Summary
Modifier and Type | Method | Description
void | addConvergenceResult(String problemId, boolean converged) | Records whether a simulation converged for a specific problem.
void | addProblem(AgentBenchmarkSuite.BenchmarkProblem problem) | Adds a benchmark problem to the suite.
void | addResult(String problemId, double value) | Submits an agent result for a specific problem.
static AgentBenchmarkSuite | createStandardSuite() | Creates a standard benchmark suite with representative problems across all categories.
AgentBenchmarkSuite.BenchmarkReport | evaluate() | Evaluates all submitted results against the benchmark reference data.
List<AgentBenchmarkSuite.BenchmarkProblem> | getProblems() | Returns the list of benchmark problems in this suite.
String | getSuiteName() | Returns the name of this benchmark suite.
String | toJson() | Serializes the benchmark suite definition to JSON.
-
Field Details
-
serialVersionUID
private static final long serialVersionUID
-
suiteName
private final String suiteName
-
problems
private final List<AgentBenchmarkSuite.BenchmarkProblem> problems
-
submittedResults
-
submittedConvergence
-
-
Constructor Details
-
AgentBenchmarkSuite
Creates a new benchmark suite with the given name.
- Parameters:
suiteName - descriptive name for the benchmark suite
-
-
Method Details
-
addProblem
Adds a benchmark problem to the suite.
- Parameters:
problem - the benchmark problem to add
-
addResult
Submits an agent result for a specific problem.
- Parameters:
problemId - the unique identifier of the benchmark problem
value - the computed result value
-
addConvergenceResult
Records whether a simulation converged for a specific problem.
- Parameters:
problemId - the unique identifier of the benchmark problem
converged - true if the simulation converged, false otherwise
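The field list names `submittedResults` and `submittedConvergence` but does not show their types. A plausible shape, assumed here rather than taken from the source, is two maps keyed by problem ID. The `SubmissionStore` class below is hypothetical; it also rejects IDs that no registered problem uses, which the real `addResult` and `addConvergenceResult` may or may not do.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical store for agent submissions, keyed by benchmark problem ID. */
final class SubmissionStore {
  private final Set<String> knownIds = new HashSet<>();
  private final Map<String, Double> submittedResults = new HashMap<>();
  private final Map<String, Boolean> submittedConvergence = new HashMap<>();

  void registerProblem(String problemId) {
    knownIds.add(problemId);
  }

  /** Stores a numeric result; a later submission for the same ID overwrites the earlier one. */
  void addResult(String problemId, double value) {
    requireKnown(problemId);
    submittedResults.put(problemId, value);
  }

  /** Stores whether the agent's simulation converged for this problem. */
  void addConvergenceResult(String problemId, boolean converged) {
    requireKnown(problemId);
    submittedConvergence.put(problemId, converged);
  }

  /** Returns the submitted value, or null if no answer was submitted. */
  Double result(String problemId) {
    return submittedResults.get(problemId);
  }

  /** Returns the convergence flag, or null if none was reported. */
  Boolean converged(String problemId) {
    return submittedConvergence.get(problemId);
  }

  private void requireKnown(String problemId) {
    if (!knownIds.contains(problemId)) {
      throw new IllegalArgumentException("Unknown problem ID: " + problemId);
    }
  }
}

class SubmissionStoreDemo {
  public static void main(String[] args) {
    SubmissionStore store = new SubmissionStore();
    store.registerProblem("methane_density_300K_50bar");
    store.addResult("methane_density_300K_50bar", 34.05);
    store.addConvergenceResult("methane_density_300K_50bar", true);
    System.out.println(store.result("methane_density_300K_50bar")); // 34.05
  }
}
```

Keeping values and convergence flags in separate maps lets an agent report a non-converged run without submitting a number, so the two metrics can be scored independently.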
-
evaluate
Evaluates all submitted results against the benchmark reference data.
- Returns:
- a BenchmarkReport with pass/fail verdicts and aggregate metrics
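One way to picture what an evaluation pass computes is the loop below. It is a self-contained sketch, not the NeqSim source: the `Problem`, `Report`, and `Evaluator` types are hypothetical, and treating a missing answer as a failure, measuring pass rate over all problems, and measuring convergence rate only over problems that reported a flag are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical problem definition: ID, reference value and relative tolerance. */
final class Problem {
  final String id;
  final double expected;
  final double relTol;

  Problem(String id, double expected, double relTol) {
    this.id = id;
    this.expected = expected;
    this.relTol = relTol;
  }
}

/** Hypothetical aggregate report. */
final class Report {
  final double passRate; // passed / total problems
  final double convergenceRate; // converged / problems with a flag (NaN if none)
  final List<String> failed; // IDs that failed or had no submitted answer

  Report(double passRate, double convergenceRate, List<String> failed) {
    this.passRate = passRate;
    this.convergenceRate = convergenceRate;
    this.failed = failed;
  }
}

final class Evaluator {
  private Evaluator() {}

  static Report evaluate(List<Problem> problems, Map<String, Double> results,
      Map<String, Boolean> convergence) {
    List<String> failed = new ArrayList<>();
    int passed = 0;
    int reported = 0;
    int converged = 0;
    for (Problem p : problems) {
      Double value = results.get(p.id);
      // A missing answer fails; NaN also fails because any comparison with NaN is false.
      boolean ok = value != null
          && Math.abs(value - p.expected) <= p.relTol * Math.abs(p.expected);
      if (ok) {
        passed++;
      } else {
        failed.add(p.id);
      }
      Boolean c = convergence.get(p.id);
      if (c != null) {
        reported++;
        if (c) {
          converged++;
        }
      }
    }
    double passRate = problems.isEmpty() ? 0.0 : (double) passed / problems.size();
    double convergenceRate = reported == 0 ? Double.NaN : (double) converged / reported;
    return new Report(passRate, convergenceRate, failed);
  }
}

class EvaluatorDemo {
  public static void main(String[] args) {
    List<Problem> problems = List.of(
        new Problem("a", 10.0, 0.01),
        new Problem("b", 100.0, 0.05),
        new Problem("c", 1.0, 0.10));
    Report r = Evaluator.evaluate(problems,
        Map.of("a", 10.05, "b", 120.0),
        Map.of("a", true, "b", false));
    System.out.println(r.passRate); // 0.3333333333333333
    System.out.println(r.failed); // [b, c]
    System.out.println(r.convergenceRate); // 0.5
  }
}
```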
-
getProblems
Returns the list of benchmark problems in this suite.
- Returns:
- unmodifiable list of benchmark problems
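Returning an unmodifiable list lets callers inspect the problems without being able to add or remove any. The sketch below shows the standard `Collections.unmodifiableList` pattern with a hypothetical `ProblemRegistry` holding string IDs. Whether NeqSim returns such a live view or a snapshot copy (for example via `List.copyOf`) is not stated here; with a copy, later additions would not show through.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder of problem IDs, illustrating a read-only accessor. */
final class ProblemRegistry {
  private final List<String> problemIds = new ArrayList<>();

  void add(String id) {
    problemIds.add(id);
  }

  /** Read-only view: later additions show through, but callers cannot modify it. */
  List<String> getProblems() {
    return Collections.unmodifiableList(problemIds);
  }
}

class ProblemRegistryDemo {
  public static void main(String[] args) {
    ProblemRegistry registry = new ProblemRegistry();
    registry.add("thermo_1"); // hypothetical problem IDs
    List<String> view = registry.getProblems();
    registry.add("flash_1");
    System.out.println(view); // [thermo_1, flash_1]
    try {
      view.add("other");
    } catch (UnsupportedOperationException e) {
      System.out.println("read-only");
    }
  }
}
```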
-
getSuiteName
Returns the name of this benchmark suite.
-
createStandardSuite
Creates a standard benchmark suite with representative problems across all categories.
Reference data sources: NIST Chemistry WebBook, published experimental data, and validated simulation results.
- Returns:
- a pre-populated benchmark suite
-
toJson
Serializes the benchmark suite definition to JSON.
- Returns:
- JSON string representation of the suite
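To show the kind of output a suite-definition serializer produces, the dependency-free sketch below builds the JSON string by hand. The class `SuiteJson` and the field names `suiteName`, `problems`, `id`, and `expected` are hypothetical and do not describe NeqSim's actual JSON schema; a production implementation would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical serializer for a suite name and its problems' expected values. */
final class SuiteJson {
  private SuiteJson() {}

  /** Escapes backslashes and double quotes (control characters are not handled here). */
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Serializes the suite name and an ordered map of problem ID to expected value. */
  static String toJson(String suiteName, Map<String, Double> expected) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"suiteName\":").append(quote(suiteName)).append(",\"problems\":[");
    boolean first = true;
    for (Map.Entry<String, Double> e : expected.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      first = false;
      // Double.toString is locale-independent, so the decimal point is always '.'
      sb.append("{\"id\":").append(quote(e.getKey()))
          .append(",\"expected\":").append(e.getValue()).append('}');
    }
    return sb.append("]}").toString();
  }
}

class SuiteJsonDemo {
  public static void main(String[] args) {
    Map<String, Double> expected = new LinkedHashMap<>(); // preserves insertion order
    expected.put("problem_a", 1.5);
    System.out.println(SuiteJson.toJson("demo", expected));
    // {"suiteName":"demo","problems":[{"id":"problem_a","expected":1.5}]}
  }
}
```

Note that `NaN` or infinite expected values would produce invalid JSON with this sketch, since JSON has no literal for them.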
-