Parallel Simulation Example¶
The examples/05_parallel_two_stage_simulation.py example demonstrates efficient parallel execution of multiple biogas plant simulation scenarios for parameter optimization, sensitivity analysis, and uncertainty quantification.
Overview¶
This example demonstrates:
- Parallel Scenario Execution: Running multiple simulations concurrently across CPU cores
- Parameter Sweeps: Systematic exploration of single and multiple parameter variations
- Monte Carlo Analysis: Uncertainty quantification with parameter distributions
- Statistical Analysis: Result aggregation and comparative statistics
- Performance Metrics: Speedup and parallel efficiency measurement
Architecture¶
graph TB
subgraph "Main Process"
Config[Configuration]
Results[Result Collection]
end
subgraph "Worker Pool (4 cores)"
W1[Worker 1]
W2[Worker 2]
W3[Worker 3]
W4[Worker 4]
end
Config -->|Scenario 1| W1
Config -->|Scenario 2| W2
Config -->|Scenario 3| W3
Config -->|Scenario 4| W4
W1 -->|Result 1| Results
W2 -->|Result 2| Results
W3 -->|Result 3| Results
W4 -->|Result 4| Results
style Config fill:#e1f5fe
style Results fill:#c8e6c9
style W1 fill:#fff3e0
style W2 fill:#fff3e0
style W3 fill:#fff3e0
style W4 fill:#fff3e0
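Conceptually, this is a scatter-gather pattern over a process pool. A minimal sketch with only the standard library (the worker function here is a hypothetical placeholder, not the package's internal API):
import multiprocessing as mp

def _run_one(scenario):
    # Hypothetical worker: a real worker would rebuild the ADM1 model in the
    # child process and integrate it for the requested duration.
    return {"scenario": scenario, "Q_ch4": 0.0}

if __name__ == "__main__":
    scenarios = [{"k_dis": k} for k in (0.14, 0.18, 0.22, 0.26)]
    with mp.Pool(processes=4) as pool:           # worker pool (4 cores)
        results = pool.map(_run_one, scenarios)  # scatter scenarios, gather results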
Key Components¶
1. ParallelSimulator¶
The ParallelSimulator class orchestrates concurrent execution:
from pyadm1.simulation.parallel import ParallelSimulator
# Create parallel simulator with 4 worker processes
parallel = ParallelSimulator(adm1_model, n_workers=4, verbose=True)
# Define scenarios
scenarios = [
{"k_dis": 0.5, "Q": [15, 10, 0, 0, 0, 0, 0, 0, 0, 0]},
{"k_dis": 0.6, "Q": [20, 10, 0, 0, 0, 0, 0, 0, 0, 0]},
{"k_dis": 0.7, "Q": [15, 15, 0, 0, 0, 0, 0, 0, 0, 0]},
]
# Run parallel simulations
results = parallel.run_scenarios(
scenarios=scenarios,
duration=10.0, # 10 days
initial_state=adm1_state,
compute_metrics=True
)
Key Features:
- Multiprocessing: Uses Python's multiprocessing.Pool for true parallelism
- Automatic Workload Distribution: Scenarios distributed evenly across workers
- Progress Tracking: Real-time progress reporting for long-running analyses
- Error Isolation: Individual scenario failures don't crash entire execution
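Because failures are isolated per scenario, a quick status pass over the returned ScenarioResult objects (described next) is usually enough to spot problem cases:
for r in results:
    status = "ok" if r.success else f"failed: {r.error}"
    print(f"Scenario {r.scenario_id} ({r.execution_time:.2f} s): {status}")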
2. ScenarioResult¶
Each simulation returns a ScenarioResult object:
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class ScenarioResult:
scenario_id: int # Unique identifier
parameters: Dict[str, Any] # Parameter values used
success: bool # Completion status
duration: float # Simulation duration [days]
final_state: Optional[List[float]] # Final ADM1 state vector
time_series: Optional[Dict[str, List]] # Optional time series data
metrics: Dict[str, float] # Performance metrics
error: Optional[str] # Error message if failed
execution_time: float # Wall clock time [seconds]
Computed Metrics (when compute_metrics=True):
- Q_gas: Total biogas production [m³/d]
- Q_ch4: Methane production [m³/d]
- Q_co2: CO2 production [m³/d]
- CH4_content: Methane fraction [0-1]
- pH: pH value [-]
- VFA: Volatile fatty acids [g/L]
- TAC: Total alkalinity [g CaCO3/L]
- FOS_TAC: VFA/TA ratio [-]
- HRT: Hydraulic retention time [days]
- specific_gas_production: Biogas yield [m³/m³ feed]
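All of these are plain floats in the metrics dictionary, so cross-scenario comparisons are straightforward; for example, ranking successful scenarios by methane production:
ranked = sorted(
    (r for r in results if r.success),
    key=lambda r: r.metrics["Q_ch4"],
    reverse=True,
)
for r in ranked[:3]:
    print(f"Scenario {r.scenario_id}: {r.metrics['Q_ch4']:.1f} m³ CH4/d")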
Simulation Types¶
Basic Parallel Scenarios¶
Run multiple independent scenarios with different configurations:
# Define feed rate scenarios
feed_scenarios = [
{"name": "Low Feed", "Q_digester_1": [10, 8, 0, 0, 0, 0, 0, 0, 0, 0]},
{"name": "Base Feed", "Q_digester_1": [15, 10, 0, 0, 0, 0, 0, 0, 0, 0]},
{"name": "High Feed", "Q_digester_1": [20, 12, 0, 0, 0, 0, 0, 0, 0, 0]},
]
scenarios = [{"Q": s["Q_digester_1"]} for s in feed_scenarios]
results = parallel.run_scenarios(
scenarios=scenarios,
duration=10.0,
initial_state=adm1_state,
dt=1.0/24.0, # 1 hour time step
compute_metrics=True,
save_time_series=False # Set True for detailed time series
)
Use Cases:
- Comparing different operational strategies
- Testing substrate composition variations
- Evaluating design alternatives
Single-Parameter Sweep¶
Systematic exploration of one parameter:
import numpy as np
from pyadm1.simulation.parallel import ParameterSweepConfig
# Sweep disintegration rate
sweep_config = ParameterSweepConfig(
parameter_name="k_dis",
values=[0.10, 0.14, 0.18, 0.22, 0.26, 0.30],
other_params={"Q": [15, 10, 0, 0, 0, 0, 0, 0, 0, 0]}
)
results = parallel.parameter_sweep(
config=sweep_config,
duration=10.0,
initial_state=adm1_state,
compute_metrics=True
)
# Find the optimal value among successful runs (indexing sweep_config.values
# directly would mis-align if any scenario failed)
successful = [r for r in results if r.success]
ch4_productions = [r.metrics.get('Q_ch4', 0) for r in successful]
best = successful[int(np.argmax(ch4_productions))]
optimal_k_dis = best.parameters['k_dis']
Common Parameters to Sweep:
- Kinetic parameters: k_dis, k_hyd_ch, k_hyd_pr, k_hyd_li
- Yield parameters: Y_su, Y_aa, Y_fa, Y_c4, Y_pro, Y_ac, Y_h2
- Uptake rates: k_m_su, k_m_aa, k_m_fa, k_m_c4, k_m_pro, k_m_ac
- Operational parameters: Feed rates, temperatures, retention times
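Several of these can be swept one after another by driving the same API from a dictionary of value ranges (a sketch reusing ParameterSweepConfig from above; the ranges are illustrative):
import numpy as np

base_feed = {"Q": [15, 10, 0, 0, 0, 0, 0, 0, 0, 0]}
sweep_ranges = {
    "k_dis": np.linspace(0.10, 0.30, 6),  # disintegration rate
    "Y_su": np.linspace(0.08, 0.12, 5),   # sugar degrader yield
}
sweeps = {}
for name, values in sweep_ranges.items():
    config = ParameterSweepConfig(
        parameter_name=name,
        values=list(values),
        other_params=base_feed,
    )
    sweeps[name] = parallel.parameter_sweep(
        config=config, duration=10.0,
        initial_state=adm1_state, compute_metrics=True,
    )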
Multi-Parameter Sweep¶
Full factorial design exploring parameter interactions:
parameter_configs = {
"k_dis": [0.14, 0.18, 0.22],
"Y_su": [0.09, 0.10, 0.11],
"Q_substrate_0": [12, 15, 18] # Corn silage feed rate
}
results = parallel.multi_parameter_sweep(
parameter_configs=parameter_configs,
duration=10.0,
initial_state=adm1_state,
fixed_params={"Q_substrate_1": 10} # Manure fixed at 10 m³/d
)
# Total combinations: 3 × 3 × 3 = 27 scenarios
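The full factorial design is simply the Cartesian product of the value lists; independent of the package API, the 27 combinations can be enumerated directly:
from itertools import product

names = list(parameter_configs)
combos = [dict(zip(names, values))
          for values in product(*parameter_configs.values())]
print(len(combos))  # 3 × 3 × 3 = 27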
Analysis:
# Find best combination
best_result = max(
(r for r in results if r.success),
key=lambda r: r.metrics.get('Q_ch4', 0)
)
print(f"Optimal parameters:")
print(f" k_dis = {best_result.parameters['k_dis']:.2f}")
print(f" Y_su = {best_result.parameters['Y_su']:.2f}")
print(f" Q_substrate_0 = {best_result.parameters['Q_substrate_0']:.1f} m³/d")
print(f" Methane production = {best_result.metrics['Q_ch4']:.1f} m³/d")
Monte Carlo Simulation¶
Uncertainty quantification with parameter distributions:
from pyadm1.simulation.parallel import MonteCarloConfig
mc_config = MonteCarloConfig(
n_samples=100,
parameter_distributions={
"k_dis": (0.18, 0.03), # mean=0.18, std=0.03
"Y_su": (0.10, 0.01), # mean=0.10, std=0.01
"k_hyd_ch": (10.0, 1.5) # mean=10.0, std=1.5
},
fixed_params={"Q": [15, 10, 0, 0, 0, 0, 0, 0, 0, 0]},
seed=42 # For reproducibility
)
results = parallel.monte_carlo(
config=mc_config,
duration=30.0,
initial_state=adm1_state,
compute_metrics=True
)
# Summarize uncertainty
summary = parallel.summarize_results(results)
print(f"Methane production [m³/d]:")
print(f" Mean: {summary['metrics']['Q_ch4']['mean']:.1f}")
print(f" Std: {summary['metrics']['Q_ch4']['std']:.1f}")
print(f" 95% CI: [{summary['metrics']['Q_ch4']['q25']:.1f}, "
f"{summary['metrics']['Q_ch4']['q75']:.1f}]")
Probability Distribution Selection (see the sampling sketch below):
- Normal: Most kinetic parameters (symmetric uncertainty)
- Log-normal: Always-positive parameters with multiplicative uncertainty
- Uniform: When only bounds are known
- Triangular: Expert judgment with a most likely value
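As a point of reference, the four families can be sampled with numpy as follows (parameter values are illustrative, not taken from the example):
import numpy as np

rng = np.random.default_rng(42)
n = 100
k_dis = rng.normal(0.18, 0.03, n)             # Normal: symmetric uncertainty
k_hyd = rng.lognormal(np.log(10.0), 0.15, n)  # Log-normal: positive, multiplicative
Y_su = rng.uniform(0.09, 0.11, n)             # Uniform: only bounds known
T_op = rng.triangular(35.0, 38.0, 40.0, n)    # Triangular: most likely value at 38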
Expected Output¶
Basic Scenario Comparison¶
================================================================================
SCENARIO COMPARISON RESULTS
================================================================================
Low Feed:
Feed Rate: 18.0 m³/d
Biogas: 723.5 m³/d
Methane: 434.1 m³/d
CH4 %: 60.0%
pH: 7.28
VFA: 1.85 g/L
FOS/TAC: 0.228
HRT: 109.8 days
Exec Time: 2.34 seconds
Base Feed:
Feed Rate: 25.0 m³/d
Biogas: 1005.2 m³/d
Methane: 603.1 m³/d
CH4 %: 60.0%
pH: 7.20
VFA: 2.45 g/L
FOS/TAC: 0.289
HRT: 79.1 days
Exec Time: 2.41 seconds
High Feed:
Feed Rate: 32.0 m³/d
Biogas: 1265.8 m³/d
Methane: 759.5 m³/d
CH4 %: 60.0%
pH: 7.15
VFA: 3.12 g/L
FOS/TAC: 0.341
HRT: 61.8 days
Exec Time: 2.38 seconds
Interpretation:
- Biogas production scales linearly with feed rate (expected for stable operation)
- pH decreases with higher loading (more VFA production)
- FOS/TAC increases but remains below the critical threshold (0.4)
- Execution time is similar across scenarios (same computational complexity)
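The HRT values are mutually consistent: since HRT = V_liq / Q, each scenario's feed rate times its HRT should recover the same liquid volume, which is easy to verify:
# Feed rate × HRT should give the same liquid volume for all scenarios
for q, hrt in [(18.0, 109.8), (25.0, 79.1), (32.0, 61.8)]:
    print(f"Q = {q:4.1f} m³/d  ->  V ≈ {q * hrt:.0f} m³")
# All three yield ≈ 1977 m³, i.e. one shared digester volume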
Parameter Sweep Results¶
================================================================================
4. Running parameter sweep (Disintegration Rate k_dis)
================================================================================
Testing 6 different k_dis values...
Parameter Sweep Results:
------------------------------------------------------------
k_dis | Q_gas | Q_ch4 | pH | VFA
------------------------------------------------------------
0.10 | 865.3 | 519.2 | 7.35 | 1.52
0.14 | 925.8 | 555.5 | 7.28 | 1.85
0.18 | 982.4 | 589.4 | 7.22 | 2.18
0.22 | 1005.2 | 603.1 | 7.20 | 2.45
0.26 | 1015.6 | 609.4 | 7.18 | 2.68
0.30 | 1018.2 | 610.9 | 7.17 | 2.85
✓ Optimal k_dis = 0.30 (CH4 = 610.9 m³/d)
Observations:
- Methane production increases with k_dis (faster disintegration)
- Diminishing returns above k_dis = 0.26 (only 1.5 m³/d increase; quantified below)
- pH decreases slightly due to faster organic acid production
- Trade-off: higher k_dis → more gas but lower pH stability
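The diminishing returns are easy to quantify from the table above:
import numpy as np

k_dis = np.array([0.10, 0.14, 0.18, 0.22, 0.26, 0.30])
q_ch4 = np.array([519.2, 555.5, 589.4, 603.1, 609.4, 610.9])
print(np.diff(q_ch4))  # marginal gains shrink: [36.3, 33.9, 13.7, 6.3, 1.5] m³/d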
Monte Carlo Statistics¶
================================================================================
6. Running Monte Carlo uncertainty analysis
================================================================================
Running 50 Monte Carlo samples...
Monte Carlo Summary Statistics:
------------------------------------------------------------
Q_gas:
Mean: 1002.45
Std: 58.32
Min: 885.20
Max: 1125.80
Median: 998.60
Q25: 965.15
Q75: 1038.22
Q_ch4:
Mean: 601.47
Std: 35.00
Min: 531.12
Max: 675.48
Median: 599.16
Q25: 579.09
Q75: 622.93
pH:
Mean: 7.21
Std: 0.08
Min: 7.05
Max: 7.38
Median: 7.20
Q25: 7.15
Q75: 7.26
Success Rate: 100.0%
Uncertainty Analysis:
- Coefficient of variation (CV):
  - Biogas: 58.32 / 1002.45 = 5.8% (moderate uncertainty)
  - Methane: 35.00 / 601.47 = 5.8% (same relative uncertainty)
  - pH: 0.08 / 7.21 = 1.1% (very stable)
- Interquartile range (Q25 to Q75, covering the middle 50% of samples):
  - Biogas: 965–1038 m³/d (about ±3.7% around the median)
  - Methane: 579–623 m³/d (about ±3.7% around the median)
- Interpretation: Parameter uncertainty has a moderate impact on gas production predictions
Performance Analysis¶
Parallel Efficiency¶
================================================================================
SIMULATION SUMMARY
================================================================================
Total execution time: 125.3 seconds
Average time per simulation: 1.05 seconds
Parallel efficiency:
Workers used: 4
Theoretical sequential time: 475.8 seconds
Speedup: 3.80x
Parallel efficiency: 95.0%
Performance Metrics:
- Speedup: 475.8 / 125.3 = 3.80×
- Efficiency: 3.80 / 4 = 95.0%
- Overhead: 5%, due to:
  - Process creation/destruction
  - Data serialization/deserialization
  - Result aggregation
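These figures can be recomputed from the ScenarioResult fields; a sketch, assuming the wall time is measured around the parallel call:
import time

start = time.perf_counter()
results = parallel.run_scenarios(scenarios, duration=10.0, initial_state=adm1_state)
wall_time = time.perf_counter() - start

sequential_estimate = sum(r.execution_time for r in results)  # sum of per-run times
speedup = sequential_estimate / wall_time
efficiency = speedup / 4  # divide by n_workers
print(f"Speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")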
Optimal Worker Count:
import multiprocessing as mp
# Rule of thumb: n_workers = CPU_count - 1
optimal_workers = max(1, mp.cpu_count() - 1)
parallel = ParallelSimulator(adm1, n_workers=optimal_workers)
Scalability:
- Linear speedup: Up to the physical CPU core count (ideal: 95%+ efficiency)
- Diminishing returns: Beyond the core count (hyperthreading adds little)
- Overhead dominates: Very short simulations (< 1 second per scenario)
Memory Considerations¶
Each worker process creates a full copy of the ADM1 model:
# Memory per worker ≈ 50-100 MB (depends on model complexity)
# Total memory = n_workers × memory_per_worker + overhead
# For 100 scenarios on 4 cores:
# Memory usage ≈ 4 × 75 MB ≈ 300 MB (manageable)
# For very large parameter sweeps:
if n_scenarios > 1000:
# Use batching to limit memory
batch_size = 100
all_results = []
for i in range(0, len(scenarios), batch_size):
batch = scenarios[i:i+batch_size]
results = parallel.run_scenarios(batch, duration=30, ...)
all_results.extend(results)
Advanced Usage¶
Custom Result Processing¶
def analyze_stability(results):
"""Classify scenarios by process stability."""
stable = []
unstable = []
for r in results:
if not r.success:
continue
fos_tac = r.metrics.get('FOS_TAC', 0)
pH = r.metrics.get('pH', 0)
# Stability criteria
is_stable = (
0.2 <= fos_tac <= 0.4 and
6.8 <= pH <= 7.5
)
if is_stable:
stable.append(r)
else:
unstable.append(r)
return stable, unstable
stable, unstable = analyze_stability(results)
print(f"Stable scenarios: {len(stable)}/{len(results)}")
Sensitivity Analysis¶
def calculate_sensitivity(param_name, sweep_results):
"""Calculate parameter sensitivity coefficient."""
params = [r.parameters[param_name] for r in sweep_results if r.success]
ch4 = [r.metrics['Q_ch4'] for r in sweep_results if r.success]
# Linear regression: ΔCH4/Δparam
from numpy.polynomial import Polynomial
p = Polynomial.fit(params, ch4, deg=1)
sensitivity = p.convert().coef[1] # Slope
return sensitivity
# Compare sensitivities
k_dis_sens = calculate_sensitivity('k_dis', sweep_results_1)
Y_su_sens = calculate_sensitivity('Y_su', sweep_results_2)
print(f"Sensitivity of CH4 production:")
print(f" k_dis: {k_dis_sens:.1f} m³/d per unit")
print(f" Y_su: {Y_su_sens:.1f} m³/d per unit")
Response Surface Modeling¶
For multi-parameter sweeps, fit response surfaces:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
# Extract training data
X = [] # Parameter combinations
y = [] # Methane production
for r in multi_results:
if r.success:
X.append([r.parameters['k_dis'], r.parameters['Y_su']])
y.append(r.metrics['Q_ch4'])
X = np.array(X)
y = np.array(y)
# Fit Gaussian Process
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
gp.fit(X, y)
# Predict at new points
k_dis_new = np.linspace(0.10, 0.30, 50)
Y_su_new = np.linspace(0.08, 0.12, 50)
K, Y = np.meshgrid(k_dis_new, Y_su_new)
X_pred = np.c_[K.ravel(), Y.ravel()]
ch4_pred, ch4_std = gp.predict(X_pred, return_std=True)
# Find optimum
opt_idx = np.argmax(ch4_pred)
opt_k_dis = X_pred[opt_idx, 0]
opt_Y_su = X_pred[opt_idx, 1]
opt_ch4 = ch4_pred[opt_idx]
print(f"Predicted optimum:")
print(f" k_dis = {opt_k_dis:.3f}")
print(f" Y_su = {opt_Y_su:.3f}")
print(f" CH4 = {opt_ch4:.1f} ± {ch4_std[opt_idx]:.1f} m³/d")
Best Practices¶
1. Choose Appropriate Number of Workers¶
import multiprocessing as mp
# Rule of thumb
n_workers = max(1, mp.cpu_count() - 1) # Leave one core for OS
# For I/O-bound tasks (rare in biogas simulation)
n_workers = 2 * mp.cpu_count()
# For memory-limited systems
available_memory_gb = 16 # Your system RAM
memory_per_worker_gb = 0.1
n_workers = min(n_workers, int(available_memory_gb / memory_per_worker_gb))
2. Balance Simulation Duration and Number of Scenarios¶
# Short simulations (< 1 second): Overhead dominates
# → Increase duration or use sequential execution
# Medium simulations (1-10 seconds): Good parallelization
# → Ideal for parameter sweeps
# Long simulations (> 60 seconds): Memory intensive
# → Consider reducing n_workers or using batching
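One way to apply these rules is to time a single pilot scenario before launching the full sweep (a rough sketch; the 1-second threshold is a heuristic, not a package setting):
import time

start = time.perf_counter()
parallel.run_scenarios(scenarios[:1], duration=10.0, initial_state=adm1_state)
per_run = time.perf_counter() - start

if per_run < 1.0:
    print("Runs are very short - per-process overhead may dominate; "
          "consider sequential execution or longer durations")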
3. Save Results Incrementally¶
import pickle
from pathlib import Path
results_dir = Path("results/parallel_sweep")
results_dir.mkdir(parents=True, exist_ok=True)
# Run in batches
batch_size = 50
for batch_id, i in enumerate(range(0, len(scenarios), batch_size)):
batch = scenarios[i:i+batch_size]
results = parallel.run_scenarios(
batch, duration=30, initial_state=adm1_state
)
# Save batch results
with open(results_dir / f"batch_{batch_id:03d}.pkl", "wb") as f:
pickle.dump(results, f)
print(f"Saved batch {batch_id}")
# Load all results later
all_results = []
for batch_file in sorted(results_dir.glob("batch_*.pkl")):
with open(batch_file, "rb") as f:
all_results.extend(pickle.load(f))
4. Handle Failed Scenarios Gracefully¶
# Check for failures
failed = [r for r in results if not r.success]
if failed:
print(f"Warning: {len(failed)} scenarios failed")
# Log failures
with open("failed_scenarios.log", "w") as f:
for r in failed:
f.write(f"\nScenario {r.scenario_id}:\n")
f.write(f"Parameters: {r.parameters}\n")
f.write(f"Error: {r.error}\n")
# Retry failed scenarios with tighter solver tolerances
retry_scenarios = [r.parameters for r in failed]
# Build a stricter solver and recreate the simulator with it before retrying
from pyadm1.core.solver import create_solver
strict_solver = create_solver(method='BDF', rtol=1e-8, atol=1e-10)
retry_results = parallel.run_scenarios(retry_scenarios, ...)
5. Validate Results¶
def validate_results(results):
"""Check for physically unrealistic results."""
issues = []
for r in results:
if not r.success:
continue
# Check for unrealistic values
if r.metrics.get('Q_ch4', 0) < 0:
issues.append(f"Scenario {r.scenario_id}: Negative methane")
if r.metrics.get('pH', 7) < 5 or r.metrics.get('pH', 7) > 9:
issues.append(f"Scenario {r.scenario_id}: pH out of range")
if r.metrics.get('CH4_content', 0) > 1 or r.metrics.get('CH4_content', 0) < 0:
issues.append(f"Scenario {r.scenario_id}: Invalid CH4 fraction")
return issues
issues = validate_results(results)
if issues:
print("Validation warnings:")
for issue in issues:
print(f" - {issue}")
Troubleshooting¶
Common Issues¶
Issue: "Pool workers hanging, no progress"
# Solution: Reduce complexity or increase timeout
parallel = ParallelSimulator(adm1, n_workers=2) # Fewer workers
results = parallel.run_scenarios(scenarios, duration=10, ...) # Shorter duration
Issue: "MemoryError during parallel execution"
# Solution: Use batching
batch_size = min(50, len(scenarios))
all_results = []
for i in range(0, len(scenarios), batch_size):
batch = scenarios[i:i+batch_size]
results = parallel.run_scenarios(batch, ...)
all_results.extend(results)
# Force garbage collection
import gc
gc.collect()
Issue: "Inconsistent results across runs"
# Solution: Set random seeds for Monte Carlo
mc_config = MonteCarloConfig(
n_samples=100,
parameter_distributions={...},
seed=42 # Fixed seed for reproducibility
)
Issue: "Low parallel efficiency (< 70%)"
# Possible causes:
# 1. Simulations too short (overhead dominates)
# 2. Too many workers for available cores
# 3. I/O bottleneck (disk access)
# Solutions:
# 1. Increase simulation duration so per-process overhead is amortized
# 2. Match the worker count to the available physical cores
import multiprocessing as mp
parallel = ParallelSimulator(adm1, n_workers=mp.cpu_count() - 1)
# 3. Disable time series saving to reduce serialization and disk I/O
results = parallel.run_scenarios(..., save_time_series=False)
# 4. For very short runs, batch them sequentially instead of parallelizing
Related Examples¶
- basic_digester.md: Single digester simulation basics
- two_stage_plant.md: Two-stage plant configuration
- calibration_workflow.md: Parameter calibration using parallel optimization
References¶
- Multiprocessing: Python documentation on the multiprocessing module
- Parallel Computing: Pacheco, P. (2011). An Introduction to Parallel Programming
- Design of Experiments: Montgomery, D.C. (2017). Design and Analysis of Experiments
- Uncertainty Quantification: Sullivan, T.J. (2015). Introduction to Uncertainty Quantification
Summary¶
The parallel simulation example demonstrates:
- Efficient Parameter Exploration: Test multiple scenarios 3-4× faster using parallel execution
- Systematic Optimization: Parameter sweeps identify optimal operating conditions
- Uncertainty Quantification: Monte Carlo analysis quantifies prediction uncertainty
- Scalable Architecture: Handle 100s of scenarios with proper batching and error handling
- Statistical Analysis: Comprehensive result aggregation and comparative statistics
Key Takeaway: Parallel simulation enables rapid exploration of the design and operating space, making it feasible to perform comprehensive sensitivity analysis, optimization, and uncertainty quantification that would be impractical with sequential execution.