Dataset Structure

The dataset is organised like a workbook: there are several plants, and for each plant several task variants. This page explains how everything fits together.

Plants as building blocks

Each plant lives in its own folder. There are currently three example plants that differ in size and equipment:

Plant   Short description
BGA1    Large plant: two fermenters, secondary digester, digestate store, biogas upgrading, separator
BGA2    Small plant: one fermenter, secondary digester, digestate store, combined heat and power unit
BGA3    Medium plant: two fermenters, secondary digester, digestate store, combined heat and power unit

BGA = biogas plant

"BGA" simply stands for the German Biogasanlage (biogas plant). The number distinguishes the three examples.

Variants: the same plant, described differently

Each plant is described in several different ways, but it is always the same biogas plant. This makes it possible to test whether the AI is robust, regardless of whether the description is long, short, in English or a sketch.

Two properties are combined here:

1. The form of the description

  • detailed text – an explanatory prose description
  • terse text – only the key figures
  • English text – the same plant in English
  • sketch – a drawing of the plant (image)

2. The completeness of the information

  • complete (suffix _full): All required information is in the description. The
    AI does not need to ask anything.
  • incomplete (no suffix): Some information is missing (e.g. the operating
    temperature). The AI has to ask for it or fill it in sensibly.

Example

BGA2_terse_de_full means: plant BGA2, terse description, in German (de), with all information (full).
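
The naming scheme can also be read by a small program. The following is only a sketch: it assumes that a name always has the shape plant, form, optional language, optional _full, which is inferred from the file names shown on this page. The names Variant and parse_variant are made up for this illustration and are not part of the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical container for the parts of a variant name."""
    plant: str               # e.g. "BGA2"
    form: str                # e.g. "text", "terse", "sketch"
    language: Optional[str]  # e.g. "de", "en"; None if the name has no language part
    complete: bool           # True if the name ends in "_full"


def parse_variant(name: str) -> Variant:
    """Split a name such as 'BGA2_terse_de_full' into its parts.

    Assumed shape (inferred, not an official specification):
    <plant>_<form>[_<language>][_full]
    """
    parts = name.split("_")
    complete = parts[-1] == "full"
    if complete:
        parts = parts[:-1]  # drop the completeness suffix
    if len(parts) < 2:
        raise ValueError(f"not a variant name: {name!r}")
    plant, form = parts[0], parts[1]
    language = parts[2] if len(parts) > 2 else None
    return Variant(plant, form, language, complete)


if __name__ == "__main__":
    print(parse_variant("BGA2_terse_de_full"))
    print(parse_variant("BGA1_sketch"))
```

A name without the _full suffix is treated as incomplete here, and a name without a language part (such as BGA1_sketch) gets language None.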

The reference solution ("Gold")

Each plant comes with a reference solution – similar to the answer sheet for a school exercise. It describes the correctly built plant and serves as the benchmark against which the AI's result is measured. All variants of a plant share the same reference solution, because it is always the same plant.

What the folders look like

In simplified form, the storage layout looks like this:

Dataset/
  BGA1/                     ← plant 1 (one folder per plant)
    BGA1_text_de.json         detailed text (German), incomplete
    BGA1_text_de_full.json    detailed text (German), complete
    BGA1_text_en.json         English text
    BGA1_terse_de.json        terse description
    BGA1_sketch.json          sketch only
    BGA1_sketch.png           the sketch image
    gold.py                   the shared reference solution
  BGA2/  …                   ← plant 2 (same layout)
  BGA3/  …                   ← plant 3 (same layout)
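
To get an overview of such a folder tree, the task files can be listed per plant. This is a minimal sketch under the assumption that the layout is exactly as drawn above (one folder per plant, task files ending in .json, one gold.py per plant). The function name index_dataset and the keys "tasks" and "has_gold" are invented for this example.

```python
from pathlib import Path


def index_dataset(root):
    """Map each plant folder to its .json task files and note whether gold.py exists.

    Assumes the simplified layout shown above; other files (e.g. the sketch
    image) are ignored.
    """
    root = Path(root)
    index = {}
    for plant_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        index[plant_dir.name] = {
            # every .json file in the plant folder counts as one task file
            "tasks": sorted(f.name for f in plant_dir.glob("*.json")),
            # the shared reference solution for all variants of this plant
            "has_gold": (plant_dir / "gold.py").is_file(),
        }
    return index


if __name__ == "__main__":
    # "Dataset" is the top-level folder name used in the layout above.
    if Path("Dataset").is_dir():
        for plant, info in index_dataset("Dataset").items():
            print(plant, len(info["tasks"]), "task files, gold.py present:", info["has_gold"])
```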

What is a .json file?

A .json file is a text file in a fixed format that a computer can read easily. You can think of it as a filled-in form with clearly named fields. What exactly is in such a form is explained on the page A Data Point in Detail.
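
For readers who want to look at such a form themselves, here is a small sketch that opens a .json file and lists its field names. It deliberately assumes nothing about which fields exist, only that the file contains a single form (a JSON object) at the top level. The function name field_names is chosen just for this example.

```python
import json
from pathlib import Path


def field_names(path):
    """Return the top-level field names of a .json file, in file order."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        # Assumption of this sketch: the file holds one "form" (a JSON object).
        raise ValueError(f"{path} does not contain a JSON object")
    return list(data)
```

Calling field_names on one of the task files would show which fields the form contains, without interpreting them.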

Each of these task files is what we call a data point. The next page looks at what is inside one.