Data Points
A data point is a single measurement or observation taken during the course of an experiment. Data points are the fundamental building blocks of experimental data, and are used to generate the results and conclusions of an experiment.
WEI currently supports two broad categories of data points:
data_value
: A single JSON-serializable data point, such as a number, string, or dictionary. This is useful for scalar data points, such as single sensor readings (temperature, pressure, etc.), or for data that lends itself well to a dictionary format (e.g. a set of key-value pairs).
local_file
: A file that is stored locally on the WEI server. This is useful for larger data points, such as images, videos, or other large files that are generated as part of an experiment. The file is stored on the server and can be accessed later by the user. Any file type can be stored as a local file, but the user should be aware of the storage limitations of the WEI server.
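As a rough illustration of how the two categories differ, the sketch below checks whether a value can be serialized to JSON and suggests data_value if so, local_file otherwise. The helper name suggest_datapoint_type is hypothetical and is not part of WEI; it only mirrors the distinction described above.

```python
import json
from pathlib import Path
from typing import Any


def suggest_datapoint_type(value: Any) -> str:
    """Suggest a data point category (hypothetical helper, not a WEI API)."""
    # A filesystem path refers to a file on disk, so treat it as a local_file.
    if isinstance(value, Path):
        return "local_file"
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # Not JSON serializable (e.g. raw bytes): store it as a file instead.
        return "local_file"
    return "data_value"


print(suggest_datapoint_type(21.5))
print(suggest_datapoint_type({"temperature": 21.5, "units": "C"}))
print(suggest_datapoint_type(b"\x89PNG"))
```

In practice the choice also depends on size: a very large dictionary may be JSON serializable and still be better kept as a file.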
Data points are typically generated by a Workflow Run and are associated with a specific Experiment. They can be logged to the Experiment by the WEI server (such as when returned by a module as part of a Workflow Run) or by the user via the /data REST API endpoint or the wei.experiment_client.ExperimentClient.create_datapoint() method. They are stored in WEI’s database and/or local storage for later retrieval and analysis.
Data Labels
In addition to a unique ID, all data points have a label field, which is a human-readable string that describes the data point. This label is used to identify the data point in the WEI database and in the results of an experiment. The label should be descriptive and informative, and should help the user understand the context and meaning of the data point.
There are two ways a data point can come to be labeled:
By the module, as part of the action that generates the data point. In this case, the module should provide a label as part of the data point, which will be used by WEI to identify the data point.
By the experimenter, either when labeling a data point in a workflow or when creating it in the experiment application. In this case, the experimenter provides the label (for a workflow, in the workflow file), and WEI uses it to identify the data point.
For instance, in the following example Workflow definition, the data point generated by the take_picture action is labeled “experiment_result”:
- name: Take Picture
  module: camera
  action: take_picture
  args:
    file_name: "experiment_result.jpg"
  data_labels:
    image: "experiment_result"
If this step were part of a Workflow Run, the data point generated by the take_picture action would be labeled “experiment_result”. This label would be used to identify the data point in the WEI database and in the results of the experiment. If the user does not provide a label, WEI uses the module's default label for each data point the module returns (in this case, “image”).
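The fallback rule described above can be sketched as a small lookup: if the step's data_labels maps the module's default label to a custom one, the custom label wins; otherwise the default is kept. The function resolve_label is a hypothetical helper written for illustration and is not WEI's actual implementation.

```python
from typing import Dict, Optional


def resolve_label(default_label: str, data_labels: Optional[Dict[str, str]] = None) -> str:
    """Return the experimenter's label if one is mapped, else the module's default.

    Hypothetical helper for illustration; not part of WEI.
    """
    if data_labels and default_label in data_labels:
        return data_labels[default_label]
    return default_label


# With the data_labels mapping from the workflow step above
print(resolve_label("image", {"image": "experiment_result"}))
# Without a mapping, the module's default label is kept
print(resolve_label("image"))
```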
Retrieving Data Points
Data points can always be retrieved by their ID using the WEI REST API (/data/{datapoint_id}) or using the wei.experiment_client.ExperimentClient.get_datapoint_value() and wei.experiment_client.ExperimentClient.save_datapoint_value() Python methods. You can find the datapoint_id in the information returned in the Workflow Run object; the IDs for each data point are included in the result.data field for the step.
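For the REST route, a minimal sketch using only the Python standard library might look like the following. The server address http://localhost:8000 and the data point ID are placeholders chosen for illustration, and the format of the response body is not specified here, so the sketch returns the raw bytes rather than assuming JSON or file content.

```python
import urllib.request
from urllib.parse import quote


def datapoint_url(base_url: str, datapoint_id: str) -> str:
    """Build the /data/{datapoint_id} URL for a given server base URL."""
    return f"{base_url.rstrip('/')}/data/{quote(datapoint_id, safe='')}"


def fetch_datapoint(base_url: str, datapoint_id: str) -> bytes:
    """Fetch the raw response body for a data point (needs a reachable server)."""
    with urllib.request.urlopen(datapoint_url(base_url, datapoint_id)) as response:
        return response.read()


# Assumed server address and placeholder ID, for illustration only
print(datapoint_url("http://localhost:8000", "EXAMPLE_DATAPOINT_ID"))
```

Calling fetch_datapoint() requires a running WEI server at the given address; the ExperimentClient methods are likely the more convenient choice from Python code.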
For convenience, you can also use the wei.types.workflow_types.WorkflowRun.get_datapoint_id_by_label() and wei.types.workflow_types.WorkflowRun.get_all_datapoint_ids_by_label() methods to retrieve data points by their label.
For instance, to retrieve the data point labeled “experiment_result” from the example above, you could use the following code:
datapoint_id = wf_run.get_datapoint_id_by_label("experiment_result")
# Retrieve the data point using the ExperimentClient
value = experiment_client.get_datapoint_value(datapoint_id)
print(value)
You can retrieve all data points from a workflow with a specific label using the following code:
datapoint_ids = wf_run.get_all_datapoint_ids_by_label("experiment_result")
# Retrieve the data points using the ExperimentClient
values = [experiment_client.get_datapoint_value(datapoint_id) for datapoint_id in datapoint_ids]
print(values)
You can retrieve all data points for an experiment using the following code:
datapoints = experiment_client.get_experiment_datapoints()
values = [experiment_client.get_datapoint_value(datapoint_id) for datapoint_id in datapoints]
print(values)