6. DSMS Apps and Pipelines

In this tutorial we see how to create apps, attach them to a KItem, and run them either automatically upon upload or manually on demand.

6.1: Setting up#

Before you run this tutorial, make sure that you have access to a DSMS instance of your interest, that this package is installed, and that you have established access to the DSMS through the DSMS-SDK (refer to Connecting to DSMS).

Now let us import the needed classes and functions for this tutorial.

[1]:

from dsms import DSMS, KItem, AppConfig
import time

Now source the environment variables from an .env file and start the DSMS session.

[2]:

dsms = DSMS(env=".env")

6.1. Investigating Available Apps#

We can investigate which apps are available:

[3]:

dsms.app_configs

[3]:

[app:
   name: SD_Tensile_Test_Pipeline,
 app:
   name: ckan-fetch,
 app:
   name: csv_tensile_test,
 app:
   name: csv_tensile_test_f2,
 app:
   name: csv_tensile_test_three_directions,
 app:
   name: dsms-materialcard,
 app:
   name: dsms-tensile-test-analysis,
 app:
   name: excel_notch_tensile_test,
 app:
   name: excel_notched_tensile_test,
 app:
   name: excel_shear_tensile_test,
 app:
   name: excel_shear_test,
 app:
   name: excel_tensile_test,
 app:
   name: ternary-plot,
 app:
   name: testapp2,
 app:
   name: upload_double_ring_bending_test_dat,
 app:
   name: upload_double_ring_bending_test_data,
 app:
   name: upload_pilot-1_melt-spinning]

6.2 Create a new app config and apply it to a KItem#

6.2.1 Arbitrary python code#

To be defined.

6.2.2 - Data2RDF#

6.2.2.1 Prepare app and its config#

In the following example, we would like to upload a CSV file with some arbitrary data and describe it through RDF. This gives us the opportunity to harmonize the entities of the data file through ontological concepts and allows us to convert the values of the dataframe columns into any compatible unit we would like to have.

First of all, let us define the data:

[4]:

data = """A,B,C
1.2,1.3,1.5
1.7,1.8,1.9
2.0,2.1,2.3
2.5,2.6,2.8
3.0,3.2,3.4
3.6,3.7,3.9
4.1,4.3,4.4
4.7,4.8,5.0
5.2,5.3,5.5
5.8,6.0,6.1
"""

We will also give the config a defined name:

[5]:

configname = "testapp2"

As a next step, we create a new app specification. The specification follows the definition of an Argo Workflow. The workflow triggers a pipeline from a Docker image that contains the Data2RDF package.

The image has already been deployed on the Kubernetes (k8s) cluster of the DSMS, and a workflow template for Data2RDF (referenced below as dsms-data2rdf-1.2.5) has already been implemented. Hence we only need to configure the pipeline for the data shown above, which we would like to upload and describe through RDF.

For more details about the Data2RDF package, please refer to its documentation.

The parameters of the app config define the inputs for our Data2RDF pipeline. These are, for example:

- the parser kind (csv here)
- the time series header length (1 here)
- the metadata length (0 here)
- the time series separator (, here)
- the log level (DEBUG here)
- the mapping:
  - A is the test time, in seconds
  - B is the standard force, in kilonewtons
  - C is the absolute crosshead travel, in millimeters

[6]:

parameters = [
    {"name": "parser", "value": "csv"},
    {"name": "time_series_header_length", "value": 1},
    {"name": "metadata_length", "value": 0},
    {"name": "time_series_sep", "value": ","},
    {
        "name": "mapping",
        "value": """
            [
                {
                    "key": "A",
                    "iri": "https://w3id.org/steel/ProcessOntology/TestTime",
                    "unit": "s"
                },
                {
                    "key": "B",
                    "iri": "https://w3id.org/steel/ProcessOntology/StandardForce",
                    "unit": "kN"
                },
                {
                    "key": "C",
                    "iri": "https://w3id.org/steel/ProcessOntology/AbsoluteCrossheadTravel",
                    "unit": "mm"
                }
            ]
            """,
    },
]
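Because the mapping is handed over as a JSON string, a typo in it would only surface once the pipeline runs. The following is a minimal sketch, not part of the DSMS-SDK, that checks locally whether the string parses and whether every mapped key appears in the CSV header. The helper name check_mapping and the shortened sample data (including the deliberately wrong key D) are assumptions made for illustration.

```python
import csv
import io
import json


def check_mapping(mapping_json: str, csv_text: str, sep: str = ",") -> list:
    """Return the mapped keys that are missing from the CSV header (hypothetical helper)."""
    mapping = json.loads(mapping_json)  # raises json.JSONDecodeError on malformed JSON
    header = next(csv.reader(io.StringIO(csv_text), delimiter=sep))
    return [entry["key"] for entry in mapping if entry["key"] not in header]


# Shortened sample data; the key "D" is a deliberate mistake for demonstration
sample_csv = "A,B,C\n1.2,1.3,1.5\n"
sample_mapping = """
[
    {"key": "A", "iri": "https://w3id.org/steel/ProcessOntology/TestTime", "unit": "s"},
    {"key": "B", "iri": "https://w3id.org/steel/ProcessOntology/StandardForce", "unit": "kN"},
    {"key": "D", "iri": "https://w3id.org/steel/ProcessOntology/AbsoluteCrossheadTravel", "unit": "mm"}
]
"""

missing = check_mapping(sample_mapping, sample_csv)
print(missing)  # ['D']
```

An empty list means that every key in the mapping has a matching column in the header.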

Now we add the parameters to our app specification. We assign the prefix data2rdf- via generateName, so that a new workflow name is generated with some random characters as suffix. The workflow template with the Docker image we want to run is called dsms-data2rdf-1.2.5 and its entrypoint is execute-pipeline.

[7]:

# Define app specification
specification = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "data2rdf-"},
    "spec": {
        "entrypoint": "execute-pipeline",
        "workflowTemplateRef": {"name": "dsms-data2rdf-1.2.5"},
        "arguments": {"parameters": parameters},
    },
}
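If several app configs share this structure, the dict can be assembled by a small function. This is only a sketch: build_specification is a hypothetical helper and not part of the DSMS-SDK; it reproduces the layout of the specification shown above and nothing more.

```python
def build_specification(parameters, template_name, entrypoint, name_prefix):
    """Assemble an Argo Workflow dict that references an existing workflow template.

    Hypothetical helper for illustration; it only mirrors the dict layout used in this tutorial.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": name_prefix},
        "spec": {
            "entrypoint": entrypoint,
            "workflowTemplateRef": {"name": template_name},
            "arguments": {"parameters": list(parameters)},
        },
    }


example = build_specification(
    parameters=[{"name": "parser", "value": "csv"}],
    template_name="dsms-data2rdf-1.2.5",
    entrypoint="execute-pipeline",
    name_prefix="data2rdf-",
)
print(example["spec"]["workflowTemplateRef"]["name"])  # dsms-data2rdf-1.2.5
```

Keeping the template name and entrypoint as arguments makes it harder for them to drift apart from what is deployed on the cluster.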

Now we instantiate the new app config:

[8]:

appspec = AppConfig(
    name=configname,
    specification=specification,  # this can also be a file path to a yaml file instead of a dict
    expose_sdk_config=True,
)

Please note that expose_sdk_config=True is important here, since the pipeline needs to be aware of the settings of our platform and of our current SDK session.

We commit the new app config:

[9]:

dsms.add(appspec)
dsms.commit()

/app/dsms/apps/config.py:86: UserWarning: AppConfigs do not have a refresh functionality since they are already up to date after committing. You can continue normally using the app config.
  warnings.warn(

And show the app specification:

[10]:

appspec.specification

[10]:

{'apiVersion': 'argoproj.io/v1alpha1',
 'kind': 'Workflow',
 'metadata': {'generateName': 'data2rdf-'},
 'spec': {'entrypoint': 'execute-pipeline',
  'workflowTemplateRef': {'name': 'dsms-data2rdf-1.2.5'},
  'arguments': {'parameters': [{'name': 'parser', 'value': 'csv'},
    {'name': 'time_series_header_length', 'value': 1},
    {'name': 'metadata_length', 'value': 0},
    {'name': 'time_series_sep', 'value': ','},
    {'name': 'mapping',
     'value': '\n            [\n                {\n                    "key": "A",\n                    "iri": "https://w3id.org/steel/ProcessOntology/TestTime",\n                    "unit": "s"\n                },\n                {\n                    "key": "B",\n                    "iri": "https://w3id.org/steel/ProcessOntology/StandardForce",\n                    "unit": "kN"\n                },\n                {\n                    "key": "C",\n                    "iri": "https://w3id.org/steel/ProcessOntology/AbsoluteCrossheadTravel",\n                    "unit": "mm"\n                }\n            ]\n            '},
    {'name': 'request_timeout', 'value': 120},
    {'name': 'ping', 'value': True},
    {'name': 'host_url', 'value': 'https://bue.materials-data.space/'},
    {'name': 'ssl_verify', 'value': True},
    {'name': 'kitem_repo', 'value': 'knowledge-items'},
    {'name': 'encoding', 'value': 'utf-8'}]}}}
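The committed specification contains more parameters than the five we defined ourselves. Judging from the output above, the extra ones carry settings of the SDK session, presumably appended because expose_sdk_config is set to True. The sketch below lists them by comparing parameter names; parameter_names is a hypothetical helper, and the two dicts are shortened stand-ins for the specification before and after committing.

```python
def parameter_names(spec):
    """Collect the parameter names of an Argo Workflow dict (hypothetical helper)."""
    return [param["name"] for param in spec["spec"]["arguments"]["parameters"]]


# Shortened stand-ins: names only, values omitted
own_names = ["parser", "time_series_header_length", "metadata_length", "time_series_sep", "mapping"]
session_names = ["request_timeout", "ping", "host_url", "ssl_verify", "kitem_repo", "encoding"]

spec_before = {"spec": {"arguments": {"parameters": [{"name": n} for n in own_names]}}}
spec_after = {"spec": {"arguments": {"parameters": [{"name": n} for n in own_names + session_names]}}}

before = set(parameter_names(spec_before))
added = [name for name in parameter_names(spec_after) if name not in before]
print(added)
# ['request_timeout', 'ping', 'host_url', 'ssl_verify', 'kitem_repo', 'encoding']
```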

Now we would like to apply the app config to a KItem. The property triggerUponUpload must be set to True so that the app is triggered automatically when we upload an attachment.

Additionally, we must specify the file extensions for which the app shall be triggered upon upload. Here it is .csv.

We also want to generate a QR code as avatar for the KItem with avatar={"include_qr": True}.

[11]:

item = KItem(
    name="my tensile test experiment",
    ktype_id=dsms.ktypes.Dataset,
    apps=[
        {
            "executable": appspec.name,
            "title": "data2rdf",
            "additional_properties": {
                "triggerUponUpload": True,
                "triggerUponUploadFileExtensions": [".csv"],
            },
        }
    ],
    avatar={"include_qr": True},
)

We commit the KItem:

[12]:

dsms.add(item)
dsms.commit()

Now we add our data as an attachment:

[13]:

item.attachments = [{"name": "dummy_data.csv", "content": data}]

And we commit again:

[14]:

dsms.add(item)
dsms.commit()

6.2.2.2 Get results#

Now we can verify that the data extraction was successful:

[15]:

item.refresh()

[16]:

print(item)

kitem:
  id: fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a
  name: my tensile test experiment
  ktype_id: dataset
  slug: mytensiletestexperiment-fe51bac4
  annotations: []
  attachments:
  - name: dummy_data.csv
  - name: subgraph.ttl
  linked_kitems: []
  affiliations: []
  authors:
  - user_id: 7f0e5a37-353b-4bbc-b1f1-b6ad575f562d
  avatar_exists: false
  contacts: []
  created_at: 2025-08-14 15:44:01.267904
  updated_at: 2025-08-14 15:44:01.267904
  external_links: []
  apps:
  - executable: testapp2
    title: data2rdf
    description: null
    tags: null
    additional_properties:
      triggerUponUpload: true
      triggerUponUploadFileExtensions:
      - .csv
  user_groups: []
  dataframe:
  - id: &id001 !!python/object:uuid.UUID
      int: 338048275090092658041595342616544333210
      is_safe: 0
    column_id: 0
    name: TestTime
  - id: *id001
    column_id: 1
    name: StandardForce
  - id: *id001
    column_id: 2
    name: AbsoluteCrossheadTravel
  rdf_exists: true
  contexts: []

And also that the RDF generation was successful:

[17]:

print(item.subgraph.serialize())

@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ns1: <http://qudt.org/schema/qudt/> .
@prefix ns2: <http://xmlns.com/foaf/spec/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/dataset> a dcat:Dataset ;
    dcterms:hasPart <https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/tableGroup> ;
    dcat:distribution [ a dcat:Distribution ;
            dcat:accessURL "https://bue.materials-data.space/api/knowledge/data_api/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a"^^xsd:anyURI ;
            dcat:mediaType "http://www.iana.org/assignments/media-types/text/csv"^^xsd:anyURI ] .

<https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/AbsoluteCrossheadTravel> a <https://w3id.org/steel/ProcessOntology/AbsoluteCrossheadTravel> ;
    ns1:hasUnit "http://qudt.org/vocab/unit/MilliM"^^xsd:anyURI .

<https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/StandardForce> a <https://w3id.org/steel/ProcessOntology/StandardForce> ;
    ns1:hasUnit "http://qudt.org/vocab/unit/KiloN"^^xsd:anyURI .

<https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/TestTime> a <https://w3id.org/steel/ProcessOntology/TestTime> ;
    ns1:hasUnit "http://qudt.org/vocab/unit/SEC"^^xsd:anyURI .

<https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/tableGroup> a csvw:TableGroup ;
    csvw:table [ a csvw:Table ;
            rdfs:label "Dataframe" ;
            csvw:tableSchema [ a csvw:Schema ;
                    csvw:column [ a csvw:Column ;
                            ns1:quantity <https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/StandardForce> ;
                            csvw:titles "B"^^xsd:string ;
                            ns2:page [ a ns2:Document ;
                                    dcterms:format "https://www.iana.org/assignments/media-types/application/json"^^xsd:anyURI ;
                                    dcterms:identifier "https://bue.materials-data.space/api/knowledge/data_api/column-1"^^xsd:anyURI ;
                                    dcterms:type "http://purl.org/dc/terms/Dataset"^^xsd:anyURI ] ],
                        [ a csvw:Column ;
                            ns1:quantity <https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/TestTime> ;
                            csvw:titles "A"^^xsd:string ;
                            ns2:page [ a ns2:Document ;
                                    dcterms:format "https://www.iana.org/assignments/media-types/application/json"^^xsd:anyURI ;
                                    dcterms:identifier "https://bue.materials-data.space/api/knowledge/data_api/column-0"^^xsd:anyURI ;
                                    dcterms:type "http://purl.org/dc/terms/Dataset"^^xsd:anyURI ] ],
                        [ a csvw:Column ;
                            ns1:quantity <https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a/AbsoluteCrossheadTravel> ;
                            csvw:titles "C"^^xsd:string ;
                            ns2:page [ a ns2:Document ;
                                    dcterms:format "https://www.iana.org/assignments/media-types/application/json"^^xsd:anyURI ;
                                    dcterms:identifier "https://bue.materials-data.space/api/knowledge/data_api/column-2"^^xsd:anyURI ;
                                    dcterms:type "http://purl.org/dc/terms/Dataset"^^xsd:anyURI ] ] ] ] .

And now we are able to convert our data into any compatible unit we want. The StandardForce was previously given in kN, but we want to have it in N now:

[18]:

item.dataframe.StandardForce.convert_to("N")

[18]:

[1300.0,
 1800.0,
 2100.0,
 2600.0,
 3200.0,
 3700.0,
 4300.0,
 4800.0,
 5300.0,
 6000.0]
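For intuition, the conversion from kN to N is a multiplication by 1000. The sketch below reproduces the numbers above in plain Python. It does not use the SDK, and the hand-written factor table is an assumption for illustration, not a description of how convert_to is implemented.

```python
# Hand-written conversion factors to newtons (illustrative only, not taken from the SDK)
FACTORS_TO_NEWTON = {"N": 1.0, "kN": 1_000.0, "MN": 1_000_000.0}


def convert_force(values, source_unit, target_unit):
    """Convert force values between units by scaling via newtons."""
    factor = FACTORS_TO_NEWTON[source_unit] / FACTORS_TO_NEWTON[target_unit]
    # Round to suppress floating-point noise from the multiplication
    return [round(value * factor, 6) for value in values]


standard_force_kn = [1.3, 1.8, 2.1, 2.6, 3.2, 3.7, 4.3, 4.8, 5.3, 6.0]
standard_force_n = convert_force(standard_force_kn, "kN", "N")
print(standard_force_n)
```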

6.2.2.3 Manipulate dataframe#

We are able to retrieve the dataframe as a pd.DataFrame:

[19]:

item.dataframe.to_df()

[19]:

	TestTime	StandardForce	AbsoluteCrossheadTravel
0	1.2	1.3	1.5
1	1.7	1.8	1.9
2	2.0	2.1	2.3
3	2.5	2.6	2.8
4	3.0	3.2	3.4
5	3.6	3.7	3.9
6	4.1	4.3	4.4
7	4.7	4.8	5.0
8	5.2	5.3	5.5
9	5.8	6.0	6.1

We are able to overwrite the dataframe with new data:

[20]:

item.dataframe = {
    "TestTime": list(range(100)),
    "StandardForce": list(range(1, 101)),
    "AbsoluteCrossheadTravel": list(range(2, 102)),
}
dsms.add(item)
dsms.commit()

We are able to retrieve the data column-wise:

[21]:

for column in item.dataframe:
    print("column:", column.name, ",\n", "data:", column.get())

column: TestTime ,
 data: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
column: StandardForce ,
 data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
column: AbsoluteCrossheadTravel ,
 data: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]

… and also to modify the dataframe directly as we need:

[22]:

new_df = item.dataframe.to_df().drop(['TestTime'], axis=1)
item.dataframe = new_df

6.2.2.4 Run app on demand#

We are also able to run the app on demand, instead of having it triggered automatically every time an attachment is uploaded. For this purpose, we refer to the app by the title we assigned during the KItem creation (here it is simply data2rdf).

Additionally, we need to specify the attachment_name and hand over the access token and host URL to the app by explicitly setting expose_sdk_config to True. This tells the SDK that the app also uses the SDK internally and that the app should receive its parameters from the current SDK session.

By default the app runs synchronously, hence the job is returned only when the pipeline run has finished.

[23]:

job = item.apps.by_title["data2rdf"].run(
    attachment_name="dummy_data.csv",
    expose_sdk_config=True
)

We are able to retrieve the job status:

[24]:

job.status

[24]:

job_status:
  phase: Succeeded
  finished_at: 08/14/2025, 15:44:57
  started_at: 08/14/2025, 15:44:37
  progress: 1/1

… and the job logs:

[25]:

job.logs

[25]:

'"[2025-08-14 15:44:41,262 - dsms_data2rdf.main - INFO]: Fetch KItem: \\n kitem:\\n  id: fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a\\n  name: my tensile test experiment\\n  ktype_id: dataset\\n  slug: mytensiletestexperiment-fe51bac4\\n  annotations: []\\n  attachments:\\n  - name: dummy_data.csv\\n  - name: subgraph.ttl\\n  linked_kitems: []\\n  affiliations: []\\n  authors:\\n  - user_id: 7f0e5a37-353b-4bbc-b1f1-b6ad575f562d\\n  avatar_exists: false\\n  contacts: []\\n  created_at: 2025-08-14 15:44:01.267904\\n  updated_at: 2025-08-14 15:44:01.267904\\n  external_links: []\\n  apps:\\n  - executable: testapp2\\n    title: data2rdf\\n    description: null\\n    tags: null\\n    additional_properties:\\n      triggerUponUpload: true\\n      triggerUponUploadFileExtensions:\\n      - .csv\\n  user_groups: []\\n  rdf_exists: true\\n  contexts: []\\n\\n[2025-08-14 15:44:41,283 - dsms_data2rdf.main - INFO]: Run pipeline with the following parser arguments: {\'metadata_sep\': \',\', \'metadata_length\': 0, \'time_series_sep\': \',\', \'time_series_header_length\': 1, \'drop_na\': True, \'fillna\': None}\\n[2025-08-14 15:44:41,283 - dsms_data2rdf.main - INFO]: Run pipeline with the following parser: Parser.csv\\n[2025-08-14 15:44:41,283 - dsms_data2rdf.main - INFO]: Run pipeline with the following config: {\'base_iri\': \'https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a\', \'data_download_uri\': \'https://bue.materials-data.space/api/knowledge/data_api/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a\', \'graph_identifier\': \'https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a\', \'separator\': \'/\', \'encoding\': \'utf-8\'}\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. 
Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. 
Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. 
Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the \'model_fields\' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\\n  for key, value in self.model_fields.items():\\n[2025-08-14 15:44:46,555 - dsms_data2rdf.main - INFO]: Pipeline finished.\\n[2025-08-14 15:44:46,555 - dsms_data2rdf.main - INFO]: Pipeline did detect any metadata. Will not make annotations for KItem\\n[2025-08-14 15:44:46,555 - dsms_data2rdf.main - INFO]: Extracted Time Series: Index([\'TestTime\', \'StandardForce\', \'AbsoluteCrossheadTravel\'], dtype=\'object\')\\n[2025-08-14 15:44:47,150 - dsms_data2rdf.main - INFO]: Checking that dataframe is up to date.\\n[2025-08-14 15:44:47,150 - dsms_data2rdf.main - INFO]: Dataframe upload was successful after 0 retries.\\n[2025-08-14 15:44:47,150 - dsms_data2rdf.main - INFO]: Done!\\n"'
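The raw log is hard to read because it is a single string and most of it consists of repeated deprecation warnings. The following sketch keeps only the lines that follow the "[timestamp - logger - LEVEL]: message" pattern visible above. It assumes that the log string is JSON-encoded (as the surrounding quotes and escaped newlines suggest) and falls back to a plain replacement otherwise. summarize_logs is a hypothetical helper, and the sample string is a shortened excerpt of the log above.

```python
import json
import re

# Matches lines such as "[2025-08-14 15:44:47,150 - dsms_data2rdf.main - INFO]: Done!"
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]*?) - (?P<logger>\S+) - (?P<level>[A-Z]+)\]: (?P<message>.*)$"
)


def summarize_logs(raw_logs):
    """Return (level, message) tuples for the structured lines of a job log (hypothetical helper)."""
    try:
        text = json.loads(raw_logs)  # assumption: the log is a JSON-encoded string
    except json.JSONDecodeError:
        text = raw_logs.replace("\\n", "\n")
    if not isinstance(text, str):
        text = raw_logs
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:  # warnings and continuation lines do not match and are skipped
            entries.append((match["level"], match["message"]))
    return entries


sample_logs = json.dumps(
    "[2025-08-14 15:44:41,283 - dsms_data2rdf.main - INFO]: Run pipeline with the following parser: Parser.csv\n"
    "/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: ...\n"
    "  for key, value in self.model_fields.items():\n"
    "[2025-08-14 15:44:47,150 - dsms_data2rdf.main - INFO]: Done!\n"
)

for level, message in summarize_logs(sample_logs):
    print(level, message)
```

With a real job, the call would be summarize_logs(job.logs). Multi-line entries such as the printed KItem are reduced to their first line.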

In case we would like to run the job in the background, we simply pass wait=False:

[26]:

job = item.apps.by_title["data2rdf"].run(
    attachment_name="dummy_data.csv",
    expose_sdk_config=True,
    wait=False,
)

We are able to monitor the job status and logs asynchronously:

[27]:

while True:
    time.sleep(1)
    print(job.status)
    print("Current logs:")
    print(job.logs)
    print("\n")
    if job.status.phase != "Running":
        break

job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Running
  started_at: 08/14/2025, 15:44:58
  progress: 0/1

Current logs:
""


job_status:
  phase: Succeeded
  finished_at: 08/14/2025, 15:45:18
  started_at: 08/14/2025, 15:44:58
  progress: 1/1

Current logs:
"[2025-08-14 15:45:02,348 - dsms_data2rdf.main - INFO]: Fetch KItem: \n kitem:\n  id: fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a\n  name: my tensile test experiment\n  ktype_id: dataset\n  slug: mytensiletestexperiment-fe51bac4\n  annotations: []\n  attachments:\n  - name: dummy_data.csv\n  - name: subgraph.ttl\n  linked_kitems: []\n  affiliations: []\n  authors:\n  - user_id: 7f0e5a37-353b-4bbc-b1f1-b6ad575f562d\n  avatar_exists: false\n  contacts: []\n  created_at: 2025-08-14 15:44:01.267904\n  updated_at: 2025-08-14 15:44:01.267904\n  external_links: []\n  apps:\n  - executable: testapp2\n    title: data2rdf\n    description: null\n    tags: null\n    additional_properties:\n      triggerUponUpload: true\n      triggerUponUploadFileExtensions:\n      - .csv\n  user_groups: []\n  rdf_exists: true\n  contexts: []\n\n[2025-08-14 15:45:02,370 - dsms_data2rdf.main - INFO]: Run pipeline with the following parser arguments: {'metadata_sep': ',', 'metadata_length': 0, 'time_series_sep': ',', 'time_series_header_length': 1, 'drop_na': True, 'fillna': None}\n[2025-08-14 15:45:02,370 - dsms_data2rdf.main - INFO]: Run pipeline with the following parser: Parser.csv\n[2025-08-14 15:45:02,370 - dsms_data2rdf.main - INFO]: Run pipeline with the following config: {'base_iri': 'https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a', 'data_download_uri': 'https://bue.materials-data.space/api/knowledge/data_api/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a', 'graph_identifier': 'https://bue.materials-data.space/fe51bac4-bc4d-4067-bcf4-60c2cd0acd9a', 'separator': '/', 'encoding': 'utf-8'}\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. 
Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. 
Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. 
Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n/usr/local/lib/python3.10/site-packages/data2rdf/config.py:87: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.\n  for key, value in self.model_fields.items():\n[2025-08-14 15:45:07,804 - dsms_data2rdf.main - INFO]: Pipeline finished.\n[2025-08-14 15:45:07,804 - dsms_data2rdf.main - INFO]: Pipeline did detect any metadata. Will not make annotations for KItem\n[2025-08-14 15:45:07,804 - dsms_data2rdf.main - INFO]: Extracted Time Series: Index(['TestTime', 'StandardForce', 'AbsoluteCrossheadTravel'], dtype='object')\n[2025-08-14 15:45:08,343 - dsms_data2rdf.main - INFO]: Checking that dataframe is up to date.\n[2025-08-14 15:45:08,343 - dsms_data2rdf.main - INFO]: Dataframe upload was successful after 0 retries.\n[2025-08-14 15:45:08,344 - dsms_data2rdf.main - INFO]: Done!\n"
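A polling loop without an upper bound keeps running if the job never leaves the Running phase. The sketch below adds a timeout and stops on any terminal phase. wait_for_phase is a hypothetical helper and not part of the DSMS-SDK. The set of terminal phases is an assumption based on the phase names used by Argo Workflows, and the job is simulated by an iterator so that the example runs without a DSMS instance.

```python
import time

# Assumption: terminal phase names as used by Argo Workflows
TERMINAL_PHASES = {"Succeeded", "Failed", "Error"}


def wait_for_phase(get_phase, timeout=60.0, interval=1.0):
    """Poll get_phase() until a terminal phase is reached or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in phase {phase!r} after {timeout} s")
        time.sleep(interval)


# Simulated job: reports a new phase on every poll
simulated_phases = iter(["Pending", "Running", "Running", "Succeeded"])
final_phase = wait_for_phase(lambda: next(simulated_phases), timeout=5.0, interval=0.0)
print(final_phase)  # Succeeded
```

With a real job, the callable would be lambda: job.status.phase, which is the attribute used in the loop above.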

IMPORTANT: When the job has run asynchronously (in the background), we need to manually refresh the KItem afterwards:

[28]:

item.refresh()

Finally, we clean up the DSMS by deleting the KItem and the app config created in this tutorial:

[29]:

del dsms[item]
del dsms[appspec]
dsms.commit()
