The WTA trace format consists of 7 objects: Workload, Workflow, Tasks, TaskState, Resource, ResourceState, and DataTransfer. Each object contains a version field, so that it can be updated independently of the others, and a unique set of properties. The format and the relations between objects can best be observed in the figure below.
The data is offered in Parquet format, compressed using the Snappy compression algorithm. Parquet is the de facto columnar storage standard in Big Data, and is much like SQL tables and pandas DataFrames in Python. Parquet reading libraries exist for many popular languages, including Java and Python.
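As an illustration, reading one object of a trace in Python could look like the minimal sketch below. It assumes that pandas is installed together with a Parquet engine such as pyarrow, and that each object is stored as one or more .parquet files in a directory named after the object. That directory layout, the helper names find_parquet_files and load_object, and the example paths are assumptions made for this sketch, not part of the format description above.

```python
from pathlib import Path


def find_parquet_files(trace_dir, object_name):
    """Return all .parquet files under <trace_dir>/<object_name>, sorted by path.

    The <trace_dir>/<object_name>/ layout is an assumption of this sketch.
    """
    object_dir = Path(trace_dir) / object_name
    return sorted(object_dir.rglob("*.parquet"))


def load_object(trace_dir, object_name):
    """Load every Parquet part of one object into a single pandas DataFrame."""
    # Imported here so that the path helper above also works without pandas.
    import pandas as pd

    files = find_parquet_files(trace_dir, object_name)
    if not files:
        raise FileNotFoundError(
            f"no Parquet files for {object_name!r} under {trace_dir}"
        )
    # The Parquet engine reads the codec from the file metadata, so the
    # Snappy compression needs no extra argument here.
    frames = [pd.read_parquet(path) for path in files]
    return pd.concat(frames, ignore_index=True)
```

With a trace unpacked into a hypothetical directory my_trace, a call such as load_object("my_trace", "tasks") would return all task records as one DataFrame, whose columns can then be inspected through its columns attribute.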
The WTA offers several scripts for parsing other trace formats. All parse scripts are available in our GitHub wta-tools repository. The current parse scripts include, but may not be limited to: