The primary purpose of the Workflow Trace Archive (WTA) is to provide (anonymized) workload traces from cluster and cloud environments, addressing the lack of diverse traces available to researchers and practitioners alike.
Research into cloud and cluster computing has increased over the past decade. While new scheduling and provisioning algorithms, architectures, and benchmarks continue to be proposed, few real-world or realistic workloads are released as open source.
A diverse set of workloads is essential to compare algorithms objectively, to substantiate performance claims, and to investigate the behavior of systems.
Prior work by the AtLarge group on autoscalers has demonstrated this need.
A second motivation is the lack of an overview. Currently, finding suitable traces requires knowing which specific articles offer their data as an open-source artefact. The absence of a central repository where such datasets can be downloaded, and more importantly uploaded, hampers the adoption of a rich and diverse set of workloads for experimentation.
Our third and final motivation is standardization. Open-source datasets differ in format, requiring researchers to implement several parsing tools to handle the different formats. To allow researchers to focus on their research, we offer all datasets in our workflow format.
Our approach to building the Workflow Trace Archive is:
The Workflow Trace Archive welcomes all contributions that meet the criteria listed below. To add traces, please contact one of the core members of the WTA. Traces are published only if they meet the criteria and if we are legally permitted to do so. Naturally, we will give credit where it's due. Criteria:
The people involved in the Workflow Trace Archive can be seen here.