The Workflow Trace Archive

The primary purpose of the Workflow Trace Archive (WTA) is to provide (anonymized) workload traces from cluster and cloud environments, to combat the lack of diverse traces available to researchers and practitioners alike.

1. Motivation

Research into clouds and clusters has increased during the past decade. While new scheduling and provisioning algorithms, architectures, and benchmarks are proposed, few real-world or realistic workloads are released as open source. A diverse set of workloads is essential to compare algorithms objectively, support performance claims, and investigate the behavior of systems. Prior work of the AtLarge group on autoscalers has demonstrated this need.
A second motivation is the lack of overview. Right now, one needs to know which specific articles offer their data as open-source artifacts. The lack of a central repository where such datasets can be downloaded and, more importantly, uploaded hampers the spread and adoption of a rich and diverse set of workloads to experiment with.
Our third and final motivation is standardization. Open-source datasets differ in format, requiring researchers to implement several parsing tools to handle the different formats. To allow researchers to focus on their research, we offer all datasets in our workflow format.

2. Approach

Our approach to building the Workflow Trace Archive is:

We design a (compressed, machine-readable) format that allows parsing through scripting and through big data tools such as Apache Spark. To allow data reuse and community inter-connection, the WTA format contains a rich set of metrics carefully selected from other formats and extended by the WTA team.
We create and publish online tools for parsing traces and for converting them from other data formats. Generic tools for detailed insight into, and statistical analysis of, data in the WTA format are also available as open source.
In the future, we will develop methods to upload traces to the Workflow Trace Archive.
We created, host, and maintain the Workflow Trace Archive website.
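
To illustrate what "parsing through scripting" can look like for a compressed, machine-readable trace, here is a minimal sketch. It is not the WTA tooling and does not use the actual WTA format: the gzip-compressed CSV encoding, the column names (workflow_id, task_id, submit_time, runtime), and both helper functions are assumptions made purely for illustration.

```python
import csv
import gzip
import io
from collections import defaultdict

# Hypothetical column names, chosen for illustration only; the real WTA
# schema is not described in this text.
FIELDS = ["workflow_id", "task_id", "submit_time", "runtime"]


def write_sample_trace(rows):
    """Return gzip-compressed CSV bytes for a list of row dicts."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def runtime_per_workflow(compressed):
    """Sum task runtimes per workflow from gzip-compressed CSV bytes."""
    totals = defaultdict(int)
    # gzip.open accepts a file-like object; "rt" decodes it as text.
    with gzip.open(io.BytesIO(compressed), mode="rt",
                   encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            totals[row["workflow_id"]] += int(row["runtime"])
    return dict(totals)


# A tiny synthetic trace: two workflows, three tasks.
sample = write_sample_trace([
    {"workflow_id": "w1", "task_id": "t1", "submit_time": 0, "runtime": 30},
    {"workflow_id": "w1", "task_id": "t2", "submit_time": 5, "runtime": 12},
    {"workflow_id": "w2", "task_id": "t3", "submit_time": 9, "runtime": 7},
])
totals = runtime_per_workflow(sample)
print(totals)
```

The sketch streams the trace row by row instead of loading it whole, which is one reason a single shared format is convenient: the same short script works on any trace that follows it.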

3. Contributing Traces

The Workflow Trace Archive welcomes all contributions meeting the criteria listed below. To add traces, please contact one of the core members of the WTA. Traces are only published if they meet the criteria and if we are legally permitted to do so. Naturally, we will give credit where it's due. Criteria:

The trace originates from a computing infrastructure parsing (workloads of) workflows.
The trace is either a real-world trace taken from a (production) environment, or is synthetic and used in published scientific work (we also include these for reproducibility purposes).

4. Support and Contact

For assistance in anonymizing traces, please look at our anonymization script or reach out to us using the support and contact section.

5. The Workflow Trace Archive Team

The people involved in the Workflow Trace Archive can be seen here.

Sources

Icons are from Flaticon. CSS is provided by the Bootstrap project. Additionally, this site uses jQuery, Font Awesome, and DataTables.