Name: clusterdata
Owner: Alibaba
Description: cluster data collected from production clusters in Alibaba for cluster management research
Created: 2017-09-05 03:16:34.0
Updated: 2018-05-10 13:58:52.0
Pushed: 2018-03-12 03:45:43.0
Size: 156
Language: null
The trace data, ClusterData201708, contains information from a production cluster over a 12-hour period (see note below). The cluster has about 1.3k machines that run both online services and batch jobs.
The data is provided to address the challenges Alibaba faces in IDCs (Internet data centers) where online services and batch jobs are co-allocated. We distill the challenges as the following topics:
Please let us know if you have any issues, ideas, or papers about this data by emailing us at alibaba-clusterdata. The more specific the feedback, the more likely we are to be able to help you.
Note on the 12-hour period: although the server and batch data span about 24 hours, the container data is limited to 12 hours. We will release another version in the near future.
The format of the trace data is described in the schema description and defined in the specification file schema.csv in the repository.
The data is stored in Alibaba Cloud Object Storage Service. You do not need to have an Alibaba account or sign up for Object Storage Service to download the data.
Download information can be found at this link (after a short survey). We use the contact information to keep in touch with you and to announce goodies such as new traces. Included with the trace is a SHA256SUM file, which can be used to verify the integrity of a download with the sha256sum command from GNU coreutils, using a command like
sha256sum --check SHA256SUM
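Where GNU coreutils is not available (for example on some Windows setups), the same integrity check can be approximated in Python. The following is a minimal sketch, not part of the trace distribution. It assumes the SHA256SUM file uses the usual coreutils layout of one `DIGEST  FILENAME` entry per line, and that the listed files sit in the same directory as the SHA256SUM file. The helper names `sha256_of`, `parse_checksum_line`, and `verify` are hypothetical, and escaped file names (lines starting with a backslash) are not handled.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line):
    """Split one 'DIGEST  NAME' line (coreutils style) into (digest, name)."""
    digest, _, name = line.strip().partition(" ")
    # After the first space comes a mode character: ' ' (text) or '*' (binary).
    if name[:1] in (" ", "*"):
        name = name[1:]
    return digest.lower(), name


def verify(checksum_file):
    """Check every file listed in checksum_file; return {name: True/False}."""
    checksum_path = Path(checksum_file)
    # Assumption: listed files live next to the checksum file.
    base = checksum_path.parent
    results = {}
    for line in checksum_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = parse_checksum_line(line)
        target = base / name
        results[name] = target.is_file() and sha256_of(target) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify("SHA256SUM").items():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
```

Run from the download directory, the script prints one line per listed file. A missing file is reported as FAILED rather than raising an error, which differs slightly from how sha256sum reports unreadable files.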