cloudfoundry/diego-perf-release

Name: diego-perf-release

Owner: Cloud Foundry

Description: Used to deploy multiple app-pushers and fezzik-runners for performance-testing a Diego(+Runtime) deployment

Created: 2015-02-05 22:14:42.0

Updated: 2018-04-23 13:46:18.0

Pushed: 2018-02-07 22:29:07.0


Size: 1237

Language: Shell


README

BOSH Diego Performance Release

This is a release to measure the performance of Diego. See the proposal here.

Usage

Note: To deploy with a cf-deployment style manifest using BOSH 2.0, include the ops file under operations/add-diego-perf-release.yml. You will also need to modify the ops file to use your local copy of diego-perf-release:

    - type: replace
      path: /releases/-
      value:
        name: diego-perf-release
        url: file://<path-to-workspace>/diego-perf-release
        version: create
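
With the ops file updated, the deploy step might look like the following sketch (the environment alias my-env, the deployment name cf, and the manifest filename are assumptions, not from this README):

    # hedged sketch: deploy a cf-deployment style manifest with the perf ops file
    bosh -e my-env -d cf deploy cf-deployment.yml \
      -o operations/add-diego-perf-release.yml
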
Prerequisites

Deploy diego-release and cf-release. Then, to deploy this release, create a BOSH deployment manifest with as many pusher instances as you want to use for testing.

Running Fezzik
  1. bosh ssh stress_tests 0
  2. Run /var/vcap/jobs/caddy/bin/1_fezzik multiple times.
  3. Output is stored in /var/vcap/packages/fezzik/src/github.com/cloudfoundry-incubator/fezzik/reports.json (see the combined session sketch below).
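
Taken together, a typical Fezzik session might look like this sketch (instance index 0 is only an example):

    bosh ssh stress_tests 0
    # on the stress_tests VM:
    /var/vcap/jobs/caddy/bin/1_fezzik
    cat /var/vcap/packages/fezzik/src/github.com/cloudfoundry-incubator/fezzik/reports.json
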
Running Cedar
Automatically Running 10 Batches of Cedar (Preferred)

The steps mentioned in the previous section are automated by ./cedar_script. The script pushes 10 batches of apps, each batch in its own space. To run it:

  1. Run cd /var/vcap/jobs/cedar/bin.

  2. Run the following command to run the experiment:

    ./cedar_script
    
  3. To resume the experiment from the nth batch (where n is a number from 1 to 10), add n as an argument to the script. For example, to run from the fourth batch:

    ./cedar_script 4
    

    Note: if the spaces are already present from a previous run of the script, the script will not fail; it will instead continue to push to those existing spaces. Manually delete the spaces or the entire CF org if required (see the cleanup sketch below).
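
A hedged cleanup sketch; the org name o comes from the deployment example later in this README, so substitute whatever org your run actually used:

    # deleting the org also deletes all of the spaces the script created in it
    cf delete-org -f o
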

The script then pushes an extra batch of apps via cedar and monitors them with arborist. The file /var/vcap/sys/log/cedar/cedar-arborist-output.json contains the results from that cedar run, and the file /var/vcap/sys/log/arborist/arborist-output.json contains the arborist results.

The script will also output the min/max timestamp for each batch in /var/vcap/data/cedar/min-<batch#>.json and /var/vcap/data/cedar/max-<batch#>.json.

Running Cedar from a BOSH deployment
  1. Run ./scripts/generate-deployment-manifest and deploy diego-perf-release with the generated manifest. If on BOSH-Lite, you can use ./scripts/generate-bosh-lite-manifests.
  2. Run bosh ssh to SSH to the cedar VM in the cf-warden-diego-perf deployment.
  3. Run sudo su.
  4. Run the following commands:
    # put the CF CLI on the PATH
    export PATH=/var/vcap/packages/cf-cli/bin:$PATH

    # target CF and create an org and space for the apps
    cf api api.bosh-lite.com --skip-ssl-validation
    cf auth admin admin
    cf create-org o
    cf create-space cedar -o o
    cf target -o o -s cedar

    cd /var/vcap/packages/cedar

    /var/vcap/packages/cedar/bin/cedar \
      -n 1 \
      -k 2 \
      -payload /var/vcap/packages/cedar/assets/temp-app \
      -config /var/vcap/packages/cedar/config.json \
      -domain bosh-lite.com
    
    
Running Cedar Locally
  1. Target a CF deployment.
  2. Target a chosen org and space.
  3. From the root of this repo, run cd src/code.cloudfoundry.org/diego-stress-tests/cedar/assets/stress-app.
  4. Precompile the stress-app to assets/temp-app by running GOOS=linux GOARCH=amd64 go build -o ../temp-app/stress-app.
  5. Run cd ../.. to change back to src/code.cloudfoundry.org/diego-stress-tests/cedar.
  6. Run go build to build the cedar binary.
  7. Run the following to start a test:
    ./cedar -n <number_of_batches> -k <max_in_flight> [-tolerance <tolerance-factor>]
    

Run ./cedar -h to see the list of options you can provide to cedar. One of the most important options is -config, which takes a JSON-encoded config file providing the manifest paths for the different apps being pushed. The default config.json can be found here.
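
For example, a hedged invocation sketch (the batch and in-flight counts are placeholders, and the fractional form of the -tolerance value is an assumption):

    # push 5 batches with at most 10 pushes in flight
    ./cedar -n 5 -k 10 -config ./config.json -domain bosh-lite.com -tolerance 0.5
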

Run Arborist from a BOSH deployment

Note: Arborist depends on a successful cedar run, as it uses the output file from cedar as an input.

Run the example below to monitor apps on a BOSH-Lite installation:

  1. Run ./scripts/generate-bosh-lite-manifests and deploy diego-perf-release with the generated manifest.
  2. Run bosh ssh to SSH to the cedar VM in the cf-warden-diego-perf deployment.
  3. Run sudo su.
  4. Run the following commands to run arborist from a tmux session:
    # start a new tmux session
    /var/vcap/packages/tmux/bin/tmux new -s arborist

    cd /var/vcap/packages/arborist

    /var/vcap/packages/arborist/bin/arborist \
      -app-file <cedar-output-file> \
      -duration 10m \
      -logLevel info \
      -request-interval 10s \
      -result-file output.json &
    
  5. To detach from the tmux session, send Ctrl-b d.
  6. To reattach to the tmux session, run /var/vcap/packages/tmux/bin/tmux attach -t arborist.
Run Arborist Locally
  1. cd to src/code.cloudfoundry.org/diego-stress-tests/arborist

  2. Build the arborist binary with go build.

  3. Run the following to start a test:

    ./arborist \
      -app-file <cedar-output-file> \
      -duration 10m \
      -logLevel info \
      -request-interval 10s \
      -result-file output.json
    

Arborist has the following usage options:

  -app-file string
        path to json application file
  -domain string
        domain where the applications are deployed (default "bosh-lite.com")
  -duration duration
        total duration to check routability of applications (default 10m0s)
  -logLevel string
        log level: debug, info, error or fatal (default "info")
  -request-interval duration
        interval in seconds at which to make requests to each individual app (default 1m0s)
  -result-file string
        path to result file (default "output.json")
Monitoring the cluster

The team has created three Grafana dashboards with graphs for monitoring interesting metrics. Below are the names and descriptions of those dashboards:

Importing dashboard

To import any of these dashboards, from the Grafana home page:

  1. Click on Home (or the dashboard search dropdown)
  2. Click on Import
  3. Choose a file
  4. Save the dashboard (CTRL+S or the drive icon next to the dashboard dropdown)

See the Grafana export/import documentation for more info.

Exporting dashboard

To export a dashboard after editing it, do the following:

  1. Go to the dashboard you want to export (by clicking the name in the dashboard dropdown)
  2. Click the Manage dashboard button (the gear icon next to the dashboard dropdown)
  3. Click Export
  4. Grafana will automatically download the JSON file

See the Grafana export/import documentation for more info.

Aggregating results
Preprocessing using perfchug

perfchug is a tool that ships with the diego-perf-release. It takes log output from cedar, bbs and auctioneer, processes it, and converts it into something that can be fed into InfluxDB.

To use perfchug locally:

  1. cd <path>/diego-perf-release/src/code.cloudfoundry.org/diego-stress-tests/perfchug.
  2. Run go install to build the executable.
  3. Move the executable into your $PATH (see the sketch below).
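
A minimal sketch of those steps, assuming a classic GOPATH workspace where go install drops binaries into $GOPATH/bin:

    cd <path>/diego-perf-release/src/code.cloudfoundry.org/diego-stress-tests/perfchug
    go install
    # go install places the binary in $GOPATH/bin; put that on your PATH
    export PATH=$GOPATH/bin:$PATH
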

Once on the $PATH, supply lager-formatted logs to perfchug on its stdin.

For example:

    cat /var/vcap/sys/log/cedar/cedar.stdout.log | perfchug

will emit influxdb-formatted metrics to stdout.
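
Because perfchug also understands bbs and auctioneer logs, you can feed it several log files at once. A hedged sketch (the log paths follow the usual /var/vcap/sys/log layout, the output filename is arbitrary, and unlike the script below this does not sort entries by timestamp):

    cat /var/vcap/sys/log/bbs/bbs.stdout.log \
        /var/vcap/sys/log/auctioneer/auctioneer.stdout.log \
        /var/vcap/sys/log/cedar/cedar.stdout.log \
      | perfchug > metrics.influx
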

Automatic downloading and aggregation

We wrote a script to automate the entire process. The script does the following:

  1. Download brain, bbs & cedar job logs using bosh
  2. Reduce the logs to the start/end timestamps of the experiments that were run
  3. Merge the logs from all jobs together
  4. Run perfchug on the resulting log file
  5. Insert the output of perfchug into InfluxDB (sketched after this list)
  6. Run a fixed set of queries to get percentiles of request latency, among other interesting metrics
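
Step 5 relies on InfluxDB's HTTP write API. Done by hand, it might look like the following sketch (the database name diego and the 1.x-style /write endpoint are assumptions):

    # write perfchug's line-protocol output into an InfluxDB database
    curl -i -XPOST 'http://url.to.influxdb:8086/write?db=diego' \
      --data-binary @metrics.influx
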

In order to use the script, you need to do the following:

  1. Be on a jump box inside the deployment, e.g. the director.

  2. Target bosh at the right environment.

  3. Put perfchug, veritas, and bosh on your PATH.

  4. Create a new directory and cd into it. This will be used as the working directory for the script. BOSH logs will be downloaded in this directory.

  5. From that directory run:

    /path/to/diego_results.sh \
      http://url.to.influxdb:8086 \
      /path/to/diego/manifest \
      /path/to/perf/manifest \
      [/path/to/output/file]
    

The output file will contain one line per query, and all query results are valid JSON. If there are no data points in InfluxDB for a query, e.g. no failures, InfluxDB returns an empty result, e.g. {"results":[]}.

If the output file parameter is provided, diego_results.sh will also trigger a post-processing script that condenses the output into metrics.csv, a more human-readable format.

Snapshotting and Restoring InfluxDB (GCP Only)

Snapshotting
  1. Go to the Google Cloud Platform dashboard, and find the InfluxDB instance.
  2. Find the 'additional disks' section, and click on the disk to be snapshotted.
  3. Click 'Create Snapshot' at the top of the window that opens up.
  4. Name the snapshot and click 'Create'.
Restoring a snapshotted InfluxDB
  1. Go to the Google Cloud Platform dashboard, and find the InfluxDB instance.
  2. Click edit at the top of the page.
  3. Find the 'additional disks' section, and add a disk from the snapshot.
  4. Click save at the bottom of the page. The new disk will appear as /dev/sd[a-z] (where [a-z] is the next available letter for a disk name).
  5. Edit /etc/mtab on the InfluxDB VM to add the new filesystem from /dev/sd[a-z] to /var/vcap/store2.
  6. Run mkdir -p /var/vcap/store2 && cd /var/vcap/store2 && mount /dev/sd[a-z]1 (see the sketch after this list).
  7. Change all references from /var/vcap/store to /var/vcap/store2 in /var/vcap/jobs/influxdb.
  8. Restart InfluxDB with monit restart influxdb.
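
A hedged sketch of steps 6-8, assuming the new disk appeared as /dev/sdf and mounting it explicitly rather than relying on the /etc/mtab entry:

    mkdir -p /var/vcap/store2
    mount /dev/sdf1 /var/vcap/store2
    # rewrite job config references to point at the new store
    grep -rl '/var/vcap/store' /var/vcap/jobs/influxdb \
      | xargs sed -i 's|/var/vcap/store\b|/var/vcap/store2|g'
    monit restart influxdb
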
Development

These tests are meant to be run against a real IaaS. However, it is possible to run them against BOSH-Lite during development. A deployment manifest template is in templates/bosh-lite.yml. Use spiff to merge it with a director_uuid stub.

