chef-partners/chef-server-oms

Name: chef-server-oms

Owner: Chef Partners

Description: ARM Template and configuration files to create a Chef server from the Azure Marketplace with OMS Agent

Created: 2016-03-22 15:10:20.0

Updated: 2016-07-14 14:56:32.0

Pushed: 2016-04-08 12:53:51.0

Homepage: null

Size: 2407

Language: null

README

OMS Logging for Chef

Microsoft have released a new product into Azure called Operations Management Suite (OMS) which collects logs from servers and presents them in a dashboard. The dashboard is fully customisable to create rich charts and tables that can be used to show the state of the environment at a glance.

Note: This is still very much in beta and some features in the dashboard may not work as expected.

Setting up the server

The Chef server outputs a lot of information about access and authentication, and this too can be captured by OMS. For this to happen the Chef server needs to have the OMS Agent installed. The file chefserveroms.json in the repo provides an example of what the ARM template should look like.

As with most ARM templates there is also a parameters file in the repo, chefserveroms.parameters.json, which holds all of the configurable parameters that are expected by the template. Each of the parameters is explained in the following table.

| Name                | Description                                            | Example Value     |
|:--------------------|:-------------------------------------------------------|:------------------|
| vmName              | Name of the virtual machine in Azure.                  | chef-server-oms-1 |
| adminUsername       | Name of the admin user to create on the machine        | azure             |
| adminPassword       | Password to be set for the user                        |                   |
| dnsLabelPrefix      | Name to be added to the domain to create the FQDN      | chef-server-oms-1 |
| chefServerSKU       | The type of chef server being used                     | chefbyol          |
| workspaceId         | The OMS workspace ID that the logs should be sent to   |                   |
| workspacePrimaryKey | The primary key in order to access the named workspace |                   |

Note: The workspaceId and workspacePrimaryKey should be retrieved from the OMS dashboard settings.
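For reference, the sketch below writes a parameters file in the standard ARM deployment-parameters layout using the example values from the table. It is only an illustration: write_params is a hypothetical helper, the secret values are passed in as arguments, and the parameters file shipped in the repo may be laid out differently.

```shell
# write_params OUTFILE ADMIN_PASSWORD WORKSPACE_ID WORKSPACE_KEY
# Hypothetical helper: writes an ARM parameters file with the example values
# from the table. Values containing double quotes or backslashes would need
# JSON escaping first, which this sketch does not attempt.
write_params() {
  cat > "$1" <<EOF
{
  "\$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "vmName": { "value": "chef-server-oms-1" },
    "adminUsername": { "value": "azure" },
    "adminPassword": { "value": "$2" },
    "dnsLabelPrefix": { "value": "chef-server-oms-1" },
    "chefServerSKU": { "value": "chefbyol" },
    "workspaceId": { "value": "$3" },
    "workspacePrimaryKey": { "value": "$4" }
  }
}
EOF
}

# Example (placeholders must be replaced with real values):
# write_params my.parameters.json '<PASSWORD>' '<WORKSPACE_ID>' '<WORKSPACE_KEY>'
```

Passing the secrets as arguments keeps them out of the file until it is generated, although they will still be visible in the shell history unless care is taken.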

The template will create a Chef server (of Standard_D1 size) from the Azure marketplace using the Bring Your Own License model. If a different model is required, the SKU can be changed to another one from the catalogue.

An example command to create the Chef server with OMS using the template is:

azure group deployment create -f chefserveroms.json -e chefserveroms.parameters.json oms-test-chef-server chef-server-oms-1

The output of this will be very similar to the following:

Deploy Chef Server with OMSAgent

Note: The adminPassword and workspacePrimaryKey are shown as undefined in the output because they are set as securestrings in the template file.

As soon as the machine starts it will begin logging to the OMS system. This is because the workspaceId and workspacePrimaryKey were both specified in the ARM template. The following screenshot shows the initial logging of Perf and Syslog.

OMS Dashboard Initial View

Fix Truncation issue in OMS Agent

There is an issue in the OMS Agent which means that the syslogs that are gathered from the server are truncated. This is down to the way in which the logs are parsed by the agent. The following string should be placed in the syslog input section of the /etc/opt/microsoft/omsagent/conf/omsagent.conf file:

format /(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])? *(?<message>.*)/

After modifying the file, restart the OMS Agent with service omsagent restart.

Note: This is a temporary fix from Microsoft, who are working on correcting this for when the agent is installed.
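Before restarting the agent it can be worth confirming that the expression really is in the file. The following is a minimal sketch of such a check; has_syslog_fix is a hypothetical helper name and it only looks for the first and last named groups of the expression shown above on the same line.

```shell
# has_syslog_fix CONF_FILE
# Succeeds if some line of the file contains both the (?<time> and the
# (?<message>.*)/ parts of the expression. Fixed-string matching (-F) is
# used so that the regex characters do not need escaping.
has_syslog_fix() {
  grep -F '(?<time>' "$1" | grep -qF '(?<message>.*)/'
}

# Example, using the path given in the text:
# has_syslog_fix /etc/opt/microsoft/omsagent/conf/omsagent.conf && echo "fix present"
```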

The following image shows how this looks when in the correct place in the file:

OMS Agent Syslog Format

Configure the Chef Server

The Chef Server now needs to be configured. This can be accomplished by logging onto the server and running the following commands.

echo 'api_fqdn "<FQDN>"' | sudo tee -a /etc/chef-marketplace/marketplace.rb
sudo chef-marketplace-ctl hostname <FQDN>
sudo chef-marketplace-ctl setup

Here <FQDN> is the DNS name of the server as given in the Azure Portal. The last command will ask for details such as first name, last name, email address, organisation name and password. This process can take a long time as the packages are upgraded during setup.
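Because the first command appends to marketplace.rb, running it twice leaves two api_fqdn lines in the file. A minimal sketch of an append that is safe to repeat is shown below; ensure_api_fqdn is a hypothetical helper, and on the real server it would need to run with root privileges because the file is not writable by a normal user.

```shell
# ensure_api_fqdn FILE FQDN
# Appends the api_fqdn setting only when the file has no api_fqdn line yet.
# An existing line is left untouched, even if it names a different host.
ensure_api_fqdn() {
  if ! grep -q '^api_fqdn ' "$1" 2>/dev/null; then
    printf 'api_fqdn "%s"\n' "$2" >> "$1"
  fi
}

# Example against a scratch copy rather than the live file:
# ensure_api_fqdn ./marketplace.rb '<FQDN>'
```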

For more details on how to configure the chef server please refer to the Microsoft Azure Portal documentation on the Chef documentation website.

Configuring the Logging

After the machine has been created it needs to be configured to look at the Chef server files so that they can be forwarded to the OMS server. To achieve this rsyslog on the server needs to be configured to tail the various log files of the chef server, as shown in the file 99-chef.conf. This file should be copied to the /etc/rsyslog.d directory.

At the moment the syslog user does not have permission to open the Chef log files. To grant it access, the user needs to be added to the opscode group:

sudo usermod -G opscode -a syslog
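The group change can be verified before rsyslog is restarted. The sketch below is one way to do that; in_group is a hypothetical helper that reads the space-separated group list printed by id -nG.

```shell
# in_group GROUP
# Reads a space-separated list of group names on stdin and succeeds if
# GROUP is one of them (whole-name match only).
in_group() {
  tr ' ' '\n' | grep -qxF "$1"
}

# Example on the Chef server:
# id -nG syslog | in_group opscode && echo "syslog can read the Chef logs"
```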

Now restart the rsyslog daemon:

sudo service rsyslog restart

The file loads the input module which allows rsyslog to tail log files. It then defines a template for how the information from the log files should be formatted to work with OMS. This template is then used when the logs are forwarded to the local OMSAgent. Any number of log files can be listed in this file.
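To make the three parts described above concrete, the sketch below writes a small rsyslog configuration with the same structure: module load, template, and forwarding rule. It is not the 99-chef.conf from the repo. The log path, facility, template text and the OMS Agent port (25224 over UDP) are all assumptions chosen for illustration, and only one log file is listed.

```shell
# write_chef_rsyslog_conf OUTFILE
# Hypothetical helper that writes an illustrative configuration using the
# legacy rsyslog directive syntax. Every value below is an assumption.
write_chef_rsyslog_conf() {
  cat > "$1" <<'EOF'
# Load the input module that lets rsyslog tail plain log files
$ModLoad imfile

# Tail the Chef nginx access log (path is an assumption)
$InputFileName /var/log/opscode/nginx/access.log
$InputFileTag chef-nginx-access
$InputFileStateFile state-chef-nginx-access
$InputFileSeverity info
$InputFileFacility local6
$InputRunFileMonitor

# Template that puts the tag in square brackets at the start of the message
$template ChefOMS,"<%PRI%>%TIMESTAMP% %HOSTNAME% [%syslogtag%] %msg%\n"

# Forward the tailed entries to the local OMS Agent over UDP
local6.* @127.0.0.1:25224;ChefOMS
EOF
}

# Example: write to a scratch file and review it before copying anything
# into /etc/rsyslog.d
# write_chef_rsyslog_conf ./example-chef.conf
```

Further log files would be added by repeating the block of $InputFile directives with a different file name, tag and state file.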

Now attempt to log in to the Chef server (using the FQDN as set before) and then go to the OMS dashboard, where the access to the Chef server will be logged. With the supplied rsyslog configuration file all web access, web errors and authentication requests will be forwarded to OMS. It can take some time for the data to come in from the Chef server logs via rsyslog.

OMS Dashboard Chef Logs

This screenshot shows a bifrost event as well as an nginx access log.

This shows the raw logs that are coming in from the Chef server. To make this more meaningful and to be able to use the advanced features of OMS such as Custom Fields some logs need to be generated.

Setup Knife

In order to set up custom fields in OMS, some data needs to be sent to the system so that there is information to create custom fields from. To do this, knife should be configured to communicate with the Chef server.

Note: Make sure that the private key for the user that was created is downloaded to the local workstation.

An example knife.rb file is shown below:

log_level        :info
log_location     STDOUT
node_name        "<USER>"
client_key       "<PATH_TO_KEY>"
chef_server_url  "<URL>"
ssl_verify_mode  :verify_none

Ensure that the <USER>, <PATH_TO_KEY> and <URL> are set correctly according to the setup.

To test that this has worked, run the following command to get a list of the users that are on the server.

knife user list

A list of the users in the system should be returned as shown in the following output. (The user that was created for this document was russells.)

Knife User List

If this worked then some logs will have been sent to OMS. To make sure there are more logs to work with, run some more commands:

knife user show dummy

Setting up custom fields

Custom fields are fields that are defined and then applied to subsequent logs that are parsed by OMS. They can be anything that is contained within the log entry, such as the HTTP code from a web request.

In this case three custom fields will be configured:

  1. erlang_status_CF - Status codes from bifrost
  2. nginx_status_CF - HTTP codes from Chef nginx
  3. chef_type_CF - Identifies the type of the log within Syslog

Log into the OMS dashboard, go to the logs for the server and find an entry that starts with [chef-bifrost-requests]. Click on the three horizontal lines next to the SyslogMessage and then click on Extract Fields from 'Syslog' (Preview).

Bifrost Extract Fields

In the next view, the log entry will be displayed. Highlight the field of interest using the mouse and a dialog box will be displayed allowing the custom field to be named. Name the field and then click 'Extract'.

Bifrost Custom Field

As custom fields cannot be modified after they have been created, a preview is given before the new field is committed.

Bifrost Preview

This view shows how the logs would be seen by OMS and a summary of the entries that have met the criteria on the far left. Notice that the logs that have been highlighted only pertain to the bifrost entries.

When the field is working as required, click on the Save Extraction button, which will commit the field to OMS. Custom fields are only applied when the logs are parsed, so they will not be applied to logs that are already in the system. To check that the field is working, some new data needs to be generated.

knife user list

This command will cause a login to the chef server which will be seen in the bifrost logs. After a while the logs will come through and will now have a new field attached to them:

Bifrost Custom Field

Creating the custom field for the nginx status is done in exactly the same way.

For the chef_type_CF the start of the log is used, e.g. [chef-nginx-access]. This identifier is standard in the chosen log files for Chef and will allow the creation of queries to show the status for the different types of Chef logs.

Some things that need to be noted when using custom fields:

So what is the point of custom fields? They allow interesting data to be extracted from the long string of a log entry, effectively turning the log into structured data. This can then be used to generate custom charts and alerts based on these values.
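The idea of pulling structured values out of a log string can be tried locally. The sketch below is not how OMS extracts fields; it simply uses sed to pull out the same two kinds of value. The function names are hypothetical and the sample line in the usage comment is an invented, simplified nginx access entry.

```shell
# chef_type MESSAGE
# Prints the identifier between the leading square brackets, for example
# chef-nginx-access. Prints nothing if the message has no such prefix.
chef_type() {
  printf '%s\n' "$1" | sed -n 's/^\[\([^]]*\)].*/\1/p'
}

# nginx_status MESSAGE
# Prints the three-digit status that follows the first quoted string
# (the request) in the message.
nginx_status() {
  printf '%s\n' "$1" | sed -n 's/^[^"]*"[^"]*" \([0-9][0-9][0-9]\) .*/\1/p'
}

# Example with an invented log line:
# line='[chef-nginx-access] 10.0.0.4 - - "GET /users HTTP/1.1" 200 "0.012" 2'
# chef_type "$line"
# nginx_status "$line"
```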

Creating queries

The first query that will be created is to show the status codes for Nginx in the past day. This is specifically targeted at one Chef server, but that could be left out so all of the requests against all chef servers are reported. The query string for this query is as follows:

Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-nginx-access | measure count() by nginx_status_CF

Note: The computer is the name in the ARM template when the machine was created by Azure.
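The aggregation performed by measure count() by nginx_status_CF is a count of entries grouped by status code. A rough local equivalent, assuming the status codes have already been extracted to one per line, could look like this (count_by_status is a hypothetical name):

```shell
# count_by_status
# Reads one status code per line on stdin and prints "<status> <count>"
# pairs ordered by status code.
count_by_status() {
  LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# Example with made-up data:
printf '200\n500\n200\n404\n200\n' | count_by_status
# prints:
# 200 3
# 404 1
# 500 1
```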

Now click on the 'Save' icon at the top of the screen and fill in the saved search parameters.

Nginx Status Query

A similar query can be created for the Bifrost requests. The query string for this is:

Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-bifrost-requests | measure count() by erlang_status_CF

Save this new query as 'Bifrost Status'.

Adding to the Dashboard

Now that a couple of queries have been generated they can be added to the Dashboard. From the dashboard click on the 'Customize' button at the top of the page and highlight the query to add and then click on the '+' to the right of the query. The query will now be added as a tile.

It is possible to change the view of a tile. When in Customize mode, select the 'Edit' tab and then select the tile; its view can then be changed. For example:

Dashboard Tile View

When all edits are complete click on the 'Customize' button again and the changes will be saved.

As more information comes into the system these charts will update. Clicking on a tile takes the user to the query that generated it, which can be modified to get the necessary information out.

Setting up Alerting

One of the things that OMS provides is the ability to generate alerts from the queries that have been set up. So far all of the logging shows that things are working, so some errors need to be generated before the alerts can be set up. To do this, Postgres can be shut down, which will cause Nginx and Bifrost to generate errors. On the Chef server run the following command:

chef-server-ctl stop postgresql

The output from the command will be similar to the following.

Stop Postgres

To double check that the service has indeed been stopped, run chef-server-ctl status, which will output a status list of all the services.

Chef Service Status
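The status list can also be filtered so that only stopped services are shown. The sketch below assumes the runit-style output in which each line starts with run: or down: followed by the service name; down_services is a hypothetical helper and the sample lines in the comment are invented.

```shell
# down_services
# Reads status lines on stdin and prints the name of every service whose
# line starts with "down:".
down_services() {
  awk -F': ' '$1 == "down" { print $2 }'
}

# Example on the Chef server:
# chef-server-ctl status | down_services
#
# Assumed input format (invented sample):
#   run: nginx: (pid 1234) 5678s; run: log: (pid 1200) 5678s
#   down: postgresql: 12s, normally up; run: log: (pid 1201) 5678s
```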

Again run a command from the workstation to generate some log traffic for OMS to pick up and process.

knife user show russells

It is expected, and indeed desired, that errors will be thrown at this point.

Knife 500 Errors

After some time these errors will be displayed in the chart on the dashboard. Click on the chart to get a better resolution of the figures.

Nginx HTTP Status

Unfortunately it is not possible to use the query that was created to display the chart, because the alert system is not able to drill down into the query. It works on the total number of events returned, not on a specific parameter. This means a slightly different query is required.

Modify the query in the query text box to the following:

Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-nginx-access nginx_status_CF=500

Now click on the 'Alert' icon at the top of the screen to open the Alert form to the right of the page. Fill in the details as required, ensuring that the entire form is completed by scrolling down in the popup. The table below shows the options that should be set, and where applicable suggested values.

| Field Name                 | Description                                              | Suggested Value                                                                         |
|:---------------------------|:---------------------------------------------------------|:----------------------------------------------------------------------------------------|
| Name                       | Name of the new Alert                                    | Chef Nginx 500 Errors                                                                   |
| Saved Search               | The search to use to gather the data                     | Select 'Use current search query' as it has been modified slightly from the saved query |
| Check for this alert every | How often to check for the alert                         | 5 minutes (for testing)                                                                 |
| The number of results      | Generate an alert when the count is greater than a value | Greater than 3 (to provide persistence)                                                 |
| over this time window      | The time window in which the count should be performed   | 5 minutes                                                                               |

Nginx Alert Form Section 1
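The rule configured in this part of the form amounts to counting the matching events in a time window and comparing the count with a threshold. A minimal sketch of that comparison, using the suggested value of 3, is shown below. It is only an illustration of the logic, not of how OMS evaluates alerts, and should_alert is a hypothetical name.

```shell
# should_alert THRESHOLD
# Reads the status codes seen in one time window (one per line on stdin)
# and succeeds when the number of 500s is greater than THRESHOLD.
should_alert() {
  [ "$(grep -c '^500$')" -gt "$1" ]
}

# Example with made-up data: four errors in the window, threshold of 3
if printf '500\n200\n500\n500\n500\n' | should_alert 3; then
  echo "alert: more than 3 HTTP 500 responses in the window"
fi
```

With exactly three errors in the window no alert is raised, because the comparison is strictly greater than.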

| Field Name                    | Description                               | Suggested Value        |
|:------------------------------|:------------------------------------------|:-----------------------|
| Send me an email notification | Whether an email alert should be sent     | Yes                    |
| Subject                       | The subject of the email                  | Chef Nginx is Erroring |
| Recipients                    | Recipient list for the error notification |                        |
| Enable Webhook                |                                           | No                     |
| Enable Remediation            |                                           | No                     |

Nginx Alert Form Section 2

Once the form has been completed as required, click the 'Save' button and then click 'OK' when the alert has been created.

Now, to test that alerting is working, run another command to generate some more errors.

knife user show russells

After some time an email should be delivered with a notification about the errors.

Nginx Alert Email

Generating data

To check that the queries on the dashboard are working and that information is being received by OMS it is possible to get Chef to generate some data by running some tests.

Note: Ensure that the postgresql service is running.

chef-server-ctl test --all

This will take about an hour to run and will generate lots of API requests which will be picked up by OMS. As the dashboard graph is based on a 7 day time scale that cannot be changed, the test will not show up very well. However, by going into the query and changing it slightly (to force a graphical output), a chart like this can be produced.

Chef Server Tests

Note: The tests will generate HTTP 500 error codes which will be picked up by the alert that was created beforehand, so a lot of emails will be received.

OMS has a magic keyword, interval, that shows the results visually in a graph. This is the query used:

Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-nginx-access | measure count() by nginx_status_CF interval 1MINUTE
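Adding an interval to the query means the counts are grouped by time bucket as well as by status code, which is what allows them to be plotted. The sketch below imitates that grouping locally for one-minute buckets. It assumes input lines of the form "HH:MM:SS STATUS", which is an invented format for illustration, and count_per_minute is a hypothetical name.

```shell
# count_per_minute
# Reads "HH:MM:SS STATUS" lines on stdin and prints "HH:MM STATUS COUNT",
# one line per minute and status code, in sorted order.
count_per_minute() {
  awk '{ key = substr($1, 1, 5) " " $2; n[key]++ }
       END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}

# Example with made-up data:
printf '14:56:01 200\n14:56:30 200\n14:56:45 500\n14:57:02 200\n' | count_per_minute
# prints:
# 14:56 200 2
# 14:56 500 1
# 14:57 200 1
```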

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.