Name: chef-server-oms
Owner: Chef Partners
Description: ARM Template and configuration files to create a Chef server from the Azure Marketplace with OMS Agent
Created: 2016-03-22 15:10:20.0
Updated: 2016-07-14 14:56:32.0
Pushed: 2016-04-08 12:53:51.0
Homepage: null
Size: 2407
Language: null
Microsoft have released a new product into Azure called Operations Management Suite (OMS) which collects logs from servers and presents them in a dashboard. The dashboard is fully customisable to create rich charts and tables that can be used to show the state of the environment at a glance.
Note: This is still very much in beta and some features in the dashboard may not work as expected.
The Chef server outputs a lot of information about access and authentication, and this too can be captured by OMS. For this to happen, the Chef server needs to have the OMS Agent installed. The file `chefserveroms.json` in the repo provides an example of what the ARM template should look like.
As with most ARM templates, there is also a parameters file in the repo, `chefserveroms.parameters.json`, which holds all of the configurable parameters expected by the template. Each parameter is explained in the following table.
| Name | Description | Example Value |
|:---|:---|:---|
| vmName | Name of the virtual machine in Azure | chef-server-oms-1 |
| adminUsername | Name of the admin user to create on the machine | azure |
| adminPassword | Password to be set for the user | |
| dnsLabelPrefix | Name to be added to the domain to create the FQDN | chef-server-oms-1 |
| chefServerSKU | The type of Chef server being used | chefbyol |
| workspaceId | The OMS workspace ID that the logs should be sent to | |
| workspacePrimaryKey | The primary key used to access the named workspace | |
Note: The `workspaceId` and `workspacePrimaryKey` should be retrieved from the OMS dashboard settings.
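The parameters file follows the standard ARM deployment-parameters layout, in which every parameter is wrapped in a `value` object. The Ruby script below is a minimal sketch, not the repo's actual file. It generates such a file from the example values in the table. The `$schema` URL and `contentVersion` are the common ARM defaults and are assumptions here. The `<...>` secrets are hypothetical placeholders that must be replaced.

```ruby
require "json"

# Example values from the table; the <...> entries are hypothetical
# placeholders, not real credentials.
PARAMS = {
  "vmName"              => "chef-server-oms-1",
  "adminUsername"       => "azure",
  "adminPassword"       => "<ADMIN_PASSWORD>",
  "dnsLabelPrefix"      => "chef-server-oms-1",
  "chefServerSKU"       => "chefbyol",
  "workspaceId"         => "<WORKSPACE_ID>",
  "workspacePrimaryKey" => "<WORKSPACE_PRIMARY_KEY>"
}.freeze

# Wrap each value in {"value" => ...}, as ARM parameter files expect.
# The schema URL and contentVersion are assumed common defaults.
def arm_parameters(params)
  {
    "$schema"        => "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion" => "1.0.0.0",
    "parameters"     => params.transform_values { |v| { "value" => v } }
  }
end

if __FILE__ == $PROGRAM_NAME
  # Print the document so it can be redirected into a file.
  puts JSON.pretty_generate(arm_parameters(PARAMS))
end
```

Run it with Ruby and redirect the output, for example `ruby params.rb > chefserveroms.parameters.json`. The script name is arbitrary.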
The template creates a Chef server (of size `Standard_D1`) from the Azure Marketplace using the Bring Your Own License (BYOL) model. If this is not the right licensing model, a different SKU can be chosen from the catalogue.
An example command to create the Chef server with OMS using the template is:
```shell
azure group deployment create -f chefserveroms.json -e chefserveroms.parameters.json oms-test-chef-server chef-server-oms-1
```
The output of this will be very similar to the following:
Note: The `adminPassword` and `workspacePrimaryKey` values are shown as undefined because they are declared as `securestring` parameters in the template file.
As soon as the machine starts, it begins logging to OMS, because the workspace ID and key were both supplied in the ARM template. The following screenshot shows the initial logging of `Perf` and `Syslog` data.
There is an issue in the OMS Agent that causes the syslog messages gathered from the server to be truncated. This is due to the way the agent parses the logs. As a workaround, place the following line in the syslog input section of the `/etc/opt/microsoft/omsagent/conf/omsagent.conf` file:
```
format /(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])? *(?<message>.*)/
```
After modifying the file, restart the OMS Agent with `service omsagent restart`.
Note: This is a temporary fix from Microsoft, who are working on a correction that will apply when the agent is installed.
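The OMS Agent for Linux is built on Fluentd, which uses Ruby regular expressions. One way to see what the replacement pattern extracts is therefore to try it in Ruby. In this minimal sketch the pattern is copied verbatim. The sample syslog lines are invented for illustration and are not output from the Chef server.

```ruby
# Pattern from the omsagent.conf syslog input section, copied verbatim.
SYSLOG_FORMAT = /(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])? *(?<message>.*)/

# Returns the named captures as a Hash (an optional group that did not
# match, such as pid, is nil), or nil when the line does not match at all.
def parse_syslog(line)
  m = SYSLOG_FORMAT.match(line)
  m && m.named_captures
end

if __FILE__ == $PROGRAM_NAME
  # Invented sample line. The \s* in the time group absorbs the double
  # space that syslog puts before a single-digit day.
  p parse_syslog("Jul  4 09:05:01 chef-server-oms-1 CRON[2045]: (root) CMD (run-parts /etc/cron.hourly)")
end
```

With this pattern, the `:` that follows the ident stays at the start of the extracted message.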
The following image shows how this looks when in the correct place in the file:
The Chef server now needs to be configured. Log on to the server and run the following commands:
```shell
echo 'api_fqdn "<FQDN>"' | sudo tee -a /etc/chef-marketplace/marketplace.rb
sudo chef-marketplace-ctl hostname <FQDN>
sudo chef-marketplace-ctl setup
```
Where `<FQDN>` is the DNS name of the server as given in the Azure Portal. The last command prompts for details such as first name, last name, email address, organisation name and password. This process can take a long time because the packages are upgraded during setup.
For more details on how to configure the Chef server, please refer to the Microsoft Azure Portal documentation on the Chef documentation website.
After the machine has been created, it needs to be configured to read the Chef server log files so that they can be forwarded to OMS. To achieve this, `rsyslog` on the server needs to be configured to tail the various Chef server log files, as shown in the file `99-chef.conf`. Copy this file to the `/etc/rsyslog.d` directory.
At this point the `syslog` user does not have permission to open the Chef log files. To give it access, add it to the `opscode` group:
```shell
sudo usermod -G opscode -a syslog
```
Now restart the rsyslog daemon:
```shell
sudo service rsyslog restart
```
The `99-chef.conf` file loads the input module (`imfile`) that allows rsyslog to tail log files. It then defines a template for formatting the information from the log files to work with OMS. This template is used when the logs are forwarded to the local OMS Agent. Any number of log files can be listed in this file.
Now try to log in to the Chef server (using the FQDN set earlier) and then go to the OMS dashboard, where the access to the Chef server will be logged. With the supplied rsyslog configuration file, all web access, web errors and authentication requests are forwarded to OMS. It can take some time for the data from the Chef server logs to arrive via rsyslog.
This screenshot shows a `bifrost` event as well as an `nginx` access log entry.
This shows the raw logs coming in from the Chef server. To make them more meaningful, and to be able to use advanced OMS features such as Custom Fields, some logs need to be generated.
To set up custom fields in OMS, some data needs to be sent to the system first so that there is information to create the custom fields from. To generate it, configure knife to communicate with the Chef server.
Note: Make sure that the private key for the user that was created is downloaded to the local workstation.
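An example `knife.rb` file is shown below:
```ruby
log_level       :info
log_location    STDOUT
node_name       "<USER>"          # Chef server user created during setup
client_key      "<PATH_TO_KEY>"   # path to that user's private key
chef_server_url "<URL>"           # URL of the Chef server
ssl_verify_mode :verify_none      # skips SSL certificate verification
```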
Ensure that `<USER>`, `<PATH_TO_KEY>` and `<URL>` are set correctly for your setup.
Now, to test that this has worked, run the following command to list the users on the server:
```shell
knife user list
```
A list of the users on the server should be returned, as shown in the following output (the user created for this document was `russells`).
If this worked, some logs will have been sent to OMS. To make sure there are more logs to work with, run some more commands, for example:
```shell
knife user show dummy
```
Custom fields are fields that are defined once and then applied to subsequent logs as OMS parses them. They can be anything contained within the log entry, such as the HTTP code from a web request.
In this case three custom fields will be configured:
- `erlang_status_CF`: status codes from bifrost
- `nginx_status_CF`: HTTP codes from Chef nginx
- `chef_type_CF`: identifies the type of the log within Syslog

Log in to the OMS dashboard, go to the logs for the server and find an entry that starts with `[chef-bifrost-requests]`. Click the three horizontal lines next to `SyslogMessage`, then click Extract Fields from 'Syslog' (Preview).
The next view displays the log entry. Highlight the field of interest with the mouse. A dialog box then appears in which the custom field can be named; click 'Extract' when done.
As custom fields cannot be modified after they have been created, a preview is shown before the new field is committed.
This view shows how OMS would see the logs, with a summary of the entries that meet the criteria on the far left. Notice that the highlighted logs pertain only to the `bifrost` entries.
When the field works as required, click the 'Save Extraction' button to commit the field to OMS. Custom fields are only applied when logs are parsed, so a new field is not applied to logs that are already in the system. To check that the field is working, some new data needs to be generated:
```shell
knife user list
```
This command causes a login to the Chef server, which appears in the bifrost logs. After a while the logs will come through with the new field attached:
Creating the custom field for the nginx status is done in exactly the same way.
For `chef_type_CF`, the tag at the start of the log entry is used, e.g. `[chef-nginx-access]`. This identifier is consistent across the chosen Chef log files. It allows queries to be created that show the status for the different types of Chef logs.
Some things that need to be noted when using custom fields:
So what is the point of custom fields? They allow interesting data to be extracted from the long string of a log entry, effectively turning the log into structured data. This can then be used to generate custom charts and alerts based on these values.
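OMS builds custom fields from the examples highlighted in the portal, not from user-written regular expressions. The effect can still be sketched in Ruby. This hypothetical example pulls `chef_type_CF` from the leading tag and `nginx_status_CF` from an nginx-style quoted request line. The sample message format is an assumption. `erlang_status_CF` is left out because the bifrost log format is not shown here.

```ruby
# Hypothetical illustration only: OMS does not use these regexes.
# Leading tag such as "[chef-nginx-access]".
TYPE_TAG = /\A\[(?<chef_type>[^\]]+)\]/
# Assumed nginx-style request: "METHOD path PROTOCOL" followed by the status.
HTTP_STATUS = /"[A-Z]+ [^"]*" (?<status>\d{3})/

# Turn a raw SyslogMessage string into a Hash of structured fields.
def extract_fields(message)
  fields = {}
  if (m = TYPE_TAG.match(message))
    fields["chef_type_CF"] = m[:chef_type]
  end
  # Only nginx access entries carry an HTTP status in this sketch.
  if fields["chef_type_CF"] == "chef-nginx-access" && (m = HTTP_STATUS.match(message))
    fields["nginx_status_CF"] = m[:status]
  end
  fields
end

if __FILE__ == $PROGRAM_NAME
  # Invented sample message.
  p extract_fields('[chef-nginx-access] 10.0.0.5 - - [14/Jul/2016:14:56:32 +0000] "GET /users HTTP/1.1" 200 512')
end
```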
The first query to create shows the status codes for Nginx in the past day. It targets one specific Chef server, but the `Computer` condition could be left out so that requests against all Chef servers are reported. The query string is as follows:
```
Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-nginx-access | measure count() by nginx_status_CF
```
Note: The `Computer` value is the VM name set in the ARM template when the machine was created in Azure.
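The `measure count() by` clause groups the records that match the search terms by one field and counts each group. Below is a rough Ruby equivalent, assuming each log record is a Hash keyed by OMS field names. The sample records are invented.

```ruby
# Rough equivalent of:
#   Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-nginx-access
#     | measure count() by nginx_status_CF
# filters:     Hash of field => required value (all must match)
# group_field: field whose values become the groups
def measure_count_by(records, filters, group_field)
  records
    .select { |r| filters.all? { |field, value| r[field] == value } }
    .each_with_object(Hash.new(0)) { |r, counts| counts[r[group_field]] += 1 }
end

if __FILE__ == $PROGRAM_NAME
  # Invented sample records for illustration only.
  base = { "Computer" => "chef-server-oms-1", "Type" => "Syslog", "chef_type_CF" => "chef-nginx-access" }
  records = [
    base.merge("nginx_status_CF" => "200"),
    base.merge("nginx_status_CF" => "200"),
    base.merge("nginx_status_CF" => "500"),
    base.merge("Computer" => "chef-server-oms-2", "nginx_status_CF" => "200")
  ]
  p measure_count_by(records, base, "nginx_status_CF")
end
```

Removing the `Computer` entry from the filters corresponds to dropping that condition from the query. Requests from all Chef servers are then counted.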
Now click the 'Save' icon at the top of the screen and fill in the saved search parameters.
A similar query can be created for the Bifrost requests; its query string is:
```
Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-bifrost-requests | measure count() by erlang_status_CF
```
Save this new query as 'Bifrost Status'.
Now that a couple of queries have been saved, they can be added to the dashboard. From the dashboard, click the 'Customize' button at the top of the page, highlight the query to add and then click the '+' to the right of the query. The query will now be added as a tile.
The view of each tile can also be changed. For both tiles, while in Customize mode, select the 'Edit' tab, then select the tile and change its view. For example:
When all edits are complete, click the 'Customize' button again and the changes will be saved.
As more information comes into the system, these charts will update. Clicking a tile opens the query that generated it, which can then be modified to extract the information needed.
OMS can also generate alerts from the queries that have been set up. So far all the logging shows that things are working, so some errors need to be generated in order to set up the alerts. One way to do this is to shut down PostgreSQL, which causes Nginx and Bifrost to generate errors. On the Chef server, run the following command:
```shell
chef-server-ctl stop postgresql
```
The output from the command will be similar to the following.
To double-check that the service has stopped, run `chef-server-ctl status`, which outputs the status of all the services.
Again, run some commands from the workstation to generate log traffic for OMS to pick up and process:
```shell
knife user show russells
```
It is expected, and indeed desired, that errors will be thrown at this point.
After some time these errors will be displayed in the chart on the dashboard. Click the chart to see the figures in more detail.
Unfortunately, the query that displays the chart cannot be used for the alert as it is. The alert system cannot drill down into the query: it works on the total number of events, not on a specific parameter. This means a slightly different query is required.
Modify the query in the query text box to the following:
```
Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-nginx-access nginx_status_CF=500
```
Now click the 'Alert' icon at the top of the screen to open the Alert form on the right of the page. Fill in the details as required, scrolling down in the pop-up to make sure the entire form is completed. The following tables show the options that should be set and, where applicable, suggested values.
| Field Name | Description | Suggested Value |
|:---|:---|:---|
| Name | Name of the new alert | Chef Nginx 500 Errors |
| Saved Search | The search to use to gather the data | Select 'Use current search query', as it has been modified slightly from the saved query |
| Check for this alert every | How often to check for the alert | 5 minutes (for testing) |
| The number of results | Generate an alert when the count is greater than a value | Greater than 3 (to provide persistence) |
| over this time window | The time window in which the count should be performed | 5 minutes |
| Field Name | Description | Suggested Value |
|:---|:---|:---|
| Send me an email notification | Whether an email alert should be sent | Yes |
| Subject | The subject of the email | Chef Nginx is Erroring |
| Recipients | Recipient list for the alert emails | |
| Enable Webhook | | No |
| Enable Remediation | | No |
Once the form has been completed as required click the 'Save' button and then click 'OK' when the alert has been created.
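The alert settings in the tables amount to a simple threshold rule. On each check, count the events matched by the search in the preceding time window, and raise an alert when the count is greater than 3. A minimal Ruby sketch of that rule follows. Event times are plain epoch seconds. How OMS treats an event exactly on the window boundary is an assumption of this sketch.

```ruby
# Settings taken from the alert table.
WINDOW_SECONDS = 5 * 60 # "over this time window": 5 minutes
THRESHOLD      = 3      # "Greater than 3"

# event_times: epoch seconds of events matched by the alert's search query.
# Returns true when more than `threshold` events fall in (now - window, now].
def alert?(event_times, now, window: WINDOW_SECONDS, threshold: THRESHOLD)
  recent = event_times.count { |t| t > now - window && t <= now }
  recent > threshold
end
```

With these values, three 500 responses inside five minutes do not trigger an alert, but a fourth does. This is the persistence the table refers to.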
Now, to test that alerting is working, run another command to generate more errors:
```shell
knife user show russells
```
After some time, an email should be delivered with a notification about the errors.
To check that the queries on the dashboard are working and that OMS is receiving information, Chef can be made to generate data by running its tests.
Note: Ensure that the `postgresql` service is running again (for example, with `chef-server-ctl start postgresql`).
```shell
chef-server-ctl test --all
```
This takes about an hour to run and generates lots of API requests, which will be picked up by OMS. As the dashboard graph uses a fixed 7-day time scale, the test run does not show up very well. However, by opening the query and changing it slightly (to force a graphical output), a chart like this can be produced.
Note: The tests generate HTTP 500 error codes, which will be picked up by the alert created beforehand, so a lot of emails will be received.
OMS has a magic keyword, `interval`, that shows the results visually in a graph. This is the query used:
```
Computer="chef-server-oms-1" Type=Syslog chef_type_CF=chef-nginx-access | measure count() by nginx_status_CF interval 1MINUTE
```
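Conceptually, `interval 1MINUTE` splits the time range into one-minute buckets and applies `measure count() by nginx_status_CF` within each bucket. The Ruby sketch below shows that bucketing on invented `[Time, status]` pairs. It illustrates the idea only; OMS may implement the keyword differently.

```ruby
# Bucket [time, status] events into one-minute intervals and count each
# status per bucket, like "measure count() by nginx_status_CF interval 1MINUTE".
def count_by_minute(events)
  events.each_with_object(Hash.new(0)) do |(time, status), counts|
    minute = Time.at((time.to_i / 60) * 60).utc # start of the event's minute
    counts[[minute, status]] += 1
  end
end

if __FILE__ == $PROGRAM_NAME
  t0 = Time.utc(2016, 7, 14, 14, 56, 5) # invented sample times
  events = [[t0, "200"], [t0 + 30, "500"], [t0 + 60, "200"]]
  count_by_minute(events).sort.each do |(minute, status), n|
    puts "#{minute.strftime('%H:%M')} #{status} #{n}"
  end
end
```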