Achieving Common Ground

What are some tools to help developers and operations achieve common ground?

There are tools which can help, but it’s more effective to work out the flow of information between the teams and use tools to facilitate the transfer.

Dashboards for Metrics

Let’s take a web service as an example. Begin with two to four metrics which are common and relevant across the teams:

  • rate of requests
  • rate of error (http_code > 499)
  • latency

They should be in an easy-to-remember location and available to everyone. Put them on a TV in the common space! Base your monitoring on them!

It’s easy to collaborate when you are looking at and talking about the same thing.
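
As a rough illustration, the three metrics above can be derived from request data with a few lines of awk. This is only a sketch: the input format (one request per line as "epoch_seconds http_status latency_ms") and the function name summarize_requests are assumptions made for the example, so real access logs will need their own field positions.

```shell
# Hypothetical input: one request per line, "epoch_seconds http_status latency_ms".
# Prints request count, request rate, error percentage (status > 499) and mean latency.
summarize_requests() {
  awk '
    NR == 1 { first = $1 }
    { last = $1; total++; sum += $3; if ($2 > 499) errors++ }
    END {
      if (total == 0) { print "requests=0"; exit }
      window = last - first
      if (window < 1) window = 1   # avoid dividing by zero for a single-second sample
      printf "requests=%d req_per_sec=%.2f error_pct=%.2f avg_latency_ms=%.1f\n",
             total, total / window, 100 * errors / total, sum / total
    }
  '
}
```

Something like `summarize_requests < requests.txt` would give both teams the same numbers to look at, whatever dashboard ends up displaying them.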

Access

It starts with “Trust”

“I don’t want to give access as they may cause an outage!”

I have heard variations of the above so many times. Why would “they”, who are on the same team, intentionally bring your system down?

Some of the smartest people I know have typed commands in the wrong terminal :) Mistakes happen. But the “system” should be built to withstand small mistakes. You don’t have to give sudo access to everyone. It’s counterintuitive, but the system will become more secure over time, because opening up access forces a rethink of permissions.

This recommendation is to help with a better #opslife. Depending on the size of your organization, you need to have proper security guidelines about this. I don’t mean to imply that every employee should be able to SSH into all the systems. However, if you are on the same team, working on the same service, you should have access.

(Bash) Aliases

How many times have you logged in to a system to check something and not remembered the full command? Or the grep/awk/sed combo? You search your email/wiki/hipchat/evernote or, worse, ping somebody.

Aliases to the rescue!

What’s the log directory? cdl => cd to the log directory (wherever it is)

Where are HAProxy logs, I wanna tail them? hap-tail-access

How do I take this machine out of service? oos. And bis to bring it back.

Complement these with a common prefix such as orgname_. This way you can ssh into a box, type orgname_ and tab your way to see what’s available.

Of course there has to be some sanity to this. That’s where your configuration management system comes into play: use it to keep track of all of these and update them when a path changes.

This approach worked great for our teams, as both devs and ops people were able to log in and troubleshoot problems quickly without looking up the wiki!
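
A minimal sketch of what such an alias file could look like is below. The file path, the directory and log locations, and the maintenance-flag mechanism behind oos/bis are all assumptions for illustration; the real values would be templated by your configuration management system.

```shell
# Hypothetical file: /etc/profile.d/orgname_aliases.sh, deployed by config management.
# All paths are placeholders to be filled in per host.
ORGNAME_LOG_DIR="/var/log/myservice"
ORGNAME_HAPROXY_LOG="/var/log/haproxy/access.log"
# Assumption: the load balancer health check fails while this file exists.
ORGNAME_MAINT_FLAG="/var/run/myservice.maintenance"

alias orgname_cdl='cd "$ORGNAME_LOG_DIR"'                       # jump to the log directory
alias orgname_hap_tail_access='tail -F "$ORGNAME_HAPROXY_LOG"'  # follow the HAProxy access log
alias orgname_oos='touch "$ORGNAME_MAINT_FLAG"'                 # take this machine out of service
alias orgname_bis='rm -f "$ORGNAME_MAINT_FLAG"'                 # bring it back in service
```

Because the variables are expanded when the alias runs rather than when it is defined, a path change only needs the variable at the top of the file updated.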

Logging

Instead of “Request Completed”, strive for:

2017-05-06T05:32:28.9Z INFO: Request (fds12a) completed successfully for user (george)

You can use JSON or any other key=value formatting for faster parsing/indexing.

{
	"t": "2017-05-06T05:32:28.9Z",
	"l": "INFO",
	"request_id": "fds12a",
	"status": "success",
	"user": "george"
}
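
To show why structured lines are easier to work with, here is a small sketch of the key=value variant in shell. The helper names log_kv and kv_get are made up for this example, and the sketch assumes values contain no spaces or "=" characters.

```shell
# Emit one key=value log line: UTC timestamp, level, then caller-supplied pairs.
# Values with spaces would need quoting, which this sketch does not handle.
log_kv() {
  level="$1"; shift
  printf 't=%s l=%s' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level"
  for pair in "$@"; do printf ' %s' "$pair"; done
  printf '\n'
}

# Read a key=value line on stdin and print the value stored under the given key.
kv_get() {
  tr ' ' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}
```

For example, `log_kv INFO request_id=fds12a status=success user=george | kv_get user` pulls out the user without any hand-tuned grep/sed pattern.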

Shameless plug - hear me talk (<5 min) about logging at DevOpsDays 2013 NYC Fall, slides

Logs are a great way to achieve common ground between the authors and the operators of a system.

ChatOps & Bots

GitHub made ChatOps easy for everyone by open-sourcing Hubot.

In addition to benefiting our dev and ops teams, our Hubot instance helped our customer-support and product teams. Hubot took the concept of ‘command aliases’ to the chat room. This expanded our collaboration with teams who did not have shell or API access to our systems but needed ad hoc data (with whitelisting in place to restrict who can run certain commands).

Everything should have a (REST) API

Never buy/adopt a tool which doesn’t have a good API.

Why have an API? Systems can be chained together with APIs, thus breaking down information silos. As an example, which Chef run-list and environment should be applied to a newly racked node? One could look up the documentation, or one could make a curl call during the provisioning phase of the system. We maintain a mapping of Chef run-lists and environments in a git repo. A Sinatra app makes it available to anyone.
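
The provisioning-time lookup could be sketched roughly as follows. The base URL, the /nodes/<name>/run_list path and the function names are hypothetical; the actual routes of the Sinatra app are not described here.

```shell
# Hypothetical base URL and path layout, purely for illustration.
RUNLIST_API="${RUNLIST_API:-http://provisioning.example.internal}"

# Build the lookup URL for a node name.
runlist_url() {
  printf '%s/nodes/%s/run_list\n' "$RUNLIST_API" "$1"
}

# Fetch the run-list and environment for a node.
# --fail makes curl exit non-zero on HTTP errors so provisioning can stop early.
fetch_runlist() {
  curl --fail --silent --show-error "$(runlist_url "$1")"
}
```

A provisioning script could then call `fetch_runlist "$(hostname)"` and feed the response to whatever applies the run-list, instead of someone looking it up by hand.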

A caveat

None of this works if “the-right-culture” and “empathy” are absent. Hopefully some of these ideas can serve as ice-breakers and get everybody moving in the right direction.

Use a tool which works for your organization. Maybe DevOps 4.0 Professional Edition SP2 is what you need or maybe bash and having lunch together will do.