Ship Inside of a Bottle, Repeat

…in other words, DevOps.

GitHub sends a notifyCommit message to Jenkins.

That message passes through the Jenkins git plugin, which triggers a job if (1) there is a job configured with a git URL matching the notifyCommit git URL, and (2) there is a change to the code.
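You can poke that same endpoint by hand when checking whether the hook is wired up at all. The /git/notifyCommit endpoint is the git plugin's real entry point; the Jenkins host and repo URL below are placeholders:

```groovy
// Manually fire what GitHub's webhook would send (standalone Groovy).
// Host and repo are placeholders.
def jenkins = 'https://jenkins.example.com'
def repo    = URLEncoder.encode('git@github.com:example/app.git', 'UTF-8')
println new URL("${jenkins}/git/notifyCommit?url=${repo}").text
```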

If no matching job exists, the git plugin does nothing useful: it logs the fact and drops the notifyCommit on the floor.

So we crafted a second plugin to capture that case and act when the job is NOT yet configured on the Jenkins server.

Traditionally, a Release Engineer would be responsible for supporting Jenkins and the repository server, and for crafting builds that collect and assemble code into useful apps. That person would configure each job individually, by hand, and the configurations would be complicated. Jenkins would need to be backed up. It would need to be substantial, stateful, supported.

In Nebula CICD, our Jenkins is stateless. It can come and go, be replaced. There can be multiple Jenkins servers. Nothing is done by hand on them, and no developer needs to log in and configure anything. Variable build code lives with the application. Static job configurations use Jenkins pipeline builds and are stamped out on the fly by our CICD Discover Jenkins Plugin.

That plugin takes a template, crafts the config.xml for the job, and then triggers the pipeline's first build. That build checks out the Jenkinsfile matching the commit that triggered the notifyCommit message. The Jenkinsfile then does a second checkout, of the code to actually work with and build, according to the checkout stage in the Jenkinsfile.
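I won't walk the plugin source here, but the core move is roughly the sketch below: render a job's config.xml from a template and hand it to Jenkins' standard createItem endpoint. This is a hedged Groovy approximation, not the plugin's actual code; every name and URL is a placeholder, and auth plus the CSRF crumb are omitted.

```groovy
// Hedged sketch of the Discover plugin's core move: render a pipeline job's
// config.xml from a template and create the job via Jenkins' createItem API.
// All names and URLs are placeholders; auth and the CSRF crumb are omitted.
def jenkinsUrl = 'https://jenkins.example.com'
def repoUrl    = 'git@github.com:example/app.git'
def jobName    = repoUrl.tokenize('/').last() - '.git'

// A pared-down pipeline job template; a real one carries more settings.
def configXml = """<flow-definition plugin="workflow-job">
  <definition class="org.jenkinsci.plugins.workflow.cps.CpsScmFlowDefinition" plugin="workflow-cps">
    <scm class="hudson.plugins.git.GitSCM">
      <userRemoteConfigs>
        <hudson.plugins.git.UserRemoteConfig><url>${repoUrl}</url></hudson.plugins.git.UserRemoteConfig>
      </userRemoteConfigs>
    </scm>
    <scriptPath>Jenkinsfile</scriptPath>
  </definition>
</flow-definition>"""

def conn = new URL("${jenkinsUrl}/createItem?name=${jobName}").openConnection()
conn.requestMethod = 'POST'
conn.doOutput = true
conn.setRequestProperty('Content-Type', 'text/xml')
conn.outputStream.withWriter { it << configXml }
println "createItem returned HTTP ${conn.responseCode}"
// A follow-up POST to /job/<name>/build kicks off the pipeline's first build.
```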

So far we've gone from GitHub, to Jenkins, then to the Jenkins plugins.

The Jenkinsfile does a bunch of manipulations: gathering the info it needs and setting up roles to be executed by an installer Ansible play. It retrieves secrets from HashiCorp Vault, which uses Consul as its storage backend. It resolves internal names using Consul service discovery. It then triggers HashiCorp Packer.
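In pipeline terms, the flow reads something like the sketch below. This is a hedged approximation, not our actual Jenkinsfile: the stage names, Vault secret path, and Packer variable are made up for illustration, and the Consul DNS name is how service discovery typically surfaces Vault's address.

```groovy
// Hedged Jenkinsfile sketch. Vault is resolved via Consul DNS service
// discovery; the secret path, variable names, and template path are
// placeholders, not our real ones.
pipeline {
    agent any
    stages {
        stage('Checkout') {
            steps {
                checkout scm   // second checkout: the application code itself
            }
        }
        stage('Bake') {
            steps {
                withEnv(['VAULT_ADDR=https://vault.service.consul:8200']) {
                    sh '''
                        # pull a deploy credential out of Vault (path is made up)
                        DEPLOY_KEY=$(vault read -field=value secret/cicd/deploy)
                        # hand variables through to Packer
                        packer build -var "deploy_key=${DEPLOY_KEY}" packer/app.json
                    '''
                }
            }
        }
    }
}
```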

Packer spins up an Amazon EC2 instance, then runs shell commands to set up the basic environment, installing Ansible and some other required basics. It then runs Ansible plays against that instance to provision it, and packages the finished instance as an Amazon Machine Image, or AMI.
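The template behind that bake has roughly this shape: one amazon-ebs builder, a shell provisioner that bootstraps Ansible, then the installer play. It's shown here as the JSON the pipeline writes out, and every value in it is a placeholder:

```groovy
// Roughly the Packer template driving the bake, written from the pipeline.
// Region, AMI, username, and playbook name are all placeholders.
writeFile file: 'packer/app.json', text: '''{
  "variables": { "deploy_key": "" },
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-PLACEHOLDER",
    "instance_type": "t2.micro",
    "ssh_username": "ec2-user",
    "ami_name": "app-{{timestamp}}"
  }],
  "provisioners": [
    { "type": "shell", "inline": ["sudo yum -y install ansible"] },
    { "type": "ansible-local", "playbook_file": "installer.yml" }
  ]
}'''
```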

Troubleshooting this is a bitch.

Did I mention Jenkins spins off agents in AWS, adding builders and then tearing them down as needed?

So to troubleshoot, you look at GitHub, then Jenkins, then plugin behavior, then the Jenkinsfile. What code lives with the app, and is it current? Is there a missing change? What was checked out? Where's the agent's IP? Did the agent get the correct code? Was that failure on the Packer-generated instance or on the build agent? Where does that specific Ansible play run? Did the variable get set by Jenkins, pass through to Packer, get dropped into Ansible correctly, and then get executed in the play correctly?

Lots of moving pieces…

— doug