Deploy a Pipeline From Git

Welcome to the second article in this four-part series. In the first article I discussed some of the concepts related to continuous integration and testing. In this article we’ll get into some hands-on examples of extracting pipelines from CDF/CDAP and using GitHub as a repository for storing pipelines and related artifacts.

I will cover the following topics in this article:

- What you need beyond the exported pipeline JSON to promote a pipeline between environments
- Exporting pipelines individually from the UI and in bulk via the REST API
- Checking pipelines into a GitHub repository with a basic git workflow and pull requests
- Deploying pipelines on a target instance, both from the UI and via the API

So, if you already have the exported pipeline JSON, what else might you need?

The exported pipeline JSON itself is enough to recreate the visual representation of the pipeline on another instance of CDF/CDAP, but you will also need all the configuration information from the source system that is not contained in the pipeline itself. It’s therefore prudent to create a checklist of all the information you will need when promoting a pipeline from one environment to another.

Here’s all the information you need to take into consideration:

- The exported pipeline JSON itself
- Any custom plugins the pipeline depends on, along with their versions
- System preferences set on the source instance
- Macros and their runtime values
- Datasets referenced by the pipeline

I’ll discuss how to invoke a pipeline test on a target environment with the requisite system preferences and macro settings in the last article in the series. For now, make sure you stash your pipelines, plugins, and datasets in the git repository. Preferences and macros are key-value pairs and can be represented nicely in JSON format. This is the format I will use in later articles for storing and porting that information.
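
For example, a few preferences captured as JSON might look like this (the keys and values below are purely illustrative, not from any real environment):

    {
      "system.profile.name": "dataproc-default",
      "source.bucket": "gs://my-input-bucket",
      "notify.email": "dataops@example.com"
    }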

If you are new to GitHub and unfamiliar with git commands, I highly recommend you read up on the topic from the multitude of sources available on the internet. This will not be a tutorial on git, but you should be able to follow along with the git workflow. The scenario I will use in this project is a two-person team working in tandem, each developing and reviewing the other’s work.

You can start off by creating a new fork of the GitHub repo I set up for this project. Once you have forked the project you can clone it to your local machine.
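
    # substitute the URL of your own fork
    git clone https://github.com/<your-username>/cdap-pipelines.git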

This will create a local folder named cdap-pipelines containing the contents of the git repo.

I developed a rudimentary pipeline with three stages that looks like this:

To export this pipeline from the UI, click the Actions icon at the top right of the screen and select Export. This will bring up a new window that lets you inspect the pipeline JSON. Click Export to save the file. The pipeline JSON can now be added to your GitHub repo.

Bulk Export Pipelines

Exporting one pipeline at a time can become very cumbersome if you have a large number of pipelines to export. Unfortunately, there is no way to bulk-export all your pipelines from the UI. So, how can we get around this problem? This is where the REST API comes to the rescue.

If you are familiar with curl or use tools like Postman or Insomnia, you can quickly get a handle on how the REST API works. To get a listing of all the pipelines in CDF/CDAP, you invoke the following HTTP request using the GET method:
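
The host and port below assume a local CDAP sandbox on the default router port; substitute your own instance’s address.

    GET http://localhost:11015/v3/namespaces/default/apps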

This gives you a listing of all the deployed pipelines in the default namespace. In this example I used Insomnia. The curl version of this request is:
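
    curl -X GET "http://localhost:11015/v3/namespaces/default/apps"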

You will notice that this API request simply returns a listing of all the pipelines, but our goal was to export all the pipelines at once! So, how can we accomplish that?

To iterate over the list of pipelines and extract the ones we want, we simply append the desired pipeline name to the end of the URL.
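
For example, to fetch the pipeline named Titanic-01:

    curl -X GET "http://localhost:11015/v3/namespaces/default/apps/Titanic-01"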

Here’s the export script in action:
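
A minimal sketch of such a script, assuming the requests library, a local sandbox, and that each app’s pipeline structure comes back as a JSON string in its configuration field, might look like this:

    # bulk_export.py -- a sketch, not the exact script; adjust host,
    # port, and namespace for your environment
    import json
    import requests

    BASE = "http://localhost:11015/v3/namespaces/default"

    # list all deployed apps (pipelines) in the namespace
    for app in requests.get(f"{BASE}/apps").json():
        name = app["name"]
        # fetch the full detail for this app
        detail = requests.get(f"{BASE}/apps/{name}").json()
        # reassemble the structure the UI export produces; the pipeline
        # structure itself is a JSON string in "configuration"
        pipeline = {
            "name": name,
            "artifact": detail["artifact"],
            "config": json.loads(detail["configuration"]),
        }
        with open(f"{name}.json", "w") as out:
            json.dump(pipeline, out, indent=2)
        print(f"exported {name}.json")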

Once all your pipelines are exported you can copy the desired files to the pipelines folder in the git project. The Python script I mentioned here is by no means comprehensive, and you may find there are lots of other things you want to extract as well, but if you are so inclined it’s a good starting point for learning how to script against the REST API.

By the way, this is also how you can extract all the system preferences and some additional settings from the source environment, so that you can add those settings to a file to support CI/CD efforts down the road. For example, here is how to get all the system preferences:
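
Again assuming a local sandbox:

    # system (instance-level) preferences
    curl -X GET "http://localhost:11015/v3/preferences"

    # preferences scoped to a single namespace
    curl -X GET "http://localhost:11015/v3/namespaces/default/preferences"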

OK, now that you know how to extract pipelines both individually and in bulk, the next step is to check these pipelines into your git repository. Assuming you forked my GitHub repo, you would clone your fork and work off of that. For this example I created a new branch for each pipeline I wanted to push to my repo.

Start off by configuring some global settings for your repo. This will help you avoid any pesky error messages when you attempt to push your code.
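
    # substitute your own name and email
    git config --global user.name "Your Name"
    git config --global user.email "you@example.com"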

Create a branch for your work. In this example I used titanic-01 as the development branch for the first pipeline I checked in to git.
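
    git checkout -b titanic-01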

Once you’ve copied your pipeline JSON to the pipelines directory, add all the files in the project folder to source control.
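
    git add .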

You can check the state of the git repo at any time by running:
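
    git status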

Now you are ready to commit your changes. Make sure to add a message so that the commit is documented.
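
    # use a message that describes the change
    git commit -m "Add Titanic-01 pipeline JSON"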

Almost done. All that’s left is to push the changes to the GitHub repository.
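
    git push origin titanic-01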

Granted, all of the steps of exporting the pipelines and checking them into git can be scripted as well, but that is left as an exercise for the reader, since no two teams operate the same way, let alone different enterprises. You can get as elaborate as you want with this process, but to minimize bugs it’s always best to keep things simple.

When working in your own feature branch, you have the freedom to make all the changes you want locally and check in whatever you would like to be merged with the upstream project. For the upstream project to reflect the changes we pushed to our git repository, we need to create a pull request. You do this on the GitHub page.

Make sure to create the pull request against a branch that the maintainer expects to merge PRs into. In this case I’m creating a PR against the upstream development branch, and will leave it up to the maintainer to merge the development branch into the testing, QA, or master (production) branch. These will come into play later when we configure CI/CD.

The maintainer will review the PR and can comment on, accept, or reject it.

Deploying a pipeline from a cloned GitHub repo is as simple as performing an import in CDF/CDAP. Once again, the UI lets you do imports one pipeline at a time, but it also takes care of a lot of the validation for you.

Pipeline validation includes things like checking that the pipeline JSON is well formed and verifying that the plugin artifacts and versions the pipeline requires are available on the target instance.

To import from the UI, locate the big green plus button. When you click it, you will be presented with the following window:

Click the Import button on the Pipeline card and select the pipeline JSON from your file system. The pipeline will then load into edit mode in the studio, where you can continue updating it or deploy it.

Alternatively, here’s an example of how to deploy a pipeline using the API with curl:
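
    # host and port assume a local sandbox; see the substitutions below
    curl -X PUT "http://localhost:11015/v3/namespaces/default/apps/Titanic-01" \
      -H "Content-Type: application/json" \
      -d @/path/to/Titanic-01.json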

Make sure to substitute namespace-id and pipeline-name with your own values. In my case the values are default, because I’m deploying to the default namespace, and Titanic-01, which is the name I’ve given to this pipeline. The final component is the path to your pipeline JSON file. Don’t forget the @ symbol before the path.

A word of caution: deploying a pipeline via the API does not provide any of the validation you get in the UI. Therefore, any validation that needs to take place must be encapsulated in your deployment code. Similar to what I did with the pipeline export script in Python, you would have to perform each validation step yourself if you want to ensure that the pipeline has all the requisite artifacts on the target system.

There you have it. Your pipelines are now in source control on GitHub. Of course, you don’t have to use GitHub for your VCS, but it is one of the most popular options in the open source community, with broad adoption among open source projects, including CDAP.

In the next article we’ll discuss the process for migrating artifacts from GitHub into TEST, QA, or PROD environments. We’ll dig a little deeper into automation options and discover how we can leverage the API more broadly.

Until next time, stay safe and healthy, and wash your hands!
