With the rise of DevOps, low-cost cloud computing, and container technologies, the way Java developers approach development today has changed dramatically. This practical guide helps you take advantage of microservices, serverless, and cloud native technologies using the latest DevOps techniques to simplify your build process and create hyperproductive teams.
Stephen Chin, Melissa McKay, Ixchel Ruiz, and Baruch Sadogursky help you evaluate an array of options. The list includes source control with Git, build declaration with Maven and Gradle, CI/CD with CircleCI, package management with Artifactory, containerization with Docker and Kubernetes, and much more. Whether you’re building applications with Jakarta EE, Spring Boot, Dropwizard, MicroProfile, Micronaut, or Quarkus, this comprehensive guide has you covered.
Explore software lifecycle best practices
Use DevSecOps methodologies to facilitate software development and delivery
Understand the business value of DevSecOps best practices
Manage and secure software dependencies
Develop and deploy applications using containers and cloud native technologies
Manage and administrate source control repositories and development processes
Use automation to set up and administer build pipelines
Identify common deployment patterns and antipatterns
Maintain and monitor software after deployment
About the Author
Stephen Chin is Head of Developer Relations at JFrog and author of The Definitive Guide to Modern Client Development, Raspberry Pi with Java, and Pro JavaFX Platform. He has keynoted numerous Java conferences around the world including Devoxx, JNation, JavaOne, Joker, and Open Source India. Stephen is an avid motorcyclist who has done evangelism tours in Europe, Japan, and Brazil, interviewing hackers in their natural habitat. When he is not traveling, he enjoys teaching kids how to do embedded and robot programming together with his teenage daughter. You can follow his hacking adventures at: http://steveonjava.com/.
Melissa McKay is currently a Developer Advocate with the JFrog Developer Relations team. She has been active in the software industry for 20 years, and her background and experience span a slew of technologies and tools used in the development and operation of enterprise products and services. Melissa is a mom, software developer, Java geek, huge promoter of Java unconferences, and is always on the lookout for ways to grow, learn, and improve development processes. She is active in the developer community, has spoken at CodeOne and Java Dev Day Mexico, and helps organize the JCrete and JAlba unconferences as well as Devoxx4Kids events.
Ixchel Ruiz has developed software applications and tools since 2000. Her research interests include Java, dynamic languages, client-side technologies, and testing. She is a Java Champion, Groundbreaker Ambassador, Hackergarten enthusiast, open source advocate, JUG leader, public speaker, and mentor.
Baruch Sadogursky (a.k.a. JBaruch) is the Chief Sticker Officer (also, Head of DevOps Advocacy) at JFrog. His passion is speaking about technology. Well, speaking in general, but doing it about technology makes him look smart, and 19 years of hi-tech experience sure helps. When he's not on stage (or on a plane to get there), he learns about technology, people, and how they work, or more precisely, don't work together.
He is a co-author of the book Liquid Software, a CNCF ambassador, and a passionate conference speaker on DevOps, DevSecOps, digital transformation, containers and cloud native, artifact management, and other topics. He is a regular at the industry's most prestigious events, including DockerCon, Devoxx, DevOps Days, OSCON, QCon, JavaOne, and many others. You can see some of his talks at jfrog.com/shownotes.
See: Continuous Delivery for Java Apps: Build a CD Pipeline Step by Step Using Kubernetes, Docker, Vagrant, Jenkins, Spring, Maven and Artifactory. Leanpub, December 14, 2017.
This book will guide you through the implementation of real-world Continuous Delivery using top-notch technologies. Instead of finishing this book thinking "I know what Continuous Delivery is, but I have no idea how to implement it," you will end up with your machine set up with a Kubernetes cluster running Jenkins pipelines in a distributed and scalable fashion (each pipeline runs on a new Jenkins slave, dynamically allocated as a Kubernetes pod). These pipelines test (unit, integration, acceptance, performance, and smoke tests), build (with Maven), release (to Artifactory), distribute (to Docker Hub), and deploy (on Kubernetes) a Spring Boot app to testing, staging, and production environments, implementing the Canary Release deployment pattern.
TABLE OF CONTENTS:
INTRODUCTION: Agile; Scrum; Scrum and Continuous Integration; Deployed vs Released; Scrum and Continuous Delivery; XP and Continuous Delivery; Automated Tests; Continuous Integration; Feature Branch; Continuous Delivery; Continuous Delivery Pipeline; Continuous Delivery vs Continuous Deployment; Canary Release; A/B Tests; Feature Flags
NOTEPAD APP: AUTOMATED TESTS, MAVEN AND FLYWAY: Pre-Requisites; The Notepad Application; Automated Tests; Unit Tests; Integration Tests; Acceptance Tests; Page Object; Distributed Acceptance Tests with Selenium-Grid; Smoke Tests; Performance Tests with Gatling.io; Apache Maven; Maven Snapshot vs Release; The Default Lifecycle and its Phases; Maven Repositories; Repository Manager (Artifactory); Maven Plugins: Surefire and Failsafe; Maven Profile; Running Unit Tests; Running Integration Tests; Running Acceptance Tests; Running Smoke Tests; Running Performance Tests; Publish Artifacts to Artifactory with Maven; Publish a Snapshot to Artifactory; Publish a Release to Artifactory; The release:prepare Goal; The release:perform Goal; Flyway
DOCKER: Introduction to Docker; Difference Between Container and Image; Docker Hub; Create your Account; Official Docker Repositories; Image Tags; Non-Official Docker Images; Create a Repository, an Image and Push it to Docker Hub; Running Containers on Docker; Running Containers as Daemons; Container Clean Up; Naming Containers; Exposing Ports; Persistent Data with Volumes; Environment Variables; Docker Networking; Create a Bridge Network; Container Static IP Address; Linking Containers; Most Used Docker Commands; Images; Containers; Misc; Building Docker Images: Dockerfile
JENKINS: PIPELINE AS CODE AND CHATOPS: Jenkins Overview; Jenkins Concepts; Job (or Project); Build; Artifact; Workspace; Executor; Plugin; Node, Master, and Agent (or Slave); ChatOps; Create a Slack Workspace; Integrate Slack with Jenkins; Slack Notification Plugin; Use Hubot to Interact with Jenkins; Jenkins Pipeline; Declarative Pipeline vs Scripted Pipeline; Scripted Pipeline; Using Docker with Jenkins Pipelines; Running Docker from Within the Jenkins Container; Scaling Jenkins with Slaves
KUBERNETES: Why Kubernetes?; Set up a Kubernetes Cluster using Vagrant; Hands-on Introduction to Kubernetes; Kubernetes Concepts; Namespaces; Pods; Labels; Replica Sets; Services; Service Discovery using DNS; Service Discovery using Namespaces; Volumes; Handling External Configurations; Config Maps; Changing Logback Log Level at Runtime; Secrets; Using Secrets as Environment Variables; Using Secrets as Files from a Pod; Deployments; Readiness Probes; Liveness Probes; Canary Release; Kubernetes Architecture; Kubernetes Master Components; Etcd; API Server; Controller Manager; Scheduler; Kubernetes Node Components; Service Proxy; Kubelet; cAdvisor; Kubernetes Add-ons; Web UI (Dashboard); Monitoring Kubernetes with Heapster, InfluxDB and Grafana; Web UI Overview; DNS
Application definitions, configurations, and environments should be declarative and version controlled. Application deployment and lifecycle management should be automated, auditable, and easy to understand.
Follow our getting started guide. Further user-oriented documentation is provided for additional features. If you are looking to upgrade Argo CD, see the upgrade guide. Developer-oriented documentation is available for people interested in building third-party integrations.
Argo CD follows the GitOps pattern of using Git repositories as the source of truth for defining the desired application state. Kubernetes manifests can be specified in several ways:
Kustomize applications
Helm charts
Jsonnet files
Plain directory of YAML/JSON manifests
Any custom config management tool configured as a config management plugin
Argo CD automates the deployment of the desired application states in the specified target environments. Application deployments can track updates to branches or tags, or be pinned to a specific version of manifests at a Git commit. See tracking strategies for additional details about the different tracking strategies available.
For a quick 10-minute overview of Argo CD, check out the demo presented to the SIG Apps community meeting:
Argo CD is implemented as a Kubernetes controller which continuously monitors running applications and compares the current, live state against the desired target state (as specified in the Git repo). A deployed application whose live state deviates from the target state is considered OutOfSync. Argo CD reports and visualizes the differences, while providing facilities to automatically or manually sync the live state back to the desired target state. Any modifications made to the desired target state in the Git repo can be automatically applied and reflected in the specified target environments.
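To make the OutOfSync idea concrete, here is a minimal, purely illustrative Python sketch (not Argo CD's actual code): sync status is computed by comparing the manifests declared in Git with the corresponding live objects in the cluster. The "guestbook" name and the dict-shaped state are assumptions for the example.

```python
def sync_status(desired: dict, live: dict) -> str:
    """Compare desired state (from Git) with live state (from the cluster)."""
    for key, manifest in desired.items():
        if live.get(key) != manifest:
            return "OutOfSync"   # live state deviates from what Git declares
    return "Synced"

# Example: Git declares 3 replicas, but the cluster is running 2.
desired = {"Deployment/guestbook": {"replicas": 3}}
live = {"Deployment/guestbook": {"replicas": 2}}
print(sync_status(desired, live))  # -> OutOfSync
```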
GitOps and Kubernetes – Continuous Deployment with Argo CD, Jenkins X, and Flux, by Billy Yuen, Alexander Matyushentsev, Todd Ekenstam, and Jesse Suen; Manning Publications, 2021, ISBN 1617297275 (GtOpK8S)
Manning publishes the best quality IT books in the industry.
Manning is an independent publisher, providing computer books for software developers, engineers, architects, system administrators, and managers. Our books also cover topics for young programmers, students, and occasionally children.
summary
Manning is an independent publisher of computer books and video courses for software developers, engineers, architects, system administrators, managers and all who are professionally involved with the computer business. We also publish for students and young programmers, including occasionally for children. We are an entirely virtual organization based on Shelter Island, New York, with many staff working from far-flung places like Manila and Zagreb.
company character
“Independent” means we are not owned by a large corporate entity and are free to make decisions without bureaucratic overhead. That has allowed us to innovate and be flexible and to quickly adjust what we do as we go. We were the first by several years to sell our books as unprotected PDFs, something that later became commonplace. We were the first to start selling books before they were finished, in the Manning Early Access Program. This gave our readers access to our content as soon as it was readable, and this too has become common in the industry. And it means we are thinking every day about new ways to satisfy our customers, some of which we hope you will be pleased to discover in the not-too-distant future.
how we improve
We published our first book in 1993 and have been learning from our successes, and even more from our mistakes, ever since. Every new book teaches us something that helps us improve:
How to choose the topics we publish on
How to find the right authors for each book
How to help authors write the best books they can
How to ensure the content is valuable and easy to learn
How to let readers know about our content
book series
We publish standalone titles as well as the following book series:
Hello!
In Action
In Practice
In Depth
In a Month of Lunches
availability
Readers can access our books through the Manning Early Access Program, O’Reilly Learning (formerly Safari Books Online), and iBooks. Print copies, wherever they are bought, come with free electronic versions in PDF, ePub and Kindle formats. With your print copy in hand, register it on the Manning site and you can download the digital versions from your account.
At this time, our eBooks are available only from Manning.com and Apple’s iBookstore.
Kubernetes (/ˌk(j)uːbərˈnɛtɪs, -ˈneɪtɪs, -ˈneɪtiːz/, commonly stylized as K8s[4]) is an open-source container-orchestration system for automating computer application deployment, scaling, and management.[5] It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation. It aims to provide a “platform for automating deployment, scaling, and operations of application containers across clusters of hosts”.[6] It works with a range of container tools and runs containers in a cluster, often with images built using Docker. Kubernetes originally interfaced with the Docker runtime[7] through a “Dockershim”; however, the shim has since been deprecated in favor of directly interfacing with containerd or another CRI-compliant runtime.[8]
Many cloud services offer a Kubernetes-based platform or infrastructure as a service (PaaS or IaaS) on which Kubernetes can be deployed as a platform-providing service. Many vendors also provide their own branded Kubernetes distributions. (WP)
History
Kubernetes (κυβερνήτης, Greek for “helmsman” or “pilot” or “governor”, and the etymological root of cybernetics)[6] was founded by Joe Beda, Brendan Burns, and Craig McLuckie,[9] who were quickly joined by other Google engineers including Brian Grant and Tim Hockin, and was first announced by Google in mid-2014.[10] Its development and design are heavily influenced by Google’s Borg system,[11][12] and many of the top contributors to the project previously worked on Borg. The original codename for Kubernetes within Google was Project 7, a reference to the Star Trek ex-Borg character Seven of Nine.[13] The seven spokes on the wheel of the Kubernetes logo are a reference to that codename. The original Borg project was written entirely in C++,[11] but the rewritten Kubernetes system is implemented in Go.
Kubernetes v1.0 was released on July 21, 2015.[14] Along with the Kubernetes v1.0 release, Google partnered with the Linux Foundation to form the Cloud Native Computing Foundation (CNCF)[15] and offered Kubernetes as a seed technology. In February 2016, the Helm package manager for Kubernetes was released.[16][17][18] On March 6, 2018, the Kubernetes project reached ninth place in commits on GitHub, and second place in authors and issues, after the Linux kernel.[19]
Up to v1.18, Kubernetes followed an N-2 support policy, meaning that the three most recent minor versions receive security and bug fixes.[20] From v1.19 onwards, Kubernetes follows an N-3 support policy.[21]
Support windows
A chart in the original article (not reproduced here) visualises the period for which each release is or was supported.[22]
Concepts
Kubernetes architecture diagram
Kubernetes defines a set of building blocks (“primitives”), which collectively provide mechanisms that deploy, maintain, and scale applications based on CPU, memory[24] or custom metrics.[25] Kubernetes is loosely coupled and extensible to meet different workloads. This extensibility is provided in large part by the Kubernetes API, which is used by internal components as well as extensions and containers that run on Kubernetes.[26] The platform exerts its control over compute and storage resources by defining resources as Objects, which can then be managed as such.
Kubernetes follows the primary/replica architecture. The components of Kubernetes can be divided into those that manage an individual node and those that are part of the control plane.[26][27]
Control plane
The Kubernetes master is the main controlling unit of the cluster, managing its workload and directing communication across the system. The Kubernetes control plane consists of various components, each its own process, that can run either on a single master node or on multiple masters supporting high-availability clusters.[27] The various components of the Kubernetes control plane are as follows:
etcd: etcd[28] is a persistent, lightweight, distributed, key-value data store developed by CoreOS that reliably stores the configuration data of the cluster, representing the overall state of the cluster at any given point of time. Just like Apache ZooKeeper, etcd is a system that favors consistency over availability in the event of a network partition (see CAP theorem). This consistency is crucial for correctly scheduling and operating services. The Kubernetes API Server uses etcd’s watch API to monitor the cluster and roll out critical configuration changes or simply restore any divergences of the state of the cluster back to what was declared by the deployer. As an example, if the deployer specified that three instances of a particular pod need to be running, this fact is stored in etcd. If it is found that only two instances are running, this delta will be detected by comparison with etcd data, and Kubernetes will use this to schedule the creation of an additional instance of that pod.[27]
API server: The API server is a key component and serves the Kubernetes API using JSON over HTTP, providing both the internal and external interface to Kubernetes.[26][29] The API server processes and validates REST requests and updates the state of the API objects in etcd, thereby allowing clients to configure workloads and containers across worker nodes.[30]
Scheduler: The scheduler is the pluggable component that selects which node an unscheduled pod (the basic entity managed by the scheduler) runs on, based on resource availability. The scheduler tracks resource use on each node to ensure that workload is not scheduled in excess of available resources. For this purpose, the scheduler must know the resource requirements, resource availability, and other user-provided constraints and policy directives such as quality-of-service, affinity/anti-affinity requirements, data locality, and so on. In essence, the scheduler’s role is to match resource “supply” to workload “demand”.[31]
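As a rough illustration of that supply/demand matching, the following Python sketch (with made-up node data; the real scheduler weighs many more constraints such as affinity and data locality) filters out nodes that cannot fit the pod's request, then scores the remainder:

```python
# Made-up inventory of free resources per node and one pod's resource request.
nodes = {
    "node-a": {"cpu_free": 2.0, "mem_free_gib": 4.0},
    "node-b": {"cpu_free": 0.5, "mem_free_gib": 8.0},
}
pod_request = {"cpu": 1.0, "mem_gib": 2.0}

def schedule(pod, nodes):
    # Filter: keep only nodes with enough free resources ("supply" >= "demand").
    feasible = {name: n for name, n in nodes.items()
                if n["cpu_free"] >= pod["cpu"] and n["mem_free_gib"] >= pod["mem_gib"]}
    if not feasible:
        return None  # no feasible node: the pod stays Pending
    # Score: here, simply prefer the node with the most CPU left after placement.
    return max(feasible, key=lambda name: feasible[name]["cpu_free"] - pod["cpu"])

print(schedule(pod_request, nodes))  # -> node-a
```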
Controller manager: A controller is a reconciliation loop that drives actual cluster state toward the desired cluster state, communicating with the API server to create, update, and delete the resources it manages (pods, service endpoints, etc.).[32][29] The controller manager is a process that manages a set of core Kubernetes controllers. One kind of controller is a Replication Controller, which handles replication and scaling by running a specified number of copies of a pod across the cluster. It also handles creating replacement pods if the underlying node fails.[32] Other controllers that are part of the core Kubernetes system include a DaemonSet Controller for running exactly one pod on every machine (or some subset of machines), and a Job Controller for running pods that run to completion, e.g. as part of a batch job.[33] The set of pods that a controller manages is determined by label selectors that are part of the controller’s definition.[34]
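The reconciliation loop can be sketched in a few lines of Python. This toy version (create_pod and delete_pod are hypothetical stand-ins for API-server calls) drives the observed replica count toward the declared one, as in the etcd example above:

```python
def reconcile(declared_replicas, running_pods, create_pod, delete_pod):
    """Drive actual state (running_pods) toward desired state (declared_replicas)."""
    delta = declared_replicas - len(running_pods)
    if delta > 0:
        for _ in range(delta):
            create_pod()                    # schedule the missing replicas
    elif delta < 0:
        for pod in running_pods[:-delta]:
            delete_pod(pod)                 # scale down surplus replicas

# Example: three replicas declared, two observed running.
pods = ["web-1", "web-2"]
reconcile(3, pods,
          create_pod=lambda: pods.append(f"web-{len(pods) + 1}"),
          delete_pod=pods.remove)
print(pods)  # -> ['web-1', 'web-2', 'web-3']
```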
Nodes
A Node, also known as a Worker or a Minion, is a machine where containers (workloads) are deployed. Every node in the cluster must run a container runtime such as Docker, as well as the components mentioned below, for communication with the primary about the network configuration of these containers.
Kubelet: Kubelet is responsible for the running state of each node, ensuring that all containers on the node are healthy. It takes care of starting, stopping, and maintaining application containers organized into pods as directed by the control plane.[26][35]
The kubelet monitors the state of each pod, and if a pod is not in the desired state, it is redeployed to the same node. Node status is relayed every few seconds via heartbeat messages to the primary. Once the primary detects a node failure, the Replication Controller observes this state change and launches pods on other healthy nodes.
Kube-proxy: The kube-proxy is an implementation of a network proxy and a load balancer, and it supports the service abstraction along with other networking operations.[26] It is responsible for routing traffic to the appropriate container based on the IP and port number of the incoming request.
Container runtime: A container resides inside a pod. The container is the lowest level of a microservice, holding the running application, libraries, and their dependencies. Containers can be exposed to the world through an external IP address. Kubernetes has supported Docker containers since its first version; in July 2016 the rkt container engine was added.[36]
Pods
The basic scheduling unit in Kubernetes is a pod.[37] A pod is a grouping of containerized components. A pod consists of one or more containers that are guaranteed to be co-located on the same node.[26]
Each pod in Kubernetes is assigned a unique IP address within the cluster, which allows applications to use ports without the risk of conflict.[38] Within the pod, all containers can reference each other on localhost, but a container within one pod has no way of directly addressing another container within another pod; for that, it has to use the pod IP address. An application developer should never use the pod IP address to reference or invoke a capability in another pod, however, because pod IP addresses are ephemeral: the specific pod being referenced may be assigned a different IP address on restart. Instead, a reference to a Service should be used, which holds a reference to the target pod at its current pod IP address.
A pod can define a volume, such as a local disk directory or a network disk, and expose it to the containers in the pod.[39] Pods can be managed manually through the Kubernetes API, or their management can be delegated to a controller.[26] Such volumes are also the basis for the Kubernetes features of ConfigMaps (to provide access to configuration through the filesystem visible to the container) and Secrets (to provide access to credentials needed to access remote resources securely, by providing those credentials on the filesystem visible only to authorized containers).
ReplicaSets
A ReplicaSet’s purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.[40]
A ReplicaSet[41] can also be described as a grouping mechanism that lets Kubernetes maintain the number of instances that have been declared for a given pod. The definition of a ReplicaSet uses a selector, whose evaluation identifies all pods that are associated with it.
Services
Simplified view showing how Services interact with Pod networking in a Kubernetes cluster
A Kubernetes service is a set of pods that work together, such as one tier of a multi-tier application. The set of pods that constitute a service are defined by a label selector.[26] Kubernetes provides two modes of service discovery, using environment variables or using Kubernetes DNS.[42] Service discovery assigns a stable IP address and DNS name to the service, and load balances traffic in a round-robin manner to network connections of that IP address among the pods matching the selector (even as failures cause the pods to move from machine to machine).[38] By default a service is exposed inside a cluster (e.g., back-end pods might be grouped into a service, with requests from the front-end pods load-balanced among them), but a service can also be exposed outside a cluster (e.g., for clients to reach front-end pods).[43]
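As a hedged example using the official Kubernetes Python client (assuming the kubernetes package is installed and a kubeconfig is available; the service name "front-end" and namespace "default" are illustrative), one can read a Service's label selector and list the pods currently backing it:

```python
from kubernetes import client, config

config.load_kube_config()        # authenticate using the local kubeconfig
v1 = client.CoreV1Api()

# Read the Service and turn its label selector into a selector string.
svc = v1.read_namespaced_service(name="front-end", namespace="default")
selector = ",".join(f"{k}={v}" for k, v in svc.spec.selector.items())

# List the pods currently matched by that selector.
pods = v1.list_namespaced_pod(namespace="default", label_selector=selector)
for pod in pods.items:
    print(pod.metadata.name, pod.status.pod_ip)   # ephemeral pod IPs...
print("service cluster IP:", svc.spec.cluster_ip)  # ...behind one stable IP
```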
Volumes
By default, the filesystems in Kubernetes containers provide ephemeral storage: a restart of the pod wipes out any data on such containers, so this form of storage is quite limiting in anything but trivial applications. A Kubernetes Volume[44] provides persistent storage that exists for the lifetime of the pod itself. This storage can also be used as shared disk space for containers within the pod. Volumes are mounted at specific mount points within the container, which are defined by the pod configuration, and cannot mount onto other volumes or link to other volumes. The same volume can be mounted at different points in the filesystem tree by different containers.
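A minimal sketch, again with the official Python client, of a pod whose emptyDir volume is shared by two containers and mounted at a different path in each (pod name, container names, and images are illustrative):

```python
from kubernetes import client

# One volume, shared by both containers in the pod.
shared = client.V1Volume(name="shared-data",
                         empty_dir=client.V1EmptyDirVolumeSource())

writer = client.V1Container(
    name="writer", image="busybox",
    command=["sh", "-c", "echo hello > /out/msg && sleep 3600"],
    volume_mounts=[client.V1VolumeMount(name="shared-data", mount_path="/out")])

reader = client.V1Container(
    name="reader", image="busybox",
    command=["sh", "-c", "sleep 3600"],
    volume_mounts=[client.V1VolumeMount(name="shared-data", mount_path="/in")])

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="shared-volume-demo"),
    spec=client.V1PodSpec(containers=[writer, reader], volumes=[shared]))

# client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```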
Namespaces
Kubernetes provides a partitioning of the resources it manages into non-overlapping sets called namespaces.[45] They are intended for use in environments with many users spread across multiple teams, or projects, or even separating environments like development, test, and production.
ConfigMaps and Secrets
A common application challenge is deciding where to store and manage configuration information, some of which may contain sensitive data. Configuration data can be anything as fine-grained as individual properties or as coarse-grained as entire configuration files or JSON/XML documents. Kubernetes provides two closely related mechanisms to deal with this need: “configmaps” and “secrets”, both of which allow for configuration changes to be made without requiring an application build. The data from configmaps and secrets is made available to every single instance of the application to which these objects have been bound via the deployment. A secret and/or a configmap is only sent to a node if a pod on that node requires it, and Kubernetes keeps it in memory on that node. Once the pod that depends on the secret or configmap is deleted, the in-memory copy of all bound secrets and configmaps is deleted as well. The data is accessible to the pod in one of two ways: a) as environment variables (which Kubernetes creates when the pod is started) or b) on the container filesystem, visible only from within the pod.
The data itself is stored on the master, which is a highly secured machine to which nobody should have login access. The biggest difference between a secret and a configmap is that the content of the data in a secret is base64 encoded. Recent versions of Kubernetes have introduced support for encryption as well. Secrets are often used to store data like certificates, passwords, pull secrets (credentials to work with image registries), and ssh keys.
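The following hedged sketch with the official Python client creates a ConfigMap and a Secret (all names and values are illustrative) and shows that base64 is trivially reversible, i.e. encoding rather than encryption:

```python
import base64
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# A ConfigMap holds plain, non-sensitive key/value configuration.
cm = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="app-config"),
    data={"LOG_LEVEL": "debug"})
v1.create_namespaced_config_map(namespace="default", body=cm)

# A Secret's data values must be base64-encoded strings.
secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name="db-credentials"),
    data={"password": base64.b64encode(b"s3cr3t").decode()})
v1.create_namespaced_secret(namespace="default", body=secret)

# base64 is encoding, not encryption: anyone can decode it.
print(base64.b64decode("czNjcjN0").decode())  # -> s3cr3t
```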
StatefulSets
It is very easy to address the scaling of stateless applications: one simply adds more running pods, which is something that Kubernetes does very well. Stateful workloads are much harder, because the state needs to be preserved if a pod is restarted, and if the application is scaled up or down, then the state may need to be redistributed. Databases are an example of stateful workloads. When run in high-availability mode, many databases come with the notion of a primary instance and secondary instance(s); in this case, the notion of ordering of instances is important. Other applications, like Kafka, distribute the data amongst their brokers, so one broker is not the same as another; in this case, the notion of instance uniqueness is important. StatefulSets[46] are controllers (see Controller manager, above) provided by Kubernetes that enforce the properties of uniqueness and ordering amongst instances of a pod, and can be used to run stateful applications.
DaemonSets
Normally, the locations where pods are run are determined by the algorithm implemented in the Kubernetes Scheduler. For some use cases, though, there could be a need to run a pod on every single node in the cluster. This is useful for use cases like log collection, ingress controllers, and storage services. The ability to do this kind of pod scheduling is implemented by the feature called DaemonSets.[47]
Labels and selectors
Kubernetes enables clients (users or internal components) to attach keys called “labels” to any API object in the system, such as pods and nodes. Correspondingly, “label selectors” are queries against labels that resolve to matching objects.[26] When a service is defined, one can define the label selectors that will be used by the service router / load balancer to select the pod instances that the traffic will be routed to. Thus, simply changing the labels of the pods or changing the label selectors on the service can be used to control which pods get traffic and which don't, supporting various deployment patterns like blue-green deployments or A/B testing. This capability to dynamically control how services utilize implementing resources provides a loose coupling within the infrastructure.
For example, if an application’s pods have labels for a system tier (with values such as front-end or back-end) and a release_track (with values such as canary or production), then an operation on all of the back-end canary pods can use a label selector such as:[34]
tier=back-end AND release_track=canary
Just like labels, field selectors also let one select Kubernetes resources. Unlike labels, the selection is based on the attribute values inherent to the resource being selected, rather than user-defined categorization. metadata.name and metadata.namespace are field selectors that will be present on all Kubernetes objects. Other selectors that can be used depend on the object/resource type.
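Both selector styles can be exercised with, for example, the official Kubernetes Python client (assuming the kubernetes package and a kubeconfig; the namespace is illustrative). The label selector below is the one from the example above, with the comma expressing AND; the field selector restricts by an attribute inherent to the resource:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Label selector: user-defined categorization.
# kubectl equivalent: kubectl get pods -l tier=back-end,release_track=canary
canary_backends = v1.list_namespaced_pod(
    namespace="default", label_selector="tier=back-end,release_track=canary")

# Field selector: an attribute inherent to the resource itself.
running = v1.list_namespaced_pod(
    namespace="default", field_selector="status.phase=Running")

for pod in canary_backends.items:
    print(pod.metadata.name)
```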
Replication Controllers and Deployments
A ReplicaSet declares the number of instances of a pod that are needed, and a Replication Controller manages the system so that the number of healthy pods that are running matches the number declared in the ReplicaSet (determined by evaluating its selector).
Deployments are a higher-level management mechanism for ReplicaSets. While the Replication Controller manages the scale of the ReplicaSet, Deployments manage what happens to the ReplicaSet: whether an update has to be rolled out, rolled back, etc. When deployments are scaled up or down, this results in the declaration of the ReplicaSet changing, and this change in declared state is managed by the Replication Controller.
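For example (a hedged sketch with the official Python client; the Deployment name "web" and namespace are illustrative), scaling a Deployment is just a patch to its declared replica count, which the controllers then converge the ReplicaSet and pods toward:

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Change only the declared state; the control plane does the rest.
apps.patch_namespaced_deployment_scale(
    name="web", namespace="default",
    body={"spec": {"replicas": 5}})
```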
Add-ons
Add-ons operate just like any other application running within the cluster: they are implemented via pods and services, and are only different in that they implement features of the Kubernetes cluster. The pods may be managed by Deployments, ReplicationControllers, and so on. There are many add-ons, and the list is growing. Some of the more important are:
DNS: All Kubernetes clusters should have cluster DNS; it is a mandatory feature. Cluster DNS is a DNS server, in addition to the other DNS server(s) in your environment, which serves DNS records for Kubernetes services. Containers started by Kubernetes automatically include this DNS server in their DNS searches.
Web UI: This is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage and troubleshoot applications running in the cluster, as well as the cluster itself.
Container Resource Monitoring: Providing a reliable application runtime, and being able to scale it up or down in response to workloads, means being able to continuously and effectively monitor workload performance. Container Resource Monitoring provides this capability by recording metrics about containers in a central database, and provides a UI for browsing that data. The cAdvisor is a component on a slave node that provides a limited metric monitoring capability. There are full metrics pipelines as well, such as Prometheus, which can meet most monitoring needs.
Cluster-level logging: Logs should have a separate storage and lifecycle independent of nodes, pods, or containers. Otherwise, node or pod failures can cause loss of event data. The ability to do this is called cluster-level logging, and such mechanisms are responsible for saving container logs to a central log store with search/browsing interface. Kubernetes provides no native storage for log data, but one can integrate many existing logging solutions into the Kubernetes cluster.
Storage
Containers emerged as a way to make software portable. The container contains all the packages you need to run a service. The provided filesystem makes containers extremely portable and easy to use in development. A container can be moved from development to test or production with no or relatively few configuration changes.
Historically, Kubernetes was suitable only for stateless services. However, many applications have a database, which requires persistence, leading to the creation of persistent storage for Kubernetes. Implementing persistent storage for containers is one of the top challenges of Kubernetes administrators, DevOps, and cloud engineers. Containers may be ephemeral, but more and more of their data is not, so one needs to ensure the data's survival in case of container termination or hardware failure. When deploying containers with Kubernetes or containerized applications, companies often realize that they need persistent storage. They need to provide fast and reliable storage for databases, root images, and other data used by the containers.
In addition to the landscape, the Cloud Native Computing Foundation (CNCF), has published other information about Kubernetes Persistent Storage including a blog helping to define the container attached storage pattern. This pattern can be thought of as one that uses Kubernetes itself as a component of the storage system or service.[48]
More information about the relative popularity of these and other approaches can be found on the CNCF’s landscape survey as well, which showed that OpenEBS from MayaData and Rook – a storage orchestration project – were the two projects most likely to be in evaluation as of the Fall of 2019.[49]
Container Attached Storage is a type of data storage that emerged as Kubernetes gained prominence. The Container Attached Storage approach or pattern relies on Kubernetes itself for certain capabilities while delivering primarily block, file, and object interfaces to workloads running on Kubernetes.[50]
Common attributes of Container Attached Storage include the use of extensions to Kubernetes, such as custom resource definitions, and the use of Kubernetes itself for functions that otherwise would be separately developed and deployed for storage or data management. Examples of functionality delivered by custom resource definitions or by Kubernetes itself include retry logic, delivered by Kubernetes itself, and the creation and maintenance of an inventory of available storage media and volumes, typically delivered via a custom resource definition.[51][52]
API
The design principles underlying Kubernetes allow one to programmatically create, configure, and manage Kubernetes clusters. This function is exposed via an API called the Cluster API. A key concept embodied in the API is the notion that the Kubernetes cluster is itself a resource/object that can be managed just like any other Kubernetes resource. Similarly, machines that make up the cluster are also treated as a Kubernetes resource. The API has two pieces: the core API, and a provider implementation. The provider implementation consists of cloud-provider-specific functions that let Kubernetes provide the Cluster API in a fashion that is well integrated with the cloud provider's services and resources.
Uses
Kubernetes is commonly used as a way to host a microservice-based implementation, because it and its associated ecosystem of tools provide all the capabilities needed to address key concerns of any microservice architecture.
Abhishek Verma; Luis Pedrosa; Madhukar R. Korupolu; David Oppenheimer; Eric Tune; John Wilkes (April 21–24, 2015). “Large-scale cluster management at Google with Borg”. Proceedings of the European Conference on Computer Systems (EuroSys). Archived from the original on 2017-07-27.
Ellingwood, Justin (2 May 2018). “An Introduction to Kubernetes”. DigitalOcean. Archived from the original on 5 July 2018. Retrieved 20 July 2018. “One of the most important primary services is an API server. This is the main management point of the entire cluster as it allows a user to configure Kubernetes’ workloads and organizational units. It is also responsible for making sure that the etcd store and the service details of deployed containers are in agreement. It acts as the bridge between various components to maintain cluster health and disseminate information and commands.”
Udemy, Inc. is an American massive open online course (MOOC) provider aimed at professional adults and students. It was founded in May 2010 by Eren Bali, Gagan Biyani, and Oktay Caglar.
As of February 2021, the platform has more than 40 million students, 155,000 courses, and 70,000 instructors teaching courses in over 65 languages. There have been over 480 million course enrollments. Students and instructors come from more than 180 countries, and two-thirds of the students are located outside the U.S.[3]
Students take courses largely as a means of improving job-related skills.[4] Some courses generate credit toward technical certification. Udemy has made a special effort to attract corporate trainers seeking to create coursework for employees of their company.[5] As of 2021, there are more than 155,000 courses on the website.[6][3]
The headquarters of Udemy is located in San Francisco, California, with offices in Denver, Colorado; Dublin, Ireland; Ankara, Turkey; Sao Paulo, Brazil; and Gurugram, India.[7]
“DevOps is the buzzword these days in both software and business circles. Why? Because it has revolutionized the way modern businesses do business and, in the process, achieved milestones that weren’t possible before.” On this site, “you’ll learn what DevOps is, how it evolved, how your business can benefit from implementing it, and success stories of some of the world’s biggest and most popular companies that have embraced DevOps as part of their business.” (DMH)
“DevOps – or Development and Operations – is a term used in enterprise software development that refers to a kind of agile relationship between information technologies (IT) operations and development. The primary objective of DevOps is to optimize this relationship through fostering better collaboration and communication between development and IT operations. In particular, it seeks to integrate and activate important modifications into an enterprise’s production processes as well as to strictly monitor problems and issues as they occur so these can be addressed as soon as possible without having to disrupt other aspects of the enterprise’s operations. By doing so, DevOps can help enterprises register faster turnaround times, increase frequency of deployment of crucial new software or programs, achieve faster average recovery times, increase success rate for newly released programs, and minimize the lead time needed in between modifications or fixes to programs.” (DMH)
“DevOps is crucial for the success of any enterprise because, by nature, enterprises need to segregate business units as individually operating entities for a more efficient system of operations. However, part of such segregation is the tendency to tightly control and guard access to information, processes and management. And this can be a challenge, particularly for the IT operations unit that needs access to key information from all business units in order to provide the best IT service possible for the whole enterprise. Simply put, part of the challenge in segregating business units into individually operating ones that are independent of each other is the relatively slow flow of information to and from such units because of bureaucracy.” (DMH)
“Moving towards an organizational culture based on DevOps – one where the enterprise’s operations units and IT developers are considered as “partners” instead of unrelated units – is an effective way to break down the barriers between them. This is because an enterprise whose culture is based on DevOps is one that can help IT personnel provide organization with the best possible software with the least risk for glitches, hitches, or problems. Therefore, a DevOps-based organizational culture is one that can foster an environment where segregated business units can remain independent but, at the same time, work very well with others in order to optimize the organization’s efficiency and productivity.” (DMH)
Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.[1] The IT infrastructure managed by this process comprises both physical equipment, such as bare-metal servers, as well as virtual machines, and associated configuration resources. The definitions may be in a version control system. It can use either scripts or declarative definitions, rather than manual processes, but the term is more often used to promote declarative approaches.
Overview
IaC grew as a response to the difficulty posed by utility computing and second-generation web frameworks. In 2006, the launch of Amazon Web Services’ Elastic Compute Cloud and the 1.0 version of Ruby on Rails just months before[2] created widespread scaling problems in the enterprise that were previously experienced only at large, multi-national companies.[3] With new tools emerging to handle this ever-growing field, the idea of IaC was born. The thought of modelling infrastructure with code, and then having the ability to design, implement, and deploy application infrastructure with known software best practices, appealed to both software developers and IT infrastructure administrators. The ability to treat infrastructure like code and use the same tools as any other software project would allow developers to rapidly deploy applications.[4]
Added value and advantages
The value of IaC can be broken down into three measurable categories: cost, speed, and risk. Cost reduction aims at helping not only the enterprise financially, but also in terms of people and effort: by removing the manual component, people are able to refocus their efforts on other enterprise tasks. Infrastructure automation enables speed through faster execution when configuring infrastructure, and aims at providing visibility to help other teams across the enterprise work quickly and more efficiently. Automation removes the risk associated with human error, like manual misconfiguration; removing this can decrease downtime and increase reliability. These outcomes and attributes help the enterprise move towards implementing a culture of DevOps, the combined working of development and operations.[5]
Types of approaches
There are generally two approaches to IaC: declarative (functional) versus imperative (procedural). The difference between the declarative and the imperative approach is essentially ‘what’ versus ‘how’. The declarative approach focuses on what the eventual target configuration should be; the imperative focuses on how the infrastructure is to be changed to meet this.[6] The declarative approach defines the desired state, and the system executes what needs to happen to achieve that state. The imperative approach defines specific commands that need to be executed in the appropriate order to end with the desired conclusion.[7]
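A schematic Python sketch of the contrast (provision and teardown are hypothetical stand-ins for real provider calls): the imperative version spells out the ordered steps, while the declarative version states the target and leaves the steps to a generic reconciler.

```python
def provision(resource):
    print("provision", resource)   # stand-in for a real provider API call

def teardown(resource):
    print("teardown", resource)    # stand-in for a real provider API call

# Imperative ("how"): explicit, ordered commands.
def imperative():
    provision("vm-1")
    provision("vm-2")
    provision("load-balancer")

# Declarative ("what"): a desired state plus a reconciler.
desired = {"vm-1", "vm-2", "load-balancer"}

def reconcile(current):
    for resource in desired - current:
        provision(resource)        # create what is missing
    for resource in current - desired:
        teardown(resource)         # remove what is no longer declared

reconcile(current={"vm-1"})        # -> provisions vm-2 and load-balancer
```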
Methods
There are two methods of IaC: ‘push’ and ‘pull’. The main difference is the manner in which the servers are told how to be configured. In the pull method, the server to be configured pulls its configuration from the controlling server. In the push method, the controlling server pushes the configuration to the destination system.[8]
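A toy illustration of the pull method in Python (the URL and apply_config are hypothetical; real agents, such as configuration management clients, involve far more machinery): an agent on each server periodically pulls its desired configuration from the controlling server and applies any change.

```python
import json
import time
import urllib.request

def apply_config(cfg):
    print("applying configuration:", cfg)   # stand-in for real provisioning work

def agent_loop(url, interval_s=300):
    last = None
    while True:
        with urllib.request.urlopen(url) as resp:   # pull from the controller
            cfg = json.load(resp)
        if cfg != last:
            apply_config(cfg)                       # converge only on change
            last = cfg
        time.sleep(interval_s)

# agent_loop("https://config.example.internal/desired.json")  # hypothetical URL
```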
Tools
There are many tools that fulfill infrastructure automation capabilities and use IaC. Broadly speaking, any framework or tool that performs changes or configures infrastructure declaratively or imperatively based on a programmatic approach can be considered IaC.[9] Traditionally, server (lifecycle) automation and configuration management tools were used to accomplish IaC. Now enterprises are also using continuous configuration automation tools or stand-alone IaC frameworks, such as Microsoft’s PowerShell DSC[10] or AWS CloudFormation.[11]
Continuous configuration automation
All continuous configuration automation (CCA) tools can be thought of as an extension of traditional IaC frameworks. They leverage IaC to change, configure, and automate infrastructure, and they also provide visibility, efficiency and flexibility in how infrastructure is managed.[3] These additional attributes provide enterprise-level security and compliance.
An important aspect when considering CCA tools, if they are open source, is the community content. As Gartner states, the value of CCA tools is “as dependent on user-community-contributed content and support as it is on the commercial maturity and performance of the automation tooling.”[3] Vendors like Puppet and Chef, which have been around for a significant amount of time, have created their own communities: Chef has the Chef Community Repository and Puppet has Puppet Forge.[12] Other vendors rely on adjacent communities and leverage other IaC frameworks such as PowerShell DSC.[10] New vendors are emerging that are not content-driven but model-driven, with the intelligence in the product to deliver content. These visual, object-oriented systems work well for developers, but they are especially useful to production-oriented DevOps and operations constituents who value models over scripting for content. As the field continues to develop and change, community-based content will become ever more important to how IaC tools are used, unless they are model-driven and object-oriented.
IaC can be a key attribute of enabling best practices in DevOps: developers become more involved in defining configuration, and Ops teams get involved earlier in the development process.[13] Tools that utilize IaC bring visibility to the state and configuration of servers, and ultimately provide that visibility to users within the enterprise, aiming to bring teams together to maximize their efforts.[14] Automation in general aims to remove the confusion and error-prone aspects of manual processes and make them more efficient and productive, allowing better software and applications to be created with flexibility, less downtime, and overall cost-effectiveness for the company. IaC is intended to reduce the complexity that kills efficiency in manual configuration. Automation and collaboration are considered central points in DevOps; infrastructure automation tools are often included as components of a DevOps toolchain.[15]
Relationship to security
The 2020 Cloud Threat Report released by Unit 42 (the threat intelligence unit of cybersecurity provider Palo Alto Networks) identified around 200,000 potential vulnerabilities in infrastructure as code templates.[16]
Gartner Market Trends: DevOps – Not a Market, but Tool-Centric Philosophy That Supports a Continuous Delivery Value Chain (Report). Gartner. 18 February 2015.
In software engineering, software configuration management (SCM or S/W CM) is the task of tracking and controlling changes in the software, part of the larger cross-disciplinary field of configuration management.[1] SCM practices include revision control and the establishment of baselines. If something goes wrong, SCM can determine what was changed and who changed it. If a configuration is working well, SCM can determine how to replicate it across many hosts.
The acronym “SCM” is also expanded as source configuration management process and software change and configuration management.[2] However, “configuration” is generally understood to cover changes typically made by a system administrator.
Configuration control – Implementing a controlled change process. This is usually achieved by setting up a change control board whose primary function is to approve or reject all change requests that are sent against any baseline.
Configuration status accounting – Recording and reporting all the necessary information on the status of the development process.
Configuration auditing – Ensuring that configurations contain all their intended parts and are sound with respect to their specifying documents, including requirements, architectural specifications and user manuals.
Build management – Managing the process and tools used for builds.
Process management – Ensuring adherence to the organization’s development process.
Environment management – Managing the software and hardware that host the system.
Teamwork – Facilitating team interactions related to the process.
Defect tracking – Making sure every defect has traceability back to the source.
With the introduction of cloud computing the purposes of SCM tools have become merged in some cases. The SCM tools themselves have become virtual appliances that can be instantiated as virtual machines and saved with state and version. The tools can model and manage cloud-based virtual resources, including virtual appliances, storage units, and software bundles. The roles and responsibilities of the actors have become merged as well with developers now being able to dynamically instantiate virtual servers and related resources.[3]
History
The history of software configuration management (SCM) in computing can be traced back as early as the 1950s, when CM (for Configuration Management), originally for hardware development and production control, was being applied to software development. Early software had a physical footprint, such as cards, tapes, and other media. The first software configuration management was a manual operation. With the advances in language and complexity, software engineering, involving configuration management and other methods, became a major concern due to issues like schedule, budget, and quality. Practical lessons, over the years, had led to the definition, and establishment, of procedures and tools. Eventually, the tools became systems to manage software changes.[4] Industry-wide practices were offered as solutions, either in an open or proprietary manner (such as Revision Control System). With the growing use of computers, systems emerged that handled a broader scope, including requirements management, design alternatives, quality control, and more; later tools followed the guidelines of organizations, such as the Capability Maturity Model of the Software Engineering Institute.
Aiello, R. (2010). Configuration Management Best Practices: Practical Methods that Work in the Real World (1st ed.). Addison-Wesley. ISBN 0-321-68586-5.
Babich, W.A. (1986). Software Configuration Management: Coordination for Team Productivity (1st ed.). Boston: Addison-Wesley.
Bersoff, E.H. (1997). Elements of Software Configuration Management. IEEE Computer Society Press, Los Alamitos, CA, 1-32.
Dennis, A., Wixom, B.H. & Tegarden, D. (2002). System Analysis & Design: An Object-Oriented Approach with UML. Hoboken, NJ: John Wiley & Sons, Inc.
Duvall, P.M., Matyas, S., & Glover, A. (2007). Continuous Integration: Improving Software Quality and Reducing Risk (1st ed.). Addison-Wesley Professional. ISBN 0-321-33638-0.