|Developer(s)||Cloud Native Computing Foundation|
|Initial release||7 June 2014|
|Stable release||1.20 / December 8, 2020|
|Type||Cluster management software|
|License||Apache License 2.0|
Kubernetes (/ˌk(j)uːbərˈnɛtɪs, -ˈneɪtɪs, -ˈneɪtiːz/, commonly stylized as K8s) is an open-source container–orchestration system for automating computer application deployment, scaling, and management. It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation. It aims to provide a “platform for automating deployment, scaling, and operations of application containers across clusters of hosts”. It works with a range of container tools and runs containers in a cluster, often with images built using Docker. Kubernetes originally interfaced with the Docker runtime through a “Dockershim”; however, the shim has since been deprecated in favor of directly interfacing with containerd or another CRI-compliant runtime.
Many cloud services offer a Kubernetes-based platform or infrastructure as a service (PaaS or IaaS) on which Kubernetes can be deployed as a platform-providing service. Many vendors also provide their own branded Kubernetes distributions. (WP)
Kubernetes (κυβερνήτης, Greek for “helmsman” or “pilot” or “governor”, and the etymological root of cybernetics) was founded by Joe Beda, Brendan Burns, and Craig McLuckie, who were quickly joined by other Google engineers including Brian Grant and Tim Hockin, and was first announced by Google in mid-2014. Its development and design are heavily influenced by Google’s Borg system, and many of the top contributors to the project previously worked on Borg. The original codename for Kubernetes within Google was Project 7, a reference to the Star Trek ex-Borg character Seven of Nine. The seven spokes on the wheel of the Kubernetes logo are a reference to that codename. The original Borg project was written entirely in C++, but the rewritten Kubernetes system is implemented in Go.
Kubernetes v1.0 was released on July 21, 2015. Along with the Kubernetes v1.0 release, Google partnered with the Linux Foundation to form the Cloud Native Computing Foundation (CNCF) and offered Kubernetes as a seed technology. In February 2016, the Helm package manager for Kubernetes was released. As of March 6, 2018, the Kubernetes project ranked ninth among GitHub projects by commits, and second by authors and issues, behind only the Linux kernel.
Up to v1.18, Kubernetes followed an N-2 support policy, meaning that the 3 most recent minor versions receive security and bug fixes.
From v1.19 onwards, Kubernetes will follow an N-3 support policy.
The chart below visualises the period for which each release is/was supported
Kubernetes defines a set of building blocks (“primitives”), which collectively provide mechanisms that deploy, maintain, and scale applications based on CPU, memory or custom metrics. Kubernetes is loosely coupled and extensible to meet different workloads. This extensibility is provided in large part by the Kubernetes API, which is used by internal components as well as extensions and containers that run on Kubernetes. The platform exerts its control over compute and storage resources by defining resources as Objects, which can then be managed as such.
The Kubernetes master is the main controlling unit of the cluster, managing its workload and directing communication across the system. The Kubernetes control plane consists of various components, each its own process, that can run either on a single master node or on multiple masters supporting high-availability clusters. The various components of the Kubernetes control plane are as follows:
- etcd: etcd is a persistent, lightweight, distributed, key-value data store developed by CoreOS that reliably stores the configuration data of the cluster, representing the overall state of the cluster at any given point of time. Just like Apache ZooKeeper, etcd is a system that favors consistency over availability in the event of a network partition (see CAP theorem). This consistency is crucial for correctly scheduling and operating services. The Kubernetes API Server uses etcd’s watch API to monitor the cluster and roll out critical configuration changes or simply restore any divergences of the state of the cluster back to what was declared by the deployer. As an example, if the deployer specified that three instances of a particular pod need to be running, this fact is stored in etcd. If it is found that only two instances are running, this delta will be detected by comparison with etcd data, and Kubernetes will use this to schedule the creation of an additional instance of that pod.
- API server: The API server is a key component and serves the Kubernetes API using JSON over HTTP, which provides both the internal and external interface to Kubernetes. The API server processes and validates REST requests and updates the state of the API objects in etcd, thereby allowing clients to configure workloads and containers across worker nodes.
- Scheduler: The scheduler is the pluggable component that selects which node an unscheduled pod (the basic entity managed by the scheduler) runs on, based on resource availability. The scheduler tracks resource use on each node to ensure that workload is not scheduled in excess of available resources. For this purpose, the scheduler must know the resource requirements, resource availability, and other user-provided constraints and policy directives such as quality-of-service, affinity/anti-affinity requirements, data locality, and so on. In essence, the scheduler’s role is to match resource “supply” to workload “demand”.
- Controller manager: A controller is a reconciliation loop that drives actual cluster state toward the desired cluster state, communicating with the API server to create, update, and delete the resources it manages (pods, service endpoints, etc.). The controller manager is a process that manages a set of core Kubernetes controllers. One kind of controller is a Replication Controller, which handles replication and scaling by running a specified number of copies of a pod across the cluster. It also handles creating replacement pods if the underlying node fails. Other controllers that are part of the core Kubernetes system include a DaemonSet Controller for running exactly one pod on every machine (or some subset of machines), and a Job Controller for running pods that run to completion, e.g. as part of a batch job. The set of pods that a controller manages is determined by label selectors that are part of the controller’s definition.
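The reconciliation idea described above (a declared replica count compared against what is actually running) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ClusterState`, `reconcile`, and the generated pod names are hypothetical, the state lives in memory rather than in etcd, and no real Kubernetes API is called.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical in-memory stand-in for the state a real cluster keeps in etcd."""
    desired_replicas: int
    running_pods: list = field(default_factory=list)
    next_id: int = 0  # counter used only to generate illustrative pod names


def reconcile(state: ClusterState) -> list:
    """Run one pass of the loop and return the list of actions taken."""
    actions = []
    # Too few pods: create replacements until the declared count is met.
    while len(state.running_pods) < state.desired_replicas:
        name = f"pod-{state.next_id}"
        state.next_id += 1
        state.running_pods.append(name)
        actions.append(("create", name))
    # Too many pods: remove the surplus.
    while len(state.running_pods) > state.desired_replicas:
        name = state.running_pods.pop()
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    state = ClusterState(desired_replicas=3, running_pods=["pod-a", "pod-b"])
    print(reconcile(state))  # one create action closes the gap
    print(reconcile(state))  # already converged, so the list is empty
```

Calling `reconcile` repeatedly is safe: once the running set matches the declaration, a pass produces no actions, which is the property that lets a controller run as a continuous loop.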
A Node, also known as a Worker or a Minion, is a machine where containers (workloads) are deployed. Every node in the cluster must run a container runtime such as Docker, as well as the components described below, which handle communication with the primary and the network configuration of these containers.
- Kubelet: Kubelet is responsible for the running state of each node, ensuring that all containers on the node are healthy. It takes care of starting, stopping, and maintaining application containers organized into pods as directed by the control plane.
Kubelet monitors the state of a pod, and if not in the desired state, the pod re-deploys to the same node. Node status is relayed every few seconds via heartbeat messages to the primary. Once the primary detects a node failure, the Replication Controller observes this state change and launches pods on other healthy nodes.
- Kube-proxy: The Kube-proxy is an implementation of a network proxy and a load balancer, and it supports the service abstraction along with other networking operations. It is responsible for routing traffic to the appropriate container based on the IP address and port number of the incoming request.
- Container runtime: A container resides inside a pod. The container is the lowest level of a micro-service, which holds the running application, libraries, and their dependencies. Containers can be exposed to the world through an external IP address. Kubernetes has supported Docker containers since its first version, and in July 2016 the rkt container engine was added.
The basic scheduling unit in Kubernetes is a pod. A pod is a grouping of containerized components. A pod consists of one or more containers that are guaranteed to be co-located on the same node.
Each pod in Kubernetes is assigned a unique IP address within the cluster, which allows applications to use ports without the risk of conflict. Within the pod, all containers can reference each other on localhost, but a container within one pod has no way of directly addressing another container within another pod; for that, it has to use the Pod IP Address. An application developer should never use the Pod IP Address though, to reference / invoke a capability in another pod, as Pod IP addresses are ephemeral – the specific pod that they are referencing may be assigned to another Pod IP address on restart. Instead, they should use a reference to a Service, which holds a reference to the target pod at the specific Pod IP Address.
A pod can define a volume, such as a local disk directory or a network disk, and expose it to the containers in the pod. Pods can be managed manually through the Kubernetes API, or their management can be delegated to a controller. Such volumes are also the basis for the Kubernetes features of ConfigMaps (to provide access to configuration through the filesystem visible to the container) and Secrets (to provide access to credentials needed to access remote resources securely, by providing those credentials on the filesystem visible only to authorized containers).
A ReplicaSet’s purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
The ReplicaSets can also be said to be a grouping mechanism that lets Kubernetes maintain the number of instances that have been declared for a given pod. The definition of a Replica Set uses a selector, whose evaluation will result in identifying all pods that are associated with it.
A Kubernetes service is a set of pods that work together, such as one tier of a multi-tier application. The set of pods that constitute a service is defined by a label selector. Kubernetes provides two modes of service discovery, using environment variables or using Kubernetes DNS. Service discovery assigns a stable IP address and DNS name to the service, and load balances traffic in a round-robin manner from network connections to that IP address among the pods matching the selector (even as failures cause the pods to move from machine to machine). By default a service is exposed inside a cluster (e.g., back end pods might be grouped into a service, with requests from the front-end pods load-balanced among them), but a service can also be exposed outside a cluster (e.g., for clients to reach front-end pods).
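As a rough illustration of the selector-plus-round-robin behaviour described above, the following Python sketch picks the pods whose labels match a selector and then cycles through their addresses. The pod records, IP addresses, label names, and function names are all hypothetical; a real cluster does this inside its networking components rather than in application code.

```python
from itertools import cycle


def matches(labels: dict, selector: dict) -> bool:
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def service_backends(pods: list, selector: dict) -> list:
    """Return the IPs of the pods whose labels satisfy the selector."""
    return [pod["ip"] for pod in pods if matches(pod["labels"], selector)]


def round_robin(backends: list):
    """Yield backends in turn, wrapping around forever."""
    return cycle(backends)


if __name__ == "__main__":
    # Hypothetical pods; only those labelled app=web belong to the service.
    pods = [
        {"ip": "10.0.0.1", "labels": {"app": "web"}},
        {"ip": "10.0.0.2", "labels": {"app": "db"}},
        {"ip": "10.0.0.3", "labels": {"app": "web"}},
    ]
    picker = round_robin(service_backends(pods, {"app": "web"}))
    for _ in range(3):
        print(next(picker))
```

Because clients address the service rather than an individual pod, the backend list can be recomputed whenever pods come and go without the clients noticing.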
Filesystems in the Kubernetes container provide ephemeral storage, by default. This means that a restart of the pod will wipe out any data on such containers, and therefore, this form of storage is quite limiting in anything but trivial applications. A Kubernetes Volume provides persistent storage that exists for the lifetime of the pod itself. This storage can also be used as shared disk space for containers within the pod. Volumes are mounted at specific mount points within the container, which are defined by the pod configuration, and cannot mount onto other volumes or link to other volumes. The same volume can be mounted at different points in the filesystem tree by different containers.
Kubernetes provides a partitioning of the resources it manages into non-overlapping sets called namespaces. They are intended for use in environments with many users spread across multiple teams, or projects, or even separating environments like development, test, and production.
ConfigMaps and Secrets
A common application challenge is deciding where to store and manage configuration information, some of which may contain sensitive data. Configuration data can be anything as fine-grained as individual properties or coarse-grained information like entire configuration files or JSON / XML documents. Kubernetes provides two closely related mechanisms to deal with this need: “configmaps” and “secrets”, both of which allow for configuration changes to be made without requiring an application build. The data from configmaps and secrets will be made available to every single instance of the application to which these objects have been bound via the deployment. A secret and / or a configmap is only sent to a node if a pod on that node requires it. Kubernetes will keep it in memory on that node. Once the pod that depends on the secret or configmap is deleted, the in-memory copy of all bound secrets and configmaps is deleted as well. The data is accessible to the pod in one of two ways: a) as environment variables (which will be created by Kubernetes when the pod is started) or b) as files on the container filesystem that are visible only from within the pod.
The data itself is stored on the master which is a highly secured machine which nobody should have login access to. The biggest difference between a secret and a configmap is that the content of the data in a secret is base64 encoded. Recent versions of Kubernetes have introduced support for encryption to be used as well. Secrets are often used to store data like certificates, passwords, pull secrets (credentials to work with image registries), and ssh keys.
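The point that secret data is base64 encoded, rather than encrypted, is easy to see with Python's standard `base64` module. The helper names and the sample value below are hypothetical and exist only for illustration.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way secret data is represented: plain base64."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key is needed, so this is not encryption."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_secret_value("admin")
    print(encoded)                       # YWRtaW4=
    print(decode_secret_value(encoded))  # admin
```

Anyone who can read the encoded value can recover the original, which is why the encryption support mentioned above matters for sensitive data.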
It is very easy to address the scaling of stateless applications: one simply adds more running pods, which is something that Kubernetes does very well. Stateful workloads are much harder, because the state needs to be preserved if a pod is restarted, and if the application is scaled up or down, then the state may need to be redistributed. Databases are an example of stateful workloads. When run in high-availability mode, many databases come with the notion of a primary instance and secondary instance(s). In this case, the notion of ordering of instances is important. Other applications like Kafka distribute the data amongst their brokers, so one broker is not the same as another. In this case, the notion of instance uniqueness is important. StatefulSets are controllers (see Controller manager, above) that are provided by Kubernetes that enforce the properties of uniqueness and ordering amongst instances of a pod and can be used to run stateful applications.
Normally, the locations where pods are run are determined by the algorithm implemented in the Kubernetes Scheduler. For some use cases, though, there could be a need to run a pod on every single node in the cluster. This is useful for use cases like log collection, ingress controllers, and storage services. The ability to do this kind of pod scheduling is implemented by the feature called DaemonSets.
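The "one pod on every node" rule can be pictured as a simple set difference: any node that does not yet host a copy of the daemon pod needs one. The sketch below is a minimal illustration with hypothetical node names and a hypothetical helper, not how the DaemonSet controller is actually implemented.

```python
def missing_daemon_pods(nodes: list, nodes_with_pod: set) -> list:
    """Return the nodes that still need a copy of the daemon pod."""
    return [node for node in nodes if node not in nodes_with_pod]


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3"]
    print(missing_daemon_pods(nodes, {"node-1"}))  # node-2 and node-3 need a pod

    # When a new node joins the cluster, it shows up as missing a pod as well.
    nodes.append("node-4")
    print(missing_daemon_pods(nodes, {"node-1", "node-2", "node-3"}))
```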
Labels and selectors
Kubernetes enables clients (users or internal components) to attach keys called “labels” to any API object in the system, such as pods and nodes. Correspondingly, “label selectors” are queries against labels that resolve to matching objects. When a service is defined, one can define the label selectors that will be used by the service router / load balancer to select the pod instances that the traffic will be routed to. Thus, simply changing the labels of the pods or changing the label selectors on the service can be used to control which pods get traffic and which don’t, which can be used to support various deployment patterns like blue-green deployments or A-B testing. This capability to dynamically control how services utilize implementing resources provides a loose coupling within the infrastructure.
For example, if an application’s pods have labels for a system tier (with values such as back-end, for example) and a release_track (with values such as canary or production, for example), then an operation on all of the back-end and canary nodes can use a label selector, such as:
tier=back-end AND release_track=canary
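To make the selector concrete, the following Python sketch parses an expression written in the notation used above (equality terms joined by `AND`) and returns the objects whose labels satisfy every term. The parser, the object records, and the `front-end` value are assumptions made for illustration; they are not the selector syntax or API of any particular Kubernetes tool.

```python
def parse_selector(expr: str) -> dict:
    """Parse 'key=value AND key=value' into a dict of required labels."""
    selector = {}
    for term in expr.split(" AND "):
        key, _, value = term.strip().partition("=")
        selector[key] = value
    return selector


def select(objects: list, expr: str) -> list:
    """Return the names of the objects whose labels match every term."""
    selector = parse_selector(expr)
    return [
        obj["name"]
        for obj in objects
        if all(obj["labels"].get(k) == v for k, v in selector.items())
    ]


if __name__ == "__main__":
    # Hypothetical objects carrying the two labels from the example.
    objects = [
        {"name": "a", "labels": {"tier": "back-end", "release_track": "canary"}},
        {"name": "b", "labels": {"tier": "back-end", "release_track": "production"}},
        {"name": "c", "labels": {"tier": "front-end", "release_track": "canary"}},
    ]
    print(select(objects, "tier=back-end AND release_track=canary"))  # ['a']
```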
Just like labels, field selectors also let one select Kubernetes resources. Unlike labels, the selection is based on the attribute values inherent to the resource being selected, rather than user-defined categorization.
metadata.name and metadata.namespace are field selectors that will be present on all Kubernetes objects. Other selectors that can be used depend on the object/resource type.
Replication Controllers and Deployments
A ReplicaSet declares the number of instances of a pod that is needed, and a Replication Controller manages the system so that the number of healthy pods that are running matches the number of pods declared in the ReplicaSet (determined by evaluating its selector).
Deployments are a higher level management mechanism for ReplicaSets. While the Replication Controller manages the scale of the ReplicaSet, Deployments will manage what happens to the ReplicaSet – whether an update has to be rolled out, or rolled back, etc. When deployments are scaled up or down, this results in the declaration of the ReplicaSet changing – and this change in declared state is managed by the Replication Controller.
Add-ons operate just like any other application running within the cluster: they are implemented via pods and services, and are only different in that they implement features of the Kubernetes cluster. The pods may be managed by Deployments, ReplicationControllers, and so on. There are many add-ons, and the list is growing. Some of the more important are:
- DNS: All Kubernetes clusters should have cluster DNS; it is a mandatory feature. Cluster DNS is a DNS server, in addition to the other DNS server(s) in your environment, which serves DNS records for Kubernetes services. Containers started by Kubernetes automatically include this DNS server in their DNS searches.
- Web UI: This is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage and troubleshoot applications running in the cluster, as well as the cluster itself.
- Container Resource Monitoring: Providing a reliable application runtime, and being able to scale it up or down in response to workloads, means being able to continuously and effectively monitor workload performance. Container Resource Monitoring provides this capability by recording metrics about containers in a central database, and provides a UI for browsing that data. The cAdvisor is a component on a slave node that provides a limited metric monitoring capability. There are full metrics pipelines as well, such as Prometheus, which can meet most monitoring needs.
- Cluster-level logging: Logs should have a separate storage and lifecycle independent of nodes, pods, or containers. Otherwise, node or pod failures can cause loss of event data. The ability to do this is called cluster-level logging, and such mechanisms are responsible for saving container logs to a central log store with search/browsing interface. Kubernetes provides no native storage for log data, but one can integrate many existing logging solutions into the Kubernetes cluster.
Containers emerged as a way to make software portable. The container contains all the packages you need to run a service. The provided filesystem makes containers extremely portable and easy to use in development. A container can be moved from development to test or production with no or relatively few configuration changes.
Historically Kubernetes was suitable only for stateless services. However, many applications have a database, which requires persistence, which leads to the creation of persistent storage for Kubernetes. Implementing persistent storage for containers is one of the top challenges of Kubernetes administrators, DevOps and cloud engineers. Containers may be ephemeral, but more and more of their data is not, so one needs to ensure the data’s survival in case of container termination or hardware failure. When deploying containers with Kubernetes or containerized applications, companies often realize that they need persistent storage. They need to provide fast and reliable storage for databases, root images and other data used by the containers.
In addition to the landscape, the Cloud Native Computing Foundation (CNCF) has published other information about Kubernetes Persistent Storage, including a blog post helping to define the container attached storage pattern. This pattern can be thought of as one that uses Kubernetes itself as a component of the storage system or service.
More information about the relative popularity of these and other approaches can be found on the CNCF’s landscape survey as well, which showed that OpenEBS from MayaData and Rook – a storage orchestration project – were the two projects most likely to be in evaluation as of the Fall of 2019.
Container Attached Storage is a type of data storage that emerged as Kubernetes gained prominence. The Container Attached Storage approach or pattern relies on Kubernetes itself for certain capabilities while delivering primarily block, file, and object interfaces to workloads running on Kubernetes.
Common attributes of Container Attached Storage include the use of extensions to Kubernetes, such as custom resource definitions, and the use of Kubernetes itself for functions that otherwise would be separately developed and deployed for storage or data management. Examples of functionality delivered by custom resource definitions or by Kubernetes itself include retry logic, delivered by Kubernetes itself, and the creation and maintenance of an inventory of available storage media and volumes, typically delivered via a custom resource definition.
The design principles underlying Kubernetes allow one to programmatically create, configure, and manage Kubernetes clusters. This function is exposed via an API called the Cluster API. A key concept embodied in the API is the notion that the Kubernetes cluster is itself a resource / object that can be managed just like any other Kubernetes resources. Similarly, machines that make up the cluster are also treated as a Kubernetes resource. The API has two pieces – the core API, and a provider implementation. The provider implementation consists of cloud-provider specific functions that let Kubernetes provide the cluster API in a fashion that is well-integrated with the cloud-provider’s services and resources.
Kubernetes is commonly used as a way to host a microservice-based implementation, because it and its associated ecosystem of tools provide all the capabilities needed to address key concerns of any microservice architecture.
“A DevOps toolchain is a set or combination of tools that aid in the delivery, development, and management of software applications throughout the systems development life cycle, as coordinated by an organization that uses DevOps practices.
“In software, a toolchain is the set of programming tools that is used to perform a complex software development task or to create a software product, which is typically another computer program or a set of related programs. In general, the tools forming a toolchain are executed consecutively so the output or resulting environment state of each tool becomes the input or starting environment for the next one, but the term is also used when referring to a set of related tools that are not necessarily executed consecutively.
As DevOps is a set of practices that emphasizes the collaboration and communication of both software developers and other information technology (IT) professionals, while automating the process of software delivery and infrastructure changes, its implementation can include the definition of the series of tools used at various stages of the lifecycle; because DevOps is a cultural shift and collaboration between development and operations, there is no one product that can be considered a single DevOps tool. Instead a collection of tools, potentially from a variety of vendors, are used in one or more stages of the lifecycle.” (WP)
Stages of DevOps
Plan is composed of two things: “define” and “plan”. This activity refers to the business value and application requirements. Specifically “Plan” activities include:
- Production metrics, objects and feedback
- Business metrics
- Update release metrics
- Release plan, timing and business case
- Security policy and requirement
A combination of the IT personnel will be involved in these activities: business application owners, software development, software architects, continual release management, security officers and the organization responsible for managing the production of IT infrastructure.
- Design of the software and configuration
- Coding including code quality (see coding conventions and coding best practices) and performance
- Software build and build performance
- Release candidate
Verify is directly associated with ensuring the quality of the software release; activities designed to ensure code quality is maintained and the highest quality is deployed to production. The main activities in this are:
- Acceptance testing
- Regression testing
- Security and vulnerability analysis
- Performance testing
- Configuration testing
Packaging refers to the activities involved once the release is ready for deployment, often also referred to as staging or Preproduction / “preprod”. This often includes tasks and activities such as:
- Package configuration
- Triggered releases
- Release staging and holding
Release-related activities include scheduling, orchestration, provisioning, and deploying software into production and targeted environments. The specific Release activities include:
- Release coordination
- Deploying and promoting applications
- Fallbacks and recovery
- Scheduled/timed releases
Configure activities fall under the operations side of DevOps. Once software is deployed, additional IT infrastructure provisioning and configuration activities may be required. Specific activities include:
- Infrastructure storage, database and network provisioning and configuring
- Application provision and configuration.
Monitoring is an important link in a DevOps toolchain. It allows an IT organization to identify specific issues of specific releases and to understand the impact on end-users. Monitor-related activities include:
- Performance of IT infrastructure
- End-user response and experience
- Production metrics and statistics
Information from monitoring activities often impacts Plan activities required for changes and for new release cycles.
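The stages described above, and the feedback from Monitor back into Plan, can be sketched as a simple cycle. This is a minimal illustration rather than a prescribed model: the `Stage` enum and the `next_stage` helper are hypothetical names, and the label "Create" for the design, coding and build activities is an assumption.

```python
from enum import Enum


class Stage(Enum):
    """Toolchain stages in the order they are described above."""
    PLAN = "Plan"
    CREATE = "Create"      # assumed label for design, coding and build
    VERIFY = "Verify"
    PACKAGE = "Package"
    RELEASE = "Release"
    CONFIGURE = "Configure"
    MONITOR = "Monitor"


def next_stage(stage):
    """Return the stage that follows; Monitor feeds back into Plan."""
    order = list(Stage)  # Enum members iterate in definition order
    return order[(order.index(stage) + 1) % len(order)]


# Walk one full cycle starting from Plan.
current = Stage.PLAN
cycle = []
for _ in range(len(Stage)):
    cycle.append(current.value)
    current = next_stage(current)
print(" -> ".join(cycle), "->", current.value)
```

Modelling the stages as a cycle rather than a straight line reflects the point that monitoring results inform the next round of planning.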
Version Control is an important link in a DevOps toolchain and a component of software configuration management. Version Control is the management of changes to documents, computer programs, large web sites, and other collections of information. Version Control-related activities include:
- Non-linear development
- Distributed development
- Compatibility with existing systems and protocols
- Toolkit-based design
Information from Version Control often supports Release activities required for changes and for new release cycles.
|Type of business||Subsidiary|
|Type of site||Collaborative version control|
|Founded||February 8, 2008; 13 years ago (as Logical Awesome LLC)|
|Headquarters||San Francisco, California, United States|
|Founder(s)||Tom Preston-Werner, Chris Wanstrath, P. J. Hyett, Scott Chacon|
|Key people||Mike Taylor (CFO)|
|Industry||Collaborative version control (GitHub); blog host (GitHub Pages); package repository (NPM)|
|Registration||Optional (required for creating and joining repositories)|
|Users||56 million (Sep 2020)|
|Launched||April 10, 2008; 12 years ago|
GitHub, Inc. is a provider of Internet hosting for software development and version control using Git. It offers the distributed version control and source code management (SCM) functionality of Git, plus its own features. It provides access control and several collaboration features such as bug tracking, feature requests, task management, continuous integration and wikis for every project. Headquartered in California, it has been a subsidiary of Microsoft since 2018.
GitHub offers its basic services free of charge. Its more advanced professional and enterprise services are commercial. Free GitHub accounts are commonly used to host open-source projects. In January 2019, GitHub began offering unlimited private repositories on all plans, including free accounts, but allowed only up to three collaborators per repository for free. Starting from April 15, 2020, the free plan allows unlimited collaborators, but restricts private repositories to 2,000 minutes of GitHub Actions per month. As of January 2020, GitHub reports having over 40 million users and more than 190 million repositories (including at least 28 million public repositories), making it the largest host of source code in the world.
The GitHub service was developed by Chris Wanstrath, P. J. Hyett, Tom Preston-Werner and Scott Chacon using Ruby on Rails, and started in February 2008. The company, GitHub, Inc., has existed since 2007 and is located in San Francisco.
[Figure: map shaded by the number of GitHub users as a proportion of each country’s Internet population; circular charts around the two hemispheres show total GitHub users (left) and commits (right) per country.]
On February 24, 2009, GitHub announced that within the first year of being online, GitHub had accumulated over 46,000 public repositories, 17,000 of which were formed in the previous month. At that time, about 6,200 repositories had been forked at least once and 4,600 had been merged.
That same year, the site was used by over 100,000 users, according to GitHub, and had grown to host 90,000 unique public repositories, 12,000 having been forked at least once, for a total of 135,000 repositories.
In 2010, GitHub was hosting 1 million repositories. A year later, this number doubled. ReadWriteWeb reported that GitHub had surpassed SourceForge and Google Code in total number of commits for the period of January to May 2011. On January 16, 2013, GitHub passed the 3 million users mark and was then hosting more than 5 million repositories. By the end of the year, the number of repositories had doubled, reaching 10 million.
In 2012, GitHub raised $100 million in funding from Andreessen Horowitz at a $750 million valuation. Peter Levine, general partner at Andreessen Horowitz, stated that GitHub had been growing revenue at 300% annually since 2008 “profitably nearly the entire way”. On July 29, 2015, GitHub stated it had raised $250 million in funding in a round led by Sequoia Capital. Other investors of that round included Andreessen Horowitz, Thrive Capital, and IVP (Institutional Venture Partners). The round valued the company at approximately $2 billion.
In 2015, GitHub opened an office in Japan, its first office outside of the U.S. In 2016, GitHub was ranked No. 14 on the Forbes Cloud 100 list. It was not featured on the 2018, 2019 and 2020 lists.
Acquisition by Microsoft
From 2012 Microsoft became a significant user of GitHub, using it to host open-source projects and development tools such as .NET Core, Chakra Core, MSBuild, PowerShell, PowerToys, Visual Studio Code, Windows Calculator, Windows Terminal and the bulk of its product documentation (now to be found on Microsoft Docs).
On June 4, 2018, Microsoft announced its intent to acquire GitHub for US$7.5 billion. The deal closed on October 26, 2018. GitHub continued to operate independently as a community, platform and business. Under Microsoft, the service was led by Xamarin’s Nat Friedman, reporting to Scott Guthrie, executive vice president of Microsoft Cloud and AI. GitHub’s CEO, Chris Wanstrath, was retained as a “technical fellow”, also reporting to Guthrie.
This acquisition was in line with Microsoft’s business strategy under CEO Satya Nadella, which has seen a larger focus on cloud computing services, alongside development of and contributions to open-source software. Harvard Business Review argued that Microsoft was intending to acquire GitHub to get access to its user base, so that it could be used as a loss leader to encourage use of its other development products and services.
Concerns over the sale bolstered interest in competitors: Bitbucket (owned by Atlassian), GitLab (a commercial open source product that also runs a hosted service version) and SourceForge (owned by BIZX, LLC) reported that they had seen spikes in new users intending to migrate projects from GitHub to their respective services.
In early July 2020, the GitHub Archive Program was established, to archive its open source code in perpetuity.
Development of the GitHub.com platform began on October 19, 2007. The site was launched in April 2008 by Tom Preston-Werner, Chris Wanstrath, P. J. Hyett and Scott Chacon after it had been made available for a few months prior as a beta release.
Projects on GitHub.com can be accessed and managed using the standard Git command-line interface; all standard Git commands work with it. GitHub.com also allows users to browse public repositories on the site. Multiple desktop clients and Git plugins are also available. The site provides social networking-like functions such as feeds, followers, wikis (using wiki software called Gollum) and a social network graph to display how developers work on their versions (“forks”) of a repository and which fork (and branch within that fork) is newest.
Anyone can browse and download public repositories but only registered users can contribute content to repositories. With a registered user account, users are able to have discussions, manage repositories, submit contributions to others’ repositories, and review changes to code. GitHub.com began offering unlimited private repositories at no cost in January 2019 (limited to three contributors per project). Previously, only public repositories were free. On April 14, 2020, GitHub made “all of the core GitHub features” free for everyone, including “private repositories with unlimited collaborators”.
The fundamental software that underpins GitHub is Git itself, written by Linus Torvalds, creator of Linux. The additional software that provides the GitHub user interface was written using Ruby on Rails and Erlang by GitHub, Inc. developers Wanstrath, Hyett, and Preston-Werner.
The main purpose of GitHub.com is to facilitate the version control and issue tracking aspects of software development. Labels, milestones, responsibility assignment, and a search engine are available for issue tracking. For version control, Git (and by extension GitHub.com) allows pull requests to propose changes to the source code. Users with the ability to review the proposed changes can see a diff of the requested changes and approve them; approved changes are then merged into the repository. In Git terminology, recording a set of changes is called “committing” and one instance of it is a “commit”. A history of all commits is kept and can be viewed at a later time.
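To make the idea of a diff concrete, the following sketch uses Python’s standard `difflib` module to produce the kind of unified diff a reviewer reads when looking at a proposed change. It is an illustration only and does not show how Git or GitHub compute diffs internally; the function name `proposed_diff`, the file name `example.txt` and the sample contents are all hypothetical.

```python
import difflib


def proposed_diff(base, proposed, path="example.txt"):
    """Return a unified diff between the current and the proposed text."""
    lines = difflib.unified_diff(
        base.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{path}",   # label for the existing version
        tofile=f"b/{path}",     # label for the proposed version
    )
    return "".join(lines)


# Hypothetical file contents before and after the proposed change.
base = "def greet():\n    print('helo')\n"
proposed = "def greet():\n    print('hello')\n"

diff_text = proposed_diff(base, proposed)
print(diff_text)
```

Lines starting with `-` are removed by the proposal and lines starting with `+` are added; unchanged context lines carry a leading space.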
In addition, GitHub supports the following formats and features:
- Documentation, including automatically rendered README files in a variety of Markdown-like file formats (see README § On GitHub)
- GitHub Actions, which allows building continuous integration and continuous deployment pipelines for testing, releasing and deploying software without the use of third-party websites/platforms
- Graphs: pulse, contributors, commits, code frequency, punch card, network, members
- Integrations Directory
- Email notifications
- Option to subscribe someone to notifications by @mentioning them
- Nested task-lists within files
- Visualization of geospatial data
- 3D render files that can be previewed using a new integrated STL file viewer that displays the files on a “3D canvas”. The viewer is powered by WebGL and Three.js.
- Photoshop’s native PSD format can be previewed and compared to previous versions of the same file.
- PDF document viewer
- Security Alerts of known Common Vulnerabilities and Exposures in different packages
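The @mention feature in the list above can be illustrated with a small parser that finds the handles named in a comment. This is only a sketch: the handle pattern (alphanumeric runs joined by single hyphens) and the function name `mentioned_users` are assumptions for illustration, not GitHub’s actual matching rules.

```python
import re

# Assumed handle shape: alphanumeric runs joined by single hyphens.
# The lookbehind skips e-mail addresses such as dev@example.com.
MENTION = re.compile(r"(?<![\w@])@([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)")


def mentioned_users(comment):
    """Return mentioned handles in first-seen order, without duplicates."""
    return list(dict.fromkeys(MENTION.findall(comment)))


# Hypothetical comment text.
comment = "Thanks @octocat and @hub-bot! cc @octocat, mail dev@example.com"
print(mentioned_users(comment))
```

A notification system built on this would then subscribe each returned handle to the thread.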
GitHub’s Terms of Service do not require public software projects hosted on GitHub to meet the Open Source Definition. The terms of service state, “By setting your repositories to be viewed publicly, you agree to allow others to view and fork your repositories.”
GitHub Enterprise is a self-managed version of GitHub.com with similar functionality. It can be run on an organization’s own hardware or on a cloud provider, and it has been available since November 2011. In November 2020, source code for GitHub Enterprise Server was leaked online in apparent protest against the DMCA takedown of youtube-dl. According to GitHub, the source code came from GitHub accidentally sharing the code with Enterprise customers themselves, not from an attack on GitHub servers.
All GitHub Pages content is stored in a Git repository, either as files served to visitors verbatim or in Markdown format. GitHub is integrated with the Jekyll static web site and blog generator and with GitHub continuous integration pipelines. Each time the content source is updated, Jekyll regenerates the website and automatically serves it via GitHub Pages infrastructure.
As with the rest of GitHub, GitHub Pages includes both free and paid tiers of service, instead of being supported by web advertising. Web sites generated through this service are hosted either as subdomains of the github.io domain, or as custom domains bought through a third-party domain name registrar. When a custom domain is set on a GitHub Pages repository, a Let’s Encrypt certificate for it is generated automatically. Once the certificate has been generated, Enforce HTTPS can be set for the repository’s website to transparently redirect all HTTP requests to HTTPS.
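The effect of an HTTP-to-HTTPS redirect can be sketched in a few lines. This shows only the general technique, not how GitHub Pages implements it: the function name `enforce_https`, the use of status code 301 and the example host are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def enforce_https(url):
    """Return (status, location) for a plain-HTTP URL, else None.

    The 301 (permanent redirect) status is an assumption for this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Keep host, path and query; change only the scheme.
        return 301, urlunsplit(parts._replace(scheme="https"))
    return None  # already HTTPS (or another scheme): nothing to do


# Hypothetical site address.
print(enforce_https("http://example.github.io/docs/?q=1"))
print(enforce_https("https://example.github.io/docs/"))
```

Because only the scheme changes, visitors land on the same page they requested, which is what makes the redirect transparent.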
Tom Preston-Werner presented the then-new Gist feature at a punk rock Ruby conference in 2008. Gist builds on the traditional simple concept of a pastebin by adding version control for code snippets, easy forking, and TLS encryption for private pastes. Because each “gist” has its own Git repository, multiple code snippets can be contained in a single paste and they can be pushed and pulled using Git. Further, forked code can be pushed back to the original author in the form of a patch, so gists (pastes) can become more like mini-projects.
GitHub launched a new program called the GitHub Student Developer Pack to give students free access to popular development tools and services. GitHub partnered with Bitnami, Crowdflower, DigitalOcean, DNSimple, HackHands, Namecheap, Orchestrate, Screenhero, SendGrid, Stripe, Travis CI and Unreal Engine to launch the program.
In 2016, GitHub announced the launch of the GitHub Campus Experts program to train and encourage students to grow technology communities at their universities. The Campus Experts program is open to university students aged 18 and older across the world. GitHub Campus Experts are one of the primary ways that GitHub funds student-oriented events and communities. Campus Experts are given access to training, funding, and additional resources to run events and grow their communities. To become a Campus Expert, applicants must complete an online training course consisting of multiple modules designed to grow community leadership skills.
GitHub Marketplace service
GitHub also provides some software-as-a-service integrations for adding extra features to projects. Those services include:
- Waffle.io: Project management for software teams. It automatically shows pull requests, automated builds, reviews, and deployments across all of a team’s repositories in GitHub.
- GitLocalize: Developed for teams that are translating their content. GitLocalize automatically syncs with a repository so that the workflow stays on GitHub, and it keeps the team updated on what needs to be translated.
GitHub Sponsors allows users to make monthly donations to projects hosted on GitHub. The public beta was announced on May 23, 2019 and currently the project accepts wait list registrations. The Verge said that GitHub Sponsors “works exactly like Patreon” because “developers can offer various funding tiers that come with different perks, and they’ll receive recurring payments from supporters who want to access them and encourage their work” except with “zero fees to use the program”. GitHub also offers incentives for early adopters during the first year: it pledges to cover payment processing costs, and match sponsorship payments up to $5,000 per developer. Users can still use other similar services like Patreon and Open Collective and link to their own websites.
GitHub Archive Program
In July 2020, GitHub stored a February archive of the site in an abandoned mountain mine in Svalbard, Norway, part of the Arctic World Archive and not far from the Svalbard Global Seed Vault. The archive contained the code of all active public repositories, as well as that of dormant but significant public repositories. The 21 TB of data was stored on piqlFilm archival film reels as QR codes, and is expected to last 500–1,000 years.
The GitHub Archive Program is also working with partners on Project Silica, in an attempt to store all public repositories for 10,000 years. It aims to write archives into the molecular structure of quartz glass platters, using a high-precision laser that pulses a quadrillion (1,000,000,000,000,000) times per second.
- Atom, a free and open-source text and source code editor
Some prominent open source organizations and projects use GitHub as a primary place for collaboration, including:
- Apertium (migrated from SourceForge)
- The Apache Software Foundation (finished migration in February 2019)
- Bootstrap (front-end framework)
- National Security Agency
- Swift (by Apple)
- uBlock Origin
- U.S. Immigration and Customs Enforcement
- HM Government