Apache Maven build automation tool

Developer(s): Apache Software Foundation
Initial release: 13 July 2004; 16 years ago
Stable release: 3.6.3 / 25 November 2019; 15 months ago[1]
Repository: Maven Repository
Written in: Java
Type: Build tool
License: Apache License 2.0
Website: maven.apache.org

Maven is a build automation tool used primarily for Java projects. Maven can also be used to build and manage projects written in C#, Ruby, Scala, and other languages. The Maven project is hosted by the Apache Software Foundation, where it was formerly part of the Jakarta Project.

Maven addresses two aspects of building software: how software is built, and its dependencies. Unlike earlier tools such as Apache Ant, it uses conventions for the build procedure, and only exceptions need to be written down. An XML file describes the software project being built, its dependencies on other external modules and components, the build order, directories, and required plug-ins. It comes with predefined targets for performing certain well-defined tasks such as compiling code and packaging it. Maven dynamically downloads Java libraries and Maven plug-ins from one or more repositories, such as the Maven 2 Central Repository, and stores them in a local cache.[2] This local cache of downloaded artifacts can also be updated with artifacts created by local projects. Public repositories can also be updated.

Maven is built using a plugin-based architecture that allows it to make use of any application controllable through standard input. A C/C++ native plugin is maintained for Maven 2.[3]

Alternative build tools such as Gradle and sbt do not rely on XML but keep the key concepts Maven introduced. Apache Ivy was developed as a dedicated dependency manager that also supports Maven repositories.[4]

Apache Maven has support for reproducible builds.[5][6]

History

The number of artifacts on Maven’s central repository has grown rapidly

Maven, created by Jason van Zyl, began as a sub-project of Apache Turbine in 2002. In 2003, it was voted on and accepted as a top-level Apache Software Foundation project. In July 2004, Maven reached its first critical milestone with the release of v1.0. Maven 2 was declared v2.0 in October 2005 after about six months of beta cycles. Maven 3.0 was released in October 2010 and is mostly backward compatible with Maven 2.

Information about Maven 3.0 began trickling out in 2008. After eight alpha releases, the first beta version of Maven 3.0 was released in April 2010. Maven 3.0 reworked the core Project Builder infrastructure, decoupling the POM’s file-based representation from its in-memory object representation. This opened the possibility for Maven 3.0 add-ons to use non-XML project definition files. Suggested languages included Ruby (already in private prototype by Jason van Zyl), YAML, and Groovy.

Special attention was given to ensuring backward compatibility of Maven 3 with Maven 2. For most projects, upgrading to Maven 3 does not require any adjustments to their project structure. The first beta of Maven 3 introduced a parallel build feature that uses a configurable number of cores on a multi-core machine and is especially suited to large multi-module projects.

Syntax

A directory structure for a Java project auto-generated by Maven

Maven projects are configured using a Project Object Model (POM), which is stored in a pom.xml file. An example file looks like this:

<project>
  <!-- model version is always 4.0.0 for Maven 2.x and 3.x POMs -->
  <modelVersion>4.0.0</modelVersion>
  <!-- project coordinates, i.e. a group of values which uniquely identify this project -->
  <groupId>com.mycompany.app</groupId>
  <artifactId>my-app</artifactId>
  <version>1.0</version>
  <!-- library dependencies -->
  <dependencies>
    <dependency>
      <!-- coordinates of the required library -->
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <!-- this dependency is only used for running and compiling tests -->
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

This POM only defines a unique identifier for the project (its coordinates) and its dependency on the JUnit framework. However, that is already enough for building the project and running the unit tests associated with it. Maven accomplishes this by embracing convention over configuration: it provides default values for the project’s configuration.

The directory structure of a normal idiomatic Maven project has the following directory entries:

Directory name | Purpose
project home | Contains the pom.xml and all subdirectories.
src/main/java | Contains the deliverable Java source code for the project.
src/main/resources | Contains the deliverable resources for the project, such as property files.
src/test/java | Contains the testing Java source code (JUnit or TestNG test cases, for example) for the project.
src/test/resources | Contains resources necessary for testing.

The command mvn package will compile all the Java files, run any tests, and package the deliverable code and resources into target/my-app-1.0.jar (assuming the artifactId is my-app and the version is 1.0).
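As a rough illustration of these conventions, the sketch below maps a project’s coordinates to the standard source directories and to the default output path target/<artifactId>-<version>.<packaging>. It only models the defaults described above; the helper name `default_layout` is hypothetical and is not part of Maven.

```python
from pathlib import PurePosixPath

# Standard directories of an idiomatic Maven project, relative to the project home.
STANDARD_DIRS = {
    "main_sources": "src/main/java",
    "main_resources": "src/main/resources",
    "test_sources": "src/test/java",
    "test_resources": "src/test/resources",
}


def default_layout(artifact_id: str, version: str, packaging: str = "jar") -> dict:
    """Return the conventional directories and the default package path.

    Hypothetical helper for illustration; Maven itself derives these values
    from the Super POM defaults.
    """
    layout = {name: PurePosixPath(path) for name, path in STANDARD_DIRS.items()}
    # mvn package writes the artifact to target/<artifactId>-<version>.<packaging>
    layout["package"] = PurePosixPath("target") / f"{artifact_id}-{version}.{packaging}"
    return layout


layout = default_layout("my-app", "1.0")
print(layout["package"])  # target/my-app-1.0.jar
```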

Using Maven, the user provides only configuration for the project, while the configurable plug-ins do the actual work of compiling the project, cleaning target directories, running unit tests, generating API documentation and so on. In general, users should not have to write plugins themselves. Contrast this with Ant and make, in which one writes imperative procedures for doing the aforementioned tasks.

Design

Project Object Model

A Project Object Model (POM) provides all the configuration for a single project. General configuration covers the project’s name, its owner and its dependencies on other projects. One can also configure individual phases of the build process, which are implemented as plugins. For example, one can configure the compiler-plugin to use Java version 1.5 for compilation, or specify packaging the project even if some unit tests fail.

Larger projects should be divided into several modules, or sub-projects, each with its own POM. One can then write a root POM through which one can compile all the modules with a single command. POMs can also inherit configuration from other POMs. All POMs inherit from the Super POM[7] by default. The Super POM provides default configuration, such as default source directories, default plugins, and so on.
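POM inheritance can be pictured as a layered lookup: a value set in the child wins over one set in its parent, which in turn wins over the Super POM default. The sketch below models only that lookup order with Python’s `collections.ChainMap`. Real Maven also merges lists such as dependencies and plugins element by element and interpolates properties, which this simplification ignores; the flat key names and the `effective_config` helper are hypothetical.

```python
from collections import ChainMap

# A few Super POM defaults, flattened for illustration (the real Super POM defines many more).
SUPER_POM_DEFAULTS = {
    "sourceDirectory": "src/main/java",
    "testSourceDirectory": "src/test/java",
    "outputDirectory": "target/classes",
    "directory": "target",
}


def effective_config(child: dict, *parents: dict) -> dict:
    """Merge a child POM over its parents (nearest parent first) and the Super POM."""
    # ChainMap searches its mappings left to right, so the child wins first,
    # then the nearest parent, and the Super POM defaults are consulted last.
    return dict(ChainMap(child, *parents, SUPER_POM_DEFAULTS))


parent = {"groupId": "com.mycompany.app", "version": "1.0"}
child = {"artifactId": "my-module", "outputDirectory": "build/classes"}
config = effective_config(child, parent)
print(config["groupId"])          # com.mycompany.app (inherited from the parent)
print(config["outputDirectory"])  # build/classes (overridden by the child)
print(config["sourceDirectory"])  # src/main/java (Super POM default)
```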

Plug-ins

Most of Maven’s functionality is in plug-ins. A plugin provides a set of goals that can be executed using the command mvn [plugin-name]:[goal-name]. For example, a Java project can be compiled with the compiler plugin’s compile goal[8] by running mvn compiler:compile.

There are Maven plugins for building, testing, source control management, running a web server, generating Eclipse project files, and much more.[9] Plugins are introduced and configured in the <plugins> section of a pom.xml file. Some basic plugins are included in every project by default, and they have sensible default settings.

However, it would be cumbersome if the archetypal build sequence of building, testing and packaging a software project required running each respective goal manually:

  • mvn compiler:compile
  • mvn surefire:test
  • mvn jar:jar

Maven’s lifecycle concept handles this issue.

Plugins are the primary way to extend Maven. Developing a Maven plugin can be done by extending the org.apache.maven.plugin.AbstractMojo class. Example code and explanation for a Maven plugin to create a cloud-based virtual machine running an application server is given in the article Automate development and management of cloud virtual machines.[10]

Build lifecycles

The build lifecycle is a list of named phases that can be used to give order to goal execution. One of Maven’s standard lifecycles is the default lifecycle, which includes, among others, the following phases in this order:[11]

  • validate
  • generate-sources
  • process-sources
  • generate-resources
  • process-resources
  • compile
  • process-test-sources
  • process-test-resources
  • test-compile
  • test
  • package
  • install
  • deploy

Goals provided by plugins can be associated with different phases of the lifecycle. For example, by default, the goal “compiler:compile” is associated with the “compile” phase, while the goal “surefire:test” is associated with the “test” phase. When the mvn test command is executed, Maven runs all goals associated with each of the phases up to and including the “test” phase. In such a case, Maven runs the “resources:resources” goal associated with the “process-resources” phase, then “compiler:compile”, and so on until it finally runs the “surefire:test” goal.
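The phase-to-goal mechanism just described can be modelled as a walk over an ordered phase list. The sketch below is a simplification: it uses only the phases listed above and assumes the default bindings for a project with jar packaging. The names `DEFAULT_LIFECYCLE`, `JAR_BINDINGS` and `goals_for` are illustrative, not Maven APIs.

```python
# Phases of the default lifecycle as listed above (a subset of the full lifecycle).
DEFAULT_LIFECYCLE = [
    "validate", "generate-sources", "process-sources", "generate-resources",
    "process-resources", "compile", "process-test-sources",
    "process-test-resources", "test-compile", "test", "package",
    "install", "deploy",
]

# Default goal bindings for jar packaging (simplified; plugins can bind more goals).
JAR_BINDINGS = {
    "process-resources": ["resources:resources"],
    "compile": ["compiler:compile"],
    "process-test-resources": ["resources:testResources"],
    "test-compile": ["compiler:testCompile"],
    "test": ["surefire:test"],
    "package": ["jar:jar"],
    "install": ["install:install"],
    "deploy": ["deploy:deploy"],
}


def goals_for(target_phase: str, bindings: dict = JAR_BINDINGS) -> list:
    """Return the goals that `mvn <target_phase>` would run, in execution order."""
    if target_phase not in DEFAULT_LIFECYCLE:
        raise ValueError(f"unknown phase: {target_phase}")
    end = DEFAULT_LIFECYCLE.index(target_phase)
    goals = []
    # Every phase up to and including the target runs, together with its bound goals.
    for phase in DEFAULT_LIFECYCLE[: end + 1]:
        goals.extend(bindings.get(phase, []))
    return goals


print(goals_for("test"))
# ['resources:resources', 'compiler:compile', 'resources:testResources',
#  'compiler:testCompile', 'surefire:test']
```

Phases without a bound goal, such as validate, simply contribute nothing, which is why mvn validate runs no goals in this model.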

Maven also has standard phases for cleaning the project and for generating a project site. If cleaning were part of the default lifecycle, the project would be cleaned every time it was built. This is clearly undesirable, so cleaning has been given its own lifecycle.

Standard lifecycles enable users new to a project to build, test and install every Maven project by issuing the single command mvn install. By default, Maven packages the POM file into the JAR and WAR files it generates. Tools like diet4j[12] can use this information to recursively resolve and run Maven modules at run time without requiring an “uber”-jar that contains all project code.
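Inside a generated JAR, the POM is conventionally stored under META-INF/maven/<groupId>/<artifactId>/pom.xml. The sketch below builds a tiny JAR in memory with Python’s standard `zipfile` module and then finds the embedded POM the way a tool could; the helper names are hypothetical, and this is not diet4j’s actual lookup code.

```python
import io
import zipfile


def embedded_pom_path(group_id: str, artifact_id: str) -> str:
    """Path at which Maven's default packaging stores the POM inside a JAR."""
    return f"META-INF/maven/{group_id}/{artifact_id}/pom.xml"


def find_embedded_poms(jar_bytes: bytes) -> list:
    """List every pom.xml stored under META-INF/maven/ in a JAR given as bytes."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [
            name for name in jar.namelist()
            if name.startswith("META-INF/maven/") and name.endswith("/pom.xml")
        ]


# Build a minimal JAR in memory for demonstration (contents are placeholders).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("com/mycompany/app/App.class", b"")
    jar.writestr(embedded_pom_path("com.mycompany.app", "my-app"), "<project/>")

print(find_embedded_poms(buffer.getvalue()))
# ['META-INF/maven/com.mycompany.app/my-app/pom.xml']
```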

Dependencies

A central feature in Maven is dependency management. Maven’s dependency-handling mechanism is organized around a coordinate system identifying individual artifacts such as software libraries or modules. The POM example above references the JUnit coordinates as a direct dependency of the project. A project that needs, say, the Hibernate library simply has to declare Hibernate’s project coordinates in its POM. Maven will automatically download the dependency and the dependencies that Hibernate itself needs (called transitive dependencies) and store them in the user’s local repository. The Maven 2 Central Repository[2] is used by default to search for libraries, but one can configure the repositories to be used (e.g., company-private repositories) within the POM.
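Transitive resolution can be modelled as a breadth-first walk over the dependency graph. When several paths lead to different versions of the same artifact, Maven’s documented mediation rule picks the “nearest definition” (the version closest to the project), and at equal depth the first declaration wins. The sketch below assumes a toy in-memory graph keyed by groupId:artifactId:version strings with made-up coordinates; it ignores scopes, exclusions, optional dependencies and version ranges.

```python
from collections import deque

# Toy dependency graph: "groupId:artifactId:version" -> direct dependencies.
# All coordinates are hypothetical examples.
GRAPH = {
    "com.example:app:1.0": ["com.example:lib-a:1.0", "com.example:lib-b:1.0"],
    "com.example:lib-a:1.0": ["com.example:util:2.0"],
    "com.example:lib-b:1.0": ["com.example:lib-c:1.0"],
    "com.example:lib-c:1.0": ["com.example:util:1.0"],
    "com.example:util:1.0": [],
    "com.example:util:2.0": [],
}


def resolve(root: str, graph: dict) -> dict:
    """Return {groupId:artifactId: version} using nearest-wins mediation."""
    chosen = {}
    queue = deque(graph[root])  # breadth-first: nearer dependencies are seen first
    while queue:
        coordinate = queue.popleft()
        group, artifact, version = coordinate.split(":")
        key = f"{group}:{artifact}"
        if key in chosen:
            continue  # a nearer (or earlier-declared) version already won
        chosen[key] = version
        queue.extend(graph[coordinate])
    return chosen


print(resolve("com.example:app:1.0", GRAPH)["com.example:util"])  # 2.0
```

Here util 2.0 is reached at depth two through lib-a, so it beats util 1.0, which is only reached at depth three through lib-b and lib-c.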

The fundamental difference between Maven and Ant is that Maven’s design regards all projects as having a certain structure and a set of supported task workflows (e.g., getting resources from source control, compiling the project, unit testing, etc.). While most software projects in effect support these operations and actually do have a well-defined structure, Maven requires that this structure and the operation implementation details be defined in the POM file. Thus, Maven relies on a convention on how to define projects and on the list of workflows that are generally supported in all projects.[13]

Search engines such as The Central Repository Search Engine[14] can be used to find the coordinates of open-source libraries and frameworks.

Projects developed on a single machine can depend on each other through the local repository. The local repository is a simple folder structure that acts both as a cache for downloaded dependencies and as a centralized storage place for locally built artifacts. The Maven command mvn install builds a project and places its binaries in the local repository. Then other projects can utilize this project by specifying its coordinates in their POMs.
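The folder structure of the local repository is derived from an artifact’s coordinates: the dots in the groupId become directories, followed by the artifactId, the version, and a file named <artifactId>-<version>.<extension>. By default the local repository lives in .m2/repository under the user’s home directory. The function below is a hypothetical helper that reproduces only this default layout; classifiers, snapshot timestamps and a custom localRepository setting are ignored.

```python
from pathlib import Path
from typing import Optional


def local_repo_path(group_id: str, artifact_id: str, version: str,
                    extension: str = "jar", repo_root: Optional[Path] = None) -> Path:
    """Compute where an artifact is stored in the default local repository layout."""
    if repo_root is None:
        repo_root = Path.home() / ".m2" / "repository"  # Maven's default location
    group_path = Path(*group_id.split("."))  # com.mycompany.app -> com/mycompany/app
    file_name = f"{artifact_id}-{version}.{extension}"
    return repo_root / group_path / artifact_id / version / file_name


path = local_repo_path("com.mycompany.app", "my-app", "1.0", repo_root=Path("/tmp/repo"))
print(path.as_posix())  # /tmp/repo/com/mycompany/app/my-app/1.0/my-app-1.0.jar
```

After mvn install, another project on the same machine that declares these coordinates would find the JAR at this location.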

Interoperability

Add-ons to several popular integrated development environments (IDEs) targeting the Java programming language integrate Maven with the IDE’s build mechanism and source editing tools, allowing Maven to compile projects from within the IDE and to set the classpath for code completion, compiler-error highlighting, and so on. Examples of popular IDEs supporting development with Maven include:

  • Eclipse
  • IntelliJ IDEA
  • NetBeans

These add-ons also provide the ability to edit the POM or use the POM to determine a project’s complete set of dependencies directly within the IDE.

Some built-in features of IDEs are forfeited when the IDE no longer performs compilation. For example, Eclipse’s JDT has the ability to recompile a single Java source file after it has been edited. Many IDEs work with a flat set of projects instead of the hierarchy of folders preferred by Maven. This complicates the use of SCM systems in IDEs when using Maven.[15][16][17]

References

  1. “Apache Projects Releases”. projects.apache.org.
  2. “Index of /maven2/”. Archived from the original on 2018-09-17. Retrieved 2009-04-15.
  3. Laugstol, Trygve. “MojoHaus Native Maven Plugin”.
  4. “IBiblio Resolver | Apache Ivy™”.
  5. “Reproducible/Verifiable Builds – Apache Maven – Apache Software Foundation”. cwiki.apache.org.
  6. “Reproducible Builds in Java – DZone Java”. dzone.com.
  7. Super POM.
  8. Punzalan, Edwin. “Apache Maven Compiler Plugin – Introduction”.
  9. Porter, Brett; van Zyl, Jason; Lundberg, Dennis; Lamy, Olivier; Margulies, Benson; Marbaise, Karl-Heinz. “Maven – Available Plugins”.
  10. Amies, Alex; Zou P X; Wang Yi S (29 Oct 2011). “Automate development and management of cloud virtual machines”. IBM developerWorks. IBM.
  11. Porter, Brett. “Maven – Introduction to the Build Lifecycle”.
  12. “diet4j – put Java JARs on a diet, and load maven modules as needed”.
  13. “Maven: The Complete Reference”. Sonatype. Archived from the original on 21 April 2013. Retrieved 11 April 2013.
  14. The Central Repository Search Engine.
  15. “maven.apache.org/eclipse-plugin.html”. Archived from the original on May 7, 2015.
  16. “IntelliJ IDEA :: Features”.
  17. “MavenBestPractices – NetBeans Wiki”.

” (WP)

Sources:

Fair Use Sources:

Package manager – Package management system – Software package (installation)

A package manager or package-management system is a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer’s operating system in a consistent manner.[1]

A package manager deals with packages, distributions of software and data in archive files. Packages contain metadata, such as the software’s name, a description of its purpose, its version number, vendor, checksum (preferably a cryptographic hash function), and a list of dependencies necessary for the software to run properly. Upon installation, metadata is stored in a local package database. Package managers typically maintain a database of software dependencies and version information to prevent software mismatches and missing prerequisites. They work closely with software repositories, binary repository managers, and app stores.

Package managers are designed to eliminate the need for manual installs and updates. This can be particularly useful for large enterprises whose operating systems typically consist of hundreds or even tens of thousands of distinct software packages.[2]

Functions

Illustration of a package manager being used to download new software. Manual actions can include accepting a license agreement or selecting some package-specific configuration options.

A software package is an archive file containing a computer program as well as necessary metadata for its deployment. The computer program can be in source code that has to be compiled and built first.[3] Package metadata include package description, package version, and dependencies (other packages that need to be installed beforehand).

Package managers are charged with the task of finding, installing, maintaining or uninstalling software packages upon the user’s command. Typical functions of a package management system include:

  • Working with file archivers to extract package archives
  • Ensuring the integrity and authenticity of the package by verifying its checksums and digital certificates, respectively
  • Looking up, downloading, installing, or updating existing software from a software repository or app store
  • Grouping packages by function to reduce user confusion
  • Managing dependencies to ensure a package is installed with all packages it requires, thus avoiding “dependency hell”
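Two of these functions, integrity checking and dependency checking, reduce to simple comparisons against the package metadata. The sketch below verifies a SHA-256 checksum with Python’s `hashlib` and lists unmet dependencies; the metadata record, its field names and the package names are hypothetical, and authenticity checks (signature verification) are not shown.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_package(archive: bytes, metadata: dict) -> bool:
    """Integrity check: does the archive match the checksum recorded in its metadata?"""
    return sha256_hex(archive) == metadata["sha256"]


def missing_dependencies(metadata: dict, installed: set) -> list:
    """Dependencies listed in the metadata that are not installed yet."""
    return [dep for dep in metadata.get("depends", []) if dep not in installed]


# Hypothetical package archive and metadata record, for illustration only.
archive = b"example package contents"
metadata = {
    "name": "hello",
    "version": "2.10",
    "sha256": sha256_hex(archive),
    "depends": ["libc6"],
}

print(verify_package(archive, metadata))               # True
print(verify_package(archive + b"tampered", metadata))  # False
print(missing_dependencies(metadata, {"coreutils"}))    # ['libc6']
```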

Challenges with shared libraries

Computer systems that rely on dynamic library linking, instead of static library linking, share executable libraries of machine instructions across packages and applications. In these systems, complex relationships between different packages requiring different versions of libraries result in a challenge colloquially known as “dependency hell”. On Microsoft Windows systems, this is also called “DLL hell” when working with dynamically linked libraries. Good package management is vital on these systems.[4] The Framework system from OPENSTEP was an attempt at solving this issue by allowing multiple versions of libraries to be installed simultaneously and letting software packages specify which version they were linked against.

Front-ends for locally compiled packages

System administrators may install and maintain software using tools other than package management software. For example, a local administrator may download unpackaged source code, compile it, and install it. This may cause the state of the local system to fall out of synchronization with the state of the package manager’s database. The local administrator will be required to take additional measures, such as manually managing some dependencies or integrating the changes into the package manager.

There are tools available to ensure that locally compiled packages are integrated with the package management. For distributions based on .deb and .rpm files as well as Slackware Linux, there is CheckInstall, and for recipe-based systems such as Gentoo Linux and hybrid systems such as Arch Linux, it is possible to write a recipe first, which then ensures that the package fits into the local package database.[citation needed]

Maintenance of configuration

Particularly troublesome with software upgrades are upgrades of configuration files. Since package managers, at least on Unix systems, originated as extensions of file archiving utilities, they can usually only either overwrite or retain configuration files, rather than applying rules to them. There are exceptions to this that usually apply to kernel configuration (which, if broken, will render the computer unusable after a restart). Problems can be caused if the format of configuration files changes; for instance, if the old configuration file does not explicitly disable new options that should be disabled. Some package managers, such as Debian’s dpkg, allow configuration during installation. In other situations, it is desirable to install packages with the default configuration and then overwrite this configuration, for instance, in headless installations to a large number of computers. This kind of pre-configured installation is also supported by dpkg.
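The overwrite-or-retain decision can be made with a three-way comparison of checksums: the file as the old package shipped it, the file currently on disk, and the file in the new package. The sketch below is a simplified model of that idea, loosely in the spirit of how dpkg treats configuration files; it is not dpkg’s actual algorithm, and the action names it returns are hypothetical. A real package manager stores the shipped checksum in its database rather than keeping the old file.

```python
import hashlib


def digest(text: str) -> str:
    """Checksum of a configuration file's contents."""
    return hashlib.sha256(text.encode()).hexdigest()


def conffile_action(shipped_old: str, on_disk: str, shipped_new: str) -> str:
    """Decide what to do with a configuration file during an upgrade."""
    old, current, new = digest(shipped_old), digest(on_disk), digest(shipped_new)
    if new == old or current == new:
        return "keep"         # the package did not change it, or disk already matches
    if current == old:
        return "install_new"  # the admin never edited it, so replacing it is safe
    return "ask"              # both sides changed: prompt, or keep a copy of the new file


print(conffile_action("port=80\n", "port=80\n", "port=80\nlog=on\n"))    # install_new
print(conffile_action("port=80\n", "port=8080\n", "port=80\n"))           # keep
print(conffile_action("port=80\n", "port=8080\n", "port=80\nlog=on\n"))  # ask
```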

Repositories

To give users more control over the kinds of software that they are allowing to be installed on their system (and sometimes due to legal or convenience reasons on the distributors’ side), software is often downloaded from a number of software repositories.[5]

Upgrade suppression

When a user interacts with the package management software to bring about an upgrade, it is customary to present the user with the list of actions to be executed (usually the list of packages to be upgraded, and possibly giving the old and new version numbers), and allow the user to either accept the upgrade in bulk, or select individual packages for upgrades. Many package managers can be configured to never upgrade certain packages, or to upgrade them only when critical vulnerabilities or instabilities are found in the previous version, as defined by the packager of the software. This process is sometimes called version pinning.

For instance:

  • yum supports this with the syntax exclude=openoffice*[6]
  • pacman with IgnorePkg = openoffice[7] (to suppress upgrading openoffice in both cases)
  • dpkg and dselect support this partially through the hold flag in package selections
  • APT extends the hold flag through the complex “pinning” mechanism[8] (Users can also blacklist a package[9])
  • aptitude has “hold” and “forbid” flags
  • portage supports this through the package.mask configuration file
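Exclusion rules such as yum’s exclude=openoffice* are shell-style glob patterns matched against package names. The sketch below filters a hypothetical list of available upgrades through such patterns with Python’s `fnmatch.fnmatchcase`; the data structures are illustrative and do not reflect any package manager’s internal API.

```python
from fnmatch import fnmatchcase


def filter_upgrades(upgrades: dict, excludes: list) -> dict:
    """Drop upgrades whose package name matches any exclude/hold pattern.

    `upgrades` maps a package name to (installed version, candidate version).
    """
    return {
        name: versions
        for name, versions in upgrades.items()
        if not any(fnmatchcase(name, pattern) for pattern in excludes)
    }


# Hypothetical upgrade candidates.
available = {
    "openoffice-writer": ("3.2", "3.3"),
    "openoffice-calc": ("3.2", "3.3"),
    "bash": ("5.0", "5.1"),
}
print(filter_upgrades(available, ["openoffice*"]))  # {'bash': ('5.0', '5.1')}
```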

Cascading package removal

Some more advanced package managers offer “cascading package removal”,[7] in which all packages that depend on the target package, and all packages that only the target package depends on, are also removed.
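Cascading removal can be computed from the dependency graph in two passes: first collect the target and every package that directly or transitively depends on it, then repeatedly collect dependencies that no package outside the removal set still needs. The sketch below uses a toy graph with hypothetical package names; the optional `explicit` set models packages the user installed deliberately, which are never removed as orphans. Real package managers track this “explicitly installed” flag in their databases.

```python
def cascade_removal(target: str, depends: dict, explicit: frozenset = frozenset()) -> set:
    """Return every package removed when `target` is removed with cascading."""
    # Reverse edges: package -> packages that depend on it.
    dependents = {pkg: set() for pkg in depends}
    for pkg, deps in depends.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Pass 1: the target plus everything that transitively depends on it.
    removal = set()
    stack = [target]
    while stack:
        pkg = stack.pop()
        if pkg not in removal:
            removal.add(pkg)
            stack.extend(dependents.get(pkg, ()))

    # Pass 2: dependencies whose every user is being removed become orphans.
    changed = True
    while changed:
        changed = False
        for pkg in list(removal):
            for dep in depends.get(pkg, ()):
                if dep in removal or dep in explicit:
                    continue
                if dependents[dep] <= removal:
                    removal.add(dep)
                    changed = True
    return removal


DEPENDS = {  # hypothetical packages: each maps to the packages it depends on
    "editor": {"libgui", "libspell"},
    "editor-plugins": {"editor"},
    "browser": {"libgui", "libc"},
    "libgui": {"libc"},
    "libspell": {"libc"},
    "libc": set(),
}
print(sorted(cascade_removal("editor", DEPENDS)))  # ['editor', 'editor-plugins', 'libspell']
```

Removing editor also removes editor-plugins (which depends on it) and libspell (which only editor needed), while libgui survives because browser still uses it.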

Comparison of commands

Although the commands are specific for every particular package manager, they are to a large extent translatable, as most package managers offer similar functions.

Action | zypper[10] | pacman | apt | dnf (yum) | portage
install package | zypper in PKG | pacman -S PACKAGE | apt install PACKAGE | dnf install PACKAGE | emerge PACKAGE
remove package | zypper rm -RU PKG | pacman -R PACKAGE | apt remove PACKAGE | dnf remove --nodeps PACKAGE | emerge -C PACKAGE or emerge --unmerge PACKAGE
remove package+orphans | zypper rm -u --force-resolution PKG | pacman -Rs PACKAGE | apt autoremove PACKAGE | dnf remove PACKAGE | emerge -c PACKAGE or emerge --depclean PACKAGE
update software database | zypper ref | pacman -Sy | apt update | dnf check-update | emerge --sync
show updatable packages | zypper lu | pacman -Qu | apt list --upgradable | dnf check-update | emerge -avtuDN --with-bdeps=y @world or emerge --update --pretend @world
delete orphans+config | zypper rm -u | pacman -Rsn $(pacman -Qdtq) | apt autoremove | dnf erase PKG | emerge --depclean
show orphans | zypper pa --orphaned --unneeded | pacman -Qdt |  | package-cleanup --quiet --leaves --exclude-bin | emerge -caD or emerge --depclean --pretend
update all | zypper up | pacman -Syu | apt upgrade | dnf update | emerge --update --deep --with-bdeps=y @world

The Arch Linux Pacman/Rosetta wiki offers an extensive overview.[11]

Prevalence

Package managers like dpkg have existed since as early as 1994.[12]

Linux distributions oriented to binary packages rely heavily on package management systems as their primary means of managing and maintaining software. Mobile operating systems such as Android (Linux-based), iOS (Unix-like), and Windows Phone rely almost exclusively on their respective vendors’ app stores and thus use their own dedicated package management systems.

Comparison with installers

A package manager is often called an “install manager”, which can lead to confusion between package managers and installers. The differences include:

Criterion | Package manager | Installer
Shipped with | Usually, the operating system | Each computer program
Location of installation information | One central installation database | Entirely at the discretion of the installer. It could be a file within the app’s folder, or among the operating system’s files and folders. At best, installers may register themselves with an uninstallers list without exposing installation information.
Scope of maintenance | Potentially all packages on the system | Only the product with which it was bundled
Developed by | One package manager vendor | Multiple installer vendors
Package format | A handful of well-known formats | There could be as many formats as the number of apps
Package format compatibility | Can be consumed as long as the package manager supports it: either newer versions of the package manager keep supporting it, or the user does not upgrade the package manager. | The installer is always compatible with its archive format, if it uses any. However, installers, like all computer programs, may be affected by software rot.

Comparison with build automation utility

Most software configuration management systems treat building software and deploying software as separate, independent steps. A build automation utility typically takes human-readable source code files already on a computer and automates the process of converting them into a binary executable package on the same computer. Later, a package manager, typically running on some other computer, downloads those prebuilt binary executable packages over the internet and installs them.

However, both kinds of tools have many commonalities:

  • The topological sorting of a dependency graph, used in a package manager to handle dependencies between binary components, is also used in a build manager to handle dependencies between source components.
  • Many makefiles support not only building executables but also installing them with make install.
  • Every package manager for a source-based distribution (Portage, Sorcery, Homebrew, etc.) supports converting human-readable source code to binary executables and installing them.
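The technique shared by both kinds of tools in the first point is a topological sort: each component is processed only after the components it depends on. Python’s standard `graphlib.TopologicalSorter` (Python 3.9 and later) computes such an order, whether the nodes are binary packages or source modules; the package names and dependencies below are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key depends on the packages in its set (hypothetical example data).
DEPENDS = {
    "app": {"libgui", "libnet"},
    "libgui": {"libc"},
    "libnet": {"libssl", "libc"},
    "libssl": {"libc"},
    "libc": set(),
}

# static_order() yields every node after all of its predecessors (its dependencies).
order = list(TopologicalSorter(DEPENDS).static_order())
print(order)  # dependencies always come before the packages that need them
```

If the graph contains a cycle, `static_order()` raises `graphlib.CycleError`, which corresponds to a circular dependency that neither a build tool nor a package manager can order.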

A few tools, such as Maak and A-A-P, are designed to handle both building and deployment, and can be used as either a build automation utility or as a package manager or both.[13]

Common package managers and formats

Universal package manager

A universal package manager, also known as a binary repository manager, is a software tool designed to optimize the download and storage of binary files, artifacts and packages used and produced in the software development process.[14] These package managers aim to standardize the way enterprises treat all package types. They give users the ability to apply security and compliance metrics across all artifact types. Universal package managers have been referred to as being at the center of a DevOps toolchain.[15]

Package formats

Each package manager relies on the format and metadata of the packages it can manage. That is, package managers need groups of files to be bundled for the specific package manager along with appropriate metadata, such as dependencies. Often, a core set of utilities manages the basic installation from these packages and multiple package managers use these utilities to provide additional functionality.

For example, yum relies on rpm as a backend. Yum extends the functionality of the backend by adding features such as simple configuration for maintaining a network of systems. As another example, the Synaptic Package Manager provides a graphical user interface by using the Advanced Packaging Tool (apt) library, which, in turn, relies on dpkg for core functionality.

Alien is a program that converts between different Linux package formats, supporting conversion between Linux Standard Base (LSB) compliant .rpm packages, .deb, Stampede (.slp), Solaris (.pkg) and Slackware (.tgz, .txz, .tbz, .tlz) packages.

In mobile operating systems, Google Play consumes Android application package (APK) package format while Windows Store uses APPX and XAP formats. (Both Google Play and Windows Store have eponymous package managers.)

Free and open source software systems

By the nature of free and open source software, packages under similar and compatible licenses are available for use on a number of operating systems. These packages can be combined and distributed using configurable and internally complex packaging systems to handle many permutations of software and manage version-specific dependencies and conflicts. Some packaging systems of free and open source software are also themselves released as free and open source software. One typical difference between package management in proprietary operating systems, such as Mac OS X and Windows, and those in free and open source software, such as Linux, is that free and open source software systems permit third-party packages to also be installed and upgraded through the same mechanism, whereas the package managers of Mac OS X and Windows will only upgrade software provided by Apple and Microsoft, respectively (with the exception of some third-party drivers in Windows). The ability to continuously upgrade third-party software is typically enabled by adding the URL of the corresponding repository to the package manager’s configuration file.

Application-level package managers

Besides system-level package managers, there are add-on package managers for operating systems with limited capabilities and for programming languages whose developers need the latest libraries.

In contrast to system-level package managers, application-level package managers focus on a small part of the software system. They typically reside within a directory tree that is not maintained by the system-level package manager, such as c:\cygwin or /usr/local/fink. However, this might not be the case for the package managers that deal with programming libraries, leading to a possible conflict as both package managers may claim to “own” a file and might break upgrades.

Impact

Ian Murdock commented that package management is “the single biggest advancement Linux has brought to the industry”, that it blurs the boundaries between operating system and applications, and that it makes it “easier to push new innovations […] into the marketplace and […] evolve the OS”.[16]
