Installation

C++ library

Before version 0.6, gemmi was a header-only library. Some functions are still in headers. If you use only such functions, you only need to ensure that gemmi's include directory is in your include path when compiling your program. For example:

git clone https://github.com/project-gemmi/gemmi.git
c++ -Igemmi/include -O2 my_program.cpp

Otherwise, you need to either build the gemmi_cpp library or add (selected) files from src/ to your project.

If you use CMake, you may

  • use find_package for installed gemmi:

    find_package(gemmi 0.6.4 CONFIG REQUIRED)
    
  • or add gemmi as a git submodule and use add_subdirectory:

    add_subdirectory(gemmi EXCLUDE_FROM_ALL)
    
  • or use FetchContent:

    include(FetchContent)
    FetchContent_Declare(
      gemmi
      GIT_REPOSITORY https://github.com/project-gemmi/gemmi.git
      GIT_TAG        ...
    )
    FetchContent_GetProperties(gemmi)
    if (NOT gemmi_POPULATED)
      FetchContent_Populate(gemmi)
      add_subdirectory(${gemmi_SOURCE_DIR} ${gemmi_BINARY_DIR} EXCLUDE_FROM_ALL)
    endif()
    

Then, to find headers and link your target with the library, use:

target_link_libraries(example PRIVATE gemmi::gemmi_cpp)

If only headers are needed, do:

target_link_libraries(example PRIVATE gemmi::headers)

The gemmi::headers interface, which is also included in gemmi::gemmi_cpp, adds two things: the include directory and the compile feature cxx_std_11 (the minimum requirement for compilation).

Gemmi can be compiled with either zlib or zlib-ng. The only difference is that zlib-ng is faster. Here are the relevant cmake options:

  • FETCH_ZLIB_NG – download, build statically, and use zlib-ng.

  • USE_ZLIB_NG – find zlib-ng installed on the system.

  • INTERNAL_ZLIB – compile third_party/zlib (a subset of zlib distributed with gemmi).

  • None of the above – find zlib installed on the system; if not found, use third_party/zlib.

On Windows, when a program or library is linked with a zlib(-ng) DLL, it may require the DLL to be in the same directory. It is simpler to build zlib-ng statically or use -D FETCH_ZLIB_NG=ON.


Note on Unicode: if a file name is passed to Gemmi (through std::string) it is assumed to be in ASCII or UTF-8.

Python module

From PyPI

To install the gemmi module do:

pip install gemmi

We have binary wheels for several Python versions (for all supported CPython versions and one PyPy version), so the command usually downloads binaries. If a matching wheel is not available, the module is compiled from source – it takes several minutes and requires a C++ compiler.

Other binaries

If you use the CCP4 suite, you can find gemmi there.

If you use Anaconda Python, you can install the gemmi package from conda-forge:

conda install -c conda-forge gemmi

These distribution channels may have an older version of gemmi.
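
Since these channels can lag behind PyPI, it may be worth checking which version actually got installed. The following is a minimal sketch that uses only the Python standard library (importlib.metadata, available since Python 3.8). The helper name installed_version is made up for this example and is not part of gemmi. It reads the package metadata, so it assumes gemmi was installed in a way that records it (pip does; other channels might not):

```python
from importlib import metadata


def installed_version(package="gemmi"):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # No metadata found: the package is not installed in this
        # environment (or was installed without recording metadata).
        return None


version = installed_version()
if version is None:
    print("gemmi is not installed in this environment")
else:
    print("gemmi version:", version)
```

If the module itself is importable, its gemmi.__version__ attribute should report the same information.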

From git

The latest version can be installed directly from the repository. Either use:

pip install git+https://github.com/project-gemmi/gemmi.git

or clone the project (or download a zip file) and from the top-level directory do:

pip install .

On Windows, Python should automatically find an appropriate compiler (MSVC). If the compiler is not installed, pip shows a message with a download link.

Building with pip uses scikit-build-core and CMake underneath. You can pass options to CMake either through pip's --config-settings option (in recent pip versions only):

pip install . --config-settings="cmake.args=-DFETCH_ZLIB_NG=ON"

or using environment variables such as CMAKE_ARGS. See scikit-build-core docs for details.

If gemmi is already installed, uninstall the old version first (pip uninstall) or add the --upgrade option.

Alternatively, you can build a cloned project directly with CMake:

cmake -D USE_PYTHON=1 .
make -j4 py

Fortran and C bindings

The Fortran bindings are at an early stage and are not documented yet. They use the ISO_C_BINDING module introduced in Fortran 2003 and shroud. See the fortran/ directory to know what to expect. This directory contains a Makefile; run make to build the bindings. (They are currently not integrated with the CMake build.)

The C bindings are used only for making Fortran bindings, but they should be usable on their own.

Program

The library comes with a command-line program also named gemmi.

Binaries

Binaries are distributed with the CCP4 suite and with Global Phasing software. They are also in PyPI (pip install gemmi-program) and conda-forge packages.

The very latest builds (as well as slightly older ones) can be downloaded from CI jobs:

  • for Windows – click the first (green) job in AppVeyor CI and find gemmi.exe in the Artifacts tab,

  • for Linux and Mac – sign in to GitHub (no special permissions are needed, but GitHub requires sign-in for artifacts), click the first job (with ✅) in GitHub Actions and download a zip file from the Artifacts section.

From source

To build it from source, first make sure you have git, cmake and a C++ compiler installed (on Ubuntu: sudo apt install git cmake make g++), then:

git clone https://github.com/project-gemmi/gemmi.git
cd gemmi
cmake .
make

Testing

The main automated tests are in Python:

python3 -m unittest discover -v tests/

We also have doctest tests in the documentation, and some others. All of them can be run from the run-tests.sh script in the repository.
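
To illustrate what the unittest discovery above picks up, here is a minimal sketch of a test module that is skipped when an optional module is missing, rather than failing. The file name (say, tests/test_example.py) and the class and method names are hypothetical and do not come from the gemmi repository. The sketch only assumes the standard unittest module and the gemmi.__version__ attribute:

```python
import importlib.util
import unittest

# True if the gemmi module can be found in this environment.
HAVE_GEMMI = importlib.util.find_spec("gemmi") is not None


@unittest.skipUnless(HAVE_GEMMI, "gemmi is not installed")
class TestInstallation(unittest.TestCase):
    def test_version_is_a_string(self):
        # Imported here so that merely loading this file never fails.
        import gemmi
        self.assertIsInstance(gemmi.__version__, str)
```

By default, discovery collects files whose names match test*.py, so a file like this placed in tests/ would be run by the command above and reported as skipped if gemmi cannot be imported.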

Credits

This project uses code from a number of third-party open-source projects.

Projects used in the C++ library, included under include/gemmi/third_party/ (if used in headers) or third_party/:

  • PEGTL – library for creating PEG parsers. License: MIT.

  • sajson – high-performance JSON parser. License: MIT.

  • PocketFFT – FFT library. License: 3-clause BSD.

  • stb_sprintf – locale-independent snprintf() implementation. License: Public Domain.

  • fast_float – locale-independent number parsing. License: Apache 2.0.

  • tinydir – directory (filesystem) reader. License: 2-clause BSD.

Code derived from the following projects is used in the library:

  • ksw2 – sequence alignment in seqalign.hpp is based on the ksw_gg function from ksw2. License: MIT.

  • QCProt – superposition method in qcp.hpp is taken from QCProt and adapted to our project. License: BSD.

  • Larch – calculation of f’ and f” in fprime.hpp is based on CromerLiberman code from Larch. License: 2-clause BSD.

Projects included under third_party/ that are not used in the library itself, but are used in command-line utilities, python bindings or tests:

  • The Lean Mean C++ Option Parser – command-line option parser. License: MIT.

  • doctest – testing framework. License: MIT.

  • linalg.h – linear algebra library. License: Public Domain.

  • zlib – a subset of the zlib library for decompressing gz files, used as a fallback when the zlib library is not found in the system. License: zlib.

Not distributed with Gemmi:

  • pybind11 – used for creating Python bindings. License: 3-clause BSD.

  • zlib-ng – optional, can be used instead of zlib for faster reading of gzipped files.

  • cctbx – used in tests (if cctbx is not present, these tests are skipped) and in scripts that generated space group data and 2-fold twinning operations. License: 3-clause BSD.

Email me if I forgot about something.

List of C++ headers

Here is a list of C++ headers in gemmi/include/. This list also gives an overview of the library.

gemmi/addends.hpp

Addends to scattering form factors used in DensityCalculator and in StructureFactorCalculator.

gemmi/align.hpp

Sequence alignment, label_seq_id assignment, structure superposition.

gemmi/assembly.hpp

Generating biological assemblies by applying operations from struct Assembly to a Model. Includes chain (re)naming utilities.

gemmi/asudata.hpp

AsuData for storing reflection data.

gemmi/asumask.hpp

AsuBrick and MaskedGrid that is used primarily as direct-space asu mask.

gemmi/atof.hpp

Functions that convert string to floating-point number ignoring locale. Simple wrappers around fastfloat::from_chars().

gemmi/atox.hpp

Locale-independent functions that convert string to integer, equivalents of standard isspace and isdigit, and a few helper functions.

gemmi/bessel.hpp

Functions derived from modified Bessel functions I1(x) and I0(x).

gemmi/binner.hpp

Binning - resolution shells for reflections.

gemmi/blob.hpp

Finding maxima or “blobs” in a Grid (map). Similar to CCP4 PEAKMAX and COOT’s “Unmodelled blobs”.

gemmi/bond_idx.hpp

BondIndex: for checking which atoms are bonded, calculating graph distance.

gemmi/c4322.hpp

Electron scattering factor coefficients from the International Tables.

gemmi/calculate.hpp

Calculate various properties of the model.

gemmi/ccp4.hpp

CCP4 format for maps and masks.

gemmi/cellred.hpp

Unit cell reductions: Buerger, Niggli, Selling-Delaunay.

gemmi/chemcomp.hpp

ChemComp - chemical component that represents a monomer from Refmac monomer library, or from PDB CCD.

gemmi/chemcomp_xyz.hpp

Reading coordinates from chemical component or Refmac monomer library files.

gemmi/cif.hpp

CIF parser (based on PEGTL) with pluggable actions, and a set of actions that prepare Document.

gemmi/cif2mtz.hpp

A class for converting SF-mmCIF to MTZ (merged or unmerged).

gemmi/cifdoc.hpp

struct Document that represents the CIF file (but can also be read from a JSON file, such as CIF-JSON or mmJSON).

gemmi/contact.hpp

Contact search, based on NeighborSearch from neighbor.hpp.

gemmi/crd.hpp

Generate Refmac intermediate (prepared) files crd and rst.

gemmi/ddl.hpp

Using DDL1/DDL2 dictionaries to validate CIF/mmCIF files.

gemmi/dencalc.hpp

Tools to prepare a grid with values of electron density of a model.

gemmi/dirwalk.hpp

Classes for iterating files in a directory tree, top-down, in an alphabetical order. It wraps the tinydir library (as we cannot depend on C++17 <filesystem> yet).

gemmi/ecalc.hpp

Normalization of amplitudes F->E (“Karle” approach, similar to CCP4 ECALC).

gemmi/eig3.hpp

Eigen decomposition code for symmetric 3x3 matrices.

gemmi/elem.hpp

Elements from the periodic table.

gemmi/enumstr.hpp

Converts between enums (EntityType, PolymerType, Connection::Type, SoftwareItem::Classification) and mmCIF strings.

gemmi/fail.hpp

fail(), unreachable() and __declspec/__attribute__ macros

gemmi/fileutil.hpp

File-related utilities.

gemmi/floodfill.hpp

The flood fill (scanline fill) algorithm for Grid. Assumes periodic boundary conditions in the grid and 6-way connectivity.

gemmi/formfact.hpp

Calculation of atomic form factors approximated by a sum of Gaussians. Tables with numeric coefficients are in it92.hpp and c4322.hpp.

gemmi/fourier.hpp

Fourier transform applied to map coefficients.

gemmi/fprime.hpp

C++ implementation of Cromer-Liberman calculation of anomalous scattering factors, with corrections from Kissel & Pratt, Acta Cryst. A46, 170 (1990). Single header. No dependencies.

gemmi/fstream.hpp

Ofstream and Ifstream: wrappers around std::ofstream and std::ifstream.

gemmi/grid.hpp

3D grids used by CCP4 maps, cell-method search and hkl data.

gemmi/gz.hpp

Functions for transparent reading of gzipped files. Uses zlib.

gemmi/input.hpp

Input abstraction. Used to decouple file reading and uncompression.

gemmi/intensit.hpp

Class Intensities that reads multi-record data from MTZ, mmCIF or XDS_ASCII and merges it into mean or anomalous intensities. It can also read merged data.

gemmi/interop.hpp

Interoperability between Model (MX) and SmallStructure (SX).

gemmi/it92.hpp

X-ray scattering factor coefficients from International Tables for Crystallography Volume C, edition from 1992 or later.

gemmi/iterator.hpp

Bidirectional iterators (over elements of any container) that can filter, uniquify, group, or iterate with a stride.

gemmi/json.hpp

Reading CIF-JSON (COMCIFS) and mmJSON (PDBj) formats into cif::Document.

gemmi/levmar.hpp

Least-squares fitting - Levenberg-Marquardt method.

gemmi/linkhunt.hpp

Searching for links based on the _chem_link table from monomer dictionary.

gemmi/math.hpp

Math utilities. 3D linear algebra.

gemmi/metadata.hpp

Metadata from coordinate files.

gemmi/mmcif.hpp

Read mmcif (PDBx/mmCIF) file into a Structure from model.hpp.

gemmi/mmcif_impl.hpp

Function used in both mmcif.hpp and refln.hpp (for coordinate and reflection mmCIF files).

gemmi/mmdb.hpp

Converts between gemmi::Structure and mmdb::Manager.

gemmi/mmread.hpp

Read any supported coordinate file.

gemmi/mmread_gz.hpp

Functions for reading possibly gzipped coordinate files. Trivial wrappers that can make compilation faster by having a separate implementation file src/mmread_gz.cpp.

gemmi/model.hpp

Data structures to keep macromolecular structure model.

gemmi/modify.hpp

Modify various properties of the model.

gemmi/monlib.hpp

Monomer library - (Refmac) restraints dictionary, which is made of monomers (chemical components), links and modifications.

gemmi/mtz.hpp

MTZ reflection file format.

gemmi/mtz2cif.hpp

A class for converting MTZ (merged or unmerged) to SF-mmCIF

gemmi/neighbor.hpp

Cell-linked lists method for atom searching (a.k.a. grid search, binning, bucketing, cell technique for neighbor search, etc).

gemmi/neutron92.hpp

Neutron coherent scattering lengths of the elements, from Neutron News, Vol. 3, No. 3, 1992.

gemmi/numb.hpp

Utilities for parsing CIF numbers (the CIF spec calls it ‘numb’).

gemmi/pdb.hpp

Read PDB file format and store it in Structure.

gemmi/pdb_id.hpp

Handling PDB ID and $PDB_DIR: is_pdb_code(), expand_pdb_code_to_path().

gemmi/pirfasta.hpp

Read sequence from PIR or (multi-)FASTA format.

gemmi/polyheur.hpp

Heuristic methods for working with chains and polymers. Also includes a few well-defined functions, such as removal of waters.

gemmi/qcp.hpp

Structural superposition, the QCP method.

gemmi/read_cif.hpp

Functions for reading possibly gzipped CIF files. Trivial wrappers that can make compilation faster by having a separate implementation file src/read_cif.cpp.

gemmi/read_map.hpp

Functions for reading possibly gzipped CCP4 map files. Trivial wrappers that can make compilation faster.

gemmi/recgrid.hpp

ReciprocalGrid – grid for reciprocal space data.

gemmi/reciproc.hpp

Reciprocal space helper functions.

gemmi/refln.hpp

Reads reflection data from the mmCIF format.

gemmi/remarks.hpp

Function read_metadata_from_remarks() that interprets REMARK 3 and REMARK 200/230/240 filling in Metadata.

gemmi/resinfo.hpp

List of common residues with basic data.

gemmi/riding_h.hpp

Place hydrogens according to bond lengths and angles from monomer library.

gemmi/scaling.hpp

Anisotropic scaling of data (includes scaling of bulk solvent parameters)

gemmi/select.hpp

Selections.

gemmi/seqalign.hpp

Simple pairwise sequence alignment.

gemmi/seqid.hpp

SeqId – residue number and insertion code together.

gemmi/seqtools.hpp

Functions for working with sequences (other than alignment).

gemmi/serialize.hpp

Binary serialization for Structure (as well as Model, UnitCell, etc)

gemmi/sfcalc.hpp

Direct calculation of structure factors.

gemmi/small.hpp

Representation of small molecule or inorganic crystal. Flat list of atom sites. Minimal functionality.

gemmi/smcif.hpp

Read small molecule CIF file into SmallStructure (from small.hpp).

gemmi/solmask.hpp

Flat bulk solvent mask. With helper tools that modify data on grid.

gemmi/span.hpp

Span - span of array or std::vector. MutableVectorSpan - span of std::vector with insert() and erase()

gemmi/sprintf.hpp

Interface to stb_sprintf: snprintf_z, to_str(float|double).

gemmi/stats.hpp

Statistics utilities: classes Covariance, Correlation, DataStats

gemmi/symmetry.hpp

Crystallographic Symmetry. Space Groups. Coordinate Triplets.

gemmi/to_chemcomp.hpp

Create cif::Block with monomer library _chem_comp* categories from struct ChemComp.

gemmi/to_cif.hpp

Writing cif::Document or its parts to std::ostream.

gemmi/to_json.hpp

Writing cif::Document or its parts as JSON (mmJSON, CIF-JSON, etc).

gemmi/to_mmcif.hpp

Create cif::Document (for PDBx/mmCIF file) from Structure.

gemmi/to_pdb.hpp

Writing PDB file format (Structure -> pdb file).

gemmi/topo.hpp

Topo(logy) - restraints (from a monomer library) applied to a model.

gemmi/twin.hpp

Twinning laws.

gemmi/unitcell.hpp

Unit cell.

gemmi/utf.hpp

Conversion between UTF-8 and wchar. Used only for file names on Windows.

gemmi/util.hpp

Utilities. Mostly for working with strings and vectors.

gemmi/version.hpp

Version number.

gemmi/xds_ascii.hpp

Read unmerged XDS files: XDS_ASCII.HKL and INTEGRATE.HKL.