Installation

C++ library

Gemmi is a header-only library. Make sure that its include directory is in your include path when compiling your program. For example:

git clone https://github.com/project-gemmi/gemmi.git
c++ -Igemmi/include -O2 my_program.cpp

If you want Gemmi to uncompress gzipped files on the fly (i.e. if you #include <gemmi/gz.hpp>) you will also need to link your program with the zlib library.

If a file name is passed to Gemmi (through std::string) it is assumed to be in ASCII or UTF-8.

Python module

From PyPI

To install the gemmi module do:

pip install gemmi

We have binary wheels for several Python versions (for all supported CPython versions and one PyPy version), so the command usually downloads binaries. If a matching wheel is not available, the module is compiled from source – it takes several minutes and requires a C++ compiler.
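To confirm that the installation worked, a small check like the one below can help. It is only a sketch: the helper name installed_gemmi_version is made up for this example and is not part of gemmi. It relies solely on the Python standard library, so it also runs (and reports a missing module) when gemmi is not installed.

```python
import importlib.util
from importlib import metadata


def installed_gemmi_version():
    """Return the installed gemmi version as a string, or None if gemmi is absent.

    Hypothetical helper for illustration; not part of the gemmi API.
    """
    if importlib.util.find_spec("gemmi") is None:
        return None
    try:
        return metadata.version("gemmi")
    except metadata.PackageNotFoundError:
        # The module is importable but has no distribution metadata,
        # e.g. a build directory that was added to sys.path by hand.
        return "unknown"


version = installed_gemmi_version()
if version is None:
    print("gemmi is not installed")
else:
    print("gemmi version:", version)
```

Comparing the reported version with the latest release on PyPI is a quick way to see whether pip picked up a recent wheel.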

Other binaries

If you use the CCP4 suite, you can find gemmi there.

If you use Anaconda Python, you can install the gemmi package from conda-forge:

conda install -c conda-forge gemmi

These distribution channels may have an older version of gemmi.

From git

The latest version can be installed directly from the repository. Either use:

pip install git+https://github.com/project-gemmi/gemmi.git

or clone the project (or download a zip file) and from the top-level directory do:

pip install .

On Windows, Python should automatically find an appropriate compiler (MSVC). If the compiler is not installed, pip shows a message with a download link.

If gemmi is already installed, uninstall the old version first (pip uninstall) or add option --upgrade.

Setuptools compiles only one translation unit at a time, so the whole process takes several minutes. To make it faster, build in parallel with CMake. Clone the project and do:

cmake -D USE_PYTHON=1 .
make -j4 py

Fortran and C bindings

The Fortran bindings are at an early stage and are not documented yet. They use the ISO_C_BINDING module introduced in Fortran 2003. See the fortran/ directory to get an idea of what to expect. The bindings and usage examples can be compiled with CMake:

cmake -D USE_FORTRAN=1 .
make

The C bindings are used only for making Fortran bindings, but they should be usable on their own. If you use cmake to build the project you get a static library libcgemmi.a that can be used from C, together with the fortran/*.h headers.

Program

The library comes with a command-line program also named gemmi.

Binaries

Binaries are distributed with the CCP4 suite and with Global Phasing software. They are also included in conda-forge packages.

The very latest builds (as well as slightly older ones) can be downloaded from CI jobs:

  • for Windows – click the first (green) job in AppVeyor CI and find gemmi.exe in the Artifacts tab,

  • for Linux and Mac – sign in to GitHub (no special permissions are needed, but GitHub requires sign-in for artifacts), click the first job (with ✅) in GitHub Actions and download a zip file from the Artifacts section.

From source

To build it from source, first make sure you have git, cmake and a C++ compiler installed (on Ubuntu: sudo apt install git cmake make g++), then:

git clone https://github.com/project-gemmi/gemmi.git
cd gemmi
cmake .
make

Testing

The main automated tests are in Python:

python3 -m unittest discover -v tests/

We also have doctest tests in the documentation, and some others. All of them can be run from the run-tests.sh script in the repository.

Credits

This project uses code from a number of third-party open-source projects.

Projects used in the C++ library and included under include/gemmi/third_party/:

  • PEGTL – library for creating PEG parsers. License: MIT.

  • sajson – high-performance JSON parser. License: MIT.

  • PocketFFT – FFT library. License: 3-clause BSD.

  • stb_sprintf – locale-independent snprintf() implementation. License: Public Domain.

  • fast_float – locale-independent number parsing. License: Apache 2.0.

  • tinydir – directory (filesystem) reader. License: 2-clause BSD.

Code derived from the following projects is used in the library:

  • ksw2 – sequence alignment in seqalign.hpp is based on the ksw_gg function from ksw2. License: MIT.

  • QCProt – superposition method in qcp.hpp is taken from QCProt and adapted to our project. License: BSD.

  • Larch – calculation of f′ and f″ in fprime.hpp is based on CromerLiberman code from Larch. License: 2-clause BSD.

Projects included under third_party/, not used in the library itself, but used in command-line utilities, python bindings or tests:

  • The Lean Mean C++ Option Parser – command-line option parser. License: MIT.

  • doctest – testing framework. License: MIT.

  • linalg.h – linear algebra library. License: Public Domain.

  • zlib – a subset of the zlib library for uncompressing gz files, used as a fallback when the zlib library is not found in the system. License: zlib.

Not distributed with Gemmi:

  • pybind11 – used for creating Python bindings. License: 3-clause BSD.

  • cctbx – used in tests and in scripts that generated space group data and 2-fold twinning operations. License: 3-clause BSD.

Email me if I forgot about something.

List of C++ headers

Here is a list of C++ headers in gemmi/include/. This list also gives an overview of the library.

gemmi/addends.hpp

Addends to scattering form factors used in DensityCalculator and in StructureFactorCalculator.

gemmi/align.hpp

Sequence alignment, label_seq_id assignment, structure superposition.

gemmi/assembly.hpp

Generating biological assemblies by applying operations from struct Assembly to a Model. Includes chain (re)naming utilities.

gemmi/asudata.hpp

AsuData for storing reflection data.

gemmi/asumask.hpp

AsuBrick and MaskedGrid, which is used primarily as a direct-space asu mask.

gemmi/atof.hpp

Functions that convert a string to a floating-point number ignoring locale. Simple wrappers around fast_float::from_chars().

gemmi/atox.hpp

Locale-independent functions that convert string to integer, equivalents of standard isspace and isdigit, and a few helper functions.

gemmi/bessel.hpp

Functions derived from modified Bessel functions I1(x) and I0(x).

gemmi/binner.hpp

Binning - resolution shells for reflections.

gemmi/blob.hpp

Finding maxima or “blobs” in a Grid (map). Similar to CCP4 PEAKMAX and COOT’s “Unmodelled blobs”.

gemmi/c4322.hpp

Electron scattering factor coefficients from the International Tables.

gemmi/calculate.hpp

Calculate various properties of the model.

gemmi/ccp4.hpp

CCP4 format for maps and masks.

gemmi/cellred.hpp

Unit cell reductions: Buerger, Niggli, Selling-Delaunay.

gemmi/chemcomp.hpp

ChemComp - chemical component that represents a monomer from Refmac monomer library, or from PDB CCD.

gemmi/chemcomp_xyz.hpp

Reading coordinates from chemical component or Refmac monomer library files.

gemmi/cif.hpp

CIF parser (based on PEGTL) with pluggable actions, and a set of actions that prepare Document.

gemmi/cif2mtz.hpp

A class for converting SF-mmCIF to MTZ (merged or unmerged).

gemmi/cifdoc.hpp

struct Document that represents the CIF file (but can be also read from JSON file, such as CIF-JSON or mmJSON).

gemmi/contact.hpp

Contact search, based on NeighborSearch from neighbor.hpp.

gemmi/crd.hpp

Generate Refmac intermediate (prepared) files crd and rst.

gemmi/dencalc.hpp

Tools to prepare a grid with values of electron density of a model.

gemmi/dirwalk.hpp

Classes for iterating over files in a directory tree, top-down, in alphabetical order. It wraps the tinydir library (as we cannot depend on C++17 <filesystem> yet).

gemmi/eig3.hpp

Eigen decomposition code for symmetric 3x3 matrices.

gemmi/elem.hpp

Elements from the periodic table.

gemmi/enumstr.hpp

Converts between enums (EntityType, PolymerType, Connection::Type, SoftwareItem::Classification) and mmCIF strings.

gemmi/fail.hpp

fail() and unreachable()

gemmi/fileutil.hpp

File-related utilities.

gemmi/floodfill.hpp

The flood fill (scanline fill) algorithm for Grid. Assumes periodic boundary conditions in the grid and 6-way connectivity.
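To illustrate what periodic boundary conditions and 6-way connectivity mean for a flood fill, here is a minimal Python sketch. It is not Gemmi's implementation: it uses a plain breadth-first queue rather than scanline fill, operates on nested lists instead of a Grid, and the function name flood_fill_periodic is invented for this example.

```python
from collections import deque


def flood_fill_periodic(grid, seed, new_value):
    """Fill the 6-connected region containing `seed` with `new_value`.

    `grid` is a nested list indexed as grid[x][y][z]. Indices wrap around,
    which mimics periodic boundary conditions. Returns the number of
    points that were changed.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    x0, y0, z0 = seed
    old_value = grid[x0][y0][z0]
    if old_value == new_value:
        return 0
    grid[x0][y0][z0] = new_value
    queue = deque([(x0, y0, z0)])
    filled = 1
    # 6-way connectivity: only face-sharing neighbours are visited.
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            u, v, w = (x + dx) % nx, (y + dy) % ny, (z + dz) % nz
            if grid[u][v][w] == old_value:
                grid[u][v][w] = new_value
                queue.append((u, v, w))
                filled += 1
    return filled


# A 5x1x1 grid: the points at x=0 and x=4 are neighbours through the boundary.
grid = [[[v]] for v in (1, 1, 0, 0, 1)]
print(flood_fill_periodic(grid, (0, 0, 0), 2))  # 3
print([column[0][0] for column in grid])        # [2, 2, 0, 0, 2]
```

The point at x=4 is reached only because the index wraps around. Without periodicity the fill would stop at two points.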

gemmi/formfact.hpp

Calculation of atomic form factors approximated by a sum of Gaussians. Tables with numeric coefficients are in it92.hpp and c4322.hpp.
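The "sum of Gaussians" approximation can be written as f(s²) = Σ aᵢ·exp(−bᵢ·s²) + c, where s = sin θ / λ. The Python sketch below evaluates that formula. The function name form_factor is made up for this example and does not mirror the C++ API. The carbon coefficients are the values commonly quoted from the International Tables and are an assumption here; check them against it92.hpp before relying on them.

```python
import math

# Assumed coefficients for neutral carbon (four Gaussians plus a constant),
# as commonly quoted from International Tables Vol. C. Verify against it92.hpp.
CARBON_A = (2.3100, 1.0200, 1.5886, 0.8650)
CARBON_B = (20.8439, 10.2075, 0.5687, 51.6512)
CARBON_C = 0.2156


def form_factor(stol2, a=CARBON_A, b=CARBON_B, c=CARBON_C):
    """Evaluate sum_i a_i * exp(-b_i * stol2) + c.

    stol2 is (sin(theta)/lambda)^2 in 1/Angstrom^2.
    Hypothetical helper for illustration; not the gemmi API.
    """
    return sum(ai * math.exp(-bi * stol2) for ai, bi in zip(a, b)) + c


# At zero scattering angle the value approaches the number of electrons (6 for C).
print(form_factor(0.0))
# The form factor falls off with increasing scattering angle.
print(form_factor(0.25))
```

At s = 0 all exponentials equal one, so the result is just the sum of the aᵢ and c, which is why it lands close to the atomic number.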

gemmi/fourier.hpp

Fourier transform applied to map coefficients.

gemmi/fprime.hpp

C++ implementation of Cromer-Liberman calculation of anomalous scattering factors, with corrections from Kissel & Pratt, Acta Cryst. A46, 170 (1990). Single header. No dependencies.

gemmi/fstream.hpp

Ofstream and Ifstream: wrappers around std::ofstream and std::ifstream.

gemmi/grid.hpp

3d grids used by CCP4 maps, cell-method search and hkl data.

gemmi/gz.hpp

Functions for transparent reading of gzipped files. Uses zlib.

gemmi/input.hpp

Input abstraction. Used to decouple file reading and uncompression.

gemmi/interop.hpp

Interoperability between Model (MX) and SmallStructure (SX).

gemmi/it92.hpp

X-ray scattering factor coefficients from International Tables for Crystallography Volume C, edition from 1992 or later.

gemmi/iterator.hpp

Bidirectional iterators (over elements of any container) that can filter, uniquify, group, or iterate with a stride.

gemmi/json.hpp

Reading CIF-JSON (COMCIFS) and mmJSON (PDBj) formats into cif::Document.

gemmi/levmar.hpp

Least-squares fitting - Levenberg-Marquardt method.

gemmi/linkhunt.hpp

Searching for links based on the _chem_link table from monomer dictionary.

gemmi/math.hpp

Math utilities. 3D linear algebra.

gemmi/merge.hpp

Class Intensities that reads multi-record data from MTZ, mmCIF or XDS_ASCII and merges it into mean or anomalous intensities. It can also read merged data.

gemmi/metadata.hpp

Metadata from coordinate files.

gemmi/mmcif.hpp

Read mmcif (PDBx/mmCIF) file into a Structure from model.hpp.

gemmi/mmcif_impl.hpp

Function used in both mmcif.hpp and refln.hpp (for coordinate and reflection mmCIF files).

gemmi/mmdb.hpp

Converts between gemmi::Structure and mmdb::Manager.

gemmi/mmread.hpp

Read any supported coordinate file.

gemmi/model.hpp

Data structures that store a macromolecular structure model.

gemmi/modify.hpp

Modify various properties of the model.

gemmi/monlib.hpp

Monomer library - (Refmac) restraints dictionary, which is made of monomers (chemical components), links and modifications.

gemmi/mtz.hpp

MTZ reflection file format.

gemmi/mtz2cif.hpp

A class for converting MTZ (merged or unmerged) to SF-mmCIF.

gemmi/neighbor.hpp

Cell-linked lists method for atom searching (a.k.a. grid search, binning, bucketing, cell technique for neighbor search, etc).
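For readers unfamiliar with the cell-linked lists idea, the following Python sketch shows the principle: points are binned into cubic cells, and a query only inspects the 27 cells around it instead of every point. This is a simplified illustration, not how neighbor.hpp is written. It ignores unit-cell periodicity and symmetry, and the names build_cells and find_neighbors are invented for this example.

```python
import math
from collections import defaultdict
from itertools import product


def build_cells(points, cell_size):
    """Bin 3D points into cubic cells; return {cell index: [point indices]}."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(i)
    return cells


def find_neighbors(points, cells, cell_size, query, radius):
    """Return sorted indices of points within `radius` of `query`.

    Only the cell containing `query` and its 26 neighbours are searched,
    so the result is complete only if radius <= cell_size.
    """
    if radius > cell_size:
        raise ValueError("radius must not exceed cell_size")
    center = tuple(math.floor(c / cell_size) for c in query)
    found = []
    for offset in product((-1, 0, 1), repeat=3):
        key = tuple(c + o for c, o in zip(center, offset))
        for i in cells.get(key, ()):
            if math.dist(points[i], query) <= radius:
                found.append(i)
    return sorted(found)


points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 5.0, 5.0), (-0.5, 0.0, 0.0)]
cells = build_cells(points, cell_size=2.0)
print(find_neighbors(points, cells, 2.0, query=(0.2, 0.0, 0.0), radius=1.5))  # [0, 1, 3]
```

The distant point at (5, 5, 5) is never compared against the query because its cell is not adjacent, which is where the speed-up over a brute-force search comes from.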

gemmi/neutron92.hpp

Neutron coherent scattering lengths of the elements, from Neutron News, Vol. 3, No. 3, 1992.

gemmi/numb.hpp

Utilities for parsing CIF numbers (the CIF spec calls it ‘numb’).

gemmi/pdb.hpp

Read PDB file format and store it in Structure.

gemmi/pirfasta.hpp

Read sequence from PIR or FASTA format.

gemmi/placeh.hpp

Place hydrogens according to bond lengths and angles from monomer library.

gemmi/polyheur.hpp

Heuristic methods for working with chains and polymers. Also includes a few well-defined functions, such as removal of waters.

gemmi/qcp.hpp

Structural superposition, the QCP method.

gemmi/read_cif.hpp

Functions for reading possibly gzipped CIF files. Trivial wrappers that can make compilation faster.

gemmi/read_coor.hpp

Functions for reading possibly gzipped coordinate files. Trivial wrappers that can make compilation faster.

gemmi/read_map.hpp

Functions for reading possibly gzipped CCP4 map files. Trivial wrappers that can make compilation faster.

gemmi/recgrid.hpp

ReciprocalGrid – grid for reciprocal space data.

gemmi/reciproc.hpp

Reciprocal space helper functions.

gemmi/refln.hpp

Reads reflection data from the mmCIF format.

gemmi/reindex.hpp

Reindex merged or unmerged MTZ file.

gemmi/remarks.hpp

Function read_metadata_from_remarks() that interprets REMARK 3 and REMARK 200/230/240 filling in Metadata.

gemmi/resinfo.hpp

List of common residues with basic data.

gemmi/scaling.hpp

Anisotropic scaling of data (includes scaling of bulk solvent parameters).

gemmi/select.hpp

Selections.

gemmi/seqalign.hpp

Simple pairwise sequence alignment.

gemmi/seqid.hpp

SeqId – residue number and insertion code together.

gemmi/sfcalc.hpp

Direct calculation of structure factors.

gemmi/small.hpp

Representation of a small molecule or inorganic crystal. Flat list of atom sites. Minimal functionality.

gemmi/smcif.hpp

Read small molecule CIF file into SmallStructure (from small.hpp).

gemmi/solmask.hpp

Flat bulk solvent mask. With helper tools that modify data on grid.

gemmi/span.hpp

Span - span of array or std::vector. MutableVectorSpan - span of std::vector with insert() and erase().

gemmi/sprintf.hpp

to_str(float|double), gf_snprintf - wrappers around stb_sprintf.

gemmi/symmetry.hpp

Crystallographic Symmetry. Space Groups. Coordinate Triplets.

gemmi/to_chemcomp.hpp

Create cif::Block with monomer library _chem_comp* categories from struct ChemComp.

gemmi/to_cif.hpp

Writing cif::Document or its parts to std::ostream.

gemmi/to_json.hpp

Writing cif::Document or its parts as JSON (mmJSON, CIF-JSON, etc).

gemmi/to_mmcif.hpp

Create cif::Document (for PDBx/mmCIF file) from Structure.

gemmi/to_pdb.hpp

Writing PDB file format (Structure -> pdb file).

gemmi/topo.hpp

Topo(logy) - restraints (from a monomer library) applied to a model.

gemmi/twin.hpp

Twinning laws.

gemmi/unitcell.hpp

Unit cell.

gemmi/utf.hpp

Conversion between UTF-8 and wchar. Used only for file names on Windows.

gemmi/util.hpp

Utilities. Mostly for working with strings and vectors.

gemmi/version.hpp

Version number.

gemmi/xds_ascii.hpp

Read XDS_ASCII.HKL. For now, only unmerged files are read.