It is a header-only library. You need to ensure that
include directory is in your include path
when compiling your program. For example:
git clone https://github.com/project-gemmi/gemmi.git
c++ -Igemmi/include -O2 my_program.cpp
If you want Gemmi to uncompress gzipped files on the fly
(i.e. if you
you will also need to link your program with the zlib library.
If a file name is passed to Gemmi (through
it is assumed to be in ASCII or UTF-8.
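As a rough illustration of the encoding assumption above, the following sketch checks whether a file name given as raw bytes is valid UTF-8 (ASCII is a subset of UTF-8, so it passes too). The helper name is_utf8 is hypothetical and is not part of Gemmi.

```python
def is_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8.

    Hypothetical helper for illustration only; not a Gemmi function.
    """
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A name encoded as UTF-8 passes; the same name in Latin-1 does not.
name = "protéine.pdb"
print(is_utf8(name.encode("utf-8")))
print(is_utf8(name.encode("latin-1")))
```

A file name that fails this check would need to be re-encoded before being passed on.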
To install the gemmi module do:
pip install gemmi
We have binary wheels for several Python versions (for all supported CPython versions and one PyPy version), so the command usually downloads binaries. If a matching wheel is not available, the module is compiled from source – it takes several minutes and requires a C++ compiler.
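To confirm that the installation worked, one option is to query the Python packaging metadata. The sketch below uses only the standard library (Python 3.8 or later); the function names installed_version and is_importable are illustrative assumptions, not part of gemmi.

```python
import importlib.util
from importlib import metadata


def installed_version(package="gemmi"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module="gemmi"):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module) is not None


# Prints the version if gemmi is installed, otherwise None.
print(installed_version("gemmi"))
```

This does not import the module itself, so it works even when the installation is broken or missing.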
If you use the CCP4 suite, you can find gemmi there.
If you use Anaconda Python, you can install the gemmi package from conda-forge:
conda install -c conda-forge gemmi
These distribution channels may have an older version of gemmi.
The latest version can be installed directly from the repository. Either use:
pip install git+https://github.com/project-gemmi/gemmi.git
or clone the project (or download a zip file) and from the top-level directory do:
pip install .
On Windows, Python should automatically find an appropriate compiler (MSVC). If the compiler is not installed, pip shows a message with a download link.
If gemmi is already installed, uninstall the old version first
(pip uninstall) or add option
Setuptools compiles only one unit at a time, so the whole process takes several minutes. To make it faster, build in parallel with CMake. Clone the project and do:
cmake -D USE_PYTHON=1 .
make -j4 py
Fortran and C bindings
The Fortran bindings are at an early stage and are not documented yet.
They use the ISO_C_BINDING module introduced in Fortran 2003.
See the
fortran/ directory to get an idea of what to expect.
The bindings and usage examples can be compiled with CMake:
cmake -D USE_FORTRAN=1 .
make
The C bindings are used only for making Fortran bindings,
but they should be usable on their own.
If you use cmake to build the project
you get a static library
libcgemmi.a that can be used from C,
together with the
The library comes with a command-line program also named
Binaries are distributed with the CCP4 suite and with Global Phasing software. They are also included in conda-forge packages.
The very latest builds (as well as slightly older ones) can be downloaded from CI jobs:
for Windows – click the first (green) job in AppVeyor CI and find gemmi.exe in the Artifacts tab,
for Linux and Mac – sign in to GitHub (no special permissions are needed, but GitHub requires sign-in for artifacts), click the first job (with ✅) in GitHub Actions and download a zip file from the Artifacts section.
To build it from source, first make sure you have git, cmake and a C++ compiler
installed (on Ubuntu:
sudo apt install git cmake make g++), then:
git clone https://github.com/project-gemmi/gemmi.git
cd gemmi
cmake .
make
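Before running the build, it may help to check that the prerequisites listed above are on the PATH. This is a minimal sketch using the standard library; the function name missing_tools is hypothetical, and the default tool list simply mirrors the Ubuntu package names mentioned above (other systems may use a different compiler name).

```python
import shutil


def missing_tools(tools=("git", "cmake", "make", "g++")):
    """Return the names of tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Prints an empty list when all the default tools are available.
print(missing_tools())
```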
The main automated tests are in Python:
python3 -m unittest discover -v tests/
We also have doctest tests in the documentation, and some others.
All of them can be run from the
run-tests.sh script in the repository.
This project uses code from a number of third-party open-source projects.
Projects used in the C++ library and included under
PEGTL – library for creating PEG parsers. License: MIT.
sajson – high-performance JSON parser. License: MIT.
PocketFFT – FFT library. License: 3-clause BSD.
stb_sprintf – locale-independent snprintf() implementation. License: Public Domain.
fast_float – locale-independent number parsing. License: Apache 2.0.
tinydir – directory (filesystem) reader. License: 2-clause BSD.
Code derived from the following projects is used in the library:
ksw2 – sequence alignment in
seqalign.hpp is based on the ksw_gg function from ksw2. License: MIT.
QCProt – superposition method in
qcp.hpp is taken from QCProt and adapted to our project. License: BSD.
Larch – calculation of f’ and f” in
fprime.hpp is based on CromerLiberman code from Larch. License: 2-clause BSD.
Projects included under
third_party/, not used in the library itself,
but used in command-line utilities, python bindings or tests:
The Lean Mean C++ Option Parser – command-line option parser. License: MIT.
doctest – testing framework. License: MIT.
linalg.h – linear algebra library. License: Public Domain.
zlib – a subset of the zlib library for uncompressing gz files, used as a fallback when the zlib library is not found in the system. License: zlib.
Not distributed with Gemmi:
pybind11 – used for creating Python bindings. License: 3-clause BSD.
cctbx – used in tests and in scripts that generated space group data and 2-fold twinning operations. License: 3-clause BSD.
Email me if I forgot about something.
List of C++ headers
Here is a list of C++ headers in
This list also gives an overview of the library.
Addends to scattering form factors used in DensityCalculator and in StructureFactorCalculator.
Sequence alignment, label_seq_id assignment, structure superposition.
Generating biological assemblies by applying operations from struct Assembly to a Model. Includes chain (re)naming utilities.
AsuData for storing reflection data.
AsuBrick and MaskedGrid, which is used primarily as a direct-space asu mask.
Functions that convert string to floating-point number ignoring locale. Simple wrappers around fastfloat::from_chars().
Locale-independent functions that convert string to integer, equivalents of standard isspace and isdigit, and a few helper functions.
Functions derived from modified Bessel functions I1(x) and I0(x).
Binning - resolution shells for reflections.
Finding maxima or “blobs” in a Grid (map). Similar to CCP4 PEAKMAX and COOT’s “Unmodelled blobs”.
Electron scattering factor coefficients from the International Tables.
Calculate various properties of the model.
CCP4 format for maps and masks.
Unit cell reductions: Buerger, Niggli, Selling-Delaunay.
ChemComp - chemical component that represents a monomer from Refmac monomer library, or from PDB CCD.
Reading coordinates from chemical component or Refmac monomer library files.
CIF parser (based on PEGTL) with pluggable actions, and a set of actions that prepare Document.
A class for converting SF-mmCIF to MTZ (merged or unmerged).
struct Document that represents the CIF file (but can also be read from a JSON file, such as CIF-JSON or mmJSON).
Contact search, based on NeighborSearch from neighbor.hpp.
Generate Refmac intermediate (prepared) files crd and rst
Tools to prepare a grid with values of electron density of a model.
Classes for iterating files in a directory tree, top-down, in alphabetical order. It wraps the tinydir library (as we cannot depend on C++17 <filesystem> yet).
Eigen decomposition code for symmetric 3x3 matrices.
Elements from the periodic table.
Converts between enums (EntityType, PolymerType, Connection::Type, SoftwareItem::Classification) and mmCIF strings.
fail() and unreachable()
The flood fill (scanline fill) algorithm for Grid. Assumes periodic boundary conditions in the grid and 6-way connectivity.
Calculation of atomic form factors approximated by a sum of Gaussians. Tables with numeric coefficients are in it92.hpp and c4322.hpp.
Fourier transform applied to map coefficients.
C++ implementation of Cromer-Liberman calculation of anomalous scattering factors, with corrections from Kissel & Pratt, Acta Cryst. A46, 170 (1990). Single header. No dependencies.
Ofstream and Ifstream: wrappers around std::ofstream and std::ifstream.
3d grids used by CCP4 maps, cell-method search and hkl data.
Functions for transparent reading of gzipped files. Uses zlib.
Input abstraction. Used to decouple file reading and uncompression.
Interoperability between Model (MX) and SmallStructure (SX).
X-ray scattering factor coefficients from International Tables for Crystallography Volume C, edition from 1992 or later.
Bidirectional iterators (over elements of any container) that can filter, uniquify, group, or iterate with a stride.
Reading CIF-JSON (COMCIFS) and mmJSON (PDBj) formats into cif::Document.
Least-squares fitting - Levenberg-Marquardt method.
Searching for links based on the _chem_link table from monomer dictionary.
Math utilities. 3D linear algebra.
Class Intensities that reads multi-record data from MTZ, mmCIF or XDS_ASCII and merges it into mean or anomalous intensities. It can also read merged data.
Metadata from coordinate files.
Read mmcif (PDBx/mmCIF) file into a Structure from model.hpp.
Function used in both mmcif.hpp and refln.hpp (for coordinate and reflection mmCIF files).
Converts between gemmi::Structure and mmdb::Manager.
Read any supported coordinate file.
Data structures to keep macromolecular structure model.
Modify various properties of the model.
Monomer library - (Refmac) restraints dictionary, which is made of monomers (chemical components), links and modifications.
MTZ reflection file format.
A class for converting MTZ (merged or unmerged) to SF-mmCIF
Cell-linked lists method for atom searching (a.k.a. grid search, binning, bucketing, cell technique for neighbor search, etc).
Neutron coherent scattering lengths of the elements, from Neutron News, Vol. 3, No. 3, 1992.
Utilities for parsing CIF numbers (the CIF spec calls it ‘numb’).
Read PDB file format and store it in Structure.
Read sequence from PIR or FASTA format.
Place hydrogens according to bond lengths and angles from monomer library.
Heuristic methods for working with chains and polymers. Also includes a few well-defined functions, such as removal of waters.
Structural superposition, the QCP method.
Functions for reading possibly gzipped CIF files. Trivial wrappers that can make compilation faster.
Functions for reading possibly gzipped coordinate files. Trivial wrappers that can make compilation faster.
Functions for reading possibly gzipped CCP4 map files. Trivial wrappers that can make compilation faster.
ReciprocalGrid – grid for reciprocal space data.
Reciprocal space helper functions.
Reads reflection data from the mmCIF format.
Reindex merged or unmerged MTZ file.
Function read_metadata_from_remarks() that interprets REMARK 3 and REMARK 200/230/240 filling in Metadata.
List of common residues with basic data.
Anisotropic scaling of data (includes scaling of bulk solvent parameters)
Simple pairwise sequence alignment.
SeqId – residue number and insertion code together.
Direct calculation of structure factors.
Representation of small molecule or inorganic crystal. Flat list of atom sites. Minimal functionality.
Read small molecule CIF file into SmallStructure (from small.hpp).
Flat bulk solvent mask. With helper tools that modify data on grid.
Span - span of array or std::vector. MutableVectorSpan - span of std::vector with insert() and erase()
to_str(float|double), gf_snprintf - wrappers around stb_sprintf.
Crystallographic Symmetry. Space Groups. Coordinate Triplets.
Create cif::Block with monomer library _chem_comp* categories from struct ChemComp.
Writing cif::Document or its parts to std::ostream.
Writing cif::Document or its parts as JSON (mmJSON, CIF-JSON, etc).
Create cif::Document (for PDBx/mmCIF file) from Structure.
Writing PDB file format (Structure -> pdb file).
Topo(logy) - restraints (from a monomer library) applied to a model.
Conversion between UTF-8 and wchar. Used only for file names on Windows.
Utilities. Mostly for working with strings and vectors.
Read XDS_ASCII.HKL. For now, only unmerged files are read.
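One entry in the list above describes a flood fill on a grid with periodic boundary conditions and 6-way connectivity. To illustrate what those two terms mean, here is a small pure-Python sketch. It is not Gemmi's implementation: it uses a plain breadth-first fill rather than the scanline variant, works on nested lists instead of a Grid, and the function name flood_fill_periodic is hypothetical.

```python
from collections import deque


def flood_fill_periodic(grid, start, new_value):
    """Fill the connected region containing start with new_value.

    grid is a nested list indexed as grid[x][y][z] and is modified in place.
    Neighbours are the 6 face-adjacent points, and indices wrap around at
    the edges (periodic boundary conditions).
    Returns the number of points that were changed.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    x0, y0, z0 = start
    old_value = grid[x0][y0][z0]
    if old_value == new_value:
        return 0
    grid[x0][y0][z0] = new_value
    queue = deque([(x0, y0, z0)])
    changed = 1
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            # The modulo makes the grid wrap around in each direction.
            u, v, w = (x + dx) % nx, (y + dy) % ny, (z + dz) % nz
            if grid[u][v][w] == old_value:
                grid[u][v][w] = new_value
                queue.append((u, v, w))
                changed += 1
    return changed


# A 4x1x1 grid: the two points with value 1 are neighbours only
# because the grid wraps around along x.
demo = [[[1]], [[0]], [[0]], [[1]]]
print(flood_fill_periodic(demo, (0, 0, 0), 2), demo)
```

Without the modulo wrap-around, the points at the two ends of the demo grid would not be connected and only one point would be filled.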