Overview

What is it for?

Gemmi is a library, accompanied by a set of programs, developed primarily for use in structural biology, and in particular in macromolecular crystallography (MX). It can be used for working with:

  • macromolecular models (content of PDB, PDBx/mmCIF and mmJSON files),

  • refinement restraints (CIF files) and small molecule models,

  • reflection data (MTZ and mmCIF formats),

  • crystallographic symmetry,

  • data on a 3D grid with crystallographic symmetry (electron density maps, masks, MRC/CCP4 format).

Parts of this library can be useful in structural bioinformatics (for symmetry-aware analysis of protein models), in chemical crystallography and in other molecular-structure sciences that use CIF files (we have the fastest open-source CIF parser).

Gemmi is open-source (MPL) and portable – it runs on Linux, Windows, macOS and even inside a web browser when compiled to WebAssembly. It is written in C++14, with Python (3.8+) bindings and a partial C and Fortran 2003 interface.

The project also maintains web-based tools and fancy PDB statistics.

Gemmi is a joint project of Global Phasing Ltd and CCP4. It is named after Gemmi Pass. The name can also be expanded as GEneral MacroMolecular I/o.

Source code repository: https://github.com/project-gemmi/gemmi

Note

You can ask questions in Discussions or Issues on GitHub. Alternatively, send me an email.

Credits

This project uses code from a number of third-party open-source projects.

Projects used in the C++ library, included under include/gemmi/third_party/ (if used in headers) or third_party/:

  • PEGTL – library for creating PEG parsers. License: MIT.

  • sajson – high-performance JSON parser. License: MIT.

  • PocketFFT – FFT library. License: 3-clause BSD.

  • stb_sprintf – locale-independent snprintf() implementation. License: Public Domain.

  • fast_float – locale-independent number parsing. License: Apache 2.0.

  • tinydir – directory (filesystem) reader. License: 2-clause BSD.

Code derived from the following projects is used in the library:

  • ksw2 – sequence alignment in seqalign.hpp is based on the ksw_gg function from ksw2. License: MIT.

  • QCProt – superposition method in qcp.hpp is taken from QCProt and adapted to our project. License: BSD.

  • Larch – calculation of f′ and f″ in fprime.cpp is based on CromerLiberman code from Larch. License: 2-clause BSD.

Projects included under third_party/ that are not used in the library itself, but are used in command-line utilities, Python bindings or tests:

  • zpp serializer – serialization framework. License: MIT.

  • The Lean Mean C++ Option Parser – command-line option parser. License: MIT.

  • doctest – testing framework. License: MIT.

  • linalg.h – linear algebra library. License: Public Domain.

  • zlib – a subset of the zlib library for decompressing gz files, used as a fallback when the zlib library is not found in the system. License: zlib.

Not distributed with Gemmi:

  • nanobind – used for creating Python bindings. License: 3-clause BSD.

  • zlib-ng – optional, can be used instead of zlib for faster reading of gzipped files.

  • cctbx – used in tests (if cctbx is not present, these tests are skipped) and in scripts that generated space group data and 2-fold twinning operations. License: 3-clause BSD.

Mentions:

  • NLOpt was used to try out various optimization methods for class Scaling. License: MIT.

Email me if I have forgotten something.