Symmetry

The Gemmi symmetry module provides space group related functionality needed in other parts of the library – when working with coordinate files, electron density maps and reflections.

Although the Gemmi project is developed for macromolecular crystallography, for which only 65 space groups are relevant, we cover all the 230 crystallographic space groups for the sake of completeness.

For C++: this part of Gemmi has no dependencies; everything is in a single header, symmetry.hpp.

Space group table

Gemmi tabulates 550+ settings of the 230 crystallographic space groups. Each entry includes:

  • number – space group number (1-230),
  • ccp4 – ccp4 number (assigned to particular settings; modulo 1000 they give space group number: 3 and 1003 correspond to P 1 2 1 and P 1 1 2; 0 means none),
  • hm – Hermann-Mauguin (H-M) symbol a.k.a. the international notation (I a -3 d, C 1 2 1),
  • ext – extension to the H-M symbol (none, 1, 2, H or R) that, appended to it, makes the extended H-M symbol (R 3:H, P 4/n:1),
  • hall – the Hall symbol (-I 4bd 2c 3, C 2y, P 32 2c (0 0 -1)) used to generate the space group operations,
  • and basisop – change of basis operator from the reference setting.
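
The relation between the ccp4 number, the space group number and the extended H-M symbol can be sketched in a few lines of plain Python. This is only an illustration of the rules listed above: the helper names number_from_ccp4 and extended_hm are hypothetical and are not part of Gemmi.

```python
def number_from_ccp4(ccp4):
    """Space group number implied by a non-zero ccp4 number (ccp4 modulo 1000)."""
    if ccp4 == 0:
        raise ValueError("0 means that no ccp4 number is assigned")
    return ccp4 % 1000


def extended_hm(hm, ext=""):
    """Join the H-M symbol and its extension, e.g. 'R 3' and 'H' give 'R 3:H'."""
    return f"{hm}:{ext}" if ext else hm


print(number_from_ccp4(1003))   # the setting P 1 1 2 belongs to space group 3
print(extended_hm("R 3", "H"))
```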

This data is derived primarily from the CCP4 syminfo.lib file, which in turn is based on the data from sgtbx that was augmented with the old data from a CCP4 file named symop.lib.

The data from sgtbx is also available in the older SgInfo library, as well as in the International Tables for Crystallography Vol. B ch. 1.4 (in 2010 ed.). It has 530 entries including 3 duplicates (different names for the same settings) in the space group 68.

Gemmi also includes settings from OpenBabel that are absent in syminfo.lib. If needed, we will add more entries in the future. For example, we do not include all the C- and F-centred tetragonal space groups featured in Crystallographic Space Group Diagrams and Tables by Jeremy Karl Cockcroft (they are also listed on a web page written by R.W. Grosse-Kunstleve, and mentioned in the 2015 edition of ITfC Vol. A, Table 1.5.4.4).

We also tabulated alternative names. For now, this includes only the new standard names introduced in the 1990s by the IUCr committee. For example, space group no. 39 is now officially named Aem2, not Abm2, in Volume A of the International Tables. Most crystallographic software (as well as Volume B of the Tables) still uses the old names (spglib uses the new ones; sgtbx reads the new names with the option ad_hoc_1992).

The usual way to access a space group from the table is to search for it by name. In C++:

#include <gemmi/symmetry.hpp>
// ...
const gemmi::SpaceGroup* sg = gemmi::find_spacegroup_by_name(name);

and in Python:

>>> import gemmi
>>> gemmi.find_spacegroup_by_name('I2')
<gemmi.SpaceGroup("I 1 2 1")>

Note

The rest of this section has only Python examples mixed with the text. One longer C++ example is at the end.

The name above is expected to be either a full international Hermann-Mauguin symbol or a short symbol (I2 instead of I 1 2 1). This function also searches the tabulated alternative names:

>>> gemmi.find_spacegroup_by_name('C m m e') # new names have 'e' and 'g'
<gemmi.SpaceGroup("C m m a")>

Sometimes in the PDB, the choice between the hexagonal and rhombohedral settings is not clear from the symmetry symbol alone. For instance, the H-M symbol “R 3” can mean either the hexagonal or the rhombohedral setting. This ambiguity can be resolved by comparing the angles of the unit cell: the ratio of gamma to alpha is 120:90 in the hexagonal setting and 1:1 in the rhombohedral one. Therefore, find_spacegroup_by_name() also accepts the alpha and gamma angles. If the angles are not passed, the hexagonal setting is returned:

>>> gemmi.find_spacegroup_by_name('R 3 2')
<gemmi.SpaceGroup("R 3 2:H")>
>>> gemmi.find_spacegroup_by_name('R 3 2', alpha=92.02, gamma=92.02)
<gemmi.SpaceGroup("R 3 2:R")>
>>> gemmi.find_spacegroup_by_name('R 3 2', alpha=90, gamma=120)
<gemmi.SpaceGroup("R 3 2:H")>
>>> # of course, you do not need angles if you use extended H-M symbol
>>> gemmi.find_spacegroup_by_name('R 3 2:R')
<gemmi.SpaceGroup("R 3 2:R")>
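
The angle test described above can be sketched as follows. The decision rule used here (pick whichever of the two ideal gamma:alpha ratios is closer) is our assumption for illustration; the exact criterion used inside find_spacegroup_by_name() is not stated in this document, and the name guess_r_setting is hypothetical.

```python
def guess_r_setting(alpha=None, gamma=None):
    """Return 'H' (hexagonal) or 'R' (rhombohedral) for a symbol such as 'R 3'."""
    if alpha is None or gamma is None:
        return "H"  # documented default when the angles are not passed
    ratio = gamma / alpha
    # Assumed rule: 120:90 is the ideal hexagonal ratio, 1:1 the rhombohedral one.
    return "H" if abs(ratio - 120 / 90) < abs(ratio - 1.0) else "R"


print(guess_r_setting(alpha=92.02, gamma=92.02))
print(guess_r_setting(alpha=90, gamma=120))
```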

You can also get a space group by number:

>>> gemmi.find_spacegroup_by_number(5)
<gemmi.SpaceGroup("C 1 2 1")>

The number is the ccp4 number mentioned above, so some 4-digit numbers are also recognized:

>>> gemmi.find_spacegroup_by_number(4005)
<gemmi.SpaceGroup("I 1 2 1")>

For values 1-230, the number corresponds to the first setting of this space group in the table in ITfC Vol. B. Usually, it is the reference (standard) setting of the space group, but unfortunately not always. So we have another function that always returns the reference setting:

>>> gemmi.get_spacegroup_reference_setting(48)
<gemmi.SpaceGroup("P n n n:2")>

The next function for searching the space group table checks symmetry operations:

>>> gemmi.symops_from_hall('C 2y (x,y,-x+z)')  
<gemmi.GroupOps object at 0x...>
>>> gemmi.find_spacegroup_by_ops(_)
<gemmi.SpaceGroup("I 1 2 1")>
>>> _.hall
'I 2y'

This example also shows how to find the space group corresponding to a Hall symbol. The Hall notation encodes all the group operations but, unfortunately, not in a unique way: different Hall symbols can encode the same symmetry operations. In the example above, “C 2y (x,y,-x+z)” is equivalent to “I 2y”. That’s why we compare operations, not symbols.

The last function for searching the space group table also compares operations. It takes two arguments, a space group and a change-of-basis operator, and searches for space group settings that match the transformed operations of the original space group:

>>> # I2 -> C2
>>> gemmi.find_spacegroup_by_change_of_basis(gemmi.SpaceGroup('I2'), gemmi.Op('x,y,x+z'))
<gemmi.SpaceGroup("C 1 2 1")>
>>> # enantiomorphic pair
>>> gemmi.find_spacegroup_by_change_of_basis(gemmi.SpaceGroup('P 41'), gemmi.Op('-x,-y,-z'))
<gemmi.SpaceGroup("P 43")>

Finally, we may iterate over the space group table:

>>> for sg in gemmi.spacegroup_table():
...     if sg.ext == 'H':
...         print(sg)
<gemmi.SpaceGroup("R 3:H")>
<gemmi.SpaceGroup("R -3:H")>
<gemmi.SpaceGroup("R 3 2:H")>
<gemmi.SpaceGroup("R 3 m:H")>
<gemmi.SpaceGroup("R 3 c:H")>
<gemmi.SpaceGroup("R -3 m:H")>
<gemmi.SpaceGroup("R -3 c:H")>

gemmi.SpaceGroup represents an entry in the space group table. It has the properties listed at the beginning of this section (number, ccp4, hm, ext, hall) and a few methods:

>>> sg = gemmi.SpaceGroup('R3') # equivalent to find_spacegroup_by_name()
>>> sg.xhm()                    # extended Hermann-Mauguin name
'R 3:H'
>>> sg.short_name()             # short name
'H3'
>>> sg.is_enantiomorphic()      # is it one of 22 chiral space groups?
False
>>> sg.is_sohncke()             # is it one of 65 Sohncke space groups?
True
>>> sg.point_group_hm()         # H-M name of the point group
'3'
>>> sg.laue_str()               # name of the Laue class
'-3'
>>> sg.crystal_system_str()     # name of the crystal system
'trigonal'
>>> sg.is_reference_setting()
True
>>> # and the most important...
>>> sg.operations()             
<gemmi.GroupOps object at 0x...>

Categories related to chirality can be confusing. Here, we follow the IUCr dictionary:

  • Sohncke groups (a.k.a. non-enantiogenic space groups) are “the three-dimensional space groups containing only operations of the first kind (rotations, rototranslations, translations)”; 65 groups in which chiral structures crystallize.
  • Enantiomorphic space groups (a.k.a. chiral groups) are those whose group structure is chiral; 22 groups forming 11 enantiomorphic pairs. (So chiral structures can crystallize not only in the chiral space groups but also in 43 of the achiral ones.)
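
The phrase “operations of the first kind” has a simple numerical meaning: the rotation part of every operation has determinant +1. Below is a minimal sketch of that check in plain Python. It uses unscaled integer rotation matrices and hypothetical helper names; it is not how SpaceGroup.is_sohncke() is implemented.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def only_first_kind(rotations):
    """True if all rotation parts are proper (determinant +1)."""
    return all(det3(r) == 1 for r in rotations)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # x,y,z
twofold_y = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]   # -x,y,-z (as in I 1 2 1)
mirror_y = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]     # x,-y,z (rotation part of a c glide)

print(only_first_kind([identity, twofold_y]))  # proper rotations only
print(only_first_kind([identity, mirror_y]))   # contains an improper operation
```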

If you would like to ignore entries that are absent in SgInfo, sgtbx, spglib and in the International Tables vol. B, use only the first 530 entries of the gemmi table. In Python, we have a helper function for this:

>>> for sg in gemmi.spacegroup_table_itb():
...     pass
>>> sg.ccp4  # sg here is the last space group that was iterated
230

Implementation notes

The choice between having an explicit list of all the operations and generating them from a smaller set of symbols is a trade-off between simplicity of the code and the amount of the tabulated data.

The number of symmetry operations per space group is between 1 and 192, but they can be split into symmetry operations (max. 48 for point group m-3m) and so-called centring vectors (max. 4 for the face-centered lattice).

  • The simplest way of storing the operations is to list them all (i.e. 192 triplets for no. 228) in a text file. This approach is used by OpenBabel (in space-groups.txt).

  • It makes sense to keep the centring vectors separately (192 becomes 48 + 4). This is done in the CCP4 syminfo.lib file (539 entries), which is used by csymlib (part of libccp4) and a few other projects.

  • The operations can also be tabulated in the code. This approach is used (with one or two layers of indirection to reduce the data size) by spglib (C library) and NGL.

  • The inversion center (if applicable) can be kept separately, to reduce the maximum number of stored operations (48 -> 24+1). This is how the symmetry data is encoded in Fityk.

  • Actually all the operations can be generated from only a few generators, at the expense of more complex code. Example: Mantid (SpaceGroupFactory.cpp).

  • Finally, one can use one of the two computer-adapted descriptions from ITfC. The so-called explicit notation (ICC$I3Q000$P4C393$P2D933) is the longer of the two, but easier to parse by the computer. It is used in the SPGGEN program.

  • The Hall notation (-I 4bd 2c 3), first proposed by Sydney R. Hall in 1981, is shorter and more popular. It can be interpreted by a few libraries:

    • SgInfo and SgLite (old C libraries from Ralf W. Grosse-Kunstleve recently re-licensed to BSD),
    • sgtbx (successor of SgInfo written in C++/Python, part of cctbx),
    • CCP4 Clipper,

    and by many programs. On the downside, the conciseness is achieved through complex rules for interpreting the symbols; the choice of a Hall symbol for a given setting is not unique, and the symbols differ between editions of ITfC, and between sgtbx and syminfo.lib.

After contemplating all the possibilities, we ended up implementing the most complex solution: Hall symbols. The relative complexity does not mean it is slow: translating a Hall symbol to generators takes less than a microsecond on a typical desktop machine. Closing a group also takes less than a microsecond for most of the groups, and up to a few microseconds for the highest-symmetry group Fm3̅m (Gemmi uses Dimino’s algorithm for this).

Operations

Crystallographic symmetry operations have a few notations. Gemmi understands only coordinate triplets (sometimes called the Jones’ faithful notation) such as x,x-y,z+1/2. A symmetry operation is represented by class Op. It can be created in C++ as:

#include <gemmi/symmetry.hpp>
// ...
gemmi::Op op = gemmi::parse_triplet("-y,x-y,z");

and in Python as:

>>> import gemmi
>>> op = gemmi.Op('-y,x-y,z+1/3')

We can also do the opposite:

>>> op.triplet()
'-y,x-y,z+1/3'

The operation consists of a 3x3 rotation matrix and a translation vector, both stored internally as integers that need to be divided by DEN == 24 to get the actual values.

>>> op.rot
[[0, -24, 0], [24, -24, 0], [0, 0, 24]]
>>> op.tran
[0, 0, 8]
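
To make the integer representation concrete, here is a minimal pure-Python sketch that parses a simple triplet into the same rot and tran integers. The function name parse_triplet_sketch is hypothetical, and the parser handles only plain terms such as -y, 1/3*x or +1/2. It is not Gemmi’s parser, which may accept more syntax and report errors differently.

```python
import re
from fractions import Fraction

DEN = 24  # common denominator; all stored integers are in units of 1/24


def parse_triplet_sketch(triplet):
    """Parse e.g. '-y,x-y,z+1/3' into (rot, tran) integer lists scaled by DEN."""
    parts = triplet.replace(" ", "").lower().split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated expressions")
    rot, tran = [], []
    for part in parts:
        row, shift = [0, 0, 0], Fraction(0)
        # split the expression into signed terms: '-y', '+x', '+1/3', ...
        for term in re.findall(r"[+-]?[^+-]+", part):
            sign = -1 if term[0] == "-" else 1
            body = term.lstrip("+-")
            if body[-1] in "xyz":
                coef = Fraction(body[:-1].rstrip("*") or 1)
                row["xyz".index(body[-1])] += int(sign * coef * DEN)
            else:
                shift += sign * Fraction(body)
        rot.append(row)
        tran.append(int(shift * DEN))
    return rot, tran


rot, tran = parse_triplet_sketch("-y,x-y,z+1/3")
print(rot)
print(tran)
```

With DEN equal to 24, the fractions 1/2, 1/3, 1/4, 1/6 and their multiples are all stored exactly as integers.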

Alternatively, the operation can be expressed as a single 4x4 transformation matrix (which in crystallography is called the Seitz matrix).

>>> op.seitz()
[[0, -1, 0, 0], [1, -1, 0, 0], [0, 0, 1, Fraction(1, 3)], [0, 0, 0, 1]]
>>> op.float_seitz()
[[0.0, -1.0, 0.0, 0.0],
 [1.0, -1.0, 0.0, 0.0],
 [0.0, 0.0, 1.0, 0.3333333333333333],
 [0.0, 0.0, 0.0, 1.0]]
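
The link between the integer storage and the Seitz matrix is a division by DEN plus one extra row. A minimal sketch with the standard fractions module, assuming only the rot and tran values shown above (the function name seitz_sketch is hypothetical):

```python
from fractions import Fraction

DEN = 24


def seitz_sketch(rot, tran):
    """Build a 4x4 matrix of exact fractions from DEN-scaled rot and tran."""
    rows = [[Fraction(r, DEN) for r in rot[i]] + [Fraction(tran[i], DEN)]
            for i in range(3)]
    rows.append([Fraction(0), Fraction(0), Fraction(0), Fraction(1)])
    return rows


rot = [[0, -24, 0], [24, -24, 0], [0, 0, 24]]  # -y,x-y,z+1/3
tran = [0, 0, 8]
matrix = seitz_sketch(rot, tran)
print(matrix[2][3])  # the z translation as an exact fraction
```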

Operations can be combined, inverted and wrapped:

>>> gemmi.Op('x-y,x,z+1/6') * '-x,-y,z+1/2'
<gemmi.Op("-x+y,-x,z+2/3")>
>>> _.inverse()
<gemmi.Op("-y,x-y,z-2/3")>
>>> _.wrap()
<gemmi.Op("-y,x-y,z+1/3")>

Wrapping applies modulo 1 to the translational part, which is usually desirable for crystallographic symmetry. But triplets and matrices may also represent, for example, the generation of a biological assembly. For this reason, x,y+1,z is not automatically reduced to the identity. When operations are combined, however, they are assumed to be symmetry operations and the result is wrapped to [0,1):

>>> op
<gemmi.Op("-y,x-y,z+1/3")>
>>> op * op
<gemmi.Op("-x+y,-x,z+2/3")>
>>> op * op * op  # without wrapping we'd have z+1
<gemmi.Op("x,y,z")>
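
Combination and wrapping can be reproduced with integer arithmetic alone. The sketch below assumes the DEN-scaled representation described earlier and rotation entries that are whole multiples of DEN (true for symmetry operations, not for the general transformations mentioned later). The function names are hypothetical, and this is not Gemmi’s code.

```python
DEN = 24


def wrap(op):
    """Reduce the translation to [0, 1), i.e. to 0..DEN-1 in scaled units."""
    rot, tran = op
    return rot, [t % DEN for t in tran]


def combine(a, b):
    """Return a*b (apply b first, then a), wrapped like a symmetry operation."""
    (ra, ta), (rb, tb) = a, b
    rot = [[sum(ra[i][k] * rb[k][j] for k in range(3)) // DEN for j in range(3)]
           for i in range(3)]
    tran = [ta[i] + sum(ra[i][k] * tb[k] for k in range(3)) // DEN
            for i in range(3)]
    return wrap((rot, tran))


op = ([[0, -24, 0], [24, -24, 0], [0, 0, 24]], [0, 0, 8])   # -y,x-y,z+1/3
identity = ([[24, 0, 0], [0, 24, 0], [0, 0, 24]], [0, 0, 0])

op2 = combine(op, op)        # -x+y,-x,z+2/3
op3 = combine(op2, op)       # translation 24/24 wraps to 0
print(op3 == identity)
```

Python’s % operator returns a non-negative result for a positive modulus, so a translation of -16 (z-2/3) wraps to 8 (z+1/3), as in the inverse() and wrap() example above.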

The Op.rot matrix is called a “rotation matrix” because that is its primary purpose, but it can also represent other linear transformations:

>>> enlarging_op = gemmi.Op("-y+z,x+z,-x+y+z")
>>> enlarging_op.inverse()
<gemmi.Op("-1/3*x+2/3*y-1/3*z,-2/3*x+1/3*y+1/3*z,1/3*x+1/3*y+1/3*z")>
>>> _ * enlarging_op
<gemmi.Op("x,y,z")>

In real space, a crystal symmetry operation can be applied to the fractional atom position to get an equivalent position:

>>> op.apply_to_xyz([0.25, 0.21875, 0.3])
[-0.21875, 0.03125, 0.6333333333333333]
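
The same result follows directly from the stored integers: multiply the position by rot, add tran, and divide by DEN. A minimal sketch under that assumption (apply_to_xyz_sketch is a hypothetical name, not Gemmi’s implementation):

```python
DEN = 24


def apply_to_xyz_sketch(rot, tran, xyz):
    """Apply a DEN-scaled operation to a fractional position."""
    return [(sum(rot[i][j] * xyz[j] for j in range(3)) + tran[i]) / DEN
            for i in range(3)]


rot = [[0, -24, 0], [24, -24, 0], [0, 0, 24]]  # -y,x-y,z+1/3
tran = [0, 0, 8]
new_xyz = apply_to_xyz_sketch(rot, tran, [0.25, 0.21875, 0.3])
print(new_xyz)
```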

In reciprocal space, the same operation relates equivalent reflections. The rotational part determines the Miller indices, the translational part determines the phase shift.

>>> hkl = [3, 0, 1]
>>> op.apply_to_hkl(hkl)
[0, -3, 1]
>>> op.phase_shift(hkl)  # -120 degrees in radians
-2.0943951023931953
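
Both numbers can be reproduced by hand. In this sketch the Miller indices are multiplied by the rotation matrix from the left (as a row vector), and the phase shift is taken as -2π times the dot product of hkl with the translation. The sign convention is inferred from the output above rather than from Gemmi’s source, and the function names are hypothetical.

```python
import math

DEN = 24


def apply_to_hkl_sketch(rot, hkl):
    """Row vector hkl times the DEN-scaled rotation matrix."""
    return [sum(hkl[i] * rot[i][j] for i in range(3)) // DEN for j in range(3)]


def phase_shift_sketch(tran, hkl):
    """Phase shift in radians: -2*pi * (h*tx + k*ty + l*tz)."""
    return -2 * math.pi * sum(h * t for h, t in zip(hkl, tran)) / DEN


rot = [[0, -24, 0], [24, -24, 0], [0, 0, 24]]  # -y,x-y,z+1/3
tran = [0, 0, 8]
hkl = [3, 0, 1]
print(apply_to_hkl_sketch(rot, hkl))
print(math.degrees(phase_shift_sketch(tran, hkl)))
```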

Groups of Operations

Each space group setting corresponds to a unique set of operations. This set is represented by class GroupOps.

Symmetry operations (rotation + translation) and centring vectors (translation only) are stored separately:

>>> ops = gemmi.find_spacegroup_by_name('I2').operations()
>>> ops  
<gemmi.GroupOps object at 0x...>
>>> list(ops.sym_ops)
[<gemmi.Op("x,y,z")>, <gemmi.Op("-x,y,-z")>]
>>> list(ops.cen_ops)
[[0, 0, 0], [12, 12, 12]]

but they are combined on the fly:

>>> len(ops)
4
>>> for op in ops:
...   print(op.triplet())
...
x,y,z
-x,y,-z
x+1/2,y+1/2,z+1/2
-x+1/2,y+1/2,-z+1/2
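
Combining on the fly can be pictured as a generator that adds each centring vector to each symmetry operation. The sketch below uses plain tuples and lists in the DEN-scaled representation; the name all_ops is hypothetical, and the iteration order is chosen to match the output above.

```python
DEN = 24

# I 1 2 1: two symmetry operations and two centring vectors
sym_ops = [
    ([[24, 0, 0], [0, 24, 0], [0, 0, 24]], [0, 0, 0]),     # x,y,z
    ([[-24, 0, 0], [0, 24, 0], [0, 0, -24]], [0, 0, 0]),   # -x,y,-z
]
cen_ops = [[0, 0, 0], [12, 12, 12]]


def all_ops(sym_ops, cen_ops):
    """Yield every symmetry operation shifted by every centring vector."""
    for cen in cen_ops:
        for rot, tran in sym_ops:
            yield rot, [(t + c) % DEN for t, c in zip(tran, cen)]


expanded = list(all_ops(sym_ops, cen_ops))
print(len(expanded))
```

Storing 2 + 2 items instead of 4 is a small saving here, but for a face-centred cubic group it is 48 + 4 instead of 192.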

We can apply a change-of-basis operator to GroupOps:

>>> ops.change_basis(gemmi.Op('x,y,x+z'))  # I2 -> C2
>>> gemmi.find_spacegroup_by_ops(ops)
<gemmi.SpaceGroup("C 1 2 1")>

We can create GroupOps from a list of operations:

>>> op_list = ['x,y,z', 'x,-y,z+1/2', 'x+1/2,y+1/2,z', 'x+1/2,-y+1/2,z+1/2']
>>> new_ops = gemmi.GroupOps([gemmi.Op(o) for o in op_list])

or from a Hall symbol:

>>> gemmi.symops_from_hall('P 4w 2c')  
<gemmi.GroupOps object at 0x...>
>>> len(_)
8

The Hall symbols encode generators, which are then used to obtain all the operations. If you wonder which generators are encoded, use:

>>> list(gemmi.generators_from_hall('P 4w 2c'))  # no.91
[<gemmi.Op("x,y,z")>, <gemmi.Op("-y,x,z+1/4")>, <gemmi.Op("x,-y,-z+1/2")>]

Combining these 3 generators reconstructs all the 8 symmetry operations.
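
How generators expand into a full set of operations can be shown with a naive closure loop: keep multiplying known operations by the generators until nothing new appears. This is deliberately the simplest possible approach and not the Dimino’s algorithm that Gemmi uses; operations are stored as nested tuples in the DEN-scaled representation, and all names are hypothetical.

```python
DEN = 24


def combine(a, b):
    """Return a*b (apply b first, then a) with the translation wrapped."""
    (ra, ta), (rb, tb) = a, b
    rot = tuple(
        tuple(sum(ra[i][k] * rb[k][j] for k in range(3)) // DEN for j in range(3))
        for i in range(3))
    tran = tuple(
        (ta[i] + sum(ra[i][k] * tb[k] for k in range(3)) // DEN) % DEN
        for i in range(3))
    return rot, tran


def close_group(generators):
    """Naive closure: multiply by generators until no new operation is found."""
    ops = set(generators)
    frontier = list(ops)
    while frontier:
        found = []
        for a in frontier:
            for g in generators:
                c = combine(a, g)
                if c not in ops:
                    ops.add(c)
                    found.append(c)
        frontier = found
    return ops


identity = (((24, 0, 0), (0, 24, 0), (0, 0, 24)), (0, 0, 0))   # x,y,z
fourfold = (((0, -24, 0), (24, 0, 0), (0, 0, 24)), (0, 0, 6))  # -y,x,z+1/4
twofold = (((24, 0, 0), (0, -24, 0), (0, 0, -24)), (0, 0, 12)) # x,-y,-z+1/2

group = close_group([identity, fourfold, twofold])
print(len(group))
```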

The GroupOps object has a couple of functions:

>>> new_ops.is_centric()
False
>>> new_ops.find_centering()
'C'

and, again, it can be used to search in the space group table:

>>> gemmi.find_spacegroup_by_ops(new_ops)
<gemmi.SpaceGroup("C 1 c 1")>

C++ Example

Since the code snippets above were only in Python, here we compensate with one longer example in C++.

#include <cassert>
#include <iostream>

#include <gemmi/symmetry.hpp>

int main() {
  // operators
  gemmi::Op op = gemmi::parse_triplet("-y,x-y,z"); // one operation from P 3
  assert(op * op == gemmi::parse_triplet("-x+y,-x,z")); // and the other one
  assert(op * op == op.inverse());

  // iteration over all tabulated settings
  for (const gemmi::SpaceGroup& sg : gemmi::spacegroup_tables::main)
    std::cout << sg.number << ' ' << sg.xhm() << "   " << sg.hall << '\n';

  // selecting a space group
  const gemmi::SpaceGroup* c2 = gemmi::find_spacegroup_by_number(5);
  assert(c2->xhm() == "C 1 2 1");
  assert(c2->ccp4 == 5);

  const gemmi::SpaceGroup* i2 = gemmi::find_spacegroup_by_name("I2");
  assert(i2->number == 5);
  assert(i2->ccp4 == 4005);
  assert(i2->xhm() == "I 1 2 1");

  gemmi::GroupOps ops = i2->operations();
  for (const gemmi::Op& operation : ops)
    std::cout << "   " << operation.triplet();
  // output:   x,y,z   -x,y,-z   x+1/2,y+1/2,z+1/2   -x+1/2,y+1/2,-z+1/2
  ops.change_basis(gemmi::parse_triplet("x,y,x+z"));
  assert(gemmi::find_spacegroup_by_ops(ops) == c2);

  assert(gemmi::find_spacegroup_by_name("C m m e") ==  // new name with 'e'
         gemmi::find_spacegroup_by_name("C m m a"));
}