Mar 27 2017

The CRYSTALS v14.6236 installer is now available.

This update fixes a number of bugs and eliminates a few mysterious crashes. Please report problems. The minor component of the version number now refers to a specific snapshot of the source code, enabling us to better identify and fix bugs.

Key changes between v14.5841 and 14.6236
Ensure copied text is flushed to clipboard on CRYSTALS exit (otherwise it gets lost).
Upgrade xinvert script to accommodate the 11 enantiomorphic space groups and the 7 which require origin shifts.
Diffin: Set default for Z in case no formula found.
Diffin: Extend filename length to 512 to help when running in folders with long paths
Catch crash on matrix inversion of singular matrices.
Xcif: Disable ‘Include HKLI’ and ‘Include squeeze’ if include files not present in folder.
Xafour: Change limiting density question to a dialog to avoid new user confusion.
Close xpublish window after CIF dialog is dismissed.
Add CHANGE RESIDUE to pop-up menus
Speed up plots of internal vs external sigmas
Fix style and centre of rotation of zoomed objects. Add more ‘zoom’ and ‘unzoom’ options to menus
Improve popup text on model toolbar buttons
Add SWAP directive to #EDIT. This swaps the atomic and thermal parameters of pairs of atoms.
Insert residue numbers of generated hydrogen atoms
Sanity check of parameter values when loading from disk. Replace NaN with 0.0 or 1.0 as appropriate.
Show scale factor for electron densities, fix NaNs in (Fo^2-Fc^2) Pattersons.
Cameron: Keep outline of small atoms at smaller scales.
Added header to design matrix ASCII output
Tabbed plot of Internal vs External variances
Insert appropriate occupancy and part # for H’s adjacent to split groups.
Spot clashing H-serial numbers on the fly and correct.
Increase allocation for lexical list processing (was falling over at 500 CONTINUATION lines, should now stretch to 2000).
Extend scattering factor look-up table out to Rho = 3.0
Enable unusual anomalous scattering factors to be input directly from the Cell/SG tab.
Add code to List 12 processing to catch singularities in rigid body code. Least useful rotational parameters are removed from the least-squares.
Enable LIST 2 to be output in full (#PUNCH 2) or just as the symbol (#PUNCH 2 B). This latter enables CRYSTALS to create an appropriate LIST 14 on re-input, otherwise the user must create it.
Changed handling of reflection sigmas. #LIST 6 now has an extra item, L30UP, which controls whether LIST 30 is updated as reflections are read in.
xregulh left SCPQ open, which meant that xwrite5 failed to delete it, resulting in the H optimisation list of refinable atoms being left in there and used to set up L12 if, and only if, there were no OH or NH atoms to be refined. FIXED.
Context menu entry to swap two atom labels.
Read SHELXT .res files (ignore peak height, the last column in the atom-coordinate list)
Added EXPORT button to FO vs FC graph (csv format).
Extended support for 6-digit H atom serial numbers to xwrite5.scp (not supported everywhere, but here is quite critical).
Improve treatment of A&B parts. Phase shift is included in A&B as they are stored for compatibility with PLATON SQUEEZE, then removed as they are used and the Phase shift applied separately.
Provide clear warning when link for PLATON fails because of a missing LIST
Fix reported cell volume e.s.d for tetragonal and cubic systems (includes correlation of length parameters).
NEW #PUNCH 6 I punches all stored keys.
Use bond list for auto H addition. Use part numbers to generate multiple H conformations on nearby atoms.
Manual ADD H (right-click Add hydrogens) now puts the new H’s into a sensible PART based on connected atom parts. Quite useful.
Allow automatic ‘ride’ of multiple part H’s on same carbon atom (previous version repeated the pivot atom in the constraints list).
Keeping CheckCIF happy: Suppress ESD output on H-bond angle (D-H–A) if the H is riding or fixed.
Add KEYS to reflection statistic plots. Add +/- 10% bands to npp.

Apr 08 2016

The 2016 British Crystallographic Association Spring Meeting took place at the University of Nottingham from 4th to 7th April. Contributions from Chem. Cryst. staff and students were:

Jerome G. P. Wicker, Bill I. F. David & Richard I. Cooper
When will it Crystallise? (Talk in session: From Amorphous to Crystal)

Jo Baker & Richard I. Cooper
Making and Measuring Photoswitchable Materials (Talk in session: Young Crystallographers’ Satellite)

Pascal Parois, Karim J. Sutton & Richard I. Cooper
On the application of leverage analysis to parameter precision using area detector strategies (Poster)

Oliver Robshaw & Richard I. Cooper
The role of molecular similarity in crystal structure packing (Poster)

Katie McInally & Richard I. Cooper
Linking crystallization prediction, theory and experiment using solubility curve determination (Poster)

Richard I. Cooper, Pascal Parois & David J. Watkin
Non-routine single crystal structure analyses using CRYSTALS (Poster)

Alex Mercer & Richard I. Cooper
Fitting Disordered Crystal Structures by Simulated Annealing of an Ensemble Model (Poster)


Mar 07 2016

Acta Cryst. (2016). C72, 261–267. [ doi:10.1107/S2053229616003570 ]

A study of post-refinement absolute structure determination using previously published data was carried out using the CRYSTALS software package. We show that absolute structure determination may be carried out optimally using the analyses available in CRYSTALS, and that it is not necessary to have the separate procedures ‘absolute structure determination’ and ‘no interest in absolute structure’ as proposed by Flack [Chimia (2014), 68, 26–30].

Publisher’s copy

Oct 08 2014

The CRYSTALS v14.5481 installer is now available.

This update fixes usability problems related to new features in the latest series of releases. There may be a problem with OpenGL rendering on ATI graphics cards – please report problems if you see any ‘artefacts’ while viewing crystal structure models.

Versions from 14.60 onwards are built with a new compiler and libraries, so please report any unusual installation or usage problems.

Note the slight change in version numbering (v14.62 -> v14.5481). The minor component now refers to a specific snapshot of the source code, enabling us to better identify and fix bugs in older releases.

See v1460 release for more changes.

Aug 17 2014

The CRYSTALS v1461 installer is now available.

There are still some usability issues in this release. v1462 is expected soon (October 1st).

This is a bugfix release. It fixes the following problems in v1460: a crash on loading large datasets; line-ending issues with the new built-in editor; freeze in Cameron; display issues with atom labels in CRYSTALS.

Version 1460 and this release are built with a new compiler and libraries – therefore please report any installation or usage problems.

See v1460 release for changes.

May 15 2013

Pascal is a senior post-doctoral researcher working on refinement and analysis of diffraction from very short-lived excited-state chemical species. He obtained a PhD with Dr Mark Murrie at Glasgow University studying the effects of pressure on single-molecule magnets, and has since held posts at Utrecht University and the University of Nancy working on software development and time-resolved diffraction. Pascal maintains a personal blog on chemical and crystallographic software matters, and in his spare time enjoys hiking and genealogy.

May 15 2013

Crystallographic structure refinement can involve hundreds of millions of calculations in a single iteration; careful optimisation plays an important role in determining how efficiently the software makes use of the available CPU power. The following freely available tools help identify bottlenecks in software implementations, and allow testing of potentially faster algorithms and compiler options. On recent CPUs, a carefully optimised algorithm can easily be ten times faster than a naïve implementation. Not only does this save time, but it also enables the use of larger data sets and more complicated models to tackle ever more complicated problems. A time-critical portion of code from the crystallographic refinement package CRYSTALS is analysed here.

All the software tools discussed are free and open source, running on the Linux operating system. Some of them are not available on Windows.


The first optimisation step is profiling the execution of the existing code and algorithms. This will reveal exactly how much time is spent in functions, lines of code, and even assembly instructions. Two approaches are common:

  1. Emulating a CPU in software and counting every instruction executed. This is the method used by valgrind, which can also look for memory leaks. Because the CPU is emulated, execution can take up to 200 times longer than normal.
  2. Exploiting hardware counters inside the CPU. These counters are sampled at fixed intervals and recorded. There is almost no performance penalty compared to normal execution, but the sampling can be inaccurate. The perf tool from the Linux kernel uses this approach.

KCacheGrind: the coloured regions correspond to different functions in the software while the area of each corresponds to the time spent in that function during execution.

This example uses the least-squares refinement routine (\sfls) in the crystallographic analysis package CRYSTALS. A decent-sized data set from the journal Acta Crystallographica Section E has been used. The command line version of CRYSTALS (COMPCODE=LIN) was compiled on Linux using the open source compiler gfortran. The executable was then profiled using valgrind and the profiling data were analysed with kcachegrind. The output includes a graphical map (shown below) in which coloured regions correspond to different functions in the software and the area of each region corresponds to the time spent in that function during execution.


The same data has been analysed using the software perf with the following result:

89.94% crystals crystals      [.] adlhsblock_
 3.71% crystals crystals      [.] xchols_
 2.90% crystals crystals      [.] xsflsx_
 0.89% crystals  [.] __expf_finite
 0.44% crystals crystals      [.] xzerof_

In both cases, the profile analysis reveals that about 90% of the time is spent in the adlhsblock function, which is just 21 lines long including declarations. The body of the function is shown below (accumula.F, revision 1.8).

I = 1
do ROW=1, BLOCKdimension  ! Loop over all the rows of the block
    CONST = DERIVS(ROW)   ! Get the constant term
    do COLUMN = ROW, BLOCKdimension
        MATBLOCK(I) = MATBLOCK(I) + CONST*DERIVS(COLUMN) ! Sum on the next term.
        I = I + 1         ! Move to the next position in the matrix
    end do
end do

Instruction level analysis

The adlhsblock function forms the normal matrix from the design matrix; mathematically it performs the matrix multiplication Z^T Z. To save memory, the design matrix is not stored completely and the calculation is done reflection by reflection, accumulating the outer product of each row of Z with itself. Furthermore, only the upper triangle of the normal matrix is stored, which reduces the number of operations but makes for convoluted row/column indexing of the elements.
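
The accumulation described above can be sketched in a few lines. This is a minimal illustration in Python rather than the Fortran used in CRYSTALS; the function names and the small matrix z are invented for the example and are not part of the CRYSTALS source. It adds the outer product of each design-matrix row to a packed upper triangle, one reflection at a time, and checks the result against Z^T Z computed directly.

```python
def accumulate_packed(matblock, derivs):
    """Add the outer product of one design-matrix row (derivs) to the
    packed upper triangle matblock, using a running index as in the
    original loop."""
    n = len(derivs)
    i = 0
    for row in range(n):
        const = derivs[row]
        for column in range(row, n):
            matblock[i] += const * derivs[column]
            i += 1


def normal_matrix_packed(z):
    """Build the packed upper triangle of Z^T Z one reflection at a time,
    so that Z itself never has to be held as a whole by the accumulator."""
    n = len(z[0])
    matblock = [0.0] * (n * (n + 1) // 2)
    for derivs in z:
        accumulate_packed(matblock, derivs)
    return matblock


def unpack_upper(matblock, n):
    """Expand a packed upper triangle into a full symmetric matrix."""
    full = [[0.0] * n for _ in range(n)]
    i = 0
    for a in range(n):
        for b in range(a, n):
            full[a][b] = full[b][a] = matblock[i]
            i += 1
    return full


def normal_matrix_full(z):
    """Reference calculation: the full product Z^T Z computed directly."""
    n = len(z[0])
    return [[sum(r[a] * r[b] for r in z) for b in range(n)] for a in range(n)]


# Hypothetical design matrix: 4 reflections, 3 parameters.
z = [[1.0, 2.0, 3.0],
     [0.5, -1.0, 4.0],
     [2.0, 0.0, -1.0],
     [1.5, 1.0, 0.0]]

packed = normal_matrix_packed(z)
assert unpack_upper(packed, 3) == normal_matrix_full(z)
```

For n parameters the packed array holds n(n+1)/2 elements instead of n^2, which is where the saving in both memory and operations comes from.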

Further investigation of the code profiling within the function indicates that the bottleneck is on the line:

MATBLOCK(I) = MATBLOCK(I) + CONST*DERIVS(COLUMN)

The assembly listing also reveals the use of scalar instructions.

 0.09 │70:  vmovss (%rsi),%xmm0
21.21 │     add $0x4,%rsi
 0.05 │78:  vmulss %xmm0,%xmm1,%xmm0
12.24 │     movslq %ecx,%rcx 
 0.12 │     lea -0x4(%rdx,%rcx,4),%rcx
20.42 │     vaddss (%rcx),%xmm0,%xmm0
13.35 │     vmovss %xmm0,(%rcx)
      │         I = I + 1
20.91 │     mov %eax,%ecx
 0.06 │     add $0x1,%eax

Modern CPUs include two kinds of processing units: scalar units (which process one input at a time) and vector units (which apply the same operation to multiple inputs at the same time). Instructions for the latter are called SIMD (single instruction, multiple data). The performance of SIMD can be outstanding compared to scalar instructions: on a Sandy Bridge Intel processor the vector instructions can operate on up to eight single precision numbers at the same time. Compilers can automatically use these instructions based on patterns in the source code. However, it is advisable to always check that a loop has been vectorised as expected, as some restrictions may apply. The use of scalar instructions here is a symptom of incomplete optimisation.

Optimisation and analysis

The standard optimisation level compiler switch for CRYSTALS in the Linux makefile is ‘-O2’, which does not include autovectorisation (autovectorisation is enabled at the ‘-O3’ level). CRYSTALS was therefore recompiled with autovectorisation enabled (-ftree-vectorize -msse2), but this did not give any speed-up. Applying the flag (-ftree-vectorizer-verbose) and checking the output during compilation confirmed that no loop had been vectorised. To improve the situation, the inner loop was replaced with array operations and the recursive dependency on the indices was removed:

do ROW=1, BLOCKdimension
   i = ((row-1)*(2*blockdimension-row+2))/2+1
   j = i + blockdimension - row
   MATBLOCK(i:j) = MATBLOCK(i:j)+DERIVS(ROW)* derivs(row:BLOCKdimension)
end do
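
The closed-form index is what removes the dependency of each row on the running counter. As a rough check, the sketch below (Python, with 1-based indices kept to mirror the Fortran; the function names are invented for the illustration) compares the closed-form row start with the value the original counter would have reached, and then applies the row-slice update to a small example.

```python
def row_start(row, n):
    """1-based position of the first element of `row` in the packed
    upper triangle, using the closed form from the rewritten loop."""
    return ((row - 1) * (2 * n - row + 2)) // 2 + 1


def row_end(row, n):
    """1-based position of the last element of `row`."""
    return row_start(row, n) + n - row


def starts_by_counting(n):
    """Row starts as produced by the running counter I of the original loop."""
    starts = []
    i = 1
    for row in range(1, n + 1):
        starts.append(i)
        i += n - row + 1  # the inner loop advances I once per column
    return starts


def accumulate_by_slices(matblock, derivs):
    """Row-by-row update using slices; each row is independent of the others."""
    n = len(derivs)
    for row in range(1, n + 1):
        i, j = row_start(row, n), row_end(row, n)
        const = derivs[row - 1]
        # Python slices are 0-based and exclude the end, hence [i-1:j].
        matblock[i - 1:j] = [m + const * d
                             for m, d in zip(matblock[i - 1:j], derivs[row - 1:])]


# The closed form agrees with the running counter for a range of block sizes.
for n in range(1, 50):
    assert [row_start(r, n) for r in range(1, n + 1)] == starts_by_counting(n)
    assert row_end(n, n) == n * (n + 1) // 2

block = [0.0] * 6
accumulate_by_slices(block, [1.0, 2.0, 3.0])
assert block == [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

Because (row-1) and (2*blockdimension-row+2) always have opposite parity, their product is even and the integer division by 2 is exact.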

The new version has been compared to the original at different levels of optimisation:

Compilation flags              Original code (wall clock time, s)   New code (wall clock time, s)
-O2                            16                                   12
-O2 -ftree-vectorize -msse2    16                                   6.7
-O2 -ftree-vectorize -mavx     16                                   5.0
The improvement without vectorisation (16 s to 12 s) is surprising. Because each cycle of the loop is now independent, the scheduler may be able to exploit the greater flexibility to reorder instructions for greater efficiency. When using sse2 or avx instructions the new version is much faster still. The doubled width of the avx vectors compared to sse2 is also clearly visible.
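
The speed-ups implied by these timings can be worked out directly. The short Python sketch below uses only the wall clock times quoted above; the dictionary layout and the function name are just for illustration.

```python
# Wall clock times in seconds: (original code, new code), as quoted above.
timings = {
    "-O2": (16.0, 12.0),
    "-O2 -ftree-vectorize -msse2": (16.0, 6.7),
    "-O2 -ftree-vectorize -mavx": (16.0, 5.0),
}


def speedups(timings):
    """Ratio of original to new wall clock time for each set of flags."""
    return {flags: original / new for flags, (original, new) in timings.items()}


for flags, factor in speedups(timings).items():
    print(f"{flags:30s} {factor:.1f}x")
# -O2                            1.3x
# -O2 -ftree-vectorize -msse2    2.4x
# -O2 -ftree-vectorize -mavx     3.2x
```

The largest ratio, 3.2, is for the avx build of the new code.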

The new code was profiled using perf and compared to the original. The bottleneck remains in the adlhsblock function, but the assembly output confirms the use of vectorised avx instructions (vmulps and vaddps, for example).

85.75% crystals crystals     [.] adlhsblock_
 5.40% crystals crystals     [.] xchols_
 4.40% crystals crystals     [.] xsflsx_
 1.30% crystals [.] __expf_finite
 0.61% crystals crystals     [.] xzerof_
      |         MATBLOCK(i:j) = MATBLOCK(i:j)+DERIVS(ROW)*derivs(row:BLOCKdimension)
 2.35 |15a: vmovup (%r11,%rcx,1),%xmm1
 8.42 |     add $0x1,%r8
 4.73 |     vinser $0x1,0x10(%r11,%rcx,1),%ymm1,%ymm1
 7.91 |     vmulps %ymm2,%ymm1,%ymm1
 8.82 |     vaddps (%r14,%rcx,1),%ymm1,%ymm1
41.82 |     vmovap %ymm1,(%r14,%rcx,1)
12.10 |     add $0x20,%rcx 
 2.62 |     cmp %r8,%r13
      |   ? ja 15a


Using code profiling to identify a bottleneck, followed by optimisation of the algorithm and an appropriate choice of compiler switches, results in a least-squares refinement that is up to three times faster.

Sep 21 2012

J. Appl. Cryst. (2012). 45, 1057–1060. [ doi:10.1107/S0021889812035790 ]

The traditional Waser distance restraint, the rigid-bond restraint and atomic displacement parameter (ADP) similarity restraints have an equal influence on both atoms involved in the restraint. This may be inappropriate in cases where it can reasonably be expected that the precision of the determination of the positional parameters and ADPs is not equal, e.g. towards the extremities of a librating structure or where one atom is a significantly stronger scatterer than the other. In these cases, the traditional restraint feeds information from the poorly defined atom to the better defined atom, with the possibility that its characteristics become degraded. The modified restraint described here feeds information from the better defined atom to the more poorly defined atom with minimal feedback.

Electronic reprints

Publisher’s copy