TeraFLOPS


Message boards : Science : TeraFLOPS


I noticed that your server status page lists 225 teraFLOPS as the compute speed. I find this very odd, since many fast cards, such as the 1080 Ti, are each capable of 10+ teraFLOPS in single precision. This leads me to believe that this project is entirely double precision, for which NVIDIA consumer GPUs are not known for having very high rates (1/32 of single precision). Can you explain what the application itself is actually doing, what the input data consists of, and what the output is?

ID: 461

This project uses integer (int64) calculations.

ID: 464

Double-precision floating-point operations are not used on the GPU. Below is information on the input and output data of the apps, as well as the computational algorithm.

Input data:
1. An XML file containing the unit cell configuration (extracted from a CIF file) and the parameters of the simulation.
2. A text file containing the x-ray atomic form factors (for x-ray diffraction) or neutron scattering lengths (for neutron diffraction). These data are provided by the periodictable Python module.

Output data:
1. A stdout file containing some information on the performed calculations, GPU properties (for the GPU apps), and errors (if any).
2. A file with the powder diffraction pattern. It contains a two-column array: the first column is the scattering vector magnitude and the second column is the scattering intensity (it could actually be one column, because the first column is the same for all WUs).

The calculation algorithm consists of 3 steps:
1. calculating the atomic ensemble using the lattice parameters and unit cell configuration given in the XML file,
2. calculating the histogram of interatomic distances for this atomic ensemble,
3. calculating the scattering intensity using the histogram of interatomic distances.

The first step is always performed on the CPU. Here, most of the intermediate operations are double precision, but the final atomic ensemble is a 32-bit float array. The second step is performed on the GPU in the GPU apps.
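To illustrate the histogram step (step 2), here is a minimal NumPy sketch. This is not the project's actual code (the real GPU apps update the bins with 64-bit integer atomic operations in CUDA/OpenCL); the function and parameter names are hypothetical.

```python
import numpy as np

def distance_histogram(coords, bin_width, n_bins):
    """Build a 64-bit integer histogram of pairwise interatomic distances.

    coords: (N, 3) float32 array of atomic positions (the step-1 output).
    Returns an int64 array of bin counts, mirroring the app's use of
    64-bit integer bins.
    """
    hist = np.zeros(n_bins, dtype=np.int64)
    n = len(coords)
    for i in range(n):
        # Distances from atom i to all later atoms (each pair counted once).
        d = np.linalg.norm(coords[i + 1:] - coords[i], axis=1)
        idx = (d / bin_width).astype(np.intp)
        np.add.at(hist, idx[idx < n_bins], 1)
    return hist

# Tiny example: 3 atoms on a line at x = 0, 1, 2.
coords = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=np.float32)
hist = distance_histogram(coords, bin_width=0.5, n_bins=8)
# Pair distances are 1 (twice) and 2 (once), so bins 2 and 4 are filled.
```

The int64 bins matter because a large ensemble has on the order of N²/2 pairs, which can overflow 32-bit counters.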
The distances are calculated using floating-point operations, of course, but the histogram of interatomic distances is a 64-bit integer array. Updating the bins of this histogram requires atomic operations on 64-bit integers in global memory. This step takes most of the computational time. The third step is also performed on the GPU in the GPU apps. Here, most of the calculations use floating-point operations. Thanks to the intrinsic trigonometry functions, this step is performed very fast on GPUs. You can also see my presentation from the BOINC:FAST 2017 conference if you want more details. According to this page, the GigaFLOPS number shown on the status page is calculated using the simple formula: (total RAC)/200. So this is a very approximate value. Also, most of the users do not crunch for this project all day long.
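Step 3 can likewise be sketched with a Debye-equation-style sum over the histogram bins, I(q) ~ Σⱼ H[j]·sin(q·rⱼ)/(q·rⱼ). This is only an illustration with unit form factors; the actual apps weight the sum by the x-ray form factors or neutron scattering lengths, and all names here are hypothetical.

```python
import numpy as np

def intensity_from_histogram(hist, bin_width, q):
    """Scattering intensity from a distance histogram (Debye-style sum).

    hist: int64 array of pair counts per distance bin (the step-2 output).
    q: array of scattering vector magnitudes.
    Returns I(q), one intensity value per q.
    """
    # Midpoint distance of each histogram bin.
    r = (np.arange(len(hist)) + 0.5) * bin_width
    qr = np.outer(q, r)                 # shape (n_q, n_bins)
    # np.sinc(x) = sin(pi*x)/(pi*x), so sinc(qr/pi) = sin(qr)/qr.
    sinc = np.sinc(qr / np.pi)
    return sinc @ hist                  # sum over bins for each q

# Histogram from the 3-atom example: 2 pairs in bin 2, 1 pair in bin 4.
hist = np.array([0, 0, 2, 0, 1, 0, 0, 0], dtype=np.int64)
q = np.array([0.0, 1.0, 2.0])
I = intensity_from_histogram(hist, bin_width=0.5, q=q)
```

At q = 0 every sinc term is 1, so I(0) equals the total pair count, which is a quick sanity check on an implementation like this. The heavy trigonometry in this sum is why the intrinsic GPU trig functions mentioned above make step 3 so fast.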

ID: 465

Thank you for your well-written response.

ID: 466


© 2021 Vladislav Neverov (NRC 'Kurchatov institute'), Nikolay Khrapov (Institute for Information Transmission Problems of RAS)