Bug fixing and changes in GPU plan classes

Message boards : News : Bug fixing and changes in GPU plan classes

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 287
Credit: 103,382
RAC: 0
Message 36 - Posted: 19 Nov 2016, 18:54:11 UTC
Last modified: 19 Nov 2016, 22:40:54 UTC

Two bugs were fixed thanks to a report from Steve Hawker*:

    a bug preventing the mt_mac app version from running on Mac hosts with no GCC runtime,

    a bug preventing all OpenCL versions from querying some old AMD GPUs if they can’t report their wavefront width value (CL_DEVICE_WAVEFRONT_WIDTH_AMD).

Also, to address this issue, all GPU versions now require a dedicated CPU core. Please let me know if this causes problems.

zombie67 [MM]
Joined: 14 Nov 16
Posts: 2
Credit: 10,919,717
RAC: 0
Message 38 - Posted: 20 Nov 2016, 3:42:59 UTC
Last modified: 20 Nov 2016, 3:44:00 UTC

First: Thanks for the Intel HD support! Especially for OSX. That is kinda rare for projects to provide that.

A couple of requests:

1) MT CPU tasks are limited to 16 threads. Any reason to limit it to 16? With my 24 and 56 thread machines, that leaves threads idle, because they aren't multiples of 16. It would be nice if the app used all threads. Or maybe make the multiple user-selectable? Or maybe make them use a smaller max per task, say 8?

2) Either work generation is very slow, or there is some artificial limit on the number of tasks given to a machine at any given time. Because so few tasks are downloaded and they finish so quickly, machines spend a lot of idle time waiting for work.
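
The thread arithmetic behind request 1 can be sketched as follows. This is only an illustration under a simplifying assumption (not confirmed in this thread) that the client runs as many full-width multithreaded tasks as fit and leaves the remainder unused; the function name `idle_threads` is hypothetical.

```python
def idle_threads(host_threads, max_threads_per_task):
    """Threads left unused when only full-width MT tasks are scheduled.

    Assumption: each task uses min(host_threads, max_threads_per_task)
    threads, and no task is started with fewer threads than that.
    """
    threads_per_task = min(host_threads, max_threads_per_task)
    running_tasks = host_threads // threads_per_task
    return host_threads - running_tasks * threads_per_task


for host in (24, 56):
    for cap in (16, 8, 64):
        print(f"{host} threads, cap {cap}: {idle_threads(host, cap)} idle")
```

Under this assumption a cap of 16 leaves 8 threads idle on both the 24-thread and the 56-thread machine, while a cap of 8 or a cap at or above the host's thread count leaves none.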
____________
Dublin, California
Team: SETI.USA

[VENETO] boboviz
Joined: 16 Nov 16
Posts: 37
Credit: 1,140,113
RAC: 0
Message 39 - Posted: 20 Nov 2016, 7:47:06 UTC - in response to Message 36.

Also, to address this issue, all GPU versions now require a dedicated CPU core. Please let me know if this causes problems.


Just out of curiosity:
I tried the GPU app with and without a dedicated CPU core and saw no big difference in execution time (unlike the user in the thread). Have you seen any difference on the server?

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 287
Credit: 103,382
RAC: 0
Message 40 - Posted: 20 Nov 2016, 10:03:47 UTC - in response to Message 38.

First: Thanks for the Intel HD support! Especially for OSX. That is kinda rare for projects to provide that.

A couple of requests:

1) MT CPU tasks are limited to 16 threads. Any reason to limit it to 16? With my 24 and 56 thread machines, that leaves threads idle, because they aren't multiples of 16. It would be nice if the app used all threads. Or maybe make the multiple user-selectable? Or maybe make them use a smaller max per task, say 8?

2) Either work generation is very slow, or there is some artificial limit on the number of tasks given to a machine at any given time. Because so few tasks are downloaded and they finish so quickly, machines spend a lot of idle time waiting for work.

First of all, thank you for participating!

Regarding your requests:

1) I've increased the limit of CPU threads to 64, so let’s see whether this causes problems. The reason for limiting the number of CPU threads is memory consumption. In the current implementation of the algorithm, each CPU thread writes to its own copy of the histogram array, which avoids synchronized writes but makes the total memory used proportional to the number of CPU threads. The memory used by one thread for this array is proportional to the linear size of the crystallite and to the square of the total number of different chemical elements in the sample. I estimate it at about 100 MB in the worst case, so the total memory consumption would be about 5.6 GB for a machine with 56 threads.

2) The number of WUs generated per hour is severely limited at the alpha stage (WUs are generated every 5 minutes). There are two reasons for that:

(a) the current server (an Amazon t2.micro instance) is simply unable to process more results;
(b) the WUs are multi-sized, but the MultiSize feature is not implemented yet. As [VENETO] boboviz suggested, increasing the number of WUs will overload the server if the scheduler is not optimized.

My plan is to implement MultiSize first, test that it works properly, then upgrade the server to a c4.large instance (the transition should be easy, but let’s see) and increase the number of WUs generated per hour by a factor of 10 for a couple of days. After that, the alpha testing will be over and the project will be suspended for long-term maintenance. If the c4.large performs well enough, it will be used for this project after the restart.
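
The memory trade-off described in point 1 can be illustrated with a minimal sketch. It is not the project's actual code: the function names are hypothetical, the 100 MB per-thread figure is the worst-case estimate quoted above, and the input is assumed to be a list of integer bin indices. Each worker fills a private histogram, so no locking is needed, at the cost of one histogram per thread.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_total_memory_mb(n_threads, per_thread_mb=100):
    """Total histogram memory when every thread owns a private copy."""
    return n_threads * per_thread_mb


def private_histograms(bin_indices, n_bins, n_threads):
    """Count bin indices using one private histogram per worker.

    Workers never write to shared state; the private histograms are
    summed only after all workers have finished.
    """
    # Split the input into n_threads interleaved chunks.
    chunks = [bin_indices[i::n_threads] for i in range(n_threads)]

    def fill(chunk):
        hist = [0] * n_bins  # private to this worker
        for b in chunk:
            hist[b] += 1
        return hist

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(fill, chunks))

    # Merge step: element-wise sum of the private histograms.
    return [sum(column) for column in zip(*partials)]


print(estimate_total_memory_mb(56))  # 56 threads at 100 MB each
print(private_histograms([0, 1, 1, 2, 2, 2], n_bins=3, n_threads=2))
```

With 56 threads at 100 MB each the estimate is 5600 MB, which matches the 5.6 GB figure above. A single shared histogram would use far less memory but would require synchronized writes.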

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 287
Credit: 103,382
RAC: 0
Message 41 - Posted: 20 Nov 2016, 10:13:18 UTC - in response to Message 39.

Also, to address this issue, all GPU versions now require a dedicated CPU core. Please let me know if this causes problems.


Just out of curiosity:
I tried the GPU app with and without a dedicated CPU core and saw no big difference in execution time (unlike the user in the thread). Have you seen any difference on the server?

No, I've not noticed any difference so far.

zombie67 [MM]
Joined: 14 Nov 16
Posts: 2
Credit: 10,919,717
RAC: 0
Message 43 - Posted: 20 Nov 2016, 14:58:35 UTC - in response to Message 40.

1) I've increased the limit of CPU threads to 64, so let’s see whether this will cause problems or not


Thanks again! Seems to be working.
____________
Dublin, California
Team: SETI.USA

[VENETO] boboviz
Joined: 16 Nov 16
Posts: 37
Credit: 1,140,113
RAC: 0
Message 44 - Posted: 21 Nov 2016, 9:39:02 UTC - in response to Message 41.
Last modified: 21 Nov 2016, 9:39:22 UTC

I tried the GPU app with and without a dedicated CPU core and saw no big difference in execution time (unlike the user in the thread). Have you seen any difference on the server?

No, I've not noticed any difference so far.


So please remove the CPU core reservation.
If a GPU needs a core, users can set up their own anonymous platform.

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 287
Credit: 103,382
RAC: 0
Message 49 - Posted: 21 Nov 2016, 15:41:28 UTC - in response to Message 44.

So please remove the CPU core reservation.
If a GPU needs a core, users can set up their own anonymous platform.

Well, I'm not sure that the problem is GPU-related. In some BOINC applications, the CPU code is so heavy that running any other task in parallel is almost impossible. I think the problem occurs when the GPU app runs in parallel with one of those applications.
At least for debugging, I need to minimize the number of possible failures.



© 2019 Vladislav Neverov (NRC 'Kurchatov institute'), Nikolay Khrapov (Institute for Information Transmission Problems of RAS)