Only 1 of 2 Nvidia GPU's running

Message boards : Nvidia : Only 1 of 2 Nvidia GPU's running

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 189 - Posted: 21 Jun 2017, 20:57:00 UTC
Last modified: 21 Jun 2017, 21:03:04 UTC

As the title says, I have 2 Nvidia GTX 970s in this box but only 1 is crunching work. Win7 x64 + BOINC 7.6.33.

In BOINC Manager the two work units look to have been assigned to the two different devices, and both work units seem to be crunching away happily.

But what's really happening is that both work units are crunching on just one GPU. According to Afterburner and GPU-Z, only one GPU is doing any work.

I even tried putting in an app_config like this...

<app_config>
    <app>
        <name>xansons_gpu</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

But all it did was run 4 work units on the one GPU?
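
For reference, a rough Python sketch of the arithmetic, assuming BOINC runs 1/gpu_usage tasks on each usable GPU (concurrent_tasks is a made-up helper, not part of BOINC). With 0.5 on two cards that comes to 4 tasks in total, which matches the count seen here. The setting changes how many tasks share a GPU, not which GPU each task is sent to.

```python
def concurrent_tasks(gpu_usage, n_gpus):
    """Return (tasks per GPU, tasks in total) for a given gpu_usage value.

    Assumption: each task reserves `gpu_usage` of one GPU, so a GPU
    can hold int(1 / gpu_usage) tasks at once.
    """
    per_gpu = int(1 / gpu_usage)
    return per_gpu, per_gpu * n_gpus


# The app_config above (gpu_usage 0.5) on a box with two GPUs
print(concurrent_tasks(0.5, 2))
```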

Anyone else seeing this??

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 190 - Posted: 21 Jun 2017, 21:50:21 UTC - in response to Message 189.

As the title says I have 2 Nvidia GTX 970's in this box but only 1 is crunching the work. win7 x64 + Boinc 7.6.33

The CUDA app gets the GPU number from the --device N option passed by the client to the app. If this option is not passed, device 0 is used. I checked in standalone mode that the app reads this option correctly, but never checked whether the client actually passes it. I'll check this.
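
To illustrate the behaviour described here, a minimal Python sketch (not the project's actual CUDA/C code; select_device is a hypothetical name). It shows why every task ends up on device 0 if the client never passes the option.

```python
import sys


def select_device(argv):
    """Return the GPU index given after --device, or 0 when the option is absent."""
    for i, arg in enumerate(argv):
        if arg == "--device" and i + 1 < len(argv):
            return int(argv[i + 1])
    # Fallback described in the post: no --device means device 0
    return 0


if __name__ == "__main__":
    print("Selected CUDA device:", select_device(sys.argv[1:]))
```

With two tasks started without --device, both calls return 0, so both tasks share the same card.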

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 191 - Posted: 21 Jun 2017, 22:24:42 UTC - in response to Message 190.

Thanks Vlad..

Jim1348
Joined: 17 Nov 16
Posts: 9
Credit: 241,545
RAC: 0
Message 192 - Posted: 22 Jun 2017, 1:35:39 UTC - in response to Message 189.
Last modified: 22 Jun 2017, 1:39:07 UTC

You might try a cc_config.xml file, placed in the BOINC Data directory:

<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
    </options>
</cc_config>

I have needed it in the past with two cards, but I don't recall whether it was needed for Nvidia or AMD cards.

Profile [B@P] Daniel
Joined: 18 Jun 17
Posts: 25
Credit: 47,963,162
RAC: 0
Message 195 - Posted: 22 Jun 2017, 6:20:39 UTC - in response to Message 192.

As I remember, if you have a few different GPUs, BOINC will by default use only the fastest one, i.e. crunch only one GPU task at a time. This option tells it to use the slower one(s) as well. You can try it, but I doubt it will help here.

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 196 - Posted: 22 Jun 2017, 6:20:48 UTC - in response to Message 192.
Last modified: 22 Jun 2017, 6:47:51 UTC

Hi Jim & Daniel, yeah, all my rigs already have this setting, and the cards are identical too, but thanks for trying.

This seems to happen only with xansons GPU tasks. If I suspend xansons and start 2 Einstein tasks, they run properly and both GPUs are loaded and fully utilised.

Can anyone confirm that they have 2 or more of the same Nvidia graphics cards that are all crunching properly?
As I mentioned, if I look in BOINC Manager the tasks seem to be running as they should, but when I look at GPU usage in any GPU monitoring app such as Afterburner, only one GPU shows any activity, meaning both work units are running on the same GPU.

There must be someone who can confirm whether their rig with 2 (or more) of the same Nvidia GPUs is working properly or not? Anyone?

mmonnin
Joined: 28 Nov 16
Posts: 19
Credit: 5,313,490
RAC: 0
Message 200 - Posted: 22 Jun 2017, 10:07:16 UTC - in response to Message 196.

I have used a 1070 and 970 in the same system before on several projects and haven't had issues. E@H being one. Just the cc_config file is needed. I don't have two GPUs in the one system both crunching BOINC atm though.

In BM, does one task say 0.987 CPU + 1NV (d0) and the other say (d1)?

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 202 - Posted: 22 Jun 2017, 13:08:26 UTC - in response to Message 200.
Last modified: 22 Jun 2017, 13:27:50 UTC

Hey mmonnin how goes it?

My BM says the same, i.e. 0.987 CPU + 1NV (d0) and 0.987 CPU + 1NV (d1).

Which is kinda strange, because in this box (d0) = physical PCI-e slot 2 and (d1) = physical slot 1 (closest to the CPU).

So it appears that the GPU in (d0) (the 2nd physical slot) is doing all the work, while (d1) (physical slot 1) is doing nothing at all. Weird!!

Although BOINC probably assigns ID numbers by logical PCI IDs, which have little to do with physical location... whatever it's doing, it's starting to confuse the heck out of me. I think I know less now than when I started looking. LOL

The same thing happened years ago at POEM@Home. Their GPU app had the same problem, and for many months users could only use one GPU no matter how many they had. Eventually, with the help of a new dev, they figured it out. Maybe their dev could be contacted?

Or maybe I'm jumping the gun; so far no one else has confirmed that they even have this problem.

So I'm still after anyone else with, say, 2 960s, 2 970s, 2 980s, 2 1070s, etc. to see if they are having this problem as well. Even people with 3 or 4 Nvidia GPUs of the same kind in the same box would be very revealing. Just check your GPU usage for each GPU being used.

Any help would be greatly appreciated.

Vlad
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 203 - Posted: 22 Jun 2017, 15:46:22 UTC - in response to Message 202.

Or maybe I'm jumping the gun; so far no one else has confirmed that they even have this problem.

For my part, I can do the following:
1) Redirect the command line arguments passed to the app to the log, so you'll see (on the result's page) what GPU number was passed to the app via --device.
2) Include in the log the selected GPU number and the total number, ngpu, of CUDA devices obtained with cudaGetDeviceCount(&ngpu).
So instead of:
Selected CUDA device:
you'll see
Selected CUDA device #0 of 1:

This should help, I think.

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 206 - Posted: 22 Jun 2017, 16:03:11 UTC - in response to Message 203.

Good one,

We'll see what happens now,

whatever happens thank you for the help Vlad :)

Vlad
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 208 - Posted: 22 Jun 2017, 19:42:46 UTC - in response to Message 206.

OK, I got it. The BOINC client no longer passes the --device argument to the app, so this page is outdated. I should get the device number from the APP_INIT_DATA structure instead, as it is said here. It's strange that the BOINC documentation contains two contradictory pages.
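
A minimal Python sketch of that lookup order, for illustration only. In a real app the BOINC API fills the APP_INIT_DATA structure; here a small XML string stands in for it. The element name gpu_device_num, the sample XML, and the function name are assumptions, not taken from the project's code.

```python
import xml.etree.ElementTree as ET

# Stand-in for the data the client hands to the app (assumed layout)
SAMPLE_INIT_DATA = (
    "<app_init_data>"
    "<gpu_type>NVIDIA</gpu_type>"
    "<gpu_device_num>1</gpu_device_num>"
    "</app_init_data>"
)


def device_from_init_data(xml_text, argv=()):
    """Prefer the device number from the init data; fall back to --device, then 0."""
    node = ET.fromstring(xml_text).find("gpu_device_num")
    if node is not None and node.text is not None:
        num = int(node.text.strip())
        if num >= 0:  # a negative value is treated as "not set"
            return num
    args = list(argv)
    for i, arg in enumerate(args):
        if arg == "--device" and i + 1 < len(args):
            return int(args[i + 1])
    return 0


print(device_from_init_data(SAMPLE_INIT_DATA))
```

Checking the init data first means the app still works with a client that passes only --device.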

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 209 - Posted: 22 Jun 2017, 20:37:44 UTC - in response to Message 206.
Last modified: 22 Jun 2017, 20:43:59 UTC

Nope same problem :(

This is the original PC with 2x GTX 970's.. http://xansons4cod.com/xansons4cod/show_host_detail.php?hostid=782

This is the latest PC I just attached that has 2 x GTX 580's. It's completely different to the other PC and is much older too. Still works fine though.. http://xansons4cod.com/xansons4cod/show_host_detail.php?hostid=794

Both of these PCs now have the same problem as mentioned in previous posts. It's back to the drawing board, I'm afraid.

All my rigs are dedicated full-time crunchers, so if I can't find a fix within the next day or so I'll try using an exclude statement in cc_config. That way I could at least get the non-functioning GPUs back for other projects. Well, that's the plan anyway.
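
For anyone wanting to try the same workaround, here is a minimal Python sketch that generates such a cc_config. It assumes the client's exclude_gpu option with url and device_num children (check the BOINC client configuration documentation for your version), that the project URL matches the host links in this post, and that d1 is the idle card. build_exclude_config is just an illustrative name.

```python
import xml.etree.ElementTree as ET


def build_exclude_config(project_url, device_num):
    """Build a cc_config that keeps one GPU away from one project."""
    cc = ET.Element("cc_config")
    options = ET.SubElement(cc, "options")
    ET.SubElement(options, "use_all_gpus").text = "1"
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    return ET.tostring(cc, encoding="unicode")


# Assumed project URL and device number, for illustration only
print(build_exclude_config("http://xansons4cod.com/xansons4cod/", 1))
```

The output would go into cc_config.xml in the BOINC Data directory; the excluded card then stays free for other projects.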

I have two other PCs attached. One has just one GPU, so it's fine, and the other is a mixed rig with two completely different GPUs (an AMD HD 7970 + a 1080 Ti), and it works fine too.

But in the end, why would two completely different PCs behave the same? The only similarity is that they both have a pair of identical Nvidia GPUs in them.

Anyhoo, it's time for bed here, so I'll get back to it later. Any thoughts would be thankfully received.

Cheers..
____________


Crunching today for a better today.

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 210 - Posted: 22 Jun 2017, 21:09:26 UTC - in response to Message 209.
Last modified: 22 Jun 2017, 21:14:23 UTC

Whoo hoo, we have a winner..

The latest app release, v1.02, fixed the above-mentioned problems.

Thanks Vlad :)

Vlad
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 211 - Posted: 22 Jun 2017, 21:12:44 UTC - in response to Message 209.

Nope same problem :(

No, I think it's working. Check out these two results:
http://xansons4cod.com/xansons4cod/result.php?resultid=1571856 device #0
http://xansons4cod.com/xansons4cod/result.php?resultid=1570978 device #1
The difference in computation time is caused by the different sizes of the WUs: "...solid material..." WUs are much harder to compute than "...single_particle..." ones.

Vlad
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 212 - Posted: 22 Jun 2017, 21:14:06 UTC - in response to Message 210.
Last modified: 22 Jun 2017, 21:14:33 UTC

Thanks for reporting this! All multi-GPU Nvidia users were affected.

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 213 - Posted: 22 Jun 2017, 21:53:25 UTC - in response to Message 212.
Last modified: 22 Jun 2017, 21:58:51 UTC

Good detective work there Vlad, enjoy your victory..

I'm just glad it wasn't a problem that was only on my systems ;)

mmonnin
Joined: 28 Nov 16
Posts: 19
Credit: 5,313,490
RAC: 0
Message 215 - Posted: 23 Jun 2017, 2:42:22 UTC - in response to Message 212.

I must say, it's great to see such an active project admin. I'd much rather participate in a project where questions get answered and there is someone around to fix/update the project. Sometimes it's disheartening to see questions go unanswered or problems unsolved for months at a time at other projects.

Cheers!

JugNut
Joined: 21 Jun 17
Posts: 23
Credit: 64,280,261
RAC: 284
Message 221 - Posted: 24 Jun 2017, 8:20:10 UTC - in response to Message 215.

Well said mmonnin, I couldn't agree more.

ChristianVirtual
Joined: 9 Aug 17
Posts: 1
Credit: 146,461
RAC: 0
Message 407 - Posted: 11 Aug 2017, 4:41:46 UTC

I never get any NVIDIA WUs on my dual-GPU rig (980 Ti / 1080 Ti) under CentOS with CUDA 8.

Both GPUs work well with other applications like GPUgrid or SETI.

From the event log
Fri 11 Aug 12:50:01 2017 | XANSONS for COD | Sending scheduler request: To fetch work.
Fri 11 Aug 12:50:01 2017 | XANSONS for COD | Requesting new tasks for NVIDIA GPU
Fri 11 Aug 12:50:03 2017 | XANSONS for COD | Scheduler request completed: got 0 new tasks


CPU-WUs are assigned and crunched ok.

Is this linked?

Vlad
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 408 - Posted: 11 Aug 2017, 11:10:29 UTC - in response to Message 407.
Last modified: 11 Aug 2017, 12:17:31 UTC

I do not know why you are not getting WUs, but I don't think it is connected with the dual-GPU system. On CentOS 7 with Linux kernel 3.10 you should be able to get the OpenCL app (the CUDA app works only on kernel 3.16 or later). So your plan class is opencl_nvidia_102_linux. Here is the plan class configuration:
<plan_class>
    <name>opencl_nvidia_102_linux</name>
    <gpu_type>nvidia</gpu_type>
    <opencl/>
    <min_core_client_version>70000</min_core_client_version>
    <min_opencl_version>102</min_opencl_version>
    <min_gpu_ram_mb>512</min_gpu_ram_mb>
    <gpu_peak_flops_scale>.99</gpu_peak_flops_scale>
    <cpu_frac>.05</cpu_frac>
    <min_os_version>30700</min_os_version>
    <max_os_version>31599</max_os_version>
</plan_class>

Your host meets the requirements.
If you use app_config.xml, check your plan class: it must be opencl_nvidia_102_linux and not cuda65_linux. Also check your project preferences and make sure that receiving jobs for the xansons_boinc_gpu app is enabled.
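
As a quick way to check a host against the OS version limits in the plan class, here is a small Python sketch. It assumes the limits encode a kernel version as major*10000 + minor*100 + patch, which is only inferred from the numbers in this post (30700 for 3.7, 31599 for just below 3.16). The function names are illustrative.

```python
def encode_kernel_version(major, minor, patch=0):
    """Encode a kernel version the way the plan class limits appear to (assumption)."""
    return major * 10000 + minor * 100 + patch


def in_plan_class_range(version, min_os=30700, max_os=31599):
    """True if the encoded version lies inside the opencl_nvidia_102_linux limits."""
    return min_os <= version <= max_os


# CentOS 7 ships kernel 3.10, which falls inside the OpenCL plan class range
centos7 = encode_kernel_version(3, 10)
print(centos7, in_plan_class_range(centos7))
```

A 3.16 kernel encodes to 31600 under the same assumption, just above max_os_version, which fits the statement that the CUDA app takes over from 3.16 onward.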


© 2019 Vladislav Neverov (NRC 'Kurchatov institute'), Nikolay Khrapov (Institute for Information Transmission Problems of RAS)