No tasks sent to Linux CentOS 7 hosts

Profile [B@P] Daniel
Joined: 18 Jun 17
Posts: 25
Credit: 47,963,162
RAC: 0
Message 147 - Posted: 19 Jun 2017, 18:59:51 UTC
Last modified: 19 Jun 2017, 19:00:02 UTC

I have two CentOS 7 Linux hosts attached, one of them with a GPU. For some reason they do not get any work. One of them runs tasks from a backup project only, so it should get something. I also have a Windows host attached, and that one gets tasks. Please take a look at this.
____________

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 148 - Posted: 19 Jun 2017, 19:35:05 UTC - in response to Message 147.
Last modified: 19 Jun 2017, 19:40:05 UTC

The Linux apps were built on Debian 8 with the 3.16 Linux kernel. For compatibility reasons, this kernel version is set as the minimum supported one in all Linux plan classes for this project.
I think CentOS 7 is based on the 3.10 kernel.
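
The effect of such a minimum-version rule can be illustrated with a short sketch. This is not the BOINC scheduler's code: the function names `kernel_tuple` and `meets_minimum` are made up for illustration, and only the version strings come from this thread.

```python
import re

def kernel_tuple(release):
    """Return the leading numeric parts of a kernel release as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing third component (as in "3.16") counts as 0.
    return tuple(int(part) if part else 0 for part in match.groups())

def meets_minimum(release, minimum="3.16"):
    """Hypothetical check: is the host kernel at least the minimum version?"""
    return kernel_tuple(release) >= kernel_tuple(minimum)

# CentOS 7 kernel reported later in this thread.
print(meets_minimum("3.10.0-514.21.1.el7.x86_64"))
# A kernel from the 3.16 series used for the Debian 8 build.
print(meets_minimum("3.16.0"))
```

The parts are compared as integers on purpose: a plain string comparison would rank "3.9" above "3.16".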

Profile [B@P] Daniel
Message 151 - Posted: 19 Jun 2017, 20:24:49 UTC - in response to Message 148.
Last modified: 19 Jun 2017, 20:42:43 UTC

Yes, they are on kernel 3.10.0-514.21.1.el7.x86_64 now. I have used them for months and crunched various projects on them without any significant issues. I recall some issues with 32-bit apps (I had to install extra 32-bit libs) and with special permissions needed to access USB devices used by apps, but I managed to resolve them all. So it should be safe to run tasks on this system too.

Edit: I manually downloaded the app and created an app_info.xml for it. Unfortunately the server refuses this; I got the following reply from the server:

This project doesn't support computers of type anonymous
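
For readers unfamiliar with the anonymous platform mechanism: the BOINC client reads an app_info.xml file from the project directory that names the app, its files, and a version. The sketch below generates a minimal file of that shape. The app name `xansons_cpu` and the version number `100` are placeholders rather than the project's real values; only the executable name comes from this thread.

```python
import xml.etree.ElementTree as ET

def build_app_info(app_name, executable, version_num):
    """Build a minimal app_info.xml document as a string."""
    root = ET.Element("app_info")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    # Declare the binary and mark it as executable.
    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = executable
    ET.SubElement(file_info, "executable")

    # Tie the binary to an app version and mark it as the main program.
    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "version_num").text = str(version_num)
    file_ref = ET.SubElement(app_version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = executable
    ET.SubElement(file_ref, "main_program")

    return ET.tostring(root, encoding="unicode")

# "xansons_cpu" and 100 are hypothetical values for illustration only.
print(build_app_info("xansons_cpu", "xansons_boinc_OMP", 100))
```

A well-formed file is only half of the story: the server must also accept the anonymous platform, which is what the error above is about.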

I also checked whether all required libs are in place, and it turned out that a newer libstdc++ (GLIBCXX_3.4.20) is needed. This problem can be resolved if you recompile the app with the options -static-libgcc -static-libstdc++, or if I recompile it locally on my host.
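
One way to see which GLIBCXX version tags a host's libstdc++ provides is to scan the library for the tag strings, much like running `strings` on it and filtering. The sketch below does this on raw bytes. The helper names are made up for illustration, and the sample bytes are a stand-in rather than data from a real library.

```python
import re

def glibcxx_versions(blob):
    """Return the GLIBCXX_x.y.z tags found in raw library bytes, oldest first."""
    tags = {m.decode() for m in re.findall(rb"GLIBCXX_\d+(?:\.\d+)*", blob)}

    def numeric_key(tag):
        # "GLIBCXX_3.4.19" -> (3, 4, 19), so 3.4.19 sorts after 3.4.9.
        return tuple(int(n) for n in tag.split("_")[1].split("."))

    return sorted(tags, key=numeric_key)

def provides(blob, required):
    """True if the library bytes contain the required version tag."""
    return required in glibcxx_versions(blob)

# Made-up stand-in for the contents of a libstdc++.so.6 file.
sample = b"\0GLIBCXX_3.4\0GLIBCXX_3.4.9\0GLIBCXX_3.4.19\0"
print(glibcxx_versions(sample)[-1])
print(provides(sample, "GLIBCXX_3.4.20"))
```

On a real host one would pass in the bytes of the library file itself. GLIBCXX_3.4.20 arrived with GCC 4.9, while a stock CentOS 7 typically ships the GCC 4.8 runtime, whose newest tag is GLIBCXX_3.4.19.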

BTW, this server configuration will not allow crunching on any ARM devices; they will not be able to download any WUs either.
____________

Vlad
Message 155 - Posted: 19 Jun 2017, 22:31:48 UTC - in response to Message 151.


Edit: I manually downloaded the app and created an app_info.xml for it. Unfortunately the server refuses this; I got the following reply from the server:

This project doesn't support computers of type anonymous

I've added the anonymous platform to project.xml and run xadd. Did that help?

I also checked whether all required libs are in place, and it turned out that a newer libstdc++ (GLIBCXX_3.4.20) is needed. This problem can be resolved if you recompile the app with the options -static-libgcc -static-libstdc++, or if I recompile it locally on my host.

Through some trial and error I finally succeeded in linking the Linux CPU app statically to libstdc++. Note that the simple approach with -static-libgcc -static-libstdc++ didn't work in my case for some reason. I need to do some testing, and if everything is OK I'll update the Linux apps.

Profile [B@P] Daniel
Message 157 - Posted: 19 Jun 2017, 23:04:56 UTC - in response to Message 155.


Edit: I manually downloaded the app and created an app_info.xml for it. Unfortunately the server refuses this; I got the following reply from the server:

This project doesn't support computers of type anonymous

I've added the anonymous platform to project.xml and run xadd. Did that help?

No, I still get the same error from the server.

I also checked whether all required libs are in place, and it turned out that a newer libstdc++ (GLIBCXX_3.4.20) is needed. This problem can be resolved if you recompile the app with the options -static-libgcc -static-libstdc++, or if I recompile it locally on my host.

Through some trial and error I finally succeeded in linking the Linux CPU app statically to libstdc++. Note that the simple approach with -static-libgcc -static-libstdc++ didn't work in my case for some reason. I need to do some testing, and if everything is OK I'll update the Linux apps.

Good to know! Let me know when it is ready. Or maybe upload it somewhere, so I can take it and put it on my host?
____________

Vlad
Message 158 - Posted: 20 Jun 2017, 0:19:18 UTC - in response to Message 157.

Good to know! Let me know when it is ready. Or maybe upload it somewhere, so I can take it and put it on my host?

Can you try them out on your CentOS host? It would be great! Here is a link. The archive includes input data required to run the test.
You can run the apps in standalone mode. To run the CPU version, do:

./xansons_boinc_OMP --nthreads N

where N is the number of parallel CPU threads.

To run the CUDA version, do:

./xansons_boinc_CUDA --device N

where N is the device number (0 if you have only one GPU).

Also you can run the OpenCL version:

./xansons_boinc_OCL

The OpenCL version will work on Nvidia GPUs too.

Profile [B@P] Daniel
Message 162 - Posted: 20 Jun 2017, 7:50:57 UTC - in response to Message 158.

I tried, but all versions still require GLIBCXX_3.4.20. The results for the other binaries are similar:
# ldd -d -r xansons_boinc_CUDA
./xansons_boinc_CUDA: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./xansons_boinc_CUDA)
linux-vdso.so.1 => (0x00007ffd235a2000)
libcuda.so.1 => /lib64/libcuda.so.1 (0x00007fcb7c079000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcb7be5d000)
librt.so.1 => /lib64/librt.so.1 (0x00007fcb7bc54000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fcb7ba50000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fcb7b747000)
libm.so.6 => /lib64/libm.so.6 (0x00007fcb7b444000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fcb7b22e000)
libc.so.6 => /lib64/libc.so.6 (0x00007fcb7ae6d000)
/lib64/ld-linux-x86-64.so.2 (0x00007fcb7ca6c000)
libnvidia-fatbinaryloader.so.367.57 => /lib64/libnvidia-fatbinaryloader.so.367.57 (0x00007fcb7ac1e000)
symbol _ZSt24__throw_out_of_range_fmtPKcz, version GLIBCXX_3.4.20 not defined in file libstdc++.so.6 with link time reference (./xansons_boinc_CUDA)

All versions need this one symbol:
# c++filt _ZSt24__throw_out_of_range_fmtPKcz
std::__throw_out_of_range_fmt(char const*, ...)
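
When several binaries have to be checked, the "version ... not found" lines can be pulled out of the ldd output automatically. The following is a minimal sketch; the function name is made up, and the sample text is copied from the ldd output in this post.

```python
import re

# Matches lines such as:
#   ...: version `GLIBCXX_3.4.20' not found (required by ./xansons_boinc_CUDA)
MISSING = re.compile(r"version `([^']+)' not found \(required by ([^)]+)\)")

def missing_versions(ldd_output):
    """Return (version, binary) pairs for each 'version ... not found' line."""
    return MISSING.findall(ldd_output)

sample = (
    "./xansons_boinc_CUDA: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' "
    "not found (required by ./xansons_boinc_CUDA)\n"
    "libm.so.6 => /lib64/libm.so.6 (0x00007fcb7b444000)\n"
)
print(missing_versions(sample))
```

An empty list means ldd reported no missing symbol versions for that binary.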

____________

Vlad
Message 164 - Posted: 20 Jun 2017, 11:58:12 UTC - in response to Message 162.

All versions need this one symbol:
# c++filt _ZSt24__throw_out_of_range_fmtPKcz
std::__throw_out_of_range_fmt(char const*, ...)

OK, I rebuilt the BOINC libraries and am now able to build the CPU and OpenCL versions with -static-libgcc -static-libstdc++. Still no success with the CUDA version: I just cannot link with g++ to cudart_static (linking to cudart works, however), and nvcc does not understand the -static-libgcc -static-libstdc++ options. Can you try the new CPU and OpenCL executables?

Profile [B@P] Daniel
Message 166 - Posted: 20 Jun 2017, 13:59:06 UTC - in response to Message 164.

Thanks, I will test this later today after I return home.

I searched a bit and found that it is possible to use g++ instead of nvcc for linking. By doing so you would be able to link libgcc/libstdc++ statically. Please check these links:
https://devblogs.nvidia.com/parallelforall/separate-compilation-linking-cuda-device-code/
https://stackoverflow.com/questions/9421108/how-can-i-compile-cuda-code-then-link-it-to-a-c-project
____________

Vlad
Message 167 - Posted: 20 Jun 2017, 14:54:40 UTC - in response to Message 166.

Thank you!

In these two threads, they link to cudart and not to cudart_static. I can do that too, but g++ does not work with cudart_static. I'm not sure whether the cudart library is present on every Linux host that has the Nvidia drivers installed. If not, the executable will not be portable.

Profile [B@P] Daniel
Message 169 - Posted: 20 Jun 2017, 15:50:09 UTC - in response to Message 167.
Last modified: 20 Jun 2017, 15:58:47 UTC

nvcc internally calls the host linker (e.g. /bin/ld), so you should be able to use cudart_static with g++ too. Try adding the --verbose option when calling nvcc; you should then see the options passed to ld. gcc/g++ also support this option, so you should be able to find what is missing or different when linking with g++.

Edit: nvcc also supports the -Xlinker option, which passes any option through to the linker, so maybe you will be able to use it somehow to link statically with libgcc/libstdc++.
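
The comparison suggested above can be made mechanical: take the link line that nvcc prints in verbose mode and the hand-written g++ link line, then list the library flags that appear only in the former. The sketch below does this. Both command lines are illustrative guesses and not captured output, and the helper names are made up.

```python
import shlex

def library_flags(command):
    """Return the -l and -L arguments of a link command, in order."""
    return [arg for arg in shlex.split(command) if arg.startswith(("-l", "-L"))]

def missing_flags(reference, candidate):
    """Flags present in the reference link line but absent from the candidate."""
    have = set(library_flags(candidate))
    return [flag for flag in library_flags(reference) if flag not in have]

# Illustrative only: what nvcc's final link step might look like.
nvcc_link = ("g++ -o app main.o kernels.o -L/usr/local/cuda/lib64 "
             "-lcudart_static -ldl -lrt -lpthread")
# Illustrative only: a hand-written g++ link attempt.
manual_link = ("g++ -static-libgcc -static-libstdc++ -o app main.o kernels.o "
               "-L/usr/local/cuda/lib64 -lcudart_static")

print(missing_flags(nvcc_link, manual_link))
```

The sketch only handles flags written as a single token (`-lfoo`, `-L/path`); a space-separated form such as `-L /path` would need extra handling.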
____________

Profile [B@P] Daniel
Message 174 - Posted: 20 Jun 2017, 19:30:29 UTC
Last modified: 20 Jun 2017, 19:35:39 UTC

OK, I am back. I have just tried the new CPU and OpenCL versions, and it looks like they work fine. Here are the messages logged to the std*.txt files:

===[stdout from OCL version]===
Parsing calculation parameters...
Calculation-->PrintAtoms is set to No.
Calculation-->PolarFactor is set to No.
Calculation-->wavelength is set to default.
Calculation-->hist_bin is set to 0.001.
Sample-->Rcutoff is set to 0.

Parsing Block 0...
Block-->centered is set to No.
Atoms-->filename is set to default value.
Atom-->occ is set to 1.0.
Atom-->Uiso is set to 0.
Atom-->occ is set to 1.0.
Atom-->Uiso is set to 0.

All blocks have been parsed.

Reading NaCl_FFtable.txt
Atomic ensemble calculation time: 0.00506988 s

Selected OpenCL device:
GPU: GeForce GTX 970
Number of compute units: 13
GPU clock rate: 1177 MHz
Theoretical peak performance: 3917 GFLOPs

Histogram calculation time: 0.00126214 s
1D pattern calculation time: 0.24491 s
Total calculation time: 0.51707 s

===[stderr from OCL version]===
shmget in attach_shmem: Invalid argument
21:23:33 (12065): Can't set up shared mem: -1. Will run in standalone mode.
21:23:34 (12065): called boinc_finish(0)

===[stdout from OMP version]===
Number of OpenMP threads is set to 32

Parsing calculation parameters...
Calculation-->PrintAtoms is set to No.
Calculation-->PolarFactor is set to No.
Calculation-->wavelength is set to default.
Calculation-->hist_bin is set to 0.001.
Sample-->Rcutoff is set to 0.

Parsing Block 0...
Block-->centered is set to No.
Atoms-->filename is set to default value.
Atom-->occ is set to 1.0.
Atom-->Uiso is set to 0.
Atom-->occ is set to 1.0.
Atom-->Uiso is set to 0.

All blocks have been parsed.

Reading NaCl_FFtable.txt
Atomic ensemble calculation time: 0.00522938 s

Histogram calculation time: 2.20116 s
1D pattern calculation time: 0.109375 s
Total calculation time: 2.31856 s

===[stderr from OMP version]===
shmget in attach_shmem: Invalid argument
21:25:13 (12136): Can't set up shared mem: -1. Will run in standalone mode.
21:25:15 (12136): called boinc_finish(0)

Could you load them on the server as the "official" ones, or change something to allow the anonymous platform? The server still rejects it.

Edit: it looks like libcuda.so is installed together with the driver:
# ls -l /lib64/libcuda*
lrwxrwxrwx. 1 root root 12 06-11 13:34 /lib64/libcuda.so -> libcuda.so.1
lrwxrwxrwx. 1 root root 17 06-11 13:34 /lib64/libcuda.so.1 -> libcuda.so.367.57
-rwxr-xr-x. 1 root root 8227752 06-11 13:34 /lib64/libcuda.so.367.57

____________

Vlad
Message 177 - Posted: 20 Jun 2017, 23:40:24 UTC - in response to Message 174.

OK, I am back. I have just tried the new CPU and OpenCL versions, and it looks like they work fine. Here are the messages logged to the std*.txt files:

Thank you very much!

Could you load them on the server as the "official" ones, or change something to allow the anonymous platform? The server still rejects it.

I updated the apps and the preferences for the Linux plan classes; you should now be able to receive CPU tasks on your CentOS hosts. If not, then I probably should restart the scheduler.

Edit: it looks like libcuda.so is installed together with the driver:

But not libcudart... However, there is an obvious workaround: I need to create an additional Linux plan class for Nvidia with OpenCL.

Profile [B@P] Daniel
Message 180 - Posted: 21 Jun 2017, 6:08:38 UTC - in response to Message 177.

Could you load them on the server as the "official" ones, or change something to allow the anonymous platform? The server still rejects it.

I updated the apps and the preferences for the Linux plan classes; you should now be able to receive CPU tasks on your CentOS hosts. If not, then I probably should restart the scheduler.

Thanks! I see that one host without app_info.xml got and finished some tasks. Two failed because of the "finish file present too long" BOINC bug; all the others passed validation.

Edit: it looks like libcuda.so is installed together with the driver:

But not libcudart... However, there is an obvious workaround: I need to create an additional Linux plan class for Nvidia with OpenCL.

Yes, this is another possible solution. One more just came to my mind: try setting up a virtual machine with CentOS and use it to build the CUDA app.

Could you paste the error you get when you try to link with the cudart_static lib using g++? Maybe I will be able to help you with this.
____________

Vlad
Message 182 - Posted: 21 Jun 2017, 9:55:52 UTC - in response to Message 177.

However, there is an obvious workaround: I need to create an additional Linux plan class for Nvidia with OpenCL.

Done. Now anyone who has Linux with a kernel version < 3.16 and an Nvidia GPU will get the OpenCL app instead of CUDA.
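
The resulting rule for Linux hosts with an Nvidia GPU can be written down as a small sketch. It is only a model of the behaviour described in this post, not the server's plan class configuration, and the function names are made up.

```python
import re

def kernel_major_minor(release):
    """Return (major, minor) from a kernel release such as '3.10.0-514.el7'."""
    major, minor = re.match(r"(\d+)\.(\d+)", release).groups()
    return int(major), int(minor)

def nvidia_app_for(kernel_release, minimum=(3, 16)):
    """Hypothetical rule: CUDA at or above the minimum kernel, OpenCL below it."""
    if kernel_major_minor(kernel_release) >= minimum:
        return "CUDA"
    return "OpenCL"

print(nvidia_app_for("3.10.0-514.21.1.el7.x86_64"))
print(nvidia_app_for("4.15.0"))
```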

Profile [B@P] Daniel
Message 194 - Posted: 22 Jun 2017, 6:16:03 UTC - in response to Message 182.

However, there is an obvious workaround: I need to create an additional Linux plan class for Nvidia with OpenCL.

Done. Now anyone who has Linux with a kernel version < 3.16 and an Nvidia GPU will get the OpenCL app instead of CUDA.

Thanks! The Nvidia GPU app also works under Linux now.
____________

Profile Bryan
Joined: 15 Nov 16
Posts: 4
Credit: 112,270,210
RAC: 65,824
Message 679 - Posted: 20 Jan 2019, 13:00:22 UTC
Last modified: 20 Jan 2019, 13:00:39 UTC

I ran into a problem yesterday with some new machines (AMD CPU) unable to pull CPU work on Linux w/ kernel 4.15 (Mint 19).

Other machines (Intel) had no problem with either Windows or Linux (3.13).

I tried everything I could think of on this end, but the project would not issue work. It kept saying "no cpu tasks available", although there is work for Nvidia, ATI, etc.

Hosts:
6611
6612
6631

Vlad
Message 680 - Posted: 2 Feb 2019, 18:26:32 UTC - in response to Message 679.
Last modified: 2 Feb 2019, 20:25:56 UTC

I increased the scheduler’s debug level to see if there are any warnings regarding your hosts. Also, I changed the job sizing policy to increase the number of CPU jobs available for high-end CPUs. However, I cannot promise that it will help because I don't know yet whether the problem is related to job size matching or not.

Update: it seems it didn't help...

xii5ku
Joined: 10 Mar 18
Posts: 6
Credit: 26,382,764
RAC: 21,124
Message 704 - Posted: 16 Feb 2019, 20:04:54 UTC

I have never received a single CPU job since I joined this project almost a year ago, but I have always received Nvidia and Intel GPU jobs.

My CPU-only hosts run Gentoo Linux and OpenSUSE with 4.x kernels, x86-64. Their web preferences are set to run any application. (The Nvidia hosts run Linux Mint, and the Intel IGP is running under Win 7. These GPU hosts have their web preferences set to use only GPUs, not CPUs. Hence they never ask for CPU jobs, unlike my CPU-only hosts.)

xii5ku
Message 708 - Posted: 17 Feb 2019, 10:31:47 UTC - in response to Message 704.

Correction to my previous post:
From March to May 2018, I allowed my Nvidia/Linux Mint hosts to pull CPU jobs along with GPU jobs, and they did indeed receive CPU jobs. Only my CPU-only hosts were never graced with a single Xansons task.


© 2020 Vladislav Neverov (NRC 'Kurchatov institute'), Nikolay Khrapov (Institute for Information Transmission Problems of RAS)