WU's too small for GPU
Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 274
Credit: 103,382
RAC: 0
Message 245 - Posted: 25 Jun 2017, 19:23:11 UTC - in response to Message 244.

I ran 4x with MW as well, since they are similarly short and there is some CPU crunch time at the end.

Why at the end? It should be at the beginning; see http://xansons4cod.com/xansons4cod/result.php?resultid=2245397. The line:
Atomic ensemble calculation time: 3.96273 s

This step is done on the CPU, so even the GPU app does some computation on the CPU. At the end, the app only writes the results to the output file; it does nothing more.

Also, I can't agree that the new application is worse than the previous one, because it does not waste CPU power while doing the calculations on the GPU, as the previous one did.

EG
Joined: 22 Jun 17
Posts: 15
Credit: 12,727,586
RAC: 0
Message 247 - Posted: 26 Jun 2017, 0:09:09 UTC
Last modified: 26 Jun 2017, 0:24:16 UTC

I noticed an immediate, huge difference with the 1.03 OCL app: it locked up my computer!

Couldn't access anything. (I suspect the memory requirements went through the roof.)

Did a hard reboot and suspended the app once BOINC Manager started. That ended the lockup.

Reset the app_config to run 1x1 (1 GPU × 1 CPU): it runs fine, but memory usage had tripled. Reset it to .5x.5 and memory usage doubled; reset it to .33x.33 and memory usage went up 50%.
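For reference, the NxN notation here maps to BOINC's app_config.xml, where gpu_usage and cpu_usage set the fraction of a GPU and of a CPU that the client budgets per task. A minimal sketch, assuming the app name is xansons_gpu (the real name can be checked in client_state.xml):

<app_config>
  <app>
    <!-- "xansons_gpu" is an assumed name; verify it in client_state.xml -->
    <name>xansons_gpu</name>
    <gpu_versions>
      <!-- 0.5 GPUs per task: two tasks share one GPU (the ".5x.5" setup) -->
      <gpu_usage>0.5</gpu_usage>
      <!-- CPU fraction the client reserves per task when scheduling -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Setting gpu_usage to 0.2 would give the 5x configuration mentioned below.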

I can no longer run 4x, much less the 5x I was running; memory usage runs amok, tying up every bit of memory I have, compared to the 1.02 app.

Currently running 3x and it's using 14 GB of memory, where previously, running 5x, it was using 7 GB. Trying to run 5x now hard-locks the computer...

OK, I reconfigured a few things on the machine; it's now running 5x again, using a little more than half the memory (10 GB) and 13% of the CPU. It's using a lot more CPU than it was before... (1-2% with the 1.02 app)

I don't know exactly what you've changed, and I've yet to see if it is faster or not. So far there's not enough data to tell...

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 274
Credit: 103,382
RAC: 0
Message 248 - Posted: 26 Jun 2017, 0:31:22 UTC - in response to Message 247.

I noticed an immediate, huge difference with the 1.03 OCL app: it locked up my computer!

Thanks for the report! I've marked all 1.03 OpenCL versions as beta. I apologize for any inconvenience. Actually, 1.03 is almost the same as 1.02 if the --nowait option is not specified, so I'm confused.

EG
Joined: 22 Jun 17
Posts: 15
Credit: 12,727,586
RAC: 0
Message 249 - Posted: 26 Jun 2017, 0:41:46 UTC
Last modified: 26 Jun 2017, 0:44:37 UTC

I don't know what changed; what I do know is what resources were being used on my hardware.

1.02 was using almost no CPU, so it was locked onto 10 cores on one processor and was actually using only 5 of them, and it was using around 7 GB of memory for all 20 client instances.

Now, with 1.03, it's running 20, but it's accessing all of the processors (dual E5-2670 v3s, 48 cores/threads) and using 11 GB of memory.

I haven't experimented with restricting core access yet to see how few cores it will run on or how that affects running, as I want to get an idea of its native speed relative to what it was doing before the change...

For right now it's running wide open...

And thank you for increasing the cache size; it really helps the feeding issues on this side...

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 274
Credit: 103,382
RAC: 0
Message 250 - Posted: 26 Jun 2017, 0:56:28 UTC - in response to Message 249.

1.02 was using almost no CPU, so it was locked onto 10 cores on one processor and was actually using only 5 of them, and it was using around 7 GB of memory for all 20 client instances.

Now, with 1.03, it's running 20, but it's accessing all of the processors (dual E5-2670 v3s, 48 cores/threads) and using 11 GB of memory.

Memory consumption also depends on the input data; it can be up to 1-1.5 GB per WU. There are some really 'heavy' structures with very high atomic density in the database. Can you give me links to the WUs that caused the trouble?

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 274
Credit: 103,382
RAC: 0
Message 251 - Posted: 26 Jun 2017, 1:02:51 UTC - in response to Message 250.
Last modified: 26 Jun 2017, 8:51:23 UTC

Memory consumption also depends on the input data; it can be up to 1-1.5 GB per WU. There are some really 'heavy' structures with very high atomic density in the database. Can you give me links to the WUs that caused the trouble?

This one is heavy, for example. The memory consumption is almost 1 GB, and it does not depend on the app version: with CUDA it was also 1 GB.
So I will check both the 1.02 and 1.03 versions on this WU tomorrow.

Update
This one was calculated with 1.02. It used more than 1 GB of memory, so it was a coincidence that several such WUs were executed in parallel with 1.03; it could have happened with 1.02 as well.
Also, the WUs with 'solid_material' in their names always consume a lot of memory (on both CPU and GPU). So it should be assumed that one WU can consume up to 1.5 GB of memory at worst.

Update 2
I checked that the 1.03 app uses exactly the same amount of memory as 1.02 does, so it was a coincidence that you got several memory-consuming WUs at the same time with 1.03 and not with 1.02. I'll try to reduce memory consumption in the next version. For now it can be up to 1.5 GB per WU depending on the data, so it's not safe to run 16 WUs in parallel on a system with 16 GB of RAM (at worst, 16 × 1.5 GB = 24 GB).

EG
Joined: 22 Jun 17
Posts: 15
Credit: 12,727,586
RAC: 0
Message 258 - Posted: 26 Jun 2017, 13:41:35 UTC - in response to Message 251.


Update 2
I checked that the 1.03 app uses exactly the same amount of memory as 1.02 does, so it was a coincidence that you got several memory-consuming WUs at the same time with 1.03 and not with 1.02. I'll try to reduce memory consumption in the next version. For now it can be up to 1.5 GB per WU depending on the data, so it's not safe to run 16 WUs in parallel on a system with 16 GB of RAM (at worst, 16 × 1.5 GB = 24 GB).


I can confirm that it was a one-time, one-system issue.

My Blackhawk-4 made the changeover without issue, and it's the exact same system as Blackhawk-1 (which had the memory lockup issue).

After running for a good 8 hours wide open without issue, I've reset the settings to what they were originally under the 1.02 app, and it's crunching along fine.

Still haven't nailed down what caused the problem (probably never will), but it has cleared and is not repeating... The error logs are not revealing much.

Looks to be a single issue on a single system and not a program error.

If there is a way to ease up the memory requirements, that would make the project even more efficient, especially on AMD hardware and Windows...

I don't think it was a memory leak.....

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 274
Credit: 103,382
RAC: 0
Message 259 - Posted: 26 Jun 2017, 14:14:22 UTC - in response to Message 258.

If there is a way to ease up the memory requirements, that would make the project even more efficient, especially on AMD hardware and Windows...

The new apps will be released in a few hours. They use half as much memory, and they release CPU memory immediately after the data is transferred to the GPU. Note that you will still need about 700-800 MB per WU to run them safely.

mmonnin
Joined: 28 Nov 16
Posts: 19
Credit: 5,313,048
RAC: 0
Message 261 - Posted: 26 Jun 2017, 20:55:36 UTC - in response to Message 245.

I ran 4x with MW as well, since they are similarly short and there is some CPU crunch time at the end.

Why at the end? It should be at the beginning; see http://xansons4cod.com/xansons4cod/result.php?resultid=2245397. The line:
Atomic ensemble calculation time: 3.96273 s

This step is done on the CPU, so even the GPU app does some computation on the CPU. At the end, the app only writes the results to the output file; it does nothing more.

Also, I can't agree that the new application is worse than the previous one, because it does not waste CPU power while doing the calculations on the GPU, as the previous one did.


The MilkyWay project does some CPU calculations at the end of each of its GPU tasks. It was an example of very short tasks on another project that need to be run several at a time to keep the GPU busy.

EG
Joined: 22 Jun 17
Posts: 15
Credit: 12,727,586
RAC: 0
Message 262 - Posted: 26 Jun 2017, 21:01:08 UTC - in response to Message 259.

If there is a way to ease up the memory requirements, that would make the project even more efficient, especially on AMD hardware and Windows...

The new apps will be released in a few hours. They use half as much memory, and they release CPU memory immediately after the data is transferred to the GPU. Note that you will still need about 700-800 MB per WU to run them safely.


Now running some 1.04 WUs; they seem to be running even smoother than the 1.02/1.03 WUs...

EXCELLENT!!!!

Blackhawk-1 is still running 5x and is only showing 3 GB of memory usage (running 1.04s)...

The other three machines are running through their caches of 1.03s but are receiving 1.04s.

GOOD DEAL!!!

Thank you....

Question: should I abort the 1.03s or just let them run themselves out?

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 274
Credit: 103,382
RAC: 0
Message 264 - Posted: 26 Jun 2017, 21:15:34 UTC - in response to Message 262.

Now running some 1.04 WUs; they seem to be running even smoother than the 1.02/1.03 WUs...

Sorry, I hurried too much to release them and skipped some routine testing. They produce invalid results on some machines, so I have rolled back to version 1.03 for now.

EG
Joined: 22 Jun 17
Posts: 15
Credit: 12,727,586
RAC: 0
Message 271 - Posted: 27 Jun 2017, 3:09:44 UTC - in response to Message 264.
Last modified: 27 Jun 2017, 3:14:24 UTC

Now running some 1.04 WUs; they seem to be running even smoother than the 1.02/1.03 WUs...

Sorry, I hurried too much to release them and skipped some routine testing. They produce invalid results on some machines, so I have rolled back to version 1.03 for now.


I see we are up to version 1.05 now. It's back to running like it was before, smooth and stable at 5x. In fact, in testing I had it up to 7x before it started showing instability. Of course, at 7x it runs a lot slower, but it was stable.

One thing I've noticed: in the app_config file, the line that regulates the CPU share doesn't have any effect on the app in BOINC Manager at all. Zero, none. I have set the CPU value anywhere from 0.2 to 1.0 and it has no effect.
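For what it's worth, the BOINC client treats the cpu_usage value purely as a scheduling hint: it reserves that much CPU when deciding how many tasks to run, but it does not throttle what the science app actually consumes, which would fit the behavior above. A hypothetical fragment:

<gpu_versions>
  <gpu_usage>0.2</gpu_usage>
  <!-- reserves 0.2 of a CPU core per task in the client's scheduler;
       the app itself may still use more, as observed here -->
  <cpu_usage>0.2</cpu_usage>
</gpu_versions>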

I'm using Process Lasso to set the CPU usage, and the app appears to take two cores/threads for every GPU task; in my case that would be ten cores/threads (two cores × five parallel WUs per GPU).

The funny part is that Process Lasso shows it using only five of the allotted cores/threads. Yet even though it is only using five cores in actuality, when I cut Process Lasso down to five cores it throttles the process to about half speed.

It needs the two cores per GPU. This is on the Intel DP box. Also, it doesn't matter whether it is a physical core or a logical core; it will use either, but it HAS to have 2 per GPU.

On the other hand, the FX-8350 boxes don't show this behavior. They operate normally, and the app_config file works as expected; the CPU line works to limit CPU usage per core.

Interesting to say the least.

I wonder what is causing the difference?

JugNut
Joined: 21 Jun 17
Posts: 20
Credit: 63,544,204
RAC: 5,157
Message 273 - Posted: 27 Jun 2017, 11:18:36 UTC - in response to Message 271.

Hey Vlad, GPU work has dried up again. The server stats show zero ready to send.

The last batch of work units was very small, and they were completing faster than they could be downloaded.

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 274
Credit: 103,382
RAC: 0
Message 275 - Posted: 27 Jun 2017, 11:52:32 UTC - in response to Message 273.
Last modified: 27 Jun 2017, 11:57:05 UTC

Hey Vlad, GPU work has dried up again. The server stats show zero ready to send.

The last batch of work units was very small, and they were completing faster than they could be downloaded.

Yes, I had to limit the number of new WUs generated every 5 minutes, because the server is overburdened with result validation.
Almost all materials with IDs starting with 152 and 153 are non-organic, and the simulation of diffraction patterns is very fast for them. Fortunately, the majority of the materials in the Crystallography Open Database are organic.

JugNut
Joined: 21 Jun 17
Posts: 20
Credit: 63,544,204
RAC: 5,157
Message 276 - Posted: 27 Jun 2017, 12:01:45 UTC - in response to Message 275.
Last modified: 27 Jun 2017, 12:02:18 UTC

Thanks again for your quick response, Vlad. On my faster GPUs the work unit cache looks to be slowly filling again.

EG
Joined: 22 Jun 17
Posts: 15
Credit: 12,727,586
RAC: 0
Message 278 - Posted: 27 Jun 2017, 12:36:51 UTC
Last modified: 27 Jun 2017, 13:00:45 UTC

I've been watching this for about an hour now; mine are barely able to keep working...

Blackhawk-4 is barely staying ahead of idle with 0 cache....

UPDATE:

OK, the caches are beginning to fill now, although when they get a bunch of smaller WUs they still seem to empty quicker.

One thing you might consider is increasing the client-side cache again...

But I know that past a certain point that has diminishing benefits.

I understand the project's needs, but small WUs put a very heavy load on the server...

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 26 Oct 16
Posts: 274
Credit: 103,382
RAC: 0
Message 281 - Posted: 27 Jun 2017, 13:06:20 UTC - in response to Message 278.

I've been watching this for about an hour now; mine are barely able to keep working...

Blackhawk-4 is barely staying ahead of idle with 0 cache....

I think the best way is to run the WUs from this project in parallel with WUs from other projects, because this project is really small: with the current processing power, all 13 million WUs will be completed in 2 months.

JugNut
Joined: 21 Jun 17
Posts: 20
Credit: 63,544,204
RAC: 5,157
Message 283 - Posted: 27 Jun 2017, 13:28:20 UTC - in response to Message 278.
Last modified: 27 Jun 2017, 13:29:40 UTC

Luckily, for the moment there are some bigger work units coming through, and that's helping the cache situation a little.

In the long run, though, that's the real question...

EG
Joined: 22 Jun 17
Posts: 15
Credit: 12,727,586
RAC: 0
Message 284 - Posted: 27 Jun 2017, 13:43:11 UTC - in response to Message 281.

I think the best way is to run the WUs from this project in parallel with WUs from other projects, because this project is really small: with the current processing power, all 13 million WUs will be completed in 2 months.


There is a LOT of processing power in the BOINC user base, a lot more than many realize. I can take a few machines off and run something else... no problem.

The project is stable and scales beautifully, at least on the GPU side. Good job!

Is there any prospect for a larger project in the foreseeable future?

JugNut
Joined: 21 Jun 17
Posts: 20
Credit: 63,544,204
RAC: 5,157
Message 285 - Posted: 27 Jun 2017, 15:21:35 UTC - in response to Message 284.
Last modified: 27 Jun 2017, 15:38:58 UTC

I think the best way is to run the WUs from this project in parallel with WUs from other projects, because this project is really small: with the current processing power, all 13 million WUs will be completed in 2 months.


There is a LOT of processing power in the BOINC user base, a lot more than many realize. I can take a few machines off and run something else... no problem.

The project is stable and scales beautifully, at least on the GPU side. Good job!

Is there any prospect for a larger project in the foreseeable future?


How true. Vlad has done a great job, and you're right, there is tons of power available on BOINC.

@Vlad: You seem stuck between a rock and a hard place. You could go the expensive route and buy more server power, or even start restricting work to only 'X' number of WUs per host, but then that would only spread the pain and make things worse.

I guess you could close the site to new members, as you have more than enough firepower already? None of those are particularly appealing options.

Happily, though, the workflow is a little better at the moment :)

Maybe talk to the project admins at MilkyWay@home; they have been running huge numbers of small WUs just like yours for years on end, and they are sure to have some tips to help ease the load: http://milkyway.cs.rpi.edu/milkyway/

All the best.
