457 new structures

Message boards : News : 457 new structures

Author Message
Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Send message
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 738 - Posted: 8 Jun 2019, 18:21:12 UTC
Last modified: 8 Jun 2019, 20:41:35 UTC

New COD entries have been added.

Update: all WUs have been sent to the hosts.

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Send message
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 739 - Posted: 8 Jun 2019, 20:42:00 UTC - in response to Message 738.
Last modified: 9 Jun 2019, 11:07:04 UTC

OK, the temporary server upgrade didn't help. The limitation seems to be in the number of incoming requests per second (about 30,000 packets/s), not in the Mbit/s. Sorry, I don't know how to overcome this for now.

bluestang
Send message
Joined: 22 May 18
Posts: 6
Credit: 4,865,241
RAC: 95,448
Message 740 - Posted: 8 Jun 2019, 20:53:22 UTC - in response to Message 739.

No worries. We'll just have to live with it as crunchers.

Keep up the good work!

Profile marmot
Avatar
Send message
Joined: 2 May 19
Posts: 7
Credit: 127,881
RAC: 1,537
Message 741 - Posted: 9 Jun 2019, 1:53:27 UTC
Last modified: 9 Jun 2019, 2:04:57 UTC

This certainly needs to be babysat on these Saturdays.
I caught up on sleep and let BOINC handle it all.

I had 4 GPUs seeking work and got zero WUs of the 16,000+ that went out.

The CPUs got 240 WUs (better than expected), compared to 1,800 last fortnight.

incoming requests per second (about 30,000)


What's the minimum value you can set for the request backoff (i.e. "requests too recent, delaying 37 seconds") parameter?

I think the BOINC forums mentioned the minimum was 6 seconds.

I'm not sure how many packets are sent in the actual upload/download procedures versus just the handshaking for WU requests. The backoff should reduce total requests per second, though.

Richard Haselgrove at the BOINC forums would know how to solve your issue.
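
To put rough numbers on the backoff point above, here is a minimal back-of-the-envelope sketch in Python (the ~2,750 active hosts and the 6 s / 7 s / 37 s intervals are figures quoted elsewhere in this thread, and "every host is work-starved and re-asks the moment its backoff expires" is a deliberate worst-case assumption, not a measurement):

def worst_case_request_rate(active_hosts: int, min_backoff_s: float) -> float:
    """Upper bound on scheduler requests per second if every work-starved
    host re-asks the moment its backoff expires."""
    return active_hosts / min_backoff_s

for backoff_s in (6, 7, 37):
    rate = worst_case_request_rate(2750, backoff_s)
    print(f"backoff {backoff_s:>2} s -> at most ~{rate:.0f} scheduler requests/s")

With a 6-7 s minimum interval this lands in the hundreds of requests per second, which is at least consistent with the overload figures Vlad reports later in the thread.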

Profile marsinph
Send message
Joined: 20 Apr 18
Posts: 31
Credit: 1,747,148
RAC: 2,670
Message 742 - Posted: 9 Jun 2019, 20:53:40 UTC - in response to Message 741.

Hello Marmot,
I am a very active cruncher here, or at least I try to be.
With my six moderate hosts, I try to get as many WUs as possible.
By Vlad's estimate, I alone could crunch all the WUs in only 2 or 3 days. So could you, I am sure. (I am not mixing up COD entries and WUs.)

Vlad has decided to make everyone equal by means of a very strict limit, and I understand him.

You, probably like me, try to do manual updates.
Also, don't forget upload/download time and the bottleneck.
Consider the number of hosts connected to the project (about 2,750 active); let us say 3,000 (see below).

When Vlad speaks about 30,000 requests per second, you can see that something is not normal. Neither you nor I can make that many requests per second!

3,000 hosts for 30,000 requests/sec: in the worst case, each host would need to make 10 requests per second!

Impossible, of course.


As for not getting any GPU work, I suggest you read the configuration very, very carefully.
It took me some months before I understood why; now it is OK
on an ATI/AMD RX 580, a GTX 1060, and a little GTX 570.
All working.

This Saturday there were fewer GPU WUs than usual.
Best regards
____________

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Send message
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 743 - Posted: 9 Jun 2019, 22:07:42 UTC - in response to Message 742.
Last modified: 9 Jun 2019, 22:08:04 UTC

3,000 hosts for 30,000 requests/sec: in the worst case, each host would need to make 10 requests per second!

Sorry, I already made the correction in my post. It is actually 30,000 packets/sec (this is what the AWS statistics show). I do not know how many packets are in a single work request, but I'll try to find out.

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Send message
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 744 - Posted: 9 Jun 2019, 22:16:45 UTC - in response to Message 741.
Last modified: 9 Jun 2019, 22:22:13 UTC

I think the BOINC forums mentioned the minimum was 6 seconds.

You are probably talking about the "min_sendwork_interval" parameter. If I understand correctly, this parameter tells the client not to request new WUs earlier than X seconds after receiving the last one. But if the client has no work at all, it can request it as often as the client configuration allows.
Thanks for the suggestions anyway.

Profile marmot
Avatar
Send message
Joined: 2 May 19
Posts: 7
Credit: 127,881
RAC: 1,537
Message 745 - Posted: 10 Jun 2019, 2:38:41 UTC - in response to Message 742.
Last modified: 10 Jun 2019, 2:52:29 UTC


As for not getting any GPU work, I suggest you read the configuration very, very carefully.


It's not a configuration issue.

All the GPUs got WUs on May 25th, enough to easily complete 100+ hours at WUProps for the XaNSoNS GPU WU. It was a failure by BOINC to request work quickly, plus a stalled download preventing any further requests. (I was sleeping.)

I don't use BoincTasks (eFMer's?), which supposedly has an auto-update macro that requests work from a project every few seconds. I'll look into using it for my next custom BOINC install.

I just relied on the "additional 0.01 days of work" setting and on keeping the GPU and CPU work caches down to only a few WUs, so that the standard BOINC Manager would update as frequently as possible.

Profile marmot
Avatar
Send message
Joined: 2 May 19
Posts: 7
Credit: 127,881
RAC: 1,537
Message 746 - Posted: 10 Jun 2019, 2:51:33 UTC - in response to Message 744.


Thanks for the suggestions anyway.


If Richard Haselgrove on the main BOINC forums doesn't have a solution: the LHC@Home project uses VMs on a distributed CVMFS file system and has to deal with hundreds of thousands of incoming packets all day.
They are constantly confronted with packet-flow headaches.
Laurence is their project administrator and might have some insights; he might refer you to computezrmle.

Hope the solution is found easily.

Profile marmot
Avatar
Send message
Joined: 2 May 19
Posts: 7
Credit: 127,881
RAC: 1,537
Message 747 - Posted: 15 Jun 2019, 5:54:30 UTC - in response to Message 744.
Last modified: 15 Jun 2019, 5:56:33 UTC

@Vlad

I think the BOINC forums mentioned the minimum was 6 seconds.

You are probably talking about the "min_sendwork_interval" parameter. If I understand correctly, this parameter tells the client not to request new WUs earlier than X seconds after receiving the last one. But if the client has no work at all, it can request it as often as the client configuration allows.
Thanks for the suggestions anyway.



I woke up one night after your response and analyzed the stats on this.

My 27 machines got 1,800 WUs on May 25th, and each one's first request got WUs equal to the machine's core count (8 WUs). So the "min_sendwork_interval" did not affect the first 216 (~12%) of the work sent to my machines, but the remaining 88% of the work requested would have been limited by the parameter.

Taking a quick look at your usage statistics: given that even a computer asking for 7 days of work will be refused and will get WUs equal to its core count, what percentage of requests would be second-or-later cache requests (and thus affected by the parameter)?
20%, 50%, 80%?
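
For reference, a tiny sketch of the arithmetic behind the 12% / 88% split above (plain Python; the numbers are simply the ones quoted in this post):

# Numbers from this post: 27 machines, 8 WUs on the first request
# (one per core), 1,800 WUs received in total.
machines = 27
wus_per_first_request = 8
total_wus = 1800

first_batch = machines * wus_per_first_request   # 216 WUs
share = first_batch / total_wus                  # 0.12

print(f"first requests: {first_batch} WUs ({share:.0%})")
print(f"later requests: {total_wus - first_batch} WUs ({1 - share:.0%})")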

Profile marsinph
Send message
Joined: 20 Apr 18
Posts: 31
Credit: 1,747,148
RAC: 2,670
Message 748 - Posted: 15 Jun 2019, 14:14:45 UTC - in response to Message 743.

Hello Vlad,

The answer is in the TCP/IP protocol.

How many packets a single WU needs depends on the kind of connection and even on the ISP. Yes, the Internet Service Provider (see below).

IPv4:
Ethernet TCP/IP usually uses 1500-byte packets (including the 20-byte IPv4 header; IPv6 uses 40 bytes).
This is also called the MTU.
FDDI uses 4470 bytes.

To keep it very simple:
suppose a WU is 1200 bytes.
Then the WU fits inside one packet (1500 - 20 = 1480 bytes of payload).
If the WU is 1490 bytes, it needs 2 packets:
one carrying 1480 bytes of real WU data and one carrying 10 bytes.
The remaining 1470 bytes of the second packet are unused.


Why does the ISP matter?
Some of them reduce the packet size to 1488 bytes, to get a multiple of 8 and 16.
Given the sizes above, if you do not locally adapt the MTU to the ISP's, the last bytes of the whole "message" often need to be resent.
Of course, with today's network speeds this is hardly noticeable anymore.
In the distant past, with a 56 kbps connection, it mattered a lot.

Since the WUs are not all the same size, you can see it is very difficult to calculate exactly how many packets a single WU needs.
And even if all WUs were exactly the same size, you could work out the number of packets,
but never the connection problems between your server and all the users:
I mean lost packets, resends because of corruption, and so on.
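
The packet arithmetic above can be written as a single formula (a minimal Python sketch; it models only the simple case described here, mirrors the 20-byte IPv4 header assumption, and ignores TCP headers, retransmissions and variable WU sizes):

import math

def packets_needed(wu_bytes: int, mtu: int = 1500, ip_header: int = 20) -> int:
    """IP packets needed for a payload of wu_bytes, given a fixed MTU and
    per-packet IP header (IPv4: 20 bytes, IPv6: 40). A real TCP segment
    also carries a ~20-byte TCP header, which shrinks the usable payload
    further; that is ignored here, as in the example above."""
    payload_per_packet = mtu - ip_header   # 1480 bytes with the defaults
    return math.ceil(wu_bytes / payload_per_packet)

print(packets_needed(1200))             # 1 packet  (fits in 1480 bytes)
print(packets_needed(1490))             # 2 packets (1480 + 10 bytes of data)
print(packets_needed(1490, mtu=1488))   # still 2 packets with an ISP that clamps to 1488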

Best regards

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Send message
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 749 - Posted: 16 Jun 2019, 21:46:08 UTC

First of all, thanks to everyone for sharing their ideas. With some help from participants I found out the following:

1) The number of requests per second causing the overload is in the hundreds, not thousands. However, hundreds of requests/s is still a big number for a single server (large projects usually have multiple servers behind a load balancer).

2) The min_sendwork_interval parameter sets the timeout delay even if no work was sent to the host, so I was wrong about that. This parameter is set to 7 seconds for this project. However, increasing it will not solve the problem (see 3).

3) One can easily get around the limit set by the min_sendwork_interval parameter (as well as the limit on maximum tasks in progress) by launching multiple clients on a single host. In this case each client is treated as a separate host with its own unique ID. The multiple_clients_per_host parameter, which could prohibit the scheduler from registering each client as a separate host, has been deprecated since 2011. The huge number of requests causing the overload comes not from different hosts but from many different clients running on the same hosts. This is also the reason why most of the tasks are processed by only a few users.

4) Of course, the scheduler could be modified to distinguish unique hosts from multiple clients running on the same host by the combination of local IP, domain name and external IP (a rough sketch of the idea follows below). However, one can set up a proxy for each client to make the external IPs unique. And even if a reliable method of identifying unique hosts existed, it would require querying hundreds of hosts in the database, which would significantly slow down the scheduler.
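
To illustrate the idea in point 4 (a rough sketch only, not actual BOINC scheduler code; the Request fields and the grouping function are hypothetical), requests that share the same (domain name, local IP, external IP) fingerprint would be counted as one physical host, no matter how many client IDs they carry:

from collections import defaultdict
from typing import NamedTuple

class Request(NamedTuple):
    host_id: int        # the per-client ID the scheduler assigns today
    domain_name: str
    local_ip: str
    external_ip: str

def group_by_physical_host(requests):
    """Map each (domain name, local IP, external IP) fingerprint to the set
    of client host IDs that share it."""
    groups = defaultdict(set)
    for r in requests:
        groups[(r.domain_name, r.local_ip, r.external_ip)].add(r.host_id)
    return groups

# Two clients on one machine collapse into a single fingerprint:
reqs = [Request(101, "pc1", "192.168.1.5", "203.0.113.7"),
        Request(102, "pc1", "192.168.1.5", "203.0.113.7"),
        Request(203, "pc2", "192.168.1.9", "198.51.100.3")]
print({k: sorted(v) for k, v in group_by_physical_host(reqs).items()})

As the post notes, a per-client proxy would defeat the external-IP part of such a fingerprint, so this is an illustration of the difficulty rather than a fix.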

So, the whole overload situation is a result of this project's inability to provide a sufficient number of WUs for everyone. I do not blame in any way the users who try to get as many WUs as possible by various methods, because I understand that competition is a major part of BOINC. However, continuing the project would be a painful experience for the participants, and that is definitely not what I want. Considering that the power of BOINC is not actually required for this project, since the project reached its goal in September 2017, I think this is the right moment to end XANSONS for COD. I'll announce the end of the project in a separate thread.

Profile marsinph
Send message
Joined: 20 Apr 18
Posts: 31
Credit: 1,747,148
RAC: 2,670
Message 754 - Posted: 18 Jun 2019, 17:34:33 UTC

Hello Vlad,
I think most users recognize your efforts,
not only with the WUs but also in keeping all of us informed. Very few admins do that.
The way you run things is an example.

About the amount of WUs, crunching time and computing power, you personally know what I mean...
Perhaps a solution: release WUs when you have them, instead of waiting for Saturday.
Then the WUs will be crunched and there will be no bottleneck.
If you have 10 COD entries, send them.
That way there will not be so many requests on Saturday.
In one year I have seen the difference, and so have you.

The "problem" of XANSONS is only linked to, or caused by, the race on Formula BOINC.

Your WUs are returned very fast and give few credits, but it is the same for everyone.
More and more users connect here because it takes very little time.
Make your WUs long enough that they need several days, with the same credits,
and you will never see a bottleneck!
Look at CAS@home: a little project with very few WUs, and never a problem.

About your last sentence, I do not agree. There are many more projects unable to produce work.

Then the worst part, which makes me sad and very disappointed: your idea to stop and close XANSONS. Once again, I understand your point of view.
I am sure there are other projects in your research center.
Don't leave too fast. You could put XANSONS on pause instead of closing it.

So Vlad, I thank you for all your work and efforts.
Best, kind and friendly regards, with the hope of not reading about the closure of XANSONS.

Vlad
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Send message
Joined: 26 Oct 16
Posts: 321
Credit: 103,382
RAC: 0
Message 755 - Posted: 18 Jun 2019, 22:05:21 UTC - in response to Message 754.

Perhaps a solution: release WUs when you have them, instead of waiting for Saturday.

Then it is likely that the server would be overloaded permanently, not only on Saturdays. And even if not, those who run multiple clients per host would get almost all the WUs.

The "problem" of XANSONS is only linked to, or caused by, the race on Formula BOINC.

I tried to contact the Formula BOINC admin to ask him to exclude this project from FB, but I had no opportunity to write to him because he never activated my account.

Then the worst part, which makes me sad and very disappointed: your idea to stop and close XANSONS. Once again, I understand your point of view.
I am sure there are other projects in your research center.
Don't leave too fast. You could put XANSONS on pause instead of closing it.

So Vlad, I thank you for all your work and efforts.
Best, kind and friendly regards, with the hope of not reading about the closure of XANSONS.

Initially, the plan was for all calculations for this project to be completed in 2017. When they were completed, I kept releasing updates every two weeks through BOINC as well, because why not? But the truth is that one should not use BOINC for something that can be done on a single PC.

I gained incredibly valuable experience by running this project, and I will definitely return to BOINC if I have a worthwhile project. Thank you very much for your support!

Profile marmot
Avatar
Send message
Joined: 2 May 19
Posts: 7
Credit: 127,881
RAC: 1,537
Message 757 - Posted: 19 Jun 2019, 8:00:32 UTC - in response to Message 755.

Perhaps a solution: release WUs when you have them, instead of waiting for Saturday.

Then it is likely that the server would be overloaded permanently, not only on Saturdays. And even if not, those who run multiple clients per host would get almost all the WUs.


Some projects have rare WUs, and I wonder how they deal with this issue. RNA World, CAS@home, and the ClimatePrediction.net short and regional WUs should all see high request counts. I do not think RNA World has more than one server handling the requests, and it is hard to get hold of one of their 200+ day long WUs.


Initially, the plan was for all calculations for this project to be completed in 2017. When they were completed, I kept releasing updates every two weeks through BOINC as well, because why not? But the truth is that one should not use BOINC for something that can be done on a single PC.

I gained incredibly valuable experience by running this project, and I will definitely return to BOINC if I have a worthwhile project. Thank you very much for your support!


Wish I'd gotten here sooner.
Thanks for all the fish!


© 2019 Vladislav Neverov (NRC 'Kurchatov institute'), Nikolay Khrapov (Institute for Information Transmission Problems of RAS)