Message boards : Number Crunching : All my Mac OS X WU failing

Author Message
[AF>Le_Pommier] Jerome_C2005
Joined: 3 Apr 13
Posts: 12
Combined Credit: 3,036,499
DNA@Home: 140,242
SubsetSum@Home: 151,756
Wildlife@Home: 2,744,502
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

      
Message 226 - Posted: 5 Apr 2013, 19:46:49 UTC

Hi

I attached to the project very recently, and I'm pretty happy that there is a Mac app!

But all the WUs I have received so far are failing:


Stderr output

<core_client_version>7.0.31</core_client_version>
<![CDATA[
<message>
process got signal 5
</message>
<stderr_txt>
dyld: DYLD_ environment variables being ignored because main executable (/Library/Application Support/BOINC Data/slots/9/../../switcher/switcher) is setuid or setgid
dyld: Library not loaded: /opt/local/lib/libopencv_core.2.4.dylib
Referenced from: /Library/Application Support/BOINC Data/slots/9/../../projects/volunteer.cs.und.edu_wildlife/wildlife_0.04
Reason: image not found

</stderr_txt>
]]>



I have an iMac:

Starting BOINC client version 7.0.31 for x86_64-apple-darwin
Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz [x86 Family 6 Model 30 Stepping 5]
OS: Mac OS X 10.8.3 (Darwin 12.3.0)


Thanks for your help!

Travis Desell
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 16 Jan 12
Posts: 1813
Combined Credit: 23,514,257
DNA@Home: 293,563
SubsetSum@Home: 349,212
Wildlife@Home: 22,871,482
Wildlife@Home Watched: 212,926s
Wildlife@Home Events: 51
Climate Tweets: 23
Images Observed: 774

              
Message 229 - Posted: 6 Apr 2013, 1:37:54 UTC - in response to Message 226.

Yup! We're working on the problem right now.

[AF>Le_Pommier] Jerome_C2005
      
Message 230 - Posted: 6 Apr 2013, 10:14:42 UTC

Good to know, thanks for the quick answer!

[AF>Le_Pommier] Jerome_C2005
      
Message 302 - Posted: 27 Apr 2013, 11:32:28 UTC

It's working now. But I think you should warn people who want to help with this BOINC project that they are going to get WUs as big as 900 MB that run for no more than 30 minutes.

This is huge and unprecedented in the BOINC world; regular BOINC users are not used to it and will not like it, especially without a warning.

For the moment I'm letting it run because I want to help with the testing and see that all goes well, but I will certainly lower the priority (a lot) after that.

Travis Desell
              
Message 308 - Posted: 28 Apr 2013, 2:50:26 UTC - in response to Message 302.

> It's working now. But I think you should warn people who want to help with this BOINC project that they are going to get WUs as big as 900 MB that run for no more than 30 minutes.
>
> This is huge and unprecedented in the BOINC world; regular BOINC users are not used to it and will not like it, especially without a warning.
>
> For the moment I'm letting it run because I want to help with the testing and see that all goes well, but I will certainly lower the priority (a lot) after that.


The next batch of workunits will be the same size, but run around 20-50x longer, so hopefully that will be more in line with other projects.

I don't have an official way to sign up on the front page yet, but when I do it will carry a warning that the project uses a lot of bandwidth.

[AF>Le_Pommier] Jerome_C2005
      
Message 310 - Posted: 28 Apr 2013, 11:37:35 UTC

Thanks for the answer. The WUs I have been crunching today are indeed significantly longer, which is better (if it means they do "better work", of course, and not if it's just to make them longer...). Or is the duration of a BOINC WU simply proportional to the video length?

I'm not sure I understand what you mean by "I don't have an official way to sign up on the front page yet". Can't you write what you want on your home page?

Travis Desell
              
Message 321 - Posted: 29 Apr 2013, 19:16:04 UTC - in response to Message 310.

> Thanks for the answer. The WUs I have been crunching today are indeed significantly longer, which is better (if it means they do "better work", of course, and not if it's just to make them longer...). Or is the duration of a BOINC WU simply proportional to the video length?


They'll be proportional to the video length, but the application type is also a factor. The batch I sent out over the weekend is doing motion detection, which is fairly straightforward: each workunit compares the current frame to the average of a window of frames around it, so it finishes pretty quickly.
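To make the motion detection idea concrete, here is a minimal pure-Python sketch of comparing each frame to the average of a window of frames around it. This is not the project's actual code: the function names, the flat-list frame representation, the choice to include the current frame in the window, and the threshold are all assumptions made for illustration.

```python
def window_average(frames, center, half_window):
    """Pixel-wise mean of the frames within half_window of index center.

    Assumption: the window is clipped at the ends of the video and
    includes the current frame itself.
    """
    lo = max(0, center - half_window)
    hi = min(len(frames), center + half_window + 1)
    window = frames[lo:hi]
    n_pixels = len(frames[center])
    return [sum(f[p] for f in window) / len(window) for p in range(n_pixels)]


def motion_scores(frames, half_window=2):
    """Mean absolute difference between each frame and its window average."""
    scores = []
    for i, frame in enumerate(frames):
        avg = window_average(frames, i, half_window)
        scores.append(sum(abs(a - b) for a, b in zip(frame, avg)) / len(frame))
    return scores


def frames_with_motion(frames, half_window=2, threshold=10.0):
    """Indices of frames whose score exceeds a (hypothetical) threshold."""
    scores = motion_scores(frames, half_window)
    return [i for i, score in enumerate(scores) if score > threshold]


# Five tiny 4-pixel grayscale "frames"; only the middle one changes.
video = [[10, 10, 10, 10] for _ in range(5)]
video[2] = [200, 200, 200, 200]
print(frames_with_motion(video, half_window=2, threshold=100.0))
```

Each frame is touched only a handful of times, which fits the remark that this kind of workunit gets done quickly.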

The next batches of workunits are going to be doing feature detection, which is a lot more complicated. They'll be trying to pick out pieces of each frame of video that might match a bird, a predator, an empty nest, etc. I'm expecting these to take at least 10x longer than the other type of workunits. Their run times will also differ depending on the feature files for the things we're looking for (basically we take a bunch of pictures of a bird or whatever, and then pull out the most indicative parts of those images to try and match against).
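As a rough illustration of why matching against feature files costs more than motion detection, the sketch below counts how many descriptors from a frame lie close to any reference descriptor. It uses plain brute-force nearest-neighbour search, which is a stand-in technique and not necessarily what the project's application does; the function names, the two-dimensional descriptors, and the distance cutoff are all hypothetical.

```python
import math


def nearest_distance(descriptor, references):
    """Smallest Euclidean distance from descriptor to any reference."""
    return min(math.dist(descriptor, ref) for ref in references)


def count_matches(frame_descriptors, references, max_distance=0.5):
    """Number of frame descriptors within max_distance of some reference.

    Assumption: a descriptor "matches" when its nearest reference is
    closer than a fixed cutoff.
    """
    return sum(
        1
        for d in frame_descriptors
        if nearest_distance(d, references) <= max_distance
    )


# Hypothetical reference descriptors (the "feature file") and frame descriptors.
bird_references = [(0.0, 0.0), (1.0, 1.0)]
frame = [(0.1, 0.0), (5.0, 5.0), (1.0, 0.9)]
print(count_matches(frame, bird_references))
```

The work here grows with the number of frame descriptors times the number of reference descriptors, so a larger feature file means a longer run, in line with the different run times mentioned above.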

But in general they'll be proportional to the video length.



> I'm not sure I understand what you mean by "I don't have an official way to sign up on the front page yet". Can't you write what you want on your home page?


Well, there's no button to click and download BOINC or anything like that yet. I actually need to make a page with instructions for joining the project. People who are already using BOINC know they can just plug in http://volunteer.cs.und.edu/wildlife to connect, but for new BOINC users there's nothing letting them know how to do it. The workunits are still in a pretty alpha state anyways so that might not be a bad thing.

KPX
Joined: 29 Apr 12
Posts: 2
Combined Credit: 5,230,134
DNA@Home: 131,829
SubsetSum@Home: 326,120
Wildlife@Home: 4,772,184
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 12
Images Observed: 0

        
Message 325 - Posted: 30 Apr 2013, 5:44:45 UTC - in response to Message 321.

Greetings.
Making the units 50 times longer is a scary proposition while WUs are failing. It is bad enough if a 30-minute unit fails; it is 50 times worse if a unit 50 times longer fails. Please fix the failure rate before sending out such long units, or adjust your credit policy and give credit for failed units to compensate for the time and money spent calculating them, since I cannot influence whether a unit validates or not. (My latest unit didn't validate after 5000 seconds of crunching, even though all previous ones did... do I know why? No clue.)

Travis Desell
              
Message 348 - Posted: 2 May 2013, 19:23:45 UTC - in response to Message 325.
Last modified: 2 May 2013, 19:23:52 UTC

> Greetings.
> Making the units 50 times longer is a scary proposition while WUs are failing. It is bad enough if a 30-minute unit fails; it is 50 times worse if a unit 50 times longer fails. Please fix the failure rate before sending out such long units, or adjust your credit policy and give credit for failed units to compensate for the time and money spent calculating them, since I cannot influence whether a unit validates or not. (My latest unit didn't validate after 5000 seconds of crunching, even though all previous ones did... do I know why? No clue.)


I've updated the Linux application, which should fix this problem.

I do apologize about the validation not working correctly, but please bear with us and keep in mind the project is in very alpha stages so we're going to have to do a lot of testing and debugging. It can be very difficult to ensure that an application runs exactly the same across all the many types of volunteered computers.

If the credit for workunits is low, I don't mind increasing it to make up for the alpha state of the project.

[AF>Le_Pommier] Jerome_C2005
      
Message 409 - Posted: 19 May 2013, 19:24:35 UTC - in response to Message 321.

> Well, there's no button to click and download BOINC or anything like that yet. I actually need to make a page with instructions for joining the project. People who are already using BOINC know they can just plug in http://volunteer.cs.und.edu/wildlife to connect, but for new BOINC users there's nothing letting them know how to do it. The workunits are still in a pretty alpha state anyways so that might not be a bad thing.

Yes, a little box in the corner of the page, or an obvious link to a typical page like this one, would do it.

distributed.org.ua
Joined: 31 Jan 12
Posts: 11
Combined Credit: 12,573
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 12,573
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

  
Message 411 - Posted: 20 May 2013, 14:36:04 UTC - in response to Message 409.

Mac OS X 10.8 WUs are fine for me, no failures.
____________
i crunch for Ukraine

[AF>Le_Pommier] Jerome_C2005
      
Message 2081 - Posted: 4 Nov 2013, 13:07:38 UTC
Last modified: 4 Nov 2013, 13:08:04 UTC

Hello... again.

Failed to open ../../projects/volunteer.cs.und.edu_wildlife/CH00_20120725_170657MN.mp4

After a veeeeeery long download of 400 or 500 MB per WU at 30 KB/s, they fail immediately...

Wildlife@Home Descriptor Collection (SURF) v0.04 (Mac OS X)

Thanks.
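
For scale, the figures in the post above (400 to 500 MB per WU at 30 KB/s) work out to several hours of downloading per workunit. A quick back-of-the-envelope check, assuming 1 MB = 1000 KB and a constant transfer rate (the helper name is made up for this example):

```python
def download_seconds(size_mb, rate_kb_per_s):
    """Transfer time in seconds, assuming 1 MB = 1000 KB and a steady rate."""
    return size_mb * 1000 / rate_kb_per_s


for size_mb in (400, 500):
    hours = download_seconds(size_mb, 30) / 3600
    print(f"{size_mb} MB at 30 KB/s: about {hours:.1f} hours")
```

That is roughly 3.7 to 4.6 hours of downloading for a task that then fails immediately.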

Steve Hawker*
Joined: 8 Apr 13
Posts: 134
Combined Credit: 829,896
DNA@Home: 11,932
SubsetSum@Home: 299,708
Wildlife@Home: 518,257
Wildlife@Home Watched: 5,541,577s
Wildlife@Home Events: 2,169
Climate Tweets: 8,659
Images Observed: 55

              
Message 2442 - Posted: 23 Dec 2013, 22:31:55 UTC - in response to Message 2081.

Download fail:

<core_client_version>7.0.31</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>CH00_20130526_085550MN.mp4.config</file_name>
<error_code>-224</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>

Kyle Goehner
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 17 Feb 13
Posts: 46
Combined Credit: 866,343
DNA@Home: 66,934
SubsetSum@Home: 205,422
Wildlife@Home: 593,987
Wildlife@Home Watched: 44,959s
Wildlife@Home Events: 3
Climate Tweets: 6
Images Observed: 9

        
Message 2443 - Posted: 24 Dec 2013, 0:23:57 UTC - in response to Message 2442.

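> Download fail:
>
> <core_client_version>7.0.31</core_client_version>
> <![CDATA[
> <message>
> WU download error: couldn't get input files:
> <file_xfer_error>
> <file_name>CH00_20130526_085550MN.mp4.config</file_name>
> <error_code>-224</error_code>
> <error_message>permanent HTTP error</error_message>
> </file_xfer_error>
> </message>
> ]]>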


I believe this is due to the storage server being down. Hopefully it will be fixed soon.

____________
--
Kyle

[AF>Le_Pommier] Jerome_C2005
      
Message 2462 - Posted: 26 Dec 2013, 10:53:01 UTC

I have 7 that did succeed (woohoo) but I have loads that didn't...

PinkPenguin
Joined: 28 Jun 13
Posts: 31
Combined Credit: 102,860
DNA@Home: 97,238
SubsetSum@Home: 0
Wildlife@Home: 5,622
Wildlife@Home Watched: 313,167s
Wildlife@Home Events: 0
Climate Tweets: 0
Images Observed: 0

    
Message 2477 - Posted: 7 Jan 2014, 22:29:44 UTC
Last modified: 7 Jan 2014, 22:31:23 UTC

2 out of 11 Mac OS X work units validated. It seems they are valid only if the same task was also run on other Mac OS X machines. If the wingman has a Linux box, the Mac OS X WUs (and sometimes all WUs) get marked invalid with the message: "Too many results (may be nondeterministic)".

My Linux box was more consistent: it returned 10 valid WUs out of 14. The WUs in error returned the same message. In this case there is no clear pattern to the valid WUs, apart from the fact that the wingmen were all Linux boxes (mixed AMD/Intel and even mixed 32-bit/64-bit). If one of the wingmen was a Mac OS X machine, the Mac OS X result was marked invalid while the Linux results were marked valid...

At a glance it seems like Linux machines have a better chance on this project... ;-)
____________
'"He's three years old, gentle as a kitten, and likes dogs." I wonder whether Mark means that he eats dogs or is fond of them?'

Steve Hawker*
              
Message 2646 - Posted: 5 Apr 2014, 0:29:45 UTC

Hurrah! I got some WUs...

Awww... they both failed

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process got signal 5
</message>
<stderr_txt>
dyld: Library not loaded: /usr/local/lib/libmp3lame.0.dylib
Referenced from: /Library/Application Support/BOINC Data/slots/13/libavcodec.55.39.101.dylib
Reason: image not found

</stderr_txt>
]]>

noderaser
Joined: 29 Dec 13
Posts: 6
Combined Credit: 3,148,207
DNA@Home: 0
SubsetSum@Home: 0
Wildlife@Home: 3,148,207
Wildlife@Home Watched: 0s
Wildlife@Home Events: 0
Climate Tweets: 1
Images Observed: 0

  
Message 2882 - Posted: 20 May 2014, 4:34:07 UTC

Doesn't look like things are going very well... Across two OS X hosts, I have only 1 of 20 work units that has completed and validated successfully. The rest include:

Completed, validation inconclusive
Completed, marked as invalid
Error while computing
Timed out - no response (MIA, doesn't show up on host)
Error while downloading
Error while downloading - Not in DB
Completed, can't validate

Since it's not a single error, I don't even know where to start... Have not attached any Windows hosts yet.

