holy smoke!
Even if I had the time to get something going on the Teslas here, I wouldn't get to those ranges anytime soon :) Toying with the alphabet now, especially (un)abcd :)
Moved it from sieving to testing.
Using sllr64 here right now on the CPU hardware (Xeon L5420); it tested as the fastest on these CPUs. I remember Jean Penne being busy with some GPGPU software. How has that progressed lately? Does Riesel Prime Search already have a public version of it? Got some Teslas here. They are idle now :)
CUDALLR is available, and in my experience stable. It only uses power-of-2 FFT sizes, and speed improves with larger exponents. The main FFT jump we care about is just over 3M for k=69, so your Teslas would be most useful in the upper 2M range, or over 5M (relative to CPU workers, that is).
Check in the hardware/GPU computing forum. I didn't see the thread when I glanced, but I've been running the program for over a year, and even found a prime for k=5 with it in the 3 megabit range. Curtis
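To illustrate what a power-of-2-only FFT costs, here is a minimal sketch. It is not taken from CUDALLR: the function name `pow2_fft_length` is made up, and the figure of 18 bits per double is only an assumption (borrowed from the estimate later in this thread). Real programs pick their limits differently and they depend on k, so the printed numbers will not match the actual CUDALLR or LLR thresholds.

```python
import math

def pow2_fft_length(n_bits, bits_per_word=18):
    """Smallest power-of-2 FFT length (in doubles) that can hold n_bits.

    bits_per_word is an ASSUMED average payload per double, not a value
    read from CUDALLR or LLR.
    """
    min_words = math.ceil(n_bits / bits_per_word)
    # Round up to the next power of two (unchanged if already a power of two).
    return 1 << (min_words - 1).bit_length()

for n_bits in (2_900_000, 3_100_000, 5_000_000):
    length = pow2_fft_length(n_bits)
    # Bits actually carried per double once the length is padded up.
    print(n_bits, length, round(n_bits / length, 2))
```

With a fixed power-of-2 length, an exponent just below the next size jump fills the transform best, which is one way to read the advice that a GPU is relatively most useful near the top of a range.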
[QUOTE=VBCurtis;354213]CUDALLR is available, and in my experience stable. It only uses power-of-2 FFT sizes, and speed improves with larger exponents. The main FFT jump we care about is just over 3M for k=69, so your Teslas would be most useful in the upper 2M range, or over 5M (relative to CPU workers, that is).
Check in the hardware/GPU computing forum. I didn't see the thread when I glanced, but I've been running the program for over a year, and even found a prime for k=5 with it in the 3 megabit range. Curtis[/QUOTE]
Thx Curtis, I downloaded it. Will try to get it to work! Is the power-of-2 restriction the only 'disadvantage' compared with the IBDWT in SSE2 that I have running currently? I seem to remember that my own FFT implementation, which also used power-of-2 sizes, had a few other disadvantages (to put it politely) :)
The Teslas I have here are 0.5 Tflop in theory (of course that is always 2x more than the card can do in terms of instructions, since they always assume you can use multiply-add; not sure whether this FFT can). Looking forward to benchmarking it for this code!
Note that on Nvidia it would be possible to run a different code stream on each SIMD. I don't know whether it can still deliver 0.5 Tflop doing that, yet if it can, it should be easier to get rid of the power-of-2 sized FFT? Maybe?
I don't recall what msft (user name, not company) said about the limitations of his code; I believe he stopped development shortly after he got it working, in favor of an OpenCL version for the other half of the GPUniverse.
I happen to have plenty of work available near 3M, so I haven't considered alternatives. 
hi,
I found a prime; maybe someone wants to verify that it is prime. How do I properly report it? 69 * 2^2649939 - 1 was found prime here! Thanks, Vincent [email]diep@xs4all.nl[/email] in case I don't respond quickly on the forum.
Hi diep
Congratulations! To report it please create a new prover's code including RPS, Psieve, Srsieve and the software you used to prove it prime like LLR. Thanks! 
Tried all that, let me know if it worked out OK. Thanks!
Paul Underwood verified it with pfgw and confirms it in the meantime.
[QUOTE=diep;363882]Tried all that, let me know if it worked out OK. Thanks!
Paul Underwood verified it with pfgw and confirms it in the meantime.[/QUOTE] It's correct. [url]http://primes.utm.edu/primes/page.php?id=116841[/url]
thanks for verifying!

On the L5420 Xeon machines I have here at home, I saw a pretty big jump in testing time moving up from roughly 2.74 Mbit to 2.76 Mbit.
Testing times increased from roughly 6123 seconds to 7689 seconds. Each CPU has 12 MB of L2 cache, so to speak 3 MB per core. It seems it's the transform causing it, not the hardware.
Not sure about the internal transform size. If it stores 2.75M bits and we assume 18 bits per double, then it would require an array of 2.75 Mbit * 64 / (18 * 8 bits per byte) = 2.75 * 8 / 18 = 1.2 MB. Even double that would easily fit in L2.
At what Mbit level can I expect another big jump like that? Is it at double this size, at 5.5 Mbit?
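The back-of-envelope estimate above can be checked with a short sketch. The 18 bits per double is the poster's own assumption, not a value read from LLR, and the helper name `fft_array_mb` is made up for illustration.

```python
def fft_array_mb(n_bits, bits_per_word=18, bytes_per_double=8):
    """Approximate size in MB (10^6 bytes) of a double array holding n_bits.

    bits_per_word is the ASSUMED payload per double from the post above.
    """
    words = n_bits / bits_per_word
    return words * bytes_per_double / 1e6

size_mb = fft_array_mb(2.75e6)   # about 1.22 MB
slowdown = 7689 / 6123           # about 1.26x, from the reported timings
l2_per_core_mb = 12 / 4          # 12 MB L2 shared by 4 cores

print(f"array: {size_mb:.2f} MB, doubled: {2 * size_mb:.2f} MB")
print(f"L2 per core: {l2_per_core_mb:.0f} MB, slowdown: {slowdown:.2f}x")
```

Under these assumptions even a doubled array stays below 3 MB per core, which is consistent with the guess that the roughly 26% jump comes from a change in transform size rather than from the cache.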