Welcome to Dream.In.Code
Getting C++ Help is Easy!

Join 119,730 C++ Programmers for FREE! Ask your question and get quick answers from experts. There are 1,284 online right now! We've got more than 500 tutorials and 2,000 snippets. Join and find out why Dream.In.Code is the #1 programming help community on the internet! Registration is fast and FREE... Join Now!



Float math problem - 64 megabyte limitation

 
Reply to this topicStart new topic

Float math problem - 64 megabyte limitation

Godzilla
post 2 Jul, 2008 - 04:05 PM
Post #1


New D.I.C Head

*
Joined: 29 May, 2008
Posts: 40

I was experiencing a very strange float math problem, but while trying to replicate it in a form I could share I came across another one. In the code below I get output of "Total is: 67108864.000000 Average: 0.671089", when the total should be 200 million and the average 2, since the 100 million values consist of ints 0 to 4. The total instead maxes out at 64 megabytes.
CODE
int * vals = (int*) malloc(100000000*sizeof(int));
for(int i=0; i<100000000; i++){
    int val = ((float) rand()/RAND_MAX)*5;
    vals[i] = val;
    if(i%100000==0) srand(val);
}
float total = 0;
for(int i=0; i<20000; i++){
    for(int j=0; j<5000; j++){
        float val = vals[i*5000+j];
        total += val;
    }
}
//total = total*10;
printf("Total is: %f Average: %f\n", total, total/100000000);
free(vals);

I checked the vals array above where the total maxes out, and it still has positive values at those points. I also thought maybe it had to do with rand() seeding order so I added srand() now and then. At the end I can still multiple 'total' by 10 to increase it, as you'd expect a float to be able to increase arbitrarily. However inside the loops, once it reaches 64*1024*1024, it stops increasing. I also found that if once changed the 'val' loop to be of type float, then the max for total become 128*1024*1024.

Why is total maxing out inside the loop, despite being a floating point?
User is offlineProfile CardPM

Go to the top of the page


Godzilla
post 2 Jul, 2008 - 05:24 PM
Post #2


New D.I.C Head

*
Joined: 29 May, 2008
Posts: 40

A way to get the right output is the add an intermediary variable that deals with the inner loop:
CODE
for(int i=0; i<20000; i++){
    float med = 0;
    for(int j=0; j<5000; j++){
        float val = vals[i*5000+j];
        med += val;
    }
    total += med;
}

This gives output of "Total is: 199894672.000000 Average: 1.998947", which is right.

But why does total know - or care - which way the summation is taking place?
User is offlineProfile CardPM

Go to the top of the page

baavgai
post 2 Jul, 2008 - 05:40 PM
Post #3


Dreaming Coder

Group Icon
Joined: 16 Oct, 2007
Posts: 1,810



Thanked 74 times

Dream Kudos: 475

Expert In: C, C++, Java, C#, ASP.NET, PHP, Perl, Python, Oracle, SQL Server, MySql, HTML, JavaScript, Lua

My Contributions


Try this experiment:
cpp

#include <stdio.h>
#include <time.h>

#define SIZE 10
#define RAND_MAX 5

int main() {
int i, n;
long total = 0;
srand ( time(NULL) );

for(i=0; i<SIZE; i++){
//total += (rand() % RAND_MAX);
n = ((float) rand()/RAND_MAX)*5;
printf("%d\n", n);
total += n;
}
printf("Total is: %d Average: %f\n", total, (double)total/(double)SIZE);
return 0;
}


I don't believe that the random function is returning within range you seem to expect. Also, don't forget to seed your randomizer. In fact, see what happens when you take the srand out in the above example. This is probably another one of your problems.

Hope this helps.
User is offlineProfile CardPM

Go to the top of the page

Godzilla
post 3 Jul, 2008 - 05:13 PM
Post #4


New D.I.C Head

*
Joined: 29 May, 2008
Posts: 40

The RAND_MAX is working fine, and the randomizer is in fact returning within the range I expect (0-4). I am certain the randomizer has nothing to do with the problem. My original problem, which I believe is related to the same core dual-loop 64 megabyte/128 megabyte float maximum val summation issue, does not deal with any random quantities but stored data; this is simply an example.

As I said, I checked the vals array itself, and the values within it are fine. It is total that simply stops increasing. Use the code below and you should see this:
CODE
#include <iostream>

using namespace std;

int main(){
    int * vals = (int*) malloc(100000000*sizeof(int));
    for(int i=0; i<100000000; i++){
        int val = ((float) rand()/RAND_MAX)*5;
        vals[i] = val;
        if(i%100000==0) srand(val);
    }
    float total = 0;
    for(int i=0; i<20000; i++){
        for(int j=0; j<5000; j++){
            float val = vals[i*5000+j];
            total += val;
                    if(total>=64*1024*1024-20){
                            printf("Total: %f Val: %f\n", total, val);
                    }
        }
    }
    //total = total*10;
    printf("Total is: %f Average: %f\n", total, total/100000000);
    free(vals);
}

Even though the val values are as they should be, total increases up to 64*1024*1024 and then stops increasing. If I make a second total2 float and set it to zero and have it start summing once total maxes out, then that accepts the val values tfine - though that too will eventually max out because of the array size.

Why the total float value would be path dependent or care how values are added to it, I do not understand.
User is offlineProfile CardPM

Go to the top of the page

perfectly.insane
post 3 Jul, 2008 - 10:48 PM
Post #5


D.I.C Addict

Group Icon
Joined: 22 Mar, 2008
Posts: 550



Thanked 44 times

Dream Kudos: 25

Expert In: C/C++

My Contributions


It seems to work correctly if you substitute double for float also. I'm not exactly sure why it's doing what it's doing, but I do know that float will not give you the exact answer that you're looking for, as there's only 23-bits of actual precision.
User is offlineProfile CardPM

Go to the top of the page

baavgai
post 4 Jul, 2008 - 02:54 AM
Post #6


Dreaming Coder

Group Icon
Joined: 16 Oct, 2007
Posts: 1,810



Thanked 74 times

Dream Kudos: 475

Expert In: C, C++, Java, C#, ASP.NET, PHP, Perl, Python, Oracle, SQL Server, MySql, HTML, JavaScript, Lua

My Contributions


QUOTE(Godzilla @ 3 Jul, 2008 - 08:13 PM) *

The RAND_MAX is working fine, and the randomizer is in fact returning within the range I expect (0-4). I am certain the randomizer has nothing to do with the problem.


I'm afraid I can't really offer anything more, then. The above code I offered, using the randomizing code you gave, gives me the follow results.
CODE

2070028180
1923451955
1630464434
1568112057
933632
578008431
635555225
460437250
1247658801
163955328
Total is: 1688670701 Average: 168867070.100000


Changing the randomizer to "n = (rand() % RAND_MAX);" yields:
CODE

0
3
2
2
1
2
2
3
4
2
Total is: 21 Average: 2.100000


Since I can't reproduce your results with your code, I can't reasonable analyze the problem . Good luck.
User is offlineProfile CardPM

Go to the top of the page

Godzilla
post 4 Jul, 2008 - 01:16 PM
Post #7


New D.I.C Head

*
Joined: 29 May, 2008
Posts: 40

QUOTE(perfectly.insane @ 4 Jul, 2008 - 01:48 AM) *

It seems to work correctly if you substitute double for float also. I'm not exactly sure why it's doing what it's doing, but I do know that float will not give you the exact answer that you're looking for, as there's only 23-bits of actual precision.

I know that doubles work better, however for the problem in question it's not an option for me because of memory limitations. It's a very large analytical problem and if doubles were used, the system would often run out of memory; it would also require more hard disk space, as caching and memory mapping is involved.
QUOTE(baavgai @ 4 Jul, 2008 - 05:54 AM) *
Since I can't reproduce your results with your code, I can't reasonable analyze the problem . Good luck.

I think at first I did not understand quite what you were attempting to show - I think you misunderstand RAND_MAX. RAND_MAX is system-defined, you do not need to define it. On my system it is 2147483647. If you simply copy and paste my code as-is it should work fine in any case.
User is offlineProfile CardPM

Go to the top of the page

Godzilla
post 7 Jul, 2008 - 04:44 PM
Post #8


New D.I.C Head

*
Joined: 29 May, 2008
Posts: 40

Can anyone figure this out? It actually seems like a pretty sizeably important thing.
User is offlineProfile CardPM

Go to the top of the page

Godzilla
post 22 Jul, 2008 - 08:31 PM
Post #9


New D.I.C Head

*
Joined: 29 May, 2008
Posts: 40

I'll give this one last shot. Someone have any idea what's going on here, and why the float sum is capping out?
User is offlineProfile CardPM

Go to the top of the page

perfectly.insane
post 23 Jul, 2008 - 11:03 AM
Post #10


D.I.C Addict

Group Icon
Joined: 22 Mar, 2008
Posts: 550



Thanked 44 times

Dream Kudos: 25

Expert In: C/C++

My Contributions


Ok, let's think of it this way....

Let's think in reverse to give light. Let's say that you have an integer, int x, and you add float 0.5 to it 1000 times. What is the end result? 0. Most understand why this happens.

However, the same thing can apply to floats also. What is the maximum number in a float that you can store and still have integer precision? -(2^22) through 2^22 - 1, which is about 16 times less than the value that you have in your output. So, if you have 2^26, then your values will only be able to increment in units of 16.0

All of your result values should be evenly divisible by 16 if I am correct as to what is going on. ( 2^(26-22) ). If you add 4 to a value that only has a granularity of 2^4, it will be rounded down, which means your addition will not have an effect at that point. When you go below 2^26, you're only needing 25 bits, which then the granularity is 8. Since 1/4 of the total possibilities should round up, some of the additions will have effect below that.

The second method you mentioned works because you're not adding numbers by 0-4 at a time, except to the intermediate value, which never increases to the point where you're having a precision problem to begin with.

I suppose converting the value to a float immediately isn't particularly useful here, and could have been easily stored as an int, until the actual division occurs (as the operands are ints anyway), but I assume you had some reason for doing such.
User is offlineProfile CardPM

Go to the top of the page

Godzilla
post 23 Jul, 2008 - 10:00 PM
Post #11


New D.I.C Head

*
Joined: 29 May, 2008
Posts: 40

Yes perfectly, you are correct smile.gif Precision was indeed the problem. The following code:
CODE
float f = 67108864.0;
for(int i=0; i<=12; i++){
    float g = f + i;
    printf("i: %d %f\n", i, g);
}

Gives the output:
CODE
i: 0 67108864.000000
i: 1 67108864.000000
i: 2 67108864.000000
i: 3 67108864.000000
i: 4 67108864.000000
i: 5 67108872.000000
i: 6 67108872.000000
i: 7 67108872.000000
i: 8 67108872.000000
i: 9 67108872.000000
i: 10 67108872.000000
i: 11 67108872.000000
i: 12 67108880.000000

It's rounding to the nearest number divisible by 8 when above 2^26.
QUOTE(perfectly.insane @ 23 Jul, 2008 - 02:03 PM) *

I suppose converting the value to a float immediately isn't particularly useful here, and could have been easily stored as an int, until the actual division occurs (as the operands are ints anyway), but I assume you had some reason for doing such.

Yes, this was a simplified example, in my actual program it was dealing with floats. Thank you.
User is offlineProfile CardPM

Go to the top of the page

Reply to this topicStart new topic
Time is now: 10/15/08 04:39PM

Live C++ Help!

C++ Tutorials

Reference Sheets

C++ Snippets

Bye Bye Ads

Free DIC T-Shirt

T-Shirt Example

Related Sites

Monthly Drawing

Thumb Drive

Partners

Top Contributors

Top 10 Kudos This Month