Potential bug in LargeVis::search_reverse_thread #9

Quasimondo · 2016-08-14T17:56:36Z

I believe there is a small bug in the search_reverse_thread call - I am not sure what the correct fix is, but the current version does not really make sense to me:

`void LargeVis::search_reverse_thread(int id)
{
    long long lo = id * n_vertices / n_threads;
    long long hi = (id + 1) * n_vertices / n_threads;
    long long x, y, p, q;
    for (x = lo; x < hi; ++x)
    {
        for (p = head[x]; p >= 0; p = next[p])
        {
            y = edge_to[p];
            for (q = head[x]; q >= 0; q = next[q])
            {
                if (edge_to[q] == x) break;
            }
            reverse[p] = q;
        }
    }
}`

In the inner loop y gets assigned the index of a connected node, but then y is not being used at all. Since it's supposed to be a backwards search my guess is that the fix would be to replace head[x] with head[y] in the inner loop - so there is a search back from the other edges neighbors until the two paths meet.

`void LargeVis::search_reverse_thread(int id)
{
    long long lo = id * n_vertices / n_threads;
    long long hi = (id + 1) * n_vertices / n_threads;
    long long x, y, p, q;
    for (x = lo; x < hi; ++x)
    {
        for (p = head[x]; p >= 0; p = next[p])
        {
            y = edge_to[p];
            for (q = head[y]; q >= 0; q = next[q])
            {
                if (edge_to[q] == x) break;
            }
            reverse[p] = q;
        }
    }
}`

The text was updated successfully, but these errors were encountered:

DataWaveAnalytics · 2016-08-30T03:51:23Z

Just curiosity...Did you run any test using this change?

sonjageorgievska · 2016-08-30T12:42:54Z

Hi, I also had to make this change, otherwise it was crashing. I ran it on the CondMat example and it produced acceptable results.

sonjageorgievska · 2016-08-30T12:58:19Z

I have to mention that before fixing this bug, I had to correct the sample_an_edge method to something like this:

long long LargeVis::sample_an_edge(real rand_value1, real rand_value2)
{
    long long k = (long long)(n_edge * rand_value1) - 1;
    k = max(k, 0);
    return rand_value2 <= prob[k] ? k : alias[k];
}

Without this correction, I had "index out of range" when rand_value1 would be 1 or when rand_value2 would be one. This is a quick fix, though, but worked. (Potentially another issue)

DataWaveAnalytics · 2016-08-31T00:25:09Z

That last change shouldn't be necessary because according to GSL's documentation you shouldn't get a 1 running gsl_rng_uniform.

Function: double gsl_rng_uniform (const gsl_rng * r)
This function returns a double precision floating point number uniformly distributed in the range [0,1). The range includes 0.0 but excludes 1.0. The value is typically obtained by dividing the result of gsl_rng_get(r) by gsl_rng_max(r) + 1.0 in double precision. Some generators compute this ratio internally so that they can provide floating point numbers with more than 32 bits of randomness (the maximum number of bits that can be portably represented in a single unsigned long int).

Which dataset were you using when crashed?

I removed the GSL dependency in my code. I'm using the stl uniform_real_distribution to improve portability. I did tests with this change and I got acceptable results as well.

Next, I did a test using MNIST dataset with the change proposed here, and I get different results (I got more clusters (13) than the typical ten you can find around). So, I think it deserves more debugging.

sonjageorgievska · 2016-08-31T12:42:50Z

I used the CondMath_network dataset.

I agree that this deserves more debugging or a separate issue.

DataWaveAnalytics · 2016-09-01T00:11:56Z

I guess the bug is in the construction of prob or alias vectors. I'm contacted the original authors and Jian (one of the authors) confirmed this his original source code. Probably, they would give us more insights to solve them.

DataWaveAnalytics · 2016-09-01T03:38:07Z

After doing some research, it looks like there is a rounding problem between double and float (real). The details are explained here.

I changed the sample_an _edge method using the solution proposed in the post. Note you should also change the method declaration in LargeVis.h due the new type in the parameters.

long long LargeVis::sample_an_edge(double rand_value1, double rand_value2){
    real rv1 = rand_value1;
    real rv2 = rand_value2;

    if(rv1 > rand_value1)
        rv1 = std::nextafter(rv1, -std::numeric_limits<real>::infinity());

    if(rv2 > rand_value2)
        rv2 = std::nextafter(rv2, -std::numeric_limits<real>::infinity());

    long long k = (long long)(n_edge * rv1);
    return rv2 < prob[k] ? k : alias[k];
}

sonjageorgievska · 2016-09-05T12:02:50Z

Update: I tried this solution, but at progress of 2.84% I got again the "index out of range" crash. So, for the moment I am still using the fix that I wrote about earlier...

DataWaveAnalytics · 2016-09-06T00:43:27Z

I've running several tests without getting any segmentation fault. Please, could run tests using gdb and post the output here?.

gdb --args ./largevis -input ...
then write run and press Enter.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential bug in LargeVis::search_reverse_thread #9

Potential bug in LargeVis::search_reverse_thread #9

Quasimondo commented Aug 14, 2016

DataWaveAnalytics commented Aug 30, 2016

sonjageorgievska commented Aug 30, 2016

sonjageorgievska commented Aug 30, 2016 •

edited

Loading

DataWaveAnalytics commented Aug 31, 2016 •

edited

Loading

sonjageorgievska commented Aug 31, 2016

DataWaveAnalytics commented Sep 1, 2016

DataWaveAnalytics commented Sep 1, 2016 •

edited

Loading

sonjageorgievska commented Sep 5, 2016

DataWaveAnalytics commented Sep 6, 2016

Potential bug in LargeVis::search_reverse_thread #9

Potential bug in LargeVis::search_reverse_thread #9

Comments

Quasimondo commented Aug 14, 2016

DataWaveAnalytics commented Aug 30, 2016

sonjageorgievska commented Aug 30, 2016

sonjageorgievska commented Aug 30, 2016 • edited Loading

DataWaveAnalytics commented Aug 31, 2016 • edited Loading

sonjageorgievska commented Aug 31, 2016

DataWaveAnalytics commented Sep 1, 2016

DataWaveAnalytics commented Sep 1, 2016 • edited Loading

sonjageorgievska commented Sep 5, 2016

DataWaveAnalytics commented Sep 6, 2016

sonjageorgievska commented Aug 30, 2016 •

edited

Loading

DataWaveAnalytics commented Aug 31, 2016 •

edited

Loading

DataWaveAnalytics commented Sep 1, 2016 •

edited

Loading