
marcvivet-we commented Jan 16, 2019

We added these expressions:

  • '$cossim', '$chi2', '$euclidean', '$squared_euclidean', '$manhattan'

These expressions compare long vectors (such as image features) stored either as arrays or as BSON binary data (BinData).
This is useful for finding the most similar images in a dataset. Usage:

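// Placeholder query vector: replace with the feature vector of the query image.
const vector = [0.12, 0.57, 0.33];

db.test_speed.aggregate([
    {
        '$project':
        {
            'id': '$id',
            'other_id': '$other_id',
            // Cosine similarity between the query vector and each document's 'vector' field
            'distance': {'$cossim': [vector, '$vector']},
        },
    },
    // Cosine similarity: higher means more similar, so sort descending.
    // For the distance expressions ($euclidean, $manhattan, ...) sort ascending (1) instead.
    {'$sort': {'distance': -1}},
    {'$limit': 20}
])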
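For reference, the five expressions listed above can be written as plain scalar loops. This is a minimal C++ sketch assuming the usual textbook definitions; the PR's exact conventions (for example, which form of `$chi2` it uses, or how `$cossim` treats zero-norm vectors) are not stated in this thread, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference versions of the five metrics (textbook
// definitions assumed). Both inputs must have the same length.

// $cossim: dot(a, b) / (|a| * |b|); higher means more similar.
float cosSim(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += double(a[i]) * b[i];
        na += double(a[i]) * a[i];
        nb += double(b[i]) * b[i];
    }
    if (na == 0.0 || nb == 0.0) return 0.0f;  // assumption: zero vector -> 0
    return static_cast<float>(dot / (std::sqrt(na) * std::sqrt(nb)));
}

// $squared_euclidean: sum of (a_i - b_i)^2.
float squaredEuclidean(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = double(a[i]) - b[i];
        sum += d * d;
    }
    return static_cast<float>(sum);
}

// $euclidean: square root of the squared Euclidean distance.
float euclidean(const std::vector<float>& a, const std::vector<float>& b) {
    return std::sqrt(squaredEuclidean(a, b));
}

// $manhattan: sum of |a_i - b_i|.
float manhattan(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += std::fabs(double(a[i]) - b[i]);
    return static_cast<float>(sum);
}

// $chi2 (one common form, meant for non-negative features such as histograms):
// sum of (a_i - b_i)^2 / (a_i + b_i), skipping terms where a_i + b_i == 0.
float chi2(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double s = double(a[i]) + b[i];
        if (s == 0.0) continue;
        const double d = double(a[i]) - b[i];
        sum += d * d / s;
    }
    return static_cast<float>(sum);
}
```

Only `$cossim` is a similarity; the other four are distances, which is why the sort direction in the pipeline depends on the expression used.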

In addition, this pull request includes implementations using AVX2 and AVX-512.
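The PR's AVX2/AVX-512 code is not reproduced in this thread. As a rough illustration of the approach, this C++ sketch computes the squared Euclidean distance with AVX2 intrinsics (8 floats per instruction), handles the leftover elements with a scalar tail loop, and uses a runtime CPU check to fall back to plain C++. The function names are hypothetical, and the `target` attribute and `__builtin_cpu_supports` assume GCC or Clang on x86; other compilers and architectures get only the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

// Portable scalar version, also used for the tail of the SIMD loop.
static float squaredEuclideanScalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

#ifdef HAVE_X86
// Compiled for AVX2 via the GCC/Clang target attribute, so the rest of the
// file does not need -mavx2. Only call it after a runtime CPU check.
__attribute__((target("avx2")))
static float squaredEuclideanAvx2(const float* a, const float* b, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        // Unaligned loads: the inputs are not guaranteed to be 32-byte aligned.
        const __m256 va = _mm256_loadu_ps(a + i);
        const __m256 vb = _mm256_loadu_ps(b + i);
        const __m256 d = _mm256_sub_ps(va, vb);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
    }
    // Horizontal sum of the 8 accumulator lanes.
    alignas(32) float lanes[8];
    _mm256_store_ps(lanes, acc);
    float sum = 0.0f;
    for (float v : lanes) sum += v;
    // Scalar tail for the remaining n % 8 elements.
    return sum + squaredEuclideanScalar(a + i, b + i, n - i);
}
#endif

// Dispatches to AVX2 when the CPU supports it, otherwise to the scalar loop.
float squaredEuclidean(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
#ifdef HAVE_X86
    if (__builtin_cpu_supports("avx2"))
        return squaredEuclideanAvx2(a.data(), b.data(), a.size());
#endif
    return squaredEuclideanScalar(a.data(), b.data(), a.size());
}
```

An AVX-512 variant would follow the same pattern with `__m512` registers and 16 floats per step. Note that the SIMD path adds terms in a different order than the scalar loop, so results can differ in the last bits for non-integer inputs.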

marcvivet-we changed the title from "Added distance expression for image feature comparision" to "Added distance expressions for image feature comparison" on Jan 16, 2019

OscarRL commented Jan 16, 2019

I think our implementation still needs some improvements:

  1. I suggest creating a dedicated "Vector" data type to handle float32, float16, etc. vectors in binary format, to avoid confusion with the BinData type. Currently, our code makes a "greedy" cast from BinData.

  2. Memory alignment and batch processing: instead of computing vector distances one by one, we could improve speed considerably by computing distances for batches of thousands of vectors at once.

  3. More testing and more unit test cases.
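One way to make the cast from binary data explicit rather than greedy (point 1) is to store a small element-type tag next to the raw bytes and validate the length before decoding. The sketch below is hypothetical: the `ElemType` tag and the `decodeVector` and `halfToFloat` helpers are not part of MongoDB or this PR. It copies bytes with `std::memcpy` instead of reinterpreting the buffer, assumes the stored byte order matches the host, and converts IEEE 754 half-precision values to float32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical element-type tag that a dedicated "Vector" type could carry.
enum class ElemType : std::uint8_t { Float32 = 1, Float16 = 2 };

// Convert one IEEE 754 binary16 value to binary32.
float halfToFloat(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t mant = h & 0x3FFu;
    std::uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;  // signed zero
        } else {
            // Subnormal half: shift the mantissa until it is normalized.
            exp = 127 - 15 + 1;
            while ((mant & 0x400u) == 0) { mant <<= 1; --exp; }
            mant &= 0x3FFu;
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);  // re-bias exponent
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Decode raw bytes into floats. Returns false instead of guessing when the
// byte length does not fit the declared element type.
bool decodeVector(const unsigned char* data, std::size_t len, ElemType type,
                  std::vector<float>& out) {
    assert(data != nullptr || len == 0);
    out.clear();
    if (type == ElemType::Float32) {
        if (len % 4 != 0) return false;
        out.resize(len / 4);
        if (len > 0) std::memcpy(out.data(), data, len);  // host byte order assumed
        return true;
    }
    if (type == ElemType::Float16) {
        if (len % 2 != 0) return false;
        out.reserve(len / 2);
        for (std::size_t i = 0; i < len; i += 2) {
            std::uint16_t h;
            std::memcpy(&h, data + i, sizeof h);
            out.push_back(halfToFloat(h));
        }
        return true;
    }
    return false;
}
```

Because the element type travels with the data, a 6-byte payload can be read as three float16 values or rejected as float32, instead of being silently reinterpreted.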

Any help is appreciated.
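A rough sketch of the batching idea from point 2: if candidate vectors are packed contiguously in one row-major buffer, their norms can be computed once when the batch is built, and the query can then be compared against the whole batch in a single tight loop, which is friendlier to caches and to compiler auto-vectorization than evaluating one document at a time. `VectorBatch` and `topKCosine` are hypothetical names, not part of MongoDB; the helper also folds in the `$sort`/`$limit` step by keeping only the k best scores with `std::partial_sort`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// A batch of vectors of dimension `dim`, stored row by row in one contiguous
// buffer, with each row's L2 norm computed once when the row is added.
struct VectorBatch {
    std::size_t dim = 0;
    std::vector<float> data;   // count() * dim floats
    std::vector<float> norms;  // one norm per row

    void add(const std::vector<float>& v) {
        assert(v.size() == dim);
        double sq = 0.0;
        for (float x : v) sq += double(x) * x;
        data.insert(data.end(), v.begin(), v.end());
        norms.push_back(static_cast<float>(std::sqrt(sq)));
    }
    std::size_t count() const { return norms.size(); }
};

// Returns (row index, cosine similarity) for the k most similar rows, best first.
std::vector<std::pair<std::size_t, float>> topKCosine(
        const VectorBatch& batch, const std::vector<float>& query, std::size_t k) {
    assert(query.size() == batch.dim);
    double qsq = 0.0;
    for (float x : query) qsq += double(x) * x;
    const float qnorm = static_cast<float>(std::sqrt(qsq));

    // Score every row of the batch against the query in one pass.
    std::vector<std::pair<std::size_t, float>> scores(batch.count());
    for (std::size_t r = 0; r < batch.count(); ++r) {
        const float* row = batch.data.data() + r * batch.dim;
        float dot = 0.0f;
        for (std::size_t i = 0; i < batch.dim; ++i) dot += row[i] * query[i];
        const float denom = batch.norms[r] * qnorm;
        scores[r] = {r, denom > 0.0f ? dot / denom : 0.0f};
    }

    // Equivalent of {$sort: {distance: -1}}, {$limit: k}: order only the top k.
    k = std::min(k, scores.size());
    std::partial_sort(scores.begin(), scores.begin() + k, scores.end(),
                      [](const auto& x, const auto& y) { return x.second > y.second; });
    scores.resize(k);
    return scores;
}
```

Precomputing the row norms means each cosine similarity costs one dot product per candidate rather than three passes over the data.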


marcvivet-we commented Jan 16, 2019

To speed up these operations even further, it would be nice to have a new function on the Value object that returns memory-aligned data, i.e. value1.getAlignedBinData() instead of value1.getBinData().

This could be done by allocating the memory with _mm_malloc and _mm_free (with 32-byte alignment for AVX2; AVX-512 aligned loads need 64 bytes). That would allow us to use _mmXXX_load_ps instead of _mmXXX_loadu_ps.
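A minimal sketch of the alignment idea, assuming C++17: here `std::aligned_alloc` plays the role of `_mm_malloc`, and `std::free` the role of `_mm_free`. Buffers are aligned to 64 bytes, which satisfies both `_mm256_load_ps` (32-byte alignment) and `_mm512_load_ps` (64-byte alignment). `AlignedFloats` and `makeAlignedFloats` are hypothetical names; this is not the proposed `getAlignedBinData()`, only an illustration of what it would have to provide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>

// 64 bytes covers AVX-512; AVX2 aligned loads only need 32.
constexpr std::size_t kAlignment = 64;

// Frees memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(float* p) const { std::free(p); }
};
using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Copy `n` floats into a freshly allocated, kAlignment-aligned buffer so
// SIMD code can use aligned loads (_mm256_load_ps / _mm512_load_ps).
AlignedFloats makeAlignedFloats(const float* src, std::size_t n) {
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    std::size_t bytes = n * sizeof(float);
    bytes = (bytes + kAlignment - 1) / kAlignment * kAlignment;
    if (bytes == 0) bytes = kAlignment;
    void* raw = std::aligned_alloc(kAlignment, bytes);
    if (raw == nullptr) throw std::bad_alloc();
    std::memset(raw, 0, bytes);  // zero the padding past element n
    if (n > 0) std::memcpy(raw, src, n * sizeof(float));
    assert(reinterpret_cast<std::uintptr_t>(raw) % kAlignment == 0);
    return AlignedFloats(static_cast<float*>(raw));
}
```

Rounding the size up and zeroing the padding has a side benefit: if both vectors are padded the same way, a SIMD loop can process whole 8- or 16-float blocks without a scalar tail for sums of per-element terms such as the squared Euclidean distance or a dot product, since the padded zeros contribute nothing.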


Schubes commented Jan 16, 2019

Hi @marcvivet-we,

Thanks so much for opening this pull request! It looks like a really interesting and worthwhile extension. I see you've already signed the contributor's agreement, so I've gone ahead and created SERVER-39057 to track this issue on your behalf. The query team will look over the pull request and provide more substantive comments.

Thanks again,
Kelsey

Schubes changed the title from "Added distance expressions for image feature comparison" to "SERVER-39057 Add distance expressions for image feature comparison" on Jan 16, 2019
<< " dbpath=" << storageGlobalParams.dbpath;

const bool is32bit = sizeof(int*) == 4;
l << (is32bit ? " 32" : " 64") << "-bit host=" << getHostNameCached() << endl;

Những


@anhems, can you explain your intention with this line?
It seems completely unrelated to this PR.

Best,
Miguel
