[WIP] Evaluation on AMD 16-Core CPU Bare Metal via Latitude.sh Hardware Cloud #306
Thanks so much to you and Latitude for this comparison! Really interesting to see. Posting a few preliminary thoughts right away since this PR is still a work in progress.
From looking over the revised rankings, the thing that jumps out to me the most is that SCANN's submission seems to do very poorly compared to the rankings on the competition machine. It might be worth looking into this one a little bit and trying to understand the discrepancy--it seems like by far the largest jump in rankings on the board. I'm OOF until mid-September but I would be interested in looking into this when I return.
Another thing that occurs to me is that at least the baseline for OOD (DiskANN) sets the number of threads for query time as an explicit parameter in the configuration. So unless you changed their config, it would be running with 8 threads on your 16 core machine, while other algorithms may automatically adjust the number of threads they use to the number of available threads. It might be good to try to standardize this--I did some spot checking on other algorithms and didn't find any other instances where the number of query-time threads is set explicitly to 8 (my first thought with SCANN, but I didn't find evidence of this), but it's definitely possible I missed some.
I am happy to help with producing results with the streaming track, I use that code frequently and recently contributed some new runbooks. I did not completely understand whether the problem was with running the algorithms or producing a ranking--let me know and I can probably help out. |
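One way to spot-check the thread question above is to walk each algorithm's configuration and flag any explicit thread setting that differs from the number of CPUs the machine reports. The snippet below is only a sketch: the config layout and the key names (`query_threads`, `query-args`, `algo-a`) are hypothetical placeholders, not the repository's actual schema, and it only catches keys whose name contains "thread".

```python
import os


def find_thread_settings(config, path=()):
    """Yield (path, value) for every integer setting whose key mentions 'thread'."""
    if isinstance(config, dict):
        for key, value in config.items():
            here = path + (str(key),)
            if isinstance(value, (dict, list)):
                yield from find_thread_settings(value, here)
            elif (
                "thread" in str(key).lower()
                and isinstance(value, int)
                and not isinstance(value, bool)
            ):
                yield here, value
    elif isinstance(config, list):
        for index, item in enumerate(config):
            yield from find_thread_settings(item, path + (str(index),))


def mismatched_thread_settings(config, available):
    """Return [(dotted_path, value)] for thread settings that differ from `available`."""
    return [
        (".".join(path), value)
        for path, value in find_thread_settings(config)
        if value != available
    ]


# Hypothetical configuration; real key names and nesting will differ.
example = {
    "algo-a": {"query-args": [{"Ls": 100, "query_threads": 8}]},
    "algo-b": {"query-args": [{"nprobe": 32}]},
}

available = os.cpu_count() or 1
for name, value in mismatched_thread_settings(example, available):
    print(f"{name} = {value}, but this machine reports {available} CPUs")
```

Algorithms that pick their thread count automatically would not show up in a scan like this, so an empty result does not by itself mean every entry uses all cores.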
Great @magdalendobson! My responses in-line...
Great. Assuming all things are equal, then perhaps the difference is hardware related (bare-metal instead of virtualized, different CPU, different NVMe drive, etc.). As you suggest, this should be verified with some additional debugging.
Yeah, I did not change any configs. If I recall, the competition leverages Docker to limit/standardize the use of the underlying resources, and I did not change any of the default behavior in this regard.
Great. Just as an example, I'm running the following commands for streaming DiskANN. It appears to run OK, but I can't extract the results with either data_export.py or plot.py. I think I'm missing something super obvious.
|
Paging @arron2003 to take a look at SCANN results here--in your experience do results on this hardware look accurate to you? Any thoughts on whether there may be an easily addressed issue? |
@sourcesync the plotting code (e.g. |
Thanks @magdalendobson! I put a copy of the data export into this PR branch. I didn't notice 'streaming' in the track column. Is there a specialized method to export streaming track results? |
Can you share which VM was used for this, and how I can reproduce it? On the previous Azure VM with 16 vCPUs, there were only 8 physical cores, thus 8 threads. |
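The vCPU arithmetic above can be written out as a short sketch. The helper name `physical_cores` is made up for illustration, and the threads-per-core value has to come from the machine itself (for example, from its hardware inventory); it is an assumption here, not something reported in this thread for the Latitude system.

```python
import os


def physical_cores(logical_cpus, threads_per_core):
    """Derive the physical core count from logical CPUs and SMT threads per core."""
    if threads_per_core < 1 or logical_cpus % threads_per_core != 0:
        raise ValueError("logical_cpus must be a positive multiple of threads_per_core")
    return logical_cpus // threads_per_core


# A VM with 16 vCPUs and 2 hardware threads per core has 8 physical cores.
print(physical_cores(16, 2))

# What this process can actually see. sched_getaffinity is Linux-only, and
# container CPU quotas may not be reflected in either number.
logical = os.cpu_count() or 1
if hasattr(os, "sched_getaffinity"):
    usable = len(os.sched_getaffinity(0))
else:
    usable = logical
print(f"logical CPUs: {logical}, usable by this process: {usable}")
```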
Hi @arron2003! This is a bare-metal system. I put a detailed hardware inventory in this README. Scroll down to Hardware Inventory. These systems were donated by Latitude.sh. If you need access, I would need to give you credentials and instructions. Let me know if that is useful. I can also run some commands on your behalf if that's easier. |
It would be helpful if you could share the credentials with me.
|
OK @arron2003, let's get you access. What's the best way to share VPN and login credentials with you privately? I can send the credentials to an email of your choice, or I can invite you to my Slack for DM (I'll need an email in that case as well). Or something else? |
You can find me at my github handle @ gmail dot com. |
OK @arron2003, sent. |
…dri/big-ann-benchmarks into gw/latitude_m4_metal_medium
@arron2003 hey, I went ahead and merged your remote main into my PR branch. Here are the new rankings. Is that OK? |
Sure - no problem at all!
|
…dri/big-ann-benchmarks into gw/latitude_m4_metal_medium
FYI. I also updated the OOD graph in the README @arron2003. |
What is this PR?
This PR provides competition evaluation on new hardware, based on the AMD 16-core CPU (bare metal).
A little background first. Harsha, Amir Ingber, and I were interviewed by Harald Carlens of MLContests at NeurIPS 2023. Harald introduced us to Victor Chiea of Latitude.sh. Latitude graciously donated credits for use of their hardware cloud, which provides many flavors of CPUs and GPUs.
As a first step, we decided to evaluate on a Latitude system similar to the ones used for the 2023 competition. This PR is the result of that ongoing effort.
How do I get started? How do I view the track rankings on this hardware?
The track rankings are here. Also included are track Pareto plots, a detailed hardware inventory, the commands used, and additional notes.
Why is this PR still WIP? How can I help?