
Commit

Update README.md
omarsar authored Jul 1, 2024
1 parent 482d23c commit a19e877
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -7,6 +7,7 @@ At DAIR.AI we ❤️ reading ML papers so we've created this repo to highlight t
Here is the weekly series:

## 2024
- [Top ML Papers of the Week (June 24 - June 30)](./#top-ml-papers-of-the-week-june-24---june-30---2024)
- [Top ML Papers of the Week (June 17 - June 23)](./#top-ml-papers-of-the-week-june-17---june-23---2024)
- [Top ML Papers of the Week (June 10 - June 16)](./#top-ml-papers-of-the-week-june-10---june-16---2024)
- [Top ML Papers of the Week (June 3 - June 9)](./#top-ml-papers-of-the-week-june-3---june-9---2024)
@@ -92,6 +93,20 @@ Here is the weekly series:

[Join our Discord](https://discord.gg/SKgkVT8BGJ)

## Top ML Papers of the Week (June 24 - June 30) - 2024
| **Paper** | **Links** |
| ------------- | ------------- |
| 1) **ESM3** - an LLM-based biological model that generated a new green fluorescent protein called esmGFP; builds on a bidirectional transformer, trains with a masked language modeling objective, leverages geometric attention to represent atomic coordinates, and applies chain-of-thought prompting to generate fluorescent proteins; the authors estimate that esmGFP represents the equivalent of over 500 million years of natural evolution performed by an evolutionary simulator. | [Paper](https://evolutionaryscale-public.s3.us-east-2.amazonaws.com/research/esm3.pdf), [Tweet](https://x.com/alexrives/status/1805559211394277697) |
| 2) **Gemma 2** - presents a family of open models ranging from 2B to 27B parameters; demonstrates strong capabilities in reasoning, math, and code generation, outperforming models twice its size. | [Paper](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf), [Tweet](https://x.com/omarsar0/status/1806352449956958501) |
| 3) **LLM Compiler** - a suite of open pre-trained models (7B and 13B parameters) designed for code optimization tasks; built on top of Code Llama and trained on a corpus of 546 billion tokens of LLVM-IR and assembly code; also instruction fine-tuned to interpret compiler behavior; achieves 77% of the optimizing potential of the autotuning search on which it was trained and performs accurate disassembly 14% of the time. | [Paper](https://ai.meta.com/research/publications/meta-large-language-model-compiler-foundation-models-of-compiler-optimization), [Tweet](https://x.com/AIatMeta/status/1806361623831171318) |
| 4) **Enhancing RAG with Long-Context LLMs** - proposes LongRAG, which combines RAG with long-context LLMs to enhance performance; uses a long retriever that significantly reduces the number of extracted units by operating on longer retrieval units; the long reader takes in the retrieved long units and leverages the zero-shot answer extraction capability of long-context LLMs to improve the overall system's performance (a minimal sketch of this pipeline follows the table below); claims to achieve 64.3% on HotpotQA (full-wiki), which is on par with the state-of-the-art model. | [Paper](https://arxiv.org/abs/2406.15319), [Tweet](https://x.com/omarsar0/status/1805230323799560199) |
| 5) **Improving Retrieval in LLMs through Synthetic Data** - proposes a fine-tuning approach to improve the accuracy of retrieving information in LLMs while maintaining reasoning capabilities over long-context inputs; the fine-tuning dataset comprises numerical dictionary key-value retrieval tasks (350 samples; a toy version of this task is sketched after the table below); finds that this approach mitigates the "lost-in-the-middle" phenomenon and improves performance on both information retrieval and long-context reasoning. | [Paper](https://arxiv.org/abs/2406.19292), [Tweet](https://x.com/omarsar0/status/1806738385039692033) |
| 6) **GraphReader** - proposes a graph-based agent system to enhance the long-context abilities of LLMs; it structures long text into a graph and employs an agent to explore the graph (using predefined functions guided by a step-by-step rational plan) to effectively generate answers for questions; consistently outperforms GPT-4-128k across context lengths from 16k to 256k. | [Paper](https://arxiv.org/abs/2406.14550v1), [Tweet](https://x.com/omarsar0/status/1806802925517218078) |
| 7) **Faster LLM Inference with Dynamic Draft Trees** - presents a context-aware dynamic draft tree to speed up inference; previous speculative sampling methods used a static draft tree whose structure depended only on position and lacked context awareness; achieves speedup ratios ranging from 3.05x-4.26x, which is 20%-40% faster than previous work; these speedups come from the new method significantly increasing the number of accepted draft tokens. | [Paper](https://arxiv.org/abs/2406.16858), [Tweet](https://x.com/omarsar0/status/1805629496634294760) |
| 8) **Following Length Constraints in Instructions** - presents an approach for dealing with length bias and training instruction-following language models that better follow length-constraint instructions; fine-tunes a model using DPO on a length-instruction-augmented dataset (a rough sketch of this augmentation follows the table below) and shows fewer length-constraint violations while keeping response quality high. | [Paper](https://arxiv.org/abs/2406.17744), [Tweet](https://x.com/jaseweston/status/1805771223747481690) |
| 9) **On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation** - survey on LLM-based synthetic data generation, curation, and evaluation. | [Paper](https://arxiv.org/abs/2406.15126), [Tweet](https://x.com/omarsar0/status/1805652404404207919) |
| 10) **Adam-mini** - a new optimizer that reduces memory footprint by 45%-50% by using far fewer learning rates, while performing on par with or even outperforming AdamW; it carefully partitions parameters into blocks and assigns each block a single high-quality learning rate (a toy sketch of this block-wise idea follows the table below); achieves consistent results on language models from 125M to 7B parameters for pre-training, SFT, and RLHF. | [Paper](https://arxiv.org/abs/2406.16793), [Tweet](https://x.com/arankomatsuzaki/status/1805439246318125299) |

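A rough Python sketch of the LongRAG pipeline from entry 4 above: passages are grouped into long retrieval units, a few units are retrieved by similarity, and a long-context LLM extracts the answer zero-shot. This is an illustration of the idea, not the authors' code; the `embed` and `ask_llm` callables are hypothetical stand-ins for an embedding model and a long-context chat model, and the unit size and prompt wording are assumptions.

```python
# Illustrative LongRAG-style pipeline (entry 4): long retrieval units + long reader.
from typing import Callable, List
import numpy as np

def build_long_units(passages: List[str], max_chars: int = 16_000) -> List[str]:
    """Concatenate consecutive passages into long retrieval units."""
    units, current = [], ""
    for p in passages:
        if current and len(current) + len(p) > max_chars:
            units.append(current)
            current = ""
        current += p + "\n"
    if current:
        units.append(current)
    return units

def retrieve(question: str, units: List[str],
             embed: Callable[[str], np.ndarray], top_k: int = 2) -> List[str]:
    """Rank the long units by cosine similarity to the question."""
    q = embed(question)
    vecs = [embed(u) for u in units]
    scores = [float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9) for v in vecs]
    best = np.argsort(scores)[::-1][:top_k]
    return [units[i] for i in best]

def long_rag_answer(question: str, passages: List[str],
                    embed: Callable[[str], np.ndarray],
                    ask_llm: Callable[[str], str]) -> str:
    """Feed a handful of long units to a long-context LLM for zero-shot answer extraction."""
    context = "\n\n".join(retrieve(question, build_long_units(passages), embed))
    prompt = (f"Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return ask_llm(prompt)
```
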
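A toy version of the synthetic fine-tuning task described in entry 5: a numerical dictionary key-value retrieval sample. The prompt template, key/value ranges, and JSONL format are assumptions for illustration; only the task type and the ~350-sample scale come from the summary above.

```python
# Sketch of one synthetic sample in the style of entry 5:
# a numerical dictionary key-value retrieval task. Formatting is illustrative.
import json
import random

def make_kv_retrieval_sample(num_pairs: int = 50, seed: int = 0) -> dict:
    rng = random.Random(seed)
    keys = rng.sample(range(10_000, 99_999), num_pairs)
    mapping = {str(k): rng.randint(0, 999_999) for k in keys}
    target_key = rng.choice(list(mapping))  # target can sit anywhere, including the middle
    prompt = (
        "Below is a dictionary of numerical key-value pairs.\n"
        f"{json.dumps(mapping)}\n"
        f"What is the value associated with key {target_key}?"
    )
    return {"prompt": prompt, "completion": str(mapping[target_key])}

if __name__ == "__main__":
    # A small dataset (the summary above mentions 350 samples) written as JSONL for fine-tuning.
    with open("kv_retrieval.jsonl", "w") as f:
        for i in range(350):
            f.write(json.dumps(make_kv_retrieval_sample(seed=i)) + "\n")
```
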
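A rough sketch of how a length-instruction-augmented DPO pair (entry 8) might be built: prepend an explicit length limit to the prompt and prefer the response that respects it. The instruction template and word-count heuristic here are assumptions, not the paper's exact recipe.

```python
# Sketch of building a length-instruction-augmented DPO pair (entry 8).

def word_count(text: str) -> int:
    return len(text.split())

def augment_with_length_instruction(prompt: str, max_words: int) -> str:
    """Prepend an explicit length constraint to the original instruction."""
    return f"Answer the following in at most {max_words} words.\n\n{prompt}"

def make_dpo_pair(prompt: str, response_a: str, response_b: str, max_words: int):
    """Return a (prompt, chosen, rejected) record where the chosen response obeys the limit."""
    a_ok = word_count(response_a) <= max_words
    b_ok = word_count(response_b) <= max_words
    if a_ok == b_ok:
        return None  # no length-based preference signal; skip this pair
    chosen, rejected = (response_a, response_b) if a_ok else (response_b, response_a)
    return {
        "prompt": augment_with_length_instruction(prompt, max_words),
        "chosen": chosen,
        "rejected": rejected,
    }
```
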
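A toy sketch of the core idea behind Adam-mini (entry 10): Adam-style updates where the second-moment statistic is a single scalar per parameter block instead of one per coordinate, which is what cuts the optimizer's memory. This omits the paper's actual block-partitioning rules and is not the authors' implementation.

```python
# Toy block-wise second-moment optimizer in the spirit of entry 10 (illustrative only).
import numpy as np

class BlockwiseAdamSketch:
    def __init__(self, blocks, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.blocks = blocks                          # list of np.ndarray parameter blocks
        self.lr, (self.b1, self.b2), self.eps = lr, betas, eps
        self.m = [np.zeros_like(p) for p in blocks]   # per-coordinate first moment (as in Adam)
        self.v = [0.0 for _ in blocks]                # ONE second-moment scalar per block
        self.t = 0

    def step(self, grads):
        self.t += 1
        for i, (p, g) in enumerate(zip(self.blocks, grads)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            # Average squared gradient over the whole block -> a single scalar.
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * float(np.mean(g * g))
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # in-place parameter update
```
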
## Top ML Papers of the Week (June 17 - June 23) - 2024
| **Paper** | **Links** |
| ------------- | ------------- |
