Update README.md
omarsar authored Aug 5, 2024
1 parent 33f4c38 commit a73c735
Showing 1 changed file with 17 additions and 0 deletions.
17 changes: 17 additions & 0 deletions README.md
@@ -7,6 +7,7 @@ At DAIR.AI we ❤️ reading ML papers so we've created this repo to highlight t
Here is the weekly series:

## 2024
- [Top ML Papers of the Week (July 29 - August 4)](./#top-ml-papers-of-the-week-july-29---august-4---2024)
- [Top ML Papers of the Week (July 22 - July 28)](./#top-ml-papers-of-the-week-july-22---july-28---2024)
- [Top ML Papers of the Week (July 15 - July 21)](./#top-ml-papers-of-the-week-july-15---july-21---2024)
- [Top ML Papers of the Week (July 8 - July 14)](./#top-ml-papers-of-the-week-july-8---july-14---2024)
@@ -97,6 +98,22 @@ Here is the weekly series:

[Join our Discord](https://discord.gg/SKgkVT8BGJ)


## Top ML Papers of the Week (July 29 - August 4) - 2024
| **Paper** | **Links** |
| ------------- | ------------- |
| 1) **Meta-Rewarding LLMs** - proposes a self-improving alignment technique (no human supervision) in which the LLM judges its own judgments and uses that feedback to improve its judgment skills; shows that this LLM-as-a-Meta-Judge approach improves the LLM's ability both to judge and to follow instructions; self-improvement that only targets better responses (the act role) saturates quickly, so this work also improves the LLM's ability to judge itself (the judge role) to avoid issues like reward hacking; in addition to the act and judge roles, a third role, the meta-judge, evaluates the model's own judgments. | [Paper](https://arxiv.org/abs/2407.19594), [Tweet](https://x.com/omarsar0/status/1818680848058585119) |
| 2) **MindSearch** - presents an LLM-based multi-agent framework for complex web information-seeking and integration tasks; a web planner decomposes complex queries, and a web searcher then performs hierarchical information retrieval on the Internet to improve the relevance of the retrieved information; the planning component is powered by iterative graph construction, which is used to better model complex problem-solving processes; the multi-agent framework handles long-context problems better by distributing reasoning and retrieval tasks to specialized agents. | [Paper](https://arxiv.org/abs/2407.20183), [Tweet](https://x.com/omarsar0/status/1818673381069226053) |
| 3) **Improved RAG with Self-Reasoning** - presents an end-to-end self-reasoning framework to improve the reliability and traceability of RAG systems; leverages the reasoning trajectories generated by the LLM itself; the LLM carries out the following 3 processes: 1) relevance-aware: judges the relevance between the retrieved documents and the question, 2) evidence-aware selective: chooses and cites relevant documents, then automatically selects snippets of key sentences from the cited documents as evidence, and 3) trajectory analysis: generates a concise analysis based on the self-reasoning trajectories gathered from the previous 2 processes and then provides the final inferred answer; this method helps the model be more selective, reason, and distinguish relevant from irrelevant documents, thereby improving the accuracy of the overall RAG system; the framework achieves performance comparable to GPT-4 with only 2K training samples (generated by GPT-4). | [Paper](https://arxiv.org/abs/2407.19813), [Tweet](https://x.com/omarsar0/status/1818139150882664696) |
| 4) **Constrained-CoT** - limits the model's reasoning output length without sacrificing performance; shows that constraining the reasoning of LLaMA2-70b to 100 words improves accuracy on GSM8K from 36.01% (CoT) to 41.07% (CCoT), while reducing the average output length by 28 words. | [Paper](https://arxiv.org/abs/2407.19825), [Tweet](https://x.com/omarsar0/status/1818133220484898992) |
| 5) **Adaptive RAG for Conversational Systems** - develops a gating model that predicts whether a conversational system requires RAG to improve its responses; shows that RAG-based conversational systems have the potential to generate high-quality responses with high generation confidence; it also claims to identify a correlation between the generation's confidence level and the relevance of the augmented knowledge. | [Paper](https://arxiv.org/abs/2407.21712), [Tweet](https://x.com/omarsar0/status/1818843407977959756) |
| 6) **ShieldGemma** - offers a comprehensive suite of LLM-based safety content moderation models built on Gemma 2; includes classifiers for key harm types such as dangerous content, toxicity, hate speech, and more. | [Paper](https://arxiv.org/abs/2407.21772), [Tweet](https://x.com/omarsar0/status/1818837753292853349) |
| 7) **Evaluating Persona Agents** - proposes a benchmark to evaluate persona agent capabilities in LLMs; finds that Claude 3.5 Sonnet achieves only a 2.97% relative improvement in PersonaScore over GPT 3.5 despite being a much more advanced model. | [Paper](https://arxiv.org/abs/2407.18416), [Tweet](https://x.com/omarsar0/status/1817964944949739544) |
| 8) **Machine Unlearning Survey** - provides a comprehensive survey on machine unlearning in generative AI. | [Paper](https://arxiv.org/abs/2407.20516), [Tweet](https://x.com/omarsar0/status/1818476462262906985) |
| 9) **ThinK** - proposes an approach to address inefficiencies in KV cache memory consumption; it focuses on long-context scenarios and on inference; it presents a query-dependent KV cache pruning method that minimizes attention weight loss while selectively pruning the least significant channels. | [Paper](https://arxiv.org/abs/2407.21018), [Tweet](https://x.com/omarsar0/status/1818474655461621903) |
| 10) **The Art of Refusal** - a survey of the current methods used to achieve refusal in LLMs; provides evaluation benchmarks and metrics used to measure abstention in LLMs. | [Paper](https://arxiv.org/abs/2407.18418), [Tweet](https://x.com/omarsar0/status/1817961056465035596) |

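To make the Constrained-CoT idea (paper 4) concrete, here is a minimal sketch of a length-constrained chain-of-thought prompt plus a check that an answer stays within the word budget. The function names and the exact prompt wording are assumptions for illustration only; they are not taken from the paper or this repo, and word counting here is a simple whitespace split.

```python
def build_ccot_prompt(question: str, word_limit: int = 100) -> str:
    """Append a length constraint to a zero-shot chain-of-thought prompt.

    The instruction wording below is an assumed phrasing, not a quoted one.
    """
    if word_limit <= 0:
        raise ValueError("word_limit must be positive")
    return (
        f"{question}\n"
        f"Let's think step by step and limit the answer length to {word_limit} words."
    )


def within_limit(answer: str, word_limit: int = 100) -> bool:
    """Return True if the answer uses at most word_limit whitespace-separated words."""
    return len(answer.split()) <= word_limit
```

A caller would send `build_ccot_prompt(question)` to the model of its choice and could use `within_limit` to measure how often the budget is respected.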

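The query-dependent channel pruning described for ThinK (paper 9) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: it scores each key channel by the product of the query and key column norms (the Frobenius norm of that channel's rank-one contribution to the attention logits) and keeps the highest-scoring channels. The function names, the scoring rule, and the sample values are all hypothetical.

```python
import math


def channel_scores(queries, keys):
    """Score channel c as ||Q[:, c]|| * ||K[:, c]||.

    Q @ K^T is a sum of per-channel outer products; this product is the
    Frobenius norm of one such term (assumed scoring rule for this sketch).
    """
    dim = len(keys[0])
    scores = []
    for c in range(dim):
        q_norm = math.sqrt(sum(q[c] ** 2 for q in queries))
        k_norm = math.sqrt(sum(k[c] ** 2 for k in keys))
        scores.append(q_norm * k_norm)
    return scores


def prune_key_channels(queries, keys, keep):
    """Keep the `keep` highest-scoring channels of the cached keys."""
    scores = channel_scores(queries, keys)
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    kept = sorted(ranked[:keep])  # restore original channel order
    pruned = [[k[c] for c in kept] for k in keys]
    return kept, pruned


# Toy data: 2 query vectors and 2 cached key vectors with 3 channels each.
queries = [[1.0, 0.1, 2.0], [0.5, 0.0, 1.0]]
keys = [[2.0, 0.1, 1.0], [1.0, 0.2, 3.0]]
kept, pruned = prune_key_channels(queries, keys, keep=2)
print(kept)    # [0, 2]
print(pruned)  # [[2.0, 1.0], [1.0, 3.0]]
```

When attention is later computed against the pruned keys, incoming queries would need to be restricted to the same `kept` channels so the dot products line up.
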
## Top ML Papers of the Week (July 22 - July 28) - 2024
| **Paper** | **Links** |
| ------------- | ------------- |