Update README.md

filipeoliveiraa · Aug 5, 2024 · a73c735 · a73c735
1 parent 33f4c38
commit a73c735
Showing 1 changed file with 17 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -7,6 +7,7 @@ At DAIR.AI we ❤️ reading ML papers so we've created this repo to highlight t
 Here is the weekly series:
 
 ## 2024
+- [Top ML Papers of the Week (July 29 - August 4)](./#top-ml-papers-of-the-week-july-29---august-4---2024)
 - [Top ML Papers of the Week (July 22 - July 28)](./#top-ml-papers-of-the-week-july-15---july-21---2024)
 - [Top ML Papers of the Week (July 15 - July 21)](./#top-ml-papers-of-the-week-july-15---july-21---2024)
 - [Top ML Papers of the Week (July 8 - July 14)](./#top-ml-papers-of-the-week-july-8---july-14---2024)
@@ -97,6 +98,22 @@ Here is the weekly series:
 
 [Join our Discord](https://discord.gg/SKgkVT8BGJ)
 
+
+## Top ML Papers of the Week (July 29 - August 4) - 2024
+| **Paper**  | **Links** |
+| ------------- | ------------- |
+| 1) **Meta-Rewarding LLMs** - proposes a self-improving alignment technique (no human supervision) where the LLM judges its own judgements and uses the feedback to improve its judgment skills; shows that leveraging this LLM-as-a-Meta-Judge approach improves the LLM's ability to judge and follow instructions; just doing self-improvement to generate better responses (act) saturates quickly; this work improves the LLM's ability to judge itself (judge) to avoid issues like reward hacking; in addition to the act and judge roles, a third role called meta-judge is used to evaluate the model's own judgements.   | [Paper](https://arxiv.org/abs/2407.19594), [Tweet](https://x.com/omarsar0/status/1818680848058585119) |
+| 2) **MindSearch** - presents an LLM-based multi-agent framework to perform complex web-information seeking and integration tasks; a web planner effectively decomposes complex queries followed by a web searcher that performs hierarchical information retrieval on the Internet to improve the relevancy of the retrieved information; the planning component is powered by an iterative graph construction which is used to better model complex problem-solving processes; the multi-agent framework handles long context problems better by distributing reasoning and retrieval tasks to specialized agents. | [Paper](https://arxiv.org/abs/2407.20183), [Tweet](https://x.com/omarsar0/status/1818673381069226053) |
+| 3) **Improved RAG with Self-Reasoning** - presents an end-to-end self-reasoning framework to improve the reliability and traceability of RAG systems; leverages the reasoning trajectories generated by the LLM itself; the LLM is used to carry out the following 3 processes: 1) relevance-aware: judges the relevance between the retrieved documents and the question, 2) evidence-aware selective: chooses and cites relevant documents, and then automatically selects snippets of key sentences as evidence from the cited documents, and 3) trajectory analysis: generates a concise analysis based on all gathered self-reasoning trajectories generated by the previous 2 processes and then provides the final inferred answer; this method helps the model to be more selective, reason and distinguish relevant and irrelevant documents, therefore improving the accuracy of the overall RAG system; the framework achieves comparable performance to GPT-4 with only 2K training samples (generated by GPT-4). | [Paper](https://arxiv.org/abs/2407.19813), [Tweet](https://x.com/omarsar0/status/1818139150882664696) |
+| 4) **Constrained-CoT** - limits the model reasoning output length without sacrificing performance; shows that constraining the reasoning of LLaMA2-70b to 100 words improves the accuracy from 36.01% (CoT) to 41.07% (CCoT) on GSM8K, while reducing the average output length by 28 words.  | [Paper](https://arxiv.org/abs/2407.19825), [Tweet](https://x.com/omarsar0/status/1818133220484898992) |
+| 5) **Adaptive RAG for Conversations Sytems** - develops a gating model that predicts if a conversational system requires RAG to improve its responses; shows that RAG-based conversational systems have the potential to generate high-quality responses and high generation confidence; it also claims to identify a correlation between the generation's confidence level and the relevance of the augmented knowledge.  | [Paper](https://arxiv.org/abs/2407.21712), [Tweet](https://x.com/omarsar0/status/1818843407977959756) |
+| 6) **ShieldGemma** - offers a comprehensive suite of LLM-based safety content moderation models built on Gemma 2; includes classifiers for key harm types such as dangerous content, toxicity, hate speech, and more. | [Paper](https://arxiv.org/abs/2407.21772),  [Tweet](https://x.com/omarsar0/status/1818837753292853349)  |
+| 7) **Evaluating Persona Agents** - proposes a benchmark to evaluate persona agent capabilities in LLMs; finds that Claude 3.5 Sonnet only has a 2.97% relative improvement in PersonaScore compared to GPT 3.5 despite being a much more advanced model.  | [Paper](https://arxiv.org/abs/2407.18416),  [Tweet](https://x.com/omarsar0/status/1817964944949739544)  |
+| 8) **Machine Unlearning Survey** - provides a comprehensive survey on machine unlearning in generative AI. | [Paper](https://arxiv.org/abs/2407.20516),  [Tweet](https://x.com/omarsar0/status/1818476462262906985)  |
+| 9) **ThinK** - proposes an approach to address inefficiencies in KV cache memory consumption; it focuses on the long-context scenarios and the inference side of things; it presents a query-dependent KV cache pruning method to minimize attention weight loss while selectively pruning the least significant channels | [Paper](https://arxiv.org/abs/2407.21018), [Tweet](https://x.com/omarsar0/status/1818474655461621903)  |
+| 10) **The Art of Refusal** - a survey of the current methods used to achieve refusal in LLMs; provides evaluation benchmarks and metrics used to measure abstention in LLMs. | [Paper](https://arxiv.org/abs/2407.18418), [Tweet](https://x.com/omarsar0/status/1817961056465035596) |
+
+
 ## Top ML Papers of the Week (July 22 - July 28) - 2024
 | **Paper**  | **Links** |
 | ------------- | ------------- |