Commit

Merge pull request dair-ai#17 from pivoshenko/main
Update README - Remove unopened brackets
omarsar authored May 18, 2024
2 parents 8ad5a48 + 6c5cf64 commit 59cb92b
Showing 1 changed file with 2 additions and 2 deletions.
README.md: 4 changes (2 additions & 2 deletions)
@@ -90,9 +90,9 @@ Here is the weekly series:
| **Paper** | **Links** |
| ------------- | ------------- |
| 1) **AlphaFold 3** - releases a new state-of-the-art model for accurately predicting the structure and interactions of molecules; it can generate the 3D structures of proteins, DNA, RNA, and smaller molecules; the model uses an improved version of the Evoformer module and then assembles its predictions using a diffusion network; the diffusion process starts with a cloud of atoms which converges to its final molecular structure. | [Paper](https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/), [Tweet](https://x.com/GoogleDeepMind/status/1788223454317097172) |
- | 2) **xLSTM: Extended Long Short-Term Memory** - attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs and mitigating common limitations of LSTMs; to enable LSTMs the ability to revise storage decisions, they introduce exponential gating and a new memory mixing mechanism (termed sLSTM); to enhance the storage capacities of LSTMs, they add a matrix memory and a covariance update rule (termed mLSTM); Both the sLSTM and xLSTM cells stabilize their exponential gates using the same technique; these extensions lead to xLSTM blocks that are residually stacked into the final xLSTM architecture; compared to Transformers, xLSTMs have a linear computation and constant memory complexity concerning the sequence length; the xLSTM architecture is shown to be efficient at handling different aspects of long context problems; achieves better validation perplexities when compared to different model classes like Transformers, SSMs, and RNNs.| [Paper](https://arxiv.org/abs/2405.04517)), [Tweet](https://x.com/omarsar0/status/1788236090265977224) |
+ | 2) **xLSTM: Extended Long Short-Term Memory** - attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs and mitigating common limitations of LSTMs; to enable LSTMs the ability to revise storage decisions, they introduce exponential gating and a new memory mixing mechanism (termed sLSTM); to enhance the storage capacities of LSTMs, they add a matrix memory and a covariance update rule (termed mLSTM); Both the sLSTM and xLSTM cells stabilize their exponential gates using the same technique; these extensions lead to xLSTM blocks that are residually stacked into the final xLSTM architecture; compared to Transformers, xLSTMs have a linear computation and constant memory complexity concerning the sequence length; the xLSTM architecture is shown to be efficient at handling different aspects of long context problems; achieves better validation perplexities when compared to different model classes like Transformers, SSMs, and RNNs.| [Paper](https://arxiv.org/abs/2405.04517), [Tweet](https://x.com/omarsar0/status/1788236090265977224) |
| 3) **DeepSeek-V2** - a strong MoE model comprising 236B parameters, of which 21B are activated for each token; supports a context length of 128K tokens and uses Multi-head Latent Attention (MLA) for efficient inference by compressing the Key-Value (KV) cache into a latent vector (a rough sketch of this caching idea follows the table); DeepSeek-V2 and its chat versions achieve top-tier performance among open-source models. | [Paper](https://arxiv.org/abs/2405.04434v2), [Tweet](https://x.com/p_nawrot/status/1788479672067481664) |
- | 4) **AlphaMath Almost Zero** - enhances LLMs with Monte Carlo Tree Search (MCTS) to improve mathematical reasoning capabilities; the MCTS framework extends the LLM to achieve a more effective balance between exploration and exploitation; for this work, the idea is to generate high-quality math reasoning data without professional human annotations; the assumption is that a well pre-trained LLM already possesses mathematical knowledge to generate reasoning steps but needs better stimulation such as an advanced prompting or search strategy; unlike other methods such as Program-of-thought and Chain-of-thought, no solutions are required for the training data, just the math questions and the answers; the integration of LLMs, a value model, and the MCTS framework enables an effective and autonomous process of generating high-quality math reasoning data; the value model also aids the policy model in searching for effective solution paths. | [Paper](https://arxiv.org/abs/2405.03553), [Tweet](https://x.com/omarsar0/status/1787678940158468283)) |
+ | 4) **AlphaMath Almost Zero** - enhances LLMs with Monte Carlo Tree Search (MCTS) to improve mathematical reasoning capabilities; the MCTS framework extends the LLM to achieve a more effective balance between exploration and exploitation; for this work, the idea is to generate high-quality math reasoning data without professional human annotations; the assumption is that a well pre-trained LLM already possesses mathematical knowledge to generate reasoning steps but needs better stimulation such as an advanced prompting or search strategy; unlike other methods such as Program-of-thought and Chain-of-thought, no solutions are required for the training data, just the math questions and the answers; the integration of LLMs, a value model, and the MCTS framework enables an effective and autonomous process of generating high-quality math reasoning data; the value model also aids the policy model in searching for effective solution paths. | [Paper](https://arxiv.org/abs/2405.03553), [Tweet](https://x.com/omarsar0/status/1787678940158468283) |
| 5) **DrEureka: Language Model Guided Sim-To-Real Transfer** - investigates using LLMs to automate and accelerate sim-to-real design; it requires the physics simulation for the target task and automatically constructs reward functions and domain randomization distributions to support real-world transfer; discovers sim-to-real configurations competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. | [Paper](https://eureka-research.github.io/dr-eureka/assets/dreureka-paper.pdf), [Tweet](https://x.com/DrJimFan/status/1786429467537088741) |
| 6) **Consistency LLMs** - proposes efficient parallel decoders that reduce inference latency by decoding an n-token sequence per inference step; the inspiration for this work comes from humans' ability to form complete sentences before articulating them word by word; this process can be mimicked and learned by fine-tuning pre-trained LLMs to perform parallel decoding; the model is trained to map randomly initialized n-token sequences to the same result yielded by autoregressive (AR) decoding in as few steps as possible (a minimal sketch of this fixed-point iteration follows the table); a consistency loss helps with multiple-token prediction and a standard AR loss prevents deviation from the target LLM and ensures generation quality; shows 2.4x to 3.4x improvements in generation speed while preserving generation quality. | [Paper](https://arxiv.org/abs/2403.00835), [Tweet](https://x.com/omarsar0/status/1788594039865958762) |
| 7) **Is Flash Attention Stable?** - develops an approach to understanding the effects of numeric deviation and applies it to the widely adopted Flash Attention optimization; finds that Flash Attention sees roughly an order of magnitude more numeric deviation than Baseline Attention at BF16 (a small deviation-measurement sketch follows the table). | [Paper](https://arxiv.org/abs/2405.02803), [Tweet](https://x.com/arankomatsuzaki/status/1787674624647414168) |
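
The xLSTM entry (2) hinges on exponential input/forget gates kept numerically safe by an extra stabilizer state. Below is a minimal NumPy sketch of that idea for a single sLSTM-style recurrent step; the stacked weight layout, the tanh cell input, and all variable names are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W, R, b):
    """One recurrent step with exponential input/forget gates and a
    stabilizer state m, loosely following the sLSTM description.
    W, R, b stack the pre-activations for gates i, f, o and cell input z."""
    pre = W @ x + R @ h_prev + b                  # stacked pre-activations, shape (4*d,)
    i_tilde, f_tilde, o_tilde, z_tilde = np.split(pre, 4)

    # Stabilizer: keep the exp() arguments bounded (log-sum-exp-style trick).
    m = np.maximum(f_tilde + m_prev, i_tilde)
    i_gate = np.exp(i_tilde - m)                  # stabilized exponential input gate
    f_gate = np.exp(f_tilde + m_prev - m)         # stabilized exponential forget gate
    o_gate = 1.0 / (1.0 + np.exp(-o_tilde))       # sigmoid output gate
    z = np.tanh(z_tilde)                          # cell input

    c = f_gate * c_prev + i_gate * z              # cell state
    n = f_gate * n_prev + i_gate                  # normalizer state
    h = o_gate * (c / n)                          # normalized hidden state
    return h, c, n, m
```

With hidden size `d`, `W` has shape `(4*d, input_dim)`, `R` has shape `(4*d, d)`, and `b` has shape `(4*d,)`; `c_prev`, `n_prev`, and `m_prev` can start at zeros, since the `max` keeps the exponent arguments bounded from the first step.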
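The DeepSeek-V2 entry (3) notes that Multi-head Latent Attention compresses the KV cache into a latent vector. As a rough illustration of why that saves memory, the sketch below caches only a low-rank latent per token and reconstructs per-head keys and values from it on the fly; the dimensions and projection names are assumptions for illustration, not DeepSeek-V2's actual configuration, and details such as queries and rotary embeddings are omitted.

```python
import torch

torch.manual_seed(0)
d_model, n_heads, d_head, d_latent = 512, 8, 64, 64     # illustrative sizes only

# A shared down-projection produces the compact latent that gets cached;
# full multi-head keys and values are reconstructed from it when needed.
W_down_kv = torch.randn(d_model, d_latent) / d_model ** 0.5
W_up_k = torch.randn(d_latent, n_heads * d_head) / d_latent ** 0.5
W_up_v = torch.randn(d_latent, n_heads * d_head) / d_latent ** 0.5

def compress(hidden):                # hidden: (seq_len, d_model)
    """Only this (seq_len, d_latent) tensor is kept in the KV cache."""
    return hidden @ W_down_kv

def expand(latent):
    """Rebuild full multi-head K and V from the cached latent."""
    k = (latent @ W_up_k).view(-1, n_heads, d_head)
    v = (latent @ W_up_v).view(-1, n_heads, d_head)
    return k, v

hidden = torch.randn(16, d_model)    # 16 already-generated tokens
latent = compress(hidden)
k, v = expand(latent)
print(latent.numel(), "cached values instead of", k.numel() + v.numel())
# -> 1024 cached values instead of 16384
```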
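The Consistency LLMs entry (6) describes mapping a randomly initialized n-token block to the result that autoregressive decoding would reach. The sketch below shows the underlying fixed-point (Jacobi-style) refinement with a toy stand-in next-token function; `next_token` is a hypothetical hook, not the paper's API, and the consistency-training loss itself is not shown.

```python
import random

def jacobi_decode(prefix, n, next_token, vocab, max_iters=50):
    """Refine an n-token draft in parallel until it stops changing.
    At the fixed point, every position matches what greedy autoregressive
    decoding would have produced from the tokens to its left."""
    draft = [random.choice(vocab) for _ in range(n)]       # random initialization
    for _ in range(max_iters):
        # One refinement pass: re-predict every position from its current left
        # context (a real LLM would do this in a single batched forward pass).
        updated = [next_token(prefix + draft[:i]) for i in range(n)]
        if updated == draft:                               # fixed point reached
            return draft
        draft = updated
    return draft

# Toy stand-in "model": a deterministic next-token rule over a tiny vocabulary.
vocab = ["a", "b", "c"]
def next_token(context):
    return vocab[len(context) % len(vocab)]

print(jacobi_decode(["<s>"], n=5, next_token=next_token, vocab=vocab))
```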
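The Flash Attention entry (7) is about measuring how far a fused attention kernel drifts from a straightforward "baseline" implementation at low precision. The snippet below shows that kind of measurement with PyTorch's `scaled_dot_product_attention` (PyTorch 2.x) against a hand-written softmax attention, using an FP64 run as the reference; whether the fused path actually dispatches to a FlashAttention kernel depends on the hardware and backend, so treat this as an illustration of the methodology, not a reproduction of the paper's numbers.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, S, D = 1, 4, 256, 64                        # batch, heads, seq length, head dim
q, k, v = (torch.randn(B, H, S, D, dtype=torch.float64) for _ in range(3))

def baseline_attention(q, k, v):
    """Unfused reference: explicit softmax(QK^T / sqrt(d)) V."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

reference = baseline_attention(q, k, v)            # FP64 "gold" output

qb, kb, vb = (t.to(torch.bfloat16) for t in (q, k, v))
baseline_bf16 = baseline_attention(qb, kb, vb).double()
fused_bf16 = F.scaled_dot_product_attention(qb, kb, vb).double()

print("baseline attention, BF16, max |deviation|:",
      (baseline_bf16 - reference).abs().max().item())
print("fused attention,    BF16, max |deviation|:",
      (fused_bf16 - reference).abs().max().item())
```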
