Commit

docs(README): remove typo
pivoshenko committed May 17, 2024
1 parent 8ad5a48 commit 3801d94
Showing 1 changed file with 1 addition and 1 deletion.
README.md: 2 changes (1 addition & 1 deletion)
@@ -90,7 +90,7 @@ Here is the weekly series:
| **Paper** | **Links** |
| ------------- | ------------- |
| 1) **AlphaFold 3** -releases a new state-of-the-art model for accurately predicting the structure and interactions of molecules; it can generate the 3D structures of proteins, DNA, RNA, and smaller molecules; the model is an improved version of the Evoformer module and then assembling its predictions using a diffusion network; the diffusion process starts with a cloud of atoms which converges to its final molecular structure. | [Paper](https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/), [Tweet](https://x.com/GoogleDeepMind/status/1788223454317097172) |
-| 2) **xLSTM: Extended Long Short-Term Memory** - attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs and mitigating common limitations of LSTMs; to enable LSTMs the ability to revise storage decisions, they introduce exponential gating and a new memory mixing mechanism (termed sLSTM); to enhance the storage capacities of LSTMs, they add a matrix memory and a covariance update rule (termed mLSTM); Both the sLSTM and xLSTM cells stabilize their exponential gates using the same technique; these extensions lead to xLSTM blocks that are residually stacked into the final xLSTM architecture; compared to Transformers, xLSTMs have a linear computation and constant memory complexity concerning the sequence length; the xLSTM architecture is shown to be efficient at handling different aspects of long context problems; achieves better validation perplexities when compared to different model classes like Transformers, SSMs, and RNNs.| [Paper](https://arxiv.org/abs/2405.04517)), [Tweet](https://x.com/omarsar0/status/1788236090265977224) |
+| 2) **xLSTM: Extended Long Short-Term Memory** - attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs and mitigating common limitations of LSTMs; to enable LSTMs the ability to revise storage decisions, they introduce exponential gating and a new memory mixing mechanism (termed sLSTM); to enhance the storage capacities of LSTMs, they add a matrix memory and a covariance update rule (termed mLSTM); Both the sLSTM and xLSTM cells stabilize their exponential gates using the same technique; these extensions lead to xLSTM blocks that are residually stacked into the final xLSTM architecture; compared to Transformers, xLSTMs have a linear computation and constant memory complexity concerning the sequence length; the xLSTM architecture is shown to be efficient at handling different aspects of long context problems; achieves better validation perplexities when compared to different model classes like Transformers, SSMs, and RNNs.| [Paper](https://arxiv.org/abs/2405.04517), [Tweet](https://x.com/omarsar0/status/1788236090265977224) |
| 3) **DeepSeek-V2** -a strong MoE model comprising 236B parameters, of which 21B are activated for each token; supports a context length of 128K tokens and uses Multi-head Latent Attention (MLA) for efficient inference by compressing the Key-Value (KV) cache into a latent vector; DeepSeek-V2 and its chat versions achieve top-tier performance among open-source models. | [Paper](https://arxiv.org/abs/2405.04434v2), [Tweet](https://x.com/p_nawrot/status/1788479672067481664) |
| 4) **AlphaMath Almost Zero** - enhances LLMs with Monte Carlo Tree Search (MCTS) to improve mathematical reasoning capabilities; the MCTS framework extends the LLM to achieve a more effective balance between exploration and exploitation; for this work, the idea is to generate high-quality math reasoning data without professional human annotations; the assumption is that a well pre-trained LLM already possesses mathematical knowledge to generate reasoning steps but needs better stimulation such as an advanced prompting or search strategy; unlike other methods such as Program-of-thought and Chain-of-thought, no solutions are required for the training data, just the math questions and the answers; the integration of LLMs, a value model, and the MCTS framework enables an effective and autonomous process of generating high-quality math reasoning data; the value model also aids the policy model in searching for effective solution paths. | [Paper](https://arxiv.org/abs/2405.03553), [Tweet](https://x.com/omarsar0/status/1787678940158468283)) |
| 5) **DrEureka: Language Model Guided Sim-To-Real Transfer** - investigates using LLMs to automate and accelerate sim-to-real design; it requires the physics simulation for the target task and automatically constructs reward functions and domain randomization distributions to support real-world transfer; discovers sim-to-real configurations competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. | [Paper](https://eureka-research.github.io/dr-eureka/assets/dreureka-paper.pdf), [Tweet](https://x.com/DrJimFan/status/1786429467537088741) |
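
The change above touches the xLSTM row, whose summary mentions the paper's matrix-memory cell (mLSTM) with exponential gating and a covariance update rule. Purely as an illustration of that update, and not code from this repository or from the xLSTM authors, here is a minimal NumPy sketch of a single mLSTM step; the parameter names, the sigmoid forget gate, and the omission of the paper's gate-stabilization trick are simplifying assumptions.

```python
# Illustrative sketch (not from this repo): one step of a matrix-memory LSTM (mLSTM) cell.
import numpy as np

def mlstm_step(x, C, n, params):
    """x: (d_in,) input; C: (d, d) matrix memory C_{t-1}; n: (d,) normalizer n_{t-1};
    params: dict of hypothetical weights (W_q/W_k/W_v/W_o are (d, d_in), w_i/w_f are (d_in,))."""
    d = n.shape[0]
    q = params["W_q"] @ x + params["b_q"]                  # query
    k = (params["W_k"] @ x + params["b_k"]) / np.sqrt(d)   # scaled key
    v = params["W_v"] @ x + params["b_v"]                  # value

    i_gate = np.exp(params["w_i"] @ x + params["b_i"])     # exponential input gate (scalar)
    f_gate = 1.0 / (1.0 + np.exp(-(params["w_f"] @ x + params["b_f"])))  # sigmoid forget gate (scalar)
    o_gate = 1.0 / (1.0 + np.exp(-(params["W_o"] @ x + params["b_o"])))  # output gate (vector)

    C_new = f_gate * C + i_gate * np.outer(v, k)           # covariance-style memory update
    n_new = f_gate * n + i_gate * k                        # normalizer keeps the read-out bounded
    h = o_gate * ((C_new @ q) / max(abs(n_new @ q), 1.0))  # normalized retrieval, gated
    return h, C_new, n_new
```

Because the recurrent state is the fixed-size pair (C, n) rather than a growing key-value cache, memory stays constant and compute linear in the sequence length, which is the property the row above highlights.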
