We ❤️ reading ML papers so we've created this repo to highlight the top ML papers of every week.
Subscribe to our Substack Newsletter
Paper | Links |
---|---|
1) PaLM-E: An Embodied Multimodal Language Model - PaLM-E - incorporates real-world continuous sensor modalities, resulting in an embodied LM that performs tasks such as robotic manipulation planning, visual QA, and other embodied reasoning tasks. | Paper, Demo, Tweet |
2) Prismer: A Vision-Language Model with An Ensemble of Experts - Prismer - a parameter-efficient vision-language model powered by an ensemble of domain experts; it efficiently pools expert knowledge from different domains and adapts it to various vision-language reasoning tasks. | Paper, GitHub, Project, Tweet |
3) Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models - Visual ChatGPT - connects ChatGPT and different visual foundation models to enable users to interact with ChatGPT beyond the language format. | Paper, GitHub, Tweet |
4) A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT - A History of Generative AI - an overview of generative AI - from GAN to ChatGPT. | Paper, Tweet |
5) Larger language models do in-context learning differently - LLMs do In-Context Learning Differently - shows that with scale, LLMs can override semantic priors when presented with enough flipped labels; these models can also perform well when replacing targets with semantically-unrelated targets. | Paper, Tweet |
6) Foundation Models for Decision Making: Problems, Methods, and Opportunities - Foundation Models for Decision Making - provides an overview of foundation models for decision making, including tools, methods, and new research directions. | Project, Tweet |
7) Hyena Hierarchy: Towards Larger Convolutional Language Models - Hyena Hierarchy - a subquadratic drop-in replacement for attention; it interleaves implicit long convolutions and data-controlled gating and can learn on sequences 10x longer and up to 100x faster than optimized attention. | Paper, Code, Blog, Tweet |
8) OpenICL: An Open-Source Framework for In-context Learning - OpenICL - a new open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs. | Paper, Repo, Tweet |
9) MathPrompter: Mathematical Reasoning using Large Language Models - MathPrompter - a technique that improves LLM performance on mathematical reasoning problems; it uses zero-shot chain-of-thought prompting and verification to ensure generated answers are accurate. | Paper, Tweet |
10) Scaling up GANs for Text-to-Image Synthesis - GigaGAN - enables scaling up GANs on large datasets for text-to-image synthesis; it’s found to be orders of magnitude faster at inference time, synthesizes high-resolution images, and supports various latent space editing applications. | Paper, Project, Tweet |
Paper | Links |
---|---|
1) Language Is Not All You Need: Aligning Perception with Language Models - introduces a multimodal large language model called Kosmos-1; achieves great performance on language understanding, OCR-free NLP, perception-language tasks, visual QA, and more. | Paper, Tweet |
2) Evidence of a predictive coding hierarchy in the human brain listening to speech - finds that human brain activity is best explained by the activations of modern language models enhanced with long-range and hierarchical predictions. | Paper, Tweet |
3) EvoPrompting: Language Models for Code-Level Neural Architecture Search - combines evolutionary prompt engineering with soft prompt-tuning to find high-performing models; it leverages few-shot prompting which is further improved by using an evolutionary search approach to improve the in-context examples. | Paper, Tweet |
4) Consistency Models - a new family of generative models that achieve high sample quality without adversarial training. | Paper, Tweet |
5) Goal Driven Discovery of Distributional Differences via Language Descriptions - a new task that automatically discovers corpus-level differences via language description in a goal-driven way; applications include discovering insights from commercial reviews and error patterns in NLP systems. | Paper, Code, Tweet |
6) High-resolution image reconstruction with latent diffusion models from human brain activity - proposes an approach for high-resolution image reconstruction with latent diffusion models from human brain activity. | Project, Tweet |
7) Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control - a scalable approach to planning with LLMs in embodied settings through grounding functions; GD is found to be a general, flexible, and expressive approach to embodied tasks. | Paper, Project, Tweet |
8) Language-Driven Representation Learning for Robotics - a framework for language-driven representation learning from human videos and captions for robotics. | Paper, Models, Evaluation, Tweet |
9) Dropout Reduces Underfitting - demonstrates that dropout can mitigate underfitting when used at the start of training; it counteracts SGD stochasticity and limits the influence of individual batches when training models. | Paper, Tweet |
10) Enabling Conversational Interaction with Mobile UI using Large Language Models - an approach that enables versatile conversational interactions with mobile UIs using a single LLM. | Paper, Tweet |
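The "Dropout Reduces Underfitting" entry above describes applying dropout only at the start of training. One way to picture that is as a schedule that returns a nonzero dropout rate for the first few epochs and zero afterwards. The following is a minimal sketch under stated assumptions: the helper name `early_dropout_rate`, the cutoff epoch, and the rate are all hypothetical illustration values, not the authors' implementation or settings.

```python
def early_dropout_rate(epoch: int, cutoff_epoch: int = 50, rate: float = 0.1) -> float:
    """Return the dropout probability to use at a given 0-indexed epoch.

    Hypothetical helper: dropout is active only before `cutoff_epoch`
    and switched off for the rest of training.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return rate if epoch < cutoff_epoch else 0.0


# Illustrative schedule: dropout for the first 2 of 4 epochs.
schedule = [early_dropout_rate(e, cutoff_epoch=2, rate=0.1) for e in range(4)]
print(schedule)  # [0.1, 0.1, 0.0, 0.0]
```

In a real training loop, the returned value would be assigned to the model's dropout layers at the start of each epoch.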
Paper | Links |
---|---|
1) LLaMA: Open and Efficient Foundation Language Models - LLaMA - a 65B parameter foundation model released by Meta AI; relies on publicly available data and outperforms GPT-3 on most benchmarks despite being 10x smaller. | Paper, Tweet |
2) Composer: Creative and Controllable Image Synthesis with Composable Conditions - Composer - a 5B parameter creative and controllable diffusion model trained on billions of (text, image) pairs. | Paper, Project, GitHub, Tweet |
3) The Wisdom of Hindsight Makes Language Models Better Instruction Followers - Hindsight Instruction Relabeling - an alternative algorithm to train LLMs from feedback; the feedback is converted to an instruction by relabeling the original one, and the model is then trained, in a supervised way, for better alignment. | Paper, GitHub, Tweet |
4) Active Prompting with Chain-of-Thought for Large Language Models - Active-Prompt - a prompting technique to adapt LLMs to different task-specific example prompts (annotated with human-designed chain-of-thought reasoning); this process involves finding where the LLM is most uncertain and annotating those. | Paper, Code, Tweet |
5) Modular Deep Learning - Modular Deep Learning - a survey offering a unified view of the building blocks of modular neural networks; it also includes a discussion about modularity in the context of scaling LMs, causal inference, and other key topics in ML. | Paper, Project, Tweet |
6) Recitation-Augmented Language Models - Recitation-Augmented LMs - an approach that recites passages from the LLM’s own memory to produce final answers; shows high performance on knowledge-intensive tasks. | Paper, Tweet |
7) Learning Performance-Improving Code Edits - LLMs to Optimize Code - an approach that uses LLMs to suggest functionally correct, performance-improving code edits. | Paper, Tweet |
8) More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models - Prompt Injection Threats - a comprehensive analysis of novel prompt injection threats to application-integrated LLMs. | Paper, Tweet |
9) Aligning Text-to-Image Models using Human Feedback - Aligning Text-to-Image Models using Human Feedback - proposes a fine-tuning method to align generative models using human feedback. | Paper, Tweet |
10) MERF: Memory-Efficient Radiance Fields for Real-time View Synthesis in Unbounded Scenes - MERF - a memory-efficient radiance field representation for real-time view synthesis of large-scale scenes in a browser. | Paper, Tweet |
Paper | Links |
---|---|
1) Symbolic Discovery of Optimization Algorithms - Lion (EvoLved Sign Momentum) - a simple and effective optimization algorithm that’s more memory-efficient than Adam. | Paper, Tweet |
2) Transformer models: an introduction and catalog - Transformer models: an introduction and catalog. | Paper, Tweet |
3) 3D-aware Conditional Image Synthesis - pix2pix3D - a 3D-aware conditional generative model extended with neural radiance fields for controllable photorealistic image synthesis. | Paper, Project, Tweet |
4) The Capacity for Moral Self-Correction in Large Language Models - Moral Self-Correction in Large Language Models - finds strong evidence that language models trained with RLHF have the capacity for moral self-correction. The capability emerges at 22B model parameters and typically improves with scale. | Paper, Tweet |
6) Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment - Language Quantized AutoEncoders (LQAE) - an unsupervised method for text-image alignment that leverages pretrained language models; it enables few-shot image classification with LLMs. | Paper, Code, Tweet |
7) Augmented Language Models: a Survey - Augmented Language Models - a survey of language models that are augmented with reasoning skills and the capability to use tools. | Paper, Tweet |
8) Geometric Clifford Algebra Networks - Geometric Clifford Algebra Networks (GCANs) - an approach to incorporate geometry-guided transformations into neural networks using geometric algebra. | Paper, Tweet |
9) Auditing large language models: a three-layered approach - Auditing large language models - proposes a policy framework for auditing LLMs. | Paper, Tweet |
10) Energy Transformer - Energy Transformer - a transformer architecture that replaces the sequence of feedforward transformer blocks with a single large Associative Memory model; this follows the popularity that Hopfield Networks have gained in the field of ML. | Paper, Tweet |
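The Lion entry in the table above is described as more memory-efficient than Adam. The reason is that Lion tracks a single momentum value per parameter (Adam tracks two moment estimates) and applies only the sign of an interpolated update. The following is a minimal plain-Python sketch of one Lion step as we understand it from the paper. The function name `lion_step`, the list-based parameters, and the numeric values in the usage lines are illustrative assumptions, not the authors' reference code.

```python
def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Apply one Lion update to flat lists of floats (hypothetical helper).

    Only `momentum` is carried between steps: one value per parameter.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient, then keep only the sign.
        c = beta1 * m + (1 - beta1) * g
        # Sign update plus decoupled weight decay.
        new_params.append(p - lr * (sign(c) + weight_decay * p))
        # Momentum is refreshed with a separate coefficient.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


# Illustrative values only.
params, momentum = lion_step([1.0, -2.0], [0.5, -0.25], [0.0, 0.0], lr=0.1)
print(params, momentum)
```

Because every coordinate moves by the same magnitude `lr` (before weight decay), the step size is controlled by the learning rate and not by the gradient scale.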
Paper | Links |
---|---|
1) Toolformer: Language Models Can Teach Themselves to Use Tools - Toolformer - introduces language models that teach themselves to use external tools via simple API calls. | Paper, Tweet |
2) Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents - Describe, Explain, Plan, and Select - proposes using language models for open-world game playing. | Paper, Tweet |
3) A Categorical Archive of ChatGPT Failures - A Categorical Archive of ChatGPT Failures - a comprehensive analysis of ChatGPT failures for categories like reasoning, factual errors, maths, and coding. | Paper, Tweet |
4) Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery - Hard Prompts Made Easy - optimizing hard text prompts through efficient gradient-based optimization. | Paper, Tweet |
5) Data Selection for Language Models via Importance Resampling - Data Selection for LMs - proposes a cheap and scalable data selection framework based on an importance resampling algorithm to improve the downstream performance of LMs. | Paper, Tweet |
6) Structure and Content-Guided Video Synthesis with Diffusion Models - Gen-1 - proposes an approach for structure and content-guided video synthesis with diffusion models. | Paper, Project, Tweet |
7) A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity - Multitask, Multilingual, Multimodal Evaluation of ChatGPT - performs a more rigorous evaluation of ChatGPT on reasoning, hallucination, and interactivity. | Paper, Tweet |
8) Noise2Music: Text-conditioned Music Generation with Diffusion Models - Noise2Music - proposes diffusion models to generate high-quality 30-second music clips via text prompts. | Paper, Project, Tweet |
9) Offsite-Tuning: Transfer Learning without Full Model - Offsite-Tuning - introduces an efficient, privacy-preserving transfer learning framework to adapt foundational models to downstream data without access to the full model. | Paper, Project, Tweet |
10) Zero-shot Image-to-Image Translation - pix2pix-zero - proposes a model for zero-shot image-to-image translation. | Paper, Project, Tweet |
Paper | Links |
---|---|
1) REPLUG: Retrieval-Augmented Black-Box Language Models - REPLUG - a retrieval-augmented LM framework that adapts a retriever to a large-scale, black-box LM like GPT-3. | Paper, Tweet |
2) Extracting Training Data from Diffusion Models - Extracting Training Data from Diffusion Models - shows that diffusion-based generative models can memorize images from the training data and emit them at generation time. | Paper, Tweet |
3) The Flan Collection: Designing Data and Methods for Effective Instruction Tuning - The Flan Collection - releases a more extensive publicly available collection of tasks, templates, and methods to advance instruction-tuned models. | Paper, Tweet |
4) Multimodal Chain-of-Thought Reasoning in Language Models - Multimodal Chain-of-Thought Reasoning - incorporates vision features to elicit chain-of-thought reasoning in multimodality, enabling the model to generate effective rationales that contribute to answer inference. | Paper, Code, Tweet |
5) Dreamix: Video Diffusion Models are General Video Editors - Dreamix - a diffusion model that performs text-based motion and appearance editing of general videos. | Paper, Project, Tweet |
6) Benchmarking Large Language Models for News Summarization - Benchmarking LLMs for news summarization. | Paper, Tweet |
7) Mathematical Capabilities of ChatGPT - Mathematical Capabilities of ChatGPT - investigates the mathematical capabilities of ChatGPT on a new holistic benchmark called GHOSTS. | Paper, Tweet |
8) Emergence of Maps in the Memories of Blind Navigation Agents - Training ‘Blind’ Agents - trains an AI agent to navigate purely by feeling its way around; no use of vision, audio, or any other sensing (as in animals). | Paper, Project, Tweet |
9) SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections - SceneDreamer - a generative model that synthesizes large-scale 3D landscapes from random noise. | Paper, Tweet |
10) Large Language Models Can Be Easily Distracted by Irrelevant Context - LLMs and irrelevant context - finds that many prompting techniques fail when presented with irrelevant context for arithmetic reasoning. | Paper, Tweet |
Paper | Links |
---|---|
1) MusicLM: Generating Music From Text - MusicLM - a generative model for generating high-fidelity music from text descriptions. | Paper, Tweet |
2) Hungry Hungry Hippos: Towards Language Modeling with State Space Models - H3 - an approach to reduce the gap, in terms of performance and hardware utilization, between state space models and attention for language modeling. | Paper, Tweet |
3) A Watermark for Large Language Models - A Watermark for LLMs - a watermarking framework for proprietary language models. | Paper, Tweet |
4) Text-To-4D Dynamic Scene Generation - Make-A-Video3D - a new text-to-4D model for dynamic scene generation from input text. | Paper, GitHub, Tweet |
5) ClimaX: A foundation model for weather and climate - ClimaX - a foundation model for weather and climate, including many capabilities for atmospheric science tasks. | Paper, Tweet, Blog |
6) Open Problems in Applied Deep Learning - If you're looking for interesting open problems in DL, this is a good reference. Not sure if intentional, but it also looks useful for getting a general picture of current trends in deep learning, with ~300 references. | Paper, Tweet |
7) DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature - DetectGPT - an approach for zero-shot machine-generated text detection. Uses raw log probabilities from the LLM to determine if the passage was sampled from it. | Paper, Tweet |
8) StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis - StyleGAN-T - a new model that aims to regain the competitiveness of GANs for fast large-scale text-to-image synthesis. | Paper, Project, Code, Tweet |
9) Large language models generate functional protein sequences across diverse families - ProGen - an LLM that can generate protein sequences with a predictable function across large protein families. | Paper, Tweet |
10) The Impossibility of Parallelizing Boosting - The Impossibility of Parallelizing Boosting - investigates the possibility of parallelizing boosting. | Paper, Tweet |
Paper | Links |
---|---|
1) Google AI Research Recap (2022 Edition) - an excellent summary of some notable research Google AI did in 2022. | Blog, Tweet |
2) Dissociating language and thought in large language models: a cognitive perspective - a review paper on the capabilities of LLMs from a cognitive science perspective. | Paper, Tweet |
3) Human-Timescale Adaptation in an Open-Ended Task Space - an agent trained at scale that leads to a general in-context learning algorithm able to adapt to open-ended embodied 3D problems. | Paper, Tweet |
4) AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation - an approach to help provide explanations of generative transformer models through memory-efficient attention manipulation. | Paper, Tweet |
5) Everything is Connected: Graph Neural Networks - short overview of key concepts in graph representation learning. | Paper, Tweet |
6) GLIGEN: Open-Set Grounded Text-to-Image Generation - an approach that extends the functionality of existing pre-trained text-to-image diffusion models by enabling conditioning on grounding inputs. | Paper, Tweet, Project |
7) InstructPix2Pix: Learning to Follow Image Editing Instructions - proposes a method with the capability of editing images from human instructions. | Paper, Tweet |
8) Dataset Distillation: A Comprehensive Review | Paper, Tweet |
9) Learning-Rate-Free Learning by D-Adaptation - a new method for automatically adjusting the learning rate during training, applicable to more than a dozen diverse ML problems. | Paper, Tweet |
10) RecolorNeRF: Layer Decomposed Radiance Field for Efficient Color Editing of 3D Scenes - a user-friendly color editing approach for the neural radiance field to achieve a more efficient view-consistent recoloring. | Paper, Tweet |
Paper | Links |
---|---|
1) Mastering Diverse Domains through World Models - a general algorithm to collect diamonds in Minecraft from scratch without human data or curricula, a long-standing challenge in AI. | Paper, Tweet |
2) Tracr: Compiled Transformers as a Laboratory for Interpretability - a compiler for converting RASP programs into transformer weights. This way of constructing NN weights enables the development and evaluation of new interpretability tools. | Paper, Tweet, Code |
3) Multimodal Deep Learning - Multimodal Deep Learning is a new book published on arXiv. | Book, Tweet |
4) Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk - new work analyzing how generative LMs could potentially be misused for disinformation and how to mitigate these types of risks. | Paper, Tweet |
5) Why do Nearest Neighbor Language Models Work? - empirically identifies reasons why retrieval-augmented LMs (specifically k-nearest neighbor LMs) perform better than standard parametric LMs. | Paper, Code, Tweet |
6) Memory Augmented Large Language Models are Computationally Universal - investigates the use of existing LMs (e.g., Flan-U-PaLM 540B) combined with associative read-write memory to simulate the execution of a universal Turing machine. | Paper, Tweet |
7) A Survey on Transformers in Reinforcement Learning - transformers for RL will be a fascinating research area to track. The same is true for the reverse direction (RL for Transformers)... a notable example: using RLHF to improve LLMs (e.g., ChatGPT). | Paper, Tweet |
8) Scaling Laws for Generative Mixed-Modal Language Models - introduces scaling laws for generative mixed-modal language models. | Paper, Tweet |
9) DeepMatcher: A Deep Transformer-based Network for Robust and Accurate Local Feature Matching - a transformer-based network showing robust local feature matching, outperforming the state-of-the-art methods on several benchmarks. | Paper, Tweet |
10) Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement - addresses the time series forecasting problem with generative modeling; involves a bidirectional VAE backbone equipped with diffusion, denoising for prediction accuracy, and disentanglement for model interpretability. | Paper, Tweet |
Paper | Links |
---|---|
1) Muse: Text-To-Image Generation via Masked Generative Transformers - introduces Muse, a new text-to-image generation model based on masked generative transformers; significantly more efficient than diffusion models like Imagen and DALL-E 2. | Paper, Project, Code, Tweet |
2) VALL-E Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers - introduces VALL-E, a text-to-audio model that achieves state-of-the-art zero-shot performance; the text-to-speech synthesis task is treated as a conditional language modeling task. | Project, Tweet |
3) Rethinking with Retrieval: Faithful Large Language Model Inference - shows the potential of enhancing LLMs by retrieving relevant external knowledge based on decomposed reasoning steps obtained through chain-of-thought prompting. | Paper, Tweet |
4) SparseGPT: Massive Language Models Can Be Accurately Pruned In One-Shot - presents a technique for compressing large language models while not sacrificing performance; "pruned to at least 50% sparsity in one-shot, without any retraining." | Paper, Tweet |
5) ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders - a performant model based on a fully convolutional masked autoencoder framework and other architectural improvements. CNNs are striking back! | Paper, Code, Tweet |
6) Large Language Models as Corporate Lobbyists - with more capabilities, we are starting to see a wider range of applications with LLMs. This paper uses large language models to conduct corporate lobbying activities. | Paper, Code, Tweet |
7) Superposition, Memorization, and Double Descent - aims to better understand how deep learning models overfit or memorize examples; interesting phenomena observed; important work toward a mechanistic theory of memorization. | Paper, Tweet |
8) StitchNet: Composing Neural Networks from Pre-Trained Fragments - new idea to create new coherent neural networks by reusing pretrained fragments of existing NNs. Not straightforward but there is potential in terms of efficiently reusing learned knowledge in pre-trained networks for complex tasks. | Paper, Tweet |
9) Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes - proposes iterated decomposition, an approach to improve Science Q&A through a human-in-the-loop workflow for refining compositional LM programs. | Paper, Code, Tweet |
10) A Succinct Summary of Reinforcement Learning - a nice overview of some important ideas in RL. | Paper, Tweet |
We use a combination of AI-powered tools, analytics, and human curation to build the lists of papers.
Subscribe to our NLP Newsletter to stay on top of ML research and trends.
Join our Discord.