Project Plan

This page contains the initial project plan we made in week 2.

Goal of the Reproduction

The main goal is to implement the CURL architecture ourselves and test it in combination with an existing SAC implementation. This mostly means implementing the encoder model and the exponential moving average (EMA) encoder using only the details given in the paper. We will reuse part of the authors' repo, namely the utility functions and the RL algorithm, since these are not computer vision related.
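As a rough illustration of what the exponential moving average encoder involves, the sketch below updates a "key" copy of the encoder parameters toward the "query" encoder parameters. It is framework-free on purpose: parameters are plain lists of floats, and the names `ema_update`, `query`, `key` and the momentum value are assumptions made for this example, not values taken from the paper or the authors' repo.

```python
def ema_update(query_params, key_params, momentum=0.99):
    """Update key_params in place: k <- momentum * k + (1 - momentum) * q.

    Both arguments are dicts mapping a parameter name to a list of floats
    (a stand-in for real tensors). The momentum default is a placeholder.
    """
    for name, q_values in query_params.items():
        k_values = key_params[name]
        for i, q in enumerate(q_values):
            k_values[i] = momentum * k_values[i] + (1.0 - momentum) * q


# Hypothetical toy parameters: the query encoder is trained by gradients,
# the key encoder only ever changes through ema_update.
query = {"conv1.weight": [1.0, 2.0], "fc.bias": [0.5]}
key = {"conv1.weight": [0.0, 0.0], "fc.bias": [0.0]}

ema_update(query, key, momentum=0.9)
print(key)
```

With a real framework, the same update would run over the two encoders' parameter tensors after each training step, without tracking gradients for the key encoder.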

The paper reports many experiments across a wide range of environments: 16 from DMControl and all 26 from Atari. Since we have limited time and resources, we plan to test only a couple (2-3) of environments from DMControl. Which environments we will use will be determined at a later stage. If time permits, we would love to test on more environments, but we think this is a good start. We choose DMControl over Atari because we find the DMControl environments more interesting: they have a continuous action space instead of the (mostly) discrete action space of the Atari games.

Implementation Details

  • We use PyTorch as our deep learning framework.
  • We use the DeepMind Control Suite to test our network. link
  • SAC implementation: TBA

Expected Memory usage and Training Time

The authors of the paper use a batch size of 512 for training. The forward pass of the encoder should then use about 300 MB of memory for its four convolutional layers, two linear layers and activations. We are not sure how memory usage works out for the momentum encoder that encodes the keys, but in the worst case it doubles the amount of memory, which is still perfectly reasonable. The SAC network is fairly small, with only 2 hidden layers of size 256, so it takes about 0.5 MB of memory.
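The order of magnitude of the SAC estimate can be checked with a few lines of arithmetic. The sketch below counts the weights and biases of a single fully connected network with 2 hidden layers of size 256 and converts that to megabytes. The input size of 50, the output size of 1 and the 4 bytes per parameter (float32) are assumptions for illustration; the plan does not fix them, and a full SAC setup holds several such networks.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input and output included.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


BYTES_PER_PARAM = 4  # assumption: float32 parameters

# Assumed sizes: 50 input features, two hidden layers of 256, one output.
n_params = mlp_param_count([50, 256, 256, 1])
param_mb = n_params * BYTES_PER_PARAM / 1e6

print(f"{n_params} parameters, {param_mb:.2f} MB")  # 79105 parameters, 0.32 MB
```

Under these assumptions a single network stays well below 1 MB, so the SAC part is negligible next to the encoder activations at batch size 512.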

We expect the training time to be reasonable, but we do not know that for sure. That is why we will determine at a later stage how many environments we will test.

Further Research Options

If our work turns out to be too trivial, we propose to extend our initial plan with one or more of the following options:

  • We test more environments, either from the DMControl suite or from Atari.
  • We make our own implementation of SAC on top of CURL and compare it with the SAC implementation used by the authors.
  • We run the same additional experiments as the paper: detaching the encoder training from the SAC algorithm learning and then visualizing what happens with the learned kernels.
  • We find a paper that extends CURL in some way and reproduce that as well.
    • We could extend CURL using "Decoupling Representation Learning from Reinforcement Learning".
    • We could extend the implementation following the paper "Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings".

Planning

  • Week 2 (start): Create the project plan, read the paper in detail and set up the repo.
  • Week 3: Start implementing the unsupervised model with the RL algorithm (we still need to find a SAC implementation) and get the first images from the DMControl suite.
  • Week 4: Continue the implementation, possibly get first results.
  • Week 5: Continue the implementation. Reflect on current progress and replan the coming weeks. See if any research questions come up.
  • Week 6: Train on different tasks. Find possible improvements.
  • Week 7: Continue training. Start on the basics of the blog.
  • Week 8: Continue...
  • Week 9: Continue...
  • Week 10: 90% of blog done and final runs.
  • Week 11: Blog finished and presentation created.