DARL Reproduction

We want to train an SAC model based on pixel input. In order to achieve that we’re gonna need an encoder that will map the image into latent space. The encoder is going to be shared between the actor and Q-network, although during training of the actor the encoder is frozen.

The encoder will be jointly trained with Q-network and additional classifier. The classifier will receive image embedding and try to classify it to the domain it belongs. Between classifier module and encoder module there’ll be a Gradient Reversal layer - it’s supposed to make the encoder domain-invariant my maximizing domain classification loss.

The rest is a standard SAC training (with some data augmentation: https://arxiv.org/pdf/2004.14990.pdf) for continuous domain. Unfortunately due to high workload at university I didn’t manage to wrap up my findings in a paper.

Git repository