(A previous version of this post was removed because of a missing tag. I am sorry about that and hope I have fixed it. A message would have been nice, though, since I can't add tags afterwards.)
//Edit: ES=Evolution Strategies, RL=Reinforcement Learning
Ever since people recognized that ES can solve RL tasks, something the ES community already knew more than 10 years ago, we have seen a crazy number of RL algorithms based on ES. However, the ML/RL field is not looking at what the ES community is doing and is basically repeating the same mistakes that community made more than 20 years ago. The OpenAI paper would not pass review in an ES track at GECCO, because its algorithm would no longer even be considered a valid baseline. While it is okay for the first paper reintroducing this idea to be unaware of prior work, it is not okay for the follow-up work. This ignorance of the state of the art (SOTA), despite knowing that the field exists, is worrying.
To make this a bit more productive, here are a few references:
1. Most importantly, the original ES-based RL paper:
Heidrich-Meisner, V., & Igel, C. (2009). Neuroevolution strategies for episodic reinforcement learning. Journal of Algorithms, 64(4), 152-168.
2. CMA-ES and NES
Hansen, N., Müller, S. D., & Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11(1), 1-18.
Krause, O., Arbonès, D. R., & Igel, C. (2016). CMA-ES with optimal covariance update and storage complexity. In Advances in Neural Information Processing Systems (pp. 370-378).