Asynchronous one-step Q-learning