Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
About me
This is a page that is not in the main menu.
Published:
This blog is about the first part of the internship I did at the Intelligent and Autonomous Systems group at CWI, Amsterdam, under the supervision of Dr. Hendrik Baier.
We wanted to find a good way to train neural networks with Evolution Strategies to play board games. Due to COVID-19 I did my internship remotely and did not have access to the CWI servers, so we scaled the problem down to a simple board game: Connect Four.
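For readers unfamiliar with Evolution Strategies, the sketch below shows the basic loop: perturb the parameter vector with Gaussian noise, score every perturbed copy, and move the parameters along the fitness-weighted average of the noise. This is not the internship code; the toy fitness function and all parameter choices are illustrative, standing in for the real fitness signal of Connect Four game outcomes.

```python
# Minimal Evolution Strategies sketch (OpenAI-ES style). The toy fitness
# rewards closeness to a fixed target vector; in the actual project it would
# be the outcome of Connect Four games played by the policy network theta defines.
import numpy as np

rng = np.random.default_rng(0)
target = rng.standard_normal(20)             # stand-in for "good" parameters

def fitness(theta):
    return -np.linalg.norm(theta - target)   # higher is better

def es_step(theta, sigma=0.1, lr=0.05, population=50):
    # Sample Gaussian perturbations, score each perturbed parameter vector,
    # then move theta along the fitness-weighted average of the noise.
    noise = rng.standard_normal((population, theta.size))
    scores = np.array([fitness(theta + sigma * n) for n in noise])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # variance reduction
    return theta + lr * (noise.T @ scores) / (population * sigma)

theta = np.zeros_like(target)
for _ in range(200):
    theta = es_step(theta)
print(round(fitness(theta), 3))  # moves toward 0 as theta approaches the target
```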
Published:
C51 is one of my favourite RL algorithms because of its unique approach of learning the distribution of returns rather than only their expectation. I first heard of this approach in this YouTube video (AI Prism: Deep RL Bootcamp); anyone with basic knowledge of DQN and policy gradient algorithms should watch the complete course. C51 also has later variants such as QR-DQN and IQN. C51 is easier to visualise than QR-DQN and IQN, so before going into them we should understand C51 and the distributional perspective of RL. I have implemented C51 and QR-DQN, and this blog will help readers understand Distributional RL. I have also written down some of my observations. You can find my code here.
The idea behind learning the distribution of future returns, instead of just their expected value, is to let the model capture the intrinsic randomness (stochastic dynamics, stochastic rewards) of the returns. Consider a person who bought a lottery ticket, as did 1,000 other people. 10 lucky people will each win $1,000,000, so any one person has a 1-in-100 chance of winning $1,000,000 and a 99-in-100 chance of getting $0. The expected payout is therefore $10,000, and traditional value-based deep RL algorithms learn exactly this expectation, but in reality nobody ever receives $10,000: the expectation gives a false picture of the possible returns. Returns can be multimodal, as in the lottery scenario, which makes the variance high and convergence difficult.
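To make the lottery example concrete, here is a small NumPy illustration (not from the original post) of how a C51-style categorical distribution over a fixed support of atoms keeps the bimodal structure that the scalar expectation throws away. The 51 atoms follow the C51 default; the support bounds are chosen to fit this toy example.

```python
# The lottery from the text as a categorical return distribution:
# $1,000,000 with probability 0.01, $0 with probability 0.99.
import numpy as np

# C51 represents the return distribution as probabilities over N fixed
# "atoms" spanning [v_min, v_max]; here 51 atoms over [0, 1e6].
n_atoms, v_min, v_max = 51, 0.0, 1_000_000.0
atoms = np.linspace(v_min, v_max, n_atoms)

probs = np.zeros(n_atoms)
probs[0] = 0.99           # the $0 outcome
probs[-1] = 0.01          # the $1,000,000 outcome

expectation = np.dot(atoms, probs)
print(expectation)        # 10000.0 -- a value no ticket holder ever receives
print(probs[probs > 0])   # the bimodal distribution the expectation hides
```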
Published:
XOR-LSTM: Bits Parity. This is one of the warm-up problems from OpenAI's Requests for Research 2.0. I thought it was a good idea to work on this problem in my leisure time. I came to know about Requests for Research while going through OpenAI's Spinning Up in Deep RL (worth a read for anyone interested in Reinforcement Learning).
Task:
Train an LSTM to solve the XOR problem: that is, given a sequence of bits, determine its parity. The LSTM should consume the sequence, one bit at a time, and then output the correct answer at the sequence’s end. Test the two approaches below:
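The two approaches themselves are not reproduced here. As a baseline illustration, below is a minimal PyTorch sketch (the model, layer sizes, and the fixed sequence length are assumptions, not the original setup) of an LSTM that reads one bit per timestep and predicts the parity from its final hidden state.

```python
# Minimal sketch: an LSTM consumes a bit string one bit at a time and
# outputs a parity logit at the end of the sequence. Fixed-length sequences
# are assumed purely for illustration.
import torch
import torch.nn as nn

SEQ_LEN, BATCH, HIDDEN = 50, 128, 64

class ParityLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, 1)

    def forward(self, bits):                  # bits: (batch, seq_len, 1)
        _, (h, _) = self.lstm(bits)
        return self.head(h[-1]).squeeze(-1)   # one parity logit per sequence

def make_batch():
    bits = torch.randint(0, 2, (BATCH, SEQ_LEN, 1)).float()
    parity = bits.sum(dim=(1, 2)) % 2         # 1 if the number of ones is odd
    return bits, parity

model = ParityLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    bits, parity = make_batch()
    loss = loss_fn(model(bits), parity)
    opt.zero_grad()
    loss.backward()
    opt.step()
```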
Published:
Short description of portfolio item number 1
Published:
Short description of portfolio item number 2
Published in IEEE-NIH (HI-PoCT) 2019 Conference, Bethesda, Maryland, 2019
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different value in the type field. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.