Maximum Entropy Inverse Reinforcement Learning: Algorithms and Computation

In the previous post we introduced inverse reinforcement learning. We defined the problem associated with this field, namely that of reconstructing a reward function from a set of demonstrations, and we saw what the ability to do this implies. We also came across some classification results as well as convergence guarantees for selected methods that were only briefly referred to in that post. There were some challenges with the classification results that we discussed, and although there were attempts to deal with these, there is still quite a lot that we did not cover.

Maximum Entropy Inverse Reinforcement Learning

We shall now introduce a probabilistic approach based on what is known as the principle of maximum entropy. This provides a well-defined, globally normalised distribution over decision sequences, while offering the same performance guarantees as the previously mentioned methods. The probabilistic approach allows for principled reasoning about uncertainty in the inverse reinforcement learning setting, and its assumptions further restrict the space in which we search for solutions, which we saw last time is quite vast.…
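
As a preview of the object at the heart of this approach (stated here as the standard maximum entropy IRL formulation of Ziebart et al., not a result from the previous post): if the reward of a trajectory \zeta is linear in its features, R(\zeta)=\theta^T f_\zeta, then the maximum entropy distribution over trajectories is

P(\zeta\mid\theta)=\frac{1}{Z(\theta)}\exp(\theta^T f_\zeta), \qquad Z(\theta)=\sum_\zeta \exp(\theta^T f_\zeta),

so that trajectories with higher cumulative reward are exponentially more likely, and Z(\theta) is the global normalising constant over all decision sequences.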

By | May 22nd, 2020|Uncategorized|0 Comments

Inverse Reinforcement Learning: The general basics

Standard Reinforcement Learning

The very basic ideas in Reinforcement Learning are usually defined in the context of Markov Decision Processes. For everything that follows, unless stated otherwise, assume that the structures are finite.

A Markov Decision Process (MDP) is a tuple (S,A, P, \gamma, R) where the following is true:
1. S is the set of states s_k with k\in \mathbb{N} .
2. A is the set of actions a_k with k\in \mathbb{N} .
3. P is the set of state transition probabilities, where P_{sa} is the distribution over next states that results from taking action a in state s.
4. \gamma is the discount factor in the unit interval.
5. R is the reward function; although it can in general be defined as a function R:S\times A\to \mathbb{R}, in what follows we take it to be a function of state only, R:S\to \mathbb{R}.

In this context, we have policies as maps

\pi:S\to A ,

state value functions for a policy \pi, evaluated at a starting state s_1, as

V^\pi(s_1)=\mathbb{E}\left[\sum_{i=1}^{\infty}\gamma^{i-1} R(s_i)\,\middle|\,\pi\right],

and state action values defined as

Q^\pi (s,a)=R(s)+\gamma \mathbb{E}_{s'\sim P_{sa}}[V^\pi (s')].

The optimal functions are defined as

V^*(s)=\sup_\pi V^{\pi}(s),

and

Q^*(s,a)=\sup_\pi Q^\pi (s,a).

Here we assume that we have a reward function, and this reward function is used to determine an optimal policy.
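
To make these definitions concrete, here is a minimal sketch (in Python with NumPy; the MDP below is a small made-up example with random transitions and state-only rewards) of computing V^* and Q^* by value iteration and reading off a greedy optimal policy:

import numpy as np

# A small, made-up finite MDP: n_states states, n_actions actions.
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

# P[a, s, s'] = probability of moving to s' when taking action a in state s.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)          # normalise so each row sums to 1

R = rng.random(n_states)                    # reward depends on the state only

# Value iteration: repeatedly apply the Bellman optimality operator.
V = np.zeros(n_states)
for _ in range(1000):
    # Q[a, s] = R(s) + gamma * E_{s' ~ P_sa}[V(s')]
    Q = R[None, :] + gamma * P @ V
    V_new = Q.max(axis=0)                   # V*(s) = max_a Q*(s, a)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

optimal_policy = Q.argmax(axis=0)           # greedy policy with respect to Q*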

By | May 17th, 2020|Uncategorized|0 Comments

Correlation vs Mutual Information

This post is based on a (very small) part of the (dense and technical) paper Fooled by Correlation by N.N. Taleb, found at (1).

Notes on the main ideas in this post are available from the Universidad de Cantabria, found at (2).

The aims of this post are 1) to introduce mutual information as a measure of similarity, and 2) to show the nonlinear relationship between correlation and mutual information by means of a relatively simple example.

Introduction

A significant part of statistical analysis is understanding how random variables are related, that is, how much knowledge about the value of one variable tells us about the value of another. This post will consider this issue in the context of Gaussian random variables. More specifically, we will compare, and discuss the relationship between, correlation and mutual information.

Mutual Information

The mutual information between two random variables is the amount of information that one gains about one of them by observing the value of the other.…
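
For jointly Gaussian variables this quantity has a closed form: if X and Y are bivariate normal with correlation \rho, then I(X;Y)=-\frac{1}{2}\ln(1-\rho^2). Below is a minimal sketch (in Python with NumPy; the values of \rho are just illustrative) of how mutual information grows nonlinearly with correlation:

import numpy as np

def gaussian_mutual_information(rho):
    """Mutual information (in nats) of a bivariate normal with correlation rho."""
    return -0.5 * np.log(1.0 - rho ** 2)

# Mutual information grows slowly at first and blows up as |rho| -> 1,
# which is the nonlinear relationship discussed in the post.
for rho in [0.1, 0.5, 0.9, 0.99]:
    print(f"rho = {rho:4}: I(X;Y) = {gaussian_mutual_information(rho):.3f} nats")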

By | March 28th, 2020|English, Level: intermediate, Uncategorized|3 Comments

The Res-Net-NODE Narrative

Humble Beginnings: Ordinary Differential Equations

The story begins with differential equations. Consider a continuous function f:[0,T]\times \mathbb{R}^n\to \mathbb{R}^n. From this we can construct a rather simple differential equation in the following way. We let

\begin{cases}  {y'(t)}=f(t,y(t))\\  y(0)=y_0\in \mathbb{R}^n  \end{cases}

A solution to this system is a differentiable map, defined in a neighbourhood of t=0, that satisfies both the differential equation and the initial condition.

Ordinary differential equations are well studied, and we know, for example, that a (unique) solution to the given differential equation exists whenever the function f satisfies the following condition:

(\exists C>0)(\forall t\in[0,T])(\forall x,y\in \mathbb{R}^n)(||f(t,y)-f(t,x)||\leq C||y-x||)

This property is known as Lipschitz continuity. A function that satisfies this condition is said to be Lipschitz (in its second argument, uniformly in t). We shall see that whenever we require this condition in upcoming situations, wonderful things happen!

A remarkable field that almost always accompanies differential equations is numerical analysis, where we learn to solve differential equations numerically and study the resulting numerical schemes. We shall explore numerical integration briefly.…
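
As a small preview (a minimal sketch in Python with NumPy; the step size, horizon and example right-hand side are made up for illustration), the simplest scheme is the forward Euler method, whose update y_{k+1}=y_k+h\,f(t_k,y_k) already has the "state plus a small correction" shape that connects it to residual networks later in this story:

import numpy as np

def euler_solve(f, y0, t0=0.0, t1=1.0, n_steps=100):
    """Forward Euler: repeatedly take the step y_{k+1} = y_k + h * f(t_k, y_k)."""
    h = (t1 - t0) / n_steps
    t, y = t0, np.asarray(y0, dtype=float)
    trajectory = [y.copy()]
    for _ in range(n_steps):
        y = y + h * f(t, y)      # the residual-style update: state plus a small correction
        t += h
        trajectory.append(y.copy())
    return np.array(trajectory)

# Example: y' = -y with y(0) = 1, whose exact solution is exp(-t).
traj = euler_solve(lambda t, y: -y, y0=[1.0])
print(traj[-1], np.exp(-1.0))    # the numerical and exact values should be close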

By | March 18th, 2020|Uncategorized|Comments Off on The Res-Net-NODE Narrative

Scaled Reinforcement Learning: A Brief Introduction to Deep Q-Learning

This blog post is a direct translation of a talk given by the author on the 17th of February 2020. The idea was to very briefly introduce Deep Q-Learning to an audience familiar with the fundamental concepts of reinforcement learning. If you are not familiar with these basics, a good introduction can be found here: An Introduction to Reinforcement Learning. Without the additional details from the talk, this post is rather brief, and should really be used as a way to gain an overview of the method or as a gateway to relevant resources. This will not be the case for later posts in the series, where the intention is to deal more with the mathematical aspects of reinforcement learning.

Basic Reinforcement Learning Notions

The idea behind reinforcement learning is that there is an agent that interacts with the environment in order to achieve a certain task.…
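
To make the agent-environment loop concrete, here is a minimal sketch (in Python with NumPy, against a hypothetical environment object whose reset() returns a state and whose step(action) returns (next_state, reward, done); the values of alpha, gamma and epsilon are illustrative) of tabular Q-learning, the method that Deep Q-Learning scales up by replacing the table with a neural network:

import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning on an environment with reset() and step(action)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # Q-learning update: move Q(s, a) towards the bootstrapped target
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q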

By | March 18th, 2020|Uncategorized|0 Comments

Curves for the Mathematically Curious – an anthology of the unpredictable, historical, beautiful and romantic, by Julian Havil – a review

NB I was sent this book as a review copy.

What a beautiful idea. What a beautiful book! In studying mathematics, one comes across various curves while studying calculus, number theory, or geometry in its various forms, but they appear as asides to the particular subject. The idea of flipping the script, looking at the curves themselves, and from them gaining insight into statistics, combinatorics, number theory, analysis, cryptography, fractals, Fourier series, axiomatic set theory and so much more is just wonderful.

This book looks at ten carefully chosen curves and from them shows how much insight one can get into vast swathes of mathematics and mathematical history. The curves chosen are:

  1. The Euler Spiral – an elegant spiral which leads to many other interesting parametrically defined curves
  2. The Weierstrass Curve – an everywhere continuous but nowhere differentiable function
  3. Bezier Curves – which show up in computer graphics and beyond
  4. The Rectangular Hyperbola – which leads to the investigation of logarithms and exponentials
  5. The Quadratrix of Hippias – which is tightly linked to the impossible problems of antiquity
  6. Peano’s Function and Hilbert’s Curve – space filling curves which lead to a completely flipped understanding of the possibilities of infinitely thin lines
  7. Curves of Constant Width – curves which can perfectly fit down a hallway as they rotate.
By | March 15th, 2020|Book reviews, Reviews, Uncategorized|1 Comment

The Objective Function

In both supervised and unsupervised machine learning, most algorithms are centered around minimising (or, equivalently, maximising) some objective function. This function is supposed to represent, in some way, what the model knows or can get right. However, as one might expect, the objective function does not always reflect exactly what we want.

The objective function presents two main problems: 1) how do we minimise it? (the answer to this is up for debate, and there is lots of interesting research on the efficient optimisation of non-convex functions), and 2) assuming we can minimise it perfectly, is it the correct thing to be minimising?

It is point 2 which is the focus of this post.

Let’s take the example of square-loss linear regression. Here we train a linear regression model with the square loss \mathcal{L}(\mathbf{w})=\sum_i (y_i - \mathbf{w}^T x_i)^2, where we take the inner product of the learned weights with a vector of features for each observation to predict the outcome.…
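
Here is a minimal sketch (in Python with NumPy; the data below is made up) of minimising this square loss directly, using the fact that it has a closed-form least-squares solution:

import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 100 observations, 3 features, outcomes from known weights plus noise.
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Minimising sum_i (y_i - w^T x_i)^2 has the closed-form solution
# w = (X^T X)^{-1} X^T y (the normal equations), computed here with a
# least-squares solver for numerical stability.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)   # should be close to true_w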

By | February 20th, 2020|Level: Simple, Uncategorized|0 Comments

Tales of Impossibility – The 2000 year quest to solve the mathematical problems of antiquity, by David S. Richeson – a review

NB I was sent this book as a review copy.

Four impossible puzzles, all described in detail during the height of classical Greek mathematics. All are simple to state and yet so tempting that it has taken not only the brain power of many thousands of mathematicians (amateur and professional alike), but also two millennia, to show that however hard you may try, these puzzles simply cannot be solved. The puzzles are:

  • Squaring the circle: With only a compass and a straight edge, draw a square with the same area as that of a given circle.
  • Doubling the cube: With only a compass and a straight edge, draw the edge of a cube with volume twice that of a cube whose edge is given.
  • Constructing regular polygons: Given a compass and a straight edge, construct a regular n-gon in a given circle for n\ge 3.
  • Trisecting an angle: Given a compass and a straight edge, and a given angle, construct an angle that is one third of the original.
By | February 9th, 2020|Uncategorized|1 Comment

What is mathematics?

Below you will find some thoughts on this broad question, which I encourage you to think about. What is your vision of mathematics? It will most probably be the result of your own experience with the subject, including traumas that happened along the way, and realising that can make you more conscious of your relationship with the subject and of the walls you might have built against it, or against parts of it. In a sense, by understanding these biases and blockages, and by thinking objectively about its value, you can allow yourself to be equipped with the full set of skills mathematics gives you to build your own greatest life.

Many of the thoughts that follow on this topic are taken from a paper I recommend reading: Teaching and Learning “What is Mathematics and why we should ask, where one should experience and learn that and how to teach it”, by Gunter M.…

By | December 16th, 2019|Uncategorized|0 Comments

Reasoning and making sense: a pillar of mathematics?

An essential part of learning mathematics is reasoning and making sense. What exactly does this mean?

When a student is given a problem, they need to make sense of it from their own level of perception, which is unique to each individual. This will come with a great deal of struggle, and the important next step is to stay motivated and curious, to be extremely perseverant, and not to give up after the first few attempts. This might also require a good relationship with mistakes.

A student will have to develop their own strategy for solving a given problem. That might mean first translating it into their own language, using their own words and background knowledge to understand the actual question and problem they are attempting to solve.

They will have to build bridges in their mind to similar problems they have solved in the past, even though these might seem different. Building such bridges becomes easier and easier with practice and experience; sometimes a bridge will not work, and other connections will need to be created until a suitable one is found.…

By | December 16th, 2019|Uncategorized|0 Comments