Homepage of Yann Le Tallec

RESEARCH INTERESTS


RESEARCH INTERESTS

Optimization (especially convex and robust), control (robust adaptative), decision making under uncertainty and risk (especially in Markov Decision Process), information theory, machine learning. Applications to Operations Management, Computational Biology, Biomedical and Social Sciences.

I am currently working with Prof. John Tsitsiklis and Duncan Simester.

Documents available on-line

Risk-averse and robust control of Markov Decision processes with main contributions

·         Defined and motivated Markovian multi-period risk measures

·         Optimized efficiently convex dynamically consistent Markovian

  risk measures of sample cost.

·         Showed the equivalence of this problem with zero-sum sequential Markov

  game against nature.

·         Proposed a tractable formulation for the robust control of MDPs

  founded in risk-sensitive decision theory.

Presented at the Canadian Operations Research Society conference in Montreal in May 2006

Data-driven dynamic programming

January 2004 - present

Dynamic programming is a standard technique for decision making under uncertainty (to solve Markov Decision Process, a.k.a. MDP). However, this approach requires the knowledge of many parameters of the system of interest (that is its dynamic and cost structure). In practice, these parameters often need to be estimated from available data. In any case, their value is imprecisely known to the decision maker, which needs to factor this parametric uncertainty into his or her decision making.

As time goes, the controller could precise his knowledge of the system to make better decisions so that the well-known trade-off between exploration of the system and exploitation of the current knowledge is central when learning is possible/implemented.

However, it is usually impractical to solve the joint learning and control problem (with techniques such as Partially Observed MDP). The computational burden is intractable and the controller might not be able to implement the optimal policy, which is non-stationary. Furthermore, there are practical situations where learning is simply not possible. For example, let us assume that the decision maker faces a population of heterogenous systems modelled as stationary MDPs that are not identifiable/segmentable. Then the controller needs to select a policy that performs well for a given population mix of MDPs. This situation is particularly motivated by social science systems (like marketing), where people could be modelled by uncertain MDP but cannot be segmented by the decision maker. 

In this research, we investigate several problems related to estimation from data of MC and the stationary robust control of MDPs, including:

  •  estimation of Markov Chain (MC) from population-level data. Instead of observing individual trajectories of a MC, which can be an MDP under an historical policy, we observe the distribution of a fixed population over some Markov states for some time. We desentangle the (independent identically distributed) individual trajectories that we assume follow a single underlying MC by a maximum likelihood approach in order to estimate the dynamics (i.e. the transition probability matrix) of an individual. We generalize this problem to the observations of known linear functions of the state population, and hope to push this problem to the case where the linear observers are themselves unknown. This last problem could have application in the disentanglement of population-level cDNA microarray data to estimate the a homogenous cell population dynamics and gene expression profile per state.

 

  • control of MDP with uncertain parameters. The precise formulation of the problem hinges critically on two points, namely the modelization of parametric uncertainty and the objective of the controller, which can be risk-averse. The parametric uncertainty can be assumed adversarial by choosing the worst parameters out of a given set for a given policy, or it can be endowed with a distribution and then we can focus on the expectation (or Conditional Value at Risk) with respect to the uncertain parameters of the value function of a given policy. First, we show that these natural robust control formulations are NP-hard (for history-dependent, stationary randomized and deterministic policies), except for the case of adversarial uncertainty under so-called uncertainty rectangularity assumption (a.k.a. Markov games), where the robust policy can be computed with almost no extra cost as shown recently by Nilim and El Ghaoui. However, this last case is not statistically satisfactory and can yield overly conservative policies. We mitigate the possible conservatiness of the Markov game approach by considering an uncertain parameter distribution, which can be obtained by a previous estimation procedure. We propose a new uncertain MDP formulation, which provides a continuum of robustness between Markov games and nominal DP. Our new approach contrasts with standard DP results in some important ways, in particular the optimal stationary policy needs not be deterministic, even when the controller is risk-neutral. As pointed out earlier, the problem we are dealing with is NP-hard so we consider approximation algorithm based on gradient-techniques, mathematical programming and certainty-equivalent control, and robust Bellman-type recursion.

Make-to-order Revenue Management


September 2002 - December 2003

A model for make-to-order revenue management (submitted for publication) with Prof. Jeremie Gallien and Tor Schoenmeyr:

Abstract: Seeking to help practitioners establish quantitative guidelines for
negotiating make-to-order contracts along the dimensions of price, quantity
and lead-time, we investigate the dynamic admission control of jobs with
hard deadlines into a single machine queue with preemptive scheduling. Using
the concept of minimum workload function, we establish that earliest
due-date scheduling can be assumed at no cost to optimality, and propose a
discrete-time formulation for the problem of maximizing long-run expected
profit. We establish some properties and a characterization of the optimal
policy, which we exploit to derive two heuristic policies (fluid
and lookahead) relying on different approximations for the
opportunity cost of accepting a job. Numerical experiments under various
load, stretch and granularity parameters suggest that they always better
than common simple static policies. Limited experiments also suggest that
the optimal static policy may perform nearly as well as our two dynamic
heuristics; but while it is simple to implement it seems challenging to
derive using known methods. Overall, our fluid heuristic stands out for its
robust performance at a relatively low computational cost, and its possible
extensions in practice to non-stationary demand and orders with staggered
deliveries.


This page can be found on the web at http://www.carva.org/yann.le-tallec and has been modified the last time on 10 Mars 2005. The email address of the author is letallec at mit.edu .