|
RESEARCH INTERESTS
|
Optimization (especially convex and robust), control
(robust adaptative), decision making under uncertainty and risk (especially
in Markov Decision Process), information theory, machine learning.
Applications to Operations Management, Computational Biology, Biomedical and
Social Sciences.
I am currently working with Prof. John Tsitsiklis and Duncan Simester.
|
|
Documents available on-line
|
Risk-averse and robust
control of Markov Decision processes with main contributions
·
Defined and motivated Markovian multi-period
risk measures
·
Optimized efficiently convex dynamically
consistent Markovian
risk measures of sample cost.
·
Showed the equivalence of this problem with
zero-sum sequential Markov
game against nature.
·
Proposed a tractable formulation for the
robust control of MDPs
founded in risk-sensitive decision theory.
Presented at the Canadian Operations Research Society
conference in Montreal
in May 2006
|
|
Data-driven dynamic programming
January 2004 - present
|
Dynamic programming is a standard technique for decision making under
uncertainty (to solve Markov Decision Process, a.k.a. MDP). However, this approach
requires the knowledge of many parameters of the system of interest (that is
its dynamic and cost structure). In practice, these parameters often need to
be estimated from available data. In any case, their value is imprecisely
known to the decision maker, which needs to factor this parametric
uncertainty into his or her decision making.
As time goes, the controller could precise his knowledge of the system to
make better decisions so that the well-known trade-off between exploration of
the system and exploitation of the current knowledge is central when learning
is possible/implemented.
However, it is usually impractical to solve the joint learning and control
problem (with techniques such as Partially Observed MDP). The computational
burden is intractable and the controller might not be able to implement the
optimal policy, which is non-stationary. Furthermore, there are practical
situations where learning is simply not possible. For example, let us assume
that the decision maker faces a population of heterogenous systems modelled
as stationary MDPs that are not identifiable/segmentable. Then the controller
needs to select a policy that performs well for a given population mix of
MDPs. This situation is particularly motivated by social science systems
(like marketing), where people could be modelled by uncertain MDP but cannot
be segmented by the decision maker.
In this research, we investigate several problems related to estimation
from data of MC and the stationary robust control of MDPs, including:
- estimation
of Markov Chain (MC) from population-level data. Instead of
observing individual trajectories of a MC, which can be an MDP under an
historical policy, we observe the distribution of a fixed population
over some Markov states for some time. We desentangle the (independent
identically distributed) individual trajectories that we assume follow a
single underlying MC by a maximum likelihood approach in order to
estimate the dynamics (i.e. the transition probability matrix) of an
individual. We generalize this problem to the observations of known
linear functions of the state population, and hope to push this problem
to the case where the linear observers are themselves unknown. This last
problem could have application in the disentanglement of population-level
cDNA microarray data to estimate the a homogenous cell population
dynamics and gene expression profile per state.
- control of MDP
with uncertain parameters. The precise formulation of the problem
hinges critically on two points, namely the modelization of parametric
uncertainty and the objective of the controller, which can be
risk-averse. The parametric uncertainty can be assumed adversarial by
choosing the worst parameters out of a given set for a given policy, or
it can be endowed with a distribution and then we can focus on the
expectation (or Conditional Value at Risk) with respect to the uncertain
parameters of the value function of a given policy. First, we show that
these natural robust control formulations are NP-hard (for history-dependent,
stationary randomized and deterministic policies), except for the case
of adversarial uncertainty under so-called uncertainty rectangularity
assumption (a.k.a. Markov games), where the robust policy can be
computed with almost no extra cost as shown recently by Nilim and El
Ghaoui. However, this last case is not statistically satisfactory and
can yield overly conservative policies. We mitigate the possible
conservatiness of the Markov game approach by considering an uncertain
parameter distribution, which can be obtained by a previous estimation
procedure. We propose a new uncertain MDP formulation, which provides a
continuum of robustness between Markov games and nominal DP. Our new
approach contrasts with standard DP results in some important ways, in
particular the optimal stationary policy needs not be deterministic,
even when the controller is risk-neutral. As pointed out earlier, the
problem we are dealing with is NP-hard so we consider approximation
algorithm based on gradient-techniques, mathematical programming and
certainty-equivalent control, and robust Bellman-type recursion.
|
|
Make-to-order Revenue Management
September 2002 - December 2003
|
A model for make-to-order revenue management
(submitted for publication) with Prof. Jeremie Gallien and Tor Schoenmeyr:
Abstract: Seeking to help
practitioners establish quantitative guidelines for
negotiating make-to-order contracts along the dimensions of price, quantity
and lead-time, we investigate the dynamic admission control of jobs with
hard deadlines into a single machine queue with preemptive scheduling. Using
the concept of minimum workload function, we establish that earliest
due-date scheduling can be assumed at no cost to optimality, and propose a
discrete-time formulation for the problem of maximizing long-run expected
profit. We establish some properties and a characterization of the optimal
policy, which we exploit to derive two heuristic policies (fluid
and lookahead) relying on different approximations for the
opportunity cost of accepting a job. Numerical experiments under various
load, stretch and granularity parameters suggest that they always better
than common simple static policies. Limited experiments also suggest that
the optimal static policy may perform nearly as well as our two dynamic
heuristics; but while it is simple to implement it seems challenging to
derive using known methods. Overall, our fluid heuristic stands out for its
robust performance at a relatively low computational cost, and its possible
extensions in practice to non-stationary demand and orders with staggered
deliveries.
|