Using Bayesian Dynamical Systems for Motion Template Libraries

Silvia Chiappa, Jens Kober, Jan Peters
Max-Planck Institute for Biological Cybernetics
Spemannstraße 38, 72076 Tübingen, Germany
{silvia.chiappa,jens.kober,jan.peters}@tuebingen.mpg.de

Abstract

Motor primitives or motion templates have become an important concept both for modeling human motor control and for generating robot behaviors using imitation learning. Recent impressive results range from humanoid robot movement generation to timing models of human motions. The automatic generation of skill libraries containing multiple motion templates is an important step in robot learning. Such a skill learning system needs to cluster similar movements together and represent each resulting motion template as a generative model which is subsequently used for the execution of the behavior by a robot system. In this paper, we show how human trajectories captured as multi-dimensional time-series can be clustered using Bayesian mixtures of linear Gaussian state-space models based on the similarity of their dynamics. The appropriate number of templates is automatically determined by enforcing a parsimonious parametrization. As the resulting model is intractable, we introduce a novel approximation method based on variational Bayes, which is especially designed to enable the use of efficient inference algorithms. On recorded human Balero movements, this method is not only capable of finding reasonable motion templates but also yields a generative model which works well in the execution of this complex task on a simulated anthropomorphic SARCOS arm.

1 Introduction

Humans demonstrate a variety and versatility of movements far beyond the reach of current anthropomorphic robots. It is widely believed that human motor control largely relies on a set of "mental templates" [1], better known as motor primitives or motion templates.
This concept has gained increasing attention both in the human motor control literature [1, 2] and in robot imitation learning [3, 4]. The recent suggestion of Ijspeert et al. [3] to use dynamical systems as motor primitives has allowed this approach to scale to the domain of humanoid robot imitation learning and has yielded a variety of interesting applications as well as follow-up publications. However, up to now, the focus of motion template learning has largely been on single template acquisition and self-improvement. Future motor skill learning systems, on the other hand, need to be able to observe several different behaviors from human presenters and compile libraries of motion templates directly from these examples with as little predetermined structure as possible. An important part of such a motor skill learning system is the clustering of many presented movements into different motion templates. Human trajectories are recorded as multi-dimensional time-series of joint angles as well as joint velocities using either a marker-based tracking setup (e.g., a VICON™ setup), a sensing suit (e.g., a SARCOS SenSuit) or a haptic interface (e.g., an anthropomorphic master arm). Inspired by Ijspeert et al. [3], we intend to use dynamical systems as generative models of the presented trajectories, i.e., as motion templates. Our goal is to cluster these multi-dimensional time-series automatically into a small number of motion templates without pre-labeling of the trajectories or assuming an a priori number of templates. Thus, the system has to discover the underlying motion templates, determine the number of templates, and learn the underlying skill sufficiently well for robot application. In principle, one could use a non-generative clustering approach (e.g., a type of K-means) with a method for selecting an appropriate number of clusters and, subsequently, fit a dynamical-systems-based generative model to each cluster.
Here, we take a different approach and perform clustering and learning of the underlying time-series dynamics at the same time. In this way, we aim to ensure that each obtained cluster can be modeled well by its representative generative model. For determining the number of clusters, most past generative approaches trained a separate model for each possible cluster configuration and then selected the one optimizing the trade-off between accuracy and complexity, as measured, for example, by the Bayesian Information Criterion [5, 6, 7]. The drawback of these approaches is that training many separate models can lead to a large computational overhead, such that heuristics are often needed to restrict the number of possible cluster configurations in order to ensure computational feasibility [7]. A less computationally expensive alternative is offered by recent Bayesian approaches where the model parameters are treated as random variables and integrated out, yielding the marginal likelihood of the data. An appropriate prior distribution can be used to enforce a sparse representation, i.e., to select the simplest set of parameters that explains the data well by making the remaining parameters inactive. As a result, the structure selection can be achieved within the model, without the need to train and compare several separate models. To date, the majority of the work on generative Bayesian models for clustering has focused on static mixture models [8, 9, 10, 11], for which a number of approximate algorithms have been developed to deal with the intractability issues arising from parameter marginalization. Clustering long or high-dimensional time-series is hard when approached with static models, such that collapsing the trajectories to a few relevant features is often required.
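As an illustration of the accuracy-complexity trade-off mentioned above (not the method adopted in this paper), the Bayesian Information Criterion for a candidate number of clusters could be computed as in the following sketch; the log-likelihoods and parameter counts are made-up placeholders:

```python
import numpy as np

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower is better.

    Penalizes model complexity (n_params) against fit
    (log_likelihood) with a log(n_obs) factor."""
    return -2.0 * log_likelihood + n_params * np.log(n_obs)

# Hypothetical scores for K = 1..4 clusters over 124 sequences:
# the fit improves with K, but the penalty eventually dominates.
scores = {k: bic(log_lik, k * 50, 124)
          for k, log_lik in [(1, -900.0), (2, -700.0),
                             (3, -650.0), (4, -640.0)]}
best_k = min(scores, key=scores.get)
```

With these placeholder values the penalty outweighs the likelihood gain beyond K = 2, which is exactly the kind of per-configuration scoring that requires training one model per candidate K.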
This problem would be severe for a high-dimensional motor learning system where the data needs to be represented at high sampling rates in order to ensure that all relevant details for motor skill learning are captured. In addition, it is difficult to ensure smoothness when the time-series display high variability and, therefore, to obtain accurate generative models with static approaches. A natural alternative is to use mixtures of temporal models which explicitly model the dynamics of the time-series. In this paper, we use a Bayesian approach to Mixtures of Linear Gaussian State-Space Models (LGSSMs). LGSSMs have been frequently covered in the literature [12] due to the fact that, despite their computational simplicity, they can represent many natural dynamical processes. As we will see later in this paper, LGSSMs are powerful enough to model our time-series sufficiently accurately, while the Bayesian treatment of the model results in a parsimonious parametrization. Whilst previous Bayesian approaches to Mixtures of LGSSMs have focused on stochastic approximations to deal with model intractability [6], in this paper we introduce a deterministic approximation based on variational Bayes. Importantly, our approximation is especially designed to enable the use of standard LGSSM inference methods for the hidden state variables, which has the advantage of minimizing numerical instabilities. As a realistically difficult scenario in this first step towards large motor skill libraries, we have selected the game of dexterity Balero (also known as Ball-In-A-Cup or Kendama, see [13]) as an evaluation platform. Successfully swinging the ball into the cup requires a skilled human performer, and humans tend to have a large variability in the performed movements [14]. In addition, several substantially different movements exist for performing this task.
From a robotics point of view, Balero can be considered sufficiently complex as it involves movements in all seven major degrees of freedom of a human arm as well as of an anthropomorphic robot arm. We are able to show that the presented method results in a reasonable number of clusters where each movement is physically quite distinct from the other movements. The resulting generative models can be used successfully as motion templates in physically realistic simulations. In the remainder of the paper, we will proceed as follows. We will first introduce a generative approach for clustering and modeling multiple time-series with Bayesian Mixtures of LGSSMs and describe how this approach can be made tractable using a variational approximation. We will then show that the resulting model can be used to infer the motion templates underlying a set of human demonstrations, and give evidence that the generative model representing each motion template is sufficiently accurate for control in a mechanically plausible simulation of the SARCOS Master Arm.

2 Bayesian Mixtures of Linear Gaussian State-Space Models

Our goal is to model both human and robot movements in order to build motion template libraries. In this section, we describe our Bayesian modeling approach and discuss both the underlying assumptions and how the number of motion templates can be determined. As the resulting model is not tractable for analytical solution, we introduce an approximation method based on variational Bayes.

2.1 Modeling Approach

In our Bayesian approach to Mixtures of Linear Gaussian State-Space Models (LGSSMs), we are given a set of N time-series¹ v^{1:N}_{1:T} of length T, for which we define the following marginal likelihood:

p(v^{1:N}_{1:T} | Θ̂^{1:K}, α) = Σ_{z^{1:N}} ∫_{θ^{1:K}, π} p(v^{1:N}_{1:T} | z^{1:N}, θ^{1:K}) p(θ^{1:K} | Θ̂^{1:K}) p(z^{1:N} | π) p(π | α),

where z^n ∈ {1, . . . , K} indicates which of a set of K LGSSMs generated the sequence v^n_{1:T}.
The parameters of LGSSM k are denoted by θ^k and have a prior distribution depending on hyperparameters Θ̂^k. The K-dimensional vector π contains the prior probabilities of the time-series generation for each LGSSM and has a prior distribution with hyperparameter α. The optimal hyperparameters are estimated by type-II maximum likelihood [15], i.e., by maximizing the marginal likelihood over Θ̂^{1:K} and α. Clustering can be performed by inferring the LGSSM that most likely generated the sequence v^n_{1:T}, i.e., by computing arg max_k p(z^n = k | v^{1:N}_{1:T}, Θ̂^{1:K}, α).

Modeling p(v^{1:N}_{1:T} | z^{1:N}, θ^{1:K}). As a generative temporal model for each time-series, we employ a Linear Gaussian State-Space Model (LGSSM) [12] which assumes that the observations v_{1:T}, with v_t ∈ R^V, are generated from a latent Markovian linear dynamical system with hidden states h_{1:T}, with h_t ∈ R^H, according to²

v_t = B h_t + η^v_t,  η^v_t ∼ N(0_V, Σ_V),    h_t = A h_{t-1} + η^h_t,  η^h_t ∼ N(μ_t, Σ_H),    (1)

with p(h_t | h_{t-1}, θ) = N(A h_{t-1} + μ_t, Σ_H), p(h_1 | θ) = N(μ_1, Σ), p(v_t | h_t, θ) = N(B h_t, Σ_V), and θ = {A, B, Σ_H, Σ_V, μ_{1:T}, Σ}. Due to the simple structure of the model, performing inference, that is, computing quantities such as p(h_t | v_{1:T}, θ), can be achieved efficiently in O(T) operations. Standard LGSSMs assume a zero-mean hidden-state noise, μ_t ≡ 0_H. However, with a time-dependent mean μ_t ≠ 0_H a much larger set of systems can be represented, often leading to superior modeling accuracy, particularly in our application. A probabilistic formulation of the LGSSM is given by

p(v_{1:T}, h_{1:T} | θ) = p(v_1 | h_1, θ) p(h_1 | θ) ∏_{t=2}^{T} p(v_t | h_t, θ) p(h_t | h_{t-1}, θ).

In the presented Bayesian approach, we define a prior distribution p(θ | Θ̂) over the parameters, where Θ̂ are the associated hyperparameters. More specifically, we define zero-mean Gaussians on the elements of A and on the columns of B by³

p(A | α, Σ_H^{-1}) ∝ ∏_{i,j=1}^{H} α_{ij}^{1/2} [Σ_H^{-1}]_{ii}^{1/2} e^{-(α_{ij}/2) [Σ_H^{-1}]_{ii} A_{ij}^2},    p(B | β, Σ_V^{-1}) ∝ ∏_{j=1}^{H} β_j^{V/2} e^{-(β_j/2) B_j^T Σ_V^{-1} B_j},

where α and β are sets of hyperparameters which need to be optimized.

¹ v^{1:N}_{1:T} is a shorthand for v^1_{1:T}, . . . , v^N_{1:T} = v^1_1, . . . , v^1_T, . . . , v^N_1, . . . , v^N_T.
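As an illustration of the generative process in Equation (1), a minimal sampler can be sketched as follows; all dimensions and parameter values here are made up for the example, whereas the paper's actual parameters are learned from data:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lgssm(A, B, mu, Sigma_H, Sigma_V, mu1, Sigma1, T):
    """Draw one sequence v_{1:T} from the LGSSM of Equation (1):
    h_t = A h_{t-1} + eta^h_t,  eta^h_t ~ N(mu_t, Sigma_H),
    v_t = B h_t     + eta^v_t,  eta^v_t ~ N(0,    Sigma_V)."""
    V = B.shape[0]
    h = rng.multivariate_normal(mu1, Sigma1)  # h_1 ~ N(mu_1, Sigma)
    vs = []
    for t in range(T):
        if t > 0:  # latent transition with time-dependent noise mean
            h = A @ h + rng.multivariate_normal(mu[t], Sigma_H)
        vs.append(B @ h + rng.multivariate_normal(np.zeros(V), Sigma_V))
    return np.array(vs)

# Tiny example: H = 2 hidden dims, V = 1 observed dim, T = 50 steps.
A = np.array([[0.99, 0.1], [-0.1, 0.99]])
B = np.array([[1.0, 0.0]])
mu = np.zeros((50, 2))
v = sample_lgssm(A, B, mu, 0.01 * np.eye(2), 0.01 * np.eye(1),
                 np.zeros(2), np.eye(2), T=50)
```

The slightly rotated transition matrix A produces the smooth oscillatory latent dynamics that make such models natural candidates for movement trajectories.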
We make the assumption that Σ_H^{-1}, Σ_V^{-1} and Σ^{-1} are diagonal and define Gamma distributions on them. For μ_1, we define a zero-mean Gaussian prior, while we determine the optimal values of μ_{2:T} by formally treating them as hyperparameters. These choices are made in order to render our Bayesian treatment feasible and to obtain a sparse parametrization, as discussed in detail below.

² Here, N(m, S) denotes a Gaussian with mean m and covariance S, and 0_X denotes an X-dimensional zero vector. The initial latent state h_1 is drawn from N(μ_1, Σ).
³ [X]_{ij} and X_j denote the ij-th element and the j-th column of matrix X, respectively. The dependency of the priors on Σ_H and Σ_V is chosen specifically to render a variational implementation feasible.

In the resulting mixture model, we consider a set of K such Bayesian LGSSMs. The joint distribution over all sequences given the indicator variables and the hyperparameters is defined as

p(v^{1:N}_{1:T} | z^{1:N}, Θ̂^{1:K}) = ∫_{θ^{1:K}} ∏_{n=1}^{N} p(v^n_{1:T} | z^n, θ^{1:K}) ∏_{k=1}^{K} p(θ^k | Θ̂^k),

where p(v^n_{1:T} | z^n = k, θ^{1:K}) ≡ p(v^n_{1:T} | θ^k) denotes the probability of observation v^n_{1:T} given that parameters θ^k have been employed to generate it.

Modeling p(z^{1:N} | π). As prior for π, we define a symmetric Dirichlet distribution

p(π | α) = (Γ(α) / Γ(α/K)^K) ∏_{k=1}^{K} π_k^{α/K - 1},

where Γ(·) is the Gamma function and α denotes a hyperparameter that needs to be optimized. This distribution is conjugate to the multinomial, which greatly simplifies our Bayesian treatment. To model the joint indicator variables, we define

p(z^{1:N} | α) = ∫_π ∏_{n=1}^{N} p(z^n | π) p(π | α),    where p(z^n = k | π) ≡ π_k.

Such a Bayesian approach favors simple model structures.
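The preference of the symmetric Dirichlet prior for few active components can be illustrated with a small sampling sketch; the value of α below is an illustrative assumption, while N = 124 and K = 20 echo the number of recorded trajectories and one of the initializations used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_assignments(alpha, K, N):
    """Sample pi ~ Dir(alpha/K, ..., alpha/K), then z^n ~ Categorical(pi).

    A small alpha/K concentrates probability mass on few components,
    so many of the K candidate LGSSMs receive no sequences at all."""
    pi = rng.dirichlet(np.full(K, alpha / K))
    return rng.choice(K, size=N, p=pi)

z = sample_assignments(alpha=1.0, K=20, N=124)
n_active = len(np.unique(z))  # typically far fewer than K
```

Running this a few times shows that most draws of π leave many components with negligible mass, which is the generative counterpart of the automatic cluster-number selection described next.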
In particular, the priors on A^{1:K} and B^{1:K} enforce a sparse parametrization since, during learning, many α_{ij} and β_j get close to infinity, whereby (the posterior distributions of) A_{ij} and B_j get close to zero (see [16] for an analysis of this pruning effect). For certain k, all elements are pruned out during learning such that LGSSM k becomes inactive (p(z^n = k | v^{1:N}_{1:T}, Θ̂^{1:K}, α) = 0 for all n). This means that, even if we initialize the model with a fixed number of LGSSMs, our approach ensures that the unnecessary LGSSMs are pruned out from the model during training. Thus, the optimal number of clusters is automatically determined within the model in such a way that it yields a simple solution that explains the data well.

2.2 Model Intractability and Approximate Solution

The Bayesian treatment of the model is non-trivial as the integration over the parameters θ^{1:K} and π renders the computation of the required posterior distributions intractable. This problem results from the coupling in the posterior distributions between the hidden state variables h^{1:N}_{1:T} and the parameters θ^{1:K}, as well as between the indicators z^{1:N} and π, θ^{1:K}. To deal with this intractability, we use a deterministic approximation method based on variational Bayes.

Variational Approximation. In our variational approach, we introduce a new distribution q and make the following approximation:⁴

p(z^{1:N}, h^{1:N}_{1:T}, θ^{1:K} | v^{1:N}_{1:T}, Θ̂^{1:K}, α) ≈ q(h^{1:N}_{1:T} | z^{1:N}) q(z^{1:N}) q(θ^{1:K}).    (2)

That is, we approximate the posterior distribution of the hidden variables of the model by one in which the hidden states are decoupled from the parameters given the indicator variables, and in which the indicators are decoupled from the parameters.
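The pruning effect of the priors on A and B described at the beginning of this subsection can be sketched as follows; the matrix, the hyperparameter values, and the finite threshold standing in for β_j → ∞ are all illustrative assumptions:

```python
import numpy as np

def prune_columns(B_mean, beta, threshold=1e6):
    """Zero out columns of B whose hyperparameter beta_j has diverged.

    In an ARD-style prior, beta_j -> infinity drives the posterior
    over column B_j to zero; in practice a large finite threshold is
    used to declare the column pruned."""
    keep = beta < threshold
    return B_mean * keep[None, :], keep

# Hypothetical posterior mean of B: the second column's hyperparameter
# has diverged, so that column is switched off.
B_mean = np.array([[0.9, 1e-8, 0.4],
                   [0.2, 1e-9, 0.7]])
beta = np.array([3.0, 1e9, 5.0])
B_pruned, keep = prune_columns(B_mean, beta)
```

When every column of B^k (and every element of A^k) is pruned in this way, cluster k has effectively been removed from the mixture.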
The approximation is achieved with a variational expectation-maximization algorithm which minimizes the KL divergence between the right- and left-hand sides of Equation (2) or, equivalently, maximizes a tractable lower bound F(Θ̂^{1:K}, α, q) on the log-likelihood log p(v^{1:N}_{1:T} | Θ̂^{1:K}, α) with respect to q for fixed Θ̂^{1:K} and α, and vice-versa. Observation v^n_{1:T} is then placed in the most likely LGSSM by computing arg max_k q(z^n = k).

⁴ Here, we describe a collapsed approximation over π, which is considered more powerful than the alternative un-collapsed approximation [17]. To simplify the notation, we omit conditioning on v^{1:N}_{1:T}, Θ̂^{1:K}, α for the q distribution.

Figure 1: This figure shows one of the Balero motion templates found by our clustering method, i.e., the cluster C2 in Figure 2. Here, a sideways movement with a subsequent catch is performed, and the uppermost row illustrates this movement with a symbolic sketch. The middle row shows an execution of the movement generated with the LGSSM representing the cluster C2. The lowest row shows a recorded human movement which was attributed to cluster C2 by our method. Note that movements generated from LGSSMs representing other clusters differ significantly.

Resulting Updates. While the space does not suffice for a complete derivation, we will briefly sketch the updates for q. Additional details and the updates for the hyperparameters can be found in [20]. The updates consist of a parameter update, an indicator variable update and a latent state update. First, the approximate parameter posterior is given by

q(θ^k) ∝ p(θ^k | Θ̂^k) exp( Σ_{n=1}^{N} q(z^n = k) ⟨log p(v^n_{1:T}, h^n_{1:T} | θ^k)⟩_{q(h^n_{1:T} | z^n = k)} ),

where ⟨·⟩_q denotes expectation with respect to q. The specific choice for p(θ^k | Θ̂^k) makes the computation of this posterior relatively straightforward, since q(θ^k) is a distribution of the same type.
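The assignment step arg max_k q(z^n = k) mentioned above amounts to normalizing per-cluster log-scores; a minimal, numerically stable sketch with made-up log-scores:

```python
import numpy as np

def assign_cluster(log_scores):
    """Normalize per-cluster log-scores into q(z^n = k) with a
    numerically stable softmax, then return the most likely LGSSM."""
    s = log_scores - log_scores.max()  # guard against underflow
    q = np.exp(s) / np.exp(s).sum()
    return int(np.argmax(q)), q

# Hypothetical log-scores of one sequence under K = 3 clusters.
k_star, q = assign_cluster(np.array([-510.2, -498.7, -505.1]))
```

Subtracting the maximum before exponentiating matters here, since sequence log-likelihoods of long time-series are large negative numbers whose direct exponentials underflow.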
Second, the approximate posterior over the indicator variables is given by

q(z^n = k) ∝ exp( ⟨log p(v^n_{1:T}, h^n_{1:T} | θ^k)⟩_{q(h^n_{1:T} | z^n = k) q(θ^k)} + H_q(h^n_{1:T} | z^n = k) + ⟨log p(z^n = k | z^{¬n}, α)⟩_{∏_{m≠n} q(z^m)} ),

where H_q(x) denotes the entropy of the distribution q(x) and z^{¬n} includes all indicator variables except z^n. Due to the choice of a Dirichlet prior, the term p(z^n = k | z^{¬n}, α) = ∫_π p(z^n = k | π) p(π | z^{¬n}, α) can be determined analytically. However, the required average over this term is computationally expensive and, thus, we approximate it using a second-order expansion [17]. The third and most challenging update is the one of the hidden states,

q(h^n_{1:T} | z^n = k) ∝ exp ⟨log p(v^n_{1:T}, h^n_{1:T} | θ^k)⟩_{q(θ^k)}.    (3)

Whilst computing this joint density is relatively straightforward, the parameter and indicator variable updates require the non-trivial estimation of the posterior averages ⟨h^n_t⟩ and ⟨h^n_t h^{n⊤}_{t-1}⟩ with respect to this distribution. Following a similar approach to the one proposed in [18] for the Bayesian LGSSM, we reformulate the right-hand side of Equation (3) as proportional to the distribution of an augmented LGSSM, such that standard LGSSM inference routines from the literature can be used.

3 Results

In this section, we show that the model presented in Section 2 can be used effectively both for inferring the motion templates underlying a set of human trajectories and for approximating motion templates with dynamical systems. For doing so, we take a difficult task often used in the motor control literature, i.e., Balero, and collect human movements using a motion capture setup. We show that the presented model successfully extracts meaningful human motion templates underlying Balero, and that the movements generated by the model are successful in a simulation of the Balero task on an anthropomorphic SARCOS arm.

Figure 2: In this figure, we show nine plots (C1-C9) where each plot represents one cluster found by our method. Each of the five shown trajectories in the respective clusters represents a different recorded Balero movement. For better visualization, we do not show joint trajectories here but rather the trajectories of the cup, which have an easier physical interpretation and, additionally, reveal the differences between the isolated clusters. All axes show units in meters.

3.1 Data Generation of Balero Motions

As experimental benchmark, we use the children's motor learning toy Balero, also known as Ball-In-A-Cup or Kendama. In this game, a human is given a toy consisting of a cup with a ball attached by a string. The goal of the human is to toss the ball into the cup. Humans perform a wide variety of different movements in order to achieve this task [14]. For example, three very distinct movements are: (i) swing the hand slightly upwards to the side and then go back to catch the ball, (ii) hold the cup high and then move very fast to catch the ball, and (iii) jerk the cup upwards and catch the ball in a fast downwards movement. Whilst the difference in these three movements is significant and can be easily detected visually, there exist many other movements for which this is not the case. We collected 124 different Balero trajectories where the subject was free to select the employed movement. For doing so, we used a VICON™ data collection system, which samples the trajectories at 200Hz, to track both the cup and all seven major degrees of freedom of the human arm. The data was preprocessed upon data collection.
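The joint velocities used below are estimated from the recorded joint angles; a simple finite-difference sketch at the 200 Hz sampling rate (the use of central differences via np.gradient is our assumption, as the paper does not specify the differentiation scheme):

```python
import numpy as np

def estimate_velocities(angles, fs=200.0):
    """Central-difference joint-velocity estimate from joint angles
    sampled at fs Hz (the endpoints fall back to one-sided
    differences, keeping the output the same length as the input)."""
    return np.gradient(angles, 1.0 / fs, axis=0)

# 128 samples at 200 Hz = 0.64 s of data for 7 joints, as recorded.
# A sine wave stands in for a real joint-angle trajectory.
t = np.arange(128) / 200.0
angles = np.sin(2 * np.pi * t)[:, None] * np.ones((1, 7))
velocities = estimate_velocities(angles)
series14 = np.hstack([angles, velocities])  # one 14-dim time-series
```

Stacking angles and velocities this way yields the 14-dimensional representation that each sequence contributes to the clustering.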
For the evaluation of our method, we considered the seven joint angles of the human presenter as well as the corresponding seven estimated joint velocities. As a result, we obtained a set of 14-dimensional time-series of 0.64s length (equivalent to 128 time-steps at our sampling rate), where the length was estimated before the recording and automatically set. In the lowest row of Figure 1, we show how the human motion is collected with a VICON™ motion tracking setup. As we will see later, this specific movement is assigned by our method to cluster C2, whose representative generative LGSSM can be used successfully for imitating this motion (middle row). A sketch of the represented movement is shown in the top row of Figure 1.

3.2 Clustering and Imitation of Motion Templates

We trained the variational method with different initial conditions, with hidden dimension H = 35 and a number of clusters K varying from 20 to 50, in order to avoid suboptimal results due to local maxima.

Figure 3: (a) Time-series recorded from two executions of the Balero movement assigned by our model to cluster C1; the first and second rows show the positions [rad] and velocities [rad/s], respectively, over time [s]. (b) Two executions of the Balero movement generated by our trained model using the probability distributions of cluster C1.

Our clustering method results in nine different motion templates.
These are plotted in Figure 2, where, instead of the 14-dimensional joint angles and velocities represented by our motion template, we show the three-dimensional cup trajectories resulting from these joint movements, as it is easier for humans to make sense of Cartesian trajectories. Clusters C1, C2 and C3 are movements to the side which subsequently catch the ball. Here, C1 is a short jerk, C3 appears to follow a circular but jerky movement, while C2 uses a longer but smoother movement to induce kinetic energy in the ball. Motion templates C4 and C5 are dropping movements where the cup moves down fast for more than 1.2m and then catches the ball. The template C5 is a smoother movement than C4 with a wider catching movement. For C6 and C7, we observe a significantly different movement where the cup is jerked upwards, dragging the ball in this direction, and then catches the ball on the way down. Clusters C8 and C9 exhibit the most interesting movement, where the main motion is forward-backwards and the ball swings into the cup. In C8 this task is achieved by moving upwards at the same time, while in C9 there is little loss of height. To generate Balero movements with our trained model, we can use the recursive formulation of the LGSSM given by Equation (1) where, for each cluster k, A^k, B^k and μ^k_{1:T} are replaced by the mean values of their Gaussian q posterior distributions, while the noise covariances are replaced by the modes of their Gamma q posteriors. The initial hidden state h_1 and the noise elements η^h_t and η^v_t are sampled from their respective distributions. In Figure 3 (a) we plotted two recorded executions of the Balero movement assigned by our model to cluster C1, while in Figure 3 (b) we plotted two executions generated by our model using the learned distributions from cluster C1. As we can see, our model can generate time-series with very similar dynamics to the ones of the recorded time-series and is thus well-suited for imitation learning.
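Replaying the recursion of Equation (1) with such point estimates can be sketched as follows; for brevity we show only the noise-free mean movement (actual generation additionally samples h_1 and the noise terms), and all parameter values below are illustrative:

```python
import numpy as np

def mean_trajectory(A_k, B_k, mu_k, mu1, T):
    """Deterministic mean movement of cluster k: iterate the recursion
    of Equation (1) with the posterior-mean parameters and the noise
    terms replaced by their means (mu_t for the hidden state, zero
    for the observation)."""
    h = mu1.copy()
    vs = [B_k @ h]
    for t in range(1, T):
        h = A_k @ h + mu_k[t]
        vs.append(B_k @ h)
    return np.array(vs)

# Hypothetical 2-dim hidden state driving a 1-dim observation over
# the 128 time-steps of a recorded movement.
A_k = np.array([[0.9, 0.0], [0.0, 0.8]])
B_k = np.array([[1.0, 1.0]])
mu_k = np.zeros((128, 2))
traj = mean_trajectory(A_k, B_k, mu_k, np.array([1.0, -1.0]), T=128)
```

The time-dependent means μ^k_{2:T} (all zero in this toy example) are what allow a single linear model to carry the shaped, non-stationary profile of a movement template.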
The resulting motor templates are accurate enough for use in a physically realistic simulation of a Balero task, as shown in Figure 1 for cluster C2 (middle row) and in the video on the authors' website. Please note that a small visual feedback term based on a Jacobian transpose method was activated when the ball was within 3cm in order to ensure task fulfillment; this additional term was inspired by Miyamoto et al. [19] and only results in a minor correction.

4 Conclusions

In this paper, we addressed the problem of automatic generation of skill libraries for both robot learning and human motion analysis as an unsupervised multi-dimensional time-series clustering and learning problem based on human trajectories. We have introduced a novel Bayesian temporal mixture model based on a variational approximation method which is especially designed to enable the use of efficient inference algorithms. We demonstrated that our model obtains a meaningful clustering of human executions of the difficult game of dexterity Balero and is able to generate time-series which are very close to the recorded ones. Finally, we have shown that the model can be used to obtain successful executions of the Balero movements on a physically realistic simulation of the SARCOS Master Arm.

5 Acknowledgments

The authors would like to thank David Barber for useful discussions and Betty Mohler for help with data collection.

References

[1] T. Flash and B. Hochner. Motor primitives in vertebrates and invertebrates. Current Opinion in Neurobiology, 15(6):660-666, 2005.
[2] B. Williams, M. Toussaint, and A. Storkey. Modelling motion primitives and their timing in biologically executed movements. In Advances in Neural Information Processing Systems 20, pages 1609-1616, 2008.
[3] A. Ijspeert, J. Nakanishi, and S. Schaal. Learning attractor landscapes for learning motor primitives. In Advances in Neural Information Processing Systems 15, pages 1547-1554, 2003.
[4] S. Calinon, F. Guenter, and A. Billard.
On learning, representing and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man and Cybernetics, Part B, 37(2):286-298, 2007.
[5] Y. Xiong and D.-Y. Yeung. Mixtures of ARMA models for model-based time series clustering. In Proceedings of the IEEE International Conference on Data Mining, pages 717-720, 2002.
[6] L. Y. Inoue, M. Neira, C. Nelson, M. Gleave, and R. Etzioni. Cluster-based network model for time-course gene expression data. Biostatistics, 8:507-525, 2007.
[7] C. Li and G. Biswas. A Bayesian approach to temporal data clustering using hidden Markov models. In Proceedings of the International Conference on Machine Learning, pages 543-550, 2000.
[8] C. E. Rasmussen. The infinite Gaussian mixture model. In Advances in Neural Information Processing Systems 12, pages 554-560, 2000.
[9] Z. Ghahramani and M. J. Beal. Variational inference for Bayesian mixtures of factor analysers. In Advances in Neural Information Processing Systems 12, pages 449-455, 2000.
[10] A. Kottas. Dirichlet process mixtures of beta distributions, with applications to density and intensity estimation. In Workshop on Learning with Nonparametric Bayesian Methods, International Conference on Machine Learning, 2006.
[11] D. Görür. Nonparametric Bayesian Discrete Latent Variable Models for Unsupervised Learning. Ph.D. thesis, Technische Universität Berlin, Germany, 2007.
[12] J. Durbin and S. J. Koopman. Time Series Analysis by State Space Methods. Oxford University Press, 2001.
[13] J. Kober, B. Mohler, and J. Peters. Learning perceptual coupling for motor primitives. In International Conference on Intelligent Robots and Systems, pages 834-839, 2008.
[14] S. Fogel, J. Jacob, and C. Smith. Increased sleep spindle activity following simple motor procedural learning in humans. Actas de Fisiologia, 7(123), 2001.
[15] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
[16] D. Wipf, J. Palmer, and B. Rao.
Perspectives on Sparse Bayesian Learning. In Advances in Neural Information Processing Systems 16, 2004.
[17] K. Kurihara, M. Welling, and Y. W. Teh. Collapsed variational Dirichlet process mixture models. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 2796-2801, 2007.
[18] D. Barber and S. Chiappa. Unified inference for variational Bayesian linear Gaussian state-space models. In Advances in Neural Information Processing Systems 19, pages 81-88, 2007.
[19] H. Miyamoto, S. Schaal, F. Gandolfo, Y. Koike, R. Osu, E. Nakano, Y. Wada, and M. Kawato. A Kendama learning robot based on bi-directional theory. Neural Networks, 9(8):1281-1302, 1996.
[20] S. Chiappa and D. Barber. Dirichlet mixtures of Bayesian linear Gaussian state-space models: a variational approach. Technical Report 161, MPI for Biological Cybernetics, Tübingen, Germany, 2007.