A note on stochastic context-free grammars, termination and the EM-algorithm


Termination of a stochastic context-free grammar, i.e. almost sure finiteness of the random trees it produces, is shown to be equivalent to extinction of an embedded multitype branching process. We show that the maximum likelihood estimator in a saturated model based on complete or partial observation of a finite tree always gives terminating grammars. With partial observation we show that this in fact holds for the whole sequence of parameters obtained by the EM-algorithm. Finally, aspects of the size of the tree related to the embedded branching process is discussed.