We consider models for multivariate point processes where the intensity is given nonparametrically in terms of functions in a reproducing kernel Hilbert space. The likelihood function involves a time integral and is consequently not given in terms of a finite number of kernel evaluations. The main result is a representation of the gradient of the log-likelihood, which we use to derive computable approximations of the log-likelihood and the gradient by time discretization. These approximations are then used to minimize the approximate penalized log-likelihood. For time and memory efficiency the implementation relies crucially on the use of sparse matrices. As an illustration we consider neuron network modeling, and we use this example to investigate how the computational costs of the approximations depend on the resolution of the time discretization. The implementation is available in the R package ppstat.