The product of a dense tensor with a vector in every mode except one, called a tensor-vector product, is a key operation in several algorithms for computing the canonical tensor decomposition. In these applications, it is even more common to compute tensor-vector products with the same tensor and r concurrently available sets of vectors, an operation we refer to as a multiple-vector tensor-vector product (MTVP). Current techniques for implementing these operations rely on explicitly reordering the elements of the tensor in order to leverage available matrix libraries. This approach has two significant disadvantages: reordering the data can be expensive when only a small number of concurrent sets of vectors is available in the MTVP, and it requires an excessive amount of additional memory. In this work, we consider two techniques that resolve these issues: successive contractions eliminate the explicit data reordering, while blocking tackles the excessive memory consumption. Numerical experiments on a wide variety of tensor shapes demonstrate the effectiveness of these optimizations, clearly illustrating that the additional memory consumption can be limited to a tolerable amount, generally without sacrificing fast execution. For several fourth-order tensors, the additional memory requirements were three orders of magnitude smaller than those of competing implementations, while throughputs upward of 75% of the peak performance of the computer system were attained for large values of r.
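As an illustration of the operations discussed above, the following is a minimal NumPy sketch (function and variable names are our own, not from the paper) of a tensor-vector product in every mode except the first, computed by successive contractions rather than by reordering the tensor into a matrix, together with the corresponding MTVP over r concurrent sets of vectors:

```python
import numpy as np

def tvp_all_but_mode0(T, vectors):
    """Contract T with one vector per mode except mode 0, via successive
    contractions; no explicit reordering of the tensor's elements."""
    out = T
    # Contract the trailing mode first so the remaining mode indices
    # keep their positions between successive contractions.
    for v in reversed(vectors):
        out = np.tensordot(out, v, axes=([out.ndim - 1], [0]))
    return out  # vector of length T.shape[0]

def mtvp_all_but_mode0(T, vector_sets):
    """MTVP: the same contraction applied to r concurrent sets of vectors,
    producing one column per set."""
    return np.stack([tvp_all_but_mode0(T, vs) for vs in vector_sets], axis=-1)

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 5, 6))           # third-order example tensor
sets = [[rng.standard_normal(5), rng.standard_normal(6)] for _ in range(3)]
M = mtvp_all_but_mode0(T, sets)
print(M.shape)  # (4, 3): one length-4 result per set of vectors
```

This sketch only conveys the shape of the computation; a high-performance implementation would additionally block the contractions to bound the intermediate (additional) memory, as described above.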