Subspace models for image data sets, constructed by computing sparse representations of each image with respect to other images in the set, have been found to perform very well in a variety of applications, including clustering and classification problems. One limitation of these methods, however, is that the subspace representation cannot directly model the effects of non-linear transformations such as translation, rotation, and dilation that frequently occur in practice. This paper shows that the properties of convolutional sparse representations can be exploited to make these methods translation invariant, thereby simplifying or eliminating the alignment pre-processing task. The potential of the proposed approach is demonstrated in two diverse applications: image clustering and video background modeling.