PCA is a common way to reduce the dimensionality of numerical features. Factor analysis (LISREL, anyone?) is a related method that fits a latent factor model and uses fit measures to confirm a hypothesized factor structure; it takes the covariance matrix as input. Although it is possible to compute a correlation matrix for binary features using tetrachoric correlations, I was always warned against using factor analysis on binary features.
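To make the "covariance matrix as input" point concrete, here is a minimal sketch of PCA done by hand on the covariance matrix of numerical features. The toy data (two hidden factors driving five observed features) and all variable names are assumptions made up for illustration; this is plain PCA, not a confirmatory factor analysis with fit measures.

```python
import numpy as np

# Hypothetical toy data: 5 numerical features driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + 0.1 * rng.normal(size=(300, 5))

# Centre the features and compute their covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix,
# sorted from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the first k principal components.
k = 2
scores = Xc @ eigvecs[:, :k]                    # reduced representation
explained = eigvals[:k].sum() / eigvals.sum()   # share of variance retained

print(scores.shape, round(float(explained), 3))
```

Because the toy data really has only two underlying factors, two components should retain nearly all of the variance; on real data the share retained is what you would inspect to choose `k`.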

Modelling categorical features as little subnets is already a step ahead of blatantly dummy-fying categorical variables. RBMs might offer a way to detect patterns in categorical feature vectors. An RBM basically extracts patterns from binary vectors, compressing each vector to a lower dimension. This is no real surprise: RBMs are the basic building blocks of deep belief networks, which discover patterns of patterns of … all the way down. Still, nice that I nailed that one.
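As a rough illustration of what "compressing a binary vector to a lower dimension" means, here is a minimal NumPy sketch of a Bernoulli RBM trained with one-step contrastive divergence (CD-1). Everything in it is an assumption for the sake of the example: the toy categorical columns `colour` and `size`, the helper names `one_hot`, `train_rbm` and `compress`, the full-batch updates, and the hyper-parameter values. It is not a tuned or production implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def one_hot(codes, n_levels):
    """Dummy-code one integer-coded categorical column."""
    return np.eye(n_levels)[codes]

def train_rbm(V, n_hidden, lr=0.05, epochs=50, seed=0):
    """Full-batch CD-1 training of a Bernoulli RBM (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = V.shape
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)   # visible biases
    b_h = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data, then a sample.
        p_h = sigmoid(V @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Negative phase: one Gibbs step back to the visibles and up again,
        # using probabilities (not samples) for the reconstruction.
        p_v = sigmoid(h @ W.T + b_v)
        p_h_recon = sigmoid(p_v @ W + b_h)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (V.T @ p_h - p_v.T @ p_h_recon) / n_samples
        b_v += lr * (V - p_v).mean(axis=0)
        b_h += lr * (p_h - p_h_recon).mean(axis=0)
    return W, b_v, b_h

def compress(V, W, b_h):
    """Map binary vectors to hidden-unit activation probabilities."""
    return sigmoid(V @ W + b_h)

# Hypothetical toy data: two categorical features with 3 and 4 levels.
rng = np.random.default_rng(42)
colour = rng.integers(0, 3, size=200)
size = rng.integers(0, 4, size=200)
V = np.hstack([one_hot(colour, 3), one_hot(size, 4)])  # 7 binary columns

W, b_v, b_h = train_rbm(V, n_hidden=3)
H = compress(V, W, b_h)
print(V.shape, H.shape)  # (200, 7) (200, 3)
```

The seven dummy columns go in, three hidden activations come out. Since the toy columns are independent random draws there is no real pattern to find here; the point is only the shape of the computation.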

RBMs are deemed a bit old-fashioned and have been surpassed by more modern deep learning approaches. Still, they could give good insight into the order of magnitude of things. There are a lot of hyper-parameters in neural nets, so any guidance is welcome as far as I'm concerned.