
Problems

- Because the vocabulary size (V) is large, each word vector is V-dimensional and therefore large.
- The vectors are sparse, since any given word co-occurs with only a small fraction of the other words in the vocabulary.
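
Both problems can be seen on a toy corpus. The following is a minimal sketch, assuming the vectors are rows of a word-by-word co-occurrence count matrix built with a context window of one word on each side; the corpus, the window size, and the function name `cooccurrence_vectors` are made up for illustration.

```python
def cooccurrence_vectors(sentences, window=1):
    """Build one V-dimensional co-occurrence count vector per vocabulary word."""
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = [[0] * len(vocab) for _ in vocab]
    for sentence in sentences:
        for i, word in enumerate(sentence):
            start = max(0, i - window)
            stop = min(len(sentence), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    # Count the neighbour inside the context window.
                    vectors[index[word]][index[sentence[j]]] += 1
    return vocab, vectors


# Hypothetical toy corpus, already tokenised.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a bird sang in the tree".split(),
]

vocab, vectors = cooccurrence_vectors(corpus)
zero_entries = sum(row.count(0) for row in vectors)
sparsity = zero_entries / (len(vocab) ** 2)

print(f"V = {len(vocab)}, each vector has {len(vectors[0])} dimensions")
print(f"fraction of zero entries: {sparsity:.2f}")
```

Every vector has as many dimensions as there are vocabulary words, and most entries stay zero even in this tiny corpus; with a realistic vocabulary both effects become much more pronounced.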

Resolution

- Dimensionality reduction, using approaches such as:
  - Singular Value Decomposition (SVD) of the term-document matrix to get a K-dimensional approximation.
  - Other matrix factorisation techniques.
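
The SVD approach can be sketched with NumPy as follows. This is an illustrative example under stated assumptions, not a definitive implementation: the small term-document count matrix is invented, the function name `truncated_svd_embeddings` is hypothetical, and scaling the left singular vectors by the singular values is just one common convention for forming the K-dimensional word vectors.

```python
import numpy as np


def truncated_svd_embeddings(matrix, k):
    """Return K-dimensional row embeddings and the rank-K reconstruction."""
    # Thin SVD: matrix = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the K largest singular values and their vectors.
    embeddings = u[:, :k] * s[:k]
    approximation = embeddings @ vt[:k, :]
    return embeddings, approximation


# Hypothetical term-document counts: 5 terms (rows) by 4 documents (columns).
counts = np.array(
    [
        [2, 0, 1, 0],
        [1, 0, 2, 0],
        [0, 3, 0, 1],
        [0, 1, 0, 2],
        [1, 1, 1, 1],
    ],
    dtype=float,
)

embeddings, approximation = truncated_svd_embeddings(counts, k=2)
print(embeddings.shape)     # one K-dimensional dense vector per term
print(approximation.shape)  # same shape as the original matrix
```

Each term is now described by K dense numbers instead of one count per document, which addresses both the size and the sparsity problem. For large sparse matrices, a sparse solver such as `scipy.sparse.linalg.svds` is usually a better fit than a full dense SVD.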

Possible follow-up question: What information is lost in approximating a V-dimensional word representation with a K-dimensional representation?

Answer: SVD finds the best possible K-dimensional (rank-K) approximation of the term-document matrix in the least-squares (Frobenius norm) sense. The information lost is the variation associated with the discarded singular values, that is, all but the K largest.
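
One way to quantify what a rank-K SVD approximation discards is the Frobenius-norm reconstruction error, which equals the square root of the sum of the squared discarded singular values. The sketch below checks this numerically; the matrix values and the function name `rank_k_error` are made up for illustration.

```python
import numpy as np


def rank_k_error(matrix, k):
    """Compare the actual rank-K reconstruction error with the discarded singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    approximation = (u[:, :k] * s[:k]) @ vt[:k, :]
    # For 2-D input, np.linalg.norm defaults to the Frobenius norm.
    actual = np.linalg.norm(matrix - approximation)
    # Error predicted from the singular values that were dropped.
    predicted = np.sqrt(np.sum(s[k:] ** 2))
    # Share of the total squared singular values kept by the top K.
    retained = np.sum(s[:k] ** 2) / np.sum(s ** 2)
    return actual, predicted, retained


# Hypothetical count matrix used only to exercise the function.
toy = np.array(
    [
        [3, 1, 0, 0, 2],
        [2, 0, 1, 0, 1],
        [0, 4, 0, 2, 0],
        [0, 1, 0, 3, 1],
    ],
    dtype=float,
)

actual, predicted, retained = rank_k_error(toy, k=2)
print(f"actual error:    {actual:.4f}")
print(f"predicted error: {predicted:.4f}")
print(f"retained share:  {retained:.2%}")
```

The two error values agree up to floating-point rounding, and the retained share gives a rough guide for choosing K: the closer it is to 1, the less is lost by dropping the remaining dimensions.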
