The state vectors can be obtained through the APIs provided by DL frameworks (e.g., Keras, TensorFlow).
We take Keras as an example:
With "return_sequences=True", the recurrent layer returns its output at every time step, so we can easily obtain the state vectors produced during prediction.
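A minimal sketch of this step, assuming a toy single-layer LSTM: the layer sizes, the random input, and the names `extractor` and `state_vectors` are illustrative and not taken from the original code.

```python
import numpy as np
from tensorflow import keras

# Illustrative sizes only (assumptions, not values from the original project)
timesteps, features, hidden = 5, 8, 16

inputs = keras.Input(shape=(timesteps, features))
# return_sequences=True makes the LSTM emit its output at every time step,
# instead of only at the last one
states = keras.layers.LSTM(hidden, return_sequences=True)(inputs)
extractor = keras.Model(inputs, states)

# Three random input sequences stand in for real samples
x = np.random.rand(3, timesteps, features).astype("float32")
state_vectors = extractor.predict(x, verbose=0)

# One state vector per sample and time step
print(state_vectors.shape)
```

For a trained model, the same idea would apply by building a second `keras.Model` that shares the trained recurrent layer and exposes its sequence output.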
Next, using all the obtained state vectors, we fit a PCA model and transform the state vectors into a new space, keeping only the first k principal components.
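The PCA step might look roughly like the following sketch, which uses scikit-learn as one possible implementation. The random data, the value of `k`, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for the collected state vectors: (samples, timesteps, hidden_size)
n_samples, timesteps, hidden = 20, 5, 16
state_vectors = rng.normal(size=(n_samples, timesteps, hidden))

k = 3  # number of principal components to keep (assumed value)

# PCA expects a 2-D array, so every time step of every sample becomes one row
flat = state_vectors.reshape(-1, hidden)
pca = PCA(n_components=k).fit(flat)

# Project the vectors and restore the per-sample sequence structure
projected = pca.transform(flat).reshape(n_samples, timesteps, k)
print(projected.shape)
```

Keeping the fitted `pca` object around allows state vectors of new samples to be projected into the same space later.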
Then, for the transformed state vectors, we analyze the value range of each dimension and divide the range into m splits (or buckets). The state vectors are thereby mapped into an abstract k-dimensional grid, where each grid cell is an abstract state.
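As a rough sketch of the bucketing, assuming equal-width intervals and clipping of values at the edges of the observed range (the text above does not fix either choice), with the hypothetical helpers `fit_ranges` and `to_abstract_state`:

```python
import numpy as np

def fit_ranges(points):
    """Record the per-dimension minimum and maximum of the projected vectors."""
    return points.min(axis=0), points.max(axis=0)

def to_abstract_state(vec, lower, upper, m):
    """Map one k-dimensional vector to a tuple of bucket indices in [0, m - 1].

    Assumes equal-width buckets and upper > lower in every dimension.
    """
    width = (upper - lower) / m
    idx = np.floor((vec - lower) / width).astype(int)
    # Clip so the upper bound and out-of-range values land in the edge buckets
    return tuple(int(i) for i in np.clip(idx, 0, m - 1))

# Three 2-dimensional vectors stand in for PCA-transformed state vectors
points = np.array([[0.0, 0.0], [1.0, 10.0], [0.5, 2.5]])
lower, upper = fit_ranges(points)
m = 4
states = [to_abstract_state(p, lower, upper, m) for p in points]
print(states)  # [(0, 0), (3, 3), (2, 1)]
```

Representing an abstract state as a tuple of bucket indices keeps it hashable, so it can be used directly as a dictionary key when counting transitions.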
The state vectors are recorded for each sample in the sequential order in which they are generated. For each sequence of state vectors, we establish the transitions between the corresponding abstract states and record their frequencies.
Now, we have built the DTMC model.
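The transition counting can be sketched as follows. The function name `build_dtmc` and the toy traces are hypothetical, and normalizing the counts into probabilities is the usual way to estimate a DTMC from frequencies rather than something spelled out above.

```python
from collections import Counter, defaultdict

def build_dtmc(traces):
    """Count transitions between consecutive abstract states in each trace
    and normalize the counts into transition probabilities."""
    counts = defaultdict(Counter)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[src] = {dst: n / total for dst, n in outgoing.items()}
    return counts, probs

# Each trace is the abstract-state sequence of one sample (toy data)
traces = [
    [(0, 0), (1, 0), (1, 1)],
    [(0, 0), (1, 0), (2, 1)],
    [(0, 0), (0, 1)],
]
counts, probs = build_dtmc(traces)
print(counts[(0, 0)])  # frequencies of transitions leaving state (0, 0)
print(probs[(1, 0)])   # estimated probabilities leaving state (1, 0)
```

In this toy data, state (0, 0) moves to (1, 0) in two of three traces, so the estimated probability of that transition is 2/3.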
According to the previously collected statistics on the range of each dimension, the state_vector_pca can be mapped to the abstract states, and a walk over the DTMC model can be used to check the sample's abstract behavior, i.e., the states and transitions covered by the walk. All the similarity metrics and coverage criteria can then be derived from the covered states and transitions.
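To illustrate how such quantities could be derived from a walk, the sketch below computes a simple coverage ratio and a Jaccard similarity over covered transitions. These two formulas, the helper names, and the toy model are assumptions chosen for illustration; the exact metrics and criteria are not defined in this section.

```python
def walk(trace):
    """Return the sets of abstract states and transitions visited by a trace."""
    return set(trace), set(zip(trace, trace[1:]))

def coverage(covered, model_items):
    """Fraction of the model's states (or transitions) hit by the walk."""
    if not model_items:
        return 0.0
    return len(covered & model_items) / len(model_items)

def jaccard(a, b):
    """Jaccard similarity between two sets of covered items."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy DTMC structure: 4 abstract states and 4 transitions
model_states = {"s0", "s1", "s2", "s3"}
model_transitions = {("s0", "s1"), ("s1", "s2"), ("s2", "s3"), ("s1", "s3")}

states_a, trans_a = walk(["s0", "s1", "s2"])
states_b, trans_b = walk(["s0", "s1", "s3"])

state_cov = coverage(states_a, model_states)       # 0.75
trans_cov = coverage(trans_a, model_transitions)   # 0.5
similarity = jaccard(trans_a, trans_b)             # 1/3
print(state_cov, trans_cov, similarity)
```

The two walks share only the transition from s0 to s1 out of three distinct transitions, which gives the Jaccard value of 1/3.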