Utilities
tinytopics.utils
set_random_seed(seed)
Set the random seed for reproducibility across Torch and NumPy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
seed | int | Random seed value. | required |
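
A helper of this kind typically seeds each library's global random number generator. The following is a minimal sketch of that idea, not the library's source: the name `seed_everything` is hypothetical, and the Torch import is guarded (an assumption made here) so the sketch also runs where PyTorch is not installed.

```python
import numpy as np

try:
    import torch  # optional: only seeded when PyTorch is installed
except ImportError:
    torch = None


def seed_everything(seed):
    """Hypothetical stand-in for set_random_seed: seed NumPy and, if present, Torch."""
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)


# Re-seeding with the same value reproduces the same random draws.
seed_everything(42)
first = np.random.rand(3)
seed_everything(42)
second = np.random.rand(3)
print(np.array_equal(first, second))
```

Calling the seeding function once at the top of a script, before any data generation or model fitting, is usually enough to make a run repeatable on the same machine and library versions.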
generate_synthetic_data(n, m, k, avg_doc_length=1000, device=None)
Generate synthetic document-term matrix for testing the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Number of documents. | required |
m | int | Number of terms (vocabulary size). | required |
k | int | Number of topics. | required |
avg_doc_length | int | Average number of terms per document. Default is 1000. | 1000 |
device | device | Device to place the output tensors on. | None |
Returns:
Type | Description |
---|---|
Tensor | Document-term matrix. |
ndarray | True document-topic distribution (L). |
ndarray | True topic-term distribution (F). |
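
One common way to simulate data with this shape is sketched below. It draws L and F from Dirichlet distributions, draws each document's length from a Poisson distribution with mean `avg_doc_length`, and samples term counts from a multinomial whose probabilities are the rows of L @ F. The name `simulate_counts`, the flat Dirichlet priors, the Poisson lengths, and the `seed` argument are all assumptions for illustration. The library function may differ in detail, and it returns the count matrix as a Torch tensor rather than a NumPy array.

```python
import numpy as np


def simulate_counts(n, m, k, avg_doc_length=1000, seed=0):
    """Hypothetical NumPy-only sketch of a topic-model data simulator."""
    rng = np.random.default_rng(seed)
    # Each row of L is a document's mixture over k topics (rows sum to 1).
    L = rng.dirichlet(np.ones(k), size=n)
    # Each row of F is a topic's distribution over m terms (rows sum to 1).
    F = rng.dirichlet(np.ones(m), size=k)
    # Document lengths vary around avg_doc_length.
    doc_lengths = rng.poisson(avg_doc_length, size=n)
    # Per-document term probabilities, renormalized to guard against rounding.
    probs = L @ F
    probs = probs / probs.sum(axis=1, keepdims=True)
    X = np.zeros((n, m), dtype=np.int64)
    for i in range(n):
        X[i] = rng.multinomial(doc_lengths[i], probs[i])
    return X, L, F


X, L, F = simulate_counts(n=20, m=50, k=3)
print(X.shape, L.shape, F.shape)
```

Because L and F are returned alongside the counts, a fitted model can be compared directly against the known ground truth.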
align_topics(true_F, learned_F)
Align learned topics with true topics for visualization, using cosine similarity and linear sum assignment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
true_F | ndarray | Ground truth topic-term matrix. | required |
learned_F | ndarray | Learned topic-term matrix. | required |
Returns:
Type | Description |
---|---|
ndarray | Permutation of learned topics aligned with true topics. |
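
The combination of cosine similarity and linear sum assignment can be sketched as follows. This is an illustration rather than the library's implementation: the name `align_by_cosine` is hypothetical, and the sketch assumes both matrices have shape (k, m) with one topic per row. It relies on `scipy.optimize.linear_sum_assignment`, which finds the one-to-one matching with the best total score.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_by_cosine(true_F, learned_F):
    """Hypothetical sketch: match each true topic to one learned topic."""
    # Normalize rows to unit length so the dot product is cosine similarity.
    true_unit = true_F / np.linalg.norm(true_F, axis=1, keepdims=True)
    learned_unit = learned_F / np.linalg.norm(learned_F, axis=1, keepdims=True)
    similarity = true_unit @ learned_unit.T  # (k, k): true topics x learned topics
    # Find the one-to-one assignment that maximizes total similarity.
    row_ind, col_ind = linear_sum_assignment(similarity, maximize=True)
    # col_ind[i] is the learned topic assigned to true topic i.
    return col_ind


true_F = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]])
learned_F = true_F[[2, 0, 1]]  # the same topics in shuffled order
order = align_by_cosine(true_F, learned_F)
print(np.allclose(learned_F[order], true_F))
```

Indexing the learned matrix with the returned permutation (`learned_F[order]`) puts its rows in the same order as the true topics, which makes side-by-side plots comparable.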
sort_documents(L_matrix)
Sort documents grouped by dominant topics for visualization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
L_matrix | ndarray | Document-topic distribution matrix. | required |
Returns:
Type | Description |
---|---|
list | Indices of documents sorted by dominant topics. |
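
Grouping by dominant topic can be sketched with `argmax` and a two-key sort. The name `sort_by_dominant_topic` is hypothetical, and ordering documents within each group by descending weight of the dominant topic is an assumption made for this sketch. The library's within-group ordering may differ.

```python
import numpy as np


def sort_by_dominant_topic(L_matrix):
    """Hypothetical sketch: group documents by dominant topic, strongest first."""
    # Normalize rows so each document's topic weights sum to 1.
    proportions = L_matrix / L_matrix.sum(axis=1, keepdims=True)
    dominant = proportions.argmax(axis=1)  # index of each document's top topic
    strength = proportions.max(axis=1)     # weight of that top topic
    # lexsort treats the last key as primary: sort by topic first,
    # then by descending strength within each topic.
    order = np.lexsort((-strength, dominant))
    return order.tolist()


L_matrix = np.array([
    [0.1, 0.9],  # doc 0: topic 1, strength 0.9
    [0.8, 0.2],  # doc 1: topic 0, strength 0.8
    [0.4, 0.6],  # doc 2: topic 1, strength 0.6
    [0.9, 0.1],  # doc 3: topic 0, strength 0.9
])
print(sort_by_dominant_topic(L_matrix))  # [3, 1, 0, 2]
```

Reordering the rows of the matrix with these indices before plotting produces contiguous blocks of documents per topic.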