
Utilities

tinytopics.utils

set_random_seed(seed)

Set the random seed for reproducibility across Torch and NumPy.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| seed | int | Random seed value. | required |
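
As a rough illustration of what seeding across libraries involves, the sketch below defines a hypothetical helper, `seed_everything_sketch`, which is not part of `tinytopics`. It assumes NumPy is installed and treats Torch as optional; the actual `set_random_seed` may seed additional generators or behave differently.

```python
import random

import numpy as np


def seed_everything_sketch(seed: int) -> None:
    """Hypothetical helper: seed Python, NumPy, and (if installed) Torch."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        # Torch is optional in this sketch; skip it when unavailable.
        return
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


# Re-seeding with the same value reproduces the same random draws.
seed_everything_sketch(42)
first = np.random.rand(3)
seed_everything_sketch(42)
second = np.random.rand(3)
```

Calling the helper once at the top of a script, before any data generation or model fitting, is the usual way to make a run repeatable.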

generate_synthetic_data(n, m, k, avg_doc_length=1000, device=None)

Generate a synthetic document-term matrix for testing the model.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| n | int | Number of documents. | required |
| m | int | Number of terms (vocabulary size). | required |
| k | int | Number of topics. | required |
| avg_doc_length | int | Average number of terms per document. | 1000 |
| device | device | Device to place the output tensors on. | None |

Returns:

| Type | Description |
|------|-------------|
| Tensor | Document-term matrix. |
| ndarray | True document-topic distribution (L). |
| ndarray | True topic-term distribution (F). |
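
To make the shapes of the three return values concrete, here is a minimal NumPy-only sketch of one way such data could be generated. The generative model (Dirichlet rows for L and F, Poisson document lengths, multinomial term counts) is an assumption for illustration, and `generate_synthetic_data_sketch` is a hypothetical name. Unlike the function documented above, the sketch returns the document-term matrix as an `ndarray` rather than a `Tensor` and has no `device` argument.

```python
import numpy as np


def generate_synthetic_data_sketch(n, m, k, avg_doc_length=1000, seed=0):
    """Hypothetical sketch: sample a document-term count matrix from L @ F."""
    rng = np.random.default_rng(seed)

    # Assumed model: each row of L and F is a Dirichlet draw, so rows sum to 1.
    true_L = rng.dirichlet(np.ones(k), size=n)  # (n, k) document-topic
    true_F = rng.dirichlet(np.ones(m), size=k)  # (k, m) topic-term

    # Per-document term probabilities, renormalized against rounding error.
    term_probs = true_L @ true_F
    term_probs /= term_probs.sum(axis=1, keepdims=True)

    # Assumed: document lengths vary around avg_doc_length (Poisson).
    doc_lengths = rng.poisson(avg_doc_length, size=n)

    # One multinomial draw of term counts per document.
    X = np.vstack(
        [rng.multinomial(length, p) for length, p in zip(doc_lengths, term_probs)]
    )
    return X, true_L, true_F


X, true_L, true_F = generate_synthetic_data_sketch(n=20, m=50, k=3)
```

With `n=20`, `m=50`, and `k=3`, `X` has shape `(20, 50)`, `true_L` has shape `(20, 3)`, and `true_F` has shape `(3, 50)`.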

align_topics(true_F, learned_F)

Align learned topics with true topics for visualization, using cosine similarity and linear sum assignment.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| true_F | ndarray | Ground truth topic-term matrix. | required |
| learned_F | ndarray | Learned topic-term matrix. | required |

Returns:

| Type | Description |
|------|-------------|
| ndarray | Permutation of learned topics aligned with true topics. |
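
The combination of cosine similarity and linear sum assignment can be sketched as follows. This is an illustrative reimplementation under assumptions, not the library's code: `align_topics_sketch` is a hypothetical name, topics are assumed to be stored as rows, and the returned array is assumed to give, for each true topic, the index of the learned topic matched to it. It relies on `scipy.optimize.linear_sum_assignment`, which minimizes total cost, so the similarity matrix is negated.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_topics_sketch(true_F, learned_F):
    """Hypothetical sketch: match learned topics (rows) to true topics (rows)."""
    # Normalize rows to unit length so the dot product is cosine similarity.
    true_unit = true_F / np.linalg.norm(true_F, axis=1, keepdims=True)
    learned_unit = learned_F / np.linalg.norm(learned_F, axis=1, keepdims=True)
    similarity = true_unit @ learned_unit.T  # (k, k)

    # linear_sum_assignment minimizes cost, so negate to maximize similarity.
    _, col_ind = linear_sum_assignment(-similarity)
    return col_ind


# Example: the learned topics are a shuffled copy of the true topics.
rng = np.random.default_rng(0)
true_F = rng.random((4, 20))
shuffle = np.array([2, 0, 3, 1])
learned_F = true_F[shuffle]

aligned = align_topics_sketch(true_F, learned_F)
realigned_F = learned_F[aligned]  # rows now follow the true topic order
```

Indexing the learned matrix with the returned permutation puts its topics in the same order as the ground truth, which is what makes side-by-side plots comparable.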

sort_documents(L_matrix)

Sort documents so that they are grouped by dominant topic, for visualization.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| L_matrix | ndarray | Document-topic distribution matrix. | required |

Returns:

| Type | Description |
|------|-------------|
| Sequence[int] | Indices of documents sorted by dominant topics. |
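
One plausible reading of "sorted by dominant topics" is sketched below: documents are grouped by the topic with the largest weight, and within each group ordered by decreasing weight of that topic. The within-group ordering and the name `sort_documents_sketch` are assumptions for illustration; the library's exact tie-breaking and ordering may differ.

```python
import numpy as np


def sort_documents_sketch(L_matrix):
    """Hypothetical sketch: order documents by dominant topic, then by weight."""
    # Normalize rows so weights are comparable across documents.
    normalized = L_matrix / L_matrix.sum(axis=1, keepdims=True)
    dominant_topic = normalized.argmax(axis=1)
    dominant_weight = normalized.max(axis=1)

    # np.lexsort uses the LAST key as the primary key:
    # primary = dominant topic (ascending), secondary = weight (descending).
    order = np.lexsort((-dominant_weight, dominant_topic))
    return [int(i) for i in order]


L_example = np.array(
    [
        [0.1, 0.9],  # doc 0: topic 1 dominates
        [0.8, 0.2],  # doc 1: topic 0 dominates
        [0.3, 0.7],  # doc 2: topic 1 dominates
        [0.6, 0.4],  # doc 3: topic 0 dominates
    ]
)
sorted_indices = sort_documents_sketch(L_example)
# sorted_indices == [1, 3, 0, 2]
```

Reordering the rows of the matrix with these indices, for example `L_example[sorted_indices]`, yields the block-like layout commonly used in structure plots.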