Fit topic models
tinytopics.fit
fit_model(X, k, num_epochs=200, batch_size=16, base_lr=0.01, max_lr=0.05, T_0=20, T_mult=1, weight_decay=1e-05, device=None)
Fit a topic model using sum-to-one constrained neural Poisson NMF. Supports both in-memory tensors and custom datasets.
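To make "sum-to-one constrained Poisson NMF" concrete, here is a hypothetical, framework-free sketch of the kind of objective such a model could minimize. It is not tinytopics' implementation. The helper names `softmax_rows` and `poisson_nmf_nll`, the softmax parameterization of both factors, and the scaling of each document's Poisson rate by its total count are all assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(logits: Matrix) -> Matrix:
    """Row-wise softmax: every output row is non-negative and sums to one."""
    out = []
    for row in logits:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def poisson_nmf_nll(X: Matrix, L_logits: Matrix, F_logits: Matrix,
                    eps: float = 1e-10) -> float:
    """Poisson negative log-likelihood, dropping the constant log(x!) term.

    L (documents x k) and F (k x terms) are softmax-constrained so each row
    sums to one. Assumption (hypothetical): the rate for document i is its
    total count s_i times (L @ F)[i], keeping rates on the scale of the counts.
    """
    L = softmax_rows(L_logits)
    F = softmax_rows(F_logits)
    P = matmul(L, F)  # rows of L @ F also sum to one
    loss = 0.0
    for x_row, p_row in zip(X, P):
        s = sum(x_row)
        for x, p in zip(x_row, p_row):
            rate = s * p
            loss += rate - x * math.log(rate + eps)
    return loss


# Toy document-term counts: 2 documents, 2 terms, k = 2 topics.
X = [[3, 1], [0, 4]]
uniform = poisson_nmf_nll(X, [[0, 0], [0, 0]], [[0, 0], [0, 0]])
fitted = poisson_nmf_nll(X, [[5, -5], [-5, 5]], [[math.log(3), 0], [-5, 5]])
```

The hand-picked factors that match the counts give a lower loss than uniform factors; a neural implementation would learn the logits by gradient descent instead of setting them by hand.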
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Tensor \| Dataset | Input data: either an in-memory tensor or a custom dataset. | required |
k | int | Number of topics. | required |
num_epochs | int | Number of training epochs. | 200 |
batch_size | int | Number of documents per batch. | 16 |
base_lr | float | Minimum learning rate after annealing. | 0.01 |
max_lr | float | Starting (maximum) learning rate. | 0.05 |
T_0 | int | Number of epochs until the first restart of the cosine annealing schedule. | 20 |
T_mult | int | Factor by which the restart interval grows after each restart. | 1 |
weight_decay | float | Weight decay for the AdamW optimizer. | 1e-05 |
device | device \| None | Device to run training on. | None |
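The interplay of `base_lr`, `max_lr`, `T_0`, and `T_mult` can be sketched as follows, assuming the schedule behaves like PyTorch's `CosineAnnealingWarmRestarts` with initial learning rate `max_lr` and `eta_min = base_lr`. This reference does not confirm that mapping, and the helper `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch: float, base_lr: float = 0.01, max_lr: float = 0.05,
                T_0: int = 20, T_mult: int = 1) -> float:
    """Learning rate under cosine annealing with warm restarts (SGDR).

    Hypothetical helper: each cycle starts at max_lr and decays toward base_lr
    along a half cosine; the first cycle lasts T_0 epochs and each later cycle
    is T_mult times longer than the previous one.
    """
    if epoch < 0 or T_0 < 1 or T_mult < 1:
        raise ValueError("epoch must be >= 0; T_0 and T_mult must be >= 1")
    T_i, T_cur = T_0, epoch
    # Skip over completed cycles to find the current cycle length and position.
    while T_cur >= T_i:
        T_cur -= T_i
        T_i *= T_mult
    return base_lr + (max_lr - base_lr) * (1 + math.cos(math.pi * T_cur / T_i)) / 2


print([round(lr_at_epoch(e), 4) for e in (0, 5, 10, 15, 20)])
# [0.05, 0.0441, 0.03, 0.0159, 0.05]
```

With the defaults (`T_mult=1`), the rate decays from 0.05 toward 0.01 over 20 epochs and jumps back to 0.05 at each restart; `T_mult=2` would instead double the length of each successive cycle.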
Returns:
Type | Description |
---|---|
Tuple[NeuralPoissonNMF, Sequence[float]] | Tuple containing the trained NeuralPoissonNMF model and a sequence of training loss values. |
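The `weight_decay` parameter refers to AdamW's decoupled weight decay, which shrinks parameters directly rather than adding an L2 penalty to the gradient. The scalar sketch below shows one update step using the default `betas` and `eps` of PyTorch's `torch.optim.AdamW`; the helper `adamw_step` is hypothetical, and whether `fit_model` overrides those other defaults is not stated in this reference.

```python
import math
from typing import Tuple


def adamw_step(p: float, grad: float, m: float, v: float, t: int, lr: float,
               weight_decay: float = 1e-5, beta1: float = 0.9,
               beta2: float = 0.999, eps: float = 1e-8) -> Tuple[float, float, float]:
    """One AdamW update for a single scalar parameter (t is the 1-based step).

    Hypothetical helper for illustration. Returns the updated
    (parameter, first moment, second moment).
    """
    # Decoupled weight decay: shrink the parameter directly, independent of the gradient.
    p = p * (1 - lr * weight_decay)
    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


# With a zero gradient, only the weight-decay shrink is applied.
p, m, v = adamw_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.05)
```

At `lr=0.05` and `weight_decay=1e-05`, the shrink factor per step is `1 - 5e-07`, so the decay acts as a gentle pull toward zero.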
tinytopics.fit_distributed
fit_model_distributed(X, k, num_epochs=200, batch_size=16, base_lr=0.01, max_lr=0.05, T_0=20, T_mult=1, weight_decay=1e-05, save_path='model.pt')
Fit a topic model using sum-to-one constrained neural Poisson NMF with distributed training. Supports multi-GPU and multi-node setups via Hugging Face Accelerate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Tensor \| Dataset | Input data: either an in-memory tensor or a custom dataset. | required |
k | int | Number of topics. | required |
num_epochs | int | Number of training epochs. | 200 |
batch_size | int | Batch size. | 16 |
base_lr | float | Minimum learning rate after annealing. | 0.01 |
max_lr | float | Starting (maximum) learning rate. | 0.05 |
T_0 | int | Cosine annealing parameter: number of epochs until the first restart. | 20 |
T_mult | int | Cosine annealing parameter: factor by which the restart interval grows after each restart. | 1 |
weight_decay | float | Weight decay for the AdamW optimizer. | 1e-05 |
save_path | str \| None | File path to save the trained model to. If None, the model is not saved to disk. | 'model.pt' |
Returns:
Type | Description |
---|---|
Tuple[NeuralPoissonNMF, Sequence[float]] | Tuple containing the trained NeuralPoissonNMF model and a sequence of training loss values. |
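Hugging Face Accelerate documents that, unless batches are split across processes, the effective batch size is the batch size set in the script multiplied by the number of processes. The sketch below is a simplified, hypothetical model of that data-parallel loading pattern (whole batches dealt round-robin to processes, ignoring any padding of uneven final batches). The helper `shard_batches` is illustrative only, and whether `batch_size` in `fit_model_distributed` is per process is an assumption this reference does not state.

```python
from typing import List


def shard_batches(num_docs: int, batch_size: int,
                  num_processes: int) -> List[List[List[int]]]:
    """Hypothetical model of data-parallel loading without batch splitting.

    Document indices are cut into batches of `batch_size`; whole batches are
    then dealt round-robin, so process p receives batches p, p + num_processes, ...
    Padding of uneven final batches is ignored.
    """
    batches = [list(range(start, min(start + batch_size, num_docs)))
               for start in range(0, num_docs, batch_size)]
    return [batches[p::num_processes] for p in range(num_processes)]


shards = shard_batches(num_docs=100, batch_size=16, num_processes=2)
print([len(s) for s in shards])  # [4, 3]
# The first synchronized step covers one full batch per process: 16 * 2 documents.
print(len(shards[0][0]) + len(shards[1][0]))  # 32
```

Under this assumption, moving from one GPU to two at `batch_size=16` doubles the documents consumed per optimizer step, which may warrant revisiting the learning-rate settings.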