
Fit topic models

tinytopics.fit

fit_model(X, k, num_epochs=200, batch_size=16, base_lr=0.01, max_lr=0.05, T_0=20, T_mult=1, weight_decay=1e-05, device=None)

Fit a topic model using sum-to-one constrained neural Poisson NMF. Supports both in-memory tensors and custom datasets.
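To make "sum-to-one constrained Poisson NMF" concrete, here is a minimal pure-Python sketch under stated assumptions, not the NeuralPoissonNMF implementation. It assumes that the document-topic matrix L and the topic-term matrix F are each parameterized by unconstrained logits passed through a row-wise softmax (so every row sums to one), that the reconstruction is L·F, and that the loss is the Poisson negative log-likelihood with the constant log(x!) term dropped. The helper names softmax_rows, reconstruct, and poisson_loss are hypothetical; tinytopics may parameterize or scale the factors differently.

```python
import math


def softmax_rows(logits):
    """Row-wise softmax: maps unconstrained logits to rows that sum to one."""
    out = []
    for row in logits:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def reconstruct(L_logits, F_logits):
    """Hypothetical reconstruction X_hat = L @ F with sum-to-one rows in L and F."""
    L = softmax_rows(L_logits)  # n_docs x k document-topic weights
    F = softmax_rows(F_logits)  # k x n_terms topic-term distributions
    k, n_terms = len(F), len(F[0])
    return [
        [sum(L[i][j] * F[j][w] for j in range(k)) for w in range(n_terms)]
        for i in range(len(L))
    ]


def poisson_loss(X, X_hat, eps=1e-10):
    """Poisson negative log-likelihood sum(lam - x * log(lam)), minus the log(x!) constant."""
    return sum(
        lam - x * math.log(lam + eps)
        for x_row, lam_row in zip(X, X_hat)
        for x, lam in zip(x_row, lam_row)
    )


# Toy example (hypothetical data): 2 documents, 3 terms, k = 2 topics.
X = [[3, 0, 1], [0, 2, 2]]
L_logits = [[2.0, -1.0], [-1.0, 2.0]]
F_logits = [[1.0, -2.0, 0.0], [-2.0, 1.0, 1.0]]
X_hat = reconstruct(L_logits, F_logits)
print([round(sum(row), 6) for row in X_hat])  # each row of X_hat sums to one
print(round(poisson_loss(X, X_hat), 4))
```

Because both factors are row-normalized in this sketch, each row of X_hat sums to one, so the fitted rows track relative term frequencies rather than document lengths. A training loop would adjust the logits by gradient descent to lower poisson_loss.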

Parameters:

  • X (Tensor | Dataset, required): Input data, either a torch.Tensor holding an in-memory document-term matrix or a Dataset providing a custom dataset implementation (for example, see NumpyDiskDataset).
  • k (int, required): Number of topics.
  • num_epochs (int, default 200): Number of training epochs.
  • batch_size (int, default 16): Number of documents per batch.
  • base_lr (float, default 0.01): Minimum learning rate after annealing.
  • max_lr (float, default 0.05): Starting maximum learning rate.
  • T_0 (int, default 20): Number of epochs until the first restart.
  • T_mult (int, default 1): Factor by which the restart interval is multiplied after each restart.
  • weight_decay (float, default 1e-05): Weight decay for the AdamW optimizer.
  • device (device | None, default None): Device to run training on.
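The learning-rate parameters (base_lr, max_lr, T_0, T_mult) describe a cosine annealing schedule with warm restarts; fit_model_distributed names it that way, and this sketch assumes fit_model uses the same schedule. It assumes semantics like PyTorch's CosineAnnealingWarmRestarts, with max_lr as the rate at the start of each cycle and base_lr as the floor (eta_min). The function lr_at_epoch is hypothetical and not part of tinytopics; it only shows how the four parameters shape the rate over epochs.

```python
import math


def lr_at_epoch(epoch, base_lr=0.01, max_lr=0.05, T_0=20, T_mult=1):
    """Hypothetical helper: learning rate at a (possibly fractional) epoch.

    Assumes PyTorch-style cosine annealing with warm restarts, where
    max_lr is the rate at the start of each cycle and base_lr is the floor.
    """
    if epoch < 0 or T_0 < 1 or T_mult < 1:
        raise ValueError("need epoch >= 0, T_0 >= 1, T_mult >= 1")
    t_cur, t_i = epoch, T_0  # position in the current cycle, current cycle length
    # Skip past completed cycles; each restart multiplies the cycle length by T_mult.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= T_mult
    # Cosine decay from max_lr (t_cur == 0) toward base_lr (t_cur -> t_i).
    return base_lr + (max_lr - base_lr) * (1 + math.cos(math.pi * t_cur / t_i)) / 2


# Print the schedule at a few epochs for the default settings.
for e in (0, 5, 10, 15, 19, 20, 30):
    print(e, round(lr_at_epoch(e), 4))
```

With the default T_mult=1, every cycle lasts T_0 epochs, so the rate jumps back to max_lr at epochs 20, 40, and so on; with T_mult=2, the cycles would last 20, 40, 80, ... epochs.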

Returns:

  • Tuple[NeuralPoissonNMF, Sequence[float]]: A tuple containing the trained NeuralPoissonNMF model and a list of training losses, one per epoch.

tinytopics.fit_distributed

fit_model_distributed(X, k, num_epochs=200, batch_size=16, base_lr=0.01, max_lr=0.05, T_0=20, T_mult=1, weight_decay=1e-05, save_path='model.pt')

Fit a topic model using sum-to-one constrained neural Poisson NMF with distributed training. Supports multi-GPU and multi-node setups via Hugging Face Accelerate.

Parameters:

  • X (Tensor | Dataset, required): Input data, either a torch.Tensor holding an in-memory document-term matrix or a Dataset providing a custom dataset implementation (for example, see NumpyDiskDataset).
  • k (int, required): Number of topics.
  • num_epochs (int, default 200): Number of training epochs.
  • batch_size (int, default 16): Number of documents per batch.
  • base_lr (float, default 0.01): Minimum learning rate after annealing.
  • max_lr (float, default 0.05): Starting maximum learning rate.
  • T_0 (int, default 20): Cosine annealing parameter: number of epochs until the first restart.
  • T_mult (int, default 1): Cosine annealing parameter: factor by which the restart interval is multiplied after each restart.
  • weight_decay (float, default 1e-05): Weight decay for the AdamW optimizer.
  • save_path (str | None, default 'model.pt'): File path to save the trained model. If None, the model is not saved to disk.
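In distributed runs, batch_size is likely a per-process value: when a DataLoader goes through Accelerate with its default split_batches=False, each process keeps the DataLoader's batch size, so the effective global batch is batch_size times the number of processes. Assuming fit_model_distributed follows that default (this page does not confirm it), the arithmetic below estimates the global batch size and the optimizer steps per epoch on each process. The helper distributed_batch_math and the document count are hypothetical.

```python
import math


def distributed_batch_math(n_docs, batch_size=16, num_processes=1):
    """Hypothetical helper: global batch size and per-process steps per epoch.

    Assumes each process draws `batch_size` documents per step (Accelerate's
    default, split_batches=False), the last partial batch is kept
    (drop_last=False), and batches are spread evenly across processes.
    """
    global_batch = batch_size * num_processes
    batches_total = math.ceil(n_docs / batch_size)  # batches in one pass over the data
    steps_per_process = math.ceil(batches_total / num_processes)
    return global_batch, steps_per_process


# Hypothetical corpus of 10,000 documents with the default batch_size.
for procs in (1, 2, 4, 8):
    gb, steps = distributed_batch_math(n_docs=10_000, batch_size=16, num_processes=procs)
    print(f"{procs} process(es): global batch {gb}, {steps} steps/epoch per process")
```

Under these assumptions, adding processes shrinks the number of optimizer updates per epoch, so a distributed run with the same batch_size and num_epochs takes fewer, larger steps than a single-device fit_model run.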

Returns:

  • Tuple[NeuralPoissonNMF, Sequence[float]]: A tuple containing the trained NeuralPoissonNMF model and a list of training losses, one per epoch.