Simplify hyperparameter tuning by adopting modern stochastic gradient methods.
fit_model() now combines the AdamW optimizer (Adam with decoupled weight
decay) with a cosine annealing learning rate scheduler that uses warm
restarts (#2).
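
For readers unfamiliar with the schedule, the following is a minimal sketch
of the cosine annealing with warm restarts learning rate formula in plain
Python. It is an illustration only, not the code used by fit_model(): the
function name cosine_warm_restarts_lr and the parameters base_lr, min_lr,
t_0, and t_mult, along with their default values, are assumptions made for
this example.

```python
import math


def cosine_warm_restarts_lr(step, base_lr=0.01, min_lr=0.0, t_0=10, t_mult=1):
    """Learning rate at `step` under cosine annealing with warm restarts.

    All names and defaults here are hypothetical, chosen for illustration.
    t_0 is the length of the first cycle; each later cycle is t_mult times
    longer than the one before it.
    """
    cycle_len = t_0
    t_cur = step
    # Walk forward through completed cycles until `step` falls inside one.
    while t_cur >= cycle_len:
        t_cur -= cycle_len
        cycle_len *= t_mult
    # Within a cycle, decay from base_lr toward min_lr along a half cosine.
    cosine = 1 + math.cos(math.pi * t_cur / cycle_len)
    return min_lr + 0.5 * (base_lr - min_lr) * cosine


# The rate starts at base_lr, decays within a cycle, then jumps back up.
for step in (0, 5, 9, 10):
    print(step, cosine_warm_restarts_lr(step))
```

Each restart returns the learning rate to base_lr, which is what
distinguishes this schedule from a single cosine decay over the whole run.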
Fix the "Structure plot" y-axis range issue by adding a normalize_rows
argument to plot_structure(), which normalizes each row to sum exactly to 1,
and by explicitly setting the y-axis limits to [0, 1] (#1).
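
The row normalization described above can be sketched as follows. This is a
hedged illustration in plain Python, not the implementation behind
plot_structure(): the helper name normalize_rows_sketch and its handling of
all-zero rows are assumptions made for this example.

```python
def normalize_rows_sketch(matrix):
    """Rescale each row of a list-of-lists so that its entries sum to 1.

    Hypothetical helper for illustration; raises on rows that sum to zero
    because such rows cannot be rescaled.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot normalize a row that sums to zero")
        normalized.append([value / total for value in row])
    return normalized


# Rows whose sums drift away from 1 are rescaled to proportions.
proportions = normalize_rows_sketch([[2.0, 1.0, 1.0], [0.3, 0.3, 0.6]])
for row in proportions:
    print(row, sum(row))
```

With every row rescaled this way, stacked bars share the same total height,
so a fixed y-axis limit of [0, 1] fits them without clipping or padding. In
floating-point arithmetic the row sums may differ from 1 by rounding error.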