parSeqSim(). The new argument supports breaking down the pairwise similarity computation into smaller batches. This is useful when you have a large number of protein sequences and enough CPU cores, but not enough RAM to compute and hold all the pairwise similarities in a single batch. Also, use the other new argument verbose to track the computation progress.
parSeqSimDisk(). Compared to the in-memory version parSeqSim(), this new function caches the partial results of each batch to the hard drive and merges them at the end. This can further reduce memory usage for parallel similarity computations involving a large number of protein sequences.
parGOSim() that created minor numerical inconsistencies in the results due to argument matching.
parSeqSim(), allowing more flexible tuning of the sequence alignment for more types of amino acid sequence data. We thank Dr. Maisa Pinheiro for the feedback.
removeGaps() for removing or replacing gaps (-) or any irregular characters in protein sequences, making them suitable for feature extraction or alignment-based similarity computation. We thank Dr. Maisa Pinheiro for the feedback.
ifelse conditioning (3f6e106) for the distribution descriptor in CTD. We thank Jielu Yan from the University of Macau for kindly reporting this issue.
Fix URLs that cannot be accessed by curl -I -L: