
Maintainer: Ji-Ping Wang, <>

References for methods:

Kendall, B., Jin, C., Li, K., Ruan, F., Wang, X.A., Wang, J.-P., DNAcycP2: improved estimation of intrinsic DNA cyclizability through data augmentation, Nucleic Acids Research, gkaf145, 2025.


0.1 Introduction

DNAcycP2, short for DNA cyclizability Prediction v2, is an R package (a Python version is also available) developed for precise and unbiased prediction of intrinsic DNA cyclizability scores. It builds on a deep learning framework that integrates Inception and residual network architectures with an LSTM layer, which provides robust and accurate predictions.

DNAcycP2 is an updated version of the earlier DNAcycP tool released by Li et al. in 2021. DNAcycP was trained on loop-seq data from Basu et al. (2021); DNAcycP2 improves on it by training on smoothed estimates derived from the same dataset. The predicted score, termed the C-score, agrees closely with experimentally measured cyclizability scores from the loop-seq assay, which makes DNAcycP2 a valuable tool for researchers studying DNA mechanics and structure.

0.1.1 Key differences between DNAcycP2 and DNAcycP

Following the release of DNAcycP, it was found that the intrinsic cyclizability scores derived from Basu et al. (2021) retained residual bias from the biotin effect, resulting in inaccuracies (Kendall et al., 2025). To address this, we used data augmentation followed by moving-average smoothing to produce unbiased estimates of intrinsic DNA cyclizability for each sequence in the original training dataset. DNAcycP2 is a new model trained on these corrected data, using the same architecture as DNAcycP. This version also improves computational efficiency through parallelization options. Further details are available in Kendall et al. (2025).

To demonstrate the differences, we compared predictions from DNAcycP and DNAcycP2 in a yeast genomic region at base-pair resolution (Figure 1). The predicted biotin-dependent scores (\(\tilde C_{26}\), \(\tilde C_{29}\), and \(\tilde C_{31}\), each from a separately trained model) show 10-bp periodic oscillations due to biotin bias, each with a distinct phase. DNAcycP’s predictions improve on the biotin-dependent scores but still show substantial local fluctuations, likely caused by residual bias in its training data (the intrinsic cyclizability score \(\hat C_0\) called by Basu et al. 2021). In contrast, DNAcycP2, trained on corrected intrinsic cyclizability scores, produces much smoother local-scale predictions, indicating a further reduction of the biotin bias.

The DNAcycP2 package retains all prediction functions from the original DNAcycP. The improved prediction model, trained on smoothed data, is selected with the argument smooth=TRUE in the main functions (see Usage below).

0.1.2 Available formats of DNAcycP2 and DNAcycP

DNAcycP2 is available in three formats: a web server at http://DNAcycP.stats.northwestern.edu for real-time prediction and visualization of C-scores for sequences up to 20K bp; a standalone Python package, available for free download from https://github.com/jipingw/DNAcycP2-Python; and a new R package, available for free from Bioconductor (source code at https://github.com/jipingw/DNAcycP2). The DNAcycP2 R package is a wrapper around its Python version, so both generate the same prediction results.

The DNAcycP Python package is still available for free download from https://github.com/jipingw/DNAcycP. Because DNAcycP2 includes all functionality of DNAcycP, users can generate all DNAcycP results with DNAcycP2.

0.2 Installation

DNAcycP2 is available on Bioconductor with R >= 4.5.0. To install it, run the following command:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DNAcycP2")

0.3 Usage

0.3.1 Main Functions

The DNAcycP2 R package provides two primary functions for cyclizability prediction:

  1. cycle: Takes an R object (vector of strings) as input. Each element in the vector is a DNA sequence.
  2. cycle_fasta: Takes the path of a fasta file as input.

0.3.2 Selecting the Prediction Model

Both functions use the smooth argument to specify the prediction model:

  • smooth=TRUE: DNAcycP2 (trained on smoothed data, recommended).
  • smooth=FALSE: DNAcycP (trained on original data).

0.3.3 Parallelization with cycle_fasta

The cycle_fasta function is designed for handling larger files and supports parallelization. To enable parallelization, use the following arguments:

  • n_cores: Number of cores to use (default: 1).
  • chunk_length: Sequence length (in bp) each core processes at a time (default: 100,000).


For reference, on a personal computer (16 GB RAM, M1 chip with 8-core CPU), prediction at full parallelization directly on the yeast genome FASTA file completes in 12 minutes, and on the hg38 human genome chromosome 1 FASTA file in just over 4 hours. In our experience, the choice of parallelization parameters (n_cores and chunk_length) has little effect when making predictions on a personal computer, but on a high-performance computing cluster, prediction time should decrease as the number of cores increases. If you run into memory issues, we suggest first reducing chunk_length.

library(DNAcycP2)

0.3.4 Example 1: fasta file input

ex1_file <- system.file("extdata", "ex1.fasta", package = "DNAcycP2")
ex1_smooth <- DNAcycP2::cycle_fasta(
    ex1_file, smooth=TRUE, n_cores=1, chunk_length=1000
)
#> + /home/biocbuild/.cache/R/basilisk/1.19.1/0/bin/conda create --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.1/DNAcycP2/0.99.4/env1 'python=3.11' --quiet -c conda-forge --override-channels
#> + /home/biocbuild/.cache/R/basilisk/1.19.1/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.1/DNAcycP2/0.99.4/env1 'python=3.11' -c conda-forge --override-channels
#> + /home/biocbuild/.cache/R/basilisk/1.19.1/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.19.1/DNAcycP2/0.99.4/env1 -c conda-forge 'python=3.11' 'python=3.11' 'pandas=2.1.2' 'tensorflow=2.14.0' 'keras=2.14.0' 'docopt=0.6.2' --override-channels
#> Sequence length for ID 1: 12480
#> Predicting cyclizability...
#> Chunk size: 1000, num threads: 1
#> Sequence length for ID 2: 7260
#> Predicting cyclizability...
#> Chunk size: 1000, num threads: 1
ex1_original <- DNAcycP2::cycle_fasta(
    ex1_file, smooth=FALSE, n_cores=1, chunk_length=1000
)
#> Sequence length for ID 1: 12480
#> Predicting cyclizability...
#> Chunk size: 1000, num threads: 1
#> Sequence length for ID 2: 7260
#> Predicting cyclizability...
#> Chunk size: 1000, num threads: 1

cycle_fasta takes the file path as input (ex1_file). smooth=TRUE specifies that DNAcycP2 is used to make predictions, and smooth=FALSE specifies that DNAcycP is used. n_cores=1 runs the prediction on a single core; set it to a value greater than 1 to run in parallel. chunk_length=1000 specifies that each core predicts on sequence chunks of 1000 bp at a time.

The output (ex1_smooth or ex1_original) is a list whose element names are “cycle_” followed by the sequence IDs in the FASTA file. For example, ex1.fasta contains two sequences with IDs “1” and “2”, so both ex1_smooth and ex1_original are lists of length 2 with names cycle_1 and cycle_2 for the first and second sequences, respectively.

Each item in the list (e.g. ex1_smooth$cycle_1) is a data.frame object with three columns. The first column is always position. When smooth=TRUE, the second and third columns are C0S_norm and C0S_unnorm, and when smooth=FALSE the second and third columns are C0_norm and C0_unnorm.
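For downstream analysis it can be convenient to stack the per-sequence data.frames into a single table. The sketch below is a hypothetical helper, not part of DNAcycP2; it relies only on the layout described above (list names of the form cycle_<ID>, one data.frame per sequence) and is demonstrated on a small made-up list rather than real predictions.

```r
# Hypothetical helper (not part of DNAcycP2): stack the per-sequence
# data.frames returned by cycle_fasta() into one long data.frame.
# The sequence ID is recovered by stripping the "cycle_" prefix
# from each list name.
stack_cycle_output <- function(res) {
  ids <- sub("^cycle_", "", names(res))
  # Prepend a seq_id column to each sequence's data.frame
  pieces <- Map(function(df, id) cbind(seq_id = id, df), res, ids)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "cycle_1.1"-style row names from rbind
  out
}

# Toy input with the same layout (made-up numbers, not real predictions)
toy <- list(
  cycle_1 = data.frame(position = 25:27, C0S_norm = c(0.1, 0.2, 0.3)),
  cycle_2 = data.frame(position = 25:26, C0S_norm = c(-0.1, 0.0))
)
long <- stack_cycle_output(toy)
long
```

With the objects from this example, stack_cycle_output(ex1_smooth) would return one table covering both sequences in ex1.fasta.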

0.3.5 Example 2: input as a list/vector of sequences

ex2_file <- system.file("extdata", "ex2.txt", package = "DNAcycP2")
ex2 <- read.csv(ex2_file, header = FALSE)
ex2_smooth <- DNAcycP2::cycle(ex2$V1, smooth=TRUE)
#> Reading sequences...
#> Not all sequences are length 50, predicting every subsequence...
#> Completed 10 out of 100 total sequences
#> Completed 20 out of 100 total sequences
#> Completed 30 out of 100 total sequences
#> Completed 40 out of 100 total sequences
#> Completed 50 out of 100 total sequences
#> Completed 60 out of 100 total sequences
#> Completed 70 out of 100 total sequences
#> Completed 80 out of 100 total sequences
#> Completed 90 out of 100 total sequences
#> Completed 100 out of 100 total sequences
ex2_original <- DNAcycP2::cycle(ex2$V1, smooth=FALSE)
#> Reading sequences...
#> Not all sequences are length 50, predicting every subsequence...
#> Completed 10 out of 100 total sequences
#> Completed 20 out of 100 total sequences
#> Completed 30 out of 100 total sequences
#> Completed 40 out of 100 total sequences
#> Completed 50 out of 100 total sequences
#> Completed 60 out of 100 total sequences
#> Completed 70 out of 100 total sequences
#> Completed 80 out of 100 total sequences
#> Completed 90 out of 100 total sequences
#> Completed 100 out of 100 total sequences

cycle takes the sequences themselves as input, so we first read the file (ex2_file) and then pass the sequences (ex2$V1) to cycle.

The output (ex2_smooth or ex2_original) is a list whose entries correspond, by index, to the sequences in the sequences argument (here ex2$V1). For example, ex2.txt contains 100 sequences, so both ex2_smooth and ex2_original are lists of length 100, where each entry corresponds to the input sequence with the same index.

Each item in the list (e.g. ex2_smooth[[1]]) is a data.frame object with three columns. The first column is always position. When smooth=TRUE, the second and third columns are C0S_norm and C0S_unnorm, and when smooth=FALSE the second and third columns are C0_norm and C0_unnorm.
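Because the column names depend on the smooth argument, code that should work with either model needs to look up the right column. The base R sketch below is a hypothetical helper (not part of DNAcycP2) that returns the normalized C-scores from one prediction data.frame; the toy data.frames only mimic the column layout described above and contain made-up numbers.

```r
# Hypothetical helper (not part of DNAcycP2): return the normalized
# C-scores from one prediction data.frame, whichever model produced it
# (C0S_norm when smooth=TRUE, C0_norm when smooth=FALSE).
norm_scores <- function(df) {
  col <- intersect(c("C0S_norm", "C0_norm"), names(df))
  if (length(col) != 1) {
    stop("expected exactly one of C0S_norm / C0_norm")
  }
  df[[col]]
}

# Toy data.frames mimicking each model's layout (made-up numbers)
smooth_df <- data.frame(position = 25:26, C0S_norm = c(0.9, 1.1),
                        C0S_unnorm = c(-0.2, -0.1))
original_df <- data.frame(position = 25:26, C0_norm = c(0.4, 0.6),
                          C0_unnorm = c(-0.3, -0.2))
norm_scores(smooth_df)    # 0.9 1.1
norm_scores(original_df)  # 0.4 0.6
```

Applied to a whole result list, lapply(ex2_smooth, norm_scores) would give one numeric vector per input sequence.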

0.3.6 DNAcycP2 prediction – Normalized vs unnormalized

Both cycle_fasta and cycle output the prediction results in normalized (C0_norm, C0S_norm) and unnormalized (C0_unnorm, C0S_unnorm) versions.

In DNAcycP2, the predicted cyclizability always includes both normalized and unnormalized values. The unnormalized results come from the model trained on unnormalized \(\hat C_0\) or \(\hat C_0^s\) scores, whereas the normalized results come from the model trained on normalized \(\hat C_0\) or \(\hat C_0^s\) values. Because of how the cyclizability score is defined (see Basu et al. 2021), scores from different loop-seq libraries may differ by a systematic, library-specific constant; the score is therefore a relative measure and not directly comparable between libraries. Normalization forces the training data to have mean 0 and standard deviation 1, so that the intrinsic cyclizability scores of 50 bp sequences from the yeast genome have roughly mean 0 and standard deviation 1. For any sequence under prediction, the normalized C-score is therefore more informative about its cyclizability relative to the population. It also serves as an indicator of statistical significance: assuming the scores are approximately normally distributed, a C-score of 1.96 corresponds to the 97.5th percentile of the distribution.
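The percentile interpretation above can be reproduced with base R. This is a minimal sketch that assumes the normalized C-scores of the yeast 50 bp population are approximately standard normal (mean 0, sd 1); real score distributions may deviate from normality, so the percentiles are approximate.

```r
# Convert normalized C-scores to approximate population percentiles,
# assuming the normalized scores are roughly standard normal.
# pnorm() is the standard normal CDF from base R (stats).
c_score <- c(-1.96, 0, 1, 1.96)
percentile <- 100 * pnorm(c_score)
round(percentile, 1)  # 2.5 50.0 84.1 97.5

# Upper-tail probability: expected share of sequences that are more
# cyclizable than a sequence with C-score 1.96 (about 0.025)
upper_tail <- 1 - pnorm(1.96)
upper_tail
```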

0.3.7 Save DNAcycP2 prediction to external file

Both cycle_fasta and cycle provide an argument, save_path_prefix, to save the prediction results to the local disk. For example:

ex2_smooth <- DNAcycP2::cycle(
    ex2$V1, 
    smooth=TRUE, 
    save_path_prefix="ex2_smooth"
)

This runs the same predictions as before and additionally saves two files, ‘ex2_smooth_C0S_norm.txt’ and ‘ex2_smooth_C0S_unnorm.txt’, to the current working directory. The output files from cycle_fasta have the same format as the function output. Note, however, that for consistency with the Python package, the output files from cycle have a different format from its function output: rather than writing a single file for every sequence, cycle always writes two files (regardless of the number of sequences), one containing normalized predictions for every sequence (ending in ‘C0S_norm.txt’ or ‘C0_norm.txt’) and the other containing unnormalized predictions for every sequence (ending in ‘C0S_unnorm.txt’ or ‘C0_unnorm.txt’). Each line holds the C-scores for one input sequence, in the same order as the sequences input.

For any input sequence, DNAcycP2 predicts a C-score for every 50 bp subsequence. Regardless of the input format, the first C-score in the output corresponds to the subsequence at positions 1-50, the second to positions 2-51, and so forth.
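The windowing rule above can be made concrete in base R. This hypothetical helper (not part of DNAcycP2) lists the 50 bp windows scored for a sequence of length n and the position at which each is reported. The midpoint convention floor((start + end) / 2) is inferred from Example 3 below, where window [1,50] is reported at position 25 and [51,100] at position 75.

```r
# Hypothetical helper (not part of DNAcycP2): enumerate the 50 bp
# windows scored for a sequence of length n. Each window [start, end]
# is reported at its midpoint, floor((start + end) / 2), which matches
# the position column shown in Example 3.
window_table <- function(n, width = 50) {
  if (n < width) stop("sequence is shorter than one window")
  start <- seq_len(n - width + 1)
  end <- start + width - 1
  data.frame(start = start, end = end, position = (start + end) %/% 2)
}

tab <- window_table(1000)  # length of the sequence used in Example 3
nrow(tab)                  # 951 windows, i.e. 1000 - 50 + 1
tab[c(1, 2, 51), ]         # [1,50], [2,51], [51,100] at positions 25, 26, 75
```

For width 50 the reported position is always start + 24, which is why rows 1 to 51 of an output data.frame cover the first 100 bp.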

0.3.8 Example 3 (Single Sequence):

If you want to predict C-scores for a single sequence, you can follow the same protocol as Example 1 or 2, depending on the input format. We have included two example files representing the same 1000 bp stretch of S. cerevisiae sacCer3 chromosome I (1:1000), in .fasta and .txt format.

First, we will consider the .fasta format:

ex3_fasta_file <- system.file(
    "extdata", "ex3_single_seq.fasta", package = "DNAcycP2"
)
ex3_fasta_smooth <- DNAcycP2::cycle_fasta(ex3_fasta_file,smooth=TRUE)
#> Sequence length for ID 1: 1000
#> Predicting cyclizability...
#> Chunk size: 100000, num threads: 1
ex3_fasta_original <- DNAcycP2::cycle_fasta(ex3_fasta_file,smooth=FALSE)
#> Sequence length for ID 1: 1000
#> Predicting cyclizability...
#> Chunk size: 100000, num threads: 1

The output (ex3_fasta_smooth or ex3_fasta_original) is a list with 1 entry named “cycle_1”.

Suppose we are interested only in the smoothed (DNAcycP2), normalized predictions for the first 100 bp, that is, for the 50 bp subsequences [1,50], [2,51], …, [51,100], reported at positions 25, 26, …, 75. We can access the outputs for this region using the following command:

ex3_fasta_smooth[[1]][1:51,c("position", "C0S_norm")]
#>    position    C0S_norm
#> 1        25  0.94794828
#> 2        26  0.88397902
#> 3        27  1.05313075
#> 4        28  1.31837559
#> 5        29  1.43364513
#> 6        30  1.47490740
#> 7        31  1.68857002
#> 8        32  1.79380059
#> 9        33  1.78517485
#> 10       34  1.51873457
#> 11       35  1.83588874
#> 12       36  1.81811345
#> 13       37  1.89403594
#> 14       38  1.94740200
#> 15       39  1.67138946
#> 16       40  1.55992007
#> 17       41  1.66846406
#> 18       42  1.62409890
#> 19       43  1.54238129
#> 20       44  1.46230042
#> 21       45  1.41134250
#> 22       46  1.27757108
#> 23       47  1.02153277
#> 24       48  1.01221037
#> 25       49  0.93556404
#> 26       50  1.00608754
#> 27       51  1.09152973
#> 28       52  1.32879055
#> 29       53  1.25343049
#> 30       54  1.34268761
#> 31       55  1.13988316
#> 32       56  1.00737357
#> 33       57  1.13837850
#> 34       58  1.11469960
#> 35       59  0.96556181
#> 36       60  0.85989267
#> 37       61  0.91100568
#> 38       62  0.87353712
#> 39       63  0.55437237
#> 40       64  0.60366714
#> 41       65  0.41495007
#> 42       66  0.22158629
#> 43       67  0.14434627
#> 44       68  0.08841624
#> 45       69 -0.09656694
#> 46       70 -0.19039956
#> 47       71 -0.34425002
#> 48       72 -0.41227332
#> 49       73 -0.38969830
#> 50       74 -0.33039862
#> 51       75 -0.44366446

Or, equivalently,

ex3_fasta_smooth$cycle_1[1:51,c("position", "C0S_norm")]

Next, we will consider the .txt format:

ex3_txt_file <- system.file(
    "extdata", 
    "ex3_single_seq.txt", 
    package = "DNAcycP2"
)
ex3_txt <- read.csv(ex3_txt_file, header = FALSE)
ex3_txt_smooth <- DNAcycP2::cycle(ex3_txt$V1, smooth=TRUE)
#> Reading sequences...
#> Not all sequences are length 50, predicting every subsequence...
ex3_txt_original <- DNAcycP2::cycle(ex3_txt$V1, smooth=FALSE)
#> Reading sequences...
#> Not all sequences are length 50, predicting every subsequence...

The output (ex3_txt_smooth or ex3_txt_original) is a list with 1 entry (unnamed).

Note that ex3_fasta_smooth and ex3_txt_smooth are essentially equivalent. The only differences are possible slight rounding differences from the computation, and the fact that the list ex3_fasta_smooth has a named entry (‘cycle_1’) while ex3_txt_smooth does not. The same applies to ex3_fasta_original and ex3_txt_original.

Therefore, we can use a similar command to access the outputs for our subsequence of interest:

ex3_txt_smooth[[1]][1:51,c("position", "C0S_norm")]
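To check the near-equivalence noted above programmatically, names must be ignored and small numerical differences tolerated. The sketch below is a hypothetical base R helper (not part of DNAcycP2) built on all.equal(); the default tolerance of 1e-6 is our assumption, and the toy lists only mimic the two output styles with made-up numbers.

```r
# Hypothetical helper (not part of DNAcycP2): compare two prediction
# lists while ignoring list names (cycle_fasta() names its entries,
# cycle() does not) and tolerating tiny numerical differences.
same_predictions <- function(a, b, tolerance = 1e-6) {
  isTRUE(all.equal(unname(a), unname(b), tolerance = tolerance))
}

# Toy lists (made-up numbers) mimicking the two output styles
from_fasta <- list(cycle_1 = data.frame(position = 25:26, C0S_norm = c(0.5, 1.0)))
from_txt <- list(data.frame(position = 25:26, C0S_norm = c(0.5 + 1e-9, 1.0)))
same_predictions(from_fasta, from_txt)  # TRUE: only names and rounding differ
```

On the real objects, same_predictions(ex3_fasta_smooth, ex3_txt_smooth) would perform the same check; we have not verified which tolerance the real outputs need, and calling all.equal() directly reports the size of any difference.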

If there is a sequence (or group of sequences) we want to make predictions on, we can also input them directly as strings. For example:

input_seq1 <-
    "CATGACTGCAGCTAAAACGTTGACCTAGTCGTCAGTCTACGTACTAGCGTAGCTATATCGAGTCTAGCGTCTAG"
input_seq2 <- "ATCTTTTGTATATCAAAAGACTAGATCGATTAGCGTACGCCCCTGACTAGATAGATCG"
# Predict on a single sequence, or on several sequences at once
seq1_smooth <- DNAcycP2::cycle(input_seq1, smooth=TRUE)
both_seqs_smooth <- DNAcycP2::cycle(c(input_seq1, input_seq2), smooth=TRUE)
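Because a C-score is predicted for every 50 bp window, a sequence of length L yields L - 49 scores, and a sequence shorter than 50 bp yields none. The base R sketch below is a hypothetical pre-check (not part of DNAcycP2) to run before calling cycle; treating only A/C/G/T as valid characters is our assumption, since this document does not say how other characters are handled.

```r
# Hypothetical pre-check (not part of DNAcycP2) for sequences passed to cycle().
# Assumption: only A/C/G/T (either case) counts as valid input here.
check_seqs <- function(seqs, width = 50) {
  len <- nchar(seqs)
  data.frame(
    length = len,
    acgt_only = grepl("^[ACGTacgt]+$", seqs),
    n_windows = pmax(len - width + 1, 0)  # one C-score per 50 bp window
  )
}

input_seq1 <-
  "CATGACTGCAGCTAAAACGTTGACCTAGTCGTCAGTCTACGTACTAGCGTAGCTATATCGAGTCTAGCGTCTAG"
input_seq2 <- "ATCTTTTGTATATCAAAAGACTAGATCGATTAGCGTACGCCCCTGACTAGATAGATCG"
checks <- check_seqs(c(input_seq1, input_seq2))
checks  # lengths 74 and 58, i.e. 25 and 9 windows
```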

0.3.9 Example 4: DNAStringSet object input

library(Biostrings)
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: XVector
#> Loading required package: GenomeInfoDb
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
ex4_string_set <- readDNAStringSet(system.file("extdata", "ex1.fasta", package="DNAcycP2"))
ex4_smooth_output <- DNAcycP2::cycle(ex4_string_set, smooth=TRUE)
#> Reading sequences...
#> Not all sequences are length 50, predicting every subsequence...

Here ex4_string_set is a DNAStringSet object created with the readDNAStringSet function from the Biostrings package; cycle accepts it directly as input.

0.4 References

1 Session info

sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] Biostrings_2.75.4   GenomeInfoDb_1.43.4 XVector_0.47.2     
#> [4] IRanges_2.41.3      S4Vectors_0.45.4    BiocGenerics_0.53.6
#> [7] generics_0.1.3      DNAcycP2_0.99.4     BiocStyle_2.35.0   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Matrix_1.7-3            jsonlite_1.9.1          crayon_1.5.3           
#>  [4] compiler_4.6.0          BiocManager_1.30.25     filelock_1.0.3         
#>  [7] Rcpp_1.0.14             parallel_4.6.0          jquerylib_0.1.4        
#> [10] png_0.1-8               yaml_2.3.10             fastmap_1.2.0          
#> [13] reticulate_1.41.0.1     lattice_0.22-6          R6_2.6.1               
#> [16] knitr_1.50              bookdown_0.42           GenomeInfoDbData_1.2.14
#> [19] bslib_0.9.0             rlang_1.1.5             cachem_1.1.0           
#> [22] dir.expiry_1.15.0       xfun_0.51               sass_0.4.9             
#> [25] cli_3.6.4               withr_3.0.2             digest_0.6.37          
#> [28] grid_4.6.0              basilisk_1.19.1         lifecycle_1.0.4        
#> [31] evaluate_1.0.3          rmarkdown_2.29          httr_1.4.7             
#> [34] basilisk.utils_1.19.1   tools_4.6.0             htmltools_0.5.8.1      
#> [37] UCSC.utils_1.3.1