Background: Ion Torrent is one of the major next-generation sequencing (NGS) technologies and is frequently used in medical research and diagnosis. The built-in software for the Ion Torrent sequencing machines delivers the sequencing results in the BAM format. In addition to the usual SAM/BAM fields, the Ion Torrent BAM file includes technology-specific flow signal data. The flow signals occupy a large portion of the BAM file (about 75% for the human genome). Compressing SAM/BAM into the CRAM format significantly reduces the space needed to store the NGS results. However, the tools for generating CRAM files are not designed to handle the flow signals. This missing feature has motivated us to develop a new program to improve the compression of Ion Torrent files for long-term archiving.

Funding: This work was supported by the European Union, FP7 small medium focused project 277849 EurHEALTHAgeing. Platomics GmbH provided support in the form of salaries for authors [DK, AN, AK], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the "author contributions" section.

Due to their low entry cost and medium throughput, bench-top next-generation DNA sequencers (454 GS Junior from Roche, Illumina MiSeq, or Ion Torrent Personal Genome Machine (PGM)) are particularly well suited to targeted sequencing [13]. The PGM [14] applies a sequencing-by-synthesis approach, uses native dNTP chemistry, and relies on a modified silicon chip to detect hydrogen ions released during base incorporation by DNA polymerase. The sequencer generates single-end (SE) reads of varying quality and length. A known caveat of the PGM is its susceptibility to over-calling or under-calling the number of bases in a homopolymer, a property that needs to be specifically addressed by dedicated downstream analysis methods and tools [15].
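To make the homopolymer caveat concrete, the following minimal Python sketch run-length encodes a read and checks whether two sequences differ only in the length of their homopolymer runs, which is the pattern an over-call or under-call produces. The function names are hypothetical and chosen for illustration; they are not part of any Ion Torrent or TABSAT software.

```python
from itertools import groupby


def run_length_encode(seq):
    """Collapse a sequence into (base, run_length) pairs, e.g. ACCCGT -> A1 C3 G1 T1."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


def differs_only_in_homopolymer_length(a, b):
    """Return True if a and b are different sequences that share the same
    order of bases once every homopolymer run is collapsed to one base."""
    bases_a = [base for base, _ in run_length_encode(a)]
    bases_b = [base for base, _ in run_length_encode(b)]
    return a != b and bases_a == bases_b


# Illustrative example: the read reports two Cs where the reference has three.
reference = "ACCCGT"
read_with_undercall = "ACCGT"
is_homopolymer_error = differs_only_in_homopolymer_length(reference, read_with_undercall)
```

A substitution such as ACTCGT changes the order of collapsed bases and is therefore not flagged by this check.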

Bisulfite sequencing data analysis involves several steps, including quality assessment, alignment, and methylation calling [16]. Several tools are available that use different approaches to analyze the data [17,18]. An important part of the bisulfite sequencing workflow is the translation of raw sequence information into methylation calls for each investigated base. The widely used tool Bismark [19] contains multiple routines to carry out alignment of bisulfite-treated reads to a reference genome as well as cytosine methylation calling.
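The principle behind methylation calling can be illustrated with a minimal Python sketch. It assumes ungapped, forward-strand alignments to a short amplicon and counts, for each CpG cytosine in the reference, how many reads show a C (protected from conversion, hence methylated) or a T (converted, hence unmethylated). This is a simplified illustration under those assumptions and not Bismark's implementation; the function names and input layout are hypothetical.

```python
def call_cpg_methylation(reference, aligned_reads):
    """Count methylated and unmethylated calls at each CpG cytosine.

    reference: forward-strand amplicon sequence.
    aligned_reads: iterable of (start, sequence) pairs, where start is the
        0-based reference position of the first read base (assumption:
        ungapped forward-strand alignments only).
    Returns {cpg_position: (methylated_count, unmethylated_count)}.
    """
    reference = reference.upper()
    cpg_positions = [
        i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"
    ]
    counts = {pos: [0, 0] for pos in cpg_positions}
    for start, seq in aligned_reads:
        seq = seq.upper()
        for pos in cpg_positions:
            offset = pos - start
            if 0 <= offset < len(seq):
                base = seq[offset]
                if base == "C":      # unconverted cytosine: methylated
                    counts[pos][0] += 1
                elif base == "T":    # converted cytosine: unmethylated
                    counts[pos][1] += 1
                # any other base is treated as a sequencing error and ignored
    return {pos: tuple(c) for pos, c in counts.items()}


def methylation_levels(counts):
    """Convert per-CpG counts into methylation fractions, skipping uncovered sites."""
    return {pos: m / (m + u) for pos, (m, u) in counts.items() if m + u > 0}


# Toy data: CpG sites at reference positions 1 and 5.
toy_reference = "ACGTTCGA"
toy_reads = [(0, "ACGTTTGA"), (0, "ATGTTTGA"), (4, "TCGA")]
toy_counts = call_cpg_methylation(toy_reference, toy_reads)
toy_levels = methylation_levels(toy_counts)
```

Real bisulfite data additionally require handling of the reverse strand (G-to-A conversions), insertions and deletions, and base qualities, all of which are omitted here.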

According to there are over 300 Ion Torrent PGM machines in use, which could potentially be applied to targeted bisulfite sequencing. However, no protocol or analysis solution for bisulfite sequencing on the PGM is currently officially provided. We have therefore created a novel tool called TABSAT for the analysis and visualization of targeted bisulfite sequencing data generated by Ion Torrent instruments. The tool accepts raw sequencing files as input and outputs result tables containing information about the methylation status of the covered CpG sites. Read mapping and methylation calling are handled by Bismark, which has been modified to use the TMAP [20] mapper instead of the default mapper Bowtie2. Results are aggregated in tabular format and automatically visualized as lollipop figures. TABSAT has been designed to run with a minimal set of input parameters but can be customized to support specific questions. In addition, it can be used with data from the Illumina MiSeq platform. The software is freely available at
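As an illustration of the kind of output described here, the following Python sketch aggregates per-CpG methylation levels into a tab-separated table and renders a simple text lollipop row per sample. The table layout, the binary 0.5 threshold, and the function names are assumptions made for this sketch; they do not reproduce TABSAT's actual output format.

```python
def format_methylation_table(samples, cpg_positions):
    """Return a tab-separated table with one row per sample and one column per CpG.

    samples: {sample_name: {cpg_position: methylation level in [0, 1]}}.
    CpG sites without coverage in a sample are reported as "NA".
    """
    rows = ["\t".join(["sample"] + [str(p) for p in cpg_positions])]
    for name in sorted(samples):
        levels = samples[name]
        cells = [f"{levels[p]:.2f}" if p in levels else "NA" for p in cpg_positions]
        rows.append("\t".join([name] + cells))
    return "\n".join(rows)


def lollipop_row(levels, cpg_positions, threshold=0.5):
    """Render one sample as text: filled circle = methylated, open circle =
    unmethylated, dash = no coverage (threshold is an illustrative assumption)."""
    symbols = []
    for p in cpg_positions:
        if p not in levels:
            symbols.append("-")
        elif levels[p] >= threshold:
            symbols.append("●")
        else:
            symbols.append("○")
    return " ".join(symbols)


# Hypothetical example data for two samples and two CpG sites.
example_samples = {"tumor": {10: 0.9, 25: 0.2}, "normal": {10: 0.1}}
example_positions = [10, 25]
example_table = format_methylation_table(example_samples, example_positions)
tumor_row = lollipop_row(example_samples["tumor"], example_positions)
normal_row = lollipop_row(example_samples["normal"], example_positions)
```

A graphical lollipop plot would typically encode the methylation level as a color gradient rather than a binary symbol; the threshold is used here only to keep the sketch text-based.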

We have developed a novel tool for the analysis of targeted bisulfite sequencing that is designed to handle, in addition to Illumina data, sequences generated on Ion Torrent systems. To date, several tools exist for the analysis of bisulfite data (reviewed in [17]), of amplicons from bisulfite flowgram sequencing (Amplikyzer [28]), and for locus-specific analysis of 5-methylcytosine (BiQ Analyzer HiMod [29]). However, none of these tools is specifically tailored to Ion Torrent sequencing data while also providing a one-stop solution from raw sequencing data to final results. Furthermore, the Ion Torrent PGM software platform currently does not support the analysis of bisulfite sequencing data. TABSAT comprises an analysis pipeline containing quality control, alignment, methylation calling, and output generation. To select the best mapping software for our purpose, we evaluated several bisulfite analysis programs: Bismark [19], BS-Seeker2 [30], and BSMAP [31]. All programs produced similar or poorer mapping results, and they offer different downstream analysis capabilities. A preliminary analysis of one input file with default parameters resulted in around 45%, 42%, and 20% of mapped reads for Bismark, BS-Seeker2, and BSMAP, respectively (see _tools). Based on the availability of the source code, the possibility of integrating a different mapping program, and the positive reviews, we decided to include Bismark in our workflow to handle the mapping of sequencing reads and methylation calling.

Due to the initially suboptimal alignment results of the default Bismark version, we decided to incorporate TMAP [20], a dedicated mapper for Ion Torrent reads, into the Bismark software. Reads from Ion Torrent sequencing devices are usually longer than their Illumina counterparts and show a distinctly different error profile, especially in homopolymer regions. The greater read length increases the number of sequencing errors per read, which requires changing the mapping settings, as the standard parameters for controlling mapping error may be too strict. As the aligners supported by Bismark (Bowtie or Bowtie2) are configured for Illumina-sized reads, we decided to include the Ion Torrent TMAP program, which has been designed to overcome these limitations.
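The effect of read length on error tolerance can be sketched with a simple calculation. The following Python example assumes a uniform per-base error rate with independent errors (a binomial model), which is a simplification: Ion Torrent errors are concentrated in homopolymer regions and are not independent. The 1% error rate and the limit of two errors per read are hypothetical values chosen for illustration, not TMAP or Bowtie2 defaults.

```python
from math import comb


def expected_errors(read_length, error_rate):
    """Expected number of erroneous bases in a read under a uniform error rate."""
    return read_length * error_rate


def prob_at_most_k_errors(read_length, error_rate, k):
    """Probability that a read contains at most k errors (binomial model)."""
    return sum(
        comb(read_length, i)
        * error_rate ** i
        * (1 - error_rate) ** (read_length - i)
        for i in range(k + 1)
    )


# Hypothetical comparison of a short and a long read at the same error rate.
ERROR_RATE = 0.01   # assumed per-base error rate
MAX_ERRORS = 2      # assumed fixed error limit of a strict mapper setting
short_read_pass = prob_at_most_k_errors(100, ERROR_RATE, MAX_ERRORS)
long_read_pass = prob_at_most_k_errors(300, ERROR_RATE, MAX_ERRORS)
```

Under these assumptions, a fixed limit on the absolute number of errors rejects a much larger share of the longer reads, which is why mapping parameters tuned for short reads may need to be relaxed or scaled with read length.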

The whole analysis solution has been designed to work with minimal user input and outputs results in clearly arranged tables and lollipop diagrams. The large number of Ion Torrent PGM sequencers available worldwide indicates a large community that would benefit from this tailored analysis of bisulfite sequencing data. In addition, more than 100 Ion Proton sequencers have been registered on, which generate reads with an error profile similar to that of the PGM. Consequently, they would also benefit from a dedicated analysis workflow for generating high-quality results. Therefore, the presented work will help to unlock the power of Ion Torrent (and potentially Ion Proton) for bisulfite sequencing and DNA methylation analysis.

TABSAT is designed for amplicon studies and does not support the analysis of whole-genome bisulfite sequencing projects. Furthermore, although it is not limited to them, it works best with amplicons smaller than 500 bp, as these can be efficiently visualized using lollipop diagrams. The aim of TABSAT is to cover the primary analysis of raw targeted bisulfite sequencing data to obtain methylation information for each analyzed site. Given the comprehensive output provided, researchers can use additional downstream tools, such as the R project for statistical computing, to compare methylation levels between different groups of samples.

TABSAT can be used as standalone software, conveniently available as a Docker container. Docker containers wrap up software in a complete filesystem that contains everything it needs to run, making them an ideal solution for executing bioinformatics software in a self-contained and precisely controlled environment [34]. In addition, TABSAT is available as an embedded application within the Platomics life-science data analysis system. This graphical, web-based user interface has been specifically designed to be usable without informatics knowledge. The results are presented in an intuitive way that enables exploration of the data. As the complete result set can be downloaded as a compressed file, further in-depth downstream analysis can be easily performed.

