SGA (String Graph Assembler) is a de novo genome assembler based on the concept of string graphs. Its main design goal is memory efficiency, which it achieves by using a compressed representation of DNA sequence reads. SGA can also calculate per-base error rates, paired-end fragment-size distributions, and coverage metrics in the absence of a reference genome. Additionally, it estimates characteristics of the sequenced genome, such as repeat content and heterozygosity, that are key determinants of assembly difficulty.
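
To give a feel for why a compressed representation of reads can save memory, here is a minimal sketch of the Burrows-Wheeler transform (BWT), the kind of transform that compressed text indexes are commonly built on. It is an illustration only, not SGA's code: the function names `bwt`, `inverse_bwt`, and `run_length_encode` are hypothetical, the `$` sentinel is an assumption, and the rotation-sorting approach is a naive one that would not scale to real read sets.

```python
import itertools


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform built by sorting all rotations.

    Illustrative only: this needs O(n^2) memory, so it suits short strings.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending and re-sorting columns."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse runs of identical characters into (character, count) pairs."""
    return [(ch, len(list(group))) for ch, group in itertools.groupby(s)]


if __name__ == "__main__":
    read = "GATTACAGATTACA"  # made-up example sequence
    transformed = bwt(read)
    print(transformed)
    print(run_length_encode(transformed))
    print(inverse_bwt(transformed) == read)
```

The transform tends to group identical characters into runs when the input is repetitive, which is what makes a follow-up step such as run-length encoding effective, and it is fully reversible, so no sequence information is lost.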