00_Contig screen

Fastp: used to filter adapter sequences, primers, and other low-quality sequence from the raw sequencing reads.

SPART/00_Contig_screen/fastp.sh $HiFi_reads $ONT_reads

Hifiasm

SPART/00_Contig_screen/hifiasm.sh $HiFi_reads $ONT_reads $output_prefix

Verkko

SPART/00_Contig_screen/verkko.sh $output_prefix $HiFi_reads $ONT_reads $threads $memory

Flye

SPART/00_Contig_screen/flye.sh $ONT_reads $output_prefix $threads

Remove MT & CP

SPART/00_Contig_screen/rm_mt_cp.sh $mitochondrion $chloroplast $ref
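The `$HiFi_reads`-style arguments in the commands above are ordinary shell variables that must be set before the scripts are called. A minimal sketch of a driver for part of this stage follows. The file names, the `run` wrapper, the `contig_screen` function and the `DRY_RUN` switch are all assumptions made for illustration and are not part of SPART. Verkko and the organelle-removal step are left out because their extra arguments (`$memory`, `$mitochondrion`, `$chloroplast`, `$ref`) depend on your data.

```shell
#!/usr/bin/env bash
# Hypothetical input paths and settings; replace with your own data.
HiFi_reads="hifi.fastq.gz"
ONT_reads="ont.fastq.gz"
output_prefix="asm"
threads=32

# Print each command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Call three of the contig-screen scripts with the argument order shown above.
contig_screen() {
    run SPART/00_Contig_screen/fastp.sh "$HiFi_reads" "$ONT_reads"
    run SPART/00_Contig_screen/hifiasm.sh "$HiFi_reads" "$ONT_reads" "$output_prefix"
    run SPART/00_Contig_screen/flye.sh "$ONT_reads" "$output_prefix" "$threads"
}
```

Calling `contig_screen` prints the three commands so the arguments can be checked first; setting `DRY_RUN=0` before the call executes them instead.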

01_Contig scaffolding

Bionano

SPART/01_Contig_scaffolding/Bionano_DLS_map.sh threads bnx ref_cmap prefix xml Bio_dir cluster_xml ref bio_camp merge_xml RefAligner

Hi-C

SPART/01_Contig_scaffolding/HiC-Pro.sh ref ref_prefix hicpro_data hicpro_config hicpro_outdir

SPART/01_Contig_scaffolding/yahs.sh enzyme ref bed/bam/bin prefix

02_Gap patching

SPART/02_Gap_patching/wfmash_ragtag.sh prefix ref region

Manual operation

cd ragtag_output

perl SPART/02_Gap_patching/paf_filter.pl -i ragtag.patch.debug.filtered.paf -minlen 10000000 -iden 0.5
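To illustrate the kind of filtering that the `-minlen` and `-iden` thresholds suggest, here is a rough awk equivalent. It is a sketch, not the logic of `paf_filter.pl`: it assumes the length threshold applies to the alignment block length (PAF column 11) and that identity is the number of matching bases divided by that block length (column 10 / column 11). The function name `filter_paf` is hypothetical.

```shell
# Keep PAF records whose alignment block length (column 11) is at least
# the given minimum and whose identity, taken here as matches / block length
# (column 10 / column 11), is at least the given fraction.
# Arguments: <paf file> <minimum length> <minimum identity>
filter_paf() {
    awk -F'\t' -v minlen="$2" -v iden="$3" \
        '$11 > 0 && $11 >= minlen && ($10 / $11) >= iden' "$1"
}

# Example call with the thresholds used above:
# filter_paf ragtag.patch.debug.filtered.paf 10000000 0.5
```

Inspecting the surviving records this way can make the manual editing step that follows easier, since only long, reasonably similar alignments remain.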

Manually edit the ragtag.patch.debug.filtered.paf file: keep the high-quality contigs and, for each gap, preserve the location of the single high-confidence match that aligns to the sequence at both ends of the gap.

perl SPART/02_Gap_patching/renameagp.pl -i ragtag.patch.ctg.agp -i1 ragtag.patch.debug.filtered.paf -start seq00000000 -end seq00000001 -o test.agp

test.agp is then merged into ragtag.patch.agp and the FASTA file is generated.

Telomere patching

We ran _submit_telomere.sh on ONT reads >100 kb. ONT reads carrying telomere sequence and mapping to this locus, based on minimap2 alignments, were manually identified. The longest read was selected as the template; all other reads were aligned to it and polished with Medaka:

medaka -v -i ONT_tel_reads.fasta -d longest_ont_tel.fasta -o ont_tel_medaka.fasta
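The two selection steps described above (keeping reads longer than 100 kb and choosing the longest read as the template) can be sketched with awk. This assumes the reads are in FASTA format; the function names `fasta_min_len` and `fasta_longest` are hypothetical helpers, not SPART scripts, and the input file name in the commented example is a placeholder.

```shell
# Print FASTA records with more than the given number of bases.
# Wrapped sequence lines are joined before the length is measured.
# Arguments: <fasta file> <length threshold>
fasta_min_len() {
    awk -v min="$2" '
        function emit() { if (name != "" && length(seq) > min) print name "\n" seq }
        /^>/ { emit(); name = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$1"
}

# Print the single longest FASTA record (the first one wins on ties).
# Arguments: <fasta file>
fasta_longest() {
    awk '
        function keep() {
            if (length(seq) > best_len) { best_len = length(seq); best = name "\n" seq }
        }
        /^>/ { keep(); name = $0; seq = ""; next }
        { seq = seq $0 }
        END { keep(); if (best != "") print best }
    ' "$1"
}

# Example calls (all_ont_reads.fasta is a placeholder name):
# fasta_min_len all_ont_reads.fasta 100000 > ont_gt100kb.fasta
# fasta_longest ONT_tel_reads.fasta > longest_ont_tel.fasta
```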

Telomere signal in all HiFi reads was identified with the command:

_submit_telomere.sh hifi_reads.fasta
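As a quick independent check of which reads carry telomere signal, a simple tandem-repeat search can be run with awk. This is only an illustrative sketch and does not reproduce what `_submit_telomere.sh` does. The repeat unit is an assumption that depends on the species (for example `TTTAGGG` with reverse complement `CCCTAAA` in many plants), so both strands are passed explicitly. The function name `telomere_reads` is hypothetical.

```shell
# List the headers of FASTA reads that contain at least the given number of
# tandem copies of a telomere repeat unit or of its reverse complement.
# Arguments: <fasta file> <repeat unit> <reverse complement> <copies>
telomere_reads() {
    awk -v fwd="$2" -v rev="$3" -v n="$4" '
        # Build a string of k consecutive copies of unit.
        function rep(unit, k,   s, i) { s = ""; for (i = 0; i < k; i++) s = s unit; return s }
        function check() {
            if (name != "" && (index(seq, f) || index(seq, r))) print substr(name, 2)
        }
        BEGIN { f = rep(fwd, n); r = rep(rev, n) }
        /^>/ { check(); name = $0; seq = ""; next }
        { seq = seq toupper($0) }
        END { check() }
    ' "$1"
}

# Example call, assuming a plant-type repeat and at least 10 tandem copies:
# telomere_reads hifi_reads.fasta TTTAGGG CCCTAAA 10 > hifi_tel_names.txt
```

The search ignores where in the read the repeats occur; restricting it to read ends would be a natural refinement.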

Additional HiFi reads were recruited through manual analysis: we looked for trimmed tips that could extend the assembly. All recruited reads had telomere signal; they were aligned to the Medaka consensus and polished with Racon with the commands:

minimap2 -t16 -ax map-pb ont_tel_medaka.fasta hifi_tel.fasta > medaka.sam

racon hifi_tel.fasta medaka.sam ont_tel_medaka.fasta > racon.fasta

Finally, the polished result was patched into the assembly with ragtag patch or manually patched.

Citation

https://github.com/marbl/CHM13-issues/blob/main/error_detection.md

Centromeric region analysis

SPART/02_Gap_patching/Centromeric_region_analysis.sh workdir FASTA INDEX prefix CHIP1 CHIP2 threads

03_Polishing

SPART/03_Polishing/calsv_snv.sh workdir ref threads

04_Evaluation

BUSCO

SPART/04_Evaluation/BUSCO.sh ref prefix

Mapping rates & coverages

SPART/04_Evaluation/mapping_rates_coverages.sh hybrid_bam single_bam ont_bam

LTR

SPART/04_Evaluation/ltr.sh ref prefix

QV

SPART/04_Evaluation/qv.sh query ref
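For reading the result, a consensus quality value (QV) is conventionally a Phred-scaled error rate, QV = -10 * log10(errors / total bases). The sketch below only shows that conversion; how `qv.sh` derives the error count is not described here, and the function name `phred_qv` is hypothetical.

```shell
# Convert an error count and a total base count into a Phred-scaled QV.
# Arguments: <number of erroneous bases> <total bases (must be > 0)>
phred_qv() {
    awk -v e="$1" -v n="$2" 'BEGIN {
        if (e <= 0) { print "inf"; exit }          # no errors observed
        printf "%.2f\n", -10 * log(e / n) / log(10) # awk log() is natural log
    }'
}

# Example: one error per million bases.
# phred_qv 1 1000000    # prints 60.00
```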

BACs

SPART/04_Evaluation/bac.sh bac_reads ref_chr