The diploid sequence of the NA12878 genome, as of Jan 7 2017, has been constructed from hg19 using the following callsets:
- Illumina Platinum Genomes 2016-1.0 SNVs and small indels (ref 7)
  ftp://platgene_ro@ussd-ftp.illumina.com/2016-1.0/hg19/small_variants/NA12878/NA12878.vcf.gz
- Set of NA12878 SVs from Sudmant et al. (ref 5) used in the previous version (Feb 5 2015) of the NA12878 diploid genome
  sv.gersteinlab.org/NA12878_diploid/NA12878_diploid_2015_feb5_3versions/NA12878.wgs.mergedSV.v5.20130502.svs.genotypes.redun.auto.SVdefined.sorted.pass.vcf.gz

There are THREE more versions of the NA12878 diploid genome, as of Feb 5 2015, constructed from 1000GP Phase 3 SNV, indel and SV calls at coverage of ~7.4x:
1) SNVs-only
2) SNVs-indels
3) SNVs-indels-SVs
These are built from the b37_g1k_phase2.tar.gz reference genome (autosomes only).
SNVs and indels: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
SVs: NA12878.wgs.mergedSV.v5.20130502.svs.genotypes.redun.auto.SVdefined.sorted.pass.vcf.gz on the Illumina platform at read length >=70bp.
These genomes were used in Sudmant et al. (2015) (ref 5).

The diploid sequence of the NA12878 genome, as of Sep 4 2013, has been constructed from hg19 using variants called following the GATK Best Practices v4 protocol (HaplotypeCaller, PCR-free):
* 3756893 SNVs and 947974 INDELs are from here:
  ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20131015_p3_high_cov_calls/HaplotypeCaller
  or
  ftp://ftp.broadinstitute.org/bundle/2.8/hg19/

The diploid sequence of the NA12878 genome, as of Dec 16, 2012, has been constructed from hg19 using the following variants:
* 3863600 SNPs and 871759 INDELs in the HiSeq 64x sequencing call set that passed the filters. The call set is from the Broad Institute, following the Best Practices v3 protocol (UnifiedGenotyper).
--The call set can be found here: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/2.2/hg19/CEUTrio.HiSeq.WGS.b37.bestPractices.phased.hg19.vcf.gz
--The personal genome can also be found at AlleleDB with 381 other personal genomes: http://archive.gersteinlab.org/alleledb/personal_genomes/
--Or at the 1000GP FTP site: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/working/20160216_pgenome_fastas/
This genome has been used in Chen et al. (2016) (ref 6).

The diploid sequence of the NA12878 genome, as of May 3, 2011, has been constructed from hg18 using the following variants:
1) 33 fosmid-sequenced deletions (ref 1).
2) 1,522 released deletions from the pilot phase of the 1000 Genomes Project (ref 2&3).
3) 328,528 released indels from the pilot phase of the 1000 Genomes Project (ref 2&3).
4) 2,755,607 released SNPs from the pilot phase of the 1000 Genomes Project (ref 2&3).
This sequence was constructed from THE SAME set of variants as the version of July 30, 2010. Minor differences between the two versions are due to a bug fix in the vcf2diploid software.

The diploid sequence of the NA12878 genome, as of April 4, 2011, has been constructed from hg18 using the following variants:
1) 33 fosmid-sequenced deletions (ref 1).
2) 1,522 released deletions from the pilot phase of the 1000 Genomes Project (ref 2&3).
3) 328,528 released indels from the pilot phase of the 1000 Genomes Project (ref 2&3).
4) 2,755,607 released SNPs from the pilot phase of the 1000 Genomes Project (ref 2&3).
5) 890,476 novel (compared to item 4 above) SNPs called using HiSeq 60x sequencing (courtesy of Mark DePristo).

The diploid sequence of the NA12878 genome, as of July 30, 2010, has been constructed from hg18 using the following variants:
1) 33 fosmid-sequenced deletions (ref 1).
2) 1,522 released deletions from the pilot phase of the 1000 Genomes Project (ref 2&3).
3) 328,528 released indels from the pilot phase of the 1000 Genomes Project (ref 2&3).
4) 2,755,607 released SNPs from the pilot phase of the 1000 Genomes Project (ref 2&3).
This genome has been used in Rozowsky, Abyzov et al. (ref 4).

References:
1. Kidd et al. (2008) Mapping and sequencing of structural variation from eight human genomes.
2. The 1000 Genomes Consortium. (2010) A map of human genome variation from population scale sequencing.
3. Mills et al. (2010) Mapping structural variation at fine-scale by population genome sequencing.
4. Rozowsky, Abyzov et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework.
5. Sudmant et al. (2015) An integrated map of structural variation in 2504 human genomes.
6. Chen et al. (2016) A uniform survey of allele-specific binding and expression over 1000-Genomes-Project individuals.
7. Eberle et al. (2016) Genome Res. doi:10.1101/gr.210500.116
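
Note on construction: each entry above describes a diploid sequence built by applying a phased callset to a reference genome. The sketch below shows that idea for SNVs only. It is not the vcf2diploid implementation. The function name `apply_phased_snvs` and the input layout (a dict from 1-based position to reference allele, alternate allele and a phased genotype string such as "0|1") are assumptions made for this example. Indels and SVs change sequence length and need coordinate bookkeeping that this sketch leaves out.

```python
from typing import Dict, Tuple


def apply_phased_snvs(
    reference: str,
    snvs: Dict[int, Tuple[str, str, str]],
) -> Tuple[str, str]:
    """Return (haplotype 1, haplotype 2) after applying phased SNVs.

    `snvs` maps a 1-based position to (ref allele, alt allele, genotype),
    where genotype is a phased, biallelic VCF-style string such as "0|1".
    This layout is hypothetical and only serves the illustration.
    """
    hap1 = list(reference)
    hap2 = list(reference)
    for pos, (ref, alt, genotype) in snvs.items():
        index = pos - 1  # VCF positions are 1-based
        if reference[index] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        if "|" not in genotype:
            raise ValueError(f"genotype at position {pos} is not phased")
        allele1, allele2 = genotype.split("|")
        # "1" means the haplotype carries the alternate allele.
        if allele1 == "1":
            hap1[index] = alt
        if allele2 == "1":
            hap2[index] = alt
    return "".join(hap1), "".join(hap2)


if __name__ == "__main__":
    toy_reference = "ACGTACGT"
    toy_snvs = {
        2: ("C", "T", "0|1"),  # heterozygous, alt on haplotype 2
        5: ("A", "G", "1|1"),  # homozygous alt
    }
    print(apply_phased_snvs(toy_reference, toy_snvs))
```

With SNVs only, both haplotypes keep the reference coordinates. Once indels or SVs are applied, the two haplotypes diverge in length, so a tool then needs a coordinate map between the reference and each haplotype.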