Available utilities for processing unzipped autosomal files from Family Tree DNA, 23andMe and/or AncestryDNA:
These utilities were developed by David Pike.
My original motivation for developing these utilities was so that I could privately perform some advanced analysis of autosomal DNA results, with my
objective being to better pursue genealogical research within my own family. Instead of limiting these utilities to my own personal use, I have made
them available in the hope that they might assist other members of the genetic genealogy community with their own individual research goals.
Given the do-it-yourself nature of the intended user, the utilities are presented with minimal documentation and without verbose explanation of analysis
results. Also note that the analysis performed by these utilities is based on my own methodology and implementation,
whereas Family Tree DNA, 23andMe and AncestryDNA
have their own proprietary analysis methodologies. Differences in methodology may give rise to minor interpretive differences between their analysis and
that performed by my utilities.
Some other notes about these utilities:
- They only accept *UNZIPPED* raw data files from Family Tree DNA's Illumina-based FamilyFinder (.csv files), 23andMe (.txt files)
or AncestryDNA (.txt files).
- They should work with Chrome, Safari, and some other browsers.
But they probably will not work with Internet Explorer.
They should work with Firefox 3.6 Beta or later versions of Firefox.
Earlier versions of Firefox will not work with these utilities.
Note, however, that some of the utilities are computationally intensive and might run much slower in Firefox
than in a different browser such as Chrome.
- They do their processing on your computer and not mine.
My real point here is that your raw data files are not sent over the internet by these utilities.
I should also mention that the output from the utilities is not sent over the internet either,
and in particular, no copy of it gets saved anywhere (it gets shown on your screen, but that's all).
- No-Calls are reported differently by FTDNA versus 23andMe versus AncestryDNA.
These utilities convert all No-Called SNPs to question marks.
- It is assumed that the files have been pre-sorted to have the SNPs on each chromosome listed in ascending position value.
The ROH utility performs a validation check to ensure that the input file is properly sorted.
- Utilities that involve the comparison of two or more raw data files do not need the files to all be from one company or to have the same file format.
Nor do the files need to contain the same set of SNPs.
Only those SNPs that occur in all files being processed will be compared. All other SNPs will be ignored.
Synchronisation of SNPs between files is performed based on the position numbers of the SNPs,
which is why the SNPs on each chromosome must be listed in ascending position value.
- Note that Family Tree DNA's X Chromosome files tend not to be pre-sorted in this manner,
so they must be re-sorted before use (in which case it is imperative to save the resulting file
in one of the five data formats outlined below).
- Note also that data files based on different reference standards (such as Build 36 versus Build 37) should not be directly compared
without first converting to a common standard.
- If you want to trick these utilities into using other text files, then you'll need to conform to one of the following file formats:
- A tab-separated file with an initial line beginning with # and containing the string "23andMe", and no quotes on any data.
Data items should be in order: RSID, Chromosome Number, SNP Position Number, a single string with both alleles
- A tab-separated file with an initial line beginning with # and containing the string "AncestryDNA", and no quotes on any data.
Data items should be in order: RSID, Chromosome Number, SNP Position Number, first allele, second allele
- A comma-separated file with each data item enclosed in quotes.
Data items should be in order: RSID, Chromosome Number, SNP Position Number, a single string with both alleles
- A space-separated file with no quotes on any data.
Data items should be in order: RSID, Chromosome Number, SNP Position Number, a single string with both alleles
- A comma-separated file with no quotes on any data.
Data items should be in order: RSID, Chromosome Number, SNP Position Number, a single string with both alleles
Any line beginning with the # symbol or containing the string "RSID" will otherwise be ignored.
Note also that if you wish to process files from FTDNA's former Affymetrix-based FamilyFinder,
you should translate all instances of "--" to "DD"
before subjecting your data to analysis;
otherwise the utilities will erroneously treat these deletions as No-Calls.
Instances of "---" in Affymetrix-based data will still be correctly treated as No-Calls.
- If Firefox complains about "A script on this page is causing Mozilla to run slowly" then you might want
to configure Firefox to not complain as quickly when a script is running.
- JavaScript source code is contained within the top frame of each individual utility.
If you want to view how I've implemented things, that's where you'll find the code that I've written.
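To illustrate the accepted input layouts and the No-Call handling described in the notes above, here is a minimal Python sketch (not the author's JavaScript) that parses one line of any of the five formats into a uniform record and converts No-Calls to question marks. The `Snp` record, the `parse_line` and `normalize` helpers, the `NO_CALLS` set of raw no-call spellings, the use of "??" (one question mark per allele), and the case-insensitive "RSID" test are all assumptions for illustration. For simplicity the sketch infers the delimiter from each line instead of from the file's initial "#" line.

```python
from typing import NamedTuple, Optional

class Snp(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    genotype: str  # both alleles as one string, or "??" for a No-Call

# Assumption: raw No-Call spellings across vendors ("---" is the Affymetrix form).
NO_CALLS = {"--", "---", "00", ""}

def normalize(genotype: str) -> str:
    """Map any recognised No-Call spelling to '??'."""
    g = genotype.strip()
    return "??" if g in NO_CALLS else g

def parse_line(line: str) -> Optional[Snp]:
    """Parse one data line; return None for blank, comment and header lines."""
    line = line.rstrip("\r\n")
    # Skip lines starting with '#' or containing "RSID" (case-insensitive: an assumption).
    if not line.strip() or line.startswith("#") or "RSID" in line.upper():
        return None
    if "\t" in line:                      # 23andMe- or AncestryDNA-style
        fields = line.split("\t")
    elif "," in line:                     # comma-separated, quoted or not
        fields = [f.strip().strip('"') for f in line.split(",")]
    else:                                 # space-separated
        fields = line.split()
    if len(fields) == 5:                  # separate columns for each allele
        rsid, chrom, pos, a1, a2 = fields
        genotype = a1 + a2
    elif len(fields) == 4:                # one string holding both alleles
        rsid, chrom, pos, genotype = fields
    else:
        raise ValueError(f"unrecognised line: {line!r}")
    return Snp(rsid, chrom, int(pos), normalize(genotype))
```

For example, `parse_line('"rs3094315","1","752566","--"')` yields a record whose genotype is `"??"`, just as a tab-separated AncestryDNA line with alleles `0` and `0` does.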
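The Affymetrix note above is easy to get wrong with a blind search-and-replace: substituting every "--" would also corrupt the "---" No-Calls into "DD-". The hypothetical helper below is a sketch that assumes a comma-separated line whose fields may or may not be quoted; it rewrites a field only when its content is exactly "--".

```python
def translate_deletions(line: str) -> str:
    """Rewrite a field that is exactly "--" (an Affymetrix deletion) as "DD".

    "---" fields (Affymetrix No-Calls) are left unchanged. The trailing
    newline, if any, is removed from the returned line.
    """
    out = []
    for field in line.rstrip("\r\n").split(","):
        if field.strip().strip('"') == "--":
            field = field.replace("--", "DD")  # keeps any surrounding quotes
        out.append(field)
    return ",".join(out)
```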
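The notes mention that the ROH utility validates that its input is properly sorted but do not say how. One plausible way to perform such a check is sketched below in Python, assuming the file has already been reduced to (chromosome, position) pairs in file order. The function name `first_out_of_order` is hypothetical, and tolerating repeated equal positions is an assumption.

```python
from typing import Dict, Iterable, Optional, Tuple

def first_out_of_order(snps: Iterable[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the first (chromosome, position) whose position is lower than the
    previous SNP seen on the same chromosome, or None if every chromosome is in
    ascending order. Equal consecutive positions are tolerated (an assumption)."""
    last_pos: Dict[str, int] = {}
    for chrom, pos in snps:
        if chrom in last_pos and pos < last_pos[chrom]:
            return (chrom, pos)
        last_pos[chrom] = pos
    return None
```

Running a check like this over an X Chromosome file from Family Tree DNA is one way to spot a file that needs re-sorting before comparison.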
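The position-based synchronisation described in the notes (compare only SNPs present in every file) can be done in one pass per chromosome because each file is sorted. The Python sketch below is a k-way merge-style intersection; it is one way to do this, not necessarily how the utilities do it. The `Calls` layout (chromosome mapped to a list of (position, genotype) pairs) and the `synchronise` name are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: chromosome -> [(position, genotype), ...], sorted by position.
Calls = Dict[str, List[Tuple[int, str]]]

def synchronise(files: Sequence[Calls]) -> Dict[str, List[Tuple[int, List[str]]]]:
    """Keep only positions present in every file, stepping through the sorted lists together."""
    if not files:
        return {}
    # Only chromosomes present in every file can contribute shared SNPs.
    shared = set(files[0])
    for f in files[1:]:
        shared &= set(f)
    result: Dict[str, List[Tuple[int, List[str]]]] = {}
    for chrom in sorted(shared):
        lists = [f[chrom] for f in files]
        idx = [0] * len(lists)
        rows: List[Tuple[int, List[str]]] = []
        while all(i < len(lst) for i, lst in zip(idx, lists)):
            positions = [lst[i][0] for i, lst in zip(idx, lists)]
            target = max(positions)
            if all(p == target for p in positions):
                # Same position in every file: record each file's genotype.
                rows.append((target, [lst[i][1] for i, lst in zip(idx, lists)]))
                idx = [i + 1 for i in idx]
            else:
                # Advance every file that is behind the largest current position.
                idx = [i + 1 if p < target else i for i, p in zip(idx, positions)]
        result[chrom] = rows
    return result
```

Because each list is walked forward only, the work is linear in the total number of SNPs, which is why the pre-sorting requirement matters.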
If you want to express thanks for these utilities, then here are some helpful things you can do:
If you happen to know any males with the surname PIKE or PYKE, encourage them to join the
Pike DNA Project.
The Family History Society of Newfoundland and Labrador is a registered charity that
could use some assistance. Financial donations are tax deductible (at least in Canada).
And a special note to people in academia or industry who make substantive use of these utilities:
Can you let me know how you use them? Thanks.