Friday, September 7, 2018

Statistics for Surnames of Ancestors


Recently a member of the forum asked about extracting statistics for surnames of ancestors of matches. This is fairly simple to do. The following is a multi-step process because doing it in one step crashed SQLite Studio on my computer, which is Windows 10, 64-bit, with 12 GB memory and 1 TB disk.

The surname metaphone is included because spelling variations are important to one of my puzzles and might be important to you. Also, the number of ancestors with a surname and the number of matches with the surname can be two very different numbers. The idea is to bring the statistics into Excel or another spreadsheet program to work with the data.

First let's look at the output in Excel. As an example, I have a woman named Mary Williams in my ancestry. Williams is a fairly common name, unfortunately for me. 1164 of my matches have an ancestor named Williams and there are 2793 of those ancestors.





Another of my puzzles is a man named Vossler. This is a name that is spelled in various ways, so the metaphone is important. Filtering the metaphone for FSLR results in more information than trying to guess all the spelling variations.




Working in a copy of your database, here are the steps and queries.

Create a work table and insert the surname data into it. Put in your exact test name where the query says Your Test Name. Do not delete the work table until you have exported the data.




Here is the query:




CREATE TABLE ancestors_count (
    Surname           TEXT,
    Metaphone         TEXT,
    AncestorCount     INT,
    MatchCount        INT
);


with cte (matchid, surname, metaphone, ancestor_count)
as
(
SELECT matchid, upper(surname), metaphone, count(*)
FROM ancestors join tests on testid = id
where name= "Your Test Name"
group by testid, matchid, surname, metaphone
)
insert into ancestors_count
select surname, metaphone, sum(ancestor_count), count(*)
from cte
group by surname, metaphone;




After the query has run, double-click on the table named ancestors_count on the left side of the screen to open the table. Click on the data tab above the query. When the data appears, click on the exploding arrows, which is the export icon.




Choose to export ONLY the table data, click Next.




Select the export format, which works nicely as CSV. Use the disk icon to browse for where you want the file and to enter the file name. Choose to export the column names in the first row and choose the comma as the column separator. Click Finish. It takes only a couple of seconds.




After verifying the output file is what you want, you can delete the work table and disconnect from the work database. Or you can just disconnect from and delete the work database.

If you want to keep the database for other purposes, the command to delete the work table is:

drop TABLE ancestors_count;

If you have multiple tests that you want to run through this process, drop the work table and start over for each test.

Saturday, May 19, 2018

Shared Matches


You want to analyze your shared matches because that's going to help with your brick walls.

Unfortunately, the AncestryDNAHelper does not capture shared matches into the database, as of this writing in May, 2018. I've received a number of inquiries about how to analyze the data. Without the data, there is nothing I can provide for analysis.

If you want to capture and review shared matches, you will want to check out DNAGedcom. I have not yet worked with that database, so cannot at this time answer any questions about it. There is a group at Facebook where questions can be asked.

Tuesday, December 26, 2017

A Smorgasboard of Queries for the AncestryDNAHelper Database


A few months ago Roger, one of the mailing list members, shared a number of his SQLite queries with me to share with you. They range from simple to more complex. My favorite is the first, with the rest following. You'll have to read the comments to see what each query does. These queries will not change your database, so feel free to try them out.


Common Match Count Report


This handy query creates a list that shows the number of common matches that each of my 11 tests has with each of the other 10.


select a.name as "Test Name 1",
c.name as "Test Name 2",
count(*) as "Common Matches"
from tests a
inner join matches b on a.id = b.testid
left join tests c
inner join matches d on c.id = d.testid
where a.name <> c.name
and b.matchid = d.matchid
group by a.name, c.name
order by upper(a.name), upper(c.name);


Here's the sample output with changed names to protect privacy.





Record Counts


-- How many tests are in my database?
Select count(*) from tests;

-- How many total matches are in my database?
Select count(*) from matches;

-- How many total ancestors are in my database?
Select count(*) from ancestors;


Tests


-- List my tests by name
select * from tests order by upper(name);

-- List my test names with their unique ID
select id, name from tests order by upper(name), id;

-- List my tests by date first scanned
select name, scanned, imported, lastpage, scanphase, id, treeurl, imageurl
  from tests order by scanned;


Matches


-- How many matches does each of my tests have?
select a.id, a.name, count(*) as "Matches"
from tests a
inner join matches b on a.id = b.testid
group by a.id, a.name, b.testid
order by upper(a.name), b.testid;

-- For each test, list first and last scan dates, count of matches, count of scanned matches
select a.name as Name,
min(b.scanned) as "First Scan Date",
max(b.scanned) as "Last Scan Date",
sum(case when b.scanned is null then 0 else 1 end) as "Scanned Matches",
count(*) as "Total Matches"
from tests a
inner join matches b on a.id = b.testid
group by a.name, b.testid
order by upper(a.name), b.testid;

-- Count matches by range for each test
select a.name, b.range, count(*) as "Matches"
from tests a
inner join matches b on a.id = b.testid
group by a.name, b.range
order by upper(a.name),
case when substr(b.range,1,6) = 'Parent' then 1
when substr(b.range,1,9) = 'Immediate' then 2
when substr(b.range,1,5) = 'Close' then 3
else substr(b.range,1,1) * 10
end;

-- Count matches by range and confidence for each test
select a.name, b.range, confidence, count(*) as "Matches"
from tests a
inner join matches b on a.id = b.testid
group by a.name, b.range, confidence
order by upper(a.name),
case when substr(b.range,1,6) = 'Parent' then 1
when substr(b.range,1,9) = 'Immediate' then 2
when substr(b.range,1,5) = 'Close' then 3
else substr(b.range,1,1) * 10
end,
case
when confidence = 'Extremely High' then 1
when confidence = 'Very High' then 2
when confidence = 'High' then 3
when confidence = 'Good' then 4
when confidence = 'Moderate' then 5
end;

-- List the relationship between all my tests with range, confidence, 
-- segments, centimorgans
-- Choose one sort order
select a.name as "Test Name 1", c.name as "Test Name 2",
  b.range, b.confidence, b.centimorgans, b.segments
from tests a
inner join matches b on a.id = b.testid
inner join tests c on b.matchid = c.id
where a.id <> c.id
-- sort by name
    order by upper(a.name), upper(c.name);
-- sort by descending centimorgans within name
    order by upper(a.name), b.centimorgans desc, b.segments desc;
-- sort by descending centimorgans across all names
    order by b.centimorgans desc, b.segments desc, upper(a.name), upper(c.name);

-- List of names and administrators for close matches of one test
-- Exclude my tests
-- Set range to 4 for less than 4th cousins, to 5 for less than 5th cousins
-- Set name of test to be evaluated
select a.name, b.name as 'Match Name', b.admin,
b.range,b.confidence, b.centimorgans
from tests a
inner join matches b on a.id = b.testid
where (b.range like 'Close%' 
or b.range < '4
or b.range like 'Parent%' 
or b.range like 'Immediate%')
and a.name = 'Elizabeth L Richards'
and matchid not in (select id from tests)
order by upper(a.name),
case
when substr(b.range,1,6) = 'Parent' then 1
when substr(b.range,1,9) = 'Immediate' then 2
when substr(b.range,1,5) = 'Close' then 3
else substr(b.range,1,1) * 10
end,
b.name, b.admin;


Ancestors


-- How many ancestors of matches does each of my tests have?
select a.name, count(*) as "Ancestors of Matches"
from tests a
inner join ancestors b on a.id = b.testid
group by a.name, b.testid
order by upper(a.name), b.testid;


Other Ancestor Queries


There are also a few ancestor queries on one of the reference pages of this blog. See the page on running queries.