Saturday, May 19, 2018

Shared Matches


You want to analyze your shared matches because that's going to help with your brick walls.

Unfortunately, the AncestryDNAHelper does not capture shared matches into the database, as of this writing in May, 2018. I've received a number of inquiries about how to analyze the data. Without the data, there is nothing I can provide for analysis.

If you want to capture and review shared matches, you will want to check out DNAGedcom. I have not yet worked with that database, so cannot at this time answer any questions about it. There is a group at Facebook where questions can be asked.

Tuesday, December 26, 2017

A Smorgasboard of Queries for the AncestryDNAHelper Database


A few months ago Roger, one of the mailing list members, shared a number of his SQLite queries with me to share with you. They range from simple to more complex. My favorite is the first, with the rest following. You'll have to read the comments to see what each query does. These queries will not change your database, so feel free to try them out.


Common Match Count Report


This handy query creates a list that shows the number of common matches that each of my 11 tests has with each of the other 10.


select a.name as "Test Name 1",
c.name as "Test Name 2",
count(*) as "Common Matches"
from tests a
inner join matches b on a.id = b.testid
left join tests c
inner join matches d on c.id = d.testid
where a.name <> c.name
and b.matchid = d.matchid
group by a.name, c.name
order by upper(a.name), upper(c.name);


Here's the sample output with changed names to protect privacy.





Record Counts


-- How many tests are in my database?
Select count(*) from tests;

-- How many total matches are in my database?
Select count(*) from matches;

-- How many total ancestors are in my database?
Select count(*) from ancestors;


Tests


-- List my tests by name
select * from tests order by upper(name);

-- List my test names with their unique ID
select id, name from tests order by upper(name), id;

-- List my tests by date first scanned
select name, scanned, imported, lastpage, scanphase, id, treeurl, imageurl
  from tests order by scanned;


Matches


-- How many matches does each of my tests have?
select a.id, a.name, count(*) as "Matches"
from tests a
inner join matches b on a.id = b.testid
group by a.id, a.name, b.testid
order by upper(a.name), b.testid;

-- For each test, list first and last scan dates, count of matches, count of scanned matches
select a.name as Name,
min(b.scanned) as "First Scan Date",
max(b.scanned) as "Last Scan Date",
sum(case when b.scanned is null then 0 else 1 end) as "Scanned Matches",
count(*) as "Total Matches"
from tests a
inner join matches b on a.id = b.testid
group by a.name, b.testid
order by upper(a.name), b.testid;

-- Count matches by range for each test
select a.name, b.range, count(*) as "Matches"
from tests a
inner join matches b on a.id = b.testid
group by a.name, b.range
order by upper(a.name),
case when substr(b.range,1,6) = 'Parent' then 1
when substr(b.range,1,9) = 'Immediate' then 2
when substr(b.range,1,5) = 'Close' then 3
else substr(b.range,1,1) * 10
end;

-- Count matches by range and confidence for each test
select a.name, b.range, confidence, count(*) as "Matches"
from tests a
inner join matches b on a.id = b.testid
group by a.name, b.range, confidence
order by upper(a.name),
case when substr(b.range,1,6) = 'Parent' then 1
when substr(b.range,1,9) = 'Immediate' then 2
when substr(b.range,1,5) = 'Close' then 3
else substr(b.range,1,1) * 10
end,
case
when confidence = 'Extremely High' then 1
when confidence = 'Very High' then 2
when confidence = 'High' then 3
when confidence = 'Good' then 4
when confidence = 'Moderate' then 5
end;

-- List the relationship between all my tests with range, confidence, 
-- segments, centimorgans
-- Choose one sort order
select a.name as "Test Name 1", c.name as "Test Name 2",
  b.range, b.confidence, b.centimorgans, b.segments
from tests a
inner join matches b on a.id = b.testid
inner join tests c on b.matchid = c.id
where a.id <> c.id
-- sort by name
    order by upper(a.name), upper(c.name);
-- sort by descending centimorgans within name
    order by upper(a.name), b.centimorgans desc, b.segments desc;
-- sort by descending centimorgans across all names
    order by b.centimorgans desc, b.segments desc, upper(a.name), upper(c.name);

-- List of names and administrators for close matches of one test
-- Exclude my tests
-- Set range to 4 for less than 4th cousins, to 5 for less than 5th cousins
-- Set name of test to be evaluated
select a.name, b.name as 'Match Name', b.admin,
b.range,b.confidence, b.centimorgans
from tests a
inner join matches b on a.id = b.testid
where (b.range like 'Close%' 
or b.range < '4
or b.range like 'Parent%' 
or b.range like 'Immediate%')
and a.name = 'Elizabeth L Richards'
and matchid not in (select id from tests)
order by upper(a.name),
case
when substr(b.range,1,6) = 'Parent' then 1
when substr(b.range,1,9) = 'Immediate' then 2
when substr(b.range,1,5) = 'Close' then 3
else substr(b.range,1,1) * 10
end,
b.name, b.admin;


Ancestors


-- How many ancestors of matches does each of my tests have?
select a.name, count(*) as "Ancestors of Matches"
from tests a
inner join ancestors b on a.id = b.testid
group by a.name, b.testid
order by upper(a.name), b.testid;


Other Ancestor Queries


There are also a few ancestor queries on one of the reference pages of this blog. See the page on running queries.

Sunday, January 8, 2017

Shared Matches for Two Tests


The AncestryDNA Helper gives you the ability to find the matches in common (compare matches) between any two tests in your database. It's more powerful than the shared matches feature provided by Ancestry, so is more useful. But it returns all the matches, regardless of value. Many of the matches have no tree, so they provide no value to me.

Today I'll share four different ways to use SQLite Studio to create a compare list similar to the Helper, but with varying levels of usefulness. The output, when exported as CSV and imported to Excel, will allow you to click directly to the links. Thanks to Steve for providing the code change to make the spreadsheet clickable. 

Before you can use any of these queries, visit the page named Views for Matching and run the code from that page. You'll also need to know the names associated to your tests.

Let's start with the compare from Helper. For this compare exercise, I'll use the test of my Swedish half-uncle as my "control", or person A, and my own test for person B. The Helper finds 416 matches and gives me clickable links to the A match and the B match. Note that the sort is A...Z and a...z. The upper and lower case letters sort separately.


 


Here's the SQL query that gives the same result, but with the sort ignoring the upper/lower case difference. After you paste the query into the window, you will replace the two test names with your compare names.

Rather than running it, click on the Export results icon and export the output straight to a CSV file.



-- simple two test compare
select     PersonA.MatchName as "MatchName",
    PersonA.admin as "Admin",
    PersonA.people as "People",
    PersonA.private as "Private",
'=HYPERLINK("https://www.ancestry.com/dna/tests/' || PersonA.id || '/match/' || PersonA.matchid || '","A" )' as "MatchURL A",
'=HYPERLINK("https://www.ancestry.com/dna/tests/' || PersonB.id || '/match/' || PersonB.matchid || '","B" )' as "MatchURL B"
from match_mining PersonA
join match_mining PersonB
  on PersonA.matchid = PersonB.matchid
where PersonA.name = "A................."
  and PersonB.name = "Elizabeth L Richards"
order by upper(PersonA.admin) asc;






By opening the CSV file in Excel, I can review the information for the matches and click on the links. It gives a bit more information than the Helper list.




There are three more ways to run this compare to further refine the list and get even more valuable output.

Option two: All matches except my own tests.


Omitting my own tests drops the results from 416 to 407. The output is the same.



-- simple two test compare without my own tests
select PersonA.MatchName as "MatchName",
PersonA.admin as "Admin",
PersonA.people as "People",
PersonA.private as "Private",
'=HYPERLINK("https://www.ancestry.com/dna/tests/' || PersonA.id || '/match/' || PersonA.matchid || '","A" )' as "MatchURL A",
'=HYPERLINK("https://www.ancestry.com/dna/tests/' || PersonB.id || '/match/' || PersonB.matchid || '","B" )' as "MatchURL B"
from match_mining_omit_mine PersonA
join match_mining_omit_mine PersonB
on PersonA.matchid = PersonB.matchid
where PersonA.name = "A................."
  and PersonB.name = "Elizabeth L Richards"
order by upper(PersonA.admin) asc;
order by upper(PersonA.admin) asc;



Option three: Only matches with ancestors, with a lower limit.


Limiting matches to tests with ancestors gives only 144 results. Limiting to tests with more than 50 ancestors drops the result list to 42.

Choose your lower limit based on what you find useful. By the definition of the view, the absolute lower limit with this query is always 1. You can use the number 0 ( > 0) as your lower limit.

This output shows the number of ancestors found for each match and sorts based on the number of ancestors found in the A person's matches.






-- two test compare, must have ancestors 
select     PersonA.MatchName as "MatchName",
    PersonA.admin as "Admin",
    PersonA.people as "People",
    PersonA.private as "Private",
    PersonA.AncestorCount,
    PersonB.AncestorCount,
'=HYPERLINK("https://www.ancestry.com/dna/tests/' || PersonA.id || '/match/' || PersonA.matchid || '","A" )' as "MatchURL A",
'=HYPERLINK("https://www.ancestry.com/dna/tests/' || PersonB.id || '/match/' || PersonB.matchid || '","B" )' as "MatchURL B"
from match_mining_with_ancestors PersonA
join match_mining_with_ancestors PersonB
  on PersonA.matchid = PersonB.matchid
where PersonA.name = "A................"
  and PersonB.name = "Elizabeth L Richards"
  and (PersonA.AncestorCount > 50
     or PersonB.AncestorCount > 50 )
order by PersonA.AncestorCount desc;



Option four: Matches with ancestors, except my own tests, with a lower limit.

Limiting matches to tests with more than 50 ancestors drops the result list to 38 when I remove my own tests. The output is the same format as option three. 



-- two test compare, must have ancestors, exclude my own tests
select     PersonA.MatchName as "MatchName",
    PersonA.admin as "Admin",
    PersonA.people as "People",
    PersonA.private as "Private",
    PersonA.AncestorCount,
    PersonB.AncestorCount,
'=HYPERLINK("https://www.ancestry.com/dna/tests/' || PersonA.id || '/match/' || PersonA.matchid || '","A" )' as "MatchURL A",
'=HYPERLINK("https://www.ancestry.com/dna/tests/' || PersonB.id || '/match/' || PersonB.matchid || '","B" )' as "MatchURL B"
from match_mining_with_ancestors_omit_mine PersonA
join match_mining_with_ancestors_omit_mine PersonB
  on PersonA.matchid = PersonB.matchid
where PersonA.name = "A................"
  and PersonB.name = "Elizabeth L Richards"
  and (PersonA.AncestorCount > 50
     or PersonB.AncestorCount > 50 )
order by PersonA.AncestorCount desc;
order by PersonA.AncestorCount desc;



I hope you'll find these options helpful as you compare your collection of tests.

Originally posted January, 2017. Revised November, 2017.