Sunday, January 8, 2017

Shared Matches for Two Tests


The AncestryDNA Helper gives you the ability to find the matches in common (compare matches) between any two tests in your database. It's more powerful than the shared matches feature provided by Ancestry, so is more useful. But it returns all the matches, regardless of value. Many of the matches have no tree, so they provide no value to me.

Today I'll share four different ways to use SQLite Manager to create a compare list similar to the Helper, but with varying levels of usefulness. The output won't let you click directly to the links. By pasting the results into a spreadsheet, the links can be converted to hyperlinks (not simple). Each link can always be copied from the spreadsheet and pasted into the browser.

Before you can use any of these queries, visit the page named Views for Matching and run the code from that page. You'll also need to know the names associated to your tests.

Let's start with the compare from Helper. For this compare exercise, I'll use the test of my Swedish half-uncle as my "control", or person A, and my own test for person B. The Helper finds 199 matches and gives me clickable links to the A match and the B match. Note that the sort is A...Z and a...z. The upper and lower case letters sort separately.



Here's the SQL query that gives the same result, but with the sort ignoring the upper/lower case difference. When you run the query, you will replace the two test names with your compare names.



-- simple two test compare
select     PersonA.MatchName as "MatchName",
    PersonA.admin as "Admin",
    PersonA.people as "People",
    PersonA.private as "Private",
"https://www.ancestry.com/dna/tests/" || PersonA.id || "/match/" || PersonA.matchid as "MatchURL A",
"https://www.ancestry.com/dna/tests/" || PersonB.id || "/match/" || PersonB.matchid as "MatchURL B"
from match_mining PersonA
join match_mining PersonB
  on PersonA.matchid = PersonB.matchid
where PersonA.name = "A................."
  and PersonB.name = "Elizabeth L Richards"
order by upper(PersonA.admin) asc;



Here are the results, ready to be saved. Click on Actions and save the output as a CSV file.






By opening the CSV file in my spreadsheet, I can review the information for the matches and copy and paste the links. It gives a bit more information than the Helper list.




There are three more ways to run this compare to further refine the list and get even more valuable output.

Option two: All matches except my own tests.


Omitting my own tests drops the results from 199 to 192.The output is the same.



-- simple two test compare without my own tests
select     PersonA.MatchName as "MatchName",
    PersonA.admin as "Admin",
    PersonA.people as "People",
    PersonA.private as "Private",
"https://www.ancestry.com/dna/tests/" || PersonA.id || "/match/" || PersonA.matchid as "MatchURL A",
"https://www.ancestry.com/dna/tests/" || PersonB.id || "/match/" || PersonB.matchid as "MatchURL B"
from match_mining_omit_mine PersonA
join match_mining_omit_mine PersonB
  on PersonA.matchid = PersonB.matchid
where PersonA.name = "A.........."
  and PersonB.name = "Elizabeth L Richards"
order by upper(PersonA.admin) asc;



Option three: Only matches with ancestors, with a lower limit.


Limiting matches to tests with ancestors gives me 76 results. Limiting to tests with more than 50 ancestors drops the result list to 30.

Choose your lower limit based on what you find useful. By the definition of the view, the absolute lower limit with this query is always 1. You can use the number 0 ( > 0) as your lower limit.

This output shows the number of ancestors found for each match and sorts based on the number of ancestors found in the A person's matches.






-- two test compare, must have ancestors 
select     PersonA.MatchName as "MatchName",
    PersonA.admin as "Admin",
    PersonA.people as "People",
    PersonA.private as "Private",
    PersonA.AncestorCount,
    PersonB.AncestorCount,
"https://www.ancestry.com/dna/tests/" || PersonA.id || "/match/" || PersonA.matchid as "MatchURL A",
"https://www.ancestry.com/dna/tests/" || PersonB.id || "/match/" || PersonB.matchid as "MatchURL B"
from match_mining_with_ancestors PersonA
join match_mining_with_ancestors PersonB
  on PersonA.matchid = PersonB.matchid
where PersonA.name = "A................"
  and PersonB.name = "Elizabeth L Richards"
  and (PersonA.AncestorCount > 50
     or PersonB.AncestorCount > 50 )
order by PersonA.AncestorCount desc;



Option four: Matches with ancestors, except my own tests, with a lower limit.

Limiting matches to tests with more than 50 ancestors drops the result list to 28 when I remove my own tests. The output is the same format as option three. 



-- two test compare, must have ancestors, exclude my own tests
select     PersonA.MatchName as "MatchName",
    PersonA.admin as "Admin",
    PersonA.people as "People",
    PersonA.private as "Private",
    PersonA.AncestorCount,
    PersonB.AncestorCount,
"https://www.ancestry.com/dna/tests/" || PersonA.id || "/match/" || PersonA.matchid as "MatchURL A",
"https://www.ancestry.com/dna/tests/" || PersonB.id || "/match/" || PersonB.matchid as "MatchURL B"
from match_mining_with_ancestors_omit_mine PersonA
join match_mining_with_ancestors_omit_mine PersonB
  on PersonA.matchid = PersonB.matchid
where PersonA.name = "A...................."
  and PersonB.name = "Elizabeth L Richards"
  and (PersonA.AncestorCount > 50
     or PersonB.AncestorCount > 50 )
order by PersonA.AncestorCount desc;



I hope you'll find these options helpful as you compare your collection of tests.

Saturday, January 7, 2017

Corrupt Test Names


A recent topic in the AncestryDNA Helper group has been about errors in test names, leading one person to use the term corrupt. So let's use that word. It happened to me, too. If you're having this problem, do read through the recent postings in the group about how people have solved the problem.

I have nine tests and my name ended up on two of them. I think it's due to rushing to start looking at matches before the Helper has settled in and is ready to start watching.

I can look at my nine tests and can easily see the name of the missing person. But for group members with far more tests, figuring out the errors may be difficult.

Today I'll share how to find the corruptions in your tests. If you're brave enough, you can also use code from this posting to fix those errors.

This code does have one very large limitation. If you have a corrupt test that does not match anyone else in your database, this code will not work. The premise is that because every test in my database has at least one match to another test in my database, I can use the name of the match to figure out what the name on the test is really supposed to be.

Here are my tests, seen from the Browse & Search tab in the SQLite Manager. Notice my name shows twice. Which one is my test and which one is not?





Running the following query will check each test name against the match list.


select distinct
    tests.name as "Corrupted Test Name",
    matches.name as "Correct Name",
    tests.id as "Test ID"
from     tests,
    matches
where tests.id = matches.matchid
  and tests.name != matches.name;


The output shows that the test ID starting 554CE is for my daughter KK, rather than for me.





Now that I know that KK's test is corrupt, I can try (and did try) to fix it within the Helper. I was not able to do so. Back to SQL for me.

Updating with SQL


Safety first. I cannot emphasize this enough. I don't claim to write perfect code for every situation. A trashed database can take weeks to rebuild, so take your time. Back up your database. Test your changes.

If you decide to use SQL to fix your database, be sure you keep a backup copy of your database. Close Chrome. Grab a brand new copy of your Helper database. Copy it, copy it again. Work in one copy and keep the other one safe.

Look at your test list before starting an update. Export the test list to a file. You will want to be able to refer back to the "before" list.

Run the following code which will change all corrupt names that it can identify from the match list.


update tests
set name =
(
    select distinct matches.name
    from  matches
    where tests.id = matches.matchid
)
where tests.name  !=
(
    select distinct matches.name
    from  matches
    where tests.id = matches.matchid
);


There is no feedback in SQLite Manager from running the code. It only takes a few seconds. After it is done, return to the Browse & Search tab. Review the names and ID numbers against your "before" list to make sure the results are as you expect.

In my case, KK's name ended up in the test with ID 554CE. All the other names stayed correct.




Because you've worked in a copy of the database, you'll now need to replace your real Helper database. You've kept a safety copy, right?

As scary as it is, delete the Helper database. Close out of your work database and copy it into the location of the Helper database. Name the file to the same name as your Helper database. For me that name is "12".

Open Chrome and AncestryDNA and give the Helper time to get settled in and to show the buttons. Experiment a bit with your tests and compares and exports. Be sure it is working as you expect.

Once you feel confident that all is working correctly, you can delete other database copies. If it does not work properly, use your extra safety copy to replace the Helper database and return it to the previous version.

Tuesday, December 27, 2016

Inconsistent Private Matches


Someone recently speculated that the team at Ancestry doesn't care about breaking the AncestryDNA Helper. I think they prefer it being broken. Jeff has an on-going challenge to figure out what they have changed and how to get the Helper going again. Each time they change it, there is a potential to cause data inconsistencies.

If you've scanned multiple tests over several months, you've likely encountered inconsistencies. A few days ago I shared a query that will help find those inconsistencies. I've since learned that getting consistent results is extremely difficult when the match has a private tree. This post is just to help everyone understand the data and process.

For an example, I'm using matches to a cousin, friend and research partner that I'll call Precious. She and I have family members who match in multiple ways. I don't have access to her private tree but I know her family and how to flush out each match.

What I found is that clicking into a match from a Helper users search won't fix the database. I'll focus on the three matches belonging to daughter K. First I searched for Precious, then clicked into each of the three results.



After clicking into the three tests, here is the important match data with the three matches underlined. None are marked as private. One has a tree count, while two do not. There are three different values for last login. The match scan date is not 12/27, as I would have expected.



Back to the drawing board (and a peek at the Helper code). I could see that some of the information is taken from the summary page and not from the match page. The problem is you have to either page to the right summary page or be able to get the summary by surname or birth location. For this experiment, I viewed K's match list and used the Ancestry surname filter search to build a new summary page. I did this for two surnames to get all three matches onto summary pages.




The data was updated, but not fully. The private flag was set (YAY!) and the people count was updated. Last login and scan date were not changed. The viewed flag changed from the value 1 to the value true.




I found a sequence that finally worked for me to update everything possible. That was to use the surname search to create a new summary page, click into each of the pertinent matches on the summary page, exit from the matches, then reload (F5) the summary page. The last login changed, but the match scan date never did.



Of course, having gone through the exercise, it yields no ancestors. However, I now understand the source and limitations of the private tree flag in the Christmas 2016 design of the Ancestry pages and the Helper code.

I hope this is useful to someone else and I certainly hope the private flag is able to be more robust in a future release.