Wednesday, December 26, 2018

ARGH! Ancestry's new feature is disappointing...

I just discovered THIS:  http://apv.ancestry.com/50558183%3A9009%3A66/overview?treeid=54485674&personid=13681670623

It has the "life story" of Célina Boulé!    Marvelous, you say!  Mystery solved.

Um, not so much.   Here it is:

When Célina Marie Boulé dit Laliberté was born on March 18, 1840, in Quebec City, Quebec, Canada, her father, Célestin, was 47, and her mother, Marie, was 44. She married ALEXANDRE GUIMOND and they had nine children together. She also had one son and three daughters from another relationship. She died on April 18, 1928, in Lotbinière, Quebec, Canada, at the age of 88, and was buried in Lotbinière, Quebec, Canada.
Here are the problems with that:

  1. We don't know when or where she was born.   It was AROUND 1840 based on her death record.   No one has ever been able to find anything that conclusively proves the March 18 date; it just gets passed around from Ancestry tree to tree.
  2. "Her father Célestin" - except that he's not her father.   People have mixed up two different Célinas; ours has no birth record, and her marriage record does not list her parents.
  3. "She also had one son and three daughters from another relationship".  No - she didn't.  That's the OTHER Célina.
This REALLY PISSES ME OFF, because it's being touted as a new "feature" of Ancestry, WITHOUT disclaimers that the data presented might be entirely incorrect.

So that means that people who are doing research and don't know to dig under the surface might just "accept" it as fact. 

This completely goes against ALL the principles of genealogy.   Ancestry is being sloppy for the sake of marketing a new "feature" that has no checks and balances.

Wednesday, December 5, 2018

Updated DNA Ancestry - interesting

So, Ancestry has updated their regional reporting:

  • 79% Irish
  • 12% English
  • 9% French (they didn't have French before)
At the 4th generation back, I have:
  • 8 known Irish ancestors
  • 2 suspected Irish ancestors (based on last name)
  • 2 English ancestors
  • 3 French/Québecois ancestors
  • 1 unknown origin (Célina Boulé)
So that means:
  • 79% Irish vs.  62 1/2%
  • 12% English vs. 12 1/2%
  • 9% French vs. 18 3/4%
OK - so the English is a great match.   I don't understand the under-reporting of the French because even though a FEW Québec relatives married English or Irish immigrants, they were mostly at the distant cousin level, and not among the direct ancestors, but there are a few Europeans in there (at the 1-2% level or so).

If Célina is French/Québecois, then that would make it 25% French (expected from the tree) vs. 9% in the DNA results.  On the other hand, if she is Irish (a famine refugee) then it's 68 3/4% Irish (11 of 16 ancestors) vs. 79%, which is a better (but not complete) match.
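The two scenarios can be checked with a few lines of Python. This is only a sketch of the arithmetic: the counts come from the list of 16 fourth-generation ancestors above, while the names `ANCESTOR_COUNTS` and `expected_shares` are made up for illustration.

```python
from fractions import Fraction

# Counts of 4th-generation ancestors (16 in all), taken from the list above.
# "Irish" lumps together the 8 known and 2 suspected Irish ancestors.
ANCESTOR_COUNTS = {"Irish": 8 + 2, "English": 2, "French": 3, "Unknown": 1}


def expected_shares(unknown_as):
    """Expected percentages if the one unknown ancestor is counted as `unknown_as`."""
    counts = {k: v for k, v in ANCESTOR_COUNTS.items() if k != "Unknown"}
    counts[unknown_as] += ANCESTOR_COUNTS["Unknown"]
    total = sum(counts.values())
    return {k: float(Fraction(v, total) * 100) for k, v in counts.items()}


if_french = expected_shares("French")  # Célina counted as French/Québecois
if_irish = expected_shares("Irish")    # Célina counted as Irish

print(if_french)
print(if_irish)
```

This assumes every ancestor at that generation contributes an equal 1/16, which real inheritance only approximates.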

The only other possibility is that one entire sub-branch of my family (at the 5th generation) is not Québecois but is Irish.   That doesn't seem possible: the other Québecois branch has a set of first cousins (which is another oddity, but I'll leave that aside) at the 4th generation, which ONLY leaves the Guimond/Sévigny marriage, and there's nothing odd in that family record to suggest that anyone (specifically Elusippe Guimond) was illegitimate.

So we're back to trying to discern what the range of error is in the reporting.


Saturday, November 24, 2018

% DNA shared

When I had the opportunity to discuss genealogy with my (mostly-distant) brother, talking about some distant relative, his response to just about every distant relation was "So, no relation at all."

Of course this was idiotic for several reasons, but mostly it was annoying to me because the math says otherwise.

For direct ancestors you get one share from each parent/grandparent, so:


  1. Parent, 50% from each
  2. Grandparent, 25% from each
  3. Great-grandparent, 12 1/2% from each
and so on.  So basically 2^(-n) for n generations "up".

But what about your ancestor's descendants? 

At first - not thinking about it - I assumed that in the X,Y notation I use for consanguinity, it'd just be 2^(-(X+Y)), but that's wrong.   Why?  Because your uncles/aunts (and Nth grand-uncles/aunts) don't suffer that first splitting, since your father/mother and their sibling (your aunt/uncle) have the same share of DNA from their parents.

So instead it's 2^(-Z), where Z is:


  • X,  if Y ≤ 1;
  • X + Y – 1, if Y > 1
Thus:

  • For first cousins (X = 2, Y = 2), it's 2^(-(2 + 2 - 1)) = 2^(-3) = 1/8, or 12.5%;
  • For second cousins once-removed (4,3 or 3,4), it's 2^(-(4 + 3 - 1)) = 2^(-6) = 1/64, or 1.56%.
OK - great.   Of course it's not ALWAYS a direct 50% contribution: there's a range: for first cousins, it's 7.3% to 13.8%.
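As a quick check, the Z rule can be written as a small Python function. This is a sketch under the assumptions stated in the text (an exact 50% split at every step); the function names are mine, and the swap that treats 3,4 the same as 4,3 is my reading of the "(4,3 or 3,4)" remark.

```python
def degrees_of_separation(x, y):
    """Z for a relationship written as X,Y in the consanguinity notation above."""
    if x < y:
        x, y = y, x  # treat 3,4 the same as 4,3
    return x if y <= 1 else x + y - 1


def expected_share(x, y):
    """Expected fraction of shared DNA, assuming an exact halving per step."""
    return 2.0 ** -degrees_of_separation(x, y)


for label, (x, y) in {
    "parent": (1, 0),
    "grandparent": (2, 0),
    "aunt/uncle": (2, 1),
    "first cousin": (2, 2),
    "second cousin once-removed": (4, 3),
}.items():
    print(f"{label}: Z = {degrees_of_separation(x, y)}, share = {expected_share(x, y):.4%}")
```

These are averages only; as noted, real shares scatter around them.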

But if your family tree is like mine, where there are lots of distant relatives marrying distant relatives (though not necessarily related to each other), how do you estimate the DNA you share with their descendants?

So, say your 5th cousin 3x removed (9,6) marries your 3rd cousin 2x removed (6,4).   From the former there's a 2^(-14) share and from the latter there's a 2^(-9) share.  I think you just add them, and divide by 2.   Or, (0.0061% + 0.1953%) / 2 = 0.2014% / 2 = 0.1007% for their kid (your 4th cousin 1x removed = 6,5), who, if the (9,6) weren't in the picture, would only be 0.0977%.   (Clearly this matters more the closer the relations are - it gets interesting if, say, distant cousins on your father's side marry distant cousins on your mother's side.)
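Here is that guess as a short Python sketch. It simply encodes the "add them and divide by 2" idea for a child whose two parents are each related to you independently; it is not an established formula, and it does not handle parents who are also related to each other. The names `share` and `child_share` are illustrative.

```python
def share(z):
    """Expected fraction of shared DNA at Z degrees of separation."""
    return 2.0 ** -z


def child_share(z_parent_a, z_parent_b=None):
    """Average the two parents' shares; an unrelated parent contributes zero."""
    b = share(z_parent_b) if z_parent_b is not None else 0.0
    return (share(z_parent_a) + b) / 2


both_related = child_share(9, 14)  # 3rd cousin 2x removed + 5th cousin 3x removed
one_related = child_share(9)       # only the 3rd cousin 2x removed is a relative

print(f"{both_related:.4%} vs. {one_related:.4%}")
```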

"But it's such a tiny number!"    Yes, but consider that there are billions of base pairs in your DNA.   In genetic testing, shared DNA is expressed in centiMorgans (cM; basically a "unit" of DNA).   Parents each contribute about 3,400 cM, and so you can use the percentage formula to estimate the degree of overlap in DNA.

https://www.yourdnaguide.com/scp/ has a great article and table.   (They also give the range, which is helpful. Note that anything beyond/outside 3rd cousins (or Z = 7 above) does reach the possibility that you could be related, yet actually share NO DNA with your distant cousin; by the same token, even eighth cousins (Z = 17) could overlap with an average of 12 cM and up to as high as 50 cM.)

Looking at this the other way - if you compare your DNA with someone and get an overlap, you can invert the equation above to estimate the number of degrees of separation Z you have.   Comparing that to the cM Project's chart, you'll notice that there is a LOT of overlap among different distant cousins: say you find an overlap of 45 cM.    That's about Z = 7.2, where 3rd cousins are Z = 7 (74 cM on average), and 4th cousins are Z = 9 (35 cM on average).    But the ranges are what's important (0–217 for 3rd cousins, 0–127 for 4th), plus all of the other cousins (3C1R, 3C2R, 4C2R, etc.) whose ranges also include 45 cM.   That's why your Ancestry DNA or 23andMe "distant relative" matches all have ranges in the predicted relationship; they're doing a comparison of shared DNA segments and comparing them to the expected ranges.
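The inversion is just a base-2 logarithm. The sketch below assumes a total of about 6,800 cM (two parents at roughly 3,400 cM each, as above) and the idealized halving rule; `TOTAL_CM`, `expected_cm`, and `estimate_z` are names I made up for illustration.

```python
import math

# Assumption: ~3,400 cM from each parent, so ~6,800 cM in total.
TOTAL_CM = 2 * 3400


def expected_cm(z, total_cm=TOTAL_CM):
    """Idealized shared cM at Z degrees of separation."""
    return total_cm * 2.0 ** -z


def estimate_z(shared_cm, total_cm=TOTAL_CM):
    """Invert the share formula: Z = log2(total / shared)."""
    return math.log2(total_cm / shared_cm)


z_for_45 = estimate_z(45)
print(f"45 cM -> Z is roughly {z_for_45:.1f}")
```

The idealized values will not match the observed averages in the chart, which is another reason to lean on the published ranges rather than a single Z estimate.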

I still need to work out the math for consanguine relationships (e.g., the above example but they're also, say, 3rd cousins to each other).   I think in that case you have to follow each step along the way and apply the 50/50 mix separately (or at least do the formula above UNTIL you get to the consanguine relationship and then go step-by-step the rest of the way "down").

And, as of this morning, we're up to 67,916 people.  :-)




Thursday, August 16, 2018

DNA Results #2 - confirmation of... the same unexpected result?

As I mentioned before, I went and splurged on a 23andMe DNA test to correlate with the Ancestry one.   I did this because my DNA breakdown from Ancestry didn't match my expectations:
  • 75% British/Irish (the Donahue and Hall lines, plus Bradish)
  • 25% French
But the Ancestry results were:

  • 91% Ireland/Scotland/Wales (59%) and Great Britain (32%)
  • 9% everything else.
But the mapping of that 32% ALSO seems to include parts of France, particularly Normandie where I know is the origination of most of the French ancestors.


Probably not surprisingly (given that there's likely little room for error in the DNA test itself), the 23andMe results are in line with the Ancestry ones:

  1. British and Irish:   86.5%
  2. French and German:  6.8%
  3. Broadly NW European:  5.6%
  4. everything else (European):  1.1%
So, lumping all of the non-British/Irish together, that's 13.5%, only a little more than half of the 25% I expected.

This leads into the whole "was Célina Boulé actually an Irish adoptee?" hypothesis.

If she is French, then the 75/25 expected split is still there: there's just not enough English/Irish in the more distant ancestors to make up a 1/8th discrepancy.

BUT if she's Irish: then 13/16 of my 4th-generation ancestors are British/Irish = 81.25% and everything else is 18.75%.   Given that we know that SOME of the Québec/Acadian ancestors married non-French people (not many, but a few), then we START to get closer to the stated results.

The other thing that's cool about the 23andMe results is that their reporting is more in-depth:

Maternal Haplogroup:  J1c1

This mostly stems from central Europe, the Balkans and Ukraine.   But I suppose it would extend to France too.

Paternal Haplogroup: R-S15280.1

Very, very Irish.

I'm also more Neanderthal than 68% of 23andMe customers!   Yay!

I was able to get Dad to do both Ancestry and 23andMe tests.   Results pending.   I did this because it will also help with the "is Célina Irish" test since it will let me immediately distinguish for all of the identified "distant cousins" that both sites offer which SIDE of the family they're on (if they're distant cousins of both Dad and me, then they're on the Irish side, otherwise the French/Irish side).  

Hopefully enough of these distant cousins might clump in THEIR overlaps to suggest who Célina really was.

Where things are now...

I've (finally) finished the first- and second-generation Acadiens.

Whew!

This was a lot of work, because most of my typical work flow had to be adjusted: the LaFrance doesn't have the Acadian records (since they only map Québec parishes).   Ancestry has some of them (many of them were destroyed by the British at one time or another).  While there are records for Beaubassin, Port Royal, and Grand-Pré, other locations had no records at all.

Fortunately, I found a web site with the Port Royal records neatly organized by family name and date.

But the abundance of gaps, combined with the entire population of French Acadia being dispersed in the 1750s, made things hard to follow:  many went to the British colonies, others to France, and others to other French settlements: Louisiana, Québec or places like Miquelon (where I was also able to find some records).   At first I was puzzled by those who went to New England, the Carolinas, etc.:  why would you leave a British take-over of Acadia to go to another British colony?   Then I found out that most were forcibly deported TO those places by the British with the idea that the displaced families would re-integrate. The attempt was made to send them to somewhat rural places (e.g., western Massachusetts); however, most didn't stay there and moved to the cities (which the British had tried to avoid).    Things were particularly awful in the case of Québec City: an outbreak of measles became an epidemic, and hundreds of people died around 1757-1760, with entire families being wiped out.

Another situation came up while trying to determine if spouses of family members I was researching were also distant cousins (which with the Acadiens was extremely common: consanguine marriages of the 3rd and 4th degree were prevalent): in one case, the spouse ended up having a HUGE family tree archived on WikiTree (we're talking several THOUSAND people, hitting pretty much every single royal family in Europe and aristocracy galore), so that took several weeks to map.   (I would've quit, but my OCD kept me going, and I figure that it'll eventually come in handy if I ever get a breakthrough on the Irish parts of the family tree!)

So, now I'm going through all of the WikiTree entries for the direct descendants, filling in the blanks and getting a sense of what will be involved in finishing up the first- and second-generation mapping for the pre-Québec families.   That'll be the last "phase" of this project.

...  Then we start the third generation with (by my estimate) about 8-9,000 families to map.

Given that it took over three years to get this far, it might be 2021 or so before I'm finished with that.

Monday, April 23, 2018

Genomic Collapse is a Bitch

Sometimes distant cousins marry.

Sometimes not-so-distant cousins marry.

And then there's Jaime and Cersei Lannister, but that's another story.

In any case when this happens, the existence of shared ancestors begins to shrink down the number of people in the family tree, compared to what could possibly be there.

I'm defining "Genomic Collapse" as:

One minus the ratio of unique people in a subset of the family tree to the number of "filled slots" in that subset.
 So in the extreme case of Joffrey Lannister (whose parents were brother and sister), that's 50% genome collapse at a minimum because the maternal and paternal blood lines are identical; and "at a minimum" because any other consanguine relationships further up continue to contribute to the genome collapse.

So - what is it for my maternal family tree?


  • Out to 10 generations:  41.4% complete tree but with 30.2% collapse (296 people in 424 filled slots, out of 1,023 maternal slots, i.e., half of the 2,046 total slots);
  • Out to 15 generations: 5.9% complete but with 49.0% collapse (988 people in 1,937 filled slots, out of 32,767 maternal slots, half of the 65,534 total);
  • Out to 20 generations: 0.2% complete but with 51.4% collapse (1,074 people in 2,208 filled slots, out of 1,048,575 maternal slots, half of the 2,097,150 total).
I'm not sure if this is the right way to do this, because once you hit a non-unique ancestor, that effect doubles with every generation back.
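For reference, here is how those collapse figures fall out of the counts, as a Python sketch. The percentages above correspond to one minus unique people over filled slots. The second function assumes the tree is stored as a mapping from Ahnentafel-style slot numbers to person IDs, which is my own illustrative choice of data structure, and the tiny pedigree (parents who are first cousins) is hypothetical.

```python
def collapse(unique_people, filled_slots):
    """Fraction of filled slots occupied by someone already counted elsewhere."""
    return 1 - unique_people / filled_slots


# (generations, unique people, filled slots) from the list above.
for gens, people, filled in [(10, 296, 424), (15, 988, 1937), (20, 1074, 2208)]:
    print(f"{gens} generations: {collapse(people, filled):.1%} collapse")


def collapse_from_slots(slots):
    """Same measure, computed from a {slot_number: person_id} mapping."""
    return collapse(len(set(slots.values())), len(slots))


# Hypothetical pedigree: the parents are first cousins, so great-grandparents
# "A" and "B" fill two slots each (8/9 on one side, 12/13 on the other).
example_slots = {
    2: "F", 3: "M",
    4: "FF", 5: "FM", 6: "MF", 7: "MM",
    8: "A", 9: "B", 10: "C", 11: "D", 12: "A", 13: "B", 14: "E", 15: "G",
}
example_collapse = collapse_from_slots(example_slots)
print(f"example pedigree: {example_collapse:.1%} collapse")
```

Counting this way does include the doubling effect, since every ancestor of a repeated person is repeated in the slots above them as well.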

In any case, going from generation to generation it looks like this:



The blue line is the percent overlap at that generation.   The green line is the % overlap for that generation and all the ones preceding it.    The orange line is the highest possible percentage of overlap (i.e., all unknown ancestors for that generation are all repeats of known ancestors), the red line is the lowest possible percentage (all unknown ancestors are new people).



Wednesday, April 18, 2018

I think I know how to find Célina Boulé

It occurred to me last night as I was falling asleep that I might be able to use the DNA results to identify Célina Boulé's parents.

It's something of a long-shot, but given this tree:


starting with my grandmother, what we're trying to find is the set of 3rd great-grandparents that's missing.  

All of the DNA matching products try to match you up with distant cousins.   I'm slowly making my way through that list of people, looking at their family trees (when they've bothered to make one) to see if I can find common ancestors, and thereby establish our relationship.   A few have had Alexandre Guimond and Célina Boulé as the common ancestor.

But some of these supposedly "high-confidence" matches appear to have no common ancestors.  This got me thinking - what if the common ancestors are the missing parents for Célina Boulé?  And - how could we identify who they are?

What I need to do is find 4th cousins, determined genetically but NOT through a shared family tree.  Why?  Because if we can identify the common ancestors through a family tree, then we know they're not Célina's parents.    What we want to find are all the 4th cousins for whom it's not possible to find a common ancestor but who we are certain are on the Québec side of the family (because otherwise the common ancestor might be one of the unknown Irish or British 3rd great-grandparents).

So:
  1.  Are we genetic 4th cousins?
  2.  Do our family trees overlap in Québec?   (Or not - see below!)
  3.  Do we not see any overlap at the 5th generation?    

Note that this isn't as difficult to confirm as you might think.  There are only three family groups to consider:   a) Narcisse Guimond and Céleste Sévigny,  b) Basile Tousignant and Marguerite Maillot,  and c) Pierre Bélanger and Thérèse Maillot.  If none of these families are there, then the only ones that remain are the parents of Célina Boulé.   

Nonetheless, for this to work, it still requires one other thing:  Célina MUST have a sibling and my 4th cousin must be a descendant of that sibling.   Otherwise my 4th cousin and I will have the same "hole" in the family tree with Célina being a dead-end.
So - given a set of 4th cousin candidates, under the conditions above, the intersection of all of our family trees' common ancestors should point to the parents of Célina Boulé.   This can get tricky, because it's possible that her parents are already in my family tree through some other route.   If that location in the tree is far away from the 3rd great-grandparent position (i.e., they're also C:9,4's or something) it might still work because the DNA correlation for such a distant cousin would be negligible.  
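The filtering and intersection could be sketched like this in Python. Everything about the data layout is an assumption for illustration: each candidate cousin's tree is reduced to a set of ancestor names, the cousin names and the "Unknown ..." and "Unrelated ..." entries are hypothetical placeholders, and only the three known family groups come from the text above.

```python
# The three known family groups at the 5th generation (from the text above).
KNOWN_FAMILIES = {
    "Narcisse Guimond", "Céleste Sévigny",
    "Basile Tousignant", "Marguerite Maillot",
    "Pierre Bélanger", "Thérèse Maillot",
}


def unexplained_cousins(cousin_ancestors, known=KNOWN_FAMILIES):
    """Keep only genetic 4th cousins whose trees contain none of the known families."""
    return {name: set(anc) for name, anc in cousin_ancestors.items()
            if not set(anc) & known}


def candidate_parents(cousin_ancestors, known=KNOWN_FAMILIES):
    """Ancestors present in every remaining cousin's tree."""
    trees = list(unexplained_cousins(cousin_ancestors, known).values())
    if not trees:
        return set()
    return set.intersection(*trees) - known


# Hypothetical placeholder data; none of these matches are real.
cousins = {
    "cousin_1": {"Unknown Father X", "Unknown Mother Y", "Unrelated A"},
    "cousin_2": {"Unknown Father X", "Unknown Mother Y", "Unrelated B"},
    "cousin_3": {"Narcisse Guimond", "Céleste Sévigny"},  # already explained
}
candidates = candidate_parents(cousins)
print(candidates)
```

Matching ancestors by name alone is a simplification; in practice the same person shows up under different spellings, so a real comparison would need dates and places as well.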

What if Célina is not Québecois?

One of the likely possibilities is that Célina is adopted and might not have a Québec background!  Instead, it's possible she's an Irish immigrant escaping the Irish potato famine.  There's no evidence to support this but the timing fits, and there was a huge influx of refugees from Ireland in 1847 (at which point Célina would be about 7 years old).  Nearly 100,000 immigrants came through quarantine at La Grosse Île, about 30 miles downstream from Québec, before being relocated to Québec, Canada West, and the US.

The mortality rate from typhus was huge: about 1 in 6 died during, or shortly after, the crossing.  So, it's also possible that she was orphaned in this way.     There were programs set up for adoption, typically run by the church, where (usually older) children were placed with families.  It was more of a "foster" system than adoption; the children were seen more as convenient labor than part of the family, and typically kept their original family names.   So Célina might be one of the exceptions: if - as I suspect - she was adopted/fostered by Moïse Boulé and Domitille Bernier (who do not appear to have had any children of their own), she took on their name (at least some of the time).

(Also, "orphan" has a slightly different meaning in the context of the situation:  some "orphans" had a living parent, but one who was not able to care for them (sickness or destitution), which makes the "adoption" more like foster care.)

But her Irishness can be tested too: if there's clearly Québec integration from the late 19th century (i.e., Célina's sibling also married into a Québec family and had children) but the only common ancestors are Irish and never Québecois, then that would lend support to the "Célina is an Irish immigrant" theory.

It doesn't appear there's much in the way of records showing which children were placed with which families.  Another avenue I have not attempted is to see if there are other children that might've been adopted/fostered by Moïse Boulé.  

UPDATE: 8/16/2018

No conclusive results - yet!   But an interesting thing came out of the DNA testing.  According to both Ancestry and 23andMe, my DNA is ~85% British/Irish and ~15% everything else.   Based on the locations of ancestors in my family tree, at the 4th generation, I'd expect that to be more like 75/25, UNLESS Célina is Irish, in which case it becomes more like 81/19 which STARTS to look like the actual results.

That doesn't "prove" anything but it's one more datum that supports the hypothesis.