In [1]:
import numpy as np
from IPython.display import HTML
from bokeh.plotting import output_notebook, show
import genomes_dnj.lct_interval.lct_plots as dm
output_notebook(hide_banner=True)

Lactase Persistence SNP Series

The notebooks in this analysis study the region of chromosome 2 that includes the gene LCT, which codes for lactase, the enzyme needed to digest lactose. The original reason for focusing on this region was the importance of the lactase persistence phenotype, which allows the digestion of milk to extend into adult life. The selective advantage of the genetic variations that result in this phenotype has been a prominent topic in human genetics. Most studies and analyses have assumed that lactase persistence is the result of the action of a single SNP and have paid little attention to the phenomena of SNP correlation and haplotypes. The results presented in this notebook come from an initial analysis, made before the SNP series hierarchies and associations covered in other notebooks of this analysis had been identified. When all of the results that have emerged from this analysis are taken together, the idea that multiple SNPs play a role in the lactase persistence phenotype seems a better fit with the data than the idea that the phenotype is a consequence of the action of a single SNP.

The SNP rs4988235 has generally been identified as the source of lactase persistence in European populations. The grouping method used for this analysis places rs4988235 with 10 other SNPs in a series that is expressed by 765 of the 5008 chromosome 2 samples in the 1000 Genomes phase 3 data. The samples that express 11_765 also express seven other series: 6_1503, 4_1699, 4_911, 26_1414, 64_1575, 10_2206, and 7_1868. This plot shows these series. A black line has been drawn at the location of each SNP. The green color indicates that a series is overexpressed in European populations. The aggregate series at the bottom of the plot shows a line for each of the 132 SNPs that are strongly associated with the expression of rs4988235.
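The series labels used above appear to encode two numbers: the count of SNPs in the series and the count of chromosome samples that express it. The sketch below assumes that `<snps>_<alleles>` reading, which is inferred from the tables in this notebook rather than taken from the genomes_dnj source, and checks that the eight series together account for the 132 SNPs of the aggregate series. The helper name `parse_series_label` is hypothetical.

```python
# Series labels quoted in the text of this notebook.
SERIES = ["11_765", "6_1503", "4_1699", "4_911",
          "26_1414", "64_1575", "10_2206", "7_1868"]


def parse_series_label(label):
    """Split a label such as '11_765' into (snp_count, allele_count).

    ASSUMPTION: labels follow the pattern '<snps>_<alleles>'. This is
    inferred from the tables, not from the genomes_dnj package.
    """
    snps, alleles = label.split("_")
    return int(snps), int(alleles)


# Sum the SNP counts over all eight series.
total_snps = sum(parse_series_label(label)[0] for label in SERIES)
print(total_snps)  # 132
```

Under that reading, 11_765 is the series of 11 SNPs (rs4988235 plus 10 others) expressed by 765 samples.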

In [2]:
plt0_obj = dm.lct_agg()
plt0 = plt0_obj.do_plot()
show(plt0)
Out[2]:

<Bokeh Notebook handle for In[2]>

This table shows the data for the individual series. Most were expressed by all 765 samples. None of those samples came from East Asian populations, and only 2 came from true African populations. However, 32 came from the American Southwest and Caribbean populations that the 1000 Genomes data counts as part of the African region. Those samples are counted in the column labeled "afx". The column labeled "sax" represents the part of the 1000 Genomes phase 3 South Asian region population that lives in the United States or the United Kingdom.

In [3]:
plt1_obj = dm.superset_11_765()

plt1 = plt1_obj.do_plot()
HTML(plt1_obj.get_html())
Out[3]:
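index | first | length | snps | alleles | matches | afr | afx | amr | eas | eur | sas | sax
353921 | 136,501,840 | 53,819 | 10 | 2206 | 0.35 765 1.00 | 2 0.01 | 32 0.67 | 146 1.38 | 0 0.00 | 484 3.15 | 56 1.01 | 45 0.48
354170 | 136,682,274 | 93,624 | 7 | 1868 | 0.41 764 1.00 | 2 0.01 | 32 0.67 | 146 1.38 | 0 0.00 | 484 3.15 | 56 1.01 | 44 0.47
353462 | 135,915,358 | 79,721 | 4 | 1699 | 0.45 765 1.00 | 2 0.01 | 32 0.67 | 146 1.38 | 0 0.00 | 484 3.15 | 56 1.01 | 45 0.48
353901 | 136,494,186 | 271,765 | 64 | 1575 | 0.49 765 1.00 | 2 0.01 | 32 0.67 | 146 1.38 | 0 0.00 | 484 3.15 | 56 1.01 | 45 0.48
353283 | 135,771,974 | 368,330 | 6 | 1503 | 0.51 765 1.00 | 2 0.01 | 32 0.67 | 146 1.38 | 0 0.00 | 484 3.15 | 56 1.01 | 45 0.48
353797 | 136,398,174 | 75,924 | 26 | 1414 | 0.54 765 1.00 | 2 0.01 | 32 0.67 | 146 1.38 | 0 0.00 | 484 3.15 | 56 1.01 | 45 0.48
353604 | 136,092,061 | 315,418 | 4 | 911 | 0.84 762 1.00 | 2 0.01 | 31 0.65 | 146 1.38 | 0 0.00 | 484 3.16 | 54 0.97 | 45 0.48
353380 | 135,837,906 | 870,076 | 11 | 765 | 1.00 765 1.00 | 2 0.01 | 32 0.67 | 146 1.38 | 0 0.00 | 484 3.15 | 56 1.01 | 45 0.48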
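The notebook does not define the second number in each region column or the first number in the "matches" column. The sketch below shows one reading that is consistent with the figures in the table: the region value as the series frequency within the region divided by its frequency over all 5008 samples, and the matches value as the share of a series' alleles that also express 11_765. Both function names are hypothetical, and the figure of 1006 European chromosomes (503 individuals) is an assumption taken from the 1000 Genomes phase 3 sample counts, not from this notebook.

```python
TOTAL_CHROMOSOMES = 5008   # phase 3 chromosome samples, from the text
SERIES_ALLELES = 765       # samples expressing the 11_765 series


def relative_frequency(region_count, region_chromosomes,
                       series_alleles=SERIES_ALLELES,
                       total=TOTAL_CHROMOSOMES):
    """Frequency inside a region divided by the frequency over all samples.

    ASSUMPTION: this is an interpretation of the table, not the
    documented genomes_dnj calculation.
    """
    return (region_count / region_chromosomes) / (series_alleles / total)


def match_fraction(matches, series_alleles):
    """Share of a series' alleles that also express the 11_765 series."""
    return matches / series_alleles


# ASSUMPTION: 1006 chromosomes = 503 European individuals in phase 3.
print(round(relative_frequency(484, 1006), 2))  # 3.15, the eur value
print(round(match_fraction(765, 2206), 2))      # 0.35, the 10_2206 row
```

A value above 1 would then mean the series is more common in that region than in the data set as a whole, which fits the European overexpression described for the plot.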

Series SNPs

For each series, one table shows summary data for the series and a second table shows the data for each SNP in the series.

The column labeled "niv" in the SNP data tables stands for not_expressed_is_variant. The SNP allele with the lowest frequency in the 1000 Genomes data is always treated as the variant, even when the reference genome labels the more common allele as the variant. An niv value of 1 means that the allele treated as the variant for this analysis is the one that the reference genome considers to be the standard. Note that the word "allele" is used here to indicate a single chromosome sample that expresses an SNP or a series.
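The niv rule can be sketched as a small function. This is a minimal illustration of the rule as described above, not the genomes_dnj implementation: the function name `minor_allele_view` is hypothetical, and treating an exact 50/50 split as niv 0 is an arbitrary choice made for the sketch.

```python
TOTAL_CHROMOSOMES = 5008  # phase 3 chromosome samples, from the text


def minor_allele_view(non_reference_count, total=TOTAL_CHROMOSOMES):
    """Return (niv, alleles) for one SNP.

    non_reference_count is the number of chromosome samples carrying
    the allele that the reference genome labels as the variant.
    The rarer allele is always the one counted in 'alleles'.
    """
    if non_reference_count > total / 2:
        # The reference allele is the rarer one, so it is treated
        # as the variant and niv is set to 1.
        return 1, total - non_reference_count
    return 0, non_reference_count


# rs4988235 is listed with niv 0 and 808 alleles.
print(minor_allele_view(808))
# A SNP listed with niv 1 and 785 alleles would have 5008 - 785 = 4223
# samples carrying the non-reference allele.
print(minor_allele_view(4223))
```

With this convention the "alleles" column never exceeds half of the 5008 samples, whichever allele the reference genome happens to carry.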

In [4]:
import genomes_dnj.lct_interval.lct_series_html as dh
HTML(dh.lct_html)
Out[4]:

11_765 series

index | first | length | snps | alleles | afr | afx | amr | eas | eur | sas | sax
353380 | 135,837,906 | 870,076 | 11 | 765 | 2 0.01 | 32 0.67 | 146 1.38 | 0 0.00 | 484 3.15 | 56 1.01 | 45 0.48

11_765 series snps

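index | pos | id | niv | alleles | afr | afx | amr | eas | eur | sas | sax
875450 | 135,837,906 | rs7570971 | 1 | 785 | 2 0.01 | 32 0.65 | 149 1.37 | 0 0.00 | 493 3.13 | 60 1.05 | 49 0.51
875770 | 135,907,088 | rs6730157 | 1 | 791 | 2 0.01 | 32 0.65 | 151 1.38 | 0 0.00 | 493 3.10 | 61 1.06 | 52 0.54
875981 | 135,954,797 | rs1375131 | 1 | 789 | 2 0.01 | 32 0.65 | 149 1.36 | 0 0.00 | 492 3.10 | 61 1.06 | 53 0.55
876953 | 136,138,627 | rs3940549 | 1 | 785 | 2 0.01 | 32 0.65 | 150 1.38 | 0 0.00 | 492 3.12 | 61 1.07 | 48 0.50
877231 | 136,176,540 | rs13384711 | 1 | 795 | 3 0.02 | 32 0.64 | 151 1.37 | 0 0.00 | 493 3.09 | 63 1.09 | 53 0.54
877937 | 136,328,890 | rs56369224 | 1 | 793 | 2 0.01 | 32 0.64 | 151 1.37 | 2 0.01 | 492 3.09 | 62 1.08 | 52 0.53
878126 | 136,381,348 | rs12465802 | 1 | 795 | 2 0.01 | 32 0.64 | 151 1.37 | 0 0.00 | 496 3.11 | 62 1.07 | 52 0.53
878351 | 136,429,366 | rs62168795 | 1 | 807 | 2 0.01 | 34 0.67 | 154 1.38 | 0 0.00 | 506 3.12 | 60 1.02 | 51 0.52
879308 | 136,608,646 | rs4988235 | 0 | 808 | 2 0.01 | 34 0.67 | 150 1.34 | 0 0.00 | 511 3.15 | 60 1.02 | 51 0.51
879345 | 136,616,754 | rs182549 | 0 | 818 | 2 0.01 | 34 0.66 | 153 1.35 | 0 0.00 | 512 3.12 | 63 1.06 | 54 0.54
879828 | 136,707,982 | rs6754311 | 1 | 821 | 0 0.00 | 34 0.66 | 154 1.35 | 0 0.00 | 514 3.12 | 62 1.04 | 57 0.57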

4_911 series

index | first | length | snps | alleles | afr | afx | amr | eas | eur | sas | sax
353604 | 136,092,061 | 315,418 | 4 | 911 | 2 0.01 | 31 0.54 | 205 1.62 | 0 0.00 | 524 2.86 | 71 1.07 | 78 0.70

4_911 series snps

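index | pos | id | niv | alleles | afr | afx | amr | eas | eur | sas | sax
876701 | 136,092,061 | rs1561277 | 1 | 962 | 2 0.01 | 32 0.53 | 207 1.55 | 0 0.00 | 534 2.76 | 85 1.22 | 102 0.86
877683 | 136,273,578 | rs6735329 | 1 | 938 | 2 0.01 | 33 0.56 | 207 1.59 | 0 0.00 | 533 2.83 | 78 1.14 | 85 0.74
877903 | 136,322,676 | rs6759321 | 1 | 968 | 2 0.01 | 32 0.53 | 207 1.54 | 1 0.01 | 534 2.75 | 89 1.26 | 103 0.87
878243 | 136,407,479 | rs1446585 | 1 | 951 | 2 0.01 | 34 0.57 | 209 1.59 | 0 0.00 | 542 2.84 | 79 1.14 | 85 0.73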

6_1503 series

index | first | length | snps | alleles | afr | afx | amr | eas | eur | sas | sax
353283 | 135,771,974 | 368,330 | 6 | 1503 | 14 0.05 | 37 0.39 | 307 1.47 | 306 1.01 | 608 2.01 | 109 1.00 | 122 0.66

6_1503 series snps

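index | pos | id | niv | alleles | afr | afx | amr | eas | eur | sas | sax
875075 | 135,771,974 | rs10187402 | 1 | 1528 | 14 0.05 | 37 0.39 | 308 1.45 | 311 1.01 | 615 2.00 | 113 1.02 | 130 0.69
875541 | 135,859,371 | rs13413101 | 1 | 1528 | 9 0.03 | 38 0.40 | 313 1.48 | 313 1.02 | 613 2.00 | 113 1.02 | 129 0.69
875644 | 135,877,562 | rs1869829 | 1 | 1544 | 38 0.12 | 43 0.44 | 314 1.47 | 305 0.98 | 608 1.96 | 111 0.99 | 125 0.66
876349 | 136,022,798 | rs935613 | 1 | 1560 | 41 0.13 | 43 0.44 | 315 1.46 | 312 0.99 | 609 1.94 | 113 1.00 | 127 0.66
876551 | 136,058,820 | rs2016636 | 1 | 1559 | 39 0.12 | 43 0.44 | 316 1.46 | 312 0.99 | 611 1.95 | 111 0.98 | 127 0.66
876971 | 136,140,304 | rs6430571 | 1 | 1564 | 40 0.13 | 44 0.45 | 317 1.46 | 316 1.00 | 610 1.94 | 111 0.98 | 126 0.66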

4_1699 series

index | first | length | snps | alleles | afr | afx | amr | eas | eur | sas | sax
353462 | 135,915,358 | 79,721 | 4 | 1699 | 151 0.44 | 78 0.73 | 320 1.36 | 309 0.90 | 609 1.78 | 110 0.89 | 122 0.59

4_1699 series snps

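index | pos | id | niv | alleles | afr | afx | amr | eas | eur | sas | sax
875801 | 135,915,358 | rs6747073 | 1 | 1724 | 160 0.46 | 82 0.76 | 324 1.36 | 313 0.90 | 609 1.76 | 110 0.88 | 126 0.60
875978 | 135,954,405 | rs1375132 | 1 | 1727 | 159 0.46 | 79 0.73 | 327 1.37 | 315 0.91 | 610 1.76 | 111 0.88 | 126 0.60
876197 | 135,995,073 | rs12471508 | 1 | 1742 | 164 0.47 | 81 0.74 | 328 1.36 | 316 0.90 | 610 1.74 | 113 0.89 | 130 0.61
876198 | 135,995,079 | rs7609517 | 1 | 1840 | 187 0.50 | 89 0.77 | 345 1.35 | 325 0.88 | 633 1.71 | 122 0.91 | 139 0.62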

26_1414 series

index | first | length | snps | alleles | afr | afx | amr | eas | eur | sas | sax
353797 | 136,398,174 | 75,924 | 26 | 1414 | 93 0.33 | 60 0.68 | 288 1.47 | 128 0.45 | 608 2.14 | 111 1.08 | 126 0.73

26_1414 series snps

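index | pos | id | niv | alleles | afr | afx | amr | eas | eur | sas | sax
878194 | 136,398,174 | rs12619365 | 1 | 1402 | 78 0.28 | 56 0.64 | 289 1.49 | 130 0.46 | 608 2.16 | 112 1.10 | 129 0.75
878208 | 136,402,117 | rs4954276 | 1 | 1399 | 78 0.28 | 56 0.64 | 289 1.49 | 129 0.46 | 607 2.16 | 112 1.10 | 128 0.75
878220 | 136,403,749 | rs13413639 | 1 | 1424 | 95 0.33 | 61 0.68 | 289 1.46 | 128 0.45 | 609 2.13 | 112 1.08 | 130 0.74
878251 | 136,409,073 | rs4954279 | 1 | 1423 | 95 0.33 | 61 0.68 | 289 1.47 | 130 0.45 | 607 2.12 | 112 1.08 | 129 0.74
878256 | 136,410,299 | rs10928542 | 1 | 1423 | 95 0.33 | 61 0.68 | 289 1.47 | 129 0.45 | 608 2.13 | 112 1.08 | 129 0.74
878278 | 136,413,359 | rs7608045 | 1 | 1421 | 95 0.33 | 60 0.67 | 289 1.47 | 128 0.45 | 608 2.13 | 112 1.08 | 129 0.74
878294 | 136,416,855 | rs6715856 | 1 | 1421 | 95 0.33 | 60 0.67 | 289 1.47 | 129 0.45 | 608 2.13 | 112 1.08 | 128 0.73
878295 | 136,416,941 | rs6755383 | 1 | 1421 | 95 0.33 | 60 0.67 | 289 1.47 | 129 0.45 | 608 2.13 | 112 1.08 | 128 0.73
878304 | 136,418,348 | rs6720287 | 1 | 1420 | 95 0.33 | 60 0.67 | 289 1.47 | 128 0.45 | 608 2.13 | 112 1.09 | 128 0.74
878308 | 136,419,961 | rs1446584 | 1 | 1421 | 96 0.34 | 60 0.67 | 289 1.47 | 128 0.45 | 608 2.13 | 112 1.08 | 128 0.73
878313 | 136,420,690 | rs4954280 | 1 | 1419 | 94 0.33 | 60 0.67 | 289 1.47 | 128 0.45 | 608 2.13 | 112 1.09 | 128 0.74
878322 | 136,422,171 | rs2034276 | 1 | 1422 | 96 0.34 | 60 0.67 | 289 1.47 | 128 0.45 | 608 2.13 | 112 1.08 | 129 0.74
878348 | 136,428,460 | rs656326 | 1 | 1421 | 96 0.34 | 60 0.67 | 289 1.47 | 128 0.45 | 607 2.13 | 112 1.08 | 129 0.74
878362 | 136,430,866 | rs313522 | 1 | 1422 | 96 0.34 | 60 0.67 | 289 1.47 | 129 0.45 | 608 2.13 | 112 1.08 | 128 0.73
878364 | 136,432,103 | rs313523 | 1 | 1422 | 96 0.34 | 60 0.67 | 289 1.47 | 129 0.45 | 608 2.13 | 112 1.08 | 128 0.73
878389 | 136,437,507 | rs313517 | 1 | 1422 | 96 0.34 | 60 0.67 | 289 1.47 | 129 0.45 | 608 2.13 | 112 1.08 | 128 0.73
878398 | 136,439,090 | rs313518 | 1 | 1422 | 95 0.33 | 60 0.67 | 289 1.47 | 129 0.45 | 608 2.13 | 112 1.08 | 129 0.74
878402 | 136,439,517 | rs313519 | 1 | 1368 | 57 0.21 | 49 0.57 | 285 1.50 | 129 0.47 | 608 2.21 | 112 1.13 | 128 0.76
878430 | 136,444,123 | rs313524 | 1 | 1422 | 95 0.33 | 60 0.67 | 288 1.46 | 129 0.45 | 609 2.13 | 112 1.08 | 129 0.74
878443 | 136,445,526 | rs313526 | 1 | 1420 | 93 0.33 | 60 0.67 | 288 1.46 | 129 0.45 | 609 2.13 | 112 1.09 | 129 0.74
878445 | 136,445,869 | rs313528 | 1 | 1421 | 94 0.33 | 60 0.67 | 288 1.46 | 129 0.45 | 609 2.13 | 112 1.08 | 129 0.74
878486 | 136,455,600 | rs71348714 | 0 | 1367 | 56 0.20 | 49 0.57 | 285 1.50 | 129 0.47 | 609 2.22 | 111 1.12 | 128 0.76
878492 | 136,456,642 | rs12185712 | 0 | 1385 | 93 0.33 | 58 0.67 | 284 1.48 | 129 0.46 | 585 2.10 | 108 1.07 | 128 0.75
878514 | 136,462,441 | rs7355359 | 0 | 1368 | 56 0.20 | 49 0.57 | 285 1.50 | 129 0.47 | 610 2.22 | 111 1.12 | 128 0.76
878556 | 136,470,714 | rs13412397 | 0 | 1421 | 93 0.33 | 60 0.67 | 292 1.48 | 127 0.44 | 610 2.14 | 111 1.07 | 128 0.73
878567 | 136,474,098 | rs11679508 | 0 | 1421 | 93 0.33 | 60 0.67 | 292 1.48 | 127 0.44 | 610 2.14 | 111 1.07 | 128 0.73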

10_2206 series

index | first | length | snps | alleles | afr | afx | amr | eas | eur | sas | sax
353921 | 136,501,840 | 53,819 | 10 | 2206 | 274 0.62 | 109 0.79 | 332 1.09 | 443 1.00 | 657 1.48 | 162 1.01 | 229 0.85

10_2206 series snps

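index | pos | id | niv | alleles | afr | afx | amr | eas | eur | sas | sax
878705 | 136,501,840 | rs10928545 | 0 | 2208 | 275 0.62 | 109 0.79 | 332 1.09 | 443 1.00 | 659 1.49 | 162 1.01 | 228 0.84
878719 | 136,505,546 | rs3820794 | 0 | 2406 | 274 0.57 | 109 0.72 | 352 1.06 | 612 1.26 | 657 1.36 | 168 0.96 | 234 0.79
878786 | 136,516,748 | rs12616520 | 0 | 2206 | 274 0.62 | 109 0.79 | 332 1.09 | 443 1.00 | 657 1.48 | 162 1.01 | 229 0.85
878813 | 136,522,710 | rs9287442 | 0 | 2207 | 274 0.62 | 109 0.79 | 332 1.09 | 443 1.00 | 658 1.48 | 162 1.01 | 229 0.85
878846 | 136,528,004 | rs2304599 | 0 | 2415 | 280 0.58 | 111 0.73 | 352 1.05 | 612 1.26 | 659 1.36 | 168 0.96 | 233 0.79
878905 | 136,539,513 | rs10188066 | 0 | 2406 | 275 0.57 | 109 0.72 | 353 1.06 | 611 1.26 | 657 1.36 | 168 0.96 | 233 0.79
878929 | 136,544,752 | rs6760329 | 0 | 2407 | 275 0.57 | 109 0.72 | 351 1.05 | 612 1.26 | 659 1.36 | 168 0.96 | 233 0.79
878937 | 136,546,110 | rs2278544 | 0 | 2211 | 275 0.62 | 110 0.79 | 332 1.08 | 443 1.00 | 660 1.49 | 162 1.01 | 229 0.84
878979 | 136,553,529 | rs1030764 | 0 | 2414 | 275 0.57 | 110 0.73 | 355 1.06 | 614 1.26 | 660 1.36 | 168 0.96 | 232 0.78
878998 | 136,555,659 | rs2322659 | 0 | 2217 | 284 0.64 | 114 0.82 | 324 1.05 | 466 1.04 | 642 1.44 | 161 1.00 | 226 0.83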

64_1575 series

index | first | length | snps | alleles | afr | afx | amr | eas | eur | sas | sax
353901 | 136,494,186 | 271,765 | 64 | 1575 | 55 0.17 | 49 0.50 | 257 1.18 | 354 1.12 | 584 1.85 | 114 1.00 | 162 0.84

64_1575 series snps

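index | pos | id | niv | alleles | afr | afx | amr | eas | eur | sas | sax
878662 | 136,494,186 | rs10202489 | 0 | 1803 | 95 0.26 | 61 0.54 | 310 1.24 | 386 1.06 | 618 1.71 | 139 1.06 | 194 0.88
878669 | 136,495,300 | rs12469709 | 0 | 1848 | 127 0.34 | 67 0.58 | 315 1.23 | 394 1.06 | 617 1.66 | 138 1.03 | 190 0.84
878670 | 136,495,619 | rs10195620 | 0 | 1758 | 62 0.18 | 50 0.45 | 307 1.26 | 394 1.11 | 617 1.75 | 138 1.08 | 190 0.88
878692 | 136,499,166 | rs1438307 | 0 | 1806 | 97 0.27 | 61 0.54 | 311 1.24 | 393 1.08 | 616 1.70 | 138 1.05 | 190 0.86
878710 | 136,502,792 | rs1438305 | 0 | 1808 | 97 0.27 | 61 0.54 | 311 1.24 | 394 1.08 | 617 1.70 | 138 1.05 | 190 0.86
878711 | 136,503,157 | rs1438304 | 0 | 1809 | 98 0.27 | 61 0.54 | 311 1.24 | 394 1.08 | 617 1.70 | 138 1.05 | 190 0.86
878724 | 136,507,039 | rs12998387 | 0 | 1809 | 98 0.27 | 61 0.54 | 311 1.24 | 394 1.08 | 616 1.70 | 138 1.05 | 191 0.86
878753 | 136,511,575 | rs3213889 | 0 | 1809 | 97 0.27 | 61 0.54 | 311 1.24 | 394 1.08 | 617 1.70 | 138 1.05 | 191 0.86
878793 | 136,518,103 | rs12994270 | 0 | 1815 | 97 0.27 | 61 0.54 | 311 1.24 | 400 1.09 | 616 1.69 | 138 1.05 | 192 0.86
878803 | 136,521,514 | rs10192827 | 0 | 1734 | 43 0.12 | 45 0.41 | 307 1.28 | 394 1.13 | 616 1.77 | 138 1.09 | 191 0.90
878812 | 136,522,675 | rs10173394 | 0 | 1809 | 97 0.27 | 61 0.54 | 311 1.24 | 394 1.08 | 617 1.70 | 138 1.05 | 191 0.86
878817 | 136,522,941 | rs28453840 | 0 | 1758 | 61 0.17 | 50 0.45 | 307 1.26 | 394 1.11 | 617 1.75 | 138 1.08 | 191 0.89
878842 | 136,526,981 | rs55825471 | 0 | 1826 | 113 0.31 | 63 0.55 | 313 1.24 | 394 1.07 | 614 1.67 | 138 1.04 | 191 0.85
878887 | 136,535,410 | rs71348715 | 0 | 1829 | 113 0.31 | 63 0.55 | 313 1.23 | 394 1.07 | 617 1.68 | 138 1.04 | 191 0.85
878901 | 136,539,122 | rs11684545 | 0 | 1830 | 114 0.31 | 63 0.55 | 313 1.23 | 394 1.07 | 616 1.68 | 139 1.05 | 191 0.85
878924 | 136,544,197 | rs10928546 | 0 | 1761 | 63 0.18 | 51 0.46 | 306 1.25 | 395 1.11 | 617 1.74 | 138 1.08 | 191 0.88
878956 | 136,550,109 | rs2082729 | 0 | 1812 | 98 0.27 | 61 0.54 | 311 1.24 | 395 1.08 | 617 1.70 | 138 1.05 | 192 0.86
878963 | 136,551,694 | rs12998016 | 0 | 1833 | 114 0.31 | 64 0.56 | 314 1.24 | 394 1.07 | 618 1.68 | 138 1.04 | 191 0.85
878976 | 136,553,188 | rs1030765 | 0 | 1926 | 141 0.36 | 73 0.60 | 319 1.20 | 390 1.01 | 629 1.63 | 155 1.11 | 219 0.93
878980 | 136,553,639 | rs1011361 | 0 | 1814 | 98 0.27 | 61 0.54 | 313 1.25 | 396 1.08 | 618 1.70 | 138 1.05 | 190 0.85
878992 | 136,554,800 | rs6430589 | 0 | 1834 | 114 0.31 | 64 0.56 | 316 1.24 | 395 1.07 | 618 1.68 | 137 1.03 | 190 0.84
879004 | 136,557,319 | rs2322660 | 0 | 1866 | 159 0.42 | 77 0.66 | 302 1.17 | 401 1.07 | 600 1.60 | 131 0.97 | 196 0.86
879006 | 136,558,157 | rs748841 | 0 | 1864 | 159 0.42 | 78 0.67 | 303 1.17 | 401 1.07 | 599 1.60 | 131 0.97 | 193 0.84
879055 | 136,569,848 | rs12988076 | 0 | 1784 | 97 0.27 | 62 0.55 | 300 1.21 | 403 1.12 | 599 1.67 | 130 1.00 | 193 0.88
879078 | 136,575,199 | rs6719488 | 0 | 1783 | 97 0.27 | 61 0.55 | 300 1.21 | 403 1.12 | 599 1.67 | 130 1.00 | 193 0.88
879087 | 136,576,577 | rs892715 | 0 | 1783 | 97 0.27 | 61 0.55 | 300 1.21 | 403 1.12 | 599 1.67 | 130 1.00 | 193 0.88
879097 | 136,578,536 | rs4954450 | 0 | 1788 | 97 0.27 | 61 0.54 | 300 1.21 | 403 1.12 | 599 1.67 | 131 1.01 | 197 0.90
879107 | 136,580,287 | rs2164210 | 0 | 1788 | 97 0.27 | 61 0.54 | 300 1.21 | 403 1.12 | 599 1.67 | 131 1.01 | 197 0.90
879126 | 136,583,192 | rs745500 | 0 | 1787 | 97 0.27 | 61 0.54 | 300 1.21 | 402 1.12 | 599 1.67 | 131 1.01 | 197 0.90
879158 | 136,586,958 | rs10186843 | 0 | 1787 | 97 0.27 | 61 0.54 | 300 1.21 | 402 1.12 | 599 1.67 | 131 1.01 | 197 0.90
879167 | 136,588,478 | rs10207652 | 0 | 1767 | 82 0.23 | 57 0.51 | 299 1.22 | 402 1.13 | 599 1.69 | 131 1.02 | 197 0.91
879181 | 136,589,612 | rs12620033 | 0 | 1756 | 102 0.29 | 62 0.56 | 279 1.15 | 385 1.09 | 600 1.70 | 131 1.03 | 197 0.92
879188 | 136,591,178 | rs112742092 | 1 | 1899 | 180 0.47 | 97 0.81 | 286 1.09 | 406 1.06 | 603 1.58 | 130 0.94 | 197 0.85
879193 | 136,591,859 | rs62159034 | 0 | 1756 | 101 0.29 | 62 0.56 | 280 1.15 | 385 1.09 | 600 1.70 | 131 1.03 | 197 0.92
879217 | 136,593,760 | rs6730196 | 1 | 1860 | 170 0.45 | 96 0.82 | 283 1.10 | 383 1.02 | 600 1.61 | 131 0.97 | 197 0.86
879220 | 136,594,158 | rs2236783 | 0 | 1745 | 101 0.29 | 62 0.57 | 279 1.15 | 375 1.07 | 600 1.71 | 131 1.03 | 197 0.92
879266 | 136,603,276 | rs3754686 | 0 | 1744 | 99 0.28 | 61 0.56 | 281 1.16 | 386 1.10 | 595 1.70 | 130 1.03 | 192 0.90
879268 | 136,603,366 | rs3769005 | 0 | 1845 | 189 0.51 | 77 0.67 | 285 1.11 | 378 1.02 | 595 1.61 | 129 0.96 | 192 0.85
879305 | 136,608,231 | rs4954490 | 0 | 1737 | 104 0.30 | 60 0.55 | 281 1.17 | 378 1.08 | 593 1.70 | 129 1.02 | 192 0.90
879314 | 136,609,975 | rs4954493 | 0 | 1741 | 105 0.30 | 61 0.56 | 281 1.16 | 378 1.08 | 595 1.70 | 129 1.02 | 192 0.90
879316 | 136,610,598 | rs4988226 | 0 | 1685 | 99 0.29 | 61 0.58 | 268 1.15 | 378 1.11 | 587 1.73 | 118 0.96 | 174 0.84
879323 | 136,611,624 | rs309178 | 0 | 1679 | 95 0.28 | 61 0.58 | 267 1.15 | 378 1.12 | 587 1.74 | 117 0.96 | 174 0.85
879332 | 136,613,780 | rs309179 | 0 | 1690 | 104 0.31 | 61 0.58 | 268 1.14 | 378 1.11 | 587 1.73 | 117 0.95 | 175 0.84
879335 | 136,614,255 | rs309180 | 0 | 1684 | 98 0.29 | 62 0.59 | 268 1.15 | 378 1.12 | 587 1.74 | 117 0.96 | 174 0.84
879337 | 136,614,813 | rs309181 | 0 | 1689 | 104 0.31 | 61 0.58 | 268 1.15 | 378 1.11 | 587 1.73 | 117 0.95 | 174 0.84
879349 | 136,617,524 | rs309173 | 0 | 1689 | 104 0.31 | 61 0.58 | 268 1.15 | 378 1.11 | 587 1.73 | 117 0.95 | 174 0.84
879379 | 136,622,216 | rs309176 | 0 | 1694 | 99 0.29 | 61 0.57 | 268 1.14 | 387 1.14 | 587 1.73 | 118 0.96 | 174 0.84
879400 | 136,625,602 | rs309811 | 0 | 1699 | 104 0.30 | 61 0.57 | 268 1.14 | 387 1.13 | 587 1.72 | 118 0.96 | 174 0.84
879427 | 136,629,911 | rs309128 | 0 | 1685 | 99 0.29 | 61 0.58 | 268 1.15 | 379 1.12 | 587 1.73 | 117 0.96 | 174 0.84
879434 | 136,630,757 | rs680428 | 0 | 1679 | 99 0.29 | 61 0.58 | 268 1.15 | 379 1.12 | 587 1.74 | 116 0.95 | 169 0.82
879436 | 136,630,989 | rs188680 | 0 | 1679 | 99 0.29 | 61 0.58 | 268 1.15 | 379 1.12 | 587 1.74 | 116 0.95 | 169 0.82
879437 | 136,631,031 | rs309130 | 0 | 1682 | 99 0.29 | 62 0.59 | 270 1.16 | 378 1.12 | 587 1.74 | 117 0.96 | 169 0.82
879450 | 136,633,771 | rs191079 | 1 | 1626 | 58 0.18 | 51 0.50 | 264 1.17 | 379 1.16 | 587 1.80 | 116 0.98 | 171 0.86
879465 | 136,636,324 | rs55830620 | 1 | 1684 | 104 0.31 | 62 0.59 | 268 1.15 | 376 1.11 | 587 1.74 | 116 0.95 | 171 0.83
879483 | 136,638,216 | rs632632 | 1 | 1693 | 104 0.31 | 63 0.59 | 268 1.14 | 385 1.13 | 586 1.72 | 116 0.94 | 171 0.82
879492 | 136,640,233 | rs666407 | 1 | 1688 | 106 0.31 | 62 0.59 | 268 1.15 | 377 1.11 | 587 1.73 | 117 0.95 | 171 0.83
879500 | 136,641,882 | rs1435576 | 1 | 1684 | 104 0.31 | 62 0.59 | 268 1.15 | 376 1.11 | 587 1.74 | 116 0.95 | 171 0.83
879509 | 136,643,555 | rs309125 | 1 | 1740 | 148 0.42 | 68 0.62 | 268 1.11 | 384 1.10 | 588 1.68 | 116 0.92 | 168 0.79
879594 | 136,658,345 | rs218174 | 1 | 1548 | 7 0.02 | 38 0.39 | 253 1.18 | 374 1.20 | 589 1.89 | 119 1.06 | 168 0.89
879651 | 136,670,298 | rs309168 | 1 | 1619 | 59 0.18 | 48 0.47 | 259 1.15 | 376 1.15 | 589 1.81 | 120 1.02 | 168 0.85
879711 | 136,685,228 | rs309160 | 1 | 1627 | 59 0.18 | 48 0.47 | 259 1.15 | 384 1.17 | 589 1.80 | 120 1.01 | 168 0.84
879761 | 136,696,138 | rs309145 | 1 | 1618 | 59 0.18 | 48 0.47 | 259 1.16 | 375 1.15 | 589 1.81 | 120 1.02 | 168 0.85
880007 | 136,740,900 | rs687670 | 1 | 1618 | 60 0.18 | 48 0.47 | 259 1.16 | 375 1.15 | 589 1.81 | 118 1.00 | 169 0.85
880134 | 136,765,951 | rs309137 | 1 | 1598 | 55 0.17 | 48 0.48 | 259 1.17 | 354 1.10 | 581 1.81 | 120 1.03 | 181 0.92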

7_1868 series

index | first | length | snps | alleles | afr | afx | amr | eas | eur | sas | sax
354170 | 136,682,274 | 93,624 | 7 | 1868 | 238 0.63 | 113 0.96 | 266 1.03 | 375 1.00 | 589 1.57 | 119 0.88 | 168 0.73

7_1868 series snps

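index | pos | id | niv | alleles | afr | afx | amr | eas | eur | sas | sax
879698 | 136,682,274 | rs192822 | 1 | 1870 | 238 0.63 | 113 0.96 | 266 1.03 | 376 1.00 | 589 1.57 | 120 0.88 | 168 0.73
879743 | 136,691,825 | rs309164 | 1 | 2026 | 274 0.67 | 119 0.94 | 282 1.00 | 429 1.05 | 620 1.52 | 128 0.87 | 174 0.70
879775 | 136,698,098 | rs309149 | 1 | 2036 | 274 0.67 | 119 0.93 | 282 1.00 | 438 1.07 | 620 1.52 | 128 0.86 | 175 0.70
879914 | 136,721,603 | rs12615624 | 1 | 2027 | 274 0.67 | 119 0.94 | 282 1.00 | 430 1.05 | 620 1.52 | 127 0.86 | 175 0.70
879917 | 136,721,995 | rs13404551 | 1 | 1868 | 238 0.63 | 113 0.96 | 267 1.03 | 375 1.00 | 588 1.57 | 118 0.87 | 169 0.74
880075 | 136,755,684 | rs309134 | 1 | 2043 | 281 0.68 | 120 0.94 | 282 1.00 | 438 1.07 | 620 1.51 | 127 0.86 | 175 0.70
880178 | 136,775,898 | rs2322818 | 1 | 1846 | 231 0.62 | 113 0.98 | 263 1.03 | 359 0.97 | 586 1.58 | 121 0.90 | 173 0.76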

Lactase Persistence Chromosome Countries

This table shows the countries for the 765 lactase persistence chromosome samples. One interesting result is the high number of samples in the Colombian population from Medellín (CLM). Both of the true African samples come from the Gambian Western Division population (GWD) and seem almost certain to be the result of some kind of back migration. Both the African Caribbean in Barbados (ACB) and the African Ancestry in Southwest US (ASW) populations, which the 1000 Genomes data considers part of the African region, include significant numbers of samples that express the lactase persistence series. That observation was the reason for creating an African External region for those two populations.

In [5]:
cntx = plt1_obj.plot_context
HTML(cntx.get_country_html())
Out[5]:
pop | cnt | otp
ACB | 12 | 0.41
ASW | 20 | 1.07
BEB | 7 | 0.27
CDX | 0 | 0.00
CEU | 138 | 4.56
CHB | 0 | 0.00
CHS | 0 | 0.00
CLM | 57 | 1.98
ESN | 0 | 0.00
FIN | 115 | 3.80
GBR | 122 | 4.39
GIH | 27 | 0.86
GWD | 2 | 0.06
IBS | 91 | 2.78
ITU | 12 | 0.39
JPT | 0 | 0.00
KHV | 0 | 0.00
LWK | 0 | 0.00
MSL | 0 | 0.00
MXL | 30 | 1.53
PEL | 18 | 0.69
PJL | 49 | 1.67
PUR | 41 | 1.29
STU | 6 | 0.19
TSI | 18 | 0.55
YRI | 0 | 0.00
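As a cross-check, the country counts can be rolled up into the region columns used in the series tables. The sketch below assumes a population-to-region mapping: the afx members (ACB, ASW) come from the text above, the sax members (GIH, ITU, STU) follow from the description of South Asian populations sampled in the United States or the United Kingdom, and the rest follow the usual 1000 Genomes super-population assignments. The names `COUNTRY_COUNTS` and `REGION_OF` are hypothetical and are not part of genomes_dnj.

```python
from collections import Counter

# Non-zero counts copied from the country table in this notebook.
COUNTRY_COUNTS = {
    "ACB": 12, "ASW": 20, "BEB": 7, "CEU": 138, "CLM": 57, "FIN": 115,
    "GBR": 122, "GIH": 27, "GWD": 2, "IBS": 91, "ITU": 12, "MXL": 30,
    "PEL": 18, "PJL": 49, "PUR": 41, "STU": 6, "TSI": 18,
}

# ASSUMPTION: region membership as described in the lead-in above.
REGION_OF = {
    "GWD": "afr",
    "ACB": "afx", "ASW": "afx",
    "CLM": "amr", "MXL": "amr", "PEL": "amr", "PUR": "amr",
    "CEU": "eur", "FIN": "eur", "GBR": "eur", "IBS": "eur", "TSI": "eur",
    "BEB": "sas", "PJL": "sas",
    "GIH": "sax", "ITU": "sax", "STU": "sax",
}

# Add each population's count to its region total.
region_counts = Counter()
for pop, count in COUNTRY_COUNTS.items():
    region_counts[REGION_OF[pop]] += count

for region in ("afr", "afx", "amr", "eur", "sas", "sax"):
    print(region, region_counts[region])
print("total", sum(region_counts.values()))
```

Under this mapping the totals agree with the 11_765 row of the series table: 2, 32, 146, 484, 56, and 45 samples, 765 in all.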