Final paup block

The fourth (paup) block comprises an lset command that specifies the likelihood settings. The nst option specifies the number of substitution parameters, which is 1 for the JC model, and basefreq=equal specifies that base frequencies are assumed to be equal. Together, nst=1 and basefreq=equal specify the JC model, because the only other model with one substitution parameter is the F81 model (which has unequal base frequencies). The command lscores 1 tells PAUP* to compute likelihood scores for the first tree in memory (which is the one we entered in this file).
The keyword userbrlen tells PAUP* to use the branch lengths in the tree description (i.e., don't estimate branch lengths), and the sitelike keyword tells PAUP* to output the individual site likelihoods (the default behavior is to output only the overall likelihood). OK, go ahead and execute the file in PAUP*. Make sure that you understand the output.
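To make concrete what lscores computes when userbrlen and sitelike are in effect, here is a minimal Python sketch (not PAUP*'s code) that evaluates JC69 site likelihoods on a fixed four-taxon tree with user-supplied branch lengths. The tree ((A,B),(C,D)), its branch lengths and the four short sequences are all hypothetical. Each site likelihood is computed by brute force as a sum over the 16 possible base assignments to the two internal nodes, so every term of that sum is the probability of the site together with one particular ancestral reconstruction; real programs obtain the same sum more efficiently with Felsenstein's pruning algorithm.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(i: str, j: str, v: float) -> float:
    """JC69 transition probability from base i to base j along a branch of
    length v (expected substitutions per site)."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def ancestral_term(x, y, a, b, c, d, br):
    """Joint probability of tip states (a, b, c, d) AND one particular ancestral
    reconstruction: base x at internal node X, base y at internal node Y.
    Hypothetical tree ((A,B)X, (C,D)Y), rooted at X."""
    return (0.25  # frequency of x at the root; all four are equal under JC69
            * jc_prob(x, a, br["A"]) * jc_prob(x, b, br["B"])
            * jc_prob(x, y, br["XY"])
            * jc_prob(y, c, br["C"]) * jc_prob(y, d, br["D"]))


def site_likelihood(a, b, c, d, br):
    """Site likelihood = sum of ancestral_term over all 16 (x, y) reconstructions."""
    return sum(ancestral_term(x, y, a, b, c, d, br)
               for x, y in product(BASES, repeat=2))


# Hypothetical user-supplied branch lengths (the userbrlen idea) and sequences
branch_lengths = {"A": 0.1, "B": 0.2, "XY": 0.05, "C": 0.1, "D": 0.3}
seqs = {"A": "ACGTA", "B": "ACGTT", "C": "ACGAA", "D": "GCGAA"}

# Per-site log-likelihoods (the sitelike idea) and their total (the default output)
site_lnls = [math.log(site_likelihood(*(seqs[t][k] for t in "ABCD"), branch_lengths))
             for k in range(len(seqs["A"]))]
for k, lnl in enumerate(site_lnls, start=1):
    print(f"site {k}: lnL = {lnl:.4f}")
print(f"total lnL = {sum(site_lnls):.4f}")
```

Because JC69 is time-reversible, placing the root at X rather than anywhere else on the tree does not change the site likelihoods.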
Can you think of a way to get PAUP* to give you the likelihood for a particular site and ancestral character state reconstruction?

Return PAUP* to its factory default settings

In part A, we told PAUP* to use user-defined branch lengths and output site likelihoods whenever the lscores command was issued. PAUP* remembers these settings, and sometimes this causes unexpected results.
You can cause PAUP* to forget these changes to default settings in one of two ways:

1. restart PAUP*, or
2. use the factory command.

Issue the factory command now to make PAUP* revert to its factory default settings without having to quit and restart the program.

(Re)create the data file algae.nex

Download (again) the data file.
You may remember that we found last week that only one model (LogDet) gave us the accepted phylogeny for these data using various distance-based approaches (i.e., the chlorophyll-a/b-containing plastids group together, excluding the cyanobacterium Anacystis and the chromophyte chlorophyll-a/c-containing Olithodiscus). This week we will see if we can tease apart which aspects of sequence evolution are important for getting the tree correct.

Obtain the maximum likelihood tree under the F81 model

The first goal is to learn how to obtain maximum likelihood estimates of the parameters in several different substitution models. Use PAUP* to answer the following questions. Start by obtaining the maximum likelihood tree under the F81 model.
Create a run.nex file and save in it the following:

    #nexus
    begin paup;
      execute algae.nex;
      set criterion=likelihood;
      lset nst=1 basefreq=empirical;
      hsearch;
    end;

The setting nst=1 tells PAUP* that we want a model having just one substitution rate parameter (the JC69 and F81 models both fall in this category).
The setting basefreq=empirical tells PAUP* that we want to use simple estimates of the base frequencies. The empirical frequency of the base G, for example, is the value you would get if you simply counted up all the Gs in your entire data matrix and divided by the total number of nucleotides. The empirical frequencies are not usually the same as the maximum likelihood estimates (MLEs) of the base frequencies, but they are quick to calculate and often very close to the corresponding MLEs.

Execute run.nex in PAUP* and issue the following command to show the tree:

    showtrees;

One problem is that the tree is drawn in such a way that it appears to be rooted within the flowering plants (tobacco and rice). Specifying the cyanobacterium Anacystis as the outgroup makes more sense:

    outgroup Anacystisnidulans;
    showtrees;

Note that the branches are not drawn proportional to the expected number of substitutions. To fix this, use the describetrees command rather than the simpler showtrees command:

    descr 1 / plot=phylogram;

As with all PAUP* commands, it is usually not necessary to type the entire command name, only enough letters that PAUP* can determine unambiguously which command you want. Here, you typed descr instead of describetrees, and it worked just fine.

Note that you will work with this tree for quite a while. Resist the temptation to do heuristic searches with each model, as it will be important to compare the performance of all of the models on the same tree topology!
To be safe, save this tree to a file named f81.tre using the savetrees command:

    savetrees file=f81.tre brlens;

If you ever need to read this tree back in, use the gettrees command:

    gettrees file=f81.tre;

Now get PAUP* to show you the log-likelihood and the parameter values of the F81 model used in this analysis (the 1 here refers to tree 1 in memory):

    lscores 1;

What are the empirical base frequencies for this data set?
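Computing empirical base frequencies really is just counting. A minimal Python sketch, assuming the alignment is held as a dictionary of equal-length strings; the toy sequences are hypothetical (not the algae.nex data), and gaps and ambiguity codes are simply skipped, which may not match exactly how PAUP* handles them.

```python
from collections import Counter


def empirical_base_freqs(alignment: dict[str, str]) -> dict[str, float]:
    """Count A, C, G and T over every sequence and divide by the total number of
    unambiguous nucleotides. Gaps, N and other IUPAC codes are ignored here."""
    counts = Counter()
    for seq in alignment.values():
        counts.update(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical toy alignment
toy = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "ACG-AC", "t4": "GCGAAN"}
freqs = empirical_base_freqs(toy)
print({b: round(f, 4) for b, f in freqs.items()})
```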
What is the lnL of this tree under this 'empirical base frequencies' version of the F81 model? What proportion of sites are constant? (The cstatus command will give you this information.)

Estimate base frequencies

Now estimate the base frequencies on this tree with maximum likelihood as follows. Note how the lscores command is used to force PAUP* to recompute the likelihood (under the revised model) and display the parameter estimates:

    lset basefreq=estimate;
    lscores 1;
What are the maximum likelihood estimates (MLEs) of the base frequencies? What is the lnL of this tree under the 'estimated base frequencies' version of the F81 model? How many parameters are being estimated using the F81 model? Is this lnL better than the lnL under the 'empirical base frequencies' version of the F81 model?
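Counting parameters and comparing nested models can be done by hand, but a small sketch makes the bookkeeping explicit. This Python sketch assumes the usual convention of counting branch lengths (2n - 3 for an unrooted binary tree of n taxa) plus the free substitution-model parameters; base frequencies add three free parameters only when estimated by maximum likelihood (whether empirical frequencies should also be counted is a matter of convention). The helper name n_free_params is hypothetical, and the log-likelihoods in the example are made up. The likelihood ratio test uses the standard chi-square approximation, which is known to be unreliable when the simpler model fixes a parameter on a boundary of its range (such as pinvar = 0).

```python
import math


def n_free_params(n_taxa, nst=1, basefreq="equal", pinvar=False, gamma=False):
    """Hypothetical helper: branch lengths plus free substitution-model parameters.
    Empirical base frequencies are counted as 0 here (a convention choice)."""
    k = 2 * n_taxa - 3          # branch lengths of an unrooted binary tree
    if basefreq == "estimate":
        k += 3                  # four frequencies that must sum to 1
    if nst == 2:
        k += 1                  # transition/transversion rate ratio (kappa)
    if pinvar:
        k += 1                  # proportion of invariable sites
    if gamma:
        k += 1                  # gamma shape parameter
    return k


def chi2_sf(x, df):
    """Upper-tail chi-square probability, from the series for the regularized
    lower incomplete gamma function. Fine for moderate x (say, below ~1000)."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def lrt(lnl_simple, lnl_complex, df):
    """Likelihood ratio statistic 2 * (lnL1 - lnL0) and its chi-square p-value."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, df)


# Made-up log-likelihoods on a hypothetical 8-taxon tree
k_f81 = n_free_params(8, nst=1, basefreq="estimate")
k_hky = n_free_params(8, nst=2, basefreq="estimate")
stat, p = lrt(lnl_simple=-1000.0, lnl_complex=-990.0, df=k_hky - k_f81)
print(f"F81: {k_f81} params, HKY85: {k_hky} params, LR = {stat:.1f}, p = {p:.2g}")
```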
Estimate transition/transversion bias

Switch to the HKY85 model now and estimate the transition/transversion ratio along with the base frequencies. The way you specify the HKY85 model in PAUP* is to tell it you want a model with 2 substitution rate parameters (nst=2), and that you want to estimate the base frequencies (basefreq=estimate) and the transition/transversion ratio (tratio=estimate). Note that these specifications also apply to the F84 model, so if you want PAUP* to use the F84 model, you would need to add variant=f84 to the lset command.

    lset nst=2 basefreq=estimate tratio=estimate;
    lscores 1;

What is the MLE of the transition/transversion ratio under the HKY85 model?
What is the MLE of the transition/transversion rate ratio (kappa) under the HKY85 model? What is the lnL of this tree under the HKY85 model? How many parameters are being estimated using the HKY85 model?
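Under HKY85 the transition/transversion ratio and the rate ratio kappa are different quantities, linked through the base frequencies. A short Python sketch of the standard HKY85 relationship R = kappa * (piA*piG + piC*piT) / ((piA + piG) * (piC + piT)), where R is the expected ratio of transitions to transversions; the skewed base frequencies are hypothetical, and it is an assumption here that PAUP* converts between the two values it reports in exactly this way.

```python
def tstv_ratio_from_kappa(kappa, freqs):
    """Expected transition/transversion ratio under HKY85 for rate ratio kappa
    and base frequencies `freqs` (dict with keys "A", "C", "G", "T")."""
    pa, pc, pg, pt = (freqs[b] for b in "ACGT")
    transitions = kappa * (pa * pg + pc * pt)   # A<->G and C<->T
    transversions = (pa + pg) * (pc + pt)       # purine <-> pyrimidine
    return transitions / transversions


def kappa_from_tstv_ratio(ratio, freqs):
    """Inverse relation: the kappa that yields a given expected ts/tv ratio."""
    pa, pc, pg, pt = (freqs[b] for b in "ACGT")
    return ratio * (pa + pg) * (pc + pt) / (pa * pg + pc * pt)


equal = dict.fromkeys("ACGT", 0.25)
skewed = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}   # hypothetical frequencies
for kappa in (1.0, 4.0):
    print(kappa, tstv_ratio_from_kappa(kappa, equal), tstv_ratio_from_kappa(kappa, skewed))
```

With equal base frequencies the formula reduces to R = kappa / 2, so even with no transition bias (kappa = 1) one transition is expected for every two transversions, simply because there are twice as many kinds of transversion as transition.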
Does the HKY85 model fit the data better than the F81 model?

Estimate the proportion of invariable sites

Now ask PAUP* to estimate pinvar, the proportion of invariant sites, using the command lset pinvar=estimate (followed by lscores 1). The HKY85 model with among-site rate heterogeneity modeled using the two-category invariant sites approach is called the HKY85+I model. What is the MLE of pinvar under the HKY85+I model? Is the MLE of pinvar larger or smaller than the proportion of constant sites?
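The distinction between constant and invariant sites can be illustrated numerically. The Python sketch below makes two simplifying assumptions that do not hold for the algae data: JC69, and a hypothetical star tree in which every one of n taxa hangs from the root on a branch of the same length v. It computes the probability that a site that is free to vary nevertheless shows the same base in every taxon, and separately counts constant columns in a toy alignment (gaps and ambiguity codes are ignored, which may not match how cstatus treats them).

```python
import math


def jc_prob_same(v):
    """JC69 probability that a branch of length v ends in the base it started with."""
    return 0.25 + 0.75 * math.exp(-4.0 * v / 3.0)


def prob_constant_star(n_taxa, v, rate=1.0):
    """Probability that a variable site (relative rate `rate`) shows one base in
    all n_taxa tips of a star tree whose branches all have length v."""
    same = jc_prob_same(rate * v)
    diff = (1.0 - same) / 3.0
    # Tips all keep the root's base, or all switch to the same one of the other 3
    return same ** n_taxa + 3 * diff ** n_taxa


def proportion_constant(alignment):
    """Fraction of columns with at most one distinct base among A, C, G, T."""
    seqs = list(alignment.values())
    constant = sum(1 for column in zip(*seqs)
                   if len({ch for ch in column if ch in "ACGT"}) <= 1)
    return constant / len(seqs[0])


# Hypothetical star tree: 8 taxa, every branch 0.1 substitutions per site
for rate in (0.1, 1.0, 3.0):
    print(f"rate {rate}: P(constant) = {prob_constant_star(8, 0.1, rate):.3f}")

toy = {"t1": "ACGT", "t2": "ACGA", "t3": "ACTA"}   # hypothetical alignment
print(proportion_constant(toy))
```

A slowly evolving or short-branched site has a high chance of looking constant, so the constant sites in a data set are a mixture of truly invariant sites and variable sites that happened not to change.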
Why are these two proportions different? That is, how can a site be constant but not invariant? What is the lnL of this tree under the HKY85+I model? How many parameters are being estimated using the HKY85+I model?

Estimate the heterogeneity in rates among sites

Now set pinvar=0 and tell PAUP* to use the discrete gamma distribution with 5 rate categories. Here are the commands for doing this all in one step:

    lset pinvar=0 rates=gamma ncat=5 shape=estimate;
    lscores 1;

The HKY85 model with among-site rate heterogeneity modeled using the discrete gamma approach is called the HKY85+G model. What is the MLE of the gamma shape parameter under the HKY85+G model? What is the lnL of this tree under the HKY85+G model?
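In the discrete gamma approach, relative rates follow a gamma distribution with mean 1 and shape parameter alpha, which is cut into ncat categories of equal probability, each represented by a single rate. The stdlib-only Python sketch below represents each category by its mean rate (one common choice; a median-based variant also exists, and the sketch does not claim to reproduce PAUP*'s internal defaults). The incomplete gamma routine follows the textbook series and continued-fraction approach; the shape values in the example are hypothetical.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series, converges quickly for x < a + 1
        term = total = 1.0 / a
        n = 0
        while term > 1e-16 * total:
            n += 1
            term *= x / (a + n)
            total += term
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper tail Q(a, x)
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def gamma_quantile(p, alpha):
    """Rate r with P(rate <= r) = p for a gamma distribution with mean 1
    (shape alpha, rate alpha), found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, ncat=5):
    """Mean relative rate within each of ncat equal-probability categories,
    using E[r; lo < r < hi] = P(alpha+1, alpha*hi) - P(alpha+1, alpha*lo)."""
    cuts = [gamma_quantile(k / ncat, alpha) for k in range(1, ncat)]
    cdf = [0.0] + [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    return [ncat * (cdf[k + 1] - cdf[k]) for k in range(ncat)]


for shape in (0.2, 1.0, 10.0):   # hypothetical shape values
    print(shape, [round(r, 4) for r in discrete_gamma_rates(shape, ncat=5)])
```

Small shape values give one or two fast categories and several categories with rates near zero; large shape values push every category toward 1, that is, toward equal rates.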
How many parameters are being estimated using the HKY85+G model?

Estimate both pinvar and the gamma shape parameter

Now issue the command lset pinvar=estimate to create the HKY85+I+G model and ask PAUP* to estimate both pinvar and the gamma shape parameter simultaneously. What is the MLE of the gamma shape parameter under the HKY85+I+G model?
What is the MLE of the pinvar parameter under the HKY85+I+G model? Is the MLE of the shape parameter higher or lower under the HKY85+I+G model compared to the HKY85+G model? Explain why this is so. What is the lnL of this tree under the HKY85+I+G model?
How many parameters are being estimated using the HKY85+I+G model?

A challenge

The data file was generated under one of the following models: JC69, F81, K80, or HKY85. All of the sites either evolved at the same rate, or rate heterogeneity was added in the form of gamma-distributed relative rates with or without some invariant sites. Can you identify which of the four basic models was used, and in addition tell me how much rate heterogeneity was added?

Hint: start by getting an NJ tree and estimating all parameters of the most complex model (HKY85+I+G) on that tree. You should be able to tell by examining the parameter estimates which model was used.
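The four candidate models are nested inside HKY85: K80 is HKY85 with equal base frequencies, F81 is HKY85 with kappa fixed at 1, and JC69 imposes both constraints. A rough Python sketch of reading HKY85+I+G estimates that way; the function names and all tolerance values are hypothetical and arbitrary, and a likelihood ratio test (refitting with the constraint imposed through lset and comparing lnL values) is a sounder way to decide whether an estimate really differs from its constrained value.

```python
def guess_base_model(freqs, kappa, freq_tol=0.02, kappa_tol=0.2):
    """Hypothetical heuristic: name the simplest of JC69/F81/K80/HKY85 that is
    consistent with the estimates. Tolerances are arbitrary illustration values."""
    equal_freqs = all(abs(f - 0.25) <= freq_tol for f in freqs.values())
    no_ts_bias = abs(kappa - 1.0) <= kappa_tol
    if equal_freqs and no_ts_bias:
        return "JC69"
    if no_ts_bias:
        return "F81"
    if equal_freqs:
        return "K80"
    return "HKY85"


def rate_heterogeneity(pinvar, shape, pinvar_tol=0.01, shape_max=50.0):
    """Hypothetical heuristic for the rate-heterogeneity part: a negligible pinvar
    means no invariant sites; a very large shape means nearly equal rates."""
    parts = []
    if pinvar > pinvar_tol:
        parts.append(f"+I (pinvar ~ {pinvar:.2f})")
    if shape < shape_max:
        parts.append(f"+G (shape ~ {shape:.2f})")
    return " ".join(parts) or "equal rates"


# Made-up estimates, for illustration only
print(guess_base_model({"A": 0.26, "C": 0.24, "G": 0.25, "T": 0.25}, kappa=3.9),
      rate_heterogeneity(pinvar=0.0, shape=0.6))
```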