- Introduction
- Molecular Biology Databases
- Database Search and Sequence Alignment
- PyCogent
- Please read Knight 2007.
- Please also review the PyCogent project page, in particular the Application Controller Framework documentation.
- I also encourage you to download and install the PyCogent toolkit from the developer site (so you have access to the latest code) before the lecture, and begin experimenting.
- Use case 1: tol_example.py
This illustrates how to apply PyCogent to evaluate the idea that life
on Earth clusters into three related domains, which are detectable by
distances between their rRNA sequences. Using sequence collections
derived from the Silva database (filtered with cd-hit-est so that the
maximum pairwise identity between any two sequences is 97%), I randomly
select sequences, build a tree, and then visualize the tree. Note that
you'll need muscle, fasttree, and matplotlib installed to run this example.
- Use case 2: applying_an_existing_appc.txt
This illustrates three different ways to apply the RDP classifier
application controller: via the RDPController object, via the
assign_taxonomy convenience function, and via the command line
interface to the Python module.
- Use case 3: defining_a_new_appc.txt, minimal_formatdb.py
This discusses how to define a new application controller class to
wrap the formatdb program packaged with NCBI's blast package. (I had
Blast-2.2.20 installed when we ran through this in class.) As I
mentioned, I frequently use formatdb to create temporary blast
databases, which I need to clean up after use. So, I designed this new
app controller for class and will be adding an extended version,
including some convenience functions, to PyCogent soon. The code I
wrote in class is attached as minimal_formatdb.py.
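The temporary-database pattern described in use case 3 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the contents of minimal_formatdb.py and not PyCogent's application controller API. The names build_formatdb_command and temporary_blast_db are hypothetical. The only formatdb options used are -i (input FASTA), -p (T for protein, F for nucleotide), and -n (database base name). The runner argument is an assumption added so that the cleanup logic can be exercised without formatdb installed.

```python
import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path


def build_formatdb_command(fasta_path, db_base, protein=False):
    """Return the argument list for a legacy NCBI formatdb call (hypothetical helper)."""
    return [
        "formatdb",
        "-i", str(fasta_path),          # input FASTA file
        "-p", "T" if protein else "F",  # T = protein, F = nucleotide
        "-n", str(db_base),             # base name for the database files
    ]


@contextmanager
def temporary_blast_db(fasta_path, protein=False, runner=subprocess.check_call):
    """Build a BLAST database in a scratch directory and delete it on exit.

    `runner` is an assumption made for this sketch: any callable that
    accepts (command_list, cwd=...) and runs the command.
    """
    work_dir = Path(tempfile.mkdtemp(prefix="tmp_blast_db_"))
    try:
        db_base = work_dir / "db"
        # Resolve the input path because the command runs from work_dir,
        # which also keeps formatdb.log out of the caller's directory.
        fasta_abs = Path(fasta_path).resolve()
        command = build_formatdb_command(fasta_abs, db_base, protein)
        runner(command, cwd=str(work_dir))
        yield db_base
    finally:
        # Remove the database files whether or not the caller raised.
        shutil.rmtree(work_dir, ignore_errors=True)
```

With formatdb on your PATH, something like `with temporary_blast_db("seqs.fasta") as db:` should hand you a database base name to pass to blastall's -d option, and the scratch directory is removed when the block exits, even if the search raises an exception.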
- Multiple Sequence Alignment
- Edgar & Batzoglou, Multiple Sequence Alignment, Current Opinion in Structural Biology 16(3):368-373, June 2006.
- Thompson, et al, CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment Through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Res. 1994 November 11; 22(22): 4673
- The ClustalW2 FAQ
- Wrabl & Grishin, Gaps in structurally similar proteins: Towards improvement of multiple sequence alignment. Proteins: Structure, Function, and Bioinformatics 2003 54(1): 71-87.
- Sauder, et al, Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins. 2000 Jul 1;40(1):6-22.
- Research plans and reviews of same
- Hidden Markov Models
- Reviewing Research Proposals
- Sequence Assembly
- Computational Phylogeny
- Protein-Protein Interactions and Networks
- Reporting on your research
- Protein Structure Prediction
- Reviewing Research Manuscripts
- Mechanics, Dynamics & Docking
- Genetic Analysis
- Short and opinionated overview of linkage and association analysis by Robert Elston, one of the founders of the field. Genetic Epidemiology 15(6):565-576, 1998. This was the 1997 International Genetic Epidemiology Society Presidential Address. [NB: only works from a UCHSC IP address.]
- The slides and notes from the first lecture in Gil McVean's outstanding course on population genetics at Oxford (slides from 2004, notes from the 2002 version). I strongly recommend reviewing the rest of the notes and slides from the population genetics course.
Kent Holsinger has also produced a good set of lecture notes in population genetics (the PDFs have amusing footnotes that are missing from the HTML versions). The Holsinger notes are from an undergraduate course, and are easier to follow if you've never had any population genetics before.
- Slides from Terry Speed's outstanding 2007 ISMB keynote
- Lin and Zou, Assessing genomewide statistical significance in linkage studies. Genet Epidemiol. 2004 Nov;27(3):202-14.
- Marchini, et al, Genome-wide strategies for detecting multiple loci that influence complex diseases, Nature Genetics 37:413 - 417 (2005)
- The Introduction to the 14th Genetic Analysis Workshop. You can review the entire collection of associated papers for ones of particular interest to you. GAW 15 happened in 2006, but the papers aren't published yet; some results are available.
- Mailund, et al, Whole genome association mapping by incompatibilities and local perfect phylogenies. BMC Bioinformatics 2006, 7:454
- Wang, et al., Genome-wide Association Studies: Theoretical and Practical Concerns. Nature Reviews Genetics 6, 109-118 (2005)
- The International HapMap Consortium, A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-861. 2007.
- Barrett, et al, Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21(2):263-265, 2005.
- Optional:
- Presenting research
- Biomedical Language Processing
- Altman et al, Text mining for biology - the way forward: opinions from leading scientists. Genome Biology 2008, 9(Suppl 2):S7.
- Renear and Palmer, Strategic Reading, Ontologies, and the Future of Scientific Publishing. Science, Vol. 325, no. 5942, 14 August 2009.
- Shotton, Portwin, Klyne, and Miles. Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Computational Biology, Vol. 5, no. 4, April 2009.