1. (6pts) You have a newly sequenced gene fragment that you would like to know something about. Run a BLAST search using the default parameters on the following protein sequence:

    SSNVSLRYNALEAVSRPGLGEPSSLDWYNEQF

    1. What did you find?
    2. What could you do to get more useful results? Describe what you did and list the protein, organism and score for the highest scoring hit.
    3. In general, is there anything you might learn from lower-significance hits (say, with an E value of 5)?

  2. (6pts) A collaborator has sent you a puzzling nucleotide sequence fragment:

    ctgtttgcagctgctggtgtccagtacaatgaccgcaggatcgaaacatcagaatggagcaacatgcgaa gcaagatgccatgttccatgatgccaatgctggatattgacaacagacatcaaattccccagactatggc tattgctaggtacctggccagagaatttggtttccatggcaagaacaacatggagatggccagagttgaa tacatctcagactgtttctatgacattttggcccggataagccgaaccgatttcgtgcttgatacgttct gcggtggcttcaccgatcagagaaccgtaattacgacgcacatagttgatgatagcttcgtcgaaacggt caccaccaatgatgactacttgaggatgtaccaagatgataactgtagaatgatgtttcagagatctggt

    What is this, and what is strange about it? (Hint: Look at the graphical overview of the results. Mouse over the different bars to see what they represent.) Propose an explanation for it. (Another hint: You may want to search against a nucleotide database as well as a protein database using a translated query. Also, search the full non-redundant databases; don't limit the search to a single organism, since you don't know what it is.)
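Because the hint suggests a translated search, it can help to look at all six reading frames of the fragment yourself before or after running BLASTX. The sketch below is a minimal Python version, assuming the standard genetic code (NCBI translation table 1); the function names are hypothetical helpers, not part of any BLAST tool. A frame with a long stretch free of stop codons ('*') is a candidate coding frame.

```python
# Standard genetic code (NCBI table 1): codons enumerated with T, C, A, G
# at each of the three positions, in that order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Uppercase the sequence and drop whitespace (the fragment is split into blocks)."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; '*' marks a stop codon, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Return translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = clean(seq)
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    fragment = "ctgtttgcagctgctggtgtccagtacaatgaccgcagg"  # paste the full fragment here
    for name, protein in six_frames(fragment).items():
        print(name, protein)
```

Comparing where the stop codons fall in each frame with where the BLAST hits start and stop is one way to reason about the graphical overview.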

  3. (3pts) Compare the p value and the BLAST E value. Which of the two depends on the size of the database?
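For reference while comparing the two, here is a minimal sketch of the Karlin-Altschul relationships in bit-score form: E = m * n * 2^(-S'), where m is the query length, n the total database length and S' the bit score, and p = 1 - exp(-E) under the usual Poisson assumption. This is a simplification: real BLAST uses effective (edge-corrected) lengths, so these numbers will not exactly match BLAST output, and the lengths in the usage lines are hypothetical.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e: float) -> float:
    """Probability of at least one chance hit: p = 1 - exp(-E), via expm1 for accuracy."""
    return -math.expm1(-e)


if __name__ == "__main__":
    # Hypothetical numbers: a 250-residue query, bit score 40, two database sizes.
    for db_len in (10**8, 10**9):
        e = evalue_from_bits(40.0, 250, db_len)
        print(f"db_len={db_len:.0e}  E={e:.3g}  p={pvalue_from_evalue(e):.3g}")
```

Note that for small E the two values are nearly equal, since 1 - exp(-E) is approximately E.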

  4. (5pts) Proteins A, B and C have identical pairwise similarity scores [Sim(A,B)=Sim(A,C)=Sim(B,C)] high enough to be considered good evidence of homology. Proteins A and B have different functional annotations. What can we infer about the function of protein C? What's the point of this question?

  5. (2pts) In the NCBI BLAST education pages, the nr nucleotide database is described as "All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences)." What do EST, STS, GSS and HTGS stand for?

  6. (5pts) Briefly explain the concept of dynamic programming and how it leads to fast solutions. Are there any disadvantages or limitations?
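One way to see why dynamic programming matters for alignment is to compare how many alignments exist with how many table cells an alignment algorithm actually fills. The sketch below counts global alignments as paths through the (m+1) x (n+1) grid (diagonal step = aligned pair, down or right step = gap); these counts are the Delannoy numbers. The counter is itself a small dynamic program: each cell reuses three previously computed sub-results instead of re-enumerating them. It assumes the convention that a gap in one sequence next to a gap in the other, in either order, counts as a distinct alignment; the function names are hypothetical.

```python
def count_alignments(m: int, n: int) -> int:
    """Number of global alignments of sequences of lengths m and n.

    paths[i][j] = paths[i-1][j] + paths[i][j-1] + paths[i-1][j-1],
    with a single path along row 0 and column 0.
    """
    paths = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paths[i][j] = paths[i - 1][j] + paths[i][j - 1] + paths[i - 1][j - 1]
    return paths[m][n]


def dp_cells(m: int, n: int) -> int:
    """Cells an alignment DP fills: one per pair of prefixes."""
    return (m + 1) * (n + 1)


if __name__ == "__main__":
    m, n = 11, 13  # lengths of the two strings in question 7
    print("alignments:", count_alignments(m, n))
    print("DP cells:  ", dp_cells(m, n))
```

The number of alignments grows exponentially with sequence length while the table grows only as the product of the lengths; the price is memory proportional to that table and the requirement that the score decompose into optimal sub-problems.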

  7. (15pts) Given the following scoring matrix and gap penalties, align the two strings:

    DAACBCBACBA
    DCABCDCBACABA

    Gap Opening: -3
    Gap Extension: -1

        A   B   C   D
    A   4  -2  -5  -1
    B  -2   2  -1  -2
    C  -5  -1   1  -4
    D  -1  -2  -4   3

    Show your matrix and the final global alignment.
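To check a hand-computed answer, here is a minimal Python sketch of affine-gap global alignment using Gotoh's three-matrix formulation of Needleman-Wunsch (M for aligned pairs, X and Y for gaps). It assumes the penalties above mean a gap of length k scores -3 + (k - 1)(-1), that is, -3 for the first gap position and -1 for each further one; if your convention charges the opening penalty on top of an extension for every position, adjust GAP_OPEN accordingly. Ties are broken arbitrarily, so other alignments with the same optimal score may exist. All names are hypothetical.

```python
NEG_INF = float("-inf")
ALPHABET = "ABCD"
SCORES = [  # rows and columns in ALPHABET order, copied from the matrix above
    [4, -2, -5, -1],
    [-2, 2, -1, -2],
    [-5, -1, 1, -4],
    [-1, -2, -4, 3],
]
GAP_OPEN = -3    # assumed: score of the first position of a gap
GAP_EXTEND = -1  # assumed: score of each further position in the same gap


def sub(a: str, b: str) -> int:
    """Substitution score for aligning residue a with residue b."""
    return SCORES[ALPHABET.index(a)][ALPHABET.index(b)]


def global_align(s: str, t: str) -> tuple[int, str, str]:
    m, n = len(s), len(t)
    # M: s[i-1] aligned to t[j-1]; X: s[i-1] aligned to a gap; Y: t[j-1] aligned to a gap.
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    X = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        X[i][0] = GAP_OPEN + (i - 1) * GAP_EXTEND  # leading gap in t
    for j in range(1, n + 1):
        Y[0][j] = GAP_OPEN + (j - 1) * GAP_EXTEND  # leading gap in s
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = sub(s[i - 1], t[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + GAP_OPEN,
                          X[i - 1][j] + GAP_EXTEND,
                          Y[i - 1][j] + GAP_OPEN)
            Y[i][j] = max(M[i][j - 1] + GAP_OPEN,
                          Y[i][j - 1] + GAP_EXTEND,
                          X[i][j - 1] + GAP_OPEN)

    # Trace back from whichever matrix holds the best score in the last cell.
    tables = {"M": M, "X": X, "Y": Y}
    state = max("MXY", key=lambda k: tables[k][m][n])
    best = tables[state][m][n]
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        cur = tables[state][i][j]
        if state == "M":
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            prev = cur - sub(s[i - 1], t[j - 1])
            i, j = i - 1, j - 1
            state = next(k for k in "MXY" if tables[k][i][j] == prev)
        elif state == "X":
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
            if X[i][j] + GAP_EXTEND == cur:
                state = "X"
            elif M[i][j] + GAP_OPEN == cur:
                state = "M"
            else:
                state = "Y"
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
            if Y[i][j] + GAP_EXTEND == cur:
                state = "Y"
            elif M[i][j] + GAP_OPEN == cur:
                state = "M"
            else:
                state = "X"
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def score_alignment(top: str, bottom: str) -> int:
    """Re-score a finished alignment under the same assumed affine gap model."""
    total, prev_gap = 0, None  # prev_gap: which row had a gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            total += GAP_EXTEND if prev_gap == row else GAP_OPEN
            prev_gap = row
        else:
            total += sub(a, b)
            prev_gap = None
    return total


if __name__ == "__main__":
    score, top, bottom = global_align("DAACBCBACBA", "DCABCDCBACABA")
    print(top)
    print(bottom)
    print("score:", score, "re-scored:", score_alignment(top, bottom))
```

The score_alignment helper can also be applied to an alignment worked out by hand, to confirm that it reaches the same optimal score under the same gap convention.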

  8. (16pts) Run a BLAST search and a WU-BLAST2 search to identify this protein domain: (WU-BLAST2 can be found here: http://blast.wustl.edu/)

    TLRLCLKRISPDAELVAFGSLESGLALKNSDMDLCVLMDSRVQSDTIALQ

    1. List any deviations from the default parameters and explain briefly why you used those options.
    2. Look over the summaries for the first several entries returned. Generally, what does this class of proteins do?
    3. What organism is this specific protein from?
    4. What does this specific protein do, and what is the result of its absence (knockout) OR over-expression in this organism? [You don't need to answer for both knockout and over-expression; describing one or the other is sufficient.]
    5. List a PubMed ID for a reference providing evidence for this function. (Not the reference for the publication of the complete genome of this organism.)

Course home page | Computational Bioscience Program home page | Professor Hunter's home page