Descriptron – A Large Scale System for
Biomedical Data Integration
Daniel J McGoldrick Ph.D.
Center For Computational Pharmacology
University of
Descriptron’s Relational Backend and Data Model:
Table overview: Data Integration Components
The Namespace Table – Specific Types, Access, Global Connectivity Constraints.
The Relevancy Table – Type Groupings Relevant to Semantic Types.
The Fact Tables – Identifiers and Fact Registration. Specific Connectivity Constraints.
The Map Tables – Method, Time of Linking, Source and linkages to Registered Facts.
The Graph Table – Normalized Fact linkages referenced by Method.
Appendix 1. Registered Fact Types.
- Scalable for large systems.
- Language independent (accessible to all database APIs).
Workflow – a connected set of processors that combine to perform computational tasks.
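As a rough illustration of that definition, the sketch below treats each processor as a plain Python function and a workflow as an ordered chain of them. The processor names (`strip_whitespace`, `drop_empty`, `to_upper`) and the `run_workflow` helper are hypothetical and not part of Descriptron; a real workflow may well be a branching graph rather than a straight line.

```python
from functools import reduce
from typing import Callable, List

# A processor takes a list of identifier strings and returns a new list.
Processor = Callable[[List[str]], List[str]]


def strip_whitespace(ids: List[str]) -> List[str]:
    """Remove leading and trailing whitespace from every identifier."""
    return [s.strip() for s in ids]


def drop_empty(ids: List[str]) -> List[str]:
    """Discard identifiers that are empty strings."""
    return [s for s in ids if s]


def to_upper(ids: List[str]) -> List[str]:
    """Upper-case every identifier."""
    return [s.upper() for s in ids]


def run_workflow(processors: List[Processor], data: List[str]) -> List[str]:
    """Feed the output of each processor into the next one, in order."""
    return reduce(lambda value, step: step(value), processors, data)


# Hypothetical three-step workflow applied to a few raw identifiers.
workflow = [strip_whitespace, drop_empty, to_upper]
result = run_workflow(workflow, [" q8n9j3 ", "", "aah09326"])
print(result)
```

Keeping each processor a pure function makes it easy to reorder, reuse, or test the steps one at a time.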
mysql> show tables;
+-----------------+
| Tables_in_Dtron |
+-----------------+
| FlyFacts        |
| FlyGraphs       |
| FlyMaps         |
| HumanFacts      |
| HumanGraphs1    |
| HumanGraphs2    |
| HumanGraphs3    |
| HumanMaps       |
| MouseFacts      |
| MouseGraphs     |
| MouseMaps       |
| RatFacts        |
| RatGraphs       |
| RatMaps         |
| WormFacts       |
| WormGraphs      |
| WormMaps        |
| YeastFacts      |
| YeastGraphs     |
| YeastMaps       |
| namespace       |
+-----------------+
21 rows in set (0.00 sec)
mysql> describe namespace;
+------------------+-------------+------+-----+---------+-------+
| Field            | Type        | Null | Key | Default | Extra |
+------------------+-------------+------+-----+---------+-------+
| idtype           | varchar(40) | YES  |     | NULL    |       |
| typeddescription | text        | YES  |     | NULL    |       |
| webmethod        | varchar(10) | YES  |     | NULL    |       |
| terminalp        | char(3)     | YES  |     | NULL    |       |
+------------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> select * from namespace order by idtype limit 5;
+-----------------------------------------+-------------------------------------------+------------+-----------+
| idtype                                  | typeddescription                          | webmethod  | terminalp |
+-----------------------------------------+-------------------------------------------+------------+-----------+
| affymetrix_netaffyx_affyprobe_uid       | Affymetrix affyprobe                      | wwwlnk0013 | nil       |
| affymetrix_netaffyx_chip_annot          | Affymetrix Chip                           | wwwlnk9999 | t         |
| biobase_transfac_domain_annot           | BIOBASE transfac domain                   | wwwlnk0014 | t         |
| biobase_transfac_domain_uid             | BIOBASE transfac domain                   | wwwlnk0014 | t         |
| chs-fitzimmons_vh-dissector_anatomy_uid | Center for Human Simulation Anatomy model | wwwlnk9999 | t         |
…
+-----------------------------------------+-------------------------------------------+------------+-----------+
5 rows in set (0.00 sec)
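The namespace rows above can be filtered by their `terminalp` flag, which holds Lisp-style `t`/`nil` values. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the MySQL server shown in the slides; the helper name `idtypes_with_flag` is hypothetical, and the slides do not spell out what `terminalp` means beyond its name.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL "Dtron" database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE namespace ("
    "idtype VARCHAR(40), typeddescription TEXT, "
    "webmethod VARCHAR(10), terminalp CHAR(3))"
)

# The five sample rows shown in the query output above.
rows = [
    ("affymetrix_netaffyx_affyprobe_uid", "Affymetrix affyprobe", "wwwlnk0013", "nil"),
    ("affymetrix_netaffyx_chip_annot", "Affymetrix Chip", "wwwlnk9999", "t"),
    ("biobase_transfac_domain_annot", "BIOBASE transfac domain", "wwwlnk0014", "t"),
    ("biobase_transfac_domain_uid", "BIOBASE transfac domain", "wwwlnk0014", "t"),
    ("chs-fitzimmons_vh-dissector_anatomy_uid",
     "Center for Human Simulation Anatomy model", "wwwlnk9999", "t"),
]
conn.executemany("INSERT INTO namespace VALUES (?, ?, ?, ?)", rows)


def idtypes_with_flag(connection, flag):
    """Return the idtypes whose terminalp column equals the given flag."""
    cur = connection.execute(
        "SELECT idtype FROM namespace WHERE terminalp = ? ORDER BY idtype",
        (flag,),
    )
    return [row[0] for row in cur]


terminal = idtypes_with_flag(conn, "t")
non_terminal = idtypes_with_flag(conn, "nil")
print(terminal)
print(non_terminal)
```

The parameterized query uses the standard DB-API placeholder style, so the same pattern should carry over to a MySQL driver with only the placeholder syntax changed.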
+---------+------+------+-----+---------+-------+
| Field   | Type | Null | Key | Default | Extra |
+---------+------+------+-----+---------+-------+
| concept | text | YES  |     | NULL    |       |
| idtypes | text | YES  |     | NULL    |       |
+---------+------+------+-----+---------+-------+
| genesymbol     | (hugo_hgnc_officialsymbol_uid hugo_hgnc_aliassymbol_annot ncbi_ll_aliasprotien_annot ncbi_ll_aliassymbol_annot ncbi_ll_officialsymbol_annot ncbi_ll_preferredsymbol_annot stanford_spd_officialsymbol_annot) |
| genename       | (hugo_hgnc_officialname_annot um_bbd_enzyme_annot ncbi_ll_aliasname_annot ncbi_ll_officialname_annot ncbi_ll_aliasname_annot ncbi_ll_preferredname_annot stanford_sgd_genename_annot) |
| gene           | (affymetrix_netaffyx_affyprobe_uid ebi-sib_trembl-sp_p_acc ebi-sib_trembl-sp_p_uid ebi_trembl_p_acc ebi_trembl_p_uid gu_pir_p_uid hugo_hgnc_officialsymbol_uid jax_mgd_gene_uid ncbi_genbank_p_acc ncbi_genbank_p_uid ncbi_ll_aliasprotien_annot ncbi_ll_aliassymbol_annot ncbi_ll_locus_uid ncbi_ll_officialsymbol_annot ncbi_ll_preferredsymbol_annot ncbi_refseq_np_uid ncbi_refseq_xm_acc ncbi_refseq_xm_uid ncbi_refseq_xp_acc ncbi_refseq_xp_uid
…
| officialsymbol | (hugo_hgnc_officialsymbol_uid ncbi_ll_officialsymbol_annot stanford_spd_officialsymbol_annot ut-ca_bind_officialsymbol_annot) |
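Each `idtypes` cell above is a parenthesized, space-separated list, so client code has to split it before use. The following is a minimal sketch, assuming that list format holds for every row; the helpers `parse_idtypes` and `concepts_for` are hypothetical names, and only the two rows that appear complete in the listing are used.

```python
from typing import Dict, List

# Two complete rows copied from the relevancy listing above.
relevancy: Dict[str, str] = {
    "genename": (
        "(hugo_hgnc_officialname_annot um_bbd_enzyme_annot "
        "ncbi_ll_aliasname_annot ncbi_ll_officialname_annot "
        "ncbi_ll_aliasname_annot ncbi_ll_preferredname_annot "
        "stanford_sgd_genename_annot)"
    ),
    "officialsymbol": (
        "(hugo_hgnc_officialsymbol_uid ncbi_ll_officialsymbol_annot "
        "stanford_spd_officialsymbol_annot ut-ca_bind_officialsymbol_annot)"
    ),
}


def parse_idtypes(cell: str) -> List[str]:
    """Strip the surrounding parentheses and split on whitespace."""
    return cell.strip().strip("()").split()


def concepts_for(idtype: str) -> List[str]:
    """Return, sorted, every concept whose idtypes list contains idtype."""
    return sorted(
        concept
        for concept, cell in relevancy.items()
        if idtype in parse_idtypes(cell)
    )


official = parse_idtypes(relevancy["officialsymbol"])
print(official)
print(concepts_for("ncbi_ll_aliasname_annot"))
```

Because the parser only strips parentheses at the ends, it also tolerates a cell whose closing parenthesis is missing.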
mysql> describe HumanFacts;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| nuid       | varchar(30) | YES  |     | NULL    |       |
| idvalue    | text        | YES  |     | NULL    |       |
| idtype     | varchar(50) | YES  |     | NULL    |       |
| authority  | varchar(30) | YES  |     | NULL    |       |
| deprecated | char(3)     | YES  |     | NULL    |       |
| tstamp     | bigint(20)  | YES  |     | NULL    |       |
| terminalp  | char(3)     | YES  |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+
+--------------+--------------------+--------------------------------+------------+------------+------------+-----------+
| nuid         | idvalue            | idtype                         | authority  | deprecated | tstamp     | terminalp |
+--------------+--------------------+--------------------------------+------------+------------+------------+-----------+
| nuid-2072786 | 7340969            | ncbi_genbank_n_uid             | NCBI-LL    | NULL       | 0          | nil       |
| nuid-2215370 | 22761563           | ncbi_genbank_n_uid             | NCBI-LL    | NULL       | 0          | nil       |
| nuid-767613  | Q8N9J3             | sib_swissprot_p_acc            | AFFYMETRIX | NULL       | 0          | nil       |
| nuid-2384655 | FLJ13556           | ncbi_ll_aliassymbol_annot      | NCBI-LL    | NULL       | 0          | nil       |
| nuid-2544940 | AAH09326           | ncbi_genbank_p_acc             | NCBI-LL    | NULL       | 0          | nil       |
| nuid-4428303 | 51475048           | ncbi_genbank_contig_uid        | NCBI-LL    | nil        | 3324241944 | t         |
| nuid-4428310 | 3610142            | ncbi_pubmed_literature_uid     | NCBI-LL    | nil        | 3324228940 | t         |
…
+--------------+--------------------+--------------------------------+------------+------------+------------+-----------+
7 rows in set (0.01 sec)
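A fact row pairs an external identifier (`idvalue`, typed by `idtype`) with an internal `nuid`, so a typical first step is resolving an identifier to its `nuid`. The following is a minimal sketch, assuming in-memory SQLite in place of MySQL and using four of the sample rows above; the function name `find_nuids` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HumanFacts ("
    "nuid VARCHAR(30), idvalue TEXT, idtype VARCHAR(50), "
    "authority VARCHAR(30), deprecated CHAR(3), "
    "tstamp BIGINT, terminalp CHAR(3))"
)

# Four of the sample rows from the HumanFacts output above.
facts = [
    ("nuid-2072786", "7340969", "ncbi_genbank_n_uid", "NCBI-LL", None, 0, "nil"),
    ("nuid-767613", "Q8N9J3", "sib_swissprot_p_acc", "AFFYMETRIX", None, 0, "nil"),
    ("nuid-2544940", "AAH09326", "ncbi_genbank_p_acc", "NCBI-LL", None, 0, "nil"),
    ("nuid-4428310", "3610142", "ncbi_pubmed_literature_uid",
     "NCBI-LL", "nil", 3324228940, "t"),
]
conn.executemany("INSERT INTO HumanFacts VALUES (?, ?, ?, ?, ?, ?, ?)", facts)


def find_nuids(connection, idtype, idvalue):
    """Return every nuid registered for the given (idtype, idvalue) pair."""
    cur = connection.execute(
        "SELECT nuid FROM HumanFacts "
        "WHERE idtype = ? AND idvalue = ? ORDER BY nuid",
        (idtype, idvalue),
    )
    return [row[0] for row in cur]


hits = find_nuids(conn, "sib_swissprot_p_acc", "Q8N9J3")
print(hits)
```

Matching on both `idtype` and `idvalue` avoids collisions between numeric identifiers that different sources might share.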
mysql> describe HumanMaps;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| muid   | varchar(30) | YES  |     | NULL    |       |
| source | text        | YES  |     | NULL    |       |
| nuid   | varchar(30) | YES  |     | NULL    |       |
| linker | varchar(50) | YES  |     | NULL    |       |
| tstamp | bigint(20)  | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
5 rows in set (0.01 sec)
+--------------+-------------+------+---------------------------+-----------+
| muid         | source      | nuid | linker                    | tstamp    |
+--------------+-------------+------+---------------------------+-----------+
| muid-2118263 | 4572462     | nil  | ncbi-ll-tmpl-parse-Hs     | 599223651 |
| muid-1255565 | 71285_at    | nil  | affymetrix-Hs-annot-parse | 535646377 |
| muid-1285564 | Hs.368007   | nil  | affymetrix-Hs-annot-parse | 535801118 |
| muid-1585604 | BAB14925    | nil  | ncbi-ll-tmpl-parse-Hs     | 597552731 |
| muid-1825598 | 1524068     | nil  | ncbi-ll-tmpl-parse-Hs     | 598140408 |
| muid-2718208 | NG_002676   | nil  | ncbi-ll-tmpl-parse-Hs     | 600301027 |
| muid-1224916 | 62274_at    | nil  | affymetrix-Hs-annot-parse | 535486389 |
| muid-1284913 | AA174142    | nil  | affymetrix-Hs-annot-parse | 535798748 |
…
+--------------+-------------+------+---------------------------+-----------+
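Since every map row records the `linker` (the method that produced it), one simple check is to count map records per method. The following is a minimal sketch, assuming in-memory SQLite in place of MySQL and loading only the eight sample rows shown above; the variable names are illustrative.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HumanMaps ("
    "muid VARCHAR(30), source TEXT, nuid VARCHAR(30), "
    "linker VARCHAR(50), tstamp BIGINT)"
)

# The eight sample rows from the HumanMaps output above.
maps = [
    ("muid-2118263", "4572462", "nil", "ncbi-ll-tmpl-parse-Hs", 599223651),
    ("muid-1255565", "71285_at", "nil", "affymetrix-Hs-annot-parse", 535646377),
    ("muid-1285564", "Hs.368007", "nil", "affymetrix-Hs-annot-parse", 535801118),
    ("muid-1585604", "BAB14925", "nil", "ncbi-ll-tmpl-parse-Hs", 597552731),
    ("muid-1825598", "1524068", "nil", "ncbi-ll-tmpl-parse-Hs", 598140408),
    ("muid-2718208", "NG_002676", "nil", "ncbi-ll-tmpl-parse-Hs", 600301027),
    ("muid-1224916", "62274_at", "nil", "affymetrix-Hs-annot-parse", 535486389),
    ("muid-1284913", "AA174142", "nil", "affymetrix-Hs-annot-parse", 535798748),
]
conn.executemany("INSERT INTO HumanMaps VALUES (?, ?, ?, ?, ?)", maps)

# Count map records per linking method; each result row is (linker, count).
counts = dict(
    conn.execute("SELECT linker, COUNT(*) FROM HumanMaps GROUP BY linker")
)
print(counts)
```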
mysql> describe HumanGraphs1;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| muid  | varchar(30) | YES  |     | NULL    |       |
| nuid  | varchar(30) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
mysql> select * from HumanGraphs1 limit 15;
+--------------+--------------+
| muid         | nuid         |
+--------------+--------------+
| muid-2636134 | nuid-2636080 |
| muid-2636134 | nuid-2636081 |
| muid-2636134 | nuid-2636082 |
| muid-2636134 | nuid-2636083 |
| muid-2636134 | nuid-2636084 |
| muid-2636134 | nuid-2636085 |
| muid-2636134 | nuid-2636086 |
| muid-2636134 | nuid-2636087 |
| muid-2636134 | nuid-2636088 |
| muid-2636134 | nuid-2636089 |
+--------------+--------------+
15 rows in set (0.03 sec)
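The graph table's (`muid`, `nuid`) pairs suggest how integration queries might work: facts that share a `muid` were linked by the same map record. The following is a minimal sketch of that traversal, assuming this reading of the schema is right and using in-memory SQLite in place of MySQL. All rows are made-up placeholders (`SYMBOL_A`, `nuid-1`, and so on), the table names `Facts` and `Graphs` are simplified stand-ins, and `linked_facts` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Facts (nuid VARCHAR(30), idvalue TEXT, idtype VARCHAR(50));
    CREATE TABLE Graphs (muid VARCHAR(30), nuid VARCHAR(30));
    """
)

# Placeholder facts; the idvalues are invented for illustration only.
conn.executemany(
    "INSERT INTO Facts VALUES (?, ?, ?)",
    [
        ("nuid-1", "SYMBOL_A", "hugo_hgnc_officialsymbol_uid"),
        ("nuid-2", "ACC_A", "ncbi_genbank_p_acc"),
        ("nuid-3", "PROBE_A", "affymetrix_netaffyx_affyprobe_uid"),
        ("nuid-4", "SYMBOL_B", "hugo_hgnc_officialsymbol_uid"),
    ],
)

# Placeholder linkages: muid-1 and muid-2 both involve nuid-1.
conn.executemany(
    "INSERT INTO Graphs VALUES (?, ?)",
    [
        ("muid-1", "nuid-1"), ("muid-1", "nuid-2"),
        ("muid-2", "nuid-1"), ("muid-2", "nuid-3"),
        ("muid-3", "nuid-4"),
    ],
)


def linked_facts(connection, idvalue):
    """Return (idtype, idvalue) for facts sharing a muid with idvalue."""
    cur = connection.execute(
        "SELECT DISTINCT f2.idtype, f2.idvalue "
        "FROM Facts f1 "
        "JOIN Graphs g1 ON g1.nuid = f1.nuid "
        "JOIN Graphs g2 ON g2.muid = g1.muid AND g2.nuid <> g1.nuid "
        "JOIN Facts f2 ON f2.nuid = g2.nuid "
        "WHERE f1.idvalue = ? "
        "ORDER BY f2.idtype, f2.idvalue",
        (idvalue,),
    )
    return cur.fetchall()


linked = linked_facts(conn, "SYMBOL_A")
print(linked)
```

The self-join on the graph table is the key step: it walks from a fact to its map records and back out to every other fact attached to them.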
+-----------------------------------------+------------------------------------------------+
| idtype                                  | typeddescription                               |
+-----------------------------------------+------------------------------------------------+
| affymetrix_netaffyx_affyprobe_uid       | Affymetrix affyprobe                           |
| affymetrix_netaffyx_chip_annot          | Affymetrix Chip                                |
| biobase_transfac_domain_annot           | BIOBASE transfac domain                        |
| biobase_transfac_domain_uid             | BIOBASE transfac domain                        |
| chs-fitzimmons_vh-dissector_anatomy_uid | Center for Human Simulation Anatomy model      |
| doe-mbi-ucla_dip_pxp_uid                | DOE-MBI-UCLA DIP protein interaction           |
| doe-mbi-ucla_dip_p_uid                  | DOE-MBI-UCLA DIP protein id                    |
| ebi-sib_trembl-sp_p_acc                 | EBI-SIB trembl-sp protein                      |
| ebi-sib_trembl-sp_p_uid                 | EBI-SIB trembl-sp protein                      |
| ebi_interpro_domain_uid                 | EBI domain                                     |
| ebi_interpro_pfam_uid                   | EBI interpro protein family                    |
| ebi_interpro_p_uid                      | EBI interpro protein                           |
| ebi_trembl_p_acc                        | EBI trembl protein                             |
| ebi_trembl_p_uid                        | EBI trembl protein                             |
| embl_hssp_structure_uid                 | EMBL homology-structure id                     |
| embl_smart_pfam_uid                     | EBI smart protein family                       |
| flybase_flybase_gene_uid                | Flybase gene                                   |
| germonline_germonline_pathway_uid       | UofBasel-ch germonline pathway                 |
| goc_go-bp_concept_uid                   | GO consortium biological process uid           |
| goc_go-bp_term_annot                    | GO consortium go biological process term       |
| goc_go-cc_concept_uid                   | GO consortium cellular component uid           |
| goc_go-cc_term_annot                    | GO consortium go cellular component term       |
| goc_go-mf_concept_uid                   | GO consortium molecular function uid           |
| goc_go-mf_term_annot                    | GO consortium go molecular function term       |
| gu_pirsf_pfam_uid                       | UofGeorgetown protein super-family             |
| gu_pir_p_uid                            | UofGeorgetown protein                          |
| hugo_hgnc_aliassymbol_annot             | HUGO HGNC alias symbol                         |
| hugo_hgnc_genbank_acc                   | HUGO HGNC genbank accession -mixed             |
| hugo_hgnc_officialname_annot            | HUGO HGNC official gene name                   |
| hugo_hgnc_officialsymbol_uid            | HUGO HGNC official symbol                      |
| incyte_ypd_p_uid                        | Incyte yeast protein database                  |
| inra-fr_prodom_domain_uid               | Prodom domain                                  |
| iubmb_ec_p_acc                          | IUBMB EC protein                               |
| jax_mgd_gene_uid                        | MGD gene id                                    |
| jax_mgd_phenotype_uid                   | Jax MGD phenotype                              |
| jax_mgi_marker_uid                      | Jax MGI genetic marker                         |
| jhu_gdb_orf_uid                         | Johns Hopkins GDB orf                          |
| ku_kegg_ligand_annot                    | UofKyoto ligand                                |
| ku_kegg_ligand_uid                      | UofKyoto ligand                                |
| ku_kegg_pathway_annot                   | UofKyoto KEGG pathway                          |
| ku_kegg_pathway_uid                     | UofKyoto KEGG pathway                          |
| mips_mips_pxp_uid                       | Munich MIPS protein interaction                |
| ncbi_cdd_domain_acc                     | NCBI-conserved-domain                          |
| ncbi_cdd_domain_annot                   | NCBI-conserved-domain                          |
| ncbi_cog_pfam_acc                       | NCBI-COG protein family                        |
| ncbi_genbank_contig_acc                 | NCBI-genbank contig                            |
| ncbi_genbank_contig_uid                 | NCBI-genbank contig                            |
| ncbi_genbank_n_acc                      | NCBI-genbank nucleotide                        |
| ncbi_genbank_n_uid                      | NCBI-genbank nucleotide                        |
| ncbi_genbank_p_acc                      | NCBI-genbank protein                           |
| ncbi_genbank_p_uid                      | NCBI-genbank protein                           |
| ncbi_gene_gene_uid                      | NCBI Entrezgene gene uid                       |
| ncbi_grif_function_annot                | NCBI-LocusLink grif                            |
| ncbi_ll_aliasname_annot                 | NCBI-LocusLink name alias                      |
| ncbi_ll_aliasprotien_annot              | NCBI-LocusLink protein alias                   |
| ncbi_ll_aliassymbol_annot               | NCBI-LocusLink symbol alias                    |
| ncbi_ll_go_annot                        | NCBI-LocusLink gene ontology definition        |
| ncbi_ll_grif_annot                      | NCBI-LocusLink gene reference into function    |
| ncbi_ll_loc-map_annot                   | NCBI-LocusLink map location                    |
| ncbi_ll_locus_uid                       | NCBI-LocusLink locus                           |
| ncbi_ll_officialname_annot              | NCBI-LocusLink official name                   |
| ncbi_ll_officialsymbol_annot            | NCBI-LocusLink official symbol                 |
| ncbi_ll_organism_annot                  | NCBI-LocusLink organism                        |
| ncbi_ll_phenotype_annot                 | NCBI-LocusLink phenotype                       |
| ncbi_ll_phenotype_uid                   | NCBI-LocusLink phenotype                       |
| ncbi_ll_preferredname_annot             | NCBI-LocusLink preferred name                  |
| ncbi_ll_preferredsymbol_annot           | NCBI-LocusLink preferred symbol                |
| ncbi_ll_summary_annot                   | NCBI-LocusLink summary                         |
| ncbi_mmdb_structure_annot               | NCBI-MMDB structure                            |
| ncbi_mmdb_structure_uid                 | NCBI-MMDB structure id                         |
| ncbi_omim_gene_uid                      | NCBI-OMIM                                      |
| ncbi_pubmed_literature_uid              | NCBI-pubmed literature                         |
| ncbi_refseq_nc_acc                      | NCBI-refseq complete genomic                   |
| ncbi_refseq_nc_uid                      | NCBI-refseq complete genomic                   |
| ncbi_refseq_ng_acc                      | NCBI-refseq genomic                            |
| ncbi_refseq_ng_uid                      | NCBI-refseq genomic                            |
| ncbi_refseq_nm_acc                      | NCBI-refseq mRNA                               |
| ncbi_refseq_nm_uid                      | NCBI-refseq mRNA                               |
| ncbi_refseq_np_acc                      | NCBI-refseq protein                            |
| ncbi_refseq_np_uid                      | NCBI-refseq protein                            |
| ncbi_refseq_xm_acc                      | NCBI-refseq model mRNA                         |
| ncbi_refseq_xm_uid                      | NCBI-refseq model mRNA                         |
| ncbi_refseq_xp_acc                      | NCBI-refseq model protein                      |
| ncbi_refseq_xp_uid                      | NCBI-refseq model protein                      |
| ncbi_refseq_xr_acc                      | NCBI-refseq non-coding RNA                     |
| ncbi_refseq_xr_uid                      | NCBI-refseq non-coding RNA                     |
| ncbi_taxonomy_taxon_uid                 | NCBI-taxonomy taxon                            |
| ncbi_ug_orf_uid                         | NCBI-unigene orf                               |
| nlm_medline_literature_uid              | NLM medline reference                          |
| null_null_null_null                     | null tag                                       |
| pasteur-fr_candida_gene_uid             | Pasteur-fr candida gene                        |
| sanger_pfam_pfam_uid                    | Sanger pfam protein family                     |
| sanger_wormpep_p_uid                    | Sanger inst. protein id                        |
| sib_prosite_domain_acc                  | SIB prosite domain acc                         |
| sib_prosite_domain_uid                  | SIB prosite domain id                          |
| sib_swissprot_p_acc                     | Swissprot protein acc                          |
| sib_swissprot_p_uid                     | Swissprot protein id                           |
| sri-carnegie_metacyc_pathway_acc        | Stanford-Carnegie metacyc pathway              |
| sri-carnegie_metacyc_pathway_annot      | Stanford-Carnegie metacyc pathway description  |
| stanford_sgd_genename_annot             | Stanford SGD genename                          |
| stanford_sgd_gene_uid                   | Stanford SPD id                                |
| stanford_sgd_literature_uid             | Stanford SGD literature                        |
| stanford_sgd_orf_uid                    | Stanford SGD orf                               |
| stanford_sgd_phenotype_annot            | Stanford SGD phenotype                         |
| stanford_sgd_summary_annot              | Stanford SGD description                       |
| stanford_smd_expt_uid                   | Stanford microarray experiment                 |
| stanford_spd_aliassymbol_annot          | Stanford SPD alias gene symbol                 |
| stanford_spd_genesymbol_annot           | Stanford SPD gene symbol                       |
| stanford_spd_officialsymbol_annot       | Stanford SPD official gene symbol              |
| tigr_cmr_microbial_uid                  | TIGR comprehensive microbial                   |
| tigr_egad_m_uid                         | TIGR EGAD rna                                  |
| tigr_tgi_orf_uid                        | TIGR orf                                       |
| tigr_tigrfams_pfam_uid                  | TIGR pfam                                      |
| ucsd_tc-db_p_uid                        | UCSD tc-db protein                             |
| uf-de_euroscarf_strain_uid              | UofFrankfurt-de EUROSCARF strain               |
| umber-uk_prints_domain_uid              | UofManchester domain id                        |
| umn_cgc_nomenclature_genename_annot     |                                                |
| umn_cgc_nomenclature_genesymbol_annot   |                                                |
| um_bbd_enzyme_annot                     | UofMichigan biodegredation enzyme              |
| um_bbd_enzyme_uid                       | UofMichigan biodegredation enzyme              |
| um_bbd_pathway_annot                    | UofMichigan biodegredation pathway             |
| uniprot_swissprot_genesymbol_annot      | Uniprot_swissprot_genesymbol                   |
| uniprot_swissprot_keyword_annot         | Uniprot_swissprot keyword                      |
| uniprot_swissprot_litdates_annot        | Uniprot citation dates                         |
| uniprot_swissprot_plength_annot         | Uniprot protein length                         |
| uniprot_swissprot_pmw_annot             | Uniprot molecular weight                       |
| uniprot_swissprot_proteinname_annot     | Uniprot_swissprot protein name                 |
| uniprot_swissprot_p_acc                 | Uniprot_swissprot protein                      |
| uniprot_swissprot_p_seq                 | Uniprot_swissprot protein sequence             |
| uniprot_swissprot_seqheader_annot       | Uniprot_swissprot sequence header              |
| uniprot_trembl_genesymbol_annot         | Uniprot_trembl_genesymbol                      |
| uniprot_trembl_go_uid                   | GO term id                                     |
| uniprot_trembl_keyword_annot            | Uniprot_trembl keyword                         |
| uniprot_trembl_litdates_annot           | Uniprot citation dates                         |
| uniprot_trembl_plength_annot            | Uniprot protein length                         |
| uniprot_trembl_pmw_annot                | Uniprot molecular weight                       |
| uniprot_trembl_proteinname_annot        | Uniprot_trembl protein name                    |
| uniprot_trembl_p_acc                    | Uniprot_trembl protein                         |
| uniprot_trembl_p_seq                    | Uniprot_trembl protein sequence                |
| uniprot_trembl_seqheader_annot          | Uniprot_trembl sequence header                 |
| UofW_foundation-anatomy_anatomy_uid     | UofW Foundation model of Anatomy id            |
| ut-ca_bind_officialsymbol_annot         |                                                |
| ut-ca_bind_pxp_uid                      |                                                |
| whitehead-mit_human-snp-db_marker_uid   | Mit-whitehead Human SNPs                       |
| whitehead-mit_mouse-snp-db_marker_uid   | Mit-whitehead mouse SNPs                       |
| wusm-sanger_wormbase_genename_annot     | Sanger-WUSM C.elegans coding gene name         |
| wusm-sanger_wormbase_genesymbol_annot   | Sanger-WUSM C.elegans gene symbol              |
| wusm-sanger_wormbase_orf_uid            | Sanger-WUSM C.elegans coding sequence [CDS] id |
| wusm-sanger_wormbase_p_acc              | Sanger-WUSM C.elegans peptide accession        |
| wusm-sanger_wormbase_p_pid              | Sanger-WUSM C.elegans peptide id               |
+-----------------------------------------+------------------------------------------------+
150 rows in set (0.00 sec)
Descriptron Has a Web Services Interface.
Demo