The NR (non-redundant) protein database is compiled by the NCBI (National Center for Biotechnology Information) for BLAST searches. It contains non-identical sequences from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF. Its main strengths are its comprehensiveness and its frequent updates.
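The idea of keeping only non-identical sequences can be illustrated with a small sketch. It is not NCBI's actual build pipeline; the function name `build_nonredundant` and the record identifiers are hypothetical, and the only assumption is that two records count as redundant when their sequences match exactly (ignoring case).

```python
def build_nonredundant(records):
    """Collapse records that share an identical sequence into one entry.

    `records` is an iterable of (source_id, sequence) pairs.
    Returns a dict mapping each distinct sequence to the list of
    source IDs that carried it, in the order they were seen.
    """
    merged = {}
    for source_id, sequence in records:
        key = sequence.upper()  # treat case differences as the same sequence
        merged.setdefault(key, []).append(source_id)
    return merged


# Hypothetical records: two sources carry the same protein sequence.
records = [
    ("genbank|EXAMPLE1", "MKTAYIAKQR"),
    ("pdb|EXAMPLE2", "MKTAYIAKQR"),
    ("sp|EXAMPLE3", "MSDNELLQKA"),
]

nr = build_nonredundant(records)
for sequence, sources in nr.items():
    print(sequence, sources)
```

Three input records collapse to two entries, and the merged entry keeps the identifiers of every source that contributed the sequence.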
The Molecular Modeling Database (MMDB) links protein structures to sequence data. NCBI provides powerful search and retrieval tools, such as Entrez and BLAST, for accessing this data, along with a range of computational services for analyzing different types of data. These services support DNA and RNA sequencing projects, which generate large volumes of data that must be analyzed and compared with other sequences; structural modeling projects, which use computer programs to predict how molecules such as proteins fold into three-dimensional shapes; and phylogenetic analysis projects, which infer the relationships among species from molecular similarities.
In addition to these automated resources, MMDB is accessible through direct contact with scientists who are willing to share data. More than 18,000 scientists from over 900 institutions have registered with NCBI to receive email alerts for new data in MMDB.
Scientists can submit data to NCBI by completing a short online data submission form at makemdetailspublic.com. If they choose, they can also send their data directly to MMDB by email. Authors must be aware of any copyright restrictions when submitting data.
Data submitted to MMDB can be made public immediately or kept private until a scientist decides to make it available.
The National Center for Biotechnology Information (NCBI) offers a comprehensive set of online biological information and data resources, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for life science publications. These resources are accessible through a web browser as well as a variety of other software applications.
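As an illustration of access from software rather than a browser, the sketch below builds a search request for NCBI's E-utilities `esearch` endpoint and extracts the ID list from a JSON response. The helper names `build_esearch_url` and `parse_id_list` are hypothetical, and the sample response is a hand-written placeholder with the same shape as a real one, not actual PubMed output. No network request is made here.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an esearch URL that asks for JSON output."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_id_list(response_text):
    """Extract the list of record IDs from an esearch JSON response."""
    payload = json.loads(response_text)
    return payload["esearchresult"]["idlist"]


url = build_esearch_url("pubmed", "protein folding", retmax=5)
print(url)

# Placeholder response body; the IDs are made up for illustration.
sample = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
ids = parse_id_list(sample)
print(ids)
```

To run a real query, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_id_list`; NCBI's usage guidelines on request rates and API keys apply to such calls.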
GenBank is one of the world's largest public collections of nucleotide sequences. It contains hundreds of millions of sequences from organisms across the tree of life, including humans. The database is built and maintained by NCBI, a division of the National Library of Medicine (NLM) at the U.S. National Institutes of Health (NIH). As part of NIH, NCBI is funded by the U.S. federal government.
PubMed is a free search engine for citations and abstracts of biomedical and life science literature. Its records are tagged with publication types, the most common being journal articles and reviews. Journal articles typically report original research, while reviews summarize previous studies on a topic.
PubMed can be searched by anyone with an internet connection and a web browser; no subscription is needed to search or to view abstracts. PubMed itself holds citations and abstracts rather than full text. Records link out to the full text where it is available: some articles are free, for example through PubMed Central (PMC), while others require a subscription or a fee at the publisher's site.
The Protein database stores text records for individual protein sequences collected from a variety of sources, including the NCBI Reference Sequence (RefSeq) project, GenBank, PDB, and UniProtKB/Swiss-Prot. Protein records are available in several formats, including FASTA and XML, and are linked to other NCBI resources. The Nucleotide database contains nucleotide sequence data obtained from genomic libraries or direct sequencing projects. It includes raw sequence data as well as annotated sequences such as genes and transcripts.
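A minimal sketch of working with FASTA-formatted records follows. It builds a request URL for the E-utilities `efetch` endpoint and parses FASTA text into (header, sequence) pairs. The helper names `build_fasta_url` and `parse_fasta`, the accession `EXAMPLE_ACCESSION`, and the sample records are all placeholders for illustration; no network request is made.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accession, db="protein"):
    """Return an efetch URL requesting a record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta_url = build_fasta_url("EXAMPLE_ACCESSION")  # placeholder accession

# Hand-written sample in FASTA layout; not real database content.
sample_fasta = (
    ">EXAMPLE_1 hypothetical protein\n"
    "MKTAYIAK\n"
    "QRQISFVK\n"
    ">EXAMPLE_2 another placeholder\n"
    "MSDNELLQ\n"
)
parsed = parse_fasta(sample_fasta)
for header, sequence in parsed:
    print(header, len(sequence))
```

Passing `db="nuccore"` instead of the default would target the Nucleotide database in the same way. For production use, an established parser such as the one in Biopython is usually a better choice than a hand-rolled one.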
The Gene database captures information about genes and their products. Each gene entry includes information about the gene, its variants, and its orthologs. This data helps identify the function of proteins and predict how they might interact with other molecules. The Gene Expression Omnibus (GEO) provides information about the expression patterns of genes across different tissues, stages of development, and conditions such as disease or drug treatment. This information can be used to understand which parts of the genome are active inside cells, and it provides clues about what these genes do.
The Taxonomy database captures information about living organisms, which are classified into groups called taxa. Organisms are grouped by shared characteristics such as common ancestry or genetic similarity. New groups are created when scientists find new relationships between existing groups. Old groups may be split into multiple new groups if more evidence becomes available to support these divisions.
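One common way to represent such a classification is a table in which every taxon points to its parent, so that a lineage can be read off by walking upward to the root. The sketch below assumes that representation; the table `TOY_TAXA` is a simplified toy example whose numeric IDs are placeholders, not real NCBI Taxonomy identifiers, and it skips most intermediate ranks.

```python
# Toy table: id -> (name, parent id). IDs are placeholders and many
# intermediate ranks are omitted for brevity.
TOY_TAXA = {
    1: ("root", None),
    2: ("Mammalia", 1),
    3: ("Primates", 2),
    4: ("Homo", 3),
    5: ("Homo sapiens", 4),
}


def lineage(tax_id, taxa):
    """Return taxon names from the root down to `tax_id`."""
    names = []
    current = tax_id
    while current is not None:
        name, parent = taxa[current]
        names.append(name)
        current = parent
    return list(reversed(names))


human_lineage = lineage(5, TOY_TAXA)
print(" > ".join(human_lineage))
```

With parent pointers, splitting or regrouping a taxon only requires updating the parent entries of the affected members; the lineage function itself does not change.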