Viral Host Range database


Search

To quickly search through the Viral Host Range database, users have two options: use the search bar, located at the top right corner of the website, or go directly to the “Search” page using the link in the website header.

Quick tip

When using the search bar, hit “Enter” to continue your search in the “Search” page.

Operating the Search tool

The search tool tries to find the words entered by the user across the whole VHRdb. The names of viruses and hosts can be searched, as well as their identifiers. For data sources, the name can be searched, as well as the name of the owner/provider, the description, or the publication URL. It is important to note that the search tool is not sensitive to special characters or accents (e.g. users can search “herelle” to find “Félix d’Hérelle”).
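The accent-insensitive behaviour described above can be illustrated with a minimal sketch. It is not the VHRdb implementation: the function names `normalize` and `matches` are hypothetical, and the sketch assumes that special characters are simply treated as separators.

```python
import unicodedata


def normalize(text: str) -> str:
    """Casefold, strip accents, and turn special characters into spaces."""
    # NFKD decomposes "é" into "e" followed by a combining accent
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    # Assumption: anything that is not a letter or digit acts as a separator
    cleaned = "".join(c if c.isalnum() else " " for c in without_accents)
    return " ".join(cleaned.casefold().split())


def matches(query: str, text: str) -> bool:
    """True if the normalized query occurs in the normalized text."""
    return normalize(query) in normalize(text)


print(normalize("Félix d’Hérelle"))
print(matches("herelle", "Félix d’Hérelle"))
```

With this normalization, the query “herelle” and the stored name “Félix d’Hérelle” are compared on the same accent-free, lowercase form.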

In both the Search page and the search bar, results are organized by their nature: virus, host or data source. Only a limited number of results are presented. With the search bar, users can hit “Enter” to continue the search in the Search page, which shows more results. In the Search page, users need to click “See all results” (next to the section name “Virus”, “Host” or “Data source”) in order to see all available results.

Users can click on any result of interest at any moment to visit its dedicated page.

How search is performed

When searching for the data source “Félix D’Hérelle collection of bacterial viruses”, one can type "herelle collection". As the two words might not be located next to each other, the search engine returns all data sources matching either herelle or collection and sorts the results by the relevance of each data source. The relevance of a data source is based on the TF-IDF weighting scheme, the field coverage of each term (the percentage of the letters of the field that are covered by the searched word), and the presence of a valid identifier or publication.

Note that viruses and hosts are also searched and sorted this way.

Examples

TF-IDF: When searching for "herelle collection", we search for two terms, herelle and collection. There are more data sources associated with the term collection than with the term herelle. The TF-IDF weighting scheme therefore gives a higher score to herelle than to collection; in the end, data sources containing the term herelle have a higher score and are considered more relevant.
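The effect of TF-IDF can be sketched on a toy corpus. This is a textbook formulation, not necessarily the exact weighting used by the VHRdb. Apart from the d'Hérelle collection, the data source names below are hypothetical, and the helper names `idf`, `tf_idf` and `score` are illustrative.

```python
import math

# Hypothetical, already normalized data source names (illustration only)
documents = [
    "felix d herelle collection of bacterial viruses",
    "marine phage collection",
    "ecor collection host range",
    "soil isolates",
]


def idf(term: str, docs: list[str]) -> float:
    """Inverse document frequency: rarer terms get a larger weight."""
    containing = sum(1 for doc in docs if term in doc.split())
    return math.log(len(docs) / containing) if containing else 0.0


def tf_idf(term: str, doc: str, docs: list[str]) -> float:
    """Term frequency in one document, weighted by the term's rarity."""
    words = doc.split()
    return (words.count(term) / len(words)) * idf(term, docs)


def score(query: str, doc: str, docs: list[str]) -> float:
    """Sum of the TF-IDF weights of every query term for one document."""
    return sum(tf_idf(term, doc, docs) for term in query.split())


ranked = sorted(
    documents,
    key=lambda doc: score("herelle collection", doc, documents),
    reverse=True,
)
print(ranked[0])
```

In this toy corpus, collection appears in three names and herelle in only one, so herelle carries the larger weight and the name containing it is ranked first.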

Field coverage of term: When searching for "T4", let us consider that only two entries match: T4(NC_000866.4) and Mt1B1_P10(MT496971). The name of the first virus is T4, so the term covers 100% of the name field. The identifier of the second virus is MT496971, so the term covers 2/8 = 25% of the identifier field. The score of T4(NC_000866.4) is therefore higher than that of Mt1B1_P10(MT496971), as T4 covers 100% of the name of the first virus but only 25% of the identifier of the second virus.
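The coverage arithmetic of this example can be reproduced with a short sketch. The function name `field_coverage` is hypothetical, and the sketch assumes that coverage is the length of the term divided by the length of the field that contains it.

```python
def field_coverage(term: str, field_value: str) -> float:
    """Fraction of the field's characters covered by the term (0.0 if absent)."""
    if not field_value or term.casefold() not in field_value.casefold():
        return 0.0
    return len(term) / len(field_value)


print(field_coverage("T4", "T4"))        # name of the first virus
print(field_coverage("T4", "MT496971"))  # identifier of the second virus
```

The first call returns 1.0 and the second 0.25, matching the 100% and 25% figures above.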

Preference for identified and published resources: When two results have the same score (i.e. the same relevance), the data source with a publication is preferred over the one without. For viruses and hosts, preference is given to entries with a valid NCBI/Hérelle identifier, again only when they have the same relevance score.
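This tie-breaking rule can be expressed as a two-part sort key. The sketch below is illustrative only: the `DataSource` class, its fields and the sample entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    score: float
    has_publication: bool


def rank(results: list[DataSource]) -> list[DataSource]:
    """Sort by descending score; on equal scores, published entries come first."""
    # False sorts before True, so "not has_publication" favours published entries
    return sorted(results, key=lambda r: (-r.score, not r.has_publication))


results = [
    DataSource("unpublished set", 0.8, False),
    DataSource("published set", 0.8, True),
    DataSource("best match", 0.9, False),
]
ranked_names = [r.name for r in rank(results)]
print(ranked_names)
```

The publication flag only matters between the two entries with the same score; the entry with the higher score stays first even though it has no publication.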

Tuning search

Searching a sentence, i.e. preventing it from being split on spaces

To prevent the search engine from splitting the searched text, one can put any part of it between double quotes: "bacterial virus" response will not search for “bacterial or virus or response” but for “bacterial virus or response”.

Single-letter searches are not permitted, as they are assumed not to be relevant. Nevertheless, one can search for entries matching a by putting it between double quotes: "a"

Mandatory word in the results

When searching for bacterial virus response algae, if the word algae must be present in each entry found, prefix it with a +: bacterial virus response +algae.

Excluding a word from the results

When searching for bacterial virus response, if one wants to filter out results containing algae anywhere, one can prefix the word with a -: bacterial virus response -algae.

Searching in specific field

Users can use prefixes in order to define where the query should be present.

For instance, when searching for T4, this word is searched in all fields, but you can search only for entries that have it in their name with NAME:T4, or their description DESC:T4.

Here are all the available prefixes:

Prefix     Search in                               Kind
---------  --------------------------------------  ------------------------
ID:        NCBI ID, Hérelle ID, custom ID          Host, virus
NCBI:      NCBI nuccore ID                         Host, virus
HER:       Félix d’Hérelle Reference Center ID     Host, virus
TAX:       Taxonomic ID                            Host, virus
NAME:      entry name                              Data source, host, virus
DESC:      description                             Data source
Prov:      Provider if specified, owner otherwise  Data source
Provider:  Provider if specified, owner otherwise  Data source
owner:     Provider if specified, owner otherwise  Data source

Combining options

Tuning options can be combined: +"were isolated" chicken sewage -desc:ECOR will search for all entries containing the phrase “were isolated”, with either chicken or sewage, but never ECOR in the description.
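To make the syntax concrete, here is a minimal sketch of how such a query could be broken into terms. It is not the parser used by the VHRdb: the regular expression, the `Term` class and `parse_query` are hypothetical, and upper-casing the field prefix is an assumption based on the examples using both DESC: and desc:.

```python
import re
from dataclasses import dataclass
from typing import Optional

TOKEN = re.compile(
    r'(?P<sign>[+-])?'                        # + (mandatory) or - (excluded)
    r'(?:(?P<field>[A-Za-z]+):)?'             # field prefix such as NAME: or desc:
    r'(?:"(?P<phrase>[^"]*)"|(?P<word>\S+))'  # quoted phrase or single word
)


@dataclass
class Term:
    text: str
    required: bool = False
    excluded: bool = False
    field: Optional[str] = None


def parse_query(query: str) -> list[Term]:
    """Split a query into terms, keeping quoted phrases in one piece."""
    terms = []
    for match in TOKEN.finditer(query):
        quoted = match.group("phrase") is not None
        text = match.group("phrase") if quoted else match.group("word")
        # Unquoted single letters are skipped; quoted ones are kept
        if not text or (not quoted and len(text) < 2):
            continue
        field = match.group("field")
        terms.append(Term(
            text=text,
            required=match.group("sign") == "+",
            excluded=match.group("sign") == "-",
            # Assumption: field prefixes are compared case-insensitively
            field=field.upper() if field else None,
        ))
    return terms


for term in parse_query('+"were isolated" chicken sewage -desc:ECOR'):
    print(term)
```

On the example query, this yields a mandatory phrase “were isolated”, two plain terms chicken and sewage, and an excluded term ECOR restricted to the description field.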

Greek letters

In addition to searching for the string of characters entered by the user, the search tool also tries to replace Greek letter names with the corresponding Greek letters. Note that the tool is not case-sensitive.

Here are some viruses whose names contain a Greek letter, and how you can search for them:

Virus  Can be found with
-----  ---------------------------------------------------------
λvir   lambdavir, Lambdavir, ΛVIR, lambda vir, vir lambda, λ vir
Φ10    phi10, φ10
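The replacement of Greek letter names can be sketched as a simple normalization step. This is a naive illustration, not the VHRdb implementation: the mapping `GREEK_NAMES` is hypothetical and deliberately limited to the two letters used in the examples above, and a plain substring replacement like this one would also rewrite letter names that happen to occur inside longer words.

```python
# Hypothetical, deliberately incomplete mapping (illustration only)
GREEK_NAMES = {"lambda": "λ", "phi": "φ"}


def normalize_greek(text: str) -> str:
    """Casefold the text and replace Greek letter names by the letters."""
    folded = text.casefold()  # also turns "Λ" into "λ" and "Φ" into "φ"
    for name, letter in GREEK_NAMES.items():
        folded = folded.replace(name, letter)
    return folded


for query in ["lambdavir", "Lambdavir", "ΛVIR", "phi10", "Φ10"]:
    print(query, "->", normalize_greek(query))
```

Once both the query and the stored name go through the same normalization, lambdavir, Lambdavir and ΛVIR all compare equal to λvir.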

Available filters

As explained before, the search tool has been designed to find the highest possible number of results. To limit the number of results and make them more relevant, users can apply multiple filters.

Search for

Users can search for only one kind of element, either virus, host or data source. Note that you can only select one at a time.

Life domain

Users can select in which life domain they are searching: bacteria, archaea or eukaryote. Note that you can only select one at a time.

Only published data

Users can restrict the search to data associated with a publication, either directly for data sources, or through the data source for viruses and hosts.

Only identified hosts

Users can restrict the search to hosts possessing a valid NCBI identifier, data sources containing such hosts, and viruses documenting infection status of such hosts.

Only identified viruses

Users can restrict the search to viruses possessing a valid NCBI identifier, data sources containing such viruses, and hosts documenting infection status by such viruses.

Filter by owner

Users can narrow their search to data contributed by one or multiple data source owners, i.e. data sources, but also the viruses and hosts found in such data sources.


© Copyright 2020, Bryan Brancotte.
