Search
To quickly search through the Viral Host Range database, users have two options: use the search bar, located at the top right corner of the website, or go directly to the “Search” page using the link in the header of the website.
Quick tip
When using the search bar, hit “Enter” to continue your search on the “Search” page.
Operating the Search tool
The search tool tries to find the words entered by the user anywhere in the VHRdb. Names of viruses and hosts, as well as their identifiers, can be searched. For data sources, the name can be searched, as well as the name of the owner/provider, the description, or the publication URL. It is important to note that the search tool is not sensitive to special characters or accents (e.g. users can search “herelle” to find “Félix d’Hérelle”).
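As a minimal sketch of how accent- and punctuation-insensitive matching can work (an illustration built on Python's standard unicodedata module, not the VHRdb implementation; the function names normalize and matches are hypothetical):

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase the text, strip accents, and turn punctuation into spaces."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Anything that is neither a letter nor a digit becomes a separator.
    cleaned = "".join(c if c.isalnum() else " " for c in without_accents)
    return " ".join(cleaned.lower().split())


def matches(query: str, field: str) -> bool:
    """Return True if any normalized query word is a word of the normalized field."""
    field_words = set(normalize(field).split())
    return any(word in field_words for word in normalize(query).split())


print(normalize("Félix d’Hérelle"))
print(matches("herelle", "Félix d’Hérelle"))
```

With this normalization, both the query and the stored value are compared in the same accent-free, lowercase form.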
On both the Search page and in the search bar, results appear grouped by type: virus, host or data source. Only a limited number of results are shown. From the search bar, users can hit “Enter” to continue the search on the Search page, which shows more results. On the Search page, users need to click “See all results” (next to the section name “Virus”, “Host” or “Data source”) to see all available results.
Users can click on any result of interest at any moment to visit its dedicated page.
How search is performed
When searching for the data source “Félix D’Hérelle collection of bacterial viruses”, one can type "herelle collection". As the two words might not be located next to each other, the search engine returns all data sources matching either herelle or collection and sorts the results by the relevance of each data source. The relevance of a data source is based on the TF-IDF weighting scheme, the field coverage of each term (the percentage of the field's characters that are covered by the searched word), and the presence of a valid identifier or publication.
Note that viruses and hosts are also searched and sorted this way.
Examples
TF-IDF: When searching for "herelle collection", we search for two terms, herelle and collection. There are more data sources associated with the term collection than with the term herelle. The TF-IDF weighting scheme therefore gives a higher score to herelle than to collection; in the end, data sources containing the term herelle will have a higher score and be considered more relevant.
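The effect can be illustrated with a toy computation. The four data source names below are invented for the example, and the formula used (raw term count multiplied by log(N/df)) is one common TF-IDF variant, not necessarily the one used by VHRdb.

```python
import math

# Hypothetical, already-normalized data source names (for illustration only).
DOCS = {
    "A": "felix d herelle collection of bacterial viruses",
    "B": "phage collection of the institute",
    "C": "marine virus collection",
    "D": "host range screening",
}


def idf(term: str, docs: dict) -> float:
    """Inverse document frequency: terms found in fewer documents weigh more."""
    n_containing = sum(term in text.split() for text in docs.values())
    return math.log(len(docs) / n_containing) if n_containing else 0.0


def score(query: str, text: str, docs: dict) -> float:
    """Sum of (term count in the text) x (IDF of the term) over the query terms."""
    words = text.split()
    return sum(words.count(term) * idf(term, docs) for term in query.split())


def rank(query: str, docs: dict) -> list:
    """Document keys sorted from most to least relevant."""
    return sorted(docs, key=lambda key: score(query, docs[key], docs), reverse=True)


print(idf("herelle", DOCS), idf("collection", DOCS))
print(rank("herelle collection", DOCS))
```

Because herelle occurs in only one of the four names while collection occurs in three, herelle receives the larger weight and data source A is ranked first.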
Field coverage of term: When searching for "T4", let us consider that only two entries match: T4(NC_000866.4) and Mt1B1_P10(MT496971). The name of the first virus is T4, thus the coverage of the term for the field name is 100%. The identifier of the second virus is MT496971, so the coverage of the term for the field identifier is 2/8 = 25%. The score associated with T4(NC_000866.4) will be higher than that of Mt1B1_P10(MT496971), as T4 covers 100% of the name of the first virus but only 25% of the identifier of the second virus.
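The coverage figures above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the function name field_coverage and the case-insensitive comparison are assumptions, not a description of the actual VHRdb code.

```python
def field_coverage(term: str, field_value: str) -> float:
    """Fraction of the field's characters covered by the searched term."""
    if not field_value or term.lower() not in field_value.lower():
        return 0.0
    return len(term) / len(field_value)


# "T4" covers the whole name of the first virus...
print(field_coverage("T4", "T4"))
# ...but only 2 of the 8 characters of the second virus's identifier.
print(field_coverage("T4", "MT496971"))
```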
Preference to identified and published resources: When two results have the same score (i.e., the same relevance), the data source with a publication is preferred over the one without. For viruses and hosts, preference is given to entries with a valid NCBI/Hérelle identifier, again only when they have the same relevance score.
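Such a tie-break can be expressed as a two-part sort key: relevance first, publication status second. The Result class and its fields below are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str               # hypothetical label of the data source
    score: float            # relevance score
    has_publication: bool   # True if the data source is linked to a publication


def rank(results):
    """Sort by decreasing score; on equal score, published results come first."""
    # False sorts before True, so "not has_publication" puts published entries first.
    return sorted(results, key=lambda r: (-r.score, not r.has_publication))


results = [
    Result("unpublished", 1.0, False),
    Result("published", 1.0, True),
    Result("best", 2.0, False),
]
print([r.name for r in rank(results)])
```

The publication flag never overrides the score: the unpublished entry with the higher score still comes first.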
Tuning search
Searching a phrase, i.e., preventing it from being split on spaces
To prevent the search engine from splitting the searched text, one can put any part of it between double quotes: "bacterial virus" response will not search for “bacterial or virus or response” but for “bacterial virus or response”.
Single-letter searches are not permitted, as they are assumed to be irrelevant. Nevertheless, one can search for entries matching a by putting it between double quotes: "a"
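The splitting behaviour described in this section can be sketched with a regular expression that treats a double-quoted span as a single term. The function name split_query is hypothetical, and the handling of single letters follows the rule stated above.

```python
import re

# Either a double-quoted phrase (captured without its quotes) or a run of non-spaces.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def split_query(query: str) -> list:
    """Split a query on spaces, keeping quoted phrases together."""
    terms = []
    for quoted, bare in TOKEN.findall(query):
        if quoted:
            terms.append(quoted)      # quoted text is kept as is, even a single letter
        elif len(bare) > 1:
            terms.append(bare)        # unquoted single letters are dropped
    return terms


print(split_query('"bacterial virus" response'))
print(split_query('bacterial virus response'))
print(split_query('a'), split_query('"a"'))
```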
Mandatory word in the results
When searching for bacterial virus response algae, if the word algae must be present in each found entry, prefix it with a +: bacterial virus response +algae.
Excluding a word from the results
When searching for bacterial virus response, if one wants to filter out results containing algae anywhere, one can prefix the word with -: bacterial virus response -algae.
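The + and - prefixes can be pictured as a filter applied to each candidate entry. The sketch below works on plain whitespace-separated words and uses hypothetical function names; it assumes that an entry containing every mandatory word is kept even if it matches no optional word.

```python
def parse_terms(query: str):
    """Split a query into mandatory (+), excluded (-) and optional words."""
    required, excluded, optional = [], [], []
    for word in query.lower().split():
        if word.startswith("+") and len(word) > 1:
            required.append(word[1:])
        elif word.startswith("-") and len(word) > 1:
            excluded.append(word[1:])
        else:
            optional.append(word)
    return required, excluded, optional


def keep(entry_text: str, query: str) -> bool:
    """Decide whether an entry should appear in the results for the query."""
    required, excluded, optional = parse_terms(query)
    words = set(entry_text.lower().split())
    if any(term not in words for term in required):
        return False                       # a mandatory word is missing
    if any(term in words for term in excluded):
        return False                       # an excluded word is present
    return bool(required) or any(term in words for term in optional)


print(keep("virus of green algae", "bacterial virus response +algae"))
print(keep("algae virus response", "bacterial virus response -algae"))
```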
Searching in a specific field
Users can use prefixes in order to define where the query should be present.
For instance, when searching for T4, this word is searched in all fields, but you can restrict the search to entries that have it in their name with NAME:T4, or in their description with DESC:T4.
Here are all the available prefixes:
Prefix | Search in | Kind |
---|---|---|
ID: | NCBI ID, Hérelle ID, custom ID | Host, virus |
NCBI: | NCBI nuccore ID | Host, virus |
HER: | Félix d’Hérelle Reference Center ID | Host, virus |
TAX: | Taxonomic ID | Host, virus |
NAME: | entry name | Data source, host, virus |
DESC: | description | Data source |
Prov: | Provider if specified, owner otherwise | Data source |
Provider: | Provider if specified, owner otherwise | Data source |
owner: | Provider if specified, owner otherwise | Data source |
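A field prefix can be thought of as a restriction on which fields of an entry are inspected. The sketch below is an illustration only: the entry dictionary, its keys, and the function names are hypothetical, and treating prefixes as case-insensitive is an assumption.

```python
# Prefixes taken from the table above, compared in lowercase (assumption).
KNOWN_PREFIXES = {"id", "ncbi", "her", "tax", "name", "desc", "prov", "provider", "owner"}


def split_prefix(term: str):
    """Return (prefix, value) for 'PREFIX:value', or (None, term) without a known prefix."""
    prefix, sep, value = term.partition(":")
    if sep and value and prefix.lower() in KNOWN_PREFIXES:
        return prefix.lower(), value
    return None, term


def term_matches(term: str, entry: dict) -> bool:
    """Look for the value in one field if a prefix is given, in all fields otherwise."""
    field, value = split_prefix(term)
    fields = [entry.get(field, "")] if field else list(entry.values())
    return any(value.lower() in str(content).lower() for content in fields)


# Hypothetical entry whose keys mirror two of the prefixes.
entry = {"name": "T4", "desc": "Escherichia phage"}
print(term_matches("NAME:T4", entry), term_matches("DESC:T4", entry), term_matches("T4", entry))
```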
Combining options
Tuning options can be combined: +"were isolated" chicken sewage -desc:ECOR will search for all entries containing were isolated, with either chicken or sewage, but never ECOR in the description.
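To make the structure of such a combined query explicit, the sketch below breaks it into terms, each with a mode (required, excluded or optional), an optional field, and a text. The regular expression and the dictionary layout are illustrative assumptions, not the parser used by VHRdb.

```python
import re

# Optional +/- sign, optional "field:" prefix, then a quoted phrase or a bare word.
TERM = re.compile(r'([+-]?)(?:(\w+):)?(?:"([^"]+)"|(\S+))')


def parse_query(query: str) -> list:
    """Turn a query string into a list of structured terms."""
    terms = []
    for sign, field, quoted, bare in TERM.findall(query):
        terms.append({
            "mode": {"+": "required", "-": "excluded"}.get(sign, "optional"),
            "field": field.lower() or None,   # None means "search all fields"
            "text": quoted or bare,
        })
    return terms


for term in parse_query('+"were isolated" chicken sewage -desc:ECOR'):
    print(term)
```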
Greek letters
In addition to searching for the string of characters entered by the user, the search tool also tries to replace Greek letter names with the corresponding Greek letters. Note that the tool is not case-sensitive.
Here are some viruses containing Greek letters, and how you can search for them:
Virus | Can be found with |
---|---|
λvir | |
Φ10 | |
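The replacement of Greek letter names can be sketched as follows. The mapping is deliberately partial, the function name query_variants is hypothetical, and the query spellings used in the example (lambdavir, Phi10) are assumptions made for illustration; they are not taken from the table above.

```python
# Partial, illustrative mapping from Greek letter names to lowercase Greek letters.
GREEK = {
    "alpha": "α", "beta": "β", "gamma": "γ", "delta": "δ",
    "lambda": "λ", "phi": "φ", "psi": "ψ", "omega": "ω",
}


def query_variants(query: str) -> set:
    """Return the lowercase query plus a variant with letter names replaced."""
    lowered = query.lower()
    replaced = lowered
    for name, letter in GREEK.items():
        replaced = replaced.replace(name, letter)
    return {lowered, replaced}


# Comparison is done in lowercase, so "Φ10" is compared as "φ10".
print(query_variants("lambdavir"))
print("Φ10".lower() in query_variants("Phi10"))
```

Keeping the original string alongside the replaced one means that a word that merely contains a letter name is still searched as typed.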
Available filters
As explained above, the search tool is designed to return as many results as possible. To narrow the results and make them more relevant, users can combine multiple filters.
Search for
Users can search for only one kind of element, either virus, host or data source. Note that you can only select one at a time.
Life domain
Users can select in which life domain they are searching: bacteria, archaea or eukaryote. Note that you can only select one at a time.
Only published data
Users can restrict the search to data associated with a publication, directly for data sources, or through the data source for viruses and hosts.
Only identified hosts
Users can restrict the search to hosts possessing a valid NCBI identifier, data sources containing such hosts, and viruses documenting infection status of such hosts.
Only identified viruses
Users can restrict the search to viruses possessing a valid NCBI identifier, data sources containing such viruses, and hosts documenting infection status by such viruses.
Filter by owner
Users can narrow their search to data contributed by one or more data source owners, i.e., the data sources themselves, but also the viruses and hosts found in those data sources.