# What is a compatible file?

## Short version

A compatible file is an Excel spreadsheet (.xlsx) in which hosts are listed in columns and viruses are listed in rows. Each cell holds the response (a number) of the interaction between one host and one virus (see example below). Warning: only data located in the first Excel sheet will be taken into account!

|  | E. coli MG1655 (NC_000913.3) | E. coli O157:H7 (AE005174.2) |
|---|---|---|
| T4 (NC_000866.4; HER:27) | 2 | 0 |
| T7 (NC_001604.1) | 1 | 2 |
| MyVirus | 2 | 1 |

## Detailed version

### Values accepted as responses

Each user has their own way of recording the output of host range tests, from a basic two-state result (infection/no infection) to more detailed information such as numerical values derived from efficiency of plating calculations. The VHRdb accepts any response that is a number. It is highly recommended to assign the lowest value to the no-infection state and the highest value to the infection state, so that optional intermediate states correspond to values between the two. The range between the lowest and the highest values can be as large as you wish: for example, you can upload a data source ranging from 0 to 100 or from 0.0001 to 1. The VHRdb mapping scheme allows users to freely modify the threshold values that assign each response to one of the states of the global scheme (No infection, Intermediate, Infection).
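The idea of mapping raw responses onto the three-state global scheme can be illustrated with a minimal sketch. This is not the VHRdb implementation: the function name `map_response`, the `low` and `high` thresholds, and the choice of inclusive comparisons are all assumptions made for illustration.

```python
from typing import Literal

State = Literal["No infection", "Intermediate", "Infection"]


def map_response(value: float, low: float, high: float) -> State:
    """Map a raw response onto the three-state global scheme.

    `low` and `high` are hypothetical, user-chosen thresholds: values at or
    below `low` count as no infection, values at or above `high` count as
    infection, and anything in between is intermediate.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if value <= low:
        return "No infection"
    if value >= high:
        return "Infection"
    return "Intermediate"


# A source recorded on a 0-100 scale and one recorded on a 0.0001-1 scale
# end up in the same three states once each has its own thresholds.
print([map_response(v, low=0, high=100) for v in (0, 40, 100)])
print([map_response(v, low=0.0001, high=1) for v in (0.0001, 0.5, 1)])
```

Because each data source gets its own thresholds, the width of the original scale does not matter once the responses have been mapped.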

### Providing the identifier of a virus or a host

Users can add identifiers for viruses and hosts in two different ways. The first is to add the identifiers, between parentheses, in the same heading cell as the virus or host name when filling in the spreadsheet (see example below). Identifiers can be NCBI identifiers, HER identifiers, and/or any custom identifier. Multiple identifiers must be separated by a semicolon (;). More documentation can be found here.
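To make the heading-cell convention concrete, here is a small sketch that splits a cell such as `T4 (NC_000866.4; HER:27)` into a name and a list of identifiers. The helper `split_header` and its regular expression are hypothetical and only illustrate the format described above; the parser used by the VHRdb may behave differently.

```python
import re
from typing import List, Tuple

# A name, optionally followed by "(id1; id2; ...)" at the end of the cell.
HEADER_RE = re.compile(r"^\s*(?P<name>.*?)\s*(?:\((?P<ids>[^()]*)\))?\s*$")


def split_header(cell: str) -> Tuple[str, List[str]]:
    """Split a heading cell into (name, identifiers)."""
    match = HEADER_RE.match(cell)
    if match is None:  # e.g. a cell containing line breaks
        return cell.strip(), []
    raw_ids = match.group("ids") or ""
    # Identifiers are separated by semicolons; empty pieces are dropped.
    identifiers = [part.strip() for part in raw_ids.split(";") if part.strip()]
    return match.group("name"), identifiers


print(split_header("T4 (NC_000866.4; HER:27)"))
print(split_header("MyVirus"))
```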

The second way to enter identifiers is to edit the VHRdb data after the source table has been uploaded. This is particularly convenient for adding NCBI identifiers that are obtained only after the source table has been uploaded.

Examples :

### How to compare data across several data sources

Every data source is processed using the same mapping procedure linked to a simplified three-state global scheme. Therefore, all data sources can be compared.

### Can cells of the Excel source file be colored?

Cell colors in the source file are not taken into account. Therefore, if you ranked your responses using a color scheme only, you must convert the colors to numbers before submission to the VHRdb. If you are using colors in addition to numbers, you don’t need to remove the colors: they will not interfere with the uploading process. Note that colors are also used in exported files, when possible, to improve readability.

### Variants in the file disposition

1. The header of a row (or column) in the spreadsheet can be preceded by one or more columns (or rows). Only the last column (or row) will be taken into account. In the following example, only the cells in bold will be uploaded into the VHRdb.

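   |  |  | Obtained from John doe | Obtained from Jane doe |
   |---|---|---|---|
   |  |  | On 2009-02-21 | On 2012-03-21 |
   |  |  | **E. coli MG1655** | **E. coli O157:H7** |
   | Virus | **T4 (NC_000866.4)** | **2** | **0** |
   | Virus | **T7 (NC_001604.1)** | **1** | **2** |
   | Virus | **MyVirus** | **2** | **1** |
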
2. The data must be on the first sheet of the file.

3. There can be additional rows after the responses. They will not be imported as long as nothing is written where the virus (or host) name is expected. In the following example, the last row will not be imported.

   |  |  | E. coli MG1655 | E. coli O157:H7 |
   |---|---|---|---|
   | Virus | T4 (NC_000866.4) | 2 | 0 |
   | Virus | T7 (NC_001604.1) | 1 | 2 |
   | Virus | MyVirus | 2 | 1 |
   | Infection Ratio |  | 100% | 50% |
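
The layout rules above can be summarised with a short sketch that works on a sheet already loaded as a list of rows. It is an illustration only, not the VHRdb import module: the function `extract_responses` is hypothetical, and it assumes the caller states how many leading rows (`header_rows`) and columns (`header_cols`) precede the responses. The sample sheet also assumes that the "Infection Ratio" label sits outside the column where virus names are expected.

```python
from typing import Dict, List, Optional, Tuple

Cell = Optional[object]  # a cell value as read from the sheet, None if empty


def is_blank(cell: Cell) -> bool:
    return cell is None or str(cell).strip() == ""


def extract_responses(
    grid: List[List[Cell]], header_rows: int, header_cols: int
) -> Dict[Tuple[str, str], float]:
    """Return {(virus, host): response} from a sheet given as a list of rows.

    Only the last header row is read for host names and only the last
    header column is read for virus names.
    """
    hosts = grid[header_rows - 1][header_cols:]
    responses: Dict[Tuple[str, str], float] = {}
    for row in grid[header_rows:]:
        virus = row[header_cols - 1]
        if is_blank(virus):
            continue  # nothing where the virus name is expected: skip the row
        for host, value in zip(hosts, row[header_cols:]):
            if is_blank(host) or is_blank(value):
                continue  # no host name, or no response recorded
            responses[(str(virus), str(host))] = float(value)
    return responses


sheet = [
    [None, None, "Obtained from John doe", "Obtained from Jane doe"],
    [None, None, "On 2009-02-21", "On 2012-03-21"],
    [None, None, "E. coli MG1655", "E. coli O157:H7"],
    ["Virus", "T4 (NC_000866.4)", 2, 0],
    ["Virus", "T7 (NC_001604.1)", 1, 2],
    ["Virus", "MyVirus", 2, 1],
    ["Infection Ratio", None, "100%", "50%"],
]
result = extract_responses(sheet, header_rows=3, header_cols=2)
print(len(result))  # 6: three viruses times two hosts, last row ignored
```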

### Robustness of file import

Because the ability of the import module to read and interpret files reliably is of paramount importance, we generated multiple configurations in which a file could be written, together with how each should be read. At each change in the program, we test that every file is still read as expected. The file collection can be browsed at https://gitlab.pasteur.fr/hub/viralhostrangedb/tree/master/src/viralhostrange/test_data, where for an input file <filename>.xlsx the data we extract from it is <filename>.xlsx.json.

If you tried to import a file that should work but did not, please do not hesitate to submit an issue at https://gitlab.pasteur.fr/hub/viralhostrangedb/issues with the file cleaned of its private data. You can also e-mail the file and the steps to reproduce the bug to viralhostrangedb@pasteur.fr. We will either correct the import module so that the file can be imported, or return an error message clear enough for users to understand what is wrong with their file.