A program to convert raw fluorescence data generated by a Genepix™ microarray scanner to expression data.
This program was initially designed for lectin microarray analysis (dual color and single color), but it is also applicable to other microarrays.
The program has a graphical user interface (GUI) built with tkinter.
Users can start the program by running the Python file (.py) in an IDE on a machine with Python 3.6 or above installed.
For users who do not wish to install Python or an IDE, a convenient, user-friendly version (64-bit Windows only) is also provided (Microarray Analysis_verMar2021_win64.zip). See part III for instructions.
I. Description of the data analysis workflow
- [Figure: graphic description of the analysis process]
- Output files
(1) Dual color mode
• An annotated raw data file.
• A processed final data file containing the unadjusted/adjusted average ratios of each lectin for each sample. Both files are text (.txt) files.
(2) Single color mode
• An annotated raw data file.
• A processed final data file containing the unadjusted fluorescence values of each lectin for each sample. Both files are text (.txt) files.
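The README does not spell out how the average ratios are computed. One plausible reading, given that 532 nm is the sample channel and 635 nm the reference channel in dual color mode (see part II), is the mean of per-replicate sample/reference ratios of the background-subtracted intensities. The sketch below uses a hypothetical function, `average_ratio`, to show only that unadjusted case. It is not necessarily the program's method, and the adjustment step is not modelled at all.

```python
from statistics import mean


def average_ratio(sample_net, reference_net):
    """Mean of per-replicate sample/reference ratios for one lectin (hypothetical).

    sample_net:    background-subtracted 532 nm values, e.g. from 'F532 Median - B532'
    reference_net: background-subtracted 635 nm values, e.g. from 'F635 Median - B635'
    Replicates with a non-positive reference value are skipped (an assumption).
    """
    ratios = [s / r for s, r in zip(sample_net, reference_net) if r > 0]
    return mean(ratios) if ratios else float("nan")


if __name__ == "__main__":
    # Three replicate spots of one lectin with ratios 2, 3 and 4.
    print(average_ratio([200.0, 300.0, 400.0], [100.0, 100.0, 100.0]))  # 3.0
```

Skipping spots with a zero or negative reference value simply avoids division by zero; how the real program treats such spots is not stated here.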
II. Input files
- Raw data file(s)
• The files must be .txt files (the default output file format in Genepix) or .csv files.
• The program extracts data from the columns with the following names (case sensitive, note the spaces):
When using median values:
Block, Column, Flags, F635 Median - B635, F532 Median - B532, SNR 635, SNR 532
When using mean values:
Block, Column, Flags, F635 Mean - B635, F532 Mean - B532, SNR 635, SNR 532
Making sure the names of these columns are correct is critical.
• Important: this program assumes that, in dual color mode, 532 nm is the sample channel and 635 nm is the reference channel, and that in single color mode 635 nm is the data channel. These assumptions reflect the fact that Cy3 (and equivalents, scanned at 532 nm) and Cy5 (and equivalents, scanned at 635 nm) are the most commonly used fluorophores in microarray studies. If this is not true for an experiment, the column names in the raw data file(s) must be changed manually.
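Because the column names must match exactly, it can help to check a raw data file's header before loading it. The sketch below is a minimal illustration, not part of the program: `missing_columns` reports which of the names listed above are absent, and the hypothetical helper `swap_channels` shows one way to exchange the 635/532 labels when an experiment uses the opposite channel assignment. The demo assumes a tab-delimited .txt file whose first line holds the column names.

```python
import csv
import io

# Column names as listed in this README (case sensitive, note the spaces).
REQUIRED_COLUMNS = {
    "median": ["Block", "Column", "Flags", "F635 Median - B635",
               "F532 Median - B532", "SNR 635", "SNR 532"],
    "mean": ["Block", "Column", "Flags", "F635 Mean - B635",
             "F532 Mean - B532", "SNR 635", "SNR 532"],
}


def missing_columns(header, statistic="median"):
    """Return the required column names that do not appear exactly in header."""
    present = set(header)
    return [name for name in REQUIRED_COLUMNS[statistic] if name not in present]


def swap_channels(header):
    """Exchange '635' and '532' in every column name (hypothetical helper).

    A temporary placeholder keeps the second replacement from undoing the first.
    """
    placeholder = "\x00"
    return [name.replace("635", placeholder).replace("532", "635").replace(placeholder, "532")
            for name in header]


if __name__ == "__main__":
    # Assumed layout: tab-delimited text, column names on the first line.
    text = "Block\tColumn\tFlags\tF635 Median - B635\tF532 Median - B532\tSNR 635\n"
    header = next(csv.reader(io.StringIO(text), delimiter="\t"))
    print(missing_columns(header))  # ['SNR 532']
    print(swap_channels(["F635 Median - B635", "SNR 532"]))  # ['F532 Median - B532', 'SNR 635']
```

`swap_channels` only changes the names in memory; the renamed header would still have to be written back to the file before running the analysis.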
- Sample list file(s)
• The files must be .txt files (the default output file format in Genepix) or .csv files.
• The program ignores the first row of each sample list file, so that row can contain any text (e.g., “sample”, “sample ID”).
• The number of samples in each sample list must match the corresponding raw data file.
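The two rules above can be sketched as follows. `parse_list` skips the first row, as the program does, and `sample_count_matches` compares the sample count with the raw data. Both names are hypothetical, and so is the comparison rule: the README does not say how samples map onto the raw data, so the sketch assumes one sample per Genepix block (one distinct value in the `Block` column per sample).

```python
import csv


def parse_list(lines, delimiter=","):
    """Return the entries of a one-column sample or lectin list.

    The first row is skipped because the program ignores it; blank rows are
    dropped. `lines` can be an open file or any iterable of strings.
    """
    rows = csv.reader(lines, delimiter=delimiter)
    next(rows, None)  # discard the header row, whatever it contains
    return [row[0].strip() for row in rows if row and row[0].strip()]


def sample_count_matches(samples, block_ids):
    """Compare the number of samples with the number of distinct blocks.

    Assumption (not stated in the README): each block holds one sample.
    """
    return len(samples) == len(set(block_ids))


if __name__ == "__main__":
    samples = parse_list(["sample ID", "serum_01", "serum_02"])
    print(samples)  # ['serum_01', 'serum_02']
    print(sample_count_matches(samples, [1, 1, 1, 2, 2, 2]))  # True
```

For a real file, `with open(path, newline="") as handle: parse_list(handle)` works, passing `delimiter="\t"` for tab-delimited .txt lists.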
- Lectin list file
• Only one file can be selected at a time.
• The file must be a .txt file (the default output file format in Genepix) or a .csv file.
• The program ignores the first row of the lectin list file, so that row can contain any text (e.g., “lectin”, “lectin or antibody”).
• The number of lectins (including the spots with no probes) must match the corresponding raw data files.
• Important: this program assumes that starting from spot one, all replicates of a probe are next to each other in the raw data files.
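Under the adjacency assumption, spot i (counting from 0) belongs to lectin i // r, where r is the number of replicates per lectin, and each array must contain exactly (number of lectins) × r spots. The hypothetical function `assign_lectins` below sketches that mapping; treating `n_spots` as the spot count of a single array is an assumption.

```python
def assign_lectins(lectins, n_replicates, n_spots):
    """Map each spot of one array, in raw data order, to its lectin name.

    Assumes, as the README does, that all replicates of a probe are adjacent
    starting from spot one: spots 0..r-1 belong to the first lectin, spots
    r..2r-1 to the second, and so on. Empty spots must be listed like any probe.
    """
    if n_replicates <= 0:
        raise ValueError("number of replicates must be a positive integer")
    if len(lectins) * n_replicates != n_spots:
        raise ValueError(
            f"{len(lectins)} lectins x {n_replicates} replicates != {n_spots} spots")
    return [lectins[i // n_replicates] for i in range(n_spots)]


if __name__ == "__main__":
    print(assign_lectins(["ConA", "SNA", "(empty)"], 2, 6))
    # ['ConA', 'ConA', 'SNA', 'SNA', '(empty)', '(empty)']
```

If replicates were instead interleaved across the array, this integer-division mapping would silently mislabel spots, which is why the adjacency requirement matters.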
- Multiple input files
• Multiple raw data files and sample list files can be selected at a time; the number of raw data files must be the same as the number of sample list files.
• All selected raw data files must be in the same folder; the same applies to sample list files.
• The raw data files are first sorted by file path, and so are the sample list files. Raw data files and sample list files are then matched by their positions in the two sorted lists.
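The matching rule can be sketched in a few lines. `pair_files` is a hypothetical name, and the sketch assumes a plain string sort of the paths, which is what Python's `sorted` does.

```python
def pair_files(raw_paths, sample_paths):
    """Pair raw data files with sample list files by position after sorting.

    Both lists are sorted by file path and matched by position, so the i-th
    raw data file is paired with the i-th sample list file.
    """
    if len(raw_paths) != len(sample_paths):
        raise ValueError("the number of raw data files and sample list files must be equal")
    return list(zip(sorted(raw_paths), sorted(sample_paths)))


if __name__ == "__main__":
    print(pair_files(["run/slide2.txt", "run/slide1.txt"],
                     ["lists/slide2.csv", "lists/slide1.csv"]))
    # [('run/slide1.txt', 'lists/slide1.csv'), ('run/slide2.txt', 'lists/slide2.csv')]
```

With a plain string sort, "slide10" comes before "slide2", so giving raw data files and sample lists parallel, zero-padded names (slide01, slide02, ..., slide10) keeps the pairs aligned.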
III. User instructions
*If running the .py file (Lectin array data analysis with GUI - 2021Mar05.py) in an IDE:
- Make sure the required packages are installed: tkinter, numpy, pandas.
- Run the script. A user interface should appear on the screen.
- Select input files and options, and enter parameters, using the user interface:
• Number of replicates of lectins per array must be a positive integer
• Grubbs’ test cut-off must be a positive real number
• SNR cut-offs must be positive real numbers
• Percentage sample cut-off must be a real number between 0 and 1
- Run analysis. Users will be notified of the analysis result (finished/error).
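The parameter rules listed above can be checked before an analysis starts. The sketch below uses a hypothetical function, `validate_parameters`, and assumes the values have already been converted from the text boxes to numbers. Allowing exactly 0 and 1 for the percentage sample cut-off is also an assumption, since the README only says "between 0 and 1".

```python
def validate_parameters(n_replicates, grubbs_cutoff, snr_cutoffs, sample_fraction):
    """Return a list of error messages; an empty list means every value is valid."""
    errors = []
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not (isinstance(n_replicates, int) and not isinstance(n_replicates, bool)
            and n_replicates > 0):
        errors.append("Number of replicates must be a positive integer")
    if not grubbs_cutoff > 0:
        errors.append("Grubbs' test cut-off must be a positive real number")
    if not all(value > 0 for value in snr_cutoffs):
        errors.append("SNR cut-offs must be positive real numbers")
    if not 0 <= sample_fraction <= 1:  # inclusive bounds are an assumption
        errors.append("Percentage sample cut-off must be between 0 and 1")
    return errors


if __name__ == "__main__":
    print(validate_parameters(3, 2.0, [3.0, 3.0], 0.5))  # []
    print(validate_parameters(2.5, 1.0, [1.0], 0.5))
    # ['Number of replicates must be a positive integer']
```

Collecting all messages instead of stopping at the first one lets a GUI report every invalid entry at once.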
*If using the user-friendly version (Microarray Analysis_verMar2021_win64.zip):
- Unzip the file.
- In the unzipped folder, find and open "Run analysis.bat".
- When the user interface appears on the screen, select input files and options, and enter parameters:
• Number of replicates of lectins per array must be a positive integer
• Grubbs’ test cut-off must be a positive real number
• SNR cut-offs must be positive real numbers
• Percentage sample cut-off must be a real number between 0 and 1
- Run analysis. Users will be notified of the analysis result (finished/error).
Note:
• Do not change the contents of the folder or rename any of its files.
• This version only works on 64-bit Windows.