Excel - Working With Data
Saving your steps: unlike statistical software such as SAS, Stata, or R, you can't keep track of the modifications you make to your data in Excel.
Ambiguous algorithms: statistical algorithms may not behave correctly or as expected, or the results may be hard to interpret.
Zeros and N/A: some formulas won't work with zeros or blanks as expected; you'll have to decide what to do about missing values.
Data worksheet preparation (repeat for each data worksheet in your file)
Create a "Notes" worksheet using a naming convention that helps preserve the link with the data worksheet (for example:
2021_survey_data and 2021_survey_data_notes). You can use this worksheet to summarize the contents of the data
worksheet, keep a written record of modifications made to the data worksheet, and/or put in metadata about data
variables.
If applicable, move anything above or below the data table of each of your data worksheets to the "Notes" worksheet.
If your column headings take up more than one row in your spreadsheet, collapse the headings so that there is only one
row with headings. This might mean concatenating an overarching category with each sub-heading and repeating it across
multiple columns.
Make sure that your columns have meaningful and distinct headings.
Unmerge any cells merged across rows. Repeat overarching categories in every row of the column rather than leaving rows blank.
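For readers who also script their clean-up outside Excel, the two reshaping steps above (collapsing a two-row heading and repeating a category on every row) can be sketched in Python. This is only an illustration under assumptions: the worksheet has been exported as rows of text, an empty string stands for a cell left blank by unmerging, and the names collapse_headings and fill_down as well as the "_" separator are hypothetical choices, not Excel features.

```python
def collapse_headings(top_row, sub_row, sep="_"):
    """Combine a two-row heading into a single row of headings.

    Assumption: a blank cell in top_row belongs to the last category
    seen to its left (what an unmerged overarching heading looks like).
    """
    headings = []
    current = ""
    for top, sub in zip(top_row, sub_row):
        top, sub = top.strip(), sub.strip()
        if top:
            current = top
        # Join category and sub-heading, skipping whichever part is empty.
        headings.append(sep.join(part for part in (current, sub) if part))
    return headings


def fill_down(rows, column):
    """Repeat the last non-blank value of `column` on each blank row below it."""
    filled = []
    last = ""
    for row in rows:
        row = list(row)  # copy so the input rows are left untouched
        if row[column].strip():
            last = row[column]
        else:
            row[column] = last
        filled.append(row)
    return filled


# Hypothetical example data, not taken from any real survey.
example_headings = collapse_headings(
    ["", "Income", "", "Age"],
    ["ID", "2020", "2021", ""],
)
example_rows = fill_down([["North", "a"], ["", "b"], ["South", "c"]], column=0)
```

With this example data, the headings become ID, Income_2020, Income_2021 and Age, and the blank region cell on the second row is filled with North. Check that the collapsed headings are distinct before relying on them.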
Resize columns to desired width, select the data table, and use Wrap Text (Home ribbon, Alignment section) to make the
contents of the cells display entirely on-screen.
Use Freeze Panes to freeze your header row and any significant columns in place (View ribbon, Window section).
Consider using a tool like OpenRefine to clean up your data. Otherwise, try the following steps:
o Use the TRIM formula to remove extra spaces at the beginning or end of a cell (it also reduces repeated spaces between
words to a single space).
o Create a new column at the end of your data table that can flag missing data. Use the COUNTA or COUNTBLANK
formulas to investigate any counts that might be problematic with your dataset (the count will depend on your
data and the context – is it fine if some columns contain blank values?).
If you want, you can color any blank cell in your data table (Excel for Windows) by going to the Home
ribbon, Editing section, Find & Select, Go To Special, selecting Blanks, and then choosing a fill color.
o Create a new column at the end of your table to flag any "silly" or problematic rows of data that you may need to
exclude from your analysis.
o Add a filter to your data table (click on a cell in your data table, go to Home ribbon, Editing section, Sort & Filter,
Filter). Click on the filter dropdown for each variable and look for any unusual values: check the beginning and end of
the list of values, and look for "N/A" and silly answers.
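The checks in the list above can also be scripted, which leaves a record of each step. The following Python sketch is a rough analogue, not a reproduction of Excel's behavior: it assumes the data table has been exported as rows of text, the set of missing-value markers is an assumption to adapt to your dataset, and the function names are hypothetical. Note that the count here also treats markers such as "N/A" as missing, whereas Excel's COUNTBLANK counts only empty cells.

```python
# Assumption: values treated as missing; adjust for your own dataset.
MISSING_MARKERS = {"", "n/a", "na"}


def trim(value):
    """Approximate Excel's TRIM: strip leading and trailing spaces and
    reduce runs of spaces between words to a single space."""
    return " ".join(part for part in value.split(" ") if part)


def count_missing(row):
    """Count the cells in a row that are blank or hold a missing marker."""
    return sum(1 for cell in row if trim(cell).lower() in MISSING_MARKERS)


def add_missing_flag(rows):
    """Return trimmed copies of the rows with a missing-count column
    appended at the end, like a flag column added to the data table."""
    flagged = []
    for row in rows:
        cleaned = [trim(cell) for cell in row]
        flagged.append(cleaned + [count_missing(cleaned)])
    return flagged


def distinct_values(rows, column):
    """List the sorted distinct values of a column, similar to scanning
    the beginning and end of a filter dropdown for unusual entries."""
    return sorted({row[column] for row in rows})


# Hypothetical example rows: name, age.
raw_rows = [["  Alice ", "34"], ["Bob", "N/A"], ["Carol  Ann", ""]]
flagged_rows = add_missing_flag(raw_rows)
ages_seen = distinct_values(flagged_rows, column=1)
```

Whether a non-zero count is a problem still depends on your data and context; the flag column only makes those rows easy to find.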
Examine your worksheet for "issues" identified in the Quartz guide to bad data – especially the issues flagged in the first
section (Issues that your source should solve).
Other considerations
Resist any and all temptation to merge cells. Stay strong!
If you run into any problems when using a formula or visualizing your data in Excel, double-check the format that has
been applied to the cells referenced by the formula/visualization.
Follow the RDM practices shared in Brief guide: Research Data Management.
Follow the practices shared in Data Carpentry's Data Organization in Spreadsheets for Social Scientists. The Formatting
Problems page is especially important to go over.
Ask your data librarian for help if any of these steps seem like they might take a lot of manual work or a long time to do.
Learning resources