Biostatistics Exercise: Data Entry
Posted: February 15th, 2023
Assignment Content
Excel Review
If you do not have any previous experience working in Excel or if you feel you may need a refresher, complete the LinkedIn Learning tutorial “Learning Excel Desktop (Office 365)” before attempting to start this assignment. Follow the LinkedIn Learning link in the Wk 2 Learning Activities folder.
Excel Setup
Watch the “Data Analysis ToolPak” tutorial video in full screen and activate the Data Analysis ToolPak in Excel, following the directions provided in the video.
Note: The video states that you need to purchase Excel if you don’t already have it. However, as a University of Phoenix student, you have free access to Excel. If you have not yet installed Excel, return to the Week 1 folder and follow the Microsoft Office 365 link.
Assignment Instructions
Open a new Excel workbook.
Save your workbook using the following naming convention: your last name, initial of your first name, underscore, and Week2Exercise (e.g., if your name is Maria Ramirez, you will name the file RamirezM_Week2Exercise).
Add and bold the following 7 variables starting at the cell address indicated:
ID (A1)
Age (B1)
Weight (C1)
Height (D1)
Waist (E1)
CVD (F1)
EDUC (G1)
Enter the following information for each variable:
For the variable ID (participant ID number), add values starting with the number 1 at cell address A2 and proceeding down the column through the number 20 at cell address A21. Try using “Fill series or pattern” to populate the column of numbers rather than typing each one in individually, as demonstrated in the Learning Excel Desktop (Office 365) video tutorial.
For the variable Age (in years), add the following values starting at cell address B2 and ending at B21 so that all 20 participants have an age: 35, 88, 46, 60, 64, 46, 60, 98, 66, 54, 81, 55, 82, 80, 74, 91, 91, 72, 58, and 62. Check your work.
For the variable Weight (in pounds), add the following values starting at cell address C2 and ending at cell address C21 so that all 20 participants have a weight: 210, 220, 149, 241, 228, 240, 245, 219, 238, 173, 236, 194, 143, 161, 236, 214, 165, 235, 217, and 170. Check your work.
For the variable Height (in inches), add the following values beginning at cell address D2 and ending at cell address D21 so that all 20 participants have a height: 68, 73, 66, 71, 72, 67, 69, 65, 62, 73, 64, 64, 64, 71, 70, 64, 75, 71, 63, and 65. Check your work.
For the variable Waist (circumference in inches), add the following values starting at E2 and ending at E21 so that all 20 participants have a waist circumference: 33, 41, 40, 37, 38, 32, 34, 42, 32, 35, 32, 44, 43, 39, 42, 37, 38, 44, 44, and 38. Check your work.
For the variable CVD (1 = Yes and 0 = No), add the following values starting at F2 and ending at F21 so that all 20 participants have a value: 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, and 0. Check your work.
For the variable EDUC (Level of education, rank order): 1 equals < 8th-grade education, 2 equals > 8th-grade and < 12th-grade education, 3 equals a high school diploma (12th grade) or GED, 4 equals some college or technical school, 5 equals 4 years of college/university, 6 equals a master’s degree, and 7 equals a doctoral or terminal professional degree (e.g., PhD, MD, JD, DNP). Add the following values starting at G2 and ending at G21 so that all 20 participants have a value: 2, 4, 3, 3, 1, 2, 2, 3, 4, 6, 2, 7, 1, 2, 6, 1, 7, 6, 4, and 2. Check your work.
Center variable names and values.
Save your file.
Submit your Excel file (make sure to use the proper naming convention).
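The "Check your work" steps above can also be done programmatically. The following is a minimal sketch in Python, assuming the values are retyped from the instructions into a dictionary. The names DATA, ALLOWED, and check_entries are illustrative and are not part of the assignment.

```python
# Values transcribed from the assignment instructions (columns A through G).
DATA = {
    "ID": list(range(1, 21)),
    "Age": [35, 88, 46, 60, 64, 46, 60, 98, 66, 54,
            81, 55, 82, 80, 74, 91, 91, 72, 58, 62],
    "Weight": [210, 220, 149, 241, 228, 240, 245, 219, 238, 173,
               236, 194, 143, 161, 236, 214, 165, 235, 217, 170],
    "Height": [68, 73, 66, 71, 72, 67, 69, 65, 62, 73,
               64, 64, 64, 71, 70, 64, 75, 71, 63, 65],
    "Waist": [33, 41, 40, 37, 38, 32, 34, 42, 32, 35,
              32, 44, 43, 39, 42, 37, 38, 44, 44, 38],
    "CVD": [1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
            0, 0, 1, 0, 0, 1, 1, 0, 0, 0],
    "EDUC": [2, 4, 3, 3, 1, 2, 2, 3, 4, 6,
             2, 7, 1, 2, 6, 1, 7, 6, 4, 2],
}

# Coding schemes stated in the instructions: CVD is 0/1, EDUC is ranked 1-7.
ALLOWED = {"CVD": {0, 1}, "EDUC": set(range(1, 8))}


def check_entries(data, n_participants=20):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for name, values in data.items():
        # Every variable needs exactly one value per participant.
        if len(values) != n_participants:
            problems.append(
                f"{name}: expected {n_participants} values, found {len(values)}"
            )
        # Coded variables may only contain values from their coding scheme.
        allowed = ALLOWED.get(name)
        if allowed is not None:
            bad = [v for v in values if v not in allowed]
            if bad:
                problems.append(f"{name}: values outside the coding scheme: {bad}")
    # IDs should run from 1 through n without gaps or repeats.
    if data.get("ID") != list(range(1, n_participants + 1)):
        problems.append("ID: expected 1 through n without gaps")
    return problems


print(check_entries(DATA))
```

A check like this only catches structural problems such as a missing value or an impossible code. It cannot detect a plausible but wrong number, so proofreading each column against the instructions is still needed.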
Data Entry: An Important Skill for Biostatistics
Introduction
Data entry is a fundamental skill required for working with biostatistics and conducting research involving human subjects. Though it may seem like a mundane task, accurate data entry is crucial for ensuring the integrity and validity of statistical analyses. Any errors made during the data entry process could compromise results and conclusions drawn from a study. This article will discuss the importance of data entry for biostatistics and examine a sample exercise to provide hands-on practice with entering data into Excel.
Importance of Accurate Data Entry
At its core, biostatistics involves collecting, organizing, and analyzing quantitative data related to biology and health (1). Whether the data comes from medical records, surveys, experiments, or other sources, it must be transferred into a format that allows for statistical computation and interpretation. Excel is commonly used for initial data management and storage due to its versatility and accessibility. Careful data entry into a spreadsheet is the first critical step before any modeling or inference can take place (2).
Mistakes made during data entry, such as recording the wrong values, transposing digits, or incorrectly formatting cells, introduce errors into the dataset. Even small inaccuracies can significantly affect statistical results and compromise the validity of conclusions. For example, entering a participant's age as 38 instead of 83 could misrepresent their risk profile and skew age-related analyses. Incorrect data types, such as numeric values stored as text, may cause calculation errors or prevent analyses from running. Careless data entry also wastes time and resources that must then be spent troubleshooting issues and verifying information (3).
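The effect of a single transposed value can be illustrated with the Age values listed in the assignment instructions. The sketch below is only an illustration: none of the 20 ages is 83, so it swaps the digits of the participant aged 82 instead, and the helper name with_transposed_digits is hypothetical.

```python
from statistics import mean

# Age values from the assignment instructions (cells B2 through B21).
ages = [35, 88, 46, 60, 64, 46, 60, 98, 66, 54,
        81, 55, 82, 80, 74, 91, 91, 72, 58, 62]


def with_transposed_digits(values, index):
    """Return a copy in which the two-digit value at `index` has its digits swapped."""
    corrupted = list(values)
    tens, ones = divmod(corrupted[index], 10)
    corrupted[index] = ones * 10 + tens
    return corrupted


# Simulate typing 28 instead of 82 for one participant.
corrupted_ages = with_transposed_digits(ages, ages.index(82))

print("correct mean:  ", mean(ages))
print("corrupted mean:", mean(corrupted_ages))
```

With these 20 values, the one mistyped entry lowers the mean age from 68.15 to 65.45 years, a shift of 2.7 years caused by a single cell.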
Beyond affecting analyses, sloppy data entry raises questions about the overall quality of a study and trustworthiness of its findings. Researchers rely on accurate record-keeping and data management to justify their conclusions and recommendations. Observational errors or carelessness in the data collection process can undermine confidence in results (4). For these reasons, biostatisticians emphasize the importance of diligent, meticulous data entry as the first line of quality control. Developing and practicing careful data entry skills is an important foundational part of learning biostatistics.
Sample Data Entry Exercise
To provide hands-on practice with data entry, consider having students complete the following sample exercise using Excel:
The exercise involves entering fictional data for 20 participants across 7 variables related to a hypothetical cardiovascular health study. Variables include an identification number, age, weight, height, waist circumference, history of cardiovascular disease (CVD), and level of education.
Specific values are provided for each variable. All of them are entered as numbers, but two variables are codes rather than measurements: CVD status is coded as 1 = yes or 0 = no, and education level uses a ranked scale from 1 to 7. Students are instructed to enter each variable's values in the corresponding column, starting at the indicated cell and filling down to the 20th participant.
Formatting guidelines include bolding variable names in row 1, centering headers and data, and using Excel features such as “Fill series or pattern” to populate cells efficiently rather than typing each value. This reinforces basic Excel skills while keeping the focus on accurate data transfer.
Once the data are entered according to the instructions, students center the headers and values, save the file under the standardized naming convention (last name, first initial, underscore, and "Week2Exercise"), and submit it for evaluation. Instructors can then check for errors quickly by comparing each submitted file to an answer key.
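Both the naming convention and the comparison against a key are mechanical enough to sketch in code. The example below is a minimal illustration in Python, assuming the submitted values and the key have already been read into dictionaries of lists keyed by variable name. The function names expected_filename and find_mismatches are hypothetical.

```python
def expected_filename(first_name, last_name):
    """Build the workbook name: last name, first initial, underscore, Week2Exercise."""
    return f"{last_name}{first_name[0].upper()}_Week2Exercise"


def find_mismatches(submitted, key):
    """Return (variable, worksheet_row, submitted_value, key_value) per differing cell.

    Row numbers follow the worksheet layout: headers are on row 1,
    so the first participant is on row 2.
    """
    mismatches = []
    for variable, key_values in key.items():
        entered = submitted.get(variable, [])
        for offset, key_value in enumerate(key_values):
            # A missing cell is reported as None rather than raising an error.
            value = entered[offset] if offset < len(entered) else None
            if value != key_value:
                mismatches.append((variable, offset + 2, value, key_value))
    return mismatches


# Small illustrative example: the second CVD value was entered incorrectly.
key = {"CVD": [1, 1, 0]}
submitted = {"CVD": [1, 0, 0]}

print(expected_filename("Maria", "Ramirez"))
print(find_mismatches(submitted, key))
```

Reporting the worksheet row alongside each mismatch lets a student go straight to the cell that needs correcting.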
This sample exercise allows students to practice the fundamental data entry skills required for biostatistics work in a low-stakes way. Completing mock datasets helps familiarize novices with the meticulous care and attention to detail needed when handling quantitative information. It also gives instructors an opportunity to provide feedback and correct any issues before students analyze real data.
Evaluating Data Entry Skills
To assess students' data entry abilities following this introductory exercise, instructors could expand the assignment in a few ways:
Increase the number of participants and variables to enter, making the task more time-consuming and error-prone. This scales up the challenge.
Intentionally include typos or incorrect values in the key/instructions that students must catch and fix. This tests proofreading skills.
Have students enter a real, de-identified dataset and compare their entries to the original using validation queries. This tests the skill under realistic conditions.
Add post-entry tasks like calculating summary stats in Excel to ensure formulas reference entered data properly. This checks downstream impacts.
Provide a new dataset and ask students to develop a standardized codebook of definitions before entry. This evaluates organization.
Consider having students peer-review each other's work or trade files to double-enter data as an additional quality check.
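As one example of a post-entry task, summary statistics can be computed from the entered columns and compared with the values students obtain in Excel (for instance with the AVERAGE, MEDIAN, MIN, and MAX functions). The sketch below uses the Age and CVD values from the exercise. The summarize helper is an illustrative name, not something the assignment specifies.

```python
from statistics import mean, median

# Age (B2:B21) and CVD (F2:F21) values from the exercise.
ages = [35, 88, 46, 60, 64, 46, 60, 98, 66, 54,
        81, 55, 82, 80, 74, 91, 91, 72, 58, 62]
cvd = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 0, 0, 1, 1, 0, 0, 0]


def summarize(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


# Because CVD is coded 0/1, its mean is the proportion of participants with CVD.
cvd_proportion = sum(cvd) / len(cvd)

print(summarize(ages))
print("Proportion with CVD:", cvd_proportion)
```

If a student's spreadsheet reports a different mean or count for a column, either a formula references the wrong range or a value was entered incorrectly.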
Collecting both quantitative error rates and qualitative feedback can help identify areas for individual students or the overall class to focus on improving. Repeating modified data entry assignments over time allows instructors to track skill development.
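A quantitative error rate can be defined in several ways. One simple option, sketched here as an assumption rather than a prescribed metric, is the share of cells that differ from a reference copy such as the answer key or a second, independent entry.

```python
def error_rate(entered, reference):
    """Return the fraction of cells in `entered` that differ from `reference`."""
    if len(entered) != len(reference):
        raise ValueError("both columns must have the same number of cells")
    if not reference:
        raise ValueError("cannot compute an error rate for an empty column")
    errors = sum(1 for a, b in zip(entered, reference) if a != b)
    return errors / len(reference)


# Illustrative values only: one of four cells differs from the reference.
print(error_rate([1, 1, 0, 0], [1, 0, 0, 0]))
```

Tracking this figure per student across repeated assignments gives a simple way to see whether accuracy is improving.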
Conclusion
In summary, careful data entry is a foundational skill for the biostatistics field that deserves dedicated attention and practice. Even simple exercises like the sample provided here help novice learners appreciate the importance of accuracy when handling quantitative information. Taking the time to reinforce data entry fundamentals early on can prevent costly mistakes down the road and strengthen the overall quality of biostatistical work.
References
Bewick, V., Cheek, L., & Ball, J. (2003). Statistics review 7: Descriptive statistics. Critical Care, 7(3), 222–227. https://doi.org/10.1186/cc2181
de Vet, H. C., Mokkink, L. B., Mosmuller, D. G., & Terwee, C. B. (2017). Spearman-Brown prophecy formula to calculate how many subjects are needed for a reliable study: Cross-sectional measurement. Journal of Clinical Epidemiology, 85, 103–106. https://doi.org/10.1016/j.jclinepi.2017.01.016
Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage.
Glaser, B. G., & Strauss, A. L. (2017). Discovery of grounded theory: Strategies for qualitative research. Routledge.
Kellar, S. P., & Kelvin, E. A. (2013). Munro's statistical methods for health care research. Lippincott Williams & Wilkins.