Workflow of Data Analysis
(Instructor Scott Long)
Intensive Summer Session I, May 13-June 6, 2014, 9AM-5PM weekdays
Managing Statistical Research teaches you how to plan, organize, document, and execute sophisticated quantitative analyses, regardless of the statistical methods used. The goal is to help you develop a workflow that lets you work efficiently and accurately while producing replicable results. Topics include: 1) Planning your research. 2) Documenting your work. 3) Organizing, backing up, and archiving files. 4) Writing robust, effective programs for data analysis. 5) Using automation (basic programming methods) to work more accurately and efficiently. 6) Preparing data for analysis. 7) Systematically conducting statistical and graphical analyses. 8) Incorporating results into papers and presentations while maintaining their provenance. 9) Backing up your files. 10) Collaborating on data analysis. Lectures, exercises, and applications are designed to help you develop a workflow for your own research.
The class assumes that you are planning to do quantitative data analysis and that you have completed at least one graduate class in statistics. Students starting their dissertations have found it a great way to organize their work, plan new analyses, and conduct analyses that are efficient and replicable. Students earlier in their graduate careers develop a workflow that they will grow into as they undertake larger research projects. To complete the exercises, you will need access to a dataset that you want to work with. Details are given below.
While Stata is used to illustrate some of the ideas, the strategies and concepts apply to any statistical package, and you are welcome to use programs such as SAS or R for your work. To do this, you will need to have that package installed on your laptop and know how to use it, since the instructor might not be familiar with it.
In the past, the course has been over-enrolled. Admission to the class is "first come, first served." If you would like to enroll or have questions, contact Scott Long at firstname.lastname@example.org.
This intensive class helps you develop a workflow for conducting complex statistical research. Workflow in data analysis is a framework for the entire research process: planning, organizing, and documenting your work; importing data; naming, labeling, documenting, creating, and verifying variables; conducting and presenting statistical and graphical analyses; and preserving your work. Each step is guided by the demands of producing replicable and accurate results while working as quickly and efficiently as possible. Traditional classes in statistics deal with estimating and interpreting models, yet in "real world" research, statistical analysis often accounts for less than ten percent of the total work. This class focuses on the other ninety percent. Developing an efficient workflow saves time, improves accuracy, and leads to replicable results. We will explore the following topics.
1. General principles that guide your research: replicability, accuracy, and efficiency.
2. Efficient methods for planning, organizing, documenting, executing, and preserving your work.
3. Tools that enhance and simplify your work: software, programming methods, organizational structure, and cyberinfrastructure.
4. Real world examples of what works and what does not in each stage of the process.
While many software tools are demonstrated, Stata is used to illustrate an effective workflow, and the course will refer to my book, The Workflow of Data Analysis Using Stata.
Requirements: Each student is expected to develop her own workflow and apply it to a real research project. The class is too short to complete the project, but you can develop critical skills to help you complete future research more quickly and accurately while ensuring replicability and maintaining the provenance of results. Students must participate in class, complete exercises applying the lectures to their project, develop a mock file structure with backup procedures, maintain a research log, and present their "final" workflow to the class. While some lab time for independent work will be available during the day, students will also need to work on their project outside of class.
AutoHotkey: A freeware macro program.
TextWrangler: A highly recommended freeware text editor.
Authorization is required for enrollment. Contact Scott Long at email@example.com.
There are several things you can do to get ready for the class.
1. Spend time working with Stata or your statistical package of choice.
2. Get materials ready for your project, such as datasets and codebooks. Begin planning your project and try loading your data into Stata or your statistical package of choice.
3. Think about how you want to organize your files.
4. Gather all of the materials that you used for a past quantitative research project. Try to replicate the results from that project.
5. Back up your files!
© 2014 J. Scott Long