Workflow of Data Analysis
(Instructor Scott Long)
Intensive Summer Session I, May 7-31, 2013, 9AM-5PM weekdays
The workflow of data analysis is not a class about a specific statistical technique. Instead, it is a class that teaches you how to plan, organize, document, and execute sophisticated quantitative analyses regardless of the statistical methods used. The goal is to help you develop a workflow that allows you to work efficiently and accurately while producing results that are replicable. Topics include: 1) Planning your research. 2) Documenting your work. 3) Organizing, backing up, and archiving files. 4) Writing robust, effective do-files. 5) Using automation (basic programming methods) to work more accurately and efficiently. 6) Preparing data for analysis. 7) Systematically conducting statistical and graphical analyses. 8) Incorporating results into papers and presentations while maintaining their provenance. 9) Backing up your files. Lectures, exercises, and applications are designed to help you develop a workflow that you can apply to your own research.
Orientation is on Tuesday, May 7 at 10AM in Ballantine Hall 006. The main lectures and labs will be held from May 13 through May 24, from 9AM to 5PM. May 27 to May 31 is reserved for independent work, consulting, and presentation of your project.
This intensive workshop helps you develop a workflow for conducting complex statistical research. Workflow in data analysis is a framework for the entire research process: planning, organizing, and documenting your work; importing data; naming, labeling, documenting, creating, and verifying variables; conducting and presenting statistical and graphical analyses; and preserving your work. Each step is guided by the demands of producing replicable and accurate results while working as quickly and efficiently as possible. While traditional classes in statistics deal with estimating and interpreting models, in "real world" research statistical analyses often involve less than ten percent of the total work. This class focuses on the other ninety percent. Developing an efficient workflow saves time, improves accuracy, and leads to replicable results. This two-week workshop explores the following topics.
1. General principles that guide your research: replicability, accuracy, and efficiency.
2. Efficient methods for planning, organizing, documenting, executing, and preserving your work.
3. Tools that enhance and simplify your work: software, programming methods, organizational structure, and cyberinfrastructure.
4. Real world examples of what works and what does not in each stage of the process.
While many software tools are illustrated, Stata is the primary package for data management and analysis, and the course will use Long (2009), The Workflow of Data Analysis Using Stata. While the methods apply readily to other statistical packages, students must have some familiarity with Stata. If you are not familiar with Stata, you should take the Stata netcourse NC101 or study an introduction to Stata (e.g., chapters 3 and 4 of my Workflow book).
Requirements: Each student is expected to develop her own workflow and apply it to a real research project. The class is too short to complete the project, but you can develop critical skills to help you complete future research more quickly and accurately while ensuring replicability and maintaining the provenance of results. Students must participate in class, complete exercises applying the lectures to their project, develop a mock file structure with backup procedures, maintain a research log, and present their "final" workflow to the class. While some lab time for independent work will be available during the day, students will also need to work on their project outside of class.
muCommander: A very nice, freeware file manager. Highly recommended.
AutoHotkey: A freeware macro program.
TextWrangler: A highly recommended, freeware editor.
Authorization is required for enrollment to make sure that each student has sufficient quantitative experience to benefit from the course and has a research project that they can use in class.
There are several things you can do to get ready for the class.
1. Spend time working with Stata.
2. Get the materials ready for your project, such as datasets and codebooks. Begin planning your project and try loading your data in Stata.
3. Think about how you want to organize your files.
4. Gather all of the materials that you used for a quantitative research project. Try to replicate the results from the project.
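If you want a concrete starting point for thinking about file organization (item 3), the sketch below creates an empty mock project structure and starts a research log. It is only an illustration written in Python, not the structure taught in the course: the folder names, the log file name, and the function create_mock_project are all hypothetical and should be adapted to your own project.

```python
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical folder names, chosen only for illustration.
SUBFOLDERS = [
    "data/source",   # original datasets, never edited
    "data/derived",  # datasets created by your own scripts
    "do-files",      # analysis scripts
    "logs",          # output from running the scripts
    "text",          # drafts, papers, presentations
    "backup",        # placeholder for a local backup copy
]


def create_mock_project(root: Path, name: str) -> Path:
    """Create an empty project skeleton and start a research log."""
    project = root / name
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    log = project / "research-log.txt"
    if not log.exists():
        # First entry of the research log, dated in ISO format.
        log.write_text(f"{date.today().isoformat()}: created project skeleton\n")
    return project


if __name__ == "__main__":
    # Build the skeleton in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_mock_project(Path(tmp), "my-project")
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```

Running the script prints the folders and the log file it created. Keeping source data separate from derived data is one common convention; whichever layout you choose, decide on it before you start the analysis.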
© 2013 J. Scott Long