J. Scott Long - Indiana University
Department of Sociology | Department of Statistics
ICPSR Summer Workshop on The Workflow of Data Analysis using Stata | Instructor Scott Long

June 17-21, 2013 at the Center for Research on Families at the University of Massachusetts-Amherst

For further details, check here. Contact Scott Long if you have questions.

The workflow of data analysis is not a class about a specific statistical technique. Instead, it is a class that teaches you how to plan, organize, document, and execute sophisticated quantitative analyses regardless of the statistical methods used. The goal is to help you develop a workflow that allows you to work efficiently and accurately while producing results that are replicable. Topics include: 1) Planning your research. 2) Documenting your work. 3) Organizing, backing up, and archiving files. 4) Writing robust, effective do-files. 5) Using automation (basic programming methods) to work more accurately and efficiently. 6) Preparing data for analysis. 7) Systematically conducting statistical and graphical analyses. 8) Incorporating results into papers and presentations while maintaining their provenance. 9) Backing up your files. Lectures, exercises, and applications are designed to help you develop a workflow that you can apply to your own research.

For more information on the workflow of data analysis, check here. If you need further encouragement to take the class, check here.


News

  1. Using Stata: If you do not have experience with Stata or would like a refresher, I suggest that you work through "Getting Started Using Stata" before the course begins.
  2. Books: Click here to order books.
  3. External drive: I encourage everyone to have an external drive to use for backup and for testing methods for preserving data. I recommend a drive that does NOT require external power but instead draws power from the USB or FireWire port. Depending on the size of your dataset, a flash drive might work.
  4. Setting up your computer: Before class begins, I encourage you to: a) Install Dropbox; b) Install a file manager and text editor (suggestions are here); c) Install Stata on your computer, which is helpful but not required.

Schedule

  • To be completed...

Detailed description of class

This intensive workshop helps you develop a workflow for conducting complex statistical research. Workflow in data analysis is a framework for the entire research process: planning, organizing, and documenting your work; importing data; naming, labeling, documenting, creating, and verifying variables; conducting and presenting statistical and graphical analyses; and preserving your work. Each step is guided by the demands of producing replicable and accurate results while working as quickly and efficiently as possible. While traditional classes in statistics deal with estimating and interpreting models, in "real world" research, statistical analyses often involve less than ten percent of the total work. This class focuses on the other ninety percent. Developing an efficient workflow saves time, improves accuracy, and leads to replicable results. This workshop explores the following topics.

1. General principles that guide your research: replicability, accuracy, and efficiency.

2. Efficient methods for planning, organizing, documenting, executing, and preserving your work.

3. Tools that enhance and simplify your work: software, programming methods, organizational structure, and cyberinfrastructure.

4. Real world examples of what works and what does not in each stage of the process.

  • Planning and organizing research.
  • Preparing data for analysis: importing data; developing consistent names and labels; documenting the sample and variables; and cleaning the data.
  • Conducting sophisticated data analysis that is replicable and efficient.
  • Accurately and quickly incorporating complex statistical results into your writing and presentations while maintaining the provenance of each result.
  • Methods to speed up the inevitable process of revising your work.
  • How to prevent catastrophic loss of files during the project and to ensure long term preservation of your materials.

While many software tools are illustrated, Stata is the primary package for data management and analysis, and the course will use Long (2009), The Workflow of Data Analysis Using Stata. While the methods apply readily to other statistical packages, students must have some familiarity with Stata. If you are not familiar with Stata, you can take the Stata NetCourse NC101 or study an introduction to Stata (e.g., chapters 3 and 4 of my Workflow book).

Books

  1. Long, J. S. 2009. The Workflow of Data Analysis Using Stata. College Station, TX: Stata Press. The book is cheaper from StataCorp than at amazon.com, but if you have free shipping, amazon.com might be cheaper. Some copies of this book will be at the book store.
  2. Wong, Dona M. 2010. The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data, Facts, and Figures. Highly recommended for tables and graphs.

Computing

Windows

Notepad++: A highly recommended, freeware editor. Information on enhancing Notepad++ for work with Stata is here.

muCommander: A nice, freeware file manager.

AutoHotkey: A freeware macro program.

Mac OS X

TextWrangler: A highly recommended, freeware editor.

muCommander: A very nice, freeware file manager. Highly recommended.

Getting ready

There are several things you can do to get ready for the class.

1. Spend time working with Stata.

2. Get the materials ready for your project, such as datasets and codebooks. Begin planning your project and try loading your data in Stata.

3. Think about how you want to organize your files.

4. Gather all of the materials that you used for a quantitative research project. Try to replicate the results from the project.

© 2013 J. Scott Long