Data Quality in an Era of Big Data

September 28-29, 2016
Bloomington, IN

Welcome to Data Quality in an Era of Big Data

Welcome to the workshop on Data Quality in an Era of Big Data, sponsored by the Midwest Big Data Hub (MBDH) and Indiana University. Throughout the history of modern scholarship, scholarly data has been exchanged through personal interactions among scholars or through highly curated data archives, such as ICPSR (Inter-University Consortium for Political and Social Research). In both cases, implicit or explicit provenance mechanisms gave a relatively high degree of assurance of the quality of the data. However, the ubiquity of the web and mobile digital culture has produced disruptive new forms of data, such as those based on citizen science, social network transactions, or massively deployed automatic sensors. The integrity and trustworthiness of these data are uncertain due to issues such as sampling characteristics, the expertise of the data producers, or the quality of the instruments. As these data are shared, fused, homogenized, and mixed, we need to ask ourselves what we know about the data and what we can trust. Failure to answer these questions endangers the integrity of the science produced from these data.

Attendees will hear from leaders in the big data fields of citizen science, health, and cyberinfrastructure. The Data To Insight Center at Indiana University leads the Data Science Ring of the Midwest regional innovation hub. Improving data quality and curation is our primary objective, in order to facilitate accessibility, interoperability, and accurate analysis of big data. This is a new workshop to introduce early career researchers to current developments in ensuring data quality.

Do join us - we look forward to seeing you!

Professor Beth Plale

School of Informatics and Computing

Indiana University

Professor Carl Lagoze

School of Information

University of Michigan


Professor Carl Lagoze

The overarching theme of Professor Lagoze's research for the past two decades is interoperability in information systems, spanning the full spectrum of the technical and human components that are critical to creating networked information systems that really work. Although the results of this research apply across a variety of information contexts, the primary thread explores information systems to support scholarship and knowledge production. The principal question we ask is: how do we build and sustain cyberinfrastructure that supports the full cycle of scholarship and that respects the cultural, methodological, and social variance among fields of scholarship? Some relevant sub-questions of this primary issue are: what are the methodological approaches and theoretical foundations relevant to understanding the nuanced variations in scholars' attitudes towards technical cyberinfrastructure? And: how do we design technical cyberinfrastructure that simultaneously supports large-scale interoperability and respects the diversity of scholarship? Finally: how do we accommodate multiple levels of expertise (e.g., citizen scientists) while maintaining quality and integrity? These are essential questions at a time of significant investment in cyberinfrastructure by governments and funders and the emergence of critical scientific questions such as climate change and global pandemics, the investigation of which requires cross-disciplinary collaborations and cyberinfrastructure support.

Professor Andrea Wiggins

Professor Wiggins is an Assistant Professor at Maryland's iSchool and director of the Open Knowledge Lab at UMD. She studies the design and evolution of sociotechnical systems for large-scale collaboration and knowledge production. Andrea's current work focuses on the role of technologies in citizen science, evaluating individual and collective performance and productivity in open collaboration systems, and the dynamics of open data ecosystems. Andrea serves on several working groups and advisory boards for citizen science projects across a variety of scientific disciplines, and regularly advises federal agencies and nonprofit organizations on citizen science project and technology design.

Professor Marcelline Harris

Professor Harris's career spans both research and practice experience in informatics and health services research, specifically clinical data integration and semantics, and the development and deployment of systems and processes that enable data integration for practice and large-scale research. She currently serves as a co-investigator and faculty lead for the University of Michigan node of the Patient Centered Network of Learning Health Systems (LHSNet), a clinical data research network within PCORnet, and also serves on the University's Translational Research Council and Precision Medicine Council. Prior to joining the faculty at the University of Michigan, she held appointments at Mayo Clinic as a career scientist in the Department of Biomedical Statistics and Informatics and in an executive operational role for nursing research and clinical informatics.

Scholarship: Travel Funding

The Midwest Big Data Hub and Indiana University will provide funding for lodging and travel for early career researchers to attend the 2-day workshop. In return we request that those who receive funding are able to participate in a poster session before dinner on Sept. 28, 2016 to showcase current developments in Big Data. Early career people representing open access groups, citizen science collaboratives, industry, or public sector data collection are also welcome to apply.

Scholarship Application


This two-day workshop will examine the following topics:

  • data quality in health records
  • data quality in citizen science
  • data quality and trust
  • trust in data publishing

The objective of the workshop is twofold: first, to raise awareness of the issue of data quality, particularly among early career researchers, through distinguished invited talks and tutorials; and second, to formulate questions that will motivate a follow-on workshop that defines a research agenda in the area.


September 28, 2016

08:30 - 11:45 Cyberinfrastructure Building Wrubel Lobby

08:30 - 09:00 Breakfast

09:00 - 09:15 Training Session Welcome
Beth Plale (Indiana University) and Carl Lagoze (University of Michigan)

09:15 - 10:15 Training topic 1: Provenance for data quality: Introduction to PROV model and OPM, Carl Lagoze (University of Michigan)

10:15 - 10:45 Break

10:45 - 11:45 Training topic 2: Working with Data Sets: Ensuring quality, Inna Kouper (Indiana University)

11:45 - 01:00 Lunch

01:00 - 01:15 Welcome to Data Quality in a Big Data Era

01:15 - 02:15 The changing face of data quality, Carl Lagoze (University of Michigan)

02:15 - 02:45 Discussion

02:45 - 03:15 Coffee break

03:15 - 04:15 Aggregating patient data and building trustworthiness into clinical research practices, Marcelline Harris

04:15 - 04:45 Discussion

04:45 - 05:30 Transport or walk to IU Biddle Hotel

06:00 - 08:00 Reception, Poster Session, and Hors d’oeuvres, Indiana University Memorial Union University Club

September 29, 2016

08:15 - 11:45 Cyberinfrastructure Building Wrubel Lobby

08:15 - 08:45 Breakfast

08:45 - 09:45 Data Quality in Citizen Science and the Open Knowledge Lab, Andrea Wiggins

09:45 - 10:15 Discussion

10:15 - 10:30 Break

10:30 - 11:30 Panel: Other perspectives on data quality
Moderator: Jill Minor, Indiana University
Panel members:
  • Beth Plale, School of Informatics and Computing, Indiana University
  • Hridesh Rajan, Iowa State University
  • H. V. Jagadish, University of Michigan
  • Inna Kouper, School of Informatics and Computing, Indiana University
  • Valentin Pentchev, Indiana University Network Science Institute, Indiana University

11:30 - 12:30 Synthesizing content, Approaching a research agenda

12:30 Boxed lunch

01:00 - 02:00 Training topic 3: Persistent identifiers in a data-centric science, Beth Plale (Indiana University)

Suggested Readings

Bowker, G. C., Brine, K. R., Gruber Garvey, E., Gitelman, L., Jackson, S. J., Jackson, V., & Williams, T. D. (2013).
     "Raw Data" Is an Oxymoron. MIT Press.

Boyd, D., & Crawford, K. (2012). Critical Questions for Big Data. Information, Communication & Society,
     15(5), 662–679. doi:10.1080/1369118X.2012.678878

Chandola, V., Banerjee, A., & Kumar, V. (2007). Outlier detection: A survey. ACM Computing Surveys.

Oleson, D., Sorokin, A., Laughlin, G., Hester, V., Le, J., & Biewald, L. (2011). Programmatic Gold: Targeted and Scalable Quality
     Assurance in Crowdsourcing. Artificial Intelligence.

Wiggins, A., Lagoze, C., & Kelling, S. (2014). A Sensor Network Approach to Managing Data Quality in Citizen Science, 2010–2011.

Zhu, H., Madnick, S., Lee, Y., & R. W. (2012, December). Data and Information Quality Research: Its Evolution and Future.

Important Dates

Scholarship submission deadline: September 9, 2016

Scholarship notification: September 13, 2016

Registration deadline: September 19, 2016

Workshop dates: September 28 & 29, 2016





General Chairs

Beth Plale, Indiana University

Carl Lagoze, University of Michigan

Local Arrangements Chair

Jill Minor, Indiana University

Program Committee

Early Career Chair: Devan Donaldson, Indiana University
Early Career Chair: Xiaozhong Liu, Indiana University
Valentin Pentchev, Indiana University
Hridesh Rajan, Iowa State University
H. V. Jagadish, University of Michigan

For questions please contact

Travel to Bloomington


If flying, you’ll probably fly into and out of Indianapolis International Airport. You can rent a car or take a shuttle (see below) to Bloomington.

Shuttles and private cars

GO Express Travel, (800) 589-6004, travels between Indianapolis International Airport and Bloomington several times each day. It also has private cars and a shuttle to and from Chicagoland.

Star of America, (800) 228-0814, runs a shuttle between Indianapolis International Airport and Bloomington multiple times a day.

Embarque, (800) 888-4639, and Classic Touch Limousine, (800) 319-0082, offer private, chauffeured cars and limos.


We encourage you to stay on campus at the Biddle Hotel and Conference Center in the IMU (Indiana Memorial Union). We are holding a block of rooms.

Getting around Bloomington and to and from the workshop venue

Bloomington is walk-friendly; it is easy to walk from the Indiana Memorial Union to downtown or to many buildings on campus.

Green buses can take you around the city, visit for routes and times.

Buses 6, 6L, and 9 run between the Indiana Memorial Union and the workshop venue (Cyberinfrastructure Building, CIB) about every 10 minutes (see route map).

A number of visitor parking permits will be available to non-IU visitors who drive, but parking space is very limited on the IU campus. Please consider carpooling.

If you require special assistance, please let us know asap at and we’ll arrange transportation for you.


Both days of the workshop will be held in the Wrubel Lobby of the Cyberinfrastructure Building, located inside the Technology Park at the corner of 10th and the bypass on the northeast corner of the beautiful Bloomington campus. The evening dinner and poster session on September 28, 2016 are located in the President's Room of the University Club within the IMU.
