Welcome to the workshop on Data Quality in an Era of Big Data, sponsored by the Midwest Big Data Hub (MBDH) and Indiana University. Throughout the history of modern scholarship, the exchange of scholarly data was undertaken through personal interactions among scholars or through highly curated data archives, such as ICPSR (Inter-University Consortium for Political and Social Research). In both cases, implicit or explicit provenance mechanisms gave a relatively high degree of assurance of the quality of the data. However, the ubiquity of the web and mobile digital culture has produced disruptive new forms of data such as those based on citizen science, social network transactions, or massively deployed automatic sensors. The integrity and trustworthiness of these data are uncertain due to issues such as sampling characteristics, expertise of the data producers, or quality of the instruments. As these data are shared, fused, homogenized, and mixed, we need to ask ourselves what we know about the data and what we can trust. Failure to answer these questions endangers the integrity of the science produced from these data.
Attendees will hear from leaders in the big data fields of citizen science, health, and cyberinfrastructure. The Data To Insight Center at Indiana University leads the Data Science Ring of the Midwest regional innovation hub. Improving data quality and curation are our primary objectives, in order to facilitate accessibility, interoperability, and accurate analysis of big data. This is a new workshop to introduce early-career researchers to current developments in ensuring data quality.
Do join us - we look forward to seeing you!
Professor Beth Plale
School of Informatics and Computing
The overarching theme of Professor Lagoze's research for the past two decades is interoperability in information systems, spanning the full spectrum of the technical and human components that are critical to creating networked information systems that really work. Although the results of this research apply across a variety of information contexts, the primary thread explores information systems to support scholarship and knowledge production. The principal question we ask is: how do we build and sustain cyberinfrastructure that supports the full cycle of scholarship and that respects the cultural, methodological, and social variance among fields of scholarship? Some relevant sub-questions of this primary issue are: What are the methodological approaches and theoretical foundations relevant to understanding the nuanced variations and scholars' attitudes towards technical cyberinfrastructure? How do we design technical cyberinfrastructure that simultaneously supports large-scale interoperability and respects the diversity of scholarship? Finally, how do we accommodate multiple levels of expertise (e.g., citizen scientists) while maintaining quality and integrity? These are essential questions at a time of significant investment in cyberinfrastructure by governments and funders and the emergence of critical scientific questions such as climate change and global pandemics, the investigation of which requires cross-disciplinary collaborations and cyberinfrastructure support.
Professor Wiggins is an Assistant Professor at Maryland's iSchool and director of the Open Knowledge Lab at UMD. She studies the design and evolution of sociotechnical systems for large-scale collaboration and knowledge production. Andrea's current work focuses on the role of technologies in citizen science, evaluating individual and collective performance and productivity in open collaboration systems, and the dynamics of open data ecosystems. Andrea serves on several working groups and advisory boards for citizen science projects across a variety of scientific disciplines, and regularly advises federal agencies and nonprofit organizations on citizen science project and technology design.
Professor Harris's career spans both research and practice experience in informatics and health services research, specifically clinical data integration and semantics, and the development and deployment of systems and processes that enable data integration for practice and large-scale research. She currently serves as a co-investigator and faculty lead for the University of Michigan node of the Patient Centered Network of Learning Health Systems (LHSNet), a clinical data research network within PCORnet, and also serves on the University's Translational Research Council and Precision Medicine Council. Prior to joining the faculty at the University of Michigan, she held appointments at Mayo Clinic as a career scientist in the Department of Biomedical Statistics and Informatics and in an executive operational role for nursing research and clinical informatics.
This two-day workshop will examine the following topics:
The objective of the workshop is twofold: first, to raise awareness of the issue of data quality, particularly among early-career researchers, through distinguished invited talks and tutorials; and second, to formulate questions that will motivate a follow-on workshop to define a research agenda in the area.
08:30 - 11:45 Cyberinfrastructure Building Wrubel Lobby https://it.iu.edu/cib/
08:30 - 09:00 Breakfast
09:15 - 10:15 Training topic 1: Provenance for data quality: Introduction to PROV model and OPM, Carl Lagoze (University of Michigan)
10:15 - 10:45 Break
10:45 - 11:45 Training topic 2: Working with Data Sets: Ensuring quality, Inna Kouper (Indiana University)
11:45 - 01:00 Lunch
01:00 - 01:15 Welcome to Data Quality in a Big Data Era
01:15 - 02:15 The changing face of data quality, Carl Lagoze (University of Michigan)
02:15 - 02:45 Discussion
02:45 - 03:15 Coffee break
03:15 - 04:15 Aggregating patient data and building trustworthiness into clinical research practices, Marcelline Harris
04:15 - 04:45 Discussion
04:45 - 05:30 Transport or walk to IU Biddle Hotel
06:00 - 08:00 Reception Poster Session and Hors d’oeuvres, Indiana University Memorial Union University Club
08:15 - 11:45 Cyberinfrastructure Building Wrubel Lobby https://it.iu.edu/cib/
08:15 - 08:45 Breakfast
08:45 - 09:45 Data Quality in Citizen Science and the Open Knowledge Lab, Andrea Wiggins
09:45 - 10:15 Discussion
10:15 - 10:30 Break
10:30 - 11:30 Panel: Other perspectives on data quality
Moderator: Jill Minor, Indiana University
Panel members:
Beth Plale, School of Informatics and Computing, Indiana University
Hridesh Rajan, Iowa State University
H. V. Jagadish, University of Michigan
Inna Kouper, School of Informatics and Computing, Indiana University
Valentin Pentchev, Indiana University Network Science Institute, Indiana University
11:30 - 12:30 Synthesizing content, Approaching a research agenda
12:30 Boxed lunch
01:00 - 02:00 Training topic 3: Persistent identifiers in a data centric science, Beth Plale (Indiana University)
Bowker, G. C., Brine, K. R., Gruber Garvey, E., Gitelman, L., Jackson, S. J., Jackson, V., & Williams, T. D. (2013). "Raw Data" Is an Oxymoron. MIT Press.
Retrieved from http://mitpress.mit.edu/books/raw-data-oxymoron
Boyd, D., & Crawford, K. (2012). Critical Questions for Big Data. Information, Communication & Society, 15(5), 662–679. doi:10.1080/1369118X.2012.678878
Retrieved from https://people.cs.kuleuven.be/~bettina.berendt/teaching/ViennaDH15/boyd_crawford_2012.pdf
Chandola, V., Banerjee, A., & Kumar, V. (2007). Outlier detection: A survey. ACM Computing Surveys.
Retrieved from https://wwws.cs.umn.edu/tech_reports_upload/tr2007/old_files/07-017.pdf
Oleson, D., Sorokin, A., Laughlin, G., Hester, V., Le, J., & Biewald, L. (2011). Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing. Artificial Intelligence, 43–48.
Retrieved from http://publicassets.s3.amazonaws.com/papers/HCOMP2011_philosopher_stone.pdf
Wiggins, A., Lagoze, C., & Kelling, S. (2014). A Sensor Network Approach to Managing Data Quality in Citizen Science, 2010–2011.
Retrieved from https://www.aaai.org/ocs/index.php/HCOMP/HCOMP14/paper/viewFile/9263/9198
Zhu, H., Madnick, S., Lee, Y., & Wang, R. (2012). Data and Information Quality Research: Its Evolution and Future.
Retrieved from http://web.mit.edu/smadnick/www/wp/2012-13.pdf
If flying, you’ll probably fly into and out of Indianapolis International Airport. You can rent a car or take a shuttle (see below) to Bloomington.
GO Express Travel, (800) 589-6004, travels between Indianapolis International Airport and Bloomington several times each day. It also has private cars and a shuttle to and from Chicagoland.
Star of America, (800) 228-0814, runs a shuttle between Indianapolis International Airport and Bloomington multiple times a day.
We encourage you to stay on campus at the Biddle Hotel and Conference Center in the IMU (Indiana Memorial Union). We are holding a block of rooms.
Bloomington is walk-friendly; it is easy to walk from Indiana Memorial Union to downtown or to many buildings on campus.
Green buses can take you around the city; visit http://bloomingtontransit.com/ for routes and times.
Buses 6, 6L and 9 go between Indiana Memorial Union and the workshop venue (Cyberinfrastructure Building, CIB) about every 10 minutes (see route map).
A number of visitor parking permits will be available to non-IU visitors who drive, but parking space is very limited on the IU campus. Please consider carpooling.
If you require special assistance, please let us know as soon as possible at firstname.lastname@example.org and we’ll arrange transportation for you.