Welcome to my information studies portfolio. As I progress through my IS PhD program, I will be updating this site with information about conferences, workshops, news items, and publications. Please take a look around at my resume and examples of my graduate work in Drexel's MSLIS graduate program, and connect with me through E-mail, LinkedIn, & Goodreads. Below is my blog, where you can read about recent workshops, conferences, and fellowships.
Metadata Mixer: Carolee Mitchell, from Data.World ✥
During today's Metadata Mixer, at Drexel's College of Computing and Informatics, we heard from Carolee Mitchell, the Manager of User Operations representing the Austin-based startup company, Data.World.
Carolee elaborated on the purpose of the data sharing platform, and the importance of linking not only data, but also the people who are creating the data.
We currently have both data and people in separate silos, with each data user having to perform their own data "janitorial work" on the same data set. It's an extraordinary waste of a researcher's time. Data.World addresses this by serving as a "self-service data prep" tool, and incorporates visualization tools to help create meaningful representations.
One key issue that crossed my mind, which was addressed by Dr. Jane Greenberg, is that in order for data to truly be linked (beyond the confines of Data.World's own ecosystem), the platform needs to incorporate ORCID identifiers and standard vocabularies, such as Name Authority Records.
More information about Data.World can be found on the site's overview or in their brief introductory video.
RDA Fellowship Overview ✥
In April 2017, I was awarded a 12-month Research Data Alliance (RDA) Data Share Fellowship. My goal is to help facilitate safe and trustworthy data sharing between seemingly disparate open and closed data communities, creating a means by which researchers can share their datasets without worrying that sensitive data will be mismanaged or misused in the hands of a third-party participant.
In order to ensure compliance throughout the entire life-cycle of a dataset, even in the hands of a third party, datasets must convey comprehensive rights metadata that communicates how the dataset can be used, re-used, shared, and accessed. This work will contribute to that goal by establishing institutional rights metadata best practices, guided by the needs established in individual data sharing agreements.
My goals for this fellowship are to: 1) evaluate a sample of current Institutional Review Board protocols for data rights management and metadata, 2) conduct a crosswalk analysis of existing rights metadata standards, and 3) create recommendations for IRB protocol best practices for the creation of rights management metadata.
Fellow Orientation ✥
The RDA Fellow Orientation took place at Rensselaer Polytechnic Institute (RPI) in Troy, NY on May 16th and 17th. RDA leaders spoke to the 2017 Data Share cohort about what to expect as fellows in the program, what our proposed projects plan to accomplish, and what our first steps should be. We also discussed potential collaboration opportunities with other cohort members as well as interest/working group chairs.
Research Data Alliance Career Panel.
Day two of the orientation revolved around a four-person panel who spoke about data as a career. One important takeaway from the discussion was the notion that when you work in the data field, you have to be comfortable with (and even embrace) the fact that you won't be the smartest person in the room, but that you are nevertheless a necessary facilitator. In Beth Plale's words, "the algorithmic model plus human intuition = an open space for solutions."
Workshop Overview ✥
The goals of the workshop were to discuss and devise ways in which big data (defined by Volume, Velocity, Veracity, Variety, & Value) can be used to confront new challenges faced by the smart grid community. Panel topics included Big Data Availability & Management; International Experiences: Synchrophasors BD; Data Analytics & tools; and Future Efforts.
Poster Presentation ✥
Dr. Jane Greenberg and PhD student Sam Grabus travelled down to Texas A&M University for the Smart Grids workshop, where Sam shared her poster, “ShareDB: A Licensing Model and Ecosystem for Data Sharing.”
Many speakers throughout the day discussed the difficulties they face with data sharing, whether the barriers are proprietary rights, size, the need for real-time data, or the sensitive nature of the Critical Energy/Electric Infrastructure Information (CEII) being shared. Mark Rice, of the Pacific Northwest National Laboratory (PNNL), commented that "Nationally, we just don't share data."
Meeting Overview ✥
The meeting was an opportunity to bring together PIs and students from all currently funded NSF Big Data initiatives. The lightning talks and panels outlined progress on projects across all regional hubs and spokes, and identified current challenges.
Here is a full list of speakers and their presentation slides for all 3 days.
Data Sharing Challenges ✥
Many project PIs across the various regional hubs and spokes stressed the difficulties that they are currently experiencing with cross-organizational data sharing, particularly in terms of licensing, intellectual property, and trust.
PI Sam Madden (MIT) spoke about our current progress on the data sharing spoke initiative within the Northeast Big Data Innovation Hub, addressing many of the barriers that researchers face when trying to share their data.
The poster session on the 16th was a great success, with several attendees engaging in discussion about the data sharing spoke initiative. We made connections for potential collaboration across the regional hubs, and the data sharing license agreement examples are starting to filter in.
Workshop Overview ✥
The annual Northeast Big Data Innovation Hub Workshop was held at Columbia University on February 24th, 2017. Academic and industry professionals from the 6 spokes (health, energy, cities & regions, finance, big data, discovery) spoke about progress on current cross-sector hub initiatives and cross-hub collaboration.
Speaker Highlights ✥
The workshop lightning talks addressed current initiatives, challenges, and upcoming events:
Chirag Patel, from Harvard Medical School, addressed finding a way to link disease and environmental data via an exposome data warehouse and OHDSI.
Carsten Binnig, from Brown University, spoke about creating a data sharing platform with built-in licensing agreements to facilitate easier/safer data sharing between industry and academia, building on top of MIT’s pre-existing data sharing technology. This work is part of the larger project I'm on with the Metadata Research Center, Drexel University, and the Northeast Big Data Innovation Hub Data Sharing spoke. Our last workshop was held at Drexel University on September 29-30, 2016. The workshop slides and final report are available on the Metadata Research Center website. The next data sharing workshop will be in Fall 2017.
Beverly Woolf, from the University of Massachusetts Amherst, discussed creating personalized education based on predictive models, helping to create more effective training approaches via adaptive technologies.
Rebecca Wright, from Rutgers University, focused on area-specific privacy and security concerns related to data and on integrating solutions into technology, and mentioned two forthcoming related workshops (April 24-25 and Fall 2017).
Next, we heard from Penn State's John Yen, who addressed the current lack of existing resources for sharing near real-time cyber threat information within the trusted community. The next related workshop will be on Nov 11th, featuring stakeholders from Penn State, Rutgers, Dartmouth, and Columbia.
Stephen Uzzo, from the New York Hall of Science, addressed approaches toward data literacy: we are collecting more data than we have the capability to analyze, and there is currently not enough academic training to help close the big data divide.
In an announcement by Microsoft, Vani Mandava spoke about how cloud-based solutions can help to connect partnering hospitals for patient risk-admission predictive analysis.
Breakout sessions highlighted common themes of algorithmic bias, data sharing, privacy/security, and scale of focus (e.g., hyperlocal).