Ted Hart

I’m a senior data scientist in Silicon Valley and adjunct faculty at the University of Vermont. I build things for data: things that process it, parse it, visualize it, and analyze it. I like my beer cold, my snow deep, my mountains high, and my data open. I am a recovering academic.

ecologist / data scientist / developer

The open data challenge

On the heels of the flurry of discussion about data sharing, I'd like to expand on Greg Wilson's open scoop challenge. He is looking for anyone who has been scooped because they shared their data. That seems like a high bar to me, so I'd like to lower it. Here's my thought experiment: what dataset exists from which you can publish multiple papers using the exact same data? That is, a case where you aren't just carving up a large dataset into least publishable units.

From my own experience, I've collected data over the course of many field seasons of the same experiment, and that might be considered one dataset. I've used different parts of that dataset to submit (and soon publish) multiple papers, but when I share the data, it will be the slice used to create the paper, not the entire dataset. At that point I'll have gotten all I can from it, and it's open to use.

I'm looking for anyone who has published multiple papers using the exact same dataset and who could therefore have been scooped. I don't mean the scenario above, where there's a large dataset and people create publications based on slices of it. My reading is that what PLoS requires (and what I think is good to share) is the data needed to recreate the paper, and given that requirement it seems like it'd be hard to scoop anyone. I'm just not sure what the scenario is where the exact same pieces of data are used over and over again to get multiple publications. Please leave references in the comments.