  • 121 Backer(s)
  • $45,011 pledged of $35,000 goal
  • 1774 days ago
  • This project was successfully funded on 2012-12-10

About our project



NEW NOTE: Thank you to the Medstartr community, and to Alex, Komal, and the Medstartr team, for helping us go way past our funding goals.

We have a new website for the data set, which is available at http://docgraph.notonlyfor.com. There you will find information about our progress, as we move forward to gather the data. If you are a supporter, please expect to find a link to the current data download “Real Soon”…

Thanks so much. We look forward to freeing more data together!!
Fred Trotter

New Stretch Goal Announced!

Our first project, the Next Level Doctor Social Graph, was meant to raise $15,000 in order to obtain state-level credentialing data. We raised $23,720 ($8,720 over our goal), and we’ve been backed by 88 different people!!

Due to our success, we’re initiating round 2 to reach some of our stretch goals, and of course, to give you more chances to claim the awesome data and rewards! Read about it below!

Thank you for helping us reach our goal! As soon as we have the cash in hand, we will begin purchasing, formatting and redistributing this data set.

However, we have had several people contact us in the days after this Medstartr closed, asking whether they could “still buy” the data. Eventually, we will have this data for sale at a much higher price than was originally available as Medstartr rewards. Those price hikes will be pretty stiff in some cases. Early backers get the best price because they took a risk: there was no way to know if the Medstartr would make its goal, and there is no way to be sure that we will be able to perform this fairly large data merger. Once we have merged the data, we will charge those who “lacked faith” much more money for the data!

But that seems pretty unfair to those people who just did not yet know about the Medstartr project. With that in mind, we have arranged with Medstartr to extend our campaign by three weeks with several stretch goals.

$15,000 – State Level Credentialing Data – Done!!

$25,000 – COMPLETED! We will integrate the CMS nursing home inspection data. This will allow researchers to replicate the excellent work that ProPublica has already done in this area, and enable the analysis of this data using the referral graph.

UPDATED

$35,000 – COMPLETED! We will integrate the Hospital Compare data set!

$50k Reach Goal – Provide a download with the OpenStreetMap database containing the specific location of every known (by OSM) address in the NPI database.

$75k Reach Goal – Enhanced Integration with the OpenStreetMap data set. Right now, our initial testing shows that about 50% of the addresses in the NPI database properly resolve against the OSM open geo data set. At this level we would create code and a process to start addressing that problem using a crowdsourced effort. We will be contributing the resulting improvements back to OSM.

$100k Reach Goal – Integration with the new non-profit data feed from resource.org. Automatically get the non-profit tax report for all non-profit hospitals and clinics.

Thank you for your continued support!!

Project Description

At Strata RX 2012, we (NotOnly Dev) released a teaming (i.e. referral) graph: a project that details the connections between doctors, hospitals and other healthcare organizations in the US. This data set shows most of the connections between doctors, including referrals to specialists and which lab providers and hospitals they prefer to work with. It displays real names and will eventually show every city.

With this crowdfunding, we plan on increasing the amount of data that we will release by an order of magnitude. We are calling this new, larger data set “DocGraph”.

This is a valuable data set for academics, scientists, or health policy junkies because it can be merged with other research data.

Our goal is to empower the patient, make the system transparent and accountable, and release this data to the people who can use it to revitalize our health system.

Why this MATTERS to Patients:

It is very difficult to fairly evaluate the quality of doctors in this country. Our State Medical Boards only go after the most outrageous doctors. The doctor review websites are generally popularity contests: doctors with a good bedside manner do well, while doctors without strong social skills can do poorly, even if they are otherwise good doctors. Using this data set, it should be possible to build software that evaluates doctors by viewing referrals as “votes” for each other.

This data set could be the best source of public information about the quality of individual doctors ever released. More importantly, it should help doctors to encourage other doctors to improve their skills — for example, by seeking board certification. This data set will allow patients and administrators to evaluate the health system on both micro and macro scales and give them the tools to take steps towards addressing inefficiencies.
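One way to picture the "referrals as votes" idea is a PageRank-style score over the graph. The sketch below is only an illustration, not the project's method: it assumes each row can be read as a weighted vote from the first provider to the second, and the node labels and weights are made-up placeholders rather than real NPIs or counts.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over weighted directed edges (src, dst, weight)."""
    out_weight = defaultdict(float)
    nodes = set()
    for src, dst, weight in edges:
        out_weight[src] += weight
        nodes.update((src, dst))

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes its rank along its outgoing edges, in proportion to weight.
        for src, dst, weight in edges:
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical toy graph: labels are placeholders, not real NPIs.
edges = [
    ("gp_1", "specialist_a", 40),
    ("gp_2", "specialist_a", 25),
    ("gp_2", "specialist_b", 5),
    ("specialist_a", "lab_x", 30),
]
scores = weighted_pagerank(edges)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

A score like this rewards providers who receive many heavily weighted referrals from providers who are themselves well referred. Whether that tracks quality is exactly the kind of question the data set is meant to let researchers test.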

What we will be releasing next year, NO MATTER WHAT:

This data set, which we got from a carefully formed FOIA request against the Medicare claims database, shows how hospitals, doctors and other organizations work together. This data set was released under an “Open Source Eventually” License to Strata RX attendees. The only way to get access to this data set right now, before the data set becomes Open Source next year, will be to participate in this project. Act now, because all of the really amazing discoveries in this data set will be made in the next few months, by those who either attended Strata RX, or who participate in this project.

Code the change you want to see in the world

How we plan to use your money to make the data even better.

This data set can be made substantially more valuable by merging it with other “openish” data sources on the performance of doctors and hospitals. We want to turn this into the ultimate source for open doctor and hospital data.

Almost every State Medical Board in the US releases a report about the doctors in that state. This usually includes information on the doctor's medical school, information about board certification and information on disciplinary actions against the doctor.

All of these state-level data sources believe that it is appropriate to charge $50 to $1000 for copies of this data. Frequently, the states release data that is not yet linked to the NPI data. Sometimes some data is only available in PDFs, etc. In short, this data is currently available, but it is either messy, confusing and disconnected… or it is organized but expensive.

As a result, it is not possible to get a full profile for a particular doctor, as they potentially move between states, without paying for expensive data aggregation services. These services charge as much as $150 for data on a single doctor. At those kinds of prices, there is simply no way that a data scientist can afford to really do any significant work on doctor data.

This crowd funded project will enable us to purchase all of this data from the various public sources that sell it, and then to perform the conversion required to merge this data with the core NPI database. Our calculations indicate that for $15k we can comfortably get the state medical board data from every state in the union.
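To give a feel for what "merging with the core NPI database" might involve, here is a minimal sketch that links state board records to NPI records. It assumes, purely for illustration, that both sides carry a state and a license number that can be normalized into a shared key. The field names (`npi`, `state`, `license`, `board_certified`) and all the values are hypothetical, and real state files will likely need messier, per-state handling.

```python
def normalize_license(state, license_number):
    """Build a comparable key: upper-cased state, license reduced to letters and digits."""
    cleaned = "".join(ch for ch in license_number if ch.isalnum()).upper()
    return (state.strip().upper(), cleaned)


def link_board_records(npi_records, board_records):
    """Attach an NPI to each board record whose (state, license) key matches."""
    npi_by_license = {
        normalize_license(r["state"], r["license"]): r["npi"] for r in npi_records
    }
    linked, unmatched = [], []
    for record in board_records:
        key = normalize_license(record["state"], record["license"])
        if key in npi_by_license:
            linked.append({**record, "npi": npi_by_license[key]})
        else:
            # Kept separately so unmatched rows can be reviewed rather than dropped.
            unmatched.append(record)
    return linked, unmatched


# Hypothetical records: every value here is invented for illustration.
npi_records = [{"npi": "0000000001", "state": "TX", "license": "A-1234"}]
board_records = [
    {"state": "tx", "license": "a1234", "board_certified": True},
    {"state": "TX", "license": "Z9999", "board_certified": False},
]
linked, unmatched = link_board_records(npi_records, board_records)
print(len(linked), len(unmatched))
```

Keeping the unmatched records in their own list matters here, since records that fail to link are where most of the manual cleanup effort would go.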

We want to release this data back into the open data community! We will provide this data in clean formats such as CSV, JSON or XML. But we also want to be able to provide exclusive access to this data set as a reward for participating in this crowd funding project. We came up with “Open Source Eventually” as a perfect compromise.

How “Open Source Eventually” works.

Our compromise is to use an “Open Source Eventually” license for the data. If you contribute $100 to this campaign we will provide you with private access to this data for six months before it automatically reverts to a Creative Commons license (specifically, Attribution-ShareAlike 3.0 Unported, CC BY-SA 3.0).

$100 for six months of exclusive access to one of the most detailed social graphs ever available is pretty reasonable. The whole point is to enable researchers who are willing to help us study this data in the open to have cheap access to a rich data set. If you are willing to innovate in the open, then your expenses should be minimized.

After six months, this data will become available to everyone under the above license. For $100 you get early access. That means that you get to be the one to write new software, submit the new NEJM article or whatever. All of the cool discoveries in this data set should happen in the first six months.

However, if you want to take the DocGraph database, and you want to merge it with your proprietary dataset or application, then you will need to contribute more money. This way, those who are seeking to capture value from this data (by building or extending a business in some way) will help more to contribute to the open research from this data. A proprietary-friendly license will allow you to do anything you want with this data, with the exception of merely republishing it.

We might continue to sell the data after this Medstartr campaign, but we will at least double our prices for access to the data after the Medstartr is over. The people who participate in this fund-raising campaign will have the best price for this data set.

What will you do with the money if you get more than 15k?

A nice vacation in Hawaii. Just kidding.

There are lots of holes in the data that we have. We do not have referral data for the doctors who serve veterans in the VA, and we do not have any referral information about kid doctors. But we think we could fix that with further FOIA requests to the VA and Medicaid. We would also like to see if we could get the graph for doctors who get money from CMS in different ways (e.g. Medicare Advantage).

There is information about hospitals that is available from the IRS, or from the hospitals themselves. There are some interesting data sets regarding surgeons' relationships with implanted device manufacturers. The list of wonderful things that need to be added to this data just goes on and on.

We are pretty confident that we can continue adding new data to this open data set up to around $100k. As long as you keep giving money, we will keep increasing the amount of data we give back to you. Which brings us to:

Why should I support this project even if it is already funded? Won’t I get all of this data eventually anyways if I wait long enough?

Yes, that is true. The main benefit that supporters of this project get is early access to this data. But if you can afford to contribute $100 more, then we can find some more data to add to this open data set. Who knows, maybe that extra money will enable you to get clean data in just the format that you need, to enable your data research process.

Of course, if you want to use this data set in a proprietary database, then the public release, under a “viral” open source license, will not help you. You will still need the proprietary-friendly version of DocGraph, and this Medstartr will be the cheapest way to get this data!

This is awesome and I want you to get a specific data set for me.

If you are willing to sponsor this project at the $5000 level, then we will actively consult you regarding what data to acquire next.

We are specifically committing to “freeing” all of the state medical board information, but there are lots of directions to go in next. Our $5000 level sponsors (whether they want credit or would like to stay anonymous) will help us determine how we spend any money over $15k. Of course, sponsorship at this level also includes any of the other rewards, including access to the data set with a proprietary license.

About Us

NotOnly Dev is a Health IT software incubator company formed by Fred Trotter, Rick Trotter and Ashish Patel. We are a “not-only-for-profit” company. Of course, we are still a for-profit endeavor, but we have a very specific social mission: To use software and data to empower patients. On some projects, we make money. On others our goal is to make patients’ lives better. Most of the time, we can find ways to do a little of both at the same time. You are welcome to hire us for your healthcare software development project. We encourage that.

Twitter: @fredtrotter
For more crazy ideas: Patient Skunkworks Projects

TL;DR summary

You give us money. We give you lots of doctor data.

What does the data look like?

Here is a sample that shows what the file looks like when searching (using grep) for a specific NPI, in this case Methodist Hospital in Houston, TX:

>grep 1548387418 refer.2011.csv > Methodist_Hospital_Referrals.csv
Results in the following data. It is of the form:
NPI_Seen_First,NPI_Seen_Second,Seen_Count
1184710477,1548387418,55
1548387418,1326047754,62
1548387418,1598971913,24
1548387418,1558430330,254
1548387418,1154308633,74
1548387418,1942276605,76
1548387418,1659412336,5643
1902898455,1548387418,41
1548387418,1861490005,76
1730260035,1548387418,57
1033190681,1548387418,15
1679678767,1548387418,132
1710982798,1548387418,114
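For readers who would rather work with the file in code than with grep, the sketch below parses a few of the rows listed above and splits them by whether the hospital's NPI appears first or second. It is only an illustration: the helper name `split_by_direction` is made up, and only four of the sample rows are included. Unlike the grep command, it matches the NPI against the first two columns exactly instead of anywhere in the line.

```python
import csv
import io

# A subset of the sample rows shown above (NPI_Seen_First,NPI_Seen_Second,Seen_Count).
SAMPLE = """\
1184710477,1548387418,55
1548387418,1326047754,62
1548387418,1659412336,5643
1902898455,1548387418,41
"""

HOSPITAL_NPI = "1548387418"  # the NPI searched for in the grep example


def split_by_direction(rows, npi):
    """Separate rows where `npi` is seen first from rows where it is seen second."""
    seen_first, seen_second = [], []
    for first, second, count in rows:
        if first == npi:
            seen_first.append((second, int(count)))
        elif second == npi:
            seen_second.append((first, int(count)))
    return seen_first, seen_second


rows = list(csv.reader(io.StringIO(SAMPLE)))
seen_first, seen_second = split_by_direction(rows, HOSPITAL_NPI)

# Total Seen_Count in each direction, for these four sample rows only.
total_first = sum(count for _, count in seen_first)
total_second = sum(count for _, count in seen_second)
print(total_first, total_second)
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an open file handle for `refer.2011.csv`.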

Here is the link to the full results of that search.

Thank you for your interest and support!
Yours Truly,
Fred Trotter

Rewards

For $ 5 or more

0 Backer(s)


Permanent Thank You, with your name on our website as a supporter of open data.

For $ 15 or more

6 Backer(s)


I just want to browse: 1 YEAR OF UNLIMITED SEARCHES using the web portal that we are building for browsing the data set ($30 after Medstartr)

For $ 60 or more

0 Backer(s)


CODE THE CHANGE SHIRT: The t-shirt will feature the phrases: "I hacked the healthcare graph" and "code the change you want to see in the world".

For $ 120 or more

17 Backer(s)


OPEN SOURCE DATA PURCHASE: You will get the entire database under an Open Source Eventually (viral) data license. This will give you access to everything, but you will not be able to integrate this data with any data that you are unwilling to release. See the text for what we mean by "Eventually".

For $ 125 or more

5 Backer(s)


The t-shirt and the data: If you want a t-shirt and the open source data, that will cost you.

For $ 300 or more

1 Backer(s)


A limited edition print, celebrating the release of this data set, from renowned patient artist Regina Holliday. Her art frequently goes for $5k at auction, so these should almost immediately be worth more than you paid for them. Plus they are dripping with awesome. And, you still get a T-shirt!

For $ 1200 or more

11 Backer(s)


Proprietary-friendly Data License: This will ensure that you are able to use all of this data in any way you like (except just offering it for direct download, etc.) without concern that your own data/software would need to be released. If you want to build a proprietary product with this data set, this backing level is for you.

For $ 5000 or more

1 Backer(s)


Stretch Partner -- Help us decide what the stretch goals should be and receive full credit for doing so!

For $ 10000 or more

0 Backer(s)


GRAPH YOUR NETWORK: If you want your own network of doctors analyzed using our GUI tools (including the graph laid out on a map), this is for you. Includes 10 hours of on-site consulting.
