Note: This was Phase 1 of a two-phase project. To see Phase 2, which is currently active, please click here.
At Strata RX 2012, we (NotOnly Dev) released the Doctor Social Graph, a project that visually displays the connections between doctors, hospitals and other healthcare organizations in the US. This combined data set shows everything from the connections between doctors who refer their patients to each other to other data collected by state and national databases. It displays real names and will eventually cover every city.
This is THE data set that any academic, scientist, or health policy junkie could want for conducting almost any study.
Our goal is to empower the patient, make the system transparent and accountable, and release this data to the people who can use it to revitalize our health system.
It is very difficult to fairly evaluate the quality of doctors in this country. Our State Medical Boards only go after the most outrageous doctors. The doctor review websites are generally popularity contests: doctors with a good bedside manner do well, while doctors without strong social skills can do poorly, even if they are good doctors. Using this data set, it should be possible to build software that evaluates doctors by viewing referrals as “votes” for each other.
This data set could be the best source of public information about the quality of doctors ever. More importantly, it should help doctors to encourage other doctors to improve their skills — for example, by seeking board certification. This data set will allow patients and administrators to evaluate the health system on both micro and macro scales and give them the tools to take steps towards addressing inefficiencies.
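As a rough illustration of the "referrals as votes" idea, here is a minimal Python sketch of a PageRank-style score over a referral graph. It is not the project's method, only one possible reading of the idea. The function name `referral_scores`, the (referrer, referred, weight) edge layout, the damping value and the sample IDs are all assumptions made for this example; the real file layout may differ.

```python
from collections import defaultdict


def referral_scores(edges, damping=0.85, iterations=50):
    """Score nodes by treating each referral as a weighted vote.

    edges: iterable of (referrer_id, referred_id, weight) tuples.
    This layout is hypothetical, chosen only for the sketch.
    """
    out_weight = defaultdict(float)  # total referrals sent by each node
    incoming = defaultdict(list)     # (referrer, weight) pairs per node
    nodes = set()
    for src, dst, weight in edges:
        if weight <= 0:
            continue  # ignore empty or malformed edges
        nodes.update((src, dst))
        out_weight[src] += weight
        incoming[dst].append((src, weight))

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes that refer to nobody spread their score evenly over all nodes.
        dangling = sum(score[v] for v in nodes if out_weight[v] == 0)
        new_score = {}
        for node in nodes:
            votes = sum(score[src] * w / out_weight[src]
                        for src, w in incoming[node])
            new_score[node] = ((1 - damping) / n
                               + damping * (votes + dangling / n))
        score = new_score
    return score


# Made-up placeholder IDs, not real NPIs.
edges = [("A", "C", 1), ("B", "C", 1), ("C", "A", 1)]
scores = referral_scores(edges)
top = max(scores, key=scores.get)
```

Weighting each vote by the referrer's own score means that a referral from a doctor many others refer to counts for more than a referral from a doctor nobody refers to.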
This data set, which we got from a carefully formed FOIA request against the Medicare claims database, shows how hospitals, doctors and other organizations work together. It was released under an “Open Source Eventually” license to Strata RX attendees. The only way to get access to this data set right now, before it becomes Open Source next year, is to participate in this project. Act now, because all of the really amazing discoveries in this data set will be made in the next few months by those who either attended Strata RX or participate in this project.
Code the change you want to see in the world
This data set can be made substantially more valuable by merging it with other “openish” data sources on the performance of doctors and hospitals. We want to turn this into the ultimate source for open doctor and hospital data.
Almost every State Medical Board in the US releases a report about the doctors in that state. This usually includes information on the doctor's medical school, information about board certification and information on disciplinary actions against the doctor.
All of these state-level data sources believe that it is appropriate to charge $50 to $1000 for copies of this data. Frequently, the states release data that is not yet linked to the NPI data. Sometimes the data is only available in PDFs, and so on. In short, this data is currently available, but it is either messy, confusing and disconnected, or organized but expensive.
As a result, it is not possible to get a full profile for a particular doctor, who may move between states, without paying for expensive data aggregation services. These services charge as much as $150 for data on a single doctor. At those prices, there is simply no way that a data scientist can afford to do any significant work on doctor data.
This crowd funded project will enable us to purchase all of this data from the various public sources that sell it, and then to perform the conversion required to merge this data with the core NPI database. Our calculations indicate that for $15k we can comfortably get the state medical board data from every state in the union.
We want to release this data back into the open data community! We will provide this data in clean formats such as CSV, JSON or XML. But we also want to be able to provide exclusive access to this data set as a reward for participating in this crowd funding project. We came up with “Open Source Eventually” as a perfect compromise.
Our compromise is to use an “Open Source Eventually” license for the data. If you contribute $100 to this campaign, we will provide you with private access to this data for six months before it automatically reverts to a Creative Commons license (specifically Attribution-ShareAlike 3.0 Unported, CC BY-SA 3.0).
$100 for six months of exclusive access to one of the most detailed social graphs ever available is pretty reasonable. The whole point is to enable researchers who are willing to help us study this data in the open to have cheap access to a rich data set. If you are willing to innovate in the open, then your expenses should be minimized.
After six months, this data will become available to everyone under the above license. For $100 you get early access. That means that you get to be the one to write new software, submit the new NEJM article or whatever. All of the cool discoveries in this data set should happen in the first six months.
However, if you want to take this “referrals and more” database and merge it with your proprietary dataset or application, then you will need to contribute more money. This way, those who are seeking to capture value from this data (by building or extending a business in some way) will contribute more to the open research on this data. A proprietary-friendly license will allow you to do anything you want with this data, with the exception of merely republishing it.
We might continue to sell the data after this Medstartr campaign, but we will at least double our prices for access to the data after the Medstartr is over. The people who participate in this fund-raising campaign will have the best price for this data set.
A nice vacation in Hawaii. Just kidding.
There are lots of holes in the data that we have. We do not have referral data for the doctors who serve veterans in the VA, and we do not have any referral information about kid doctors. But we think we could fix that with further FOIA requests to the VA and Medicaid. We would also like to see if we could get the graph for doctors who get money from CMS in different ways (e.g. Medicare Advantage).
There is information about hospitals that is available from the IRS, or from the hospitals themselves. There are some interesting data sets regarding surgeons' relationships with implanted device manufacturers. The list of wonderful things that need to be added to this data just goes on and on.
We are pretty confident that we can continue adding new data to this open data set up to around $100k. As long as you keep giving money, we will keep increasing the amount of data we give back to you. Which brings us to:
Yes, that is true. The main benefit that supporters of this project get is early access to this data. But if you can afford to contribute $100 more, then we can find some more data to add to this open data set. Who knows, maybe that extra money will enable you to get clean data in just the format that you need, to enable your data research process.
If you are willing to sponsor this project at the $5000 level, then we will actively consult you regarding what data to acquire next. We are specifically committing to “freeing” all of the state medical board information, but there are lots of directions to go in next. Our $5000 level sponsors (whether they want credit or would like to stay anonymous) will help us determine how we spend any money over $15k. Of course, sponsorship at this level also includes any of the other rewards, including access to the data set with a proprietary license.
Simple. No harm, no foul. We have plenty of other work that pays better than this, so do not think we will go hungry or anything. We are doing this as one of our side projects to benefit the data science community. If the data science community does not need or want this data, then we will live.
But, as an added reward, if you were willing to contribute to this Medstartr campaign and it does not reach its goal, then we will still provide you with a copy of the referral data set that we released at Strata RX. Participating in this Medstartr at the $100 level or above is the only way to get a copy besides attending Strata RX. And, if you commit to pay, you will get a copy of this data no matter what.
If we make our goal, we will be able to provide you with tens of GB of data about specific doctors and hospitals, but even if we do not, we will provide you with 2 or 3 GB, just for believing. So, it’s pretty much win-win for you. If we make it, you pay and get a huge amount of doctor data. If we do not, then you get early access to a little doctor data for free!
NotOnly Dev is a Health IT software incubator company formed by Fred Trotter, Rick Trotter and Ashish Patel. We are a “not-only-for-profit” company. Of course, we are still a for-profit endeavor, but we have a very specific social mission: to use software and data to empower patients. On some projects, we make money. On others, our goal is to make patients’ lives better. Most of the time, we can find ways to do a little of both at the same time. You are welcome to hire us for your healthcare development project. We encourage that.
For more crazy ideas: Patient Skunkworks Projects
You give us money. We give you lots of doctor data.
Here is a sample that shows what the file looks like when searching (using grep) for a specific NPI, in this case Methodist Hospital in Houston, TX:
>grep 1548387418 refer.2011.csv > Methodist_Hospital_Referrals.csv
Results in the following data. It is of the form:
Here is the link to the full results of that search: http://pastebin.com/E7Mv8RmL
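Because grep matches the number anywhere on a line, it can also pick up lines where those digits happen to appear in some other field. A minimal Python sketch of an exact, column-aware filter follows. The assumption that NPIs sit in the first two columns, the function name `rows_for_npi`, and the sample rows are hypothetical and for illustration only; check them against the real file before relying on this.

```python
import csv
import io


def rows_for_npi(lines, npi, npi_columns=(0, 1)):
    """Yield CSV rows in which `npi` exactly equals one of `npi_columns`.

    The default column positions are an assumption, not the documented
    layout of the referral file.
    """
    for row in csv.reader(lines):
        if any(i < len(row) and row[i].strip() == npi for i in npi_columns):
            yield row


# Contrived sample rows (not real data). The third row contains the digits
# only in a non-NPI column, so a plain grep would match it but this does not.
sample = io.StringIO(
    "1548387418,1111111111,12\n"
    "2222222222,1548387418,3\n"
    "3333333333,4444444444,1548387418\n"
)
matches = list(rows_for_npi(sample, "1548387418"))
```

To run this against the real file, pass an open file handle such as `open("refer.2011.csv", newline="")` in place of `sample`.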
Thank you for your interest and support!
I just want to browse: 1 YEAR OF UNLIMITED SEARCHES using the web portal that we are building for browsing the data set ($30 after Medstartr)
CODE THE CHANGE SHIRT: The t-shirt will feature the phrases: "I hacked the healthcare graph" and "code the change you want to see in the world".
OPEN SOURCE DATA PURCHASE: You will get the entire database under an Open Source Eventually (viral) data license. This will give you access to everything, but you will not be able to integrate this data with any data that you are unwilling to release. See the text for what we mean by "Eventually".
The t-shirt and the data: If you want a t-shirt and the open source data, that will cost you.
A limited edition print, celebrating the release of this data set, from renowned patient artist Regina Holliday. Her art frequently goes for $5k at auction, so these should almost immediately be worth more than you paid for them. Plus they are dripping with awesome. And you still get a t-shirt!
Proprietary-friendly Data License: This will ensure that you are able to use all of this data in any way you like (except just offering it for direct download, etc etc) without concern that your own data/software would need to be released. If you want to build a proprietary product with this data set, this backing level is for you.
Loud Partner: If you would like to specifically sponsor our work, and/or you would like to help direct how we gather data beyond the $15k mark, this level is for you. As a bonus, this level will include full credit for sponsoring.
Silent Partner: If you would like to specifically sponsor our work, and/or you would like to help direct how we gather data beyond the $15k mark, this level is for you. As a bonus, this level will include us never telling anyone that you sponsored.
GRAPH YOUR NETWORK: If you want your own network of doctors analyzed using our GUI tools (including the graph laid out on a map), this is for you. Includes 10 hours of on-site consulting.