About our project

Note: This was Phase 1 of a two-phase project. To see Phase 2, which is currently the active phase, please click here.

At Strata RX 2012, we (NotOnly Dev) released the Doctor Social Graph, a project that visually displays the connections between doctors, hospitals and other healthcare organizations in the US. This combined data set shows everything from the connections between doctors who refer their patients to each other to other data collected by state and national databases. It displays real names and will eventually show every city.

This is THE data set that any academic, scientist, or health policy junkie could ever want for conducting almost any study.

Our goal is to empower the patient, make the system transparent and accountable, and release this data to the people who can use it to revitalize our health system.

Why this matters to patients

It is very difficult to fairly evaluate the quality of doctors in this country. Our State Medical Boards only go after the most outrageous doctors. The doctor review websites are generally popularity contests: doctors with a good bedside manner do well, while doctors without strong social skills can do poorly, even if they are good doctors. Using this data set, it should be possible to build software that evaluates doctors by viewing referrals as “votes” for each other.

This data set could be the best source of public information about the quality of doctors ever. More importantly, it should help doctors to encourage other doctors to improve their skills — for example, by seeking board certification. This data set will allow patients and administrators to evaluate the health system on both micro and macro scales and give them the tools to take steps towards addressing inefficiencies.

What we will be releasing no matter what next year.

This data set, which we got from a carefully formed FOIA request against the Medicare claims database, shows how hospitals, doctors and other organizations work together. This data set was released under an “Open Source Eventually” License to Strata RX attendees. The only way to get access to this data set right now, before it becomes Open Source next year, is to participate in this project. Act now, because all of the really amazing discoveries in this data set will be made in the next few months, by those who either attended Strata RX or who participate in this project.

Code the change you want to see in the world

How we plan to use your money to make the data even better.

This data set can be made substantially more valuable by merging it with other “openish” data sources on the performance of doctors and hospitals. We want to turn this into the ultimate source for open doctor and hospital data.

Almost every State Medical Board in the US releases a report about the doctors in that state. This usually includes information on the doctor's medical school, information about board certification and information on disciplinary actions against the doctor.

All of these state-level data sources believe that it is appropriate to charge $50 to $1000 for copies of this data. Frequently, the states release data that is not yet linked to the NPI data. Sometimes the data is only available as PDFs, and so on. In short, this data is currently available, but it is either messy, confusing and disconnected, or it is organized but expensive.

As a result, it is not possible to get a full profile for a particular doctor, as they potentially move between states, without paying for expensive data aggregation services. These services charge as much as $150 for data on a single doctor. At those kinds of prices, there is simply no way that a data scientist can afford to really do any significant work on doctor data.

This crowd funded project will enable us to purchase all of this data from the various public sources that sell it, and then to perform the conversion required to merge this data with the core NPI database. Our calculations indicate that for $15k we can comfortably get the state medical board data from every state in the union.

We want to release this data back into the open data community! We will provide this data in clean formats such as csv, json or xml. But we also want to be able to provide exclusive access to this data set as a reward for participating in this crowd funding project. We came up with “Open Source Eventually” as a perfect compromise.

How “Open Source Eventually” works.

Our compromise is to use an “Open Source Eventually” license for the data. If you contribute $100 to this campaign, we will provide you with private access to this data for six months before it automatically reverts to a Creative Commons license, specifically Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).

$100 for six months of exclusive access to one of the most detailed social graphs ever available is pretty reasonable. The whole point is to enable researchers who are willing to help us study this data in the open to have cheap access to a rich data set. If you are willing to innovate in the open, then your expenses should be minimized.

After six months, this data will become available to everyone under the above license. For $100 you get early access. That means that you get to be the one to write new software, submit the new NEJM article or whatever. All of the cool discoveries in this data set should happen in the first six months.

However, if you want to take this “referrals and more” database and merge it with your proprietary dataset or application, then you will need to contribute more money. This way, those who are seeking to capture value from this data (by building or extending a business in some way) will contribute more to the open research on this data. A proprietary-friendly license will allow you to do anything you want with this data, with the exception of merely republishing it.

We might continue to sell the data after this Medstartr campaign, but we will at least double our prices for access to the data after the Medstartr is over. The people who participate in this fund-raising campaign will have the best price for this data set.

What will you do with the money if you get more than $15k?

A nice vacation in Hawaii. Just kidding.

There are lots of holes in the data that we have. We do not have referral data for the doctors who serve veterans in the VA, and we do not have any referral information about kid doctors. But we think we could fix that with further FOIA requests to the VA and Medicaid. We would also like to see if we could get the graph for doctors who get money from CMS in different ways (e.g. Medicare Advantage).

There is information about hospitals that is available from the IRS, or from the hospitals themselves. There are some interesting data sets regarding surgeons' relationships with implanted device manufacturers. The list of wonderful things that need to be added to this data just goes on and on.

We are pretty confident that we can continue adding new data to this open data set up to around $100k. As long as you keep giving money, we will keep increasing the amount of data we give back to you. Which brings us to:

Why should I support this project even if it is already funded? Won’t I get all of this data eventually anyways if I wait long enough?

Yes, that is true. The main benefit that supporters of this project get is early access to this data. But if you can afford to contribute $100 more, then we can find some more data to add to this open data set. Who knows, maybe that extra money will enable you to get clean data in just the format that you need, to enable your data research process.

This is awesome and I want you to get a specific data set for me.

If you are willing to sponsor this project at the $5000 level, then we will actively consult you regarding what data to acquire next. We are specifically committing to “freeing” all of the state medical board information, but there are lots of directions to go in next. Our $5000 level sponsors (whether they want credit or would like to stay anonymous) will help us determine how we spend any money over $15k. Of course, sponsorship at this level also includes any of the other rewards, including access to the data set with a proprietary license.

What happens if you do not meet your goals

Simple. No harm, no foul. We have plenty of other work that pays better than this, so do not think we will go hungry or anything. We are doing this as one of our side projects to benefit the data science community. If the data science community does not need or want this data, then we will live.

But, as an added reward, if you were willing to contribute to this Medstartr campaign and it does not make its goal, then we will still provide you with a copy of the referral data set that we released at Strata RX. Participating in this Medstartr at the $100 level or above is the only way to get a copy besides attending Strata RX. And, if you commit to pay, you will get a copy of this data no matter what.

If we make our goal, we will be able to provide you with tens of GB of data about specific doctors and hospitals, but even if we do not, we will provide you with 2 or 3 GB, just for believing. So, it’s pretty much win-win for you. If we make it, you pay and get a huge amount of doctor data. If we do not, then you get early access to a little doctor data for free!

About us

NotOnly Dev is a Health IT software incubator company formed by Fred Trotter, Rick Trotter and Ashish Patel. We are a “not-only-for-profit” company. Of course, we are still a for-profit endeavor, but we have a very specific social mission: to use software and data to empower patients. On some projects, we make money. On others, our goal is to make patients’ lives better. Most of the time, we can find ways to do a little of both at the same time. You are welcome to hire us for your healthcare development project. We encourage that.

Twitter: @fredtrotter
For more crazy ideas: Patient Skunkworks Projects

TL;DR summary

You give us money. We give you lots of doctor data.

What does the data look like?

Here is a sample that shows what the file looks like when searching (using grep) for a specific NPI, in this case Methodist Hospital in Houston, TX:

>grep 1548387418 refer.2011.csv > Methodist_Hospital_Referrals.csv
Results in the following data. It is of the form:


Here is the link to the full results of that search: http://pastebin.com/E7Mv8RmL
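
A plain grep matches the NPI anywhere on a line. To get a rough feel for the referrals-as-“votes” idea, it can help to count only the rows that point at a given NPI. The sketch below is only an illustration: the column layout is not reproduced on this page, so the two-column layout (referring NPI, then referred-to NPI) and every sample row except the Methodist Hospital NPI are made-up assumptions, not the real file format.

```shell
# Build a tiny, made-up sample file. ASSUMPTION: column 1 is the referring NPI
# and column 2 is the referred-to NPI. The real refer.2011.csv may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1111111111,1548387418
2222222222,1548387418
1548387418,3333333333
2222222222,3333333333
EOF

# Count rows whose (assumed) second column is exactly the NPI of interest.
npi=1548387418
inbound=$(awk -F, -v npi="$npi" '$2 == npi { n++ } END { print n + 0 }' "$sample")
echo "inbound referrals for $npi: $inbound"

# Clean up the temporary sample file.
rm -f "$sample"
```

To try this against the real file, replace "$sample" with refer.2011.csv and change $2 to whichever column actually holds the referred-to NPI.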

Thank you for your interest and support!

Yours Truly,
Fred Trotter


