Data collection

For OPENMEDiAID to function properly, a large amount of seed data has to be collected and added to the system. The system needs to know the medical language and hold a repertoire of medical profiles that describe the common cases of known diseases. These archetype profiles can be defined from existing literature (books of medical knowledge), (meta-)studies and information gathered from existing databases. This task requires many contributors, and the main challenge is to build and manage the community of volunteers who will do it.

We plan to organize collaborations with groups of students and with medical research or educational institutions. We will build tools (a visual editor with language processing support) to make data collection an educational, enjoyable and scalable process. We want to organize hackathons and have ideas such as hosting a data collection summer camp.

For the machine learning to function properly, we need to grow our set of real patient medical records, because the quality of the clustered profiles largely depends on the sample size. We need patients to use the system and submit their stories and experiences. We have ideas about establishing help desks for patients who want to have their health records digitized and need help doing so.

Data quality

The statistics and recommendations generated by a data-driven platform are largely determined by the quality of the underlying data. In a collaborative environment where experts and laypeople alike are the sources of the medical data, special care is needed to ensure sufficient quality. Incomplete medical records and malformed, ambiguous or inconsistent data will inevitably produce artefacts.

Furthermore, OPENMEDiAID is a system striving to bring transparency into a sector where a lot of money is earned from a lack of transparency. Consequently, there will be people and organizations that dislike the platform and will try to attack it in many ways. An obvious way to compromise the system is to seed fake medical profiles that support the attackers' favored view.
To protect against this kind of data corruption, several layers of control and trust are necessary.

  • bots are kept out of the system
  • a medical profile belongs to a real person and only one profile per person is available
  • new, unverified profiles are not considered in clustering
  • different methods of profile verification are available (SMS, Postident, help desks)
  • dynamic verification scores influence a profile's impact
  • there is a continuous peer-review process of profiles and other parts of the model
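
The rules for unverified profiles and verification scores can be illustrated with a minimal Python sketch. Nothing here is taken from the OPENMEDiAID code base: the `Profile` class, the 0.0 to 1.0 score scale, the `MIN_SCORE` threshold and the `clustering_weights` function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    profile_id: str
    # Hypothetical scale: 0.0 means unverified, 1.0 means fully verified.
    verification_score: float


# Hypothetical cut-off below which a profile is ignored by clustering.
MIN_SCORE = 0.2


def clustering_weights(profiles, min_score=MIN_SCORE):
    """Return {profile_id: weight} for the profiles eligible for clustering.

    Profiles below min_score are left out entirely; all others get a
    weight equal to their verification score, capped at 1.0.
    """
    weights = {}
    for profile in profiles:
        if profile.verification_score < min_score:
            continue  # new or unverified profiles are not considered
        weights[profile.profile_id] = min(profile.verification_score, 1.0)
    return weights
```

A clustering step could then use these weights so that a well-verified profile counts for more than a barely verified one. How the scores themselves would be computed and updated is left open here.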

Similar to the elaborate permission system used by the Stack Exchange communities, a system of control is established through the different roles that a community member can fulfil.

Other factors that influence data quality:

  • The mapping of natural-language descriptions to medical particles needs to be accurate and intuitive
  • Excellent editor/wizard to guide anamnesis
  • Internationalization and handling of synonyms as well as ambiguous terms
  • Striving towards complete profiles
  • Flexible data model that can evolve with growing data and requirements
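
As a rough illustration of synonym handling, the following Python sketch maps free-text descriptions to canonical terms. The `SYNONYMS` table, its entries and the `map_terms` function are hypothetical examples, not part of the project; a real system would need a curated vocabulary, language-specific tables and a strategy for ambiguous terms.

```python
import re

# Hypothetical synonym table: surface phrase -> canonical term.
SYNONYMS = {
    "headache": "headache",
    "cephalalgia": "headache",
    "head pain": "headache",
    "fever": "fever",
    "pyrexia": "fever",
    "high temperature": "fever",
}


def map_terms(description, synonyms=SYNONYMS):
    """Return the set of canonical terms mentioned in a description.

    Matching is case-insensitive and respects word boundaries, so
    "feverish" does not match the phrase "fever".
    """
    # Lower-case the text and collapse runs of whitespace.
    text = " ".join(description.lower().split())
    found = set()
    for phrase, canonical in synonyms.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            found.add(canonical)
    return found
```

With this table, "head pain" and "cephalalgia" both resolve to the same canonical term, so two patients who use different words for the same symptom end up with comparable profiles.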

User Interface

Providing excellent guidance while users describe their medical cases is essential for profile matching to work correctly. Flawed descriptions of medical cases cannot produce reliable matches with other profiles and can generate misleading information.

Supervision and Evolution

The available medical knowledge needs to evolve continuously and improve in quality. It is very unlikely that all the models, algorithms and other parts of the system will produce the expected results right from the beginning. Instead of trying to create a perfect system from scratch, it is much more promising to build the possibility for evolution into its core. People need to be able to interact with the system in order to add new information and adjust existing knowledge. Only if the system can learn from people may it be able to teach them.

Smart technology alone will not solve the problem of providing medical advice. As important as the right technology is a healthy community process that allows people to participate in different roles – patients, general doctors, neurologists, software engineers etc. – providing distinct types of information and supervising different parts of the system.

Information exchange and collaboration must be transparent, so that everybody can get involved and understand how decisions have been made. A community moderation system based on trust and reputation will help coordinate the efforts of larger groups of people.

Tue, 08/11/2015 - 20:30

Open Medicine Initiative e.V.


Amtsgericht Berlin Charlottenburg
ID: VR34011B
Mail to: hello (at) open-medicine-initiative (dot) org


