Breast Cancer Diagnosis via Linear Programming
Mangasarian, Street, and Wolberg (1995) proposed using a linear programming (LP) formulation for breast cancer diagnosis and prognosis. Their method is currently used successfully at the University of Wisconsin Hospital. A fluid sample is extracted with a fine needle from the patient's breast lump or mass, placed on a glass slide, and stained to highlight the nuclei of the constituent cells. An image of the slide is transferred to a program (XCYT) written by the authors. The program analyzes the image and extracts the following features for each nucleus: area, radius, perimeter, symmetry, number and size of concavities, compactness, smoothness, and others. Together the features form a 30-dimensional vector for each sample. This vector is computed for patients known to have benign lumps and for patients known to have malignant lumps, which gives the two sets H and M. The separating hyperplane (or, when none exists, the next best thing) is computed using the LP, yielding a discriminant function for classifying unknown samples. The claimed success rate of this procedure is 97.5%.
Breast Cancer Background (statistics as of 1994)
- 12% of U.S. women are diagnosed with breast cancer
- 3.5% of women will die of breast cancer
- Constant mortality rate over the last 20 years
- Early detection (diagnosis) and accurate prognosis are essential for a better chance of survival

- Mammography: 68%-79% accuracy
- Surgical biopsy: 100% accuracy
- Fine Needle Aspirate (FNA) with visual interpretation: 65%-98% accuracy

- Mammography is inaccurate
- Surgery is costly, invasive, time consuming (and painful!)
- FNA interpreted visually: accuracy too variable (depends on the experience of the doctor)
An Alternative Diagnostic Method:
Olvi Mangasarian, Nick Street, and William Wolberg wrote an image analysis program
called XCYT. XCYT analyzes cytological (cellular) features from a digital
scan: it determines the exact boundaries of nuclei using curve fitting. The authors propose using the FNA method
and interpreting the data with XCYT as opposed to visually (rely on a machine, not a human!).
Ten features for each nucleus are determined:
- Area
- Radius
- Perimeter
- Symmetry
- Texture
- Smoothness
- Compactness
- Fractal dimension
- Number of concavities
- Size of concavities
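A 30-dimensional vector is consistent with three summary statistics for each of the ten features. The following is a minimal sketch of that aggregation, not XCYT's actual code. The mean and the extreme value are named on this page; the standard error as the third statistic, the use of the plain maximum as the "extreme", and all function and key names are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# The ten per-nucleus features listed above (key names are hypothetical).
FEATURES = [
    "area", "radius", "perimeter", "symmetry", "texture",
    "smoothness", "compactness", "fractal_dimension",
    "num_concavities", "size_concavities",
]

def sample_vector(nuclei):
    """Collapse per-nucleus measurements into one 30-value vector.

    nuclei: list of dicts mapping each feature name to a float.
    For each feature the vector holds, in order: mean, standard error
    (assumed third statistic), and extreme (assumed to be the maximum).
    """
    vector = []
    for name in FEATURES:
        values = [nucleus[name] for nucleus in nuclei]
        average = mean(values)
        # Standard error of the mean; defined as 0 for a single nucleus.
        std_err = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
        extreme = max(values)
        vector.extend([average, std_err, extreme])
    return vector
```

Each sample then contributes one such vector to the benign or malignant training set.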

Using the LP formulation described on the main page and a training set consisting of 357 known benign samples and 212 known malignant samples, the authors used cross-validation and found that the following features give the best results: extreme area, extreme smoothness, and mean texture. The predicted accuracy, estimated with cross-validation, was 97.5%.
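The LP itself is not reproduced on this page. As a sketch only, one formulation that fits the description is the following, assumed here rather than taken from this page. The rows of the matrices $M$ and $H$ are the $m$ malignant and $k$ benign feature vectors, $e$ is a vector of ones of matching length, and $y$, $z$ are illustrative names for the nonnegative violation variables.

```latex
\begin{aligned}
\min_{w,\gamma,y,z}\quad & \frac{1}{m}\, e^{T} y + \frac{1}{k}\, e^{T} z \\
\text{subject to}\quad & M w - e\gamma + y \ge e, \\
& -H w + e\gamma + z \ge e, \\
& y \ge 0,\quad z \ge 0.
\end{aligned}
```

Under this assumption, the optimal objective is zero exactly when the plane $w^{T}x = \gamma$ separates the two sets; otherwise the LP minimizes the average violation on each side, which is the sense in which the resulting plane is the "next best thing".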
The Specifics:
It turns out that benign and malignant samples are not linearly separable. As a solution, the authors
solved the LP to obtain the "best" hyperplane and then solved another LP for each halfspace
to partition the halfspaces further.
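To illustrate how such a two-level partition classifies a point, here is a minimal sketch. It assumes each plane is stored as a pair (w, gamma) and that the positive side of a second-level plane is labeled malignant; both conventions, the function names, and any numeric values used with it are hypothetical and are not the authors' fitted planes.

```python
def on_positive_side(w, gamma, x):
    """Return True when x lies strictly on the positive side of w.x = gamma."""
    return sum(wi * xi for wi, xi in zip(w, x)) > gamma

def classify(x, first, positive_child, negative_child):
    """Classify x with a first plane plus one further plane per halfspace.

    first, positive_child, negative_child: (w, gamma) pairs.
    The first plane selects a halfspace; that halfspace's own plane
    then decides the label (positive side assumed to mean malignant).
    """
    w, gamma = first
    if on_positive_side(w, gamma, x):
        w2, gamma2 = positive_child
    else:
        w2, gamma2 = negative_child
    return "malignant" if on_positive_side(w2, gamma2, x) else "benign"
```

For example, with the toy planes `first = ((1, 0), 0)`, `positive_child = ((0, 1), 1)` and `negative_child = ((0, 1), -1)`, the point `(2, 3)` falls in the positive halfspace of the first plane and is then labeled by the plane fitted inside that halfspace.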

Cross-Validation:
The authors built classifiers using all subsets of one, two, three, or four features and one or two separating
hyperplanes. Combinations that separated the training set well were evaluated
using ten-fold cross-validation: the predictive model is trained on 90% of the training examples and tested
on the remaining 10%. This is done 10 times, each time testing on a different 10%. The average performance
on the ten test sets gives a nearly unbiased estimate of real-world performance.
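The ten-fold procedure can be sketched as follows. This is a generic illustration, not the authors' code: `train` and `predict` are hypothetical callables standing in for whatever classifier is being evaluated (for instance an LP-based one), and the shuffling seed is an arbitrary choice.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, train, predict, k=10, seed=0):
    """Return the mean held-out accuracy over k folds.

    train(train_samples, train_labels) -> model   (hypothetical callable)
    predict(model, sample) -> label               (hypothetical callable)
    """
    folds = k_fold_indices(len(samples), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        # Fit only on the roughly 90% of examples outside the current fold.
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        # Score on the roughly 10% that the model has not seen.
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

With the 569 training samples mentioned above (357 benign, 212 malignant), each fold holds 56 or 57 samples, and every sample is used for testing exactly once.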
Read Mangasarian et al. for further information
Preliminary Results:
The classification method based on linear programming has been used at the University of Wisconsin
Hospitals since 1993. At the time the research paper was published (1994), the classifier had achieved
100% correctness on the 131 new cases that it had diagnosed (94 benign, 37 malignant).
Conclusion:
It may take a physician years to gain the experience needed to achieve a high level
of accuracy in breast cancer diagnosis. Using linear programming based classification, we can make
this diagnosis accurately, thus saving time, money, and possibly pain (by avoiding surgery). What other
medical decisions can be automated? ...only time will tell!
Click here to see the main reference of this page (Mangasarian)