2007 Important Research Results
One of my recent studies with colleagues investigated the clinical significance of GPR30 in infiltrating ductal carcinoma (IDC) of the breast in an Asian population. The relationships between GPR30 expression and that of ERα, PR, and HER-2/neu were studied using quantitative polymerase chain reaction (qPCR) in 118 IDC and 27 non-tumor mammary tissues. Univariate analysis was used to study the correlation of GPR30 expression with clinical parameters including tumor/non-tumor status, ER, PR, HER-2/neu, age, lymph node metastasis, lymphovascular invasion, grade, and stage. Pearson correlation was used to evaluate the relationships between continuous variables. The prognostic significance of GPR30 for survival was also investigated. The results showed that the GPR30 expression level was significantly lower in breast cancer tissues than in non-tumor mammary tissues. The expression level of GPR30 in IDC was highly correlated with those of ER, PR, and HER-2/neu. None of the clinical parameters, including age, stage, grade, lymph node metastasis, lymphovascular invasion, and patient survival, was correlated with GPR30 expression levels in IDC. Multiple regression showed that PR expression was the independent prognostic factor for survival in IDC. The sum of GPR30, ER, and PR expression, taken as a measure of global estrogen responsiveness, was significantly correlated with patient survival. We concluded that the clinical significance of GPR30 alone in IDC was not evident.
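As a rough illustration of the two quantities mentioned above, the sketch below computes a Pearson correlation between two continuous expression variables and forms a summed expression score. The numbers are made-up placeholders, not data from the study, and the names `pearson_r` and `combined` are hypothetical; the study's actual normalization and statistical software are not described here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up relative expression values for five hypothetical samples.
gpr30 = [0.8, 1.1, 1.9, 2.4, 3.0]
er = [1.0, 1.3, 2.1, 2.2, 3.4]
pr = [0.5, 0.9, 1.2, 2.0, 2.1]

# Correlation between two continuous variables.
r_gpr30_er = pearson_r(gpr30, er)

# Summed expression per sample, in the spirit of the combined
# GPR30 + ER + PR score (the real study's scaling is an unknown here).
combined = [g + e + p for g, e, p in zip(gpr30, er, pr)]
```

In practice one would also report a p-value for the correlation, which this sketch omits.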
In another study, we adopted the coefficient of intrinsic dependence (CID) to identify putative signatures for classification. Our results showed that CID is promising in supervised learning. Simulations showed that CID is robust in selecting features whose means or variances differ between two classes. When applied to breast cancer clinical array data, the genes selected by CID gave the best classification of ER+/− patients. However, CID is not suitable for immediate application to unsupervised learning. Although the misclassification rate of CID was as low as those of conventional methods in most cases, CID suffered the most from the curse of dimensionality. The small sample size relative to the number of variables (genes) is of particular concern in microarray studies. When the training sample is small, one may obtain a classifier that classifies the training sample perfectly but performs badly on other samples. When CID is applied to classification, the curse of dimensionality strikes from another direction: a particular combination of values may never be observed in the training set and yet appear in the test set. In this scenario, the probability that the object belongs to a given group cannot be estimated. An alternative way of estimating the conditional distribution, such as nonparametric smoothing, might solve the problem.
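The estimation problem described above can be sketched in a few lines. This is a toy illustration, not the CID procedure itself: the feature vectors stand for hypothetical discretized expression levels, `empirical_posterior` is a plain counting estimator, and `kernel_posterior` is one possible form of nonparametric smoothing (a Gaussian-kernel weighted vote with an arbitrarily chosen bandwidth).

```python
import math


def empirical_posterior(train, pattern, label):
    """Estimate P(label | pattern) by counting exact matches in the training set.

    Returns None when the pattern was never observed, because the
    estimate would be 0/0.
    """
    labels_seen = [y for x, y in train if x == pattern]
    if not labels_seen:
        return None
    return labels_seen.count(label) / len(labels_seen)


def kernel_posterior(train, pattern, label, bandwidth=1.0):
    """Smoothed estimate of P(label | pattern).

    Every training point votes with a Gaussian weight that decays with its
    squared Euclidean distance to the query pattern, so nearby but
    non-identical patterns still contribute.
    """
    weighted_label = 0.0
    weighted_total = 0.0
    for x, y in train:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, pattern))
        weight = math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        weighted_total += weight
        if y == label:
            weighted_label += weight
    return weighted_label / weighted_total


# Toy training set: (discretized levels of two genes, class label).
train = [
    ((0, 0), "ER-"),
    ((0, 1), "ER-"),
    ((2, 2), "ER+"),
    ((2, 1), "ER+"),
]

unseen = (1, 2)  # a combination that never occurs in the training set

raw = empirical_posterior(train, unseen, "ER+")   # not estimable by counting
smooth = kernel_posterior(train, unseen, "ER+")   # defined for any query
```

The bandwidth controls how far the smoothing borrows information from neighboring patterns; choosing it well (for example by cross-validation) matters more as the number of genes grows, and with very distant queries all weights can underflow to zero, which a production implementation would need to guard against.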