Sunday, June 28, 2009

Track record

Coming from a not-so-stellar undergraduate college, I got a chance to have a closer look at the value of a "track record" in academia and in life. Thanks to my transition to America, I also got to see how system design can make a difference not only in realizing one's own potential but in essentially defining the potential of human beings.

Lately I have been looking at an inverse problem of learning classifiers for autism using DTI. Not being primarily a machine-learning researcher, I focus on the "feature selection" process for achieving better classification accuracy. Thanks to Chris Hinrichs' help and his amazingly useful implementations of Support Vector Machines, I am now able to use these tools to verify the discriminative power of different features. It is generally understood that a fully generic computational learning problem is usually infeasible, whether attacked with Monte-Carlo methods or with deterministic ones. Feature selection is therefore an important problem in itself, one that demands exploiting the structure of the problem at hand. Just this morning I had the nice experience of using a priori information to extract features from the DTI data for the autism study, and was able to get 100% classification accuracy under leave-one-out cross-validation. The features were extracted by Nick Lange using statistical tests and, more importantly, using biological priors. Statistical tests are usually only a verification step.
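The evaluation loop described above can be sketched as follows. This is a minimal illustration, not the actual study pipeline: the data here are synthetic stand-ins for the pre-selected DTI features, and the well-separated class means are chosen so the toy problem is easy for a linear SVM.

```python
# Leave-one-out cross-validation with a linear SVM.
# Synthetic data stands in for the selected DTI features;
# this is an illustrative sketch, not the autism-study pipeline.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)

# 20 subjects, 5 pre-selected features; two well-separated
# classes so a linear SVM can easily discriminate them.
X_pos = rng.normal(loc=+1.0, size=(10, 5))
X_neg = rng.normal(loc=-1.0, size=(10, 5))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 10 + [0] * 10)

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    # Refit the classifier on all subjects but one,
    # then test on the held-out subject.
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])

accuracy = correct / len(y)
print(f"LOO accuracy: {accuracy:.2f}")
```

Note that in a real study the feature selection itself should happen inside the cross-validation loop (or, as here, come from a prior that does not look at the labels); selecting features on the full labeled dataset first would leak information into the held-out fold.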

Now, why all the feature-selection mumbo-jumbo for a post titled "Track record"? Well, Scott recently posted on his blog about a "two-conference solution" for better feature selection in the theory community, using Innovations in Computer Science (ICS). See, besides which schools you graduated from, these conferences and journals are fundamentally involved in a feature-selection process for either a binary classification problem (good researchers vs. bad researchers) or a multi-class one (exceptional, average, survivalist, bad, etc.). Every field has these coveted conferences (like SODA/FOCS/STOC for theoretical computer science, ICCV/CVPR for computer vision, IROS/RSS for robotics) that people are dying to publish in. Even though publishing, bringing in grant money, good teaching feedback, and so on all form a huge part of a tenure-track record, publications are the most important and most independent dimension for discriminative analysis. There is always a need to balance "false positives" against "false negatives" in any learning problem, in addition to taking care of outliers and wrong labels.

Since it's hard to change the influence of publications on a track record (or to find another feature as uncorrelated), it's important that we try to keep the data in that feature as independent and unbiased as possible. For that, there have to be "checks and balances" among the types of effort encouraged in research. This might involve creating new venues for newly recognized kinds of effort. Thanks to being the world's most individualistic and the world's biggest democratic society, America tends to find such balances in time, most of the time (even in establishing track records), and that's what keeps doors open for the underprivileged while keeping out impostors!
