Mark Painter

This site is currently a work in progress. At some future date, I may get around to explaining myself more fully, but for now I have two significant intellectual interests. One is Computer Science, especially Web applications, networks and data security. The other is a more recent diversion: I have been studying Chinese.

I constructed a Web site, http://huamake.com/web2_0.htm , which manages to combine both interests. On the Computer Science side, the design is pretty extreme about using RESTful concepts, such as keeping no state at the server side and making everything cacheable at intermediate nodes. It also makes pretty aggressive use of Web development mechanisms that have matured and been deployed within the past several years, such as DOM, CSS and AJAX. For the more technically oriented, it may be an interesting example for those reasons. The site is not hosted on Google, because Google Sites doesn't let me store my own .css and .js files, probably out of legitimate security concerns.

The Chinese aspect of the site is evolving into a study aid for the HSK, a standardized test of Chinese which is used as an admissions requirement for foreign students who want to study in China, and to certify translators. Since I am a non-traditional student of Chinese, I became interested in the HSK as a means of proving what I had accomplished with my study. In any case, I find it easier to memorize Chinese characters by focusing on the radicals that make up various characters. So, I am in the process of entering structural data for all characters used in the HSK. I have already created links to a complete list of characters used in the HSK, and a list of the vocabulary used in the HSK. These in themselves might be useful to other students. They weren't that easy to find. In fact, I had to write some code to derive the HSK character lists.

The essence of what I was trying to do was to create a "contains" and a "contained in" relationship for each character. I think that if good data structures exist for expressing those relationships, many different user interfaces could be built on top of them.
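As a rough illustration of the kind of data structure I mean, here is a minimal sketch in Python. It is not the code behind the site; the names CONTAINS and invert and the three sample entries are assumptions made up for this example. It stores only the "contains" relation and derives "contained in" from it, so the two can never disagree.

```python
from collections import defaultdict

# Hypothetical sample data (not the site's real data set):
# each character maps to its direct components only.
CONTAINS = {
    "淡": ["氵", "炎"],
    "炎": ["火"],
    "灯": ["火", "丁"],
}

def invert(contains):
    """Derive the "contained in" relation from the "contains" relation."""
    contained_in = defaultdict(list)
    for char, parts in contains.items():
        for part in parts:
            # Skip repeats so a component listed twice yields one entry.
            if char not in contained_in[part]:
                contained_in[part].append(char)
    return dict(contained_in)

CONTAINED_IN = invert(CONTAINS)
print(CONTAINED_IN["火"])  # the characters that directly contain 火
```

Entering only one direction by hand and computing the other is one possible design; entering both by hand would also work, at the cost of keeping them consistent.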

One thing that I have difficulty with is how to handle indirect relationships. For example, 淡 contains 炎, which in turn contains 火. So, do I include 火 in 淡's "contains" relation and 淡 in 火's "contained in" relation, or not? I decided not to. This considerably reduced the data entry complexity, but in some cases forced me to think carefully about what is a component and what is not. Doubtless, I made some mistakes. When I want to show secondary and higher relations in the UI (the transitive closure), I recurse through the "contains" (or "contained in") relation. [In my UI I have done this for "contains", but not for "contained in". I am somewhat ambivalent about the best way to present the underlying information.]
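The recursion described above can be sketched roughly as follows. This is a minimal Python illustration, not the site's actual code; the names DIRECT_CONTAINS and all_components are made up for the example, and the data holds only the 淡 / 炎 / 火 case plus the water radical 氵.

```python
# Hypothetical sample data: direct components only, as entered by hand.
DIRECT_CONTAINS = {
    "淡": ["氵", "炎"],
    "炎": ["火"],
}

def all_components(char, contains, seen=None):
    """Return direct and indirect components of char, depth first."""
    if seen is None:
        seen = []
    for part in contains.get(char, []):
        if part not in seen:  # guards against repeats and accidental cycles
            seen.append(part)
            all_components(part, contains, seen)
    return seen

print(all_components("淡", DIRECT_CONTAINS))  # ['氵', '炎', '火']
```

Passing a "contained in" mapping instead of DIRECT_CONTAINS should give the closure in the other direction with the same function.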

I have found that most Chinese character dictionaries use a selective set of "contained in" relations to organize their content. Some more etymology-focused materials describe the "contains" relation for some or all characters. I haven't seen anything that lets one flexibly search either way. This would be impractical for written materials, but the Web should be able to support it easily. So, I thought there might be an opportunity to make fuller use of a new technology for an old subject. I then focused on the HSK vocabulary, because it seemed like a small enough subset that I could apply the idea to it on my own. However, I think it would be cool if the idea caught on.

Some other links:
http://portal.acm.org/citation.cfm?id=894488