Ada Wan

I am a transdisciplinary researcher based in Zurich, Switzerland. I am primarily interested in (use-inspired) fundamental research and I appreciate creative thinking. I am also interested in philosophy, art, statistics/mathematics, and (intellectual/scientific/academic) history.

One of my current goals is positive progress towards fairness and equity in the educational context through responsible diversification and convergence of perspectives and methodologies in education and in research & development.

I have been devoting much time to the design and execution of transdisciplinary alignment and of a statistical meta/general data science (upon the availability of relevant and sufficient data). Some directions of such science would include but would not be limited to:

i. the meta-(re-)analysis/(re-)evaluation/(re-)interpretation of data statistics;

ii. leveraging machine learning methods and computing tools for scientific, technical, and philosophical insights through computational modeling;

iii. the investigation of complex phenomena, both analytic and synthetic, theoretical and empirical, in nature and/or culture ("culture" defined as "created/produced by humans").

In my spare time, I enjoy spending as much time as possible doing nothing while marveling at the negative space in philosophy, statistics/mathematics, history, art, and technology.

Contact: adawanwork at gmail dot com

Publications (please note: my perspective on work in the language & computing space has changed a lot since my discoveries from 2019 on. While there is some truth to be abstracted from earlier work, please read it with a grain of salt.)

Ada Wan. 2022. Fairness in Representation for Multilingual Natural Language Processing: Insights from Controlled Experiments on Conditional Language Modeling. Long version.
Ada Wan. 2022. Fairness in representation for multilingual NLP: Insights from controlled experiments on conditional language modeling. In International Conference on Learning Representations (ICLR), 2022. (Code for preprocessing: zipped, all in 1 txt file for viewing)
Ada Wan. 2021. Representation and bias in multilingual NLP: Insights from controlled experiments on conditional language modeling. Reviewed version for ICLR 2021. (The last version containing the mention of "DASH" ("data, algorithm, size, and hardware") (p. 49), submitted prior to the final reviewed version, was uploaded on 20201110 at 2050. It can be read here: v202011102050.)
Ada Wan. 2019. Towards a computationally relevant typology for polyglot/multilingual NLP. (Poster)

Earlier publications:

Ada Wan. 2018. Tel(s)-telle(s)-signs: Highly accurate automatic crosslingual hypernym discovery. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018. European Language Resources Association (ELRA).
Ada Wan. 2018. Visualizing the “dictionary of regionalisms of France” (DRF). In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018. European Language Resources Association (ELRA).
Ada Wan. 2016. Leveraging data-driven methods in word-level language identification for a multilingual alpine heritage corpus. In Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP, pages 45–54, San Diego, California, June 2016. Association for Computational Linguistics.
Steven Bird, David Chiang, Friedel Frowein, Andrea L Berez, Mark Eby, Florian Hanke, Ryan Shelby, Ashish Vaswani, and Ada Wan. 2013. The international workshop on language preservation: An experiment in text collection and language technology. Language Documentation & Conservation, 7:155–167.

Syllabi, presentation slides, notes/manuscripts, proposals

20220822: Syllabus: Foundations of Linguistic Theory. Context: https://twitter.com/chrmanning/status/1555982606260285440. Open for suggestions and comments.

Remark 20240314/20240430: the syllabus was only intended to be an intermediate solution / temporary syllabus for the past couple of years for linguistics. But beyond/without linguistics (qua linguistics), it can also be good for the next few years or so to help offset certain narratives on language that some might have regarded as popular in the structural linguistics paradigm --- as part of general education in the humanities (e.g. history) and/or the social sciences, if/when/where offered. Learn and unlearn. Technologists: it is useful too to know what is and is not relevant to computing, how, and why.

20231204: The Science of Language Modeling & Conditional Language Modeling (and other caveats for tech/policy-makers, academics, & civilians) [modified version]
20250204: Notes to "no 'w/s/ls/g/l'". Open for suggestions and comments.
20250402: Higher Education: Paradigm Shift Driven by Science and Digitization. [Spreadsheet: open to comments] [PDF]

Selected posts on Twitter/X

Ada Wan [@adawan919]. 2022. "The point is think of reality as a continuum in some very finest, superhuman-fine, granularity. Some may be able to see and understand the connectedness of everything, while others might just see certain (more disparate) items that catch their attention (they are able to 1/...". Twitter/X, 20220602, https://x.com/adawan919/status/1532335891448057858 (thread with 3+2 posts). (Last accessed 20240528.) [Screenshot]
# My theory of everything in a tweet
Ada Wan [@adawan919]. 2023. "Take 'words'/'sentences' as an example: ppl don't express themselves (speak/sign... or even write, most of the time depending on the context) in these units naturally. Reinforcing these units or an idealized 'parse' (which entails an idealized, qualitative interp) in terms of 1/....". Twitter/X, 20230103, https://x.com/adawan919/status/1610377376042467328 (thread with 6 posts). (Last accessed 20240528.) [Screenshot]
# "Word" and attitude (or "...m...m...m...my" vs "m(a)-m(a)-m(a)-m(y)")
Ada Wan [@adawan919]. 2023. "When 'words' are NOT the way to go in computational processing and 'meaning' does not suffice...". Twitter/X, 20230324, https://twitter.com/adawan919/status/1639252583440355328. (Last accessed 20240430.) [Chart to variability in word count vs. character count]
Ada Wan [@adawan919]. 2023. "[I'd consider morphology resolved, feel free to write in your objections (in the next 14 days): here, to me, or through the CorporaList. I reply to all emails that don't appear to be spam. If you've sent one to me & I haven't replied, pls feel free to tweet/X@ me here as well.]1/". Twitter/X, 20231018, https://x.com/adawan919/status/1714665016987693144. (Last accessed 20240529.) [Screenshot]
[Original email thread from the CorporaList containing my communication from 20231018, uploaded as a screenshot in the 2nd post of this X thread: a thought experiment on internal bias wrt morphology]
# Open call for objections to the resolution of morphology (no objections received)
Ada Wan [@adawan919]. 2024. "Revisiting Drew Conway's Data Science Venn Diagram ...". Twitter/X, 20240126, https://twitter.com/adawan919/status/1750669250233081989 (thread with 4 posts). (Last accessed 20240314.) [Screenshot]
# TLDR: tools, interpretation, evaluation
Ada Wan [@adawan919]. 2024. "Here is another attempt to point out, more succinctly, some implications of FaIR/R&B, esp. wrt 'language': ...". X, 20240420, https://x.com/adawan919/status/1781449436410810500 (thread with 7 posts). (Last accessed 20240528.) [Screenshot]
# Decomposition of "language" (general)
Ada Wan [@adawan919]. 2024. "I have been using the terms 'p-lg' and 'g-lg' ('particular language' and 'general phenomenon of language' from Haspelmath (2019): https://dlc.hypotheses.org/1741). ...". X, 20240501, https://x.com/adawan919/status/1785753718102667641 (thread with 9 posts). (Last accessed 20240528.) [Screenshot]
# Decomposition of "p-language", part 1
Ada Wan [@adawan919]. 2024. "... Let's continue with the exercise, the decomposition of "p-lg", from 1/ ...". X, 20240503, https://x.com/adawan919/status/1786399510819697102 (thread with 6 posts). (Last accessed 20240528.) [Screenshot]
# Decomposition of "p-language", part 2
Ada Wan [@adawan919]. 2024. "On text analytics & 'meaning': The direct jump from data to meaning is too coarse-grained.Instead,one needs to regard/treat data as information & clarify the diversity in human judgements & reactions towards such.There should be much to bridge/connect & many biases to point out.". X, 20240528, https://x.com/adawan919/status/1795419502429065502. (Last accessed 20240528.) [Screenshot]
# Text and "meaning"
Ada Wan [@adawan919]. 2024. "But in light of (my) recent discoveries in computational generalization capabilities (incl. interpretations!)& to help prevent practitioners from reading too much into text from computational results,it's probably best to use the term 'data analytics' instead of 'text analytics'.". X, 20240528, https://x.com/adawan919/status/1795421264317678055. (Last accessed 20240528.) [Screenshot]
# How "data" (and data) can help broaden scope instead of perpetuating an insistence to seek info/"meaning" from text alone
Ada Wan [@adawan919]. 2024. "I was asked by an undergraduate as to what some of my comments on @X/Twitter have been about and if I had any advice to share on being a transdisciplinarian or from research. This was my reply: ...". X, 20240604, https://x.com/adawan919/status/1797982157220233545. (Last accessed 20240604.) [Screenshot]
# To those who aspire to be in research, are new to my initiative/mission in education and research, or are simply wondering what I have been doing on X/Twitter
Ada Wan [@adawan919]. 2024. "... Most usages of the term "language" can be summarize [recte summarized] as: i. some artefacts/data (e.g. texts, speech, signs, i.e. text/audio/video data --- both in and beyond the context of computing); and ii. some subject[ive] element (sentiments/values/intent/attitude/identity/politics/ideology/anything)...". X, 20241115, https://x.com/adawan919/status/1857537770220146743/photo/1. (Last accessed 20250123.) [Screenshot]
# A simpler reformulation of the decomposition of "language"

Work in progress

Ada Wan. 2024. Microscopes and telescopes: Trading in black boxes for a lens with multitexts, network depths, and statistical comparisons. (What I had intended to submit in fall 2021, but didn't: screenshot) [OpenReview version posted on 20240425, please note CC BY-NC-ND 4.0 licensing]
20240430: Selected work directly/indirectly supporting no "w/s/ls/g/l" (No "word(s)", no "sentence(s)", no "linguistic structure(s)", no "grammar(s)", no "language(s)"!)
Ada Wan. A statistical untypology in finer granularity with parallel data --- from text to science. (Earlier proposal to database for re-education & re-evaluation here (Feb2023).) An even earlier version of this work was mentioned in the rebuttal for FaIR here, titled: A statistical typology of (textual) language in finer granularity.

[20240430] Context for the delay in writing up this work and for modifying the title/work: I realized the extent of miseducation in and "addiction" towards "language", and how my suggestion to rank varieties would have been, or could have become, a bad idea when so many seem to be interested in inappropriately divergent directions. It was intended to be an exercise with no determinate solution, but one through which one could experience how language (or how no language) works.

Ada Wan. A transdisciplinary approach to scientific insights in finer granularity: ML-inspired progress and opportunities. (See "Higher Education: Paradigm Shift Driven by Science and Digitization" under heading Syllabi, presentation slides, notes/manuscripts, proposals above.)
Ada Wan. 2025. The Last Treatise (on that which used to be known as "language"). [Versions: 202509091848, 202509302250, current]

Discussions and responses to calls from the "language (and computing) space"

On the public mailing list CorporaList (https://list.elra.info/mailman3/hyperkitty/list/corpora@list.elra.info/)

Earlier attempt to persuade the community in the "language space" to adopt more inclusive and scientific terminology:

20230208-20230216: Correspondences on the reformulation and reinterpretation of "MWEs".

Further correspondences/notifications sent to researchers posting calls:

20231019: From lemmatization to morphology (4 PDFs, 20231010-20231019): https://drive.google.com/drive/folders/1f2Bxv9I-JJzjnv7_uv-RQ0FJxT48WLPF?usp=sharing

20231103: Selected 20 threads in PDFs (20230512-20231024): https://drive.google.com/drive/folders/1TP5ZBC8IQ0vQzEueq6U9iub2uwbpryZd?usp=sharing

In the past years up to November 2024*, I have invested much time and effort in personally notifying various (former) colleagues in the "language" space (i.e. those in linguistics, computational linguistics, NLP, digital humanities, computer science...) of my findings by replying to their posts on the CorporaList and/or ML-news mailing lists (whether these posts were calls for new positions or for event participation). To many, I have suggested various options for task/project reformulation as well as more appropriate/ethical alternatives for professional development. I have come to notice that those who do intend to practise with integrity have already changed course, and those who don't ought to be dealt with via more official means (as there are safety/ethical/technical/scientific/legal concerns in undertakings related to "language" to which professionals must pay heed). I have spent hours writing some of these emails, and I simply cannot continue the way I have. Besides, researchers have the responsibility to keep themselves updated/informed of new findings. I am not obligated to send out any courtesy notifications/reminders.

*Update 20250828: notifications resumed in the course of 2025, and documenting them will be an ongoing process. The record is kept here: https://docs.google.com/spreadsheets/d/1IPjO4qw8STBwzhTi17BpvbzULO9NsM1gDQDRiAol2YE/edit?usp=sharing
Academic misconduct and/or waste, fraud, and abuse in the form of higher education. A list for the complaint:
https://docs.google.com/spreadsheets/d/1W0gLjQ9CE-Z3YJjVJHkWGUGV5FVJB0F-cQLlEkchT7w/edit?usp=sharing (note: work in progress)

Other "public outreach"

I occasionally take on the unpopular role of calling out bad research or bad conduct in research on Twitter/X @adawan919 (because I have to*). Follow only if you are brave. 😜 (But I'm looking forward to a day when there is no need for such a thankless task!)
Let's practice research integrity with no "w/s/ls/g/l"! (No "word(s)", no "sentence(s)", no "linguistic structure(s)", no "grammar(s)", no "language(s)"!) Note: "no" here could denote the nonexistence/absence of something or function as part of an imperative (e.g. for ethical reasons).

*My story (and this is how I have been notifying my service providers in the meantime):

20250415: A general briefing

last updated: 20251108 (site), 20251108 (file content on site)
