Polish Authentic Texts
Agnieszka Makles
Below is a list of authentic Polish materials that can help in building an interesting corpus for learners of Polish.
Here you can find current events sources that include online news websites and popular online newspapers:
Here you can find sources with fiction such as poetry and fairy tales:
- Various poetry
- Classic poetry
- Poetry for children
- Fairy tales
- Fairy tales and other stories for children
Spoken texts:
- Interview with Martyna Wojciechowska, a Polish traveler
- Interview with Katarzyna Grochola, a Polish writer
- Interview with Agnieszka Chylińska, a Polish singer
- Audio fairy tales
- Radio RMF FM
- Radio Zet
- Audiobook "Godfather"
- Audiobook "Forest Stories for Children"
- Knitting class
- Yoga for beginners
Other materials:
Below you can find a mini-corpus of 5 Polish articles focusing on a doctor's visit.
Here are the articles that I used for the corpus:
1. How to prepare well for a doctor's visit
3. 10 minutes more for a patient
4. When a doctor visits a sick person...
I chose this topic because everyone gets sick. Being in a foreign country without understanding how Polish healthcare works, and without knowing the relevant vocabulary or the options for scheduling a doctor's appointment, can be frustrating. This corpus can bring the most frequent and current vocabulary to life and teach pragmatics. Whether you need vocabulary for everyday life, for academic discussions about healthcare, or a glossary to use during a doctor's visit while sick, this small corpus covers it.
According to AntConc, the most frequent words in this corpus are "w", "się", "nie", "z", and "i", and the most frequent content words are "patient", "health", "private visit", "specialist", "pain", "temperature", "doctor", "sickness", and "flu". Using a word cloud and a stop list, I wanted to focus on the most frequent vocabulary. I ran into the problem of inflected forms, of which Polish has quite a lot. I thought about eliminating inflected forms from the word cloud so that I could create a clear list of words for my students to learn rather than a complicated list of forms. The list of frequent content words above was created after I eliminated the inflected forms.
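The counting approach described above (a stop list plus collapsing inflected forms into one entry) can be sketched in a few lines of Python. This is only an illustration and not the AntConc workflow used for the corpus: the stop list, the hand-made lemma map, the sample sentences, and the function name `content_word_frequencies` are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop list: the function words reported as most frequent.
STOP_WORDS = {"w", "się", "nie", "z", "i"}

# Hypothetical hand-made map from inflected forms to a dictionary form.
# A real project would need a much larger map or a Polish lemmatizer.
LEMMAS = {
    "lekarza": "lekarz",
    "lekarzem": "lekarz",
    "pacjenta": "pacjent",
    "pacjentów": "pacjent",
}


def content_word_frequencies(text, stop_words=STOP_WORDS, lemmas=LEMMAS):
    """Count words, skipping stop words and merging inflected forms."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in stop_words:
            continue
        # Fall back to the token itself when no lemma is listed.
        counts[lemmas.get(token, token)] += 1
    return counts


# Made-up sample text, not taken from the corpus articles.
sample = (
    "Pacjent nie zgadza się z lekarzem. "
    "Lekarz bada pacjenta w gabinecie i pyta pacjenta o ból."
)
freqs = content_word_frequencies(sample)
print(freqs.most_common(2))
```

With this sample, "pacjent" and "pacjenta" are counted together, as are "lekarz" and "lekarzem", so the resulting list shows one entry per word rather than one per form.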
Overall, the corpus contains 2976 words across 5 texts, for an average text length of 595.2 words.
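The average is simply the total word count divided by the number of articles. A minimal check in Python, using only the two figures reported here (the variable names are chosen for illustration):

```python
total_words = 2976  # total word count reported for the corpus
num_texts = 5       # number of articles in the mini-corpus

average_length = total_words / num_texts
print(f"Average text length: {average_length:.1f} words")
```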