I wanted a quick overview of the word counts of my blog posts to roughly estimate the possible translation cost, and here's the one-liner I came up with:

find . -maxdepth 1 -type f -name "*.md" -exec printf '%s ' {} \; -exec ~/.local/bin/mwc {} \; | awk '{print $2 " " $1}' | sort -rnk1

The output should look similar to this:

1862 ./becoming-better-presentation-creator.md
1739 ./make-ssh-prompt-password-keepassxc.md
1619 ./are-otp-secrets-stored-plaintext.md
1602 ./how-not-create-node-executable-arm.md
1596 ./three-reasons-spent-time-nature-programmer.md
1536 ./keep-gnome-shell-settings-dotfiles-yadm.md
1407 ./how-update-gooogle-calendar-pre-push-hook.md
1390 ./story-about-nfc-thinkpad-t470.md
1211 ./building-on-your-previous-work.md
1179 ./lockdown-travel-sms-sync-phone-reset.md
1038 ./most-useful-keyboards-android.md
1033 ./how-use-flashrom-archlinux-arm.md
...

The mwc command should exclude punctuation, footnotes, and other Markdown-specific syntax, but I have not done any extensive research yet. It should nevertheless be possible to draw a general conclusion about the translation costs. I wonder whether translators are already accustomed to translating Markdown.
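To turn the per-post counts into a rough figure, the first column can be summed with awk. This is only a sketch: the estimate_cost function name and the 0.10 per-word rate are made up for illustration, not a real translation quote.

```shell
# Sum the first column (word counts) and multiply by a per-word rate.
# The rate is a hypothetical value passed as the first argument.
estimate_cost() {
  LC_ALL=C awk -v rate="${1:-0.10}" '
    { total += $1 }
    END { printf "%d words, estimated cost %.2f\n", total, total * rate }
  '
}

# Sample input: the first three lines of the output above.
printf '%s\n' \
  '1862 ./becoming-better-presentation-creator.md' \
  '1739 ./make-ssh-prompt-password-keepassxc.md' \
  '1619 ./are-otp-secrets-stored-plaintext.md' | estimate_cost 0.10
# prints: 5220 words, estimated cost 522.00
```

In practice the output of the one-liner would be piped straight into estimate_cost instead of the sample lines.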

Requirements

The above line requires the mwc command, a Python markdown-word-count script. Install it via pip:

pip3 install markdown-word-count

Apart from the script, the line only requires standard GNU commands.
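A quick way to check that everything is in place before running the one-liner is sketched below. It assumes mwc is reachable on PATH; the one-liner itself calls it as ~/.local/bin/mwc, so that directory may need to be added to PATH first. The missing_commands helper name is made up for this example.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# No output means all four commands are available.
missing_commands find awk sort mwc
```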

  • Passing ls output into xargs can introduce many security risks
  • It might be better to consider using find -exec instead
  • There are unavoidable security problems surrounding use of the -exec action; you should use the -execdir option instead
  • Simply passing multiple -execdir parameters to find is sufficient
  • Narrowing the results of the find command is optional
  • Using awk for swapping columns is very easy
  • The column to sort by is specified via the -k parameter
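The awk and sort steps from the list above can be tried in isolation, without find or mwc. In this small sketch the swap_and_sort name and the two sample file names are invented for illustration; the input has the "filename count" shape that the first half of the one-liner produces.

```shell
# Swap the two whitespace-separated columns, then sort numerically (-n),
# in descending order (-r), using the key that starts at column 1 (-k1).
swap_and_sort() {
  awk '{print $2 " " $1}' | sort -rnk1
}

printf '%s\n' './short.md 120' './long.md 1862' | swap_and_sort
# prints:
# 1862 ./long.md
# 120 ./short.md
```

Note that the awk step splits on whitespace, so file names containing spaces would need a different approach.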

This is the 55th post of #100daystooffload.