
Saturday, 31 October 2009

Two identical words in in a row

The following awk script finds words that occur twice in a row in a document (more or less: it also reports plenty of false positives). For each hit it prints the repeated word, followed by the pair together with up to two words on either side of it.


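BEGIN {
    # Words are runs of letters, digits and backslashes (so LaTeX commands
    # such as \cite count as words); everything else separates them.
    FS = "[^A-Za-z0-9\\\\]+"
    # A separator that never occurs in the input, so the whole file is read
    # as one record and repeats split across a line break are found too.
    # gawk and mawk treat a multi-character RS as a regex; other awks may not.
    RS = "necodslkjfkas"
}

{
    prev = ""
    for (i = 1; i <= NF; i++) {
        if ($i != "" && $i == prev) {
            # Print the word, then the pair with up to two words on each side,
            # clamping the window so no field before $1 is requested.
            from = (i > 3) ? i - 3 : 1
            to = (i + 2 < NF) ? i + 2 : NF
            ctx = $from
            for (k = from + 1; k <= to; k++)
                ctx = ctx " " $k
            print $i "\t\t" ctx
        }
        prev = $i
    }
}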
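To try the idea quickly, here is a small shell sketch that feeds made-up text through a condensed version of the script above (with the context window clamped so it never asks for a field before $1). It assumes gawk or mawk, because reading the whole input as one record relies on a multi-character RS; the function name `find_repeats` and the sample sentence are only for illustration.

```shell
#!/bin/sh
# Made-up sample: "the" is repeated across a line break.
sample='We ran the
the experiment twice.'

# Condensed form of the awk script; prints each repeated word with context.
# Assumes gawk or mawk (multi-character RS read as a regex).
find_repeats() {
    awk 'BEGIN { FS = "[^A-Za-z0-9\\\\]+"; RS = "necodslkjfkas" }
    {
        prev = ""
        for (i = 1; i <= NF; i++) {
            if ($i != "" && $i == prev) {
                from = (i > 3) ? i - 3 : 1
                to = (i + 2 < NF) ? i + 2 : NF
                ctx = $from
                for (k = from + 1; k <= to; k++) ctx = ctx " " $k
                print $i "\t\t" ctx
            }
            prev = $i
        }
    }'
}

# Prints "the", two tabs, then "We ran the the experiment twice".
printf '%s\n' "$sample" | find_repeats
```

Because the whole input is a single record, the repeated "the" is caught even though its two copies sit on different lines, which a line-by-line search would miss.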


The script and the explanation are this rough because I had never used awk before (I prefer grep, but it is useless here for the obvious reason), and because I have to submit my thesis in 3 days. By the way, the thesis is the reason I wrote this script, and it actually found about 10 unwanted repeated words (together with about 1000 false positives :-))

Sunday, 11 October 2009

RDS-TMC fairy tale

The following story is made up solely of messages defined in the standard that specifies the event and information codes for the Radio Data System Traffic Message Channel (RDS-TMC):

"reduce your speed", "sports meeting. Heavy traffic has to be expected", "expect car park to be full", "police directing traffic", "only a few parking spaces available", "hockey game", "crowd", "security alert", "danger of explosion", "extra police patrols in operation", "terrorist incident", "air crash", "emergency vehicles on scene", "evacuation", "allow emergency vehicles to pass", "traffic being directed around accident area", "drive carefully" ... "gunfire on roadway, danger", "stop at next safe place", "switch off engine", "leave your vehicle. Proceed to the next safe place", "traffic has returned to normal", "drive with extreme caution".