Moftasa posted this article on how he wrote a Linux script that downloads all the obituaries published in Al Ahram newspaper between 2002 and 2008, finds the names of everyone with a police title, and then works out the family relations between them!
The script does the following:
- Convert the HTML files downloaded by curl into one giant text file.
- Move each obituary onto a line of its own.
- Extract the officer names sandwiched between rank and place of work into a separate text file.
- Search every obituary for the name of each officer, so that family links between different officers can be discovered.
- The output is in GraphViz .dot format.
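The last three steps can be sketched roughly as follows. This is not Moftasa's script (his is a Linux shell script working on Arabic text); it is a minimal Python illustration, and the rank list, the "of the" workplace marker, the sample obituaries, and all function names are assumptions made up for the example. It treats two officers as linked whenever both are named in the same obituary.

```python
import re
from itertools import combinations

# Hypothetical rank words and workplace marker; the real data is Arabic.
RANKS = ["General", "Colonel", "Major"]
NAME_RE = re.compile(r"(?:%s) (.+?) of the " % "|".join(RANKS))

def extract_officers(obituaries):
    """Collect the names found between a rank and the workplace marker."""
    names = set()
    for text in obituaries:
        names.update(NAME_RE.findall(text))
    return names

def find_links(obituaries, officers):
    """Pair up officers who are mentioned in the same obituary."""
    links = set()
    for text in obituaries:
        present = sorted(name for name in officers if name in text)
        links.update(combinations(present, 2))
    return links

def to_dot(links):
    """Render the links as an undirected GraphViz graph."""
    lines = ["graph officers {"]
    for a, b in sorted(links):
        lines.append('    "%s" -- "%s";' % (a, b))
    lines.append("}")
    return "\n".join(lines)

# Invented sample data: one obituary per line, as in the second step.
obituaries = [
    "Colonel Adel Hassan of the Cairo Directorate, brother of Major Samir Hassan of the Giza Directorate",
    "Major Samir Hassan of the Giza Directorate, cousin of General Omar Fathy of the Interior Ministry",
]
officers = extract_officers(obituaries)
dot = to_dot(find_links(obituaries, officers))
print(dot)
```

Saving the printed text to a file and running it through GraphViz (for example `dot -Tpng`) would draw the resulting network.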
This is as cool as the Oracle of Bacon.