Lua Task 9 - Date formats (advanced)


Prerequisite: Lua Task 7 - Wikibase client. This task requires a lot of research and independent learning and is considerably more difficult than the introductory seven tasks. You should have successfully and comfortably completed all of the introductory tasks before attempting any of the advanced ones. It is not suitable for beginners to programming, although students new to Lua with previous experience in other programming languages should be able to produce acceptable solutions. Read through the entire task before starting work on it.

Background


On the English Wikipedia, we find 5 types of allowed date formats:

Wikipedia always gives month names in full (i.e. December, not Dec) and only uses all-numeric dates in special cases. Text that is imported from outside of Wikipedia may be in any of a large number of formats, and is sometimes malformed but still understandable. There is often a need to take a piece of text, extract a date from it, and display that date in the required format. This is a common sort of task in natural language processing.

Examine the table below:

Date formatting

Text | Format | Date
31 december 2019 | | 31 December 2019
31 December 2019 | mdy | December 31, 2019
December 31, 2019 | iso | 2019-12-31
31 December 2019 (uncertain) | year | circa 2019
31 December 2019 | iso | 2019-12-31
29 February 2004 (uncertain) | mdy | circa February 29, 2004
29 February 2005 (uncertain) | mdy | Invalid entry
31/12/2019 | | 2019-12-31
2019-12-31 | mdy | December 31, 2019
2019 (uncertain) | | circa 2019
31 | | 31
31 December | | 31 December
31 2019 | | 2019
sometime around 27th December 2019 | | circa 27 December 2019
sometime around 3rd December 2019 | | circa 3 December 2019
on the 16th of December in the year of our Lord 1770 | | 16 December 1770
99 red balloons | | 99
20/06/2019 | mdy | June 20, 2019
31 August 103 AD | | 31 August 103 AD
31 August 2019 BC | | 31 August 2019 BC
31 August 2019 BCE | | 31 August 2019 BCE
31 August 103 CE | | 31 August 103 CE
2019-08-31 | | 2019-08-31
31 August 213 | | 31 August 213
213 | | 213
31 August 13 | | 31 August 13
31 August 13 BC | | 31 August 13 BC
30 BCE | | 30 BCE
3 may 2017 | | 3 May 2017
3 Jan 2017 | | 3 January 2017
3 jan 9 AD | | 3 January 9 AD
31 February 2013 | mdy | Invalid entry
the quick brown fox | | Invalid entry
4 and 20 blackbirds ... | | Invalid entry
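
Several rows in the table (29 February 2004, 29 February 2005, 31 February 2013) depend on checking whether a day actually exists in the given month and year. The following is a minimal sketch of such a check, not a required approach. The names isLeapYear, isValidDate and DAYS_IN_MONTH are hypothetical, and the sketch assumes Gregorian leap-year rules for every year and ignores BC/BCE dates.

```lua
-- Hypothetical helper names; assumes Gregorian rules apply to every year.
local DAYS_IN_MONTH = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }

-- A year is a leap year if divisible by 4, except century years,
-- which must also be divisible by 400.
local function isLeapYear(year)
    return year % 4 == 0 and (year % 100 ~= 0 or year % 400 == 0)
end

-- Returns true when day/month/year (all numbers) name a real calendar day.
local function isValidDate(day, month, year)
    if month < 1 or month > 12 or day < 1 then
        return false
    end
    local limit = DAYS_IN_MONTH[month]
    if month == 2 and isLeapYear(year) then
        limit = 29
    end
    return day <= limit
end
```

With this sketch, isValidDate(29, 2, 2004) is true, while isValidDate(29, 2, 2005) and isValidDate(31, 2, 2013) are false, which matches the "Invalid entry" rows above.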

Notes:

Requirements


This task requires you to create your own function which can take text such as may be found in the first column and an optional format parameter. It will output a date either in the requested format or in a format matching that of the text supplied. You should test your function against all of the text shown in the table above, at least. Copy the table into your sandbox and change the entries in the last column to make calls to your function by supplying the value in the first column and the format in the second. See how close you can get to reproducing the table above.
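
As a starting point, the shape of such a function might look like the sketch below. It is deliberately incomplete: it only understands text of the form "day monthname year" with a full month name, and it does not validate the day or handle eras, numeric dates or "circa". The names formatDate, getDate, MONTHS and the format argument are illustrative assumptions, not part of the task specification.

```lua
-- Hypothetical names throughout; a sketch of the interface only.
local MONTHS = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
}

-- Map lower-cased month name -> month number, e.g. "december" -> 12.
local MONTH_NUMBER = {}
for number, name in ipairs(MONTHS) do
    MONTH_NUMBER[name:lower()] = number
end

-- text: the input string; fmt: optional "mdy", "iso" or "year".
-- With no fmt, the output falls back to day-month-year order.
local function formatDate(text, fmt)
    -- Capture "day monthname year", e.g. "31 december 2019".
    local day, monthName, year = text:match("(%d%d?)%s+(%a+)%s+(%d+)")
    local month = monthName and MONTH_NUMBER[monthName:lower()]
    if not month then
        return "Invalid entry"
    end
    day, year = tonumber(day), tonumber(year)

    if fmt == "mdy" then
        return string.format("%s %d, %d", MONTHS[month], day, year)
    elseif fmt == "iso" then
        return string.format("%04d-%02d-%02d", year, month, day)
    elseif fmt == "year" then
        return tostring(year)
    end
    return string.format("%d %s %d", day, MONTHS[month], year)
end

-- Entry point for {{#invoke:}}; a blank |format= is treated as absent.
local p = {}

function p.getDate(frame)
    local fmt = frame.args.format
    if fmt == "" then
        fmt = nil
    end
    return formatDate(frame.args[1] or "", fmt)
end
```

In a real module the file would end with `return p`, and a table cell could then call it with something like {{#invoke:Sandbox/RexxS/Dates|getDate|31 December 2019|format=mdy}}, assuming the module and function names used here.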

To complete this task you will need to make use of the techniques you learned in the first six tasks, as well as doing further research on string-handling functions and patterns, and possibly making use of other libraries.
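
To give a flavour of the string-handling research involved, here is a small sketch using Lua patterns with gsub and gmatch. It shows two possible building blocks: removing ordinal suffixes ("27th" becomes "27") and recognising month names that are abbreviated or in the wrong case. The function names stripOrdinals and findMonth are hypothetical, and the rule that an abbreviation needs at least three letters is an assumption made for this example.

```lua
-- Hypothetical helper names; one possible approach, not the only one.
local MONTHS = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
}

local ORDINAL = { st = true, nd = true, rd = true, th = true }

-- Replace "27th", "3rd", "1st" and so on with the bare number.
local function stripOrdinals(text)
    -- The extra parentheses discard gsub's second result (the match count).
    return (text:gsub("(%d+)(%a+)", function(digits, letters)
        if ORDINAL[letters:lower()] then
            return digits
        end
        return nil -- nil tells gsub to keep the original match
    end))
end

-- Return the number and full name of the first month mentioned in text,
-- accepting any case and any abbreviation of three or more letters.
local function findMonth(text)
    for word in text:gmatch("%a+") do
        local lower = word:lower()
        if #lower >= 3 then
            for number, name in ipairs(MONTHS) do
                if name:lower():sub(1, #lower) == lower then
                    return number, name
                end
            end
        end
    end
    return nil
end
```

For example, stripOrdinals("sometime around 27th December 2019") gives "sometime around 27 December 2019", and findMonth("3 jan 9 AD") returns 1 and "January", while findMonth("99 red balloons") returns nil.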

You must work in a fresh module sandbox and user sandbox. If I were doing the task, I would use Module:Sandbox/RexxS/Dates and User:RexxS/Sandbox/Dates.

Meeting the requirement to add "circa" may prove difficult, so start by getting a function working without it and consider adding it later. A function that works without "circa" is the minimum needed to complete the task successfully, but please try to accomplish the entire task if time permits.
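
One way to keep "circa" separate from the rest of the work is to treat it as a final step applied to an already formatted date. The sketch below assumes that the only signs of uncertainty are the two phrases that appear in the table, "(uncertain)" and "sometime around"; real text may well need more. The names isUncertain, addCirca and UNCERTAIN_MARKERS are hypothetical.

```lua
-- Hypothetical names; the marker list is an assumption based on the table.
local UNCERTAIN_MARKERS = { "(uncertain)", "sometime around" }

-- True when the input text contains any of the uncertainty markers.
local function isUncertain(text)
    local lower = text:lower()
    for _, marker in ipairs(UNCERTAIN_MARKERS) do
        -- The fourth argument (true) requests a plain substring search,
        -- so the parentheses in "(uncertain)" are not treated as a pattern.
        if lower:find(marker, 1, true) then
            return true
        end
    end
    return false
end

-- text: the original input; formatted: the date string already produced.
-- Invalid entries are passed through without a "circa" prefix.
local function addCirca(text, formatted)
    if formatted ~= "Invalid entry" and isUncertain(text) then
        return "circa " .. formatted
    end
    return formatted
end
```

Under these assumptions, addCirca("2019 (uncertain)", "2019") gives "circa 2019", and addCirca("29 February 2005 (uncertain)", "Invalid entry") stays "Invalid entry", as in the table.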

Hints and tips
