Database and Analytics Programming Project
National College of Ireland
Masters in Science in Data Analytics (MSCDAD_A)
Masters in Science in Data Analytics (MSCDAD_B)
Postgraduate Diploma in Science in Data Analytics (PGDDA_SEPOL)
Database and Analytics Programming
Semester 1, 2021/22
This project is designed to assess the learning outcomes of the Database and Analytics Programming module as outlined below:
LO1: Analyse, compare, contrast and critically evaluate the characteristics of programming languages, programming environments and database systems commonly utilised for data analytics solution implementation.
LO2: Critically assess the challenges associated with processing big data datasets and compare and contrast programming for big data vis-à-vis programming for conventional datasets.
LO3: Evaluate tools and techniques for managing the data pipeline and preparing data for further analysis through data wrangling, cleaning, and validation.
LO4: Critically assess methods and practices for software development in order to design and implement data programming requirements.
LO5: Evaluate, design and implement solutions for processing datasets by using key programming patterns and constructs for data analytics, relevant programming languages, and suitable database systems.
The objective of this project is to identify and carry out a series of analyses on a collection of large datasets that are related to or complement each other, utilising appropriate programming languages, programming environments and database systems.
This will be a group project. Groups should have 3-4 members.
Your project must incorporate the following elements:
1. Three or four semi-structured datasets must be used, depending on whether there are 3 or 4 members in each group. Each group member is responsible for one dataset.
2. Datasets must be programmatically stored in appropriate database(s) prior to processing.
3. Programmatic pre-processing, transformation, analysis and visualisation of the data.
4. Programmatically storing the processed output data in appropriate databases.
For example, you could use Python to programmatically retrieve a semi-structured dataset (XML or JSON, or web-scraped or streaming data) and store this data in MongoDB.
You could then read these data from MongoDB, pre-process and transform them, and in the process create some structured datasets that you store in PostgreSQL for later use.
Following that, you could use Python or R to conduct further analysis on these data to find interesting patterns by applying knowledge gained in other modules (e.g., statistical analysis, machine learning), and generate visualisations to better present the results.
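The retrieve, transform and store steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the sample JSON, the field names and the function names (`flatten`, `store`, `mean_temp_by_station`) are hypothetical, a plain list of dictionaries stands in for documents read from MongoDB, and the standard-library `sqlite3` module stands in for PostgreSQL so that the example runs without any database server. In an actual project these stand-ins would be replaced with a MongoDB client and a PostgreSQL driver.

```python
import json
import sqlite3

# Hypothetical semi-structured input; in a real project this would be
# retrieved from an API or read back from a MongoDB collection.
RAW_JSON = """
[
  {"station": {"id": "S1", "name": "Dublin"},
   "readings": [{"ts": "2021-10-01", "temp_c": 12.5},
                {"ts": "2021-10-02", "temp_c": null}]},
  {"station": {"id": "S2", "name": "Cork"},
   "readings": [{"ts": "2021-10-01", "temp_c": 9.0},
                {"ts": "2021-10-02", "temp_c": 11.0}]}
]
"""


def flatten(documents):
    """Turn nested documents into flat rows, dropping missing readings."""
    rows = []
    for doc in documents:
        station = doc.get("station", {})
        for reading in doc.get("readings", []):
            if reading.get("temp_c") is None:
                continue  # basic cleaning step: skip missing values
            rows.append((station.get("id"), station.get("name"),
                         reading["ts"], float(reading["temp_c"])))
    return rows


def store(rows, conn):
    """Store structured rows in a relational table (SQLite as a stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(station_id TEXT, station_name TEXT, ts TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def mean_temp_by_station(conn):
    """Run a simple aggregate query over the stored rows."""
    cur = conn.execute(
        "SELECT station_id, AVG(temp_c) FROM readings "
        "GROUP BY station_id ORDER BY station_id"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    store(flatten(json.loads(RAW_JSON)), connection)
    print(mean_temp_by_station(connection))
```

Keeping each stage in its own function makes it easier to chain the stages into a single automated run and to swap the storage layer later.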
Each dataset should contain at least 1,000 records. Some appropriate datasets may be found at:
A list of other potential sources will be posted on Moodle.
The objectives, methodology and results of your analysis should be presented in the form of a project report. This report should discuss the programming and data processing challenges that you encountered and the means and mechanisms you implemented to overcome these challenges.
The report should be around 3,000 words in length (excluding references), should use appropriate academic style and referencing, and be presented in the IEEE conference format. Templates for Microsoft Word and LaTeX can be downloaded from the IEEE.
The report should contain the following sections:
• Abstract
This should provide a summary of the project objectives, methods and results. Take a look at abstracts from papers in your literature review to get an idea of what constitutes a good or bad abstract.
• Introduction
Here you should provide a short motivation for the project, describe the relevance of the topic and state the objectives of the project. Note that the proposed analysis should answer a novel question, which should be clearly stated via appropriately formed research question(s).
• Associated Work
In this section, you should summarise relevant academic work that addressed similar problems or guided your decisions. Note that this should be a critical evaluation. It should be more than a mere summary of the works and should discuss their limitations and implications.
• Methodology
This section should contain:
– A detailed description of the underlying dataset(s) and your justification for choosing them.
– Full descriptions and justifications of the data processing activities carried out, such as use of APIs, databases, etc.
– Full descriptions and justifications of the implemented data processing algorithms.
– Justifications for the choice of technologies used, such as programming languages, libraries and databases.
– Diagrams providing a visual overview of the data gathering, processing and analysis flow.
• Results and Evaluation
Here you should present the results of your work, making appropriate use of figures, tables, etc. You should provide evidence of how the project objectives were met, ensuring that you discuss your research findings, their interpretation(s) and implications.
• Conclusions and Future Work
In this section you should detail what others can/could learn from your work. You should discuss your findings in the context of the research question(s) you elicited earlier. You should present the limitations of your work, i.e. this should be a critical self-evaluation. Finally, you should suggest possible directions for future work. Typically you would describe what you would do differently or how you would extend your work if you had more time.
• References
Here you should provide a complete list of the academic works cited and online materials used in the project. References should be included as in-text citations using the IEEE citation style.
As a group you should create a video presentation (maximum 10 minutes long) that will act as a discussion point for your work. It should be used to provide a discussion on what you did, how you did it, why you did it and what you discovered.
Note that although individual members will be presenting different parts of the video, each member of the group is expected to be able to present all aspects of the work individually and without assistance from other group members, if required.
You should create a zip or gz archive of all assets such as program code, data and system configuration details. In addition to this, you should set up a private GitHub repository where each group member should commit their own code and any changes they make to code created by other group members.
Note: Having one group member be responsible for making all commits is not acceptable.
Each member of each project group should maintain their own journal over the course of the project. The journal should provide a brief description of each task carried out by the group member, the time spent carrying out the task, and a description of any challenges or difficulties encountered and how they were addressed.
Note: The submission of the journal is mandatory. Zero marks will be awarded to any group member for the entire project if a journal is not submitted.
The individual project journal will be used to weight the marks awarded to each member of the group, as follows:
70% A poor journal that fails to provide adequate information on the group member’s contribution to the project, the time spent working on the project or the challenges encountered.
80% An adequate journal that provides rudimentary information on the group member’s contribution to the project, the time spent working on the project or the challenges encountered.
90% A good journal that provides reasonably in-depth information on the group member’s contribution to the project, the time spent working on the project or the challenges encountered.
100% An excellent journal that provides comprehensive information on the group member’s contribution to the project, the time spent working on the project or the challenges encountered.
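As a rough illustration of the weighting arithmetic, the sketch below assumes that the journal weighting is applied as a simple multiplier on the group mark. That assumption, the band names and the function name `individual_mark` are hypothetical and are not stated in this brief.

```python
# Hypothetical mapping from journal quality to the weightings listed above.
JOURNAL_WEIGHTS = {
    "poor": 0.70,
    "adequate": 0.80,
    "good": 0.90,
    "excellent": 1.00,
}


def individual_mark(group_mark, journal_quality):
    """Return a member's weighted mark; None means no journal was submitted."""
    if journal_quality is None:
        return 0.0  # zero marks are awarded when no journal is submitted
    # Assumption: the weighting multiplies the group mark directly.
    weight = JOURNAL_WEIGHTS[journal_quality]
    return round(group_mark * weight, 2)


if __name__ == "__main__":
    for quality in ("poor", "adequate", "good", "excellent", None):
        print(quality, individual_mark(65.0, quality))
```

Under this assumption, a group mark of 65 with a good journal would give an individual mark of 58.5.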
The project carries 70% of the total marks for the module, with a mark of 40% or greater being required to pass.
For the project report, code artefact and project presentation, there should be only one submission per group:
• The project report must include the full name of each group member (as per NCI official documents) and their student number. These must be clearly visible on the front page of the report. The report should be uploaded as a PDF document to the Project Report Turnitin link on Moodle.
• The code artefact should be uploaded as a zip or gz archive to the Code Artefact link on Moodle.
• The project presentation must include the full name of each group member (as per NCI official documents) and their student number. These must be clearly visible at the start of the video. This should be uploaded as an mp4 video to the Project Presentation link on Moodle.
For the project journal, there will be one submission for each group member. Again, this should be in PDF format and should include the full name of the student (as per NCI official documents) as well as their student number. This will be uploaded to the Project Journal Turnitin link on Moodle.
Late submissions will not be accepted unless an extension has been requested via NCI360 and formally approved.
The project will be marked in accordance with the grading rubric provided at the end of this document.
6 Academic Integrity
Any written work created by others must be properly cited and should be paraphrased or summarised where possible, otherwise it should be included in quotes. Figures not created by you should include an acknowledgement detailing the name(s) of the creator(s). Code found on the internet should not be claimed as your own, but instead a comment should be included in the source code indicating where you obtained it.
Students are strongly advised to familiarise themselves with the Guide to Academic Integrity produced by the NCI Library.
Note: All submissions will be electronically screened for evidence of academic misconduct, e.g. plagiarism, collusion and misrepresentation. Any submission showing evidence of such misconduct will be referred to the college’s academic misconduct committee for disciplinary action.
Grading Rubric – Database and Analytics Programming Project
Grade bands: Strong H1 (80% or above), H1 (70% to below 80%), H2.1 (60% to below 70%), H2.2 (50% to below 60%), Pass (40% to below 50%), Fail (below 40%).

Criterion: (10%)
– Strong H1: Very challenging project objectives are exceptionally well presented, fully met and thoroughly discussed.
– H1: Challenging project objectives are well presented, fully met and thoroughly discussed.
– H2.1: Reasonable project objectives are well presented, fully met and adequately discussed.
– H2.2: Reasonable project objectives are clear, mostly met and adequately discussed.
– Pass: The objectives are clear, if unambitious, and are at least partially met and briefly discussed.
– Fail: The objectives of the project are unclear or have not been discussed. It is not possible to discern if the objectives have been met.

Criterion: (10%)
– Strong H1: A superb critical analysis of substantive and highly relevant literature.
– H1: An excellent critical analysis of substantive and relevant literature.
– H2.1: A good analysis of relevant literature. The critical analysis aspect could be considerably stronger.
– H2.2: An adequate analysis of mostly relevant literature. The critical analysis aspect could be considerably stronger.
– Pass: A limited analysis of some relevant literature, but it lacks evidence of understanding.
– Fail: Little or no relevant literature reviewed. Very limited evidence of understanding.

Criterion: Data Complexity and (20%)
– Strong H1: The datasets have been well prepared and meaningfully explored. All datasets were stored in appropriate databases before and after processing. At least two datasets have a high degree of complexity. At least one dataset was programmatically retrieved via an API or by web scraping.
– H1: The datasets have been well prepared and meaningfully explored. All datasets were stored in appropriate databases before and after processing. At least two datasets have a high degree of complexity.
– H2.1: The datasets have been well prepared and explored. At least one dataset was stored in an appropriate database. At least one dataset has a high degree of complexity.
– H2.2: The datasets have been appropriately prepared for analysis. At least one dataset was stored in an appropriate database. At least one of the datasets is nontrivial.
– Pass: The datasets were appropriately handled given the objectives. The use of databases is very basic and some inappropriate choices may be evident. The datasets are somewhat trivial.
– Fail: Only one somewhat trivial dataset was used. No database was used to store the datasets. No obvious development was carried out.

Criterion: Data Processing Implementation (20%)
– Strong H1: The data processing algorithms used play a well conceived and important role in meeting the project objectives. The implementation significantly exceeds the stated minimum requirements.
– H1: The data processing algorithms used play a well conceived and important role in meeting the project objectives. Multiple data processing techniques / languages were employed.
– H2.1: The use of data processing algorithms is well thought out and appropriate for the project objectives. Comprehensive use of at least one data programming language and multiple techniques.
– H2.2: The use of data processing algorithms is meaningful and appropriate for the project objectives. There is evidence of appropriate use of at least one data programming language and a small number of appropriate techniques.
– Pass: Appropriate but basic use of data processing algorithms. Basic use of data programming languages and a limited number of techniques.
– Fail: Poor or no implementation. If an implementation is provided, it demonstrates inappropriate use of data processing algorithms.

Criterion: (10%)
– Strong H1: All parts of the analysis are automated within a single process control flow. Every run of the process can produce different results as new data is extracted and subsequently incorporated, such as data obtained via an API.
– H1: All parts of the analysis are automated within a single process control flow.
– H2.1: Most core elements of the analysis are automated within a single process control flow.
– H2.2: Some elements of the analysis are incorporated within a larger process. However, some elements of the analysis are run as separate processes.
– Pass: Individually all elements of the analysis are automated, but not necessarily linked together as part of a larger process flow.
– Fail: Little or no evidence of automation.

Criterion: (20%)
– Strong H1: Three or more insightful findings are excellently presented and thoroughly discussed in the context of the domain using appropriate references to prior work.
– H1: Three or more interesting and non-arbitrary findings are presented and thoroughly discussed in the context of the domain using appropriate references to prior work.
– H2.1: Three or more interesting non-arbitrary findings are presented and thoroughly discussed.
– H2.2: Two or more interesting non-arbitrary findings are presented and appropriately discussed.
– Pass: Two or more interesting non-arbitrary findings are presented but are poorly discussed.
– Fail: Little to no non-arbitrary results and/or findings are presented.

Criterion: Quality of (20%)
– Strong H1: Exceptionally well written, with no language errors. All figures are well conceived, readable and correctly captioned. The IEEE template is strictly adhered to. The report does not exceed the length limits. All references are appropriately and correctly used.
– H1: Well written, with no significant language errors. All figures are well conceived, readable and appropriately captioned. The IEEE template is adhered to. The report does not exceed the length limits. References are appropriately and correctly used.
– H2.1: Well written, but has a few significant language or style errors. Figures are well presented. The IEEE template and length limit are adhered to. References are complete and correctly used.
– H2.2: Adequately written, but has a few significant language and/or style errors. Some figures may be hard to read. The IEEE template and length limit are mostly adhered to. References are complete and correctly used.
– Pass: Adequately written, with some significant language and/or style errors. Figures may be hard to read or presented in a suboptimal manner. The IEEE template may not have been followed. References are mostly complete and correctly used.
– Fail: Poorly written and affected by typographical errors and/or poor use of English. The IEEE template was not used. Figures may be hard to read. References (if any) are largely incomplete.