Final Code and Report#

Your Final Report should be about 10 pages long. A longer report is allowed if there are many plots to show. However, the text in the report must total no more than 3 pages. In other words, if all the plots were removed, the report would be at most 3 pages long.

Your code should be completely finalized and submitted. The code will be reviewed with three things in mind:

  1. to verify that it does indeed generate the plots found in the report.

  2. to estimate the quantity and quality of work.

  3. to validate that you met your Challenge Goals.

The code will not be directly graded because it is simply the tool that generates the product of value: the report!

Overview#

You will deliver the following:

  • A written report as a PDF file

  • Final code submission in Replit

Things NOT necessary for this deliverable:

  • Final Presentation

Your report will focus on plots that are appropriate, varied, creative, insightful, and annotated. The report will contain some written text, but the write-up should be kept relatively short.

Report Sections#

Here are the sections you should have in your report:

  • Title & Author(s) : Have a title and list the authors.

  • Abstract : The abstract is a summary of the entire study, describing the context, research aim, methods, results and key conclusions. It should be 1-2 paragraphs long. It should include the focus/question, the approach, the findings and the conclusions. Abstracts should be self-contained and concise, explaining your work as briefly and clearly as possible.

  • Results : This is the bulk of your report, where you will display your plots. Label the figures so that when your text references them it is clear which plot is being referenced. This section must be 100% data driven and devoid of all speculation. Briefly explain and describe what the data say. Do not explain why the data are that way, only that they are that way. Any subjective opinion or speculation must be put into the Discussion section. All qualitative comments made about the data must be cited. More details below. This section can optionally have subsections as appropriate, such as:

    • Data Biases : Discuss how the data was biased or incomplete. Consider data collection, accuracy, and the challenges present in the data itself.

    • ML Model : Explore the results of your Machine Learning model. Provide transparency into how the model makes decisions by graphing the decision space. Provide useful predictions, explore biases, potential for a feedback loop, and/or do a fairness analysis (Accuracy, Predictive Equality, Statistical Parity, Equal Opportunity).

  • Discussion : This is an optional section where you can make speculations about your data. The sky is the limit here. Go where your heart desires. Why do you think certain things are the way the data suggests? Or, why do you think the data is a poor representation of reality? Speculate away!

  • Conclusion : Expand in more detail on the overall conclusions of your research. Aggregate and synthesize the results of the individual plots into a final, grand conclusion. What are the potential implications of your results? Who might benefit from your analysis, and who might be excluded or, worse, harmed by it? Are there any biases in your data that might impact your results? Explicitly outline the limitations of your analysis and how others should or shouldn’t use your conclusions.

  • Further Research : Undoubtedly you will discover more questions as you complete your research. Explain what new questions you could research if you had more data and/or more time.

  • Challenge Goals : This is a VERY IMPORTANT section for your grade. List your challenge goals and expand on how they worked out. You won’t be graded on the quality of your write-up; you’ll be graded primarily on the sum total of the difficulty of your work.

  • References : List all the sources cited in the paper.
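For the fairness analysis named in the ML Model subsection above, a minimal sketch in plain Python might look like the following. It assumes a binary classifier and uses the common definitions: statistical parity compares selection rates across groups, equal opportunity compares true positive rates, and predictive equality compares false positive rates. The function name `fairness_by_group`, the group labels, and the toy data are all hypothetical and are not part of the project starter code.

```python
from collections import defaultdict


def fairness_by_group(y_true, y_pred, groups):
    """Return per-group accuracy, selection rate, TPR, and FPR.

    Assumes binary labels where 1 is the positive class.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            key = "tp" if truth == 1 else "fp"
        else:
            key = "fn" if truth == 1 else "tn"
        counts[group][key] += 1

    def ratio(numerator, denominator):
        # A group with no relevant samples has an undefined rate.
        return numerator / denominator if denominator else float("nan")

    report = {}
    for group, c in counts.items():
        total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        report[group] = {
            # Accuracy: share of correct predictions in the group.
            "accuracy": ratio(c["tp"] + c["tn"], total),
            # Statistical parity compares this rate across groups.
            "selection_rate": ratio(c["tp"] + c["fp"], total),
            # Equal opportunity compares this rate across groups.
            "true_positive_rate": ratio(c["tp"], c["tp"] + c["fn"]),
            # Predictive equality compares this rate across groups.
            "false_positive_rate": ratio(c["fp"], c["fp"] + c["tn"]),
        }
    return report


# Made-up toy data for illustration only.
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 1, 0]
toy_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

for name, metrics in sorted(fairness_by_group(toy_true, toy_pred, toy_groups).items()):
    print(name, metrics)
```

Large gaps between groups on any of these rates are worth plotting and reporting in the Results section; explanations of why the gaps exist belong in the Discussion section.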

Results Section#

The goal of Data Science is to be Data Driven. Refrain from subjective opinions which are typically based on personal beliefs, preferences, or anecdotal evidence. If you want to conclude that one thing might be due to another you must present data to illustrate the correlation. Alternatively, you can provide a citation to a book, article or research paper that supports your position. [1] [2]

Let the data speak for itself.

Work hard to create plots that speak volumes!
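When a claim that one thing might be related to another needs data behind it, one simple number to report alongside a scatter plot is the Pearson correlation coefficient. The sketch below computes it in plain Python. The function name `pearson_r` and the toy values are hypothetical, and a strong correlation by itself shows association, not cause.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")

    return cov / math.sqrt(var_x * var_y)


# Made-up toy values for illustration only.
toy_x = [1, 2, 3, 4, 5]
toy_y = [52, 58, 65, 70, 79]
print(f"r = {pearson_r(toy_x, toy_y):.3f}")
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship. Reporting the number next to the labeled figure lets readers check the claim against the data.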

Project Rubric#

The following are used to determine your score:

  • Professionally written

  • Creativity, quantity, and quality of plots

    • Visualizations are inspiring, inventive, resourceful and motivating

  • Difficulty of your project as reviewed in the Challenge Goals section

  • Code…

    • illustrates your effort to achieve your Challenge Goals

    • follows the project structure provided for you

    • has citations for all sources used. Note that all sources must augment your achievement and extend your understanding. [3]

Footnotes#

[1] Articles are heavily influenced by the author’s desire to sell their work. Articles are biased in several big ways: selection bias, confirmation bias, political bias, commercial bias, and cultural bias. Consider the source of the publication (Fox vs CNN). Just because someone published it on the internet does not make it true!
[2] Identifying and Avoiding Bias in Research
[3] While you are encouraged to use the internet to help you succeed, you are not permitted to plagiarize. You are to understand all the code. ChatGPT and other sources are to be used as you would use a book to extend your understanding. You are not to use AI as a substitute for your own understanding.