Non-Significant Results: A Discussion Example

Due to its probabilistic nature, Null Hypothesis Significance Testing (NHST) is subject to decision errors; such decision errors are the topic of this paper. In the familiar decision table, columns indicate the true situation in the population and rows indicate the decision based on a statistical test. Most researchers overlook that the outcome of hypothesis testing is probabilistic (whenever the null hypothesis is true, or when the alternative hypothesis is true and power is less than 1) and interpret the outcome of a test as reflecting the absolute truth. This habit shows up in how nonsignificant results are written up: p-values well above Fisher's commonly accepted alpha criterion of 0.05 are sometimes still presented as support for an effect. At the risk of error, we interpret such rather intriguing phrasing as follows: that the results are significant, just not statistically so. Promoting results with unacceptable error rates in this way is misleading. To conclude, our three applications indicate that false negatives remain a problem in the psychology literature, despite the decreased attention they receive, and that we should be wary of interpreting statistically nonsignificant results as meaning there is no effect in reality.

When your results come out nonsignificant, your discussion can include potential reasons why they defied expectations: was your rationale solid? Below, I go over the different, most likely possibilities for a nonsignificant result. Lastly, you can make specific suggestions for things that future researchers can do differently to help shed more light on the topic.

Consider a researcher who develops a treatment for anxiety that he or she believes is better than the traditional treatment, and whose experiment finds the new treatment performing better, but not significantly so. Let's say the researcher repeated the experiment and again found the new treatment was better than the traditional treatment. Two experiments that each provide only weak support that the new treatment is better can, taken together, provide strong support.

To draw inferences about the true effect size underlying one specific observed effect size, more information (i.e., more studies) is generally needed to increase the precision of the effect size estimate. Although the emphasis on precision and the meta-analytic approach is fruitful in theory, we should realize that publication bias results in precise but biased (overestimated) effect size estimates in meta-analyses (Nuijten, van Assen, Veldkamp, & Wicherts, 2015).

We begin by reviewing the probability density function of an individual p-value, and of a set of independent p-values, as a function of population effect size. First, we compared the observed nonsignificant effect size distribution (computed from the observed test results) to the expected nonsignificant effect size distribution under H0. Figure 6 presents the distributions of both transformed significant and nonsignificant p-values. However, our recalculated p-values assumed that all other test statistics (degrees of freedom; test values of t, F, or r) were correctly reported, and any such reporting errors may have affected the results of our analyses.
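To make the link between population effect size and nonsignificant results concrete, the sketch below computes the probability that a single two-sided one-sample t-test misses a true effect (a false negative, beta). This is an illustrative calculation in Python with SciPy, not the authors' analysis script; the function name, the use of Cohen's d as the effect size metric, and the example values for d, n, and alpha are all assumptions made for illustration.

```python
# Illustrative sketch (not the authors' code): the probability that a two-sided
# one-sample t-test misses a true effect, i.e. returns a nonsignificant result.
# Effect size here is expressed as Cohen's d; the numbers are made up.
from scipy import stats

def miss_probability(d, n, alpha=0.05):
    """P(p > alpha | true standardized effect d, sample size n), i.e. beta."""
    df = n - 1
    ncp = d * n ** 0.5                        # noncentrality parameter of the t statistic
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # The test stays nonsignificant when |t| falls below the critical value.
    return stats.nct.cdf(t_crit, df, ncp) - stats.nct.cdf(-t_crit, df, ncp)

# Example with hypothetical inputs: a medium true effect and a small sample.
print(round(miss_probability(d=0.5, n=33), 2))  # roughly 0.2, i.e. about 80% power
```

Under the null hypothesis the same test instead produces a nonsignificant result with probability 1 - alpha, which is why a handful of nonsignificant findings is not, by itself, evidence of a true null effect.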
We examined the specificity and sensitivity of the Fisher test for detecting false negatives with a simulation study of the one-sample t-test. Simulations indicated the adapted Fisher test to be a powerful method for that purpose: for medium true effects (η = .25), three nonsignificant results from small samples (N = 33) already provide 89% power for detecting a false negative with the Fisher test. Third, we calculated the probability that a result under the alternative hypothesis was, in fact, nonsignificant (i.e., β, the Type II error rate).

We reuse the data from Nuijten et al. (2015). We sampled the 180 gender results from our database of over 250,000 test results in four steps; 178 valid results remained for analysis. Expectations were specified as H1 expected, H0 expected, or no expectation. Hence we expect little p-hacking and substantial evidence of false negatives in reported gender effects in psychology. Similarly, applying the Fisher test to nonsignificant gender results without stated expectation yielded evidence of at least one false negative (χ²(174) = 324.374, p < .001).

When writing a dissertation or thesis, the results and discussion sections can be both the most interesting and the most challenging sections to write. Specifically, your discussion chapter should be an avenue for raising new questions that future researchers can explore. For example, a results section might state: "Because of the large number of IVs and DVs, the consequent number of significance tests, and the increased likelihood of making a Type I error, only results significant at the p < .001 level were reported (Abdi, 2007)."

Some studies have shown statistically significant positive effects, but statistical significance by itself says little about the precision of an effect size estimate. For example, a large but statistically nonsignificant study might yield a confidence interval (CI) for the effect size of [0.01; 0.05], whereas a small but significant study might yield a CI of [0.01; 1.30]. If all effect sizes in the interval are small, then it can be concluded that the effect is small.

In the classic example, Mr. Bond claims he can tell whether a martini was shaken or stirred and is correct on 49 of 100 trials. Given the assumption that he is merely guessing (a 0.5 probability of being correct on each trial), the probability of his being correct 49 or more times out of 100 is 0.62.
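The 0.62 figure is a plain binomial tail probability and can be checked directly; the snippet below is only a sanity check of that arithmetic, again assuming SciPy is available.

```python
# Check of the binomial calculation behind the 0.62 figure: the probability of
# at least 49 correct calls out of 100 when each call is a 50/50 guess.
from scipy import stats

p_at_least_49 = stats.binom.sf(48, n=100, p=0.5)  # P(X >= 49) = 1 - P(X <= 48)
print(round(p_at_least_49, 2))  # 0.62
```

Because 0.62 is far above any conventional alpha level, the data give no convincing evidence that Mr. Bond can tell the difference; as discussed below, they also do not prove that he cannot.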
Our three applications are:

Application 1: Evidence of false negatives in articles across eight major psychology journals.
Application 2: Evidence of false negative gender effects in eight major psychology journals.
Application 3: The Reproducibility Project: Psychology.
The journals analyzed include the Journal of Consulting and Clinical Psychology (JCCP), the Journal of Experimental Psychology: General (JEPG), and the Journal of Personality and Social Psychology (JPSP). Fourth, we examined evidence of false negatives in reported gender effects. Two erroneously reported test statistics were eliminated, such that they did not confound the results.

The comparison of the observed and expected nonsignificant effect size distributions used the Kolmogorov-Smirnov test, a non-parametric goodness-of-fit test for equality of distributions based on the maximum absolute deviation between the independent distributions being compared (denoted D; Massey, 1951).

Other research strongly suggests that most reported results relating to hypotheses of explicit interest are statistically significant (Open Science Collaboration, 2015). It would seem the field is not shying away from publishing negative results per se, as proposed before (Greenwald, 1975; Fanelli, 2011; Nosek, Spies, & Motyl, 2012; Rosenthal, 1979; Schimmack, 2012), but whether this also holds for results relating to hypotheses of explicit interest, rather than for all results reported in a paper, requires further research. See osf.io/egnh9 for the analysis script to compute the confidence intervals of X.

While we are on the topic of non-significant results, a good way to save space in your results (and discussion) section is not to spend time speculating about why a result is not statistically significant. (Another example of routine results-section phrasing: "Further, Pillai's Trace test was used to examine the significance of the multivariate effect.")

There have also been some studies with effects that are statistically non-significant. A nonsignificant result, however, is not proof of no effect: in the Mr. Bond example above, the data provide no evidence that Mr. Bond can tell whether a martini was shaken or stirred, but there is also no proof that he cannot.

The Fisher test statistic is calculated as \(\chi^2_{2k} = -2\sum_{i=1}^{k}\ln(p^*_i)\), where \(p^*_i\) denotes the \(i\)-th transformed nonsignificant p-value and \(k\) the number of nonsignificant results being combined. Denote the value of this Fisher test by Y; note that under the H0 of no evidential value, Y is χ²-distributed with 126 degrees of freedom (i.e., 2k, with k = 63 nonsignificant results). Stronger evidence from this test when a set contains more nonsignificant results is a consequence of the higher power of the Fisher method, and does not necessarily reflect that any individual nonsignificant p-value provides stronger evidence of a false negative.
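As a rough illustration of the adapted Fisher test described above, the sketch below combines k nonsignificant p-values into the statistic Y and compares it to a chi-square distribution with 2k degrees of freedom. The rescaling of each nonsignificant p-value to the unit interval, p* = (p - alpha)/(1 - alpha), the function name, and the example p-values are assumptions made for this sketch; the exact transformation used in the published analysis may differ.

```python
# Minimal sketch of a Fisher-style test on k nonsignificant p-values (assumed
# rescaling, not the authors' published script). Y = -2 * sum(ln p*) is compared
# to a chi-square distribution with 2k degrees of freedom under H0 of no effect.
import numpy as np
from scipy import stats

def fisher_test_nonsignificant(p_values, alpha=0.05):
    p = np.array([pv for pv in p_values if pv > alpha])  # keep only nonsignificant results
    p_star = (p - alpha) / (1 - alpha)                    # rescale each p-value to (0, 1)
    y = -2 * np.sum(np.log(p_star))                       # Fisher test statistic Y
    df = 2 * len(p)                                       # degrees of freedom, 2k
    return y, stats.chi2.sf(y, df)                        # Y and its chi-square p-value

# Hypothetical example: three nonsignificant p-values from small studies.
y, p_fisher = fisher_test_nonsignificant([0.12, 0.30, 0.08])
print(round(y, 2), round(p_fisher, 3))
```

A significant Fisher test result indicates evidence that at least one of the k nonsignificant results is a false negative, but it does not say which one.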