Another great resource for the analyst is "What Statistical Analysis Should I Use?", courtesy of the Institute for Digital Research and Education (IDRE) at UCLA. The linked document outlines different tests very clearly, with working examples in Stata. They also have examples for SAS and SPSS, which you can look up in this handy table if you are so inclined. Each statistical test gets a short paragraph describing what it tests, which types of variables it is appropriate for, and how the test relates to other methods. I especially like that important assumptions, which can easily be overlooked, are pointed out explicitly.
Take for example the two-sample t-test above. For any interval variable you can calculate the t-statistic and then use its reference distribution to estimate a confidence interval or p-value. However, that distribution holds exactly only when the variable is normally distributed. So if your variable of interest cannot be assumed to be normal, a t-test may well be inappropriate, especially when the samples are small.
As with most things Stata, the document is geared towards causal analysis. This means that terms like "dependent" and "independent" variable are thrown around. In a two-sample example, the "dependent" variable is the variable of interest and the "independent" variable is an indicator of which sample a given observation belongs to. I came across this document when trying to answer the question: how do I test whether two non-normal samples arise from the same distribution? In my case, the Wilcoxon-Mann-Whitney test (or the Kruskal-Wallis test on two samples) would be more appropriate than a two-sample t-test.
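As a rough illustration of the comparison above, here is a minimal R sketch using only the base stats package. The data are simulated (exponential draws with made-up rates) and the names dat, value and group are my own, chosen to mirror the "dependent"/"independent" vocabulary; none of this comes from the IDRE document, which uses Stata.

```r
# Simulated, deliberately skewed samples; the rates are arbitrary assumptions.
set.seed(1)
x <- rexp(30, rate = 1)
y <- rexp(30, rate = 0.5)

# Long format: value is the "dependent" variable of interest,
# group is the "independent" indicator of which sample a row belongs to.
dat <- data.frame(
  value = c(x, y),
  group = factor(rep(c("x", "y"), each = 30))
)

tt  <- t.test(value ~ group, data = dat)        # Welch two-sample t-test
wmw <- wilcox.test(value ~ group, data = dat)   # Wilcoxon-Mann-Whitney
kw  <- kruskal.test(value ~ group, data = dat)  # Kruskal-Wallis, two groups

# Normal-approximation version of the rank-sum test, without the
# continuity correction, for comparison with Kruskal-Wallis.
wmw_approx <- wilcox.test(value ~ group, data = dat,
                          exact = FALSE, correct = FALSE)

c(t = tt$p.value, wmw = wmw$p.value,
  wmw_approx = wmw_approx$p.value, kw = kw$p.value)
```

With two groups and no ties, the uncorrected normal-approximation rank-sum p-value should agree with the Kruskal-Wallis one, which is why the two tests are interchangeable here. The default wilcox.test call may differ slightly because it uses an exact p-value for small samples without ties.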
The list of methods provided by "What Stat...?" is far from exhaustive. Other tests that I have found and believe would also be appropriate are Kolmogorov-Smirnov and Anderson-Darling. This raises a new question: how can the analyst reconcile contradictory results from different non-parametric tests? That will be the subject of a later post, I believe.
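To make the possibility of disagreement concrete, here is a small R sketch, again on simulated data of my own choosing (two normal samples with the same centre but different spread). It only shows how to run the two tests side by side; it is not an answer to the reconciliation question. Anderson-Darling is left out because, as far as I know, base R's stats package has no two-sample version.

```r
# Simulated samples: same centre, very different spread (assumed values).
set.seed(2)
a <- rnorm(40, mean = 0, sd = 1)
b <- rnorm(40, mean = 0, sd = 3)

ks  <- ks.test(a, b)      # compares the two empirical distribution functions
wmw <- wilcox.test(a, b)  # rank-sum test, mainly sensitive to location shifts

c(ks = ks$p.value, wmw = wmw$p.value)
```

The two tests look at different features of the samples, so there is no reason to expect their p-values to line up on data like these.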
I recently came across The R Inferno, written by Patrick Burns. In a satirical style modeled on Dante's Inferno, he discusses pretty much every Stack Exchange question I've ever looked up. Among the many new things I learned and immediately implemented:
"NOTE: Failing to use drop=FALSE inside functions is a major source of bugs."
Suppose you have a matrix M and you want to take some subset my.subset of its rows.
> M[ my.subset, ]
If my.subset has length 1, the above code will return a plain vector rather than a matrix.
> M[ my.subset, , drop = FALSE]
Even if my.subset has length 1, the above will return a matrix with a single row.
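Here is a minimal sketch of how the missing drop = FALSE turns into a bug inside a function. The matrix and the two helper functions (count_rows_buggy, count_rows_safe) are hypothetical names I made up for illustration; they are not from The R Inferno.

```r
M <- matrix(1:6, nrow = 3)  # small 3 x 2 example matrix

# Hypothetical helpers: count the rows in a row subset of m.
count_rows_buggy <- function(m, idx) nrow(m[idx, ])
count_rows_safe  <- function(m, idx) nrow(m[idx, , drop = FALSE])

count_rows_buggy(M, 1:2)  # 2
count_rows_buggy(M, 2)    # NULL: m[2, ] is a plain vector with no dim
count_rows_safe(M, 2)     # 1
```

The buggy version works fine until the subset happens to have length 1, which is exactly the kind of input that tends to show up only after the function has been in use for a while.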