Biostatistics and Epidemiology: Completely Free

OpenEpi Version 2.

Anderson Statistical Software Library -- a large collection of free statistical software (almost 70 programs!). Anderson Cancer Center.

Performs power, sample size, and related calculations needed to plan studies. Covers a wide variety of situations, including studies whose outcomes involve the Binomial, Poisson, Normal, and log-normal distributions, or are survival times or correlation coefficients.

Two populations can be compared using direct and indirect standardization, the SMR and CMF, and by comparing two lifetables.

Confidence intervals and statistical tests are provided. There is an extensive helpfile in which everything is explained. Lifetables is listed in the Downloads section of the QuantitativeSkills web site. Sample Size for Microarray Experiments -- computes how many samples are needed for a microarray experiment to find genes that are differentially expressed between two kinds of samples e.

This is a stand-alone Windows (95 through XP) program that receives information about dose-limiting toxicities (DLTs) observed at some starting dose, and calculates the doses to be administered next. DLT information obtained at each dosing level guides the calculation of the next dose level.

Epi Info has been in existence for over 20 years and is currently available for Microsoft Windows. The program allows for data entry and analysis. Within the analysis module, analytic routines include t-tests, ANOVA, nonparametric statistics, cross tabulations and stratification with estimates of odds ratios, risk ratios, and risk differences, logistic regression (conditional and unconditional), survival analysis (Kaplan-Meier and Cox proportional hazards), and analysis of complex survey data.

Limited support is available. The calculation of person-years allows flexible stratification by sex, and self-defined and unrestricted calendar periods and age groups, and can lag person-years to account for latency periods. Developed by Eurostat to facilitate the application of these modern time series techniques to large-scale sets of time series and in the explicit consideration of the needs of production units in statistical institutes.

Contains two main modules: seasonal adjustment and trend estimation with an automated procedure e. Ideal for learning meta-analysis (reproduces the data, calculations, and graphs of virtually all data sets from the most authoritative meta-analysis books, and lets you analyze your own data "by the book").

Generates numerous plots: standard and cumulative forest, p-value function, four funnel types, several funnel regression types, exclusion sensitivity, Galbraith, L'Abbe, Baujat, modeling sensitivity, and Trim-and-Fill.

Surveys, Testing, and Measurement: Completely Free

CCOUNT -- a package for market research data cleaning, manipulation, cross tabulation and data analysis.

IMPS (Integrated Microcomputer Processing System) -- performs the major tasks in survey and census data processing: data entry, data editing, tabulation, data dissemination, statistical analysis and data capture control. Stats 2. SABRE -- for the statistical analysis of multi-process random effect response data.

Responses can be binary, ordinal, count and linear recurrent events; response sequences can be of different types. Such multi-process data is common in many research areas, e. Sabre has been used intensively on many longitudinal datasets (surveys either with recurrent information collected over time or with a clustered sampling scheme). Last released in Mac, K; Win anticipated in September.

NewMDSX -- software for Multidimensional Scaling (MDS), a term that refers to a family of models where the structure in a set of data is represented graphically by the relationships between a set of points in a space. MDS can be used on a variety of data, using different models and allowing different assumptions about the level of measurement. SuperSurvey -- to design and implement surveys, and to acquire, manage and analyze data from surveys.

Optional Web Survey Module and Advanced Statistics Module (curve fitting, multiple regression, logistic regression, factor, analysis of variance, discriminant function, cluster, and canonical correlation). Free version is limited to 1 survey, 10 questions, 25 total responses. Rasch Measurement Software -- deals with the various nuances of constructing optimal rating scales from a number of (usually dichotomous) measurements, such as responses to questions in a survey or test.

These may be freely downloaded, used, and distributed, and they do not expire. This Excel spreadsheet converts confidence intervals to p values, and this PDF file explains its background and use. RegressIt -- an Excel add-in for teaching and applied work.

Performs multivariate descriptive analysis and ordinary linear regression. Creates presentation-quality charts in native editable Excel format, intelligently formatted tables, high quality scatterplot matrices, parallel time series plots of many variables, summary statistics, and correlation matrices. Easily explore variations on models, apply nonlinear and time transformations to variables, test model assumptions, and generate out-of-sample forecasts.
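The confidence-interval-to-p-value conversion mentioned above for the Excel spreadsheet can be illustrated with a minimal Python sketch. This is not the spreadsheet's own code: it assumes a 95% interval, an approximately normal estimate, a null value of zero, and an interval that is symmetric on the scale supplied (for ratios, pass logarithms). The function name `p_value_from_ci` is made up for this illustration.

```python
import math

def p_value_from_ci(estimate, lower, upper, z_crit=1.96):
    """Two-sided p value for H0: effect = 0, given an estimate and its CI.

    Assumption: normal approximation; z_crit=1.96 corresponds to a 95% CI.
    """
    # Recover the standard error from the width of the interval.
    se = (upper - lower) / (2 * z_crit)
    z = abs(estimate) / se
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)) for the standard normal.
    return math.erfc(z / math.sqrt(2))

# An interval that just touches zero gives a p value of about 0.05.
p_borderline = p_value_from_ci(1.96, 0.0, 3.92)
```

For odds ratios or risk ratios, take the natural log of the estimate and both limits before calling the function, since the interval is symmetric only on the log scale.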

SimulAr -- provides a very elegant point-and-click graphical interface that makes it easy to generate random variables (correlated or uncorrelated) from twenty different distributions, run Monte-Carlo simulations, and generate extensive tabulations and elegant graphical displays of the results.

EZAnalyze -- enhances Excel (Mac and PC) by adding "point and click" functionality for analyzing data and creating graphs (no formula entry required). Does all basic "descriptive statistics" (mean, median, standard deviation, and range), and "disaggregates" data (breaks it down by categories), with results shown as tables or "disaggregation graphs". Advanced features: correlation; one-sample, independent samples, and paired samples t-tests; chi square; and single factor ANOVA.

Update Available! EZ-R Stats -- supports a variety of analytical techniques, such as: Benford's law, univariate stats, cross-tabs, histograms. Simplifies the analysis of large volumes of data, enhances audit planning by better characterizing data, identifies potential audit exceptions and facilitates reporting and analysis. Marko Lucijanic's Excel spreadsheet to perform Log Rank test on survival data, and his article.
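Benford's law, listed above among the techniques in EZ-R Stats, says that in many naturally occurring data sets the leading digit d appears with probability log10(1 + 1/d). The Python sketch below is not taken from EZ-R Stats; the function names are hypothetical, and it only compares observed first-digit shares with the expected ones.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(values):
    """Observed share of each leading digit among the non-zero values."""
    # Scientific notation puts the leading significant digit first.
    # Caveat: values within rounding distance of a power of ten may round up.
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    if not digits:
        return {d: 0.0 for d in range(1, 10)}
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Powers of two are a classic sequence that follows Benford's law closely.
expected = benford_expected()
observed = first_digit_shares([2 ** k for k in range(200)])
```

In an audit setting, large gaps between `observed` and `expected` flag digits worth a closer look; they are a screening signal, not proof of an exception.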

SSC-Stat -- an Excel add-in designed to strengthen those areas where the spreadsheet package is already strong, principally in the areas of data management, graphics and descriptive statistics. SSC-Stat is especially useful for datasets in which there are columns indicating different groups.

Menu features within SSC-Stat can:. Each spreadsheet gives a graph of the distribution, along with the value of various parameters, for whatever shape and scale parameters you specify. You can also download a file containing all 22 spreadsheets.

Sample-size calculator for cluster randomized controlled trials, in which the outcomes are not completely independent of each other. The usual independence assumption is violated in cluster randomized trials because subjects within any one cluster are more likely to respond in a similar manner. A measure of this similarity is known as the intra-cluster correlation coefficient (ICC).

Because of the lack of independence, sample sizes have to be increased. This web site contains two tools to aid the design of cluster trials: a database of ICCs and a sample size calculator, along with instruction manuals. Exact confidence intervals for samples from the Binomial and Poisson distributions -- an Excel spreadsheet with several built-in functions for calculating probabilities and confidence intervals. Smith, of Virginia Tech.
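The sample-size increase for cluster trials described above is commonly expressed through the design effect, 1 + (m - 1) x ICC, where m is the cluster size. The Python sketch below applies that standard formula under the assumption of equal cluster sizes; it is not the code of the calculator listed here, and the function names are illustrative only.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation factor for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def cluster_trial_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size for clustering.

    Returns (total subjects, number of clusters), both rounded up.
    """
    inflated = n_individual * design_effect(cluster_size, icc)
    # Round first so floating-point noise (e.g. 390.00000000000006)
    # does not push the ceiling up by one subject.
    n_total = math.ceil(round(inflated, 6))
    n_clusters = math.ceil(n_total / cluster_size)
    return n_total, n_clusters

# 200 subjects under individual randomization, clusters of 20, ICC of 0.05.
n_total, n_clusters = cluster_trial_sample_size(200, 20, 0.05)
```

Even a small ICC matters when clusters are large: with 20 subjects per cluster and an ICC of 0.05, the required sample nearly doubles.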

A user-friendly add-in for Excel to draw a biplot display (a graph of row and column markers from data that forms a two-way table) based on results from principal components analysis, correspondence analysis, canonical discriminant analysis, metric multidimensional scaling, redundancy analysis, canonical correlation analysis or canonical correspondence analysis. Allows for a variety of transformations of the data prior to the singular value decomposition and scaling of the markers following the decomposition.

Lifetable -- does a full abridged current life table analysis to obtain the life expectancy of a population. From the Downloads section of the QuantitativeSkills web site. A third spreadsheet concerns a method for two clusters by Donner and Klar. You will have to insert your own data by overwriting the tables in the second (total number of positive responses) and third (total number of negative responses) or fourth column (total number).

A step-by-step guide to data analysis with separate workbooks for handling data with different numbers and types of variables. XLStatistics is not an Excel add-in and all the working and code is visible. A free version for analysis of 1- and 2-variable data is available.

XLSTAT -- an Excel add-in for PC and Mac that holds more than statistical features including data visualization, multivariate data analysis, modeling, machine learning, statistical tests as well as field-oriented solutions: features for sensory data analysis (preference mapping), time series analysis (forecasting), marketing (conjoint analysis, PLS structural equation modeling), biostatistics (survival analysis, OMICs data analysis) and more.

It proposes a free day trial of all features as well as a free version. Statistics -- executes programs written in the easy-to-learn Resampling Stats statistical simulation language. You write a short, simple program in the language, describing the process behind a probability or statistics problem.

Statistics then executes your Resampling Stats model thousands of times, each time with different random numbers or samples, keeping track of the results. When the program completes, you have your answer. Runs on Windows, Mac, Linux -- any system that supports Java.

R -- a programming language and environment for statistical computing and graphics. Similar to S or S-plus (will run most S code unchanged). Provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques. Well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed.
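The simulation approach described above for Resampling Stats (describe the process, run it thousands of times with fresh random numbers, and tally the results) can be sketched in a few lines. The example below is written in Python rather than in the Resampling Stats language, and the problem, the function names and the seed are all chosen purely for illustration: it estimates the chance of seeing at least 7 heads in 10 fair coin flips.

```python
import random

def simulate_probability(trial, n_trials=10_000, seed=12345):
    """Run `trial` n_trials times and return the share of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_trials) if trial(rng))
    return hits / n_trials

def at_least_seven_heads(rng):
    """One trial: flip 10 fair coins and report whether 7 or more are heads."""
    heads = sum(rng.random() < 0.5 for _ in range(10))
    return heads >= 7

estimate = simulate_probability(at_least_seven_heads)
# The exact binomial answer is 176/1024 = 0.171875; the estimate should be close.
```

Only the trial function changes from problem to problem; the loop that repeats it and keeps score stays the same.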

The R environment includes:. Review and comparison of R graphical user interfaces -- a number of graphical user interfaces (GUIs) allow you to use R by menu instead of by programming. Written by Robert A. Detailed reviews of R graphical user interfaces -- also by Robert A.

RStudio -- a set of integrated tools designed to help you be more productive with R.

It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management. Integrated development environment: access RStudio locally; syntax highlighting, code completion, and smart indentation; execute R code directly from the source editor; quickly jump to function definitions; easily manage multiple working directories using projects; integrated R help and documentation; interactive debugger to diagnose and fix errors quickly; extensive package development tools. RStudio Server: access via a web browser; move computation closer to the data; scale compute and RAM centrally. Shiny: a web application framework for R; turn your analyses into interactive web applications.

R-Instat -- a free, open source statistical software that is easy to use, even with low computer literacy. This software is designed to support improved statistical literacy in Africa and beyond, through work undertaken primarily within Africa.

A lot of statistical functions. There is a free version and a commercial version. They both have the same statistical functions.

The commercial version offers technical support.

Zelig -- an add-on for R that can estimate, help interpret, and present the results of a large range of statistical methods. It translates hard-to-interpret coefficients into quantities of interest; combines multiply imputed data sets to deal with missing data; automates bootstrapping for all models; uses sophisticated nonparametric matching commands which improve parametric procedures; allows one-line commands to run analyses in all designated strata; automates the creation of replication data files so that you or anyone else can replicate the results of your analyses (hence satisfying the replication standard); makes it easy to evaluate counterfactuals; and allows conditional population and superpopulation inferences.

It includes many specific methods, based on likelihood, frequentist, Bayesian, robust Bayesian, and nonparametric theories of inference. Zelig comes with detailed, self-contained documentation that minimizes startup costs for Zelig and R, automates graphics and summaries for all models, and, with only three simple commands required, generally makes the power of R accessible for all users.

Zelig also works well for teaching, and is designed so that scholars can use the same program with students that they use for their research. Apophenia -- a statistics library for C. Octave -- a high-level mathematical programming language, similar to MATLAB, for numerical computations -- solving common numerical linear algebra problems, finding the roots of nonlinear equations, integrating ordinary functions, manipulating polynomials, and integrating ordinary differential and differential-algebraic equations.

J -- a modern, high-level, general-purpose, high-performance programming language. J runs both as a GUI and in a console command line. J is particularly strong in the mathematical, statistical, and logical analysis of arrays of data. J systems have:. DataMelt -- free software for numeric computation, mathematics, statistics, symbolic calculations, data analysis and data visualization. This is script or programming run. O-Matrix -- an extensive matrix manipulation system for Windows with lots of statistical capability.

The "Light" version can be freely downloaded and tried for 30 days. Some capabilities include:. Plots exportable to word processors, spreadsheets, etc. Plot Types: line, contour, surface, mesh, bar, stair, polar, vector, error bar, smith charts, and histogram; line plots can contain unlimited points per curve and hundreds of curves per plot; two- and three-dimensional plotting is supported which provides additional flexibility with contours and surface plots; multiple colors, markers, and line types.

OxMetrics -- an object-oriented matrix programming language with a comprehensive mathematical and statistical function library. Matrices can be used directly in expressions, for example to multiply two matrices, or to invert a matrix. The major features of Ox are its speed, extensive library, and well-designed syntax, which leads to programs which are easier to maintain.

Versions of Ox are available for many platforms. The "Console" version can be freely downloaded for academic and research use; the "Professional" version must be purchased. Divide code into manageable sections that can be run independently. View output and visualizations next to the code that produced them.

ILNumerics -- a numerical library for .NET that turns C# into a first-class mathematical language. It offers both scientists and software developers convenient syntax (similar to Matlab), toolboxes for statistical functions and machine learning, high performance, wide platform support and 2D and 3D visualization features.

There's a free "Community" edition and a pay-for "Professional" edition. Both have the same features and capabilities; they differ in how you would re-distribute them in your own software products.

Scripts and Macros: Completely Free

Miscellaneous: Completely Free

IND -- creation and manipulation of decision trees from data. For supervised classification and prediction in artificial intelligence and statistical pattern recognition. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which hopefully has good prediction of classes on new data.

IND improves on standard algorithms and introduces Bayesian and MML methods, producing more accurate class probability estimates that are important in applications like diagnosis. For UNIX systems. Currently available only in beta-test mode, and only to US citizens. Add descriptions to images, re-size photos for efficient e-mail transmission, print high-quality copies, display slide-shows, publish web-galleries, safe-keep images on CD or DVD.

SmartUpdate feature checks for new versions. Has a web-board for user-to-user help. A toolbox of Matlab ver. Tools are provided for analysis of measured data with routines for estimation of parameters in statistical distributions, estimation of spectra, plotting in probability papers, etc. Has routines for theoretical distributions of characteristic wave parameters from observed or theoretical power spectra of the sea. Another part is related to statistical analysis of fatigue.

The theoretical density of rainflow cycles can be computed from parameters of random loads. Routines are included for modelling of switching loads (hidden Markov models). Also contains general statistical tools.

CoPlot 6. From CoHort Software. Creates precise technical drawings using drawing objects, genetic maps, field maps, flow charts, apparatus diagrams, circuit diagrams, chemical structures, etc. Text in drawing objects and graphs can include HTML-like text formatting tags and over special characters. Supports animated graphs. Exports graphs to. Invoke CoPlot from the command line, batch files, shell scripts, pipes, and other programs.

Can be used as a graphics server program on a web site. Free time-limited demo version available. Other Links to Collections of Free Software:. Sections of the StatPages. Free, but it is not yet available for Windows. InVivoStat is a free-to-use statistical Windows program which uses R as its statistics engine. Supports over 1 billion cases and over 1 billion variables. Choice of terminal or graphical user interface; choice of text, postscript or html output formats.

Inter-operates with Gnumeric, OpenOffice.org and other free software. Easy data import from spreadsheets, text files and database sources. Fast statistical procedures, even on very large data sets. Fully indexed user manual. Cross platform; runs on many different computers and many different operating systems. Statistics Manually -- an Android mobile app.

This app contains a large collection of formulas of statistical methods common in the social sciences as well as the statistical tables needed to interpret your test results. You will find the formulas of these tests as well. Easy to use, modern interface. Specify an analysis in three simple steps within a single dialog. SYSTAT -- powerful statistical software ranging from the most elementary descriptive statistics to very advanced statistical methodology.

Novices can work with its friendly and simple menu-dialog; statistically-savvy users can use its intuitive command language. Carry out very comprehensive analysis of univariate and multivariate data based on linear, general linear, and mixed linear models; carry out different types of robust regression analysis when your data are not suitable for conventional multiple regression analysis; compute partial least-squares regression; design experiments, carry out power analysis, do probability calculations on many distributions and fit them to data; perform matrix computations.

A day evaluation version is available for free download. WinSPC day free trial -- statistical process control software to: collect quality data from devices, shop-floor machines, data sources, other software systems, or via keyboard; monitor plant-wide operations from a single screen, and initiate corrective actions for out-of-control processes (trigger alarm, send email, page an operator, or shut down an out-of-control machine); perform statistical analysis to solve problems, optimize processes, and create quality reports.

CurveExpert -- comprehensive curve fitting system for Windows. Handles linear regression models, nonlinear regression models, interpolation, or splines. Over 30 models built-in; custom user-defined regression models.

Each row of the table represents an iris flower, including its species and dimensions of its botanical parts, sepal and petal, in centimeters.

The python code is below: This dataset (Fisher iris data) is included in the free trial offered by Penny Analytics, who run an online outlier detection service. You will need to download their version of the dataset to be sure to get the free pricing. Congrats aturner-ca for getting free advertising! Nice spamming skills. The comment seems to me to be about Penny Analytics, and not really about the dataset.

I'd amend this by adding the column names inline, because the above approach removes the first row making it instead of If I want to see the mean of petal length of iris setosa. Looks like there are rows including the header row, so records, which is in line with the upstream dataset.
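The two points raised in the comments above (supplying column names inline so that the first data row is not consumed as a header, and taking the mean petal length for iris setosa) can be sketched with the standard library alone. The column names, the `Iris-setosa` style labels and the four sample rows below are assumptions made for illustration; the actual file may use a header row and different labels.

```python
import csv
import io
from statistics import mean

# Hypothetical column names, supplied inline because the sample has no header row.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# A few illustrative rows in a headerless layout (not the full dataset).
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

def read_headerless_csv(text, columns):
    """Parse CSV text, treating every line as data and naming the columns."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=columns)
    return list(reader)

def mean_by_species(rows, column, species):
    """Mean of a numeric column over the rows of one species."""
    return mean(float(row[column]) for row in rows if row["species"] == species)

rows = read_headerless_csv(SAMPLE, COLUMNS)
setosa_petal_length = mean_by_species(rows, "petal_length", "Iris-setosa")
```

Because `fieldnames` is passed explicitly, `csv.DictReader` keeps the first line as a record instead of using it as the header, so no row is lost.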


