Plots With R
- Generic function for plotting R objects. For more details about the graphical parameter arguments, see par. For simple scatter plots, plot.default will be used. However, there are plot methods for many R objects, including functions, data.frames, density objects, etc. Use methods(plot) and the documentation for these.
- Combining Plots. R makes it easy to combine multiple plots into one overall graph, using either the par() or layout() function. With the par() function, you can include the option mfrow=c(nrows, ncols) to create a matrix of nrows x ncols plots that are filled in by row.
The plot function in R is used to create line graphs. The basic syntax to create a line chart in R is plot(v, type, col, xlab, ylab), where v is a vector containing the numeric values.
We look at some of the ways R can display information graphically. This is a basic introduction to some of the basic plotting commands. It is assumed that you know how to enter data or read data files, which is covered in the first chapter, and that you are familiar with the different data types.
Scatterplots show many points plotted in the Cartesian plane. Each point represents the values of two variables. One variable is placed on the horizontal axis and the other on the vertical axis.
The simple scatterplot is created using the plot() function.
Syntax
The basic syntax for creating a scatterplot in R is plot(x, y, main, xlab, ylab, xlim, ylim, axes).
Following is the description of the parameters used −
x is the data set whose values are the horizontal coordinates.
y is the data set whose values are the vertical coordinates.
main is the title of the graph.
xlab is the label in the horizontal axis.
ylab is the label in the vertical axis.
xlim is the limits of the values of x used for plotting.
ylim is the limits of the values of y used for plotting.
axes indicates whether both axes should be drawn on the plot.
Example
We use the data set 'mtcars' available in the R environment to create a basic scatterplot. Let's use the columns 'wt' and 'mpg' in mtcars.
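A minimal sketch of this step, assuming only the built-in mtcars data frame. The variable name input is an illustrative choice, not something fixed by R.

```r
# Select the weight (wt) and miles-per-gallon (mpg) columns of mtcars.
input <- mtcars[, c("wt", "mpg")]

# Print the first six rows to check the selection.
print(head(input))
```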
Printing the first few rows of these two columns with head() shows the weight (in units of 1000 lbs) and the miles per gallon of each car.
Creating the Scatterplot
The below script will create a scatterplot graph for the relation between wt(weight) and mpg(miles per gallon).
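One possible version of such a script is sketched here. The axis labels, title, and axis limits are illustrative assumptions; the limits were chosen so that every car in mtcars falls inside the plotting region.

```r
# Weight and mileage columns of the built-in mtcars data.
input <- mtcars[, c("wt", "mpg")]

# Scatterplot of mileage against weight with explicit labels and limits.
plot(x = input$wt, y = input$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     xlim = c(1.5, 5.5),
     ylim = c(10, 35),
     main = "Weight vs Mileage")
```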
The resulting plot shows mileage on the vertical axis against weight on the horizontal axis. In mtcars, heavier cars tend to have lower mileage.
Scatterplot Matrices
When we have more than two variables and want to examine the relationship between each variable and the remaining ones, we use a scatterplot matrix. The pairs() function creates matrices of scatterplots.
Syntax
The basic syntax for creating scatterplot matrices in R is pairs(formula, data).
Following is the description of the parameters used −
formula represents the series of variables used in pairs.
data represents the data set from which the variables will be taken.
Example
Each variable is paired up with each of the remaining variables. A scatterplot is plotted for each pair.
The output is a grid of panels, one scatterplot per pair of variables.
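A short sketch of the formula interface, assuming the mtcars data again. The choice of the four columns wt, mpg, disp, and cyl is only an illustration.

```r
# Columns to compare; any numeric columns of mtcars would do.
vars <- c("wt", "mpg", "disp", "cyl")

# One-sided formula: every listed variable is plotted against every other.
pairs(~ wt + mpg + disp + cyl, data = mtcars,
      main = "Scatterplot Matrix")
```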
We look at some more options for plotting, and we assume that you are familiar with the basic plotting commands (Basic Plots). A variety of different subjects ranging from plotting options to the formatting of plots is given.
In many of the examples below we use some of R’s commands to generate random numbers according to various distributions. The material is divided into three sections. The focus of the first section is on graphing continuous data. The focus of the second section is on graphing discrete data. The third section offers some miscellaneous options that are useful in a variety of contexts.
In the examples below a data set is defined using R’s normally distributed random number generator.
One common task is to plot multiple data sets on the same plot. In many situations the way to do this is to create the initial plot and then add additional information to the plot. For example, to plot bivariate data the plot command is used to initialize and create the plot. The points command can then be used to add additional data sets to the plot.
First define a set of normally distributed random numbers and then plot them. (This same data set is used throughout the examples below.)
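A minimal sketch of this pattern. The sample sizes, the linear relationship, the labels, and the fixed seed are all assumptions made for illustration.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# First data set: normally distributed x with a noisy linear response.
x <- rnorm(10, mean = 20, sd = 5)
y <- 2.5 * x - 1.0 + rnorm(10, mean = 0, sd = 9)

# plot() creates the figure.
plot(x, y, xlab = "Independent", ylab = "Dependent", main = "Random Stuff")

# Second data set, added to the existing plot in colour 2.
x1 <- runif(8, min = 15, max = 25)
y1 <- 2.5 * x1 - 1.0 + runif(8, min = -6, max = 6)
points(x1, y1, col = 2)
```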
Note that in the previous example, the colour for the second set of data points is set using the col option. You can try different numbers to see what colours are available. For most installations there are at least eight options from 1 to 8. Also note that in the example above the points are plotted as circles. The symbol that is used can be changed using the pch option.
Again, try different numbers to see the various options. Another helpful option is to add a legend. This can be done with the legend command. The options for the command, in order, are the x and y coordinates on the plot to place the legend followed by a list of labels to use. There are a large number of other options so use help(legend) to see more options. For example a list of colors can be given with the col option, and a list of symbols can be given with the pch option.
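The pch and legend options might be combined as in the sketch below. The data, the labels "Original" and "New", and the legend position (the upper left corner of the first data set's range) are illustrative assumptions.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

x  <- rnorm(10, mean = 20, sd = 5)
y  <- 2.5 * x - 1.0 + rnorm(10, mean = 0, sd = 9)
x1 <- runif(8, min = 15, max = 25)
y1 <- 2.5 * x1 - 1.0 + runif(8, min = -6, max = 6)

plot(x, y, xlab = "Independent", ylab = "Dependent", main = "Random Stuff")

# pch = 3 draws the second data set as crosses instead of circles.
points(x1, y1, col = 2, pch = 3)

# x and y position first, then the labels, then matching colours and symbols.
legend(min(x), max(y), c("Original", "New"), col = c(1, 2), pch = c(1, 3))
```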
Figure 1.
Another common task is to change the limits of the axes to change the size of the plotting area. This is achieved using the xlim and ylim options in the plot command. Both options take a vector of length two that holds the minimum and maximum values.
Another common task is to add error bars to a set of data points. This can be accomplished using the arrows command. The arrows command takes two pairs of coordinates, that is two pairs of x and y values. The command then draws a line between each pair and adds an “arrow head” with a given length and angle.
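A sketch of error bars drawn with arrows. The bar lengths are made-up random offsets, not real measurement errors, and the variable names are hypothetical.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

x <- rnorm(10, mean = 20, sd = 5)
y <- 2.5 * x - 1.0 + rnorm(10, mean = 0, sd = 9)

# Artificial upper and lower ends of each bar (random noise around y).
yHigh <- y + abs(rnorm(10, sd = 3.5))
yLow  <- y - abs(rnorm(10, sd = 3.1))

# Leave room on the vertical axis for the full length of the bars.
plot(x, y, ylim = range(c(yLow, yHigh)),
     xlab = "Independent", ylab = "Dependent")

# angle = 90 turns the arrow heads into flat caps; code = 3 caps both ends.
arrows(x, yHigh, x, yLow, col = 2, angle = 90, length = 0.1, code = 3)
```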
Figure 2.
Note that the option code is used to specify where the bars are drawn. Its value can be 1, 2, or 3. If code is 1 the bars are drawn at pairs given in the first argument. If code is 2 the bars are drawn at the pairs given in the second argument. If code is 3 the bars are drawn at both.
In the previous example a little bit of “noise” was added to the pairs to produce an artificial offset. This is a common thing to do for making plots. A simpler way to accomplish this is to use the jitter command.
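The effect of jitter is easiest to see on discrete data, where many points overlap. The sketch below assumes two hypergeometric samples with hypothetical names and draws the raw and jittered versions side by side using par(mfrow).

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# Two discrete samples; many (x, y) pairs coincide exactly.
numberWhite   <- rhyper(400, 4, 5, 3)
numberChipped <- rhyper(400, 2, 7, 3)

# One row, two columns of plots; keep the old settings to restore later.
op <- par(mfrow = c(1, 2))
plot(numberWhite, numberChipped,
     xlab = "Number White", ylab = "Number Chipped", main = "Without jitter")
plot(jitter(numberWhite), jitter(numberChipped),
     xlab = "Number White", ylab = "Number Chipped", main = "With jitter")
par(op)
```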
Figure 3.
Note that a new command was used in the previous example. The par command can be used to set different parameters. In the example above the mfrow was set. The plots are arranged in an array where the default number of rows and columns is one. The mfrow parameter is a vector with two entries. The first entry is the number of rows of images. The second entry is the number of columns. In the example above the plots were arranged in one row with two plots across.
Figure 4.
There are times when you do not want to plot specific points but wish to plot a density. This can be done using the smoothScatter command.
Figure 5.
Note that the previous example may benefit by superimposing a grid to help delimit the points of interest. This can be done using the grid command.
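A sketch combining the density plot and the grid. The data are made up, and smoothScatter is assumed to be usable, which requires the recommended KernSmooth package that ships with most R installations.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# A large correlated sample, where individual points would overplot.
x <- rnorm(2000)
y <- 2 * x + rnorm(2000)

# Smoothed two-dimensional density instead of individual points.
smoothScatter(x, y)

# Overlay a grid with 4 cells across and 3 cells down.
grid(4, 3)
```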
There are times that you want to explore a large number of relationships. A number of relationships can be plotted at one time using the pairs command. The idea is that you give it a matrix or a data frame, and the command will create a scatter plot of all combinations of the data.
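A minimal sketch with a made-up data frame of five related variables. The column names and the relationships between them are assumptions chosen only to give the panels some structure.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

u <- rnorm(20)
v <- rnorm(20, mean = 5)
d <- data.frame(u = u,
                v = v,
                w = u + 2 * v + rnorm(20, sd = 0.5),
                x = -2 * u + rnorm(20, sd = 0.1),
                y = 3 * v + rnorm(20, sd = 2.5))

# One scatter plot for every pair of columns.
pairs(d)
```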
Figure 5.
A shaded region can be plotted using the polygon command. The polygon command takes a pair of vectors, x and y, and shades the region enclosed by the coordinate pairs. In the example below a blue square is drawn. The vertices are defined starting from the lower left. Five pairs of points are given because the starting point and the ending point are the same.
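A sketch of the blue square. The side length of two units is an arbitrary choice.

```r
# Vertices from the lower left, counter-clockwise, ending where they began.
x <- c(-1,  1, 1, -1, -1)
y <- c(-1, -1, 1,  1, -1)

# type = "n" sets up the axes without drawing the points themselves.
plot(x, y, type = "n")
polygon(x, y, col = "blue")
```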
A more complicated example is given below. In this example the rejection region for a right sided hypothesis test is plotted, and it is shaded in red. A set of custom axes is constructed, and symbols are plotted using the expression command.
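One way such a figure could be built is sketched here. The standard deviation of 0.75, the 5% significance level, and the variable names are assumptions for illustration.

```r
stdDev <- 0.75                         # assumed standard deviation
x <- seq(-5, 5, by = 0.01)
y <- dnorm(x, sd = stdDev)
right <- qnorm(0.95, sd = stdDev)      # critical value for a 5% right tail

# Draw the density curve with no axes; they are added by hand below.
plot(x, y, type = "l", axes = FALSE, ylab = "p",
     xlab = expression(paste("Assumed distribution of ", bar(x))))

# Horizontal axis on the line y = 0, labelled with mathematical symbols.
axis(1, at = c(-5, 0, right, 5), pos = 0,
     labels = expression("", mu[0], bar(x)[cr], ""))
axis(2)

# Shade the region under the curve to the right of the critical value.
xReject <- seq(right, 5, by = 0.01)
yReject <- dnorm(xReject, sd = stdDev)
polygon(c(xReject, max(xReject), right), c(yReject, 0, 0), col = "red")
```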
Figure 6.
The axes are drawn separately. This is done by first suppressing the plotting of the axes in the plot command, and the horizontal axis is drawn separately. Also note that the expression command is used to plot a Greek character and also produce subscripts.
Finally, a brief example of how to plot a surface is given. The persp command will plot a surface with a specified perspective. In the example, a grid is defined by multiplying a row and column vector to give the x and then the y values for a grid. Once that is done a sine function is specified on the grid, and the persp command is used to plot it.
The %*% notation is used to perform matrix multiplication.
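A sketch of the surface plot. The grid size of 101 points, the function sin(x*y), and the viewing angles are illustrative choices.

```r
n <- 101
x <- seq(0, 2 * pi, length.out = n)
y <- x

# Outer products with a vector of ones spread x down rows and y across columns.
xg <- x %*% t(rep(1, n))    # xg[i, j] equals x[i]
yg <- rep(1, n) %*% t(y)    # yg[i, j] equals y[j]

# Evaluate the function on the grid and draw the surface.
f <- sin(xg * yg)
persp(x, y, f, theta = -10, phi = 40)
```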
In the examples below a data set is defined using R’s hypergeometric random number generator.
The plot command will try to produce the appropriate plots based on the data type. The data that is defined above, though, is numeric data. You need to convert the data to factors to make sure that the plot command treats it in an appropriate way. The as.factor command is used to cast the data as factors and ensures that R treats it as discrete data.
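A minimal sketch, assuming a hypergeometric sample of 30 draws stored under the hypothetical name numberWhite.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# 30 experiments: draw 3 balls from an urn with 4 white and 5 black balls.
numberWhite <- rhyper(30, 4, 5, 3)

# Cast to a factor so plot() treats the values as categories.
numberWhite <- as.factor(numberWhite)
plot(numberWhite)
```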
In this case R will produce a barplot. The barplot command can also be used to create a barplot. The barplot command requires a vector of heights, though, and you cannot simply give it the raw data. The frequencies for the barplot command can be easily calculated using the table command.
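A sketch of the table and barplot combination. The sample and the title and axis labels are illustrative assumptions.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

numberWhite <- rhyper(30, 4, 5, 3)

# table() counts how often each value occurs; barplot() draws the counts.
totals <- table(numberWhite)
barplot(totals, main = "Number Draws", ylab = "Frequency", xlab = "Draws")
```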
In the previous example the barplot command is used to set the title for the plot and the labels for the axes. The labels on the ticks for the horizontal axis are automatically generated using the labels on the table. You can change the labels by setting the row names of the table.
The order of the frequencies is the same as the order in the table. If you change the order in the table it will change the way it appears in the barplot. For example, if you wish to arrange the frequencies in descending order you can use the sort command with the decreasing option set to TRUE.
The indexing features of R can be used to change the order of the frequencies manually.
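Both kinds of reordering are sketched below on a made-up sample. Reversing the table with rev(seq_along(...)) is just one example of manual indexing.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

totals <- table(rhyper(30, 4, 5, 3))

# Descending order of frequency.
sorted <- sort(totals, decreasing = TRUE)
barplot(sorted, main = "Sorted by frequency")

# Manual order via indexing: here the categories are simply reversed.
reversed <- totals[rev(seq_along(totals))]
barplot(reversed, main = "Reversed order")
```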
The barplot command returns the horizontal locations of the bars. Using the locations and putting together the previous ideas a Pareto Chart can be constructed.
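A sketch of a Pareto chart built from those pieces. The sample, labels, and colours are illustrative assumptions.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# Frequencies in descending order, as a Pareto chart requires.
totals <- sort(table(rhyper(30, 4, 5, 3)), decreasing = TRUE)

# barplot() returns the horizontal midpoints of the bars.
xLoc <- barplot(totals, main = "Pareto Chart",
                ylab = "Frequency", xlab = "Draws",
                ylim = c(0, 1.05 * sum(totals)))

# Cumulative frequencies drawn above the bars at the same locations.
cumulative <- cumsum(as.vector(totals))
lines(xLoc, cumulative, type = "b", col = "red")
```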
Mosaic plots are used to display proportions for tables that are divided into two or more conditional distributions. Here we focus on two way tables to keep things simpler. It is assumed that you are familiar with using tables in R (see the section on two way tables for more information: Two Way Tables).
Here we will use a made up data set primarily to make it easier to figure out what R is doing. The fictitious data set is defined below. The idea is that sixteen children of age eight are interviewed. They are asked two questions. The first question is, “do you believe in Santa Claus.” If they say that they do then the term “belief” is recorded, otherwise the term “no belief” is recorded. The second question is whether or not they have an older brother, older sister, or no older sibling. (We are keeping it simple here!) The answers that are recorded are “older brother,” “older sister,” or “no older sibling.”
The data is given as strings. If it is stored in a data frame, versions of R before 4.0.0 convert the strings to factors automatically; from 4.0.0 onward set stringsAsFactors=TRUE or use as.factor so that R treats them as categorical data. If you plot the individual factors, the plot command will default to producing barplots.
If you provide both data sets it will automatically produce a mosaic plot which demonstrates the relative frequencies in terms of the resulting areas.
The mosaicplot command can be called directly.
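A sketch using made-up answers for the sixteen children. The particular values and the data frame name santa are assumptions, and stringsAsFactors = TRUE is set explicitly so the columns are factors on any R version.

```r
# Fictitious survey answers (illustrative values only).
santa <- data.frame(
  belief = c(rep("no belief", 4), rep("belief", 6), rep("no belief", 2),
             rep("belief", 2), rep("no belief", 2)),
  sibling = c("older brother", "older brother", "older brother", "older sister",
              "no older sibling", "no older sibling", "no older sibling", "older sister",
              "older brother", "older sister", "older brother", "older sister",
              "no older sibling", "older sister", "older brother", "no older sibling"),
  stringsAsFactors = TRUE)

# A single factor gives a barplot.
plot(santa$belief)

# Two factors give an area-based plot of the conditional proportions.
plot(santa$sibling, santa$belief)

# The same information via an explicit two way table.
tab <- table(santa$sibling, santa$belief)
mosaicplot(tab, main = "Belief by sibling", xlab = "Sibling", ylab = "Belief")
```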
The colours of the plot can be specified by setting the col argument. The argument is a vector of colours used for the rows. See Figure 7 for an example.
Figure 7.
The labels and the order that they appear in the plot can be changed in exactly the same way as given in the examples for barplot above.
When changing the order keep in mind that the table is a two dimensional array. The indices must include both rows and columns, and the transpose command (t) can be used to switch how it is plotted with respect to the vertical and horizontal axes.
The previous examples only provide a slight hint at what is possible. Here we give some examples that provide a demonstration of the way the different commands can be combined and the options that allow them to be used together.
First, an example of a histogram with an approximation of the density function is given. In addition to the density function a horizontal boxplot is added to the plot with a rug representation of the data on the horizontal axis. The horizontal bounds on the histogram will be specified. The boxplot must be added to the histogram, and it will be raised above the histogram.
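One way to combine these pieces is sketched below. The exponential sample, the colours, and the vertical placement of the boxplot are assumptions. The histogram is drawn on the density scale so that the density curve lines up with the bars.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

x <- rexp(50, rate = 4)

# Find the tallest bar first so that room can be left above it.
top <- max(hist(x, plot = FALSE)$density)

hist(x, probability = TRUE,
     xlim = c(0, max(x)), ylim = c(0, 1.4 * top),
     main = "Histogram with density, boxplot and rug", xlab = "x")

# Kernel density estimate on top of the bars.
lines(density(x), col = "darkgreen")

# Horizontal boxplot added to the same plot, raised above the histogram.
boxplot(x, at = 1.25 * top, horizontal = TRUE, add = TRUE,
        boxwex = 0.15 * top, axes = FALSE)

# Tick marks for the individual observations along the bottom axis.
rug(x, side = 1)
```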
The dev commands allow you to create and manipulate multiple graphics windows. You can create new windows using the dev.new() command, and you can choose which one to make active using the dev.set() command. The dev.list() command lists the graphical devices that are open, and the dev.next() and dev.prev() commands return the number of the device after or before the current one.
In the following example three devices are created. They are listed, and different plots are created on the different devices.
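A sketch of that workflow. What dev.new() opens depends on the platform: an interactive session usually gets screen windows, while a script run non-interactively gets file devices such as PDF files. The plots themselves are arbitrary.

```r
# Number of devices already open before the sketch starts.
before <- length(dev.list())

# Open three new devices and list everything that is now open.
dev.new()
dev.new()
dev.new()
devs <- dev.list()
print(devs)
opened <- length(devs) - before

# Make each new device active in turn and draw something different on it.
newDevs <- tail(devs, 3)
dev.set(newDevs[1])
plot(1:10, main = "First device")
dev.set(newDevs[2])
hist(rnorm(100), main = "Second device")
dev.set(newDevs[3])
barplot(c(3, 1, 2), main = "Third device")

# Close all open devices.
graphics.off()
```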
There are a couple of ways to print a plot to a file. It is important to be able to work with graphics devices as shown in the previous subsection (Multiple Windows). The first way explored is to use the dev.print command. This command will print a copy of the currently active device, and the format is defined by the device argument.
In the example below, the current window is printed to a png file called “hist.png” that is 200 pixels wide.
To find out what devices are available on your system use the help command, for example help(Devices).
Another way to print to a file is to create a device in the same way as the graphical devices were created in the previous section. Once the device is created, the various plot commands are given, and then the device is turned off to write the results to a file.
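A sketch of the second approach. It uses the pdf device because that device is available on every R installation; png() or jpeg() follow the same pattern where supported. The file name and the temporary directory are assumptions.

```r
# Hypothetical output location inside the session's temporary directory.
outFile <- file.path(tempdir(), "hist.pdf")

# Open the file device, draw on it, then close it to write the file.
pdf(outFile, width = 5, height = 5)
hist(rnorm(100), main = "Written straight to a file")
dev.off()
```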
Basic annotation can be performed in the regular plotting commands. For example, there are options to specify labels on axes as well as titles. More options are available using the axis command.
Most of the primary plotting commands have an option to turn off the generation of the axes using the axes=FALSE option. The axes can be then added using the axis command which allows for a greater number of options.
In the example below a bivariate set of random numbers are generated and plotted as a scatter plot. The axes are added, but the horizontal axis is located in the center of the data rather than at the bottom of the figure. Note that the horizontal and vertical axes are added separately, and are specified using the first argument to the command. (Use help(axis) for a full list of options.)
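A sketch with made-up data. Placing the horizontal axis at the midpoint of the range of y and putting a tick at every integer are illustrative choices.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

x <- rnorm(10, mean = 0, sd = 4)
y <- 3 * x - 1 + rnorm(10, mean = 0, sd = 2)

# Tick positions at every integer covering the range of x.
ticks <- seq(floor(min(x)), ceiling(max(x)), by = 1)

# Suppress the default axes, then add each axis separately.
plot(x, y, axes = FALSE)
axis(1, pos = mean(range(y)), at = ticks)   # 1 = horizontal axis
axis(2)                                     # 2 = vertical axis
```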
In the previous example the at option is used to specify the tick marks.
When using the plot command the default behavior is to draw an axis as well as draw a box around the plotting area. The drawing of the box can be suppressed using the bty option. The value can be “o”, “l”, “7”, “c”, “u”, “]”, or “n”. (The lines drawn roughly look like the letter given except for “n” which draws no lines.) The box can be drawn later using the box command as well.
The par command can be used to set the default values for various parameters. A couple are given below. In the example below the default background is set to grey, no box will be drawn around the window, and the margins for the axes will be twice the normal size.
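A sketch of those three settings, assuming that the mex parameter is the intended way to double the margins. The previous values are saved and restored so that later plots are unaffected.

```r
# Grey background, no box, margin coordinates expanded by a factor of two.
op <- par(bg = "grey", bty = "n", mex = 2)

plot(1:10, main = "Custom defaults")

# Restore the previous settings.
par(op)
```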
Another common task is to place a text string on the plot. The text command takes a coordinate and a label, and it places the label at the given coordinate. The text command has options for setting the offset, size, font, and other options. In the example below the label “numbers!” is placed on the plot. Use help(text) to see more options.
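A sketch with made-up data. The position of the label (horizontally centred, just below the highest point) and the size and font settings are illustrative.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

x <- rnorm(10, mean = 0, sd = 4)
y <- 3 * x - 1 + rnorm(10, mean = 0, sd = 2)
plot(x, y)

# pos = 1 puts the label below the coordinate; cex and font set size and weight.
text(mean(range(x)), max(y), "numbers!", pos = 1, cex = 1.2, font = 2)
```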
The default text command will cut off any characters outside of the plot area. This behavior can be overridden using the xpd option.