---
layout: post
title: Introduction to R Data types
date: 2015-06-12
## Make sure to change these
published: true
sitemap: true
---

Everything in R is an object. There are five basic, or *“atomic”*, classes
of objects. The most basic object is a **vector**, which can contain
objects of the same class only. The one exception is a **list**, which
is represented as a vector but can contain objects of different classes;
indeed, that is usually how lists are used.

Objects
-------

-   character
-   numeric (real numbers)
-   integer
-   complex (3 - 4i)
-   logical (TRUE/FALSE)

Numbers are treated as double precision real numbers by default. To
specify an integer explicitly, use the **L** suffix (for example `1L`).
There are a couple of special numbers, **Inf** and **NaN**, which stand
for infinity and "not a number".
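
A minimal sketch of these points; the variable names are only illustrative:

```r
# Numeric literals are doubles unless the L suffix is used
a <- 1
b <- 1L
class(a)            # "numeric"
class(b)            # "integer"

# Inf and NaN are ordinary values that arithmetic can produce
pos_inf <- 1 / 0    # Inf
not_num <- 0 / 0    # NaN
is.finite(pos_inf)  # FALSE
is.nan(not_num)     # TRUE
```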

Objects in R can have attributes, such as:

-   names, dimnames
-   dimensions (matrices, arrays)
-   class
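
Attributes can be inspected with `attributes()` and `attr()`. A short sketch, using a small matrix and a named vector made up for the example:

```r
# A matrix carries a "dim" attribute
m <- matrix(1:6, nrow = 2)
attributes(m)     # a list whose $dim element is 2 3

# A named vector carries a "names" attribute
v <- c(first = 1, second = 2)
attributes(v)     # a list whose $names element is "first" "second"

# attr() reads (or sets) one attribute at a time
attr(m, "dim")    # 2 3
```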

Vectors
-------

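The `c()` function is used to create vectors from existing values, and
`vector()` creates a vector of a given class and length.

    x <- c(1, 3, 4, 7)                  # combine values into a numeric vector
    x <- vector("numeric", length = 2)  # numeric vector of length 2, filled with 0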

When objects of different classes are mixed in a vector, *coercion*
occurs so that every element in the vector is of the same class. Objects
can also be explicitly coerced using the `as.*` functions. If a
nonsensical coercion is attempted, the result is **NA**.

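    x <- 0:6
    class(x)         # "integer"
    as.numeric(x)    # 0 1 2 3 4 5 6
    as.logical(x)    # FALSE for 0, TRUE for every other value
    as.character(x)  # "0" "1" "2" "3" "4" "5" "6"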
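
Implicit coercion and a coercion that fails can be sketched as follows. The variable names are made up for the example, and `suppressWarnings()` is only there to silence the warning R emits when it introduces NAs:

```r
# Implicit coercion: mixed elements are converted to one common class
mixed_chr <- c(1.7, "a")     # number becomes the string "1.7"
mixed_num <- c(TRUE, 2)      # TRUE becomes 1
mixed_lgl <- c("a", TRUE)    # TRUE becomes the string "TRUE"
class(mixed_chr)             # "character"
class(mixed_num)             # "numeric"
class(mixed_lgl)             # "character"

# A coercion that makes no sense yields NA
bad <- suppressWarnings(as.numeric(c("a", "b", "c")))
bad                          # NA NA NA
```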

Lists
-----

Lists are very important in R. When a list is printed, the index of each
element is enclosed in double square brackets, and elements can be
accessed by their index in the list using the same notation.

    x <- list(1, "a", TRUE)
    x[[2]]
    # [1] "a"
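
As a small sketch of list access (the names `second` and `sub` are just for illustration): double brackets return the element itself, while single brackets return a shorter list.

```r
x <- list(1, "a", TRUE)

# Each element keeps its own class
sapply(x, class)    # "numeric" "character" "logical"

# Double brackets extract the element itself
second <- x[[2]]
class(second)       # "character"

# Single brackets return a list containing that element
sub <- x[2]
class(sub)          # "list"
length(sub)         # 1
```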

Matrices
--------

A matrix is a special vector in R with a *dimension attribute*, which is
itself an integer vector of length 2 *(nrow, ncol)*. Matrices are
constructed column-wise.

    m <- matrix(nrow = 2, ncol = 3)  # 2 x 3 matrix filled with NA
    dim(m)                           # 2 3
    attributes(m)                    # a list containing $dim
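
To see what column-wise construction means, here is a short sketch that fills a matrix with the values 1 to 6:

```r
# Values fill the first column top to bottom, then the second, and so on
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

m[2, 1]   # 2 (second row, first column)
m[1, 2]   # 3 (first row, second column)
```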

Matrices can also be created directly from vectors by adding a dimension
attribute.

    m <- 1:10
    dim(m) <- c(2, 5)  # reshape into 2 rows and 5 columns, filled column-wise

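Matrices can also be created by binding vectors together with the
functions `cbind()` (column binding) and `rbind()` (row binding).

    x <- 1:3
    y <- 10:12
    cbind(x, y)  # x and y become the columns of a 3 x 2 matrix
    rbind(x, y)  # x and y become the rows of a 2 x 3 matrix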
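
A quick way to check the result of binding is to look at the dimensions and the names that R derives from the variable names (the names `by_col` and `by_row` are only illustrative):

```r
x <- 1:3
y <- 10:12

by_col <- cbind(x, y)
by_row <- rbind(x, y)

dim(by_col)        # 3 2
dim(by_row)        # 2 3
colnames(by_col)   # "x" "y"
rownames(by_row)   # "x" "y"
```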

Factors
-------

Factors are used to represent categorical data and can be ordered or
unordered. One can think of a factor as an integer vector where each
integer has a *label*. Using factors with labels is better than using
integers because factors are self-describing.

The order of the levels can be set using the **levels** argument. This
can be important in linear modeling, because the first level is used as
the baseline level.

    x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
    table(x)           # counts per level: yes 3, no 2
    unclass(x)         # underlying integer codes: 1 1 2 1 2
    attr(x, "levels")  # "yes" "no"
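
The effect of the **levels** argument can be sketched by comparing it with the default, where levels are sorted alphabetically (the variable names here are made up for the example):

```r
answers <- c("yes", "yes", "no", "yes", "no")

# Default: levels are the sorted unique values, so "no" comes first
f_default <- factor(answers)
levels(f_default)          # "no" "yes"

# Explicit levels: "yes" becomes the first (baseline) level
f_explicit <- factor(answers, levels = c("yes", "no"))
levels(f_explicit)         # "yes" "no"
as.integer(f_explicit)     # 1 1 2 1 2
```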

Missing values
--------------

Missing values are denoted by **NA**, or by **NaN** for undefined
mathematical operations. NA values have a class too, so there are
integer NA, character NA, etc. A NaN value is also an NA value, but the
converse is not true.

    x <- c(1, 2, NA, 10, 3)
    is.na(x)   # FALSE FALSE TRUE FALSE FALSE
    is.nan(x)  # FALSE FALSE FALSE FALSE FALSE
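
The relationship between NA and NaN, and the typed NA constants that base R provides, can be sketched like this (the vector `y` is just an example):

```r
y <- c(1, NaN, NA, 4)
is.na(y)     # FALSE  TRUE  TRUE FALSE  (NaN counts as NA)
is.nan(y)    # FALSE  TRUE FALSE FALSE  (NA is not NaN)

# NA values come in different classes
class(NA)              # "logical"
class(NA_integer_)     # "integer"
class(NA_real_)        # "numeric"
class(NA_character_)   # "character"
```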

Data frames
-----------

Data frames are used to store tabular data. They are represented as a
special type of list in which every element has to have the same length.
Each element can be thought of as a column, and the length of each
element of the list is the number of rows. Unlike matrices, data frames
can store a different class of object in each column. Data frames have a
special attribute called *row.names*.

    x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
    x
    nrow(x)  # 4
    ncol(x)  # 2
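
The list-like nature of a data frame can be checked directly. This is a small sketch using the same columns as above, stored in a variable called `df` for the example:

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

is.list(df)         # TRUE: a data frame is a special kind of list
length(df)          # 2, the number of columns
sapply(df, class)   # foo is "integer", bar is "logical"
row.names(df)       # "1" "2" "3" "4" (default row names)
```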

Names
-----

R objects can have names, which is useful for writing readable code.

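    x <- 1:3
    names(x)                                       # NULL: no names yet
    x
    names(x) <- c("foo", "bar", "norf")
    x                                              # now printed with its names
    x <- list(a = 1, b = 2, c = 3)                 # lists can have names too
    x
    m <- matrix(1:4, nrow = 2, ncol = 2)
    dimnames(m) <- list(c("a", "b"), c("c", "d"))  # row names, then column names
    m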
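
Once names are set, elements can be looked up by name instead of by position. A brief sketch, with variable names chosen only for the example:

```r
v <- c(foo = 1, bar = 2, norf = 3)
v[["bar"]]        # 2

lst <- list(a = 1, b = 2, c = 3)
lst$b             # 2
lst[["c"]]        # 3

m <- matrix(1:4, nrow = 2, ncol = 2,
            dimnames = list(c("a", "b"), c("c", "d")))
m["a", "d"]       # 3 (row "a", column "d")
```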
