Topic Audit: Data Science

R Programming:
The Language of Data.

R is a special tool made just for numbers, charts, and facts. If you want to understand information, R is one of the best helpers you can get.

What Exactly is R?

The Simple Answer

Think of R as a super-powered calculator that can read giant spreadsheets and draw beautiful pictures.

Instead of clicking buttons with a mouse (like you do in Excel), you type simple commands. You tell R what to do, and it does it in seconds, even if you have millions of rows of data.

  • It is 100% Free. You do not have to pay to use it. Anyone can download it right now.
  • It is Made for Data. While other languages build websites or games, R was built purely to understand numbers.

R vs. Normal Spreadsheets

Feature        | Normal Spreadsheets                             | R Programming
Data Size      | Slows down with lots of rows.                   | Can handle millions of rows easily.
Repeating Work | You have to click the same buttons every week.  | Write code once, press play forever.
Making Errors  | Easy to delete a cell by mistake and not know.  | Code keeps a perfect record of everything you did.

The Heart of R: Data Frames

In R, we store information in something called a Data Frame. It looks just like a table. Every column is a type of fact. Every row is a single item. Let us look at a Data Frame of fruits.

Row Number | Name   | Color    | Weight (grams)
1          | Apple  | "Red"    | 150
2          | Banana | "Yellow" | 120
3          | Grape  | "Green"  | 5
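
Here is a small sketch of how the fruit table above could be typed into R. The column names (name, color, weight) are our own choice for this example, not fixed names that R requires.

```r
# Build the fruit table as a data frame.
# Each argument becomes one column; each position becomes one row.
fruits <- data.frame(
  name   = c("Apple", "Banana", "Grape"),
  color  = c("Red", "Yellow", "Green"),
  weight = c(150, 120, 5)  # weight in grams
)

print(fruits)         # show the whole table
print(nrow(fruits))   # count the rows (items)
print(fruits$weight)  # pull out one column by its name
```

The $ sign is how you ask for a single column. Here fruits$weight gives back just the three weights.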

Add-ons: The Power of Packages

By itself, R is smart. But its real power comes from Packages.

Packages are bundles of extra code that other smart people wrote. You can download them for free. Think of them like apps on a smartphone.

There is a giant online store of these packages called CRAN. It has over 18,000 free tools!

ggplot2

One of the most popular tools in the world for drawing beautiful charts and graphs.

dplyr

A tool to slice, filter, and sort your data incredibly fast.

tidyr

Cleans up messy data so it is neat, tidy, and ready to use.

shiny

Turns your data into an interactive web app that other people can click and play with.
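
Getting a package usually takes two steps: install it once, then load it each time you open R. The sketch below uses dplyr as the example. The install line is commented out on purpose, because it needs an internet connection and only has to run one time.

```r
# Step 1: install the package once per computer (needs the internet).
# Remove the "#" at the start of the next line to run it.
# install.packages("dplyr")

# Step 2: load the package in every new R session before using it.
# requireNamespace() checks whether the package is installed without stopping the script.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  message("dplyr is loaded and ready")
} else {
  message("dplyr is not installed yet; run install.packages('dplyr') first")
}
```

The same two steps work for ggplot2, tidyr, shiny, and most other packages on CRAN.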

Reading R Code

R code is designed to look like simple math and English. In R, we use an arrow symbol <- to put data into a name. Let us look at a simple example.

script.R
# 1. We create a vector (a simple line) of ages
ages <- c(25, 30, 22, 40, 28)

# 2. We ask R to find the average (mean) age
average_age <- mean(ages)

# 3. We ask R to print the answer
print(average_age)
[1] 29

Drawing Pictures: Data Visualization

If you have a thousand numbers, your brain cannot understand them. But if you turn those numbers into a picture, your brain understands it instantly.

R is famous around the world because its pictures (charts and graphs) look highly professional. Newspapers and scientific magazines use R to draw their charts.
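
To get a feel for this, here is a minimal sketch that turns the fruit weights from earlier into a bar chart. It uses barplot(), which is part of base R, so no package is needed. This is a simpler stand-in for ggplot2, and the title and colors are just our choices.

```r
# Weights in grams, taken from the fruit table earlier in this guide.
weights <- c(Apple = 150, Banana = 120, Grape = 5)

# barplot() draws one bar per number and uses the names as labels.
# It also returns the position of each bar, which we keep in a variable.
bar_positions <- barplot(
  weights,
  main = "Fruit weight",
  ylab = "Weight (grams)",
  col  = c("red", "yellow", "green")
)
```

In RStudio the chart shows up in the Plots panel. One look tells you the grape is tiny compared to the apple.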

"A picture is worth a thousand numbers."

Who Uses R for Work?

Doctors & Biologists

They use R to track diseases and understand how medicines work on the human body.

Finance Experts

Banks use R to guess if the stock market will go up or down based on old numbers.

Shop Owners

Big stores use R to find out what items people buy together (like milk and cookies).

Weather Forecasters

They load temperature data into R to predict if it will rain next week.

How to Start Using R Today

1. Download Base R

First, you need the brain. Go to the CRAN website and download the R software for your computer (Windows, Mac, or Linux).

2. Download RStudio

Next, you need a nice face for the brain. RStudio is a free program that makes typing R code much easier and prettier to look at.

3. Type Your First Code

Open RStudio, type print("Hello World") and press Enter. You are now a programmer!

Simple R Dictionary

Variable

A nickname you give to a piece of data so you can remember it later.

Vector

A simple line of items that are all the same type. Like a shopping list.

Function

An action word. It tells R to DO something (like add numbers, or draw a chart).

Machine Learning

Teaching the computer to find patterns in old data so it can guess what will happen in the future.
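
The first three words in this dictionary can be seen together in a few lines of code. This is only a small illustration, and the name prices and the numbers are made up.

```r
# Variable: "prices" is the nickname we give to our data.
# Vector: c() joins several values of the same type into one line of items.
prices <- c(2.50, 4.00, 1.25)

# Function: sum() is an action word. It adds the numbers up.
total <- sum(prices)
print(total)
```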

Bringing Data Into R

Before R can read your data, you have to bring it inside the program. Most of the time, data lives inside a file on your computer. R has simple commands to open almost any file type.

CSV Files

A very basic spreadsheet file. This is the most common format.

read.csv("file.csv")

Excel Files

Files made directly in Microsoft Excel. You need a package for this, such as readxl.

readxl::read_excel("data.xlsx")

Web Data

You can paste a web link, and R will download the data right from the internet.

read.csv("http...")
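
Here is a small sketch of reading a CSV file from start to finish. So that it runs on any computer, it first writes a tiny made-up file into a temporary folder. With your own data you would skip that part and give read.csv() the path to your file.

```r
# Create a tiny CSV file in a temporary folder so the example runs anywhere.
# The city and sales values are invented for this example.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("city,sales", "Tokyo,100", "Paris,150"), csv_path)

# read.csv() opens the file and turns it into a data frame.
sales <- read.csv(csv_path)

print(sales)  # show the table
str(sales)    # show each column and what type of data it holds
```

The first line of the file becomes the column names, and every line after that becomes a row.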

The 3 Flavors of Data

R needs to know exactly what kind of facts it is looking at. You cannot do math on a word, and you cannot spell with a number. There are three main types you must know:

Numbers

Also known as "Numeric". Used for math, age, height, and money.

age <- 25
price <- 19.99

Words

Also known as "Characters" or "Strings". Always put them inside quote marks.

name <- "John"
city <- "Tokyo"

True / False

Also known as "Logical". Used to answer yes or no questions.

is_raining <- TRUE
is_open <- FALSE
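
If you are not sure which flavor a piece of data is, you can ask R with class(). The sketch below also shows what happens when you try to do math on a word. It uses tryCatch() to catch the error so the script keeps going, and the message text is our own.

```r
age        <- 25
name       <- "John"
is_raining <- TRUE

# class() reports which type of data a variable holds.
print(class(age))         # "numeric"
print(class(name))        # "character"
print(class(is_raining))  # "logical"

# Math works on numbers...
print(age + 1)

# ...but not on words. tryCatch() catches the error instead of stopping.
result <- tryCatch(
  name + 1,
  error = function(e) "error: cannot do math on a word"
)
print(result)
```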

Cleaning Messy Data

In the real world, data is rarely perfect. People spell things wrong, or they forget to fill out forms. When a box is empty, R calls it NA (Not Available). You have to clean this up before doing math, because most math on an NA just gives NA back.

MESSY DATA
Row 1: [100, "Dog", TRUE]
Row 2: [ NA, "Cat", TRUE]
Row 3: [150, "Dog", FALSE]
CLEAN DATA
Row 1: [100, "Dog", TRUE]
Row 2 is removed!
Row 3: [150, "Dog", FALSE]
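
The messy table above can be cleaned in a couple of lines. This sketch uses na.omit(), a base R function that drops every row containing an NA. The column names (value, animal, flag) are invented for the example, since the table above does not name its columns.

```r
# The messy table: the second row has a missing number.
messy <- data.frame(
  value  = c(100, NA, 150),
  animal = c("Dog", "Cat", "Dog"),
  flag   = c(TRUE, TRUE, FALSE)
)

print(mean(messy$value))  # NA: one missing value spoils the average

# na.omit() removes every row that contains an NA.
clean <- na.omit(messy)
print(clean)
print(mean(clean$value))  # 125

# Another option: keep the row, but tell mean() to skip missing values.
print(mean(messy$value, na.rm = TRUE))  # 125
```

Removing rows is the simplest fix, but it also throws away the "Cat" entry. Whether that is acceptable depends on your data.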

Grouping Things Together

Imagine a pile of mixed coins. To count them, you would first separate the pennies, nickels, and dimes into their own groups. Then, you count each group.

R does exactly this with a tool called group_by(), which comes from the dplyr package. You can ask R to group all sales by "City", and then find the average money made in each city. It does this in a moment, even for big tables.

Diagram: Group A, Group B, and Group C each go into SUM().
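
Here is a sketch of the "average per city" idea. To keep it runnable without installing anything, it uses aggregate() from base R instead of group_by(). The dplyr version is shown as a comment at the end. The sales numbers are made up.

```r
# A made-up table of sales in three cities.
sales <- data.frame(
  city  = c("Tokyo", "Paris", "Tokyo", "Paris", "Lima"),
  money = c(100, 200, 300, 400, 50)
)

# Read "money ~ city" as "money, grouped by city".
# FUN = mean asks for the average of each group.
by_city <- aggregate(money ~ city, data = sales, FUN = mean)
print(by_city)

# With dplyr installed and loaded, the same idea is usually written as:
# sales %>% group_by(city) %>% summarise(average = mean(money))
```

Both versions do the coin-sorting trick: split the rows into groups, then run one calculation on each group.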

Making Reports: R Markdown

Nobody wants to read raw computer code. Your boss just wants the answers and the charts.

R Markdown is a tool made for this. It lets you write normal English words, and put your R code right next to them. When you press a button called "Knit", R runs the code, squishes everything together, and creates a polished HTML page, PDF, or Word document.

  • Code runs automatically.
  • Charts update by themselves.
  • It looks incredibly professional.

FINAL PDF!

Sales Report

Here is the chart showing our growth this year. As you can see, profits are up.

The Big Fight: R vs. Python

R

The Specialist

  • Built by statisticians for data.
  • Polished charts out of the box.
  • Incredible for academic research.

Python

The All-Rounder

  • Built for any type of programming.
  • Great for making websites or apps.
  • The favorite for deep learning and Artificial Intelligence.

Conclusion: Both are amazing. Pick R for pure data work, pick Python to build software.

Common Beginner Mistakes

Computers are very strict. If you make a tiny typo, R will stop and show you an error message. Here are the 3 most common traps beginners fall into.

Capital Letters

R cares about big and small letters. Age is completely different from age. If you mix them up, R will be confused.

Missing Brackets

If you open a door, you have to close it. If you type an open bracket (, you MUST type a closing bracket ) at the end.

Missing Commas

When giving R a list of numbers, you must put a comma between every single item. c(1, 2, 3) works. c(1 2 3) will give an error.
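
The three traps can be tried out safely. In this sketch, the capital-letter mistake is wrapped in tryCatch() so we can look at the error message without stopping the script. The bracket and comma mistakes are left as comments, because R cannot even read a script that contains them.

```r
age <- 25

# Trap 1, capital letters: "Age" was never created, only "age".
# conditionMessage() pulls the text out of the error so we can print it.
msg <- tryCatch(Age, error = function(e) conditionMessage(e))
print(msg)

# Traps 2 and 3 avoided: brackets closed and commas in place.
numbers <- c(1, 2, 3)
print(numbers)

# Broken versions, kept as comments so the script still runs:
# numbers <- c(1 2 3)    # missing commas
# numbers <- c(1, 2, 3   # missing closing bracket
```

When you see an error, read the message slowly. It usually names the exact word or symbol that R tripped over.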

You Are Not Alone: The Community

Learning to code can be hard. The good news is that R has one of the nicest, most helpful groups of users on the internet.

If your code is broken and you do not know why, someone has probably already had the exact same problem, and someone else has posted the answer online.

  • Stack Overflow (website)
  • R-Ladies (global group)
  • #rstats (social media hashtag)

Saving Your Results

Once you clean your data and make a chart, you need to get it OUT of R to share it with your boss or team.

Save as CSV

Push your clean data back into a spreadsheet file everyone can open.

Save as Image

Download your charts as high-quality pictures to put in presentations.
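
Both kinds of saving can be sketched with base R. The example below assumes a small made-up table and saves into a temporary folder. In real work you would use your own table and pick a folder you can find again.

```r
# A small made-up table standing in for your cleaned data.
clean <- data.frame(city = c("Tokyo", "Paris"), sales = c(100, 150))

# Save as CSV. row.names = FALSE leaves out R's own row numbers.
csv_file <- file.path(tempdir(), "clean_data.csv")
write.csv(clean, csv_file, row.names = FALSE)

# Save as image: open a PNG file, draw the chart into it, then close the file.
png_file <- file.path(tempdir(), "sales_chart.png")
png(png_file, width = 800, height = 600)
barplot(clean$sales, names.arg = clean$city, main = "Sales by city")
dev.off()  # the picture is only written to disk once the file is closed

print(csv_file)
print(png_file)
```

Forgetting dev.off() is a common slip. Until you call it, the image file stays empty or unfinished.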

The Secret: Cheat Sheets

You Do Not Have to Memorize Anything.

Even experts look up commands every single day. Nobody remembers all of them.

The creators of RStudio (a company now called Posit) make free Cheat Sheets. These are short PDF documents, usually one or two pages, that show you exactly what to type to get things done.

If you want to make a chart, you just print out the "Data Visualization Cheat Sheet" and keep it next to your keyboard. It is that simple!