---
title: "Finding and extracting values and indices"
author: "Fred Hasselman"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Finding and extracting values and indices}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = ">"
)
library(invctr)
```

## Finding and extracting values

The function groups `insiders`, `outsiders` and `extractors` provide infix functions that can be used to extract values from vectors.


### `insiders` and `outsiders`

These functions return values inside or outside a given interval. Inclusion or exclusion of interval endpoints follows the common notation for open and closed intervals: `[` and `]` mean that an endpoint is included, `(` and `)` mean that it is excluded.

The syntax is always:

`vector infix interval` 

Depending on which function is called, the return value is either a logical vector indicating which values are inside or outside the interval, or the actual values. To get the values, use the variants with a dot between the interval symbols, e.g. `%[.]%`.

The syntax and behaviour are similar to the functions provided in package `DescTools` (I did not test whether they give the same results).

```{r}
x <- 0:9

# Inside open interval
x %()% c(5,9)

# Inside closed interval
x %[]% c(5,9)

# Outside open interval
x %)(% c(5,9)

# Outside closed interval
x %][% c(5,9)

# All variations left/right open/closed are possible
x %[)% c(5,9)
x %](% c(5,9)
```
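
For comparison, the *inside* variants of the interval notation can be written with ordinary comparison operators. This is a base R sketch of the usual mathematical definitions, not the package's implementation; the names `num`, `lo` and `hi` are only used for illustration.

```{r}
num <- 0:9
lo <- 5
hi <- 9

# (5, 9): both endpoints excluded
num[num > lo & num < hi]

# [5, 9]: both endpoints included
num[num >= lo & num <= hi]

# [5, 9): left endpoint included, right endpoint excluded
num[num >= lo & num < hi]

# (5, 9]: left endpoint excluded, right endpoint included
num[num > lo & num <= hi]
```

Every condition repeats the name of the vector twice, which is the repetition the infix functions are meant to remove.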

### How to use...

The logical vector returned by these functions can be used as an index to extract values. If you add a dot `.` in between the interval symbols, the values themselves are returned.

```{r}
# Regular indexing works, but is a bit 'wordy'
x[x %[]% c(5,9)]

# Easier to use the special functions
x %[.]% c(5,9)

# Extract first, middle, or last value of x
x %:% "f"
x %:% "m"
x %:% "l"

# Simulate a sample from a standard normal distribution
set.seed(4321)
Zscore <- rnorm(100)

# Find Z-scores that are 'significant' at alpha = .05
Zscore %).(% c(-1.96,1.96)

# Old indexing has a lot of repetition, so does tidyverse, e.g. using filter()
Zscore[Zscore < -1.96 | Zscore > 1.96]

```
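
For a cut-off that is symmetric around zero, such as -1.96 and 1.96, base R also allows a shorter spelling with `abs()`. The sketch below uses base R only and a separately named vector `zs` (an illustrative name), and checks that both forms select the same values.

```{r}
set.seed(4321)
zs <- rnorm(100)

# Two-sided comparison written out in full
two_sided <- zs[zs < -1.96 | zs > 1.96]

# Same selection using the absolute value
by_abs <- zs[abs(zs) > 1.96]

identical(two_sided, by_abs)
```

The `abs()` shortcut only applies when the interval is symmetric around zero; for any other interval the full comparison is needed.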


### `extractors`

Extracting a subset of values from the front or rear of a vector is a common task and the `base` functions `head()` and `tail()` can do this. The infix functions in the `extractors` group mimic some of this behaviour and add the ability to extract *from - to*, or *up to and including*, a specific value.


```{r}
# A character vector
z <- letters

# Extract front by first occurrence of value "n"
z %[f% "n"

# Extract first, middle, last of z
z %:% "f"
z %:% "m"
z %:% "l"

# Extract by percentile
seq(1,10,.5) %(q% .5 # infix
seq(1,10,.5)[seq(1,10,.5) < quantile(seq(1,10,.5),.5)] # regular syntax

seq(1,10,.5) %q]% .5 # infix
seq(1,10,.5)[seq(1,10,.5) >= quantile(seq(1,10,.5),.5)] # regular syntax

# Random uniform integers
set.seed(123)
x <- round(runif(100,1,100))

# Extract front up to and including index 10
x%[%10 # infix
x[1:10] # regular [saves just 1 char]

# Extract from index 90 to rear
x%]%90 # infix
x[90:length(x)] # regular

# Extract numbers from front to first occurrence of 11
x%[f%11 # infix
x[1:which(x==11)[1]] # regular

# Extract numbers from last occurrence of 11 to rear
x%l]%11 # infix
x[which(x==11)[length(which(x==11))]:length(x)] # regular

# Extract by indices if an index range is provided
# This is a clear case in which the infix is less sensible to use than regular indexing:
x%]%c(6,10) # infix
x[6:10] # regular

z%[%c(6,10) # infix
z[6:10] # regular
```
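
The `head()` and `tail()` functions mentioned above cover the index-based cases, while `match()` and `which()` can locate the first or last occurrence of a value. A base R sketch for comparison, using a small illustrative vector `vals`:

```{r}
vals <- c(4, 11, 7, 11, 2, 9)

# First three and last two elements
head(vals, 3)
tail(vals, 2)

# Front up to and including the first occurrence of 11
first_11 <- match(11, vals)
vals[seq_len(first_11)]

# From the last occurrence of 11 to the rear
last_11 <- max(which(vals == 11))
vals[last_11:length(vals)]
```

If the value does not occur in the vector, `match()` returns `NA` and `which()` returns an empty vector, so code written this way has to handle that case explicitly.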

## Finding and extracting indices

The `fINDexers` group provides infix functions that can return column and row names based on indices, or, indices based on column and row names. Take for instance data frame `d`:


```{r echo=FALSE}
# data frame
d <- data.frame(x=1:5,y=6,txt=paste0("delta = ",6-1:5),row.names=paste0("ri",5:1))
knitr::kable(d)
```

We can use the infix functions to get names and indices of `d`:

```{r}
# Columns
 "txt"%ci%d # infix
 which(colnames(d)%in%"txt") # regular

 2%ci%d # infix
 colnames(d)[2] # regular
  
# Rows
 "ri4"%ri%d # infix
 which(rownames(d)%in%"ri4") # regular
 
 2%ri%d # infix
 rownames(d)[2] # regular
 
# Change column name
 colnames(d)["y"%ci%d] <- "Yhat" # infix
 colnames(d)[colnames(d)%in%"y"] <- "Yhat" # regular
```


For 1D list and vector objects `%ri%` and `%ci%` return the same value.

```{r}
 l <- list(a=1:100, b=LETTERS)

 2%ci%l == 2%ri%l
 "a"%ci%l == "a"%ri%l

# Named vector
 v <- c("first" = 1, "2nd" = 1000)

 1%ci%v == 1%ri%v
 "2nd"%ci%v == "2nd"%ri%v
```
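
In base R the same kind of lookup on a named list or vector goes through `names()`. A minimal sketch for comparison, by analogy with the regular syntax shown for data frames; the objects `lst` and `vec` are illustrative copies so that the chunk stands on its own.

```{r}
lst <- list(a = 1:100, b = LETTERS)
vec <- c("first" = 1, "2nd" = 1000)

# Name at a given position
names(lst)[2]
names(vec)[1]

# Position of a given name
match("a", names(lst))
match("2nd", names(vec))
```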

Function `%mi%` will return row and/or column names on 2D objects: data frames, matrices, tibbles, etc.

```{r}
# Data frame d
 c(5,2) %mi% d

 list(r="ri1",c=2) %mi% d

# matrix row and column indices
(m <- matrix(1:10,ncol=2, dimnames = list(paste0("ri",0:4),c("xx","yy"))))

 1 %ci% m
 5 %ci% m # no column 5

 1 %ri% m
 5 %ri% m

 c(5,1)%mi%m
 c(1,5)%mi%m
```
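
For comparison, row and column names of a 2D object can be looked up in base R with `rownames()` and `colnames()`, and indices with `match()`. The sketch below does not reproduce the return format of `%mi%`; it only shows the individual lookups on an illustrative matrix `mat`.

```{r}
mat <- matrix(1:10, ncol = 2,
              dimnames = list(paste0("ri", 0:4), c("xx", "yy")))

# Names belonging to row 5 and column 1
rownames(mat)[5]
colnames(mat)[1]

# Indices belonging to row name "ri1" and column name "yy"
match("ri1", rownames(mat))
match("yy", colnames(mat))

# An index beyond the dimensions gives NA rather than an error
colnames(mat)[5]
```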

Function `%ai%` is a version of `%in%` that returns the indices of all occurrences of one or more values in an object.

```{r}
# get all indices of the number 1 in v
 1 %ai% v

# get all indices of the numbers 3 and 6 in d
 c(3,6) %ai% d

 # Simulate a sample from a standard normal distribution
 set.seed(1234)
 Zscores <- rnorm(100)
 
 Zscores%).(%c(-1.96,1.96) %ai% Zscores # returns a data frame with values and indices
 
 which(Zscores%)(%c(-1.96,1.96)) # returns an index vector

```
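
A base R way to find all positions of one or more values is `which()`, with `arr.ind = TRUE` for 2D objects. This is a sketch for comparison only and makes no claim about how `%ai%` is implemented or formats its result; `named_vec` and `dat` are illustrative objects.

```{r}
named_vec <- c("first" = 1, "2nd" = 1000)
dat <- data.frame(x = 1:5, y = 6)

# Positions of the value 1 in a named vector
which(named_vec == 1)

# Row and column positions of the values 3 and 6 in a data frame
pos <- which(dat == 3 | dat == 6, arr.ind = TRUE)
pos
```

With `arr.ind = TRUE` the result is a matrix with one line per match and the columns `row` and `col`.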