Go to the Account menu on the bottom left in Foundry -> Settings -> Tokens
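The API snippet below reads the token and the instance subdomain from environment variables. A minimal way to set them in a session (the variable names are just the ones this doc uses):
import os
from getpass import getpass

# Set these however you prefer (shell profile, secrets manager, ...);
# prompting here is just for convenience.
os.environ["PALANTIR_API_TOKEN"] = getpass("Paste the token created above: ")
os.environ["FOUNDRY_INSTANCE"] = "your-instance"  # the subdomain in <instance>.palantirfoundry.com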
List datasets in a folder and get their RIDs
import os
import requests

FOLDER_RID = "ri.compass.main.folder.ID"
headers = {
    'Authorization': f'Bearer {os.environ["PALANTIR_API_TOKEN"]}',
    'Content-Type': 'application/json',
}
url = f"https://{os.environ['FOUNDRY_INSTANCE']}.palantirfoundry.com/foundry-catalog/api/catalog/resources/{FOLDER_RID}/children"
response = requests.get(url, headers=headers)
print(response.json())  # the folder's children, including their RIDs
Details -> Files to delete individual files
Preview -> Schedules to build a dataset on a cron schedule or trigger on an event (such as a file upload).
transforms-python/src/myproject/datasets/python-transform.py
"""
# Once you dropped your input file into data/raw/raw_text_files with naming convention
# FISRTNAME_LASTNAME_day_DD_input.txt
# Add your name to the USERS below and hit build.
# If there isn't a sample text for the day please into data/raw/raw_text_files as
# sample_day_DD_part_N
"""
import polars as pl
from transforms.api import (
transform,
lightweight,
Input,
LightweightInput,
Output,
LightweightOutput,
)
USERS = [
"ray_bell",
"santa_clause",
]
def transform_generator(file_names: list):
transforms = []
for file_name in file_names:
@lightweight
@transform(
my_input=Input(
"ri.foundry.main.dataset.ID"
),
my_output=Output(
f"/FOLDER/{file_name}"
),
)
def text_file_to_table(
my_input: LightweightInput,
my_output: LightweightOutput,
file_name: str = file_name,
):
try:
_file = next(my_input.filesystem().ls(glob=f"{file_name}.txt"), None)
with my_input.filesystem().open(_file.path, "rb") as f:
lines = f.readlines()
my_output.write_table(
pl.DataFrame({"input": [line.strip().decode() for line in lines]})
)
except Exception as e:
print(f"Error processing {file_name}: {str(e)}")
transforms.append(text_file_to_table)
return transforms
DAYS = [f"{day:02d}" for day in range(1, 32)]
user_files = [f"{user}_day_{day}_input" for user in USERS for day in DAYS]
sample_files = [f"sample_day_{day}_part_{part}" for day in DAYS for part in ["1", "2"]]
TRANSFORMS = transform_generator(user_files + sample_files)
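The generated transforms in the TRANSFORMS list are registered through the repository's pipeline.py. The default template already discovers everything in the datasets package, roughly like the sketch below (the myproject module name follows the project layout above and is otherwise a placeholder):
from transforms.api import Pipeline

from myproject import datasets

my_pipeline = Pipeline()
my_pipeline.discover_transforms(datasets)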
If a text file is in the repository you can do the following. Ensure these files are packaged by editing setup.py to include package_data={'': ['*.txt']}.
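A minimal setup.py sketch with that change (the name and other arguments are placeholders; keep whatever your repository already has):
from setuptools import find_packages, setup

setup(
    name="myproject",
    packages=find_packages(exclude=["*_test*"]),
    package_data={"": ["*.txt"]},  # ship .txt files alongside the code
)
With the files packaged, a transform can read them via pkg_resources: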
from transforms.api import transform_df, Output
from pkg_resources import resource_stream
@transform_df(
output=Output("/FOLDER/tables/day_01_part_1")
)
def create_dataset_from_text_file(ctx):
with resource_stream(__name__, "text_files/day_01_part_1.txt") as file:
lines = file.read().decode('utf-8').splitlines()
return ctx.spark_session.createDataFrame([(line,) for line in lines], ['text'])
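On newer Python versions, importlib.resources is an alternative to pkg_resources (which is deprecated upstream); a sketch of the same read, assuming the files are packaged as above:
from importlib.resources import files

# __package__ resolves to the transform's package (e.g. myproject.datasets in the layout above)
text = (files(__package__) / "text_files" / "day_01_part_1.txt").read_text(encoding="utf-8")
lines = text.splitlines()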
Import data -> Transform, e.g.:
Get media references (datasets)
uuid
-> Add Output -> Dataset
Dataset of text files to table
def input_tables(inputs_text, text_file="sample_day_01_part_1.txt"):
    print(f"processing {text_file=}")
    # from pyspark.sql import functions as F
    from pyspark.sql.types import StringType, StructType, StructField
    schema = StructType([StructField("input", StringType(), True)])
    text_files = [_file for _file in inputs_text.filesystem().ls() if _file.path == text_file]
    text_file = text_files[0]
    # `spark` is expected to be available globally in the environment this runs in
    df = spark.read.text(inputs_text.filesystem().hadoop_path + "/" + text_file.path)
    df = df.withColumnRenamed("value", "input")
    # Remove any leading/trailing whitespace
    # df = df.withColumn("input", F.trim(F.col("input")))
    return df
https://www.palantir.com/docs/foundry/functions/foo-getting-started
https://www.palantir.com/docs/foundry/functions/api-attachments
https://github.com/palantir/osdk-ts/tree/main/examples
import { Function, Integer } from "@foundry/functions-api";
import { Objects } from "@foundry/ontology-api";
export class MyFunctions {
@Function()
myFunc(name: string = "ray_bell"): Integer {
const puzzleInputs = Objects.search()
.puzzleInputsWithUuid()
.filter((puzzleInput) =>
puzzleInput.fileName.exactMatch(`${name}_day_01_input.txt`)
)
.all();
const combinedInput = puzzleInputs
.map((puzzleInput) => puzzleInput.input)
.join("\n");
const leftList: number[] = [];
const rightList: number[] = [];
const lines = combinedInput.trim().split("\n");
for (const line of lines) {
const [left, right] = line.trim().split(/\s+/).map(Number);
leftList.push(left);
rightList.push(right);
}
leftList.sort((a, b) => a - b);
rightList.sort((a, b) => a - b);
let totalDistance = 0;
for (let i = 0; i < leftList.length; i++) {
totalDistance += Math.abs(leftList[i] - rightList[i]);
}
return totalDistance;
}
}
https://www.palantir.com/docs/foundry/functions/python-getting-started
https://www.palantir.com/docs/foundry/ontology-sdk/python-osdk
from functions.api import function
from ontology_sdk import FoundryClient
from ontology_sdk.ontology.objects import (
PuzzleInputsWithUuid,
)
import polars as pl
@function
def day_1_part_1_solver(name: str = "ray_bell") -> int:
client = FoundryClient()
filtered_data = client.ontology.objects.PuzzleInputsWithUuid.where(
PuzzleInputsWithUuid.object_type.file_name == f"{name}_day_01_input.txt"
)
df = pl.DataFrame(filtered_data.to_dataframe()[["input"]])
answer = (
df.with_columns(pl.col("input").str.split_exact(" ", n=2))
.unnest("input")
.cast(pl.Int64)
.select(abs(pl.col("field_0").sort() - pl.col("field_1").sort()))["field_0"]
.sum()
)
return answer
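As a sanity check, the polars chain above can be run standalone on the small example pairs (this assumes the two numbers on each line are separated by a single space, which depends on how the input was stored):
import polars as pl

df = pl.DataFrame({"input": ["3 4", "4 3", "2 5", "1 3", "3 9", "3 3"]})
answer = (
    df.with_columns(pl.col("input").str.split_exact(" ", n=2))
    .unnest("input")
    .cast(pl.Int64)
    .select(abs(pl.col("field_0").sort() - pl.col("field_1").sort()))["field_0"]
    .sum()
)
print(answer)  # 11 for this example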
First we have to create an ontology backed by the processed dataset. Then we can write a Python function to do the same as above.
It comes with the environment variables FOUNDRY_EXTERNAL_HOST=https://dtn-training.palantirfoundry.com and FOUNDRY_TOKEN.
When you first open a notebook it'll be backed by a repository. First, install pyyaml. You can either use the library installer on the left or edit .envs/maestro/meta.yaml to look like:
package:
name: '{{ PACKAGE_NAME }}'
version: '{{ PACKAGE_VERSION }}'
source:
path: ../src
requirements:
run:
# - foundry-platform-sdk
- foundry-dev-tools
- toml
- pyyaml
- ipykernel
- pip
- foundry-transforms-lib-python
- pandas
Packages are installed using conda as:
maestro env conda install pyyaml
import subprocess; subprocess.run(["maestro", "env", "conda", "install", "foundry-dev-tools", "pandas", "polars", "pyarrow", "s3fs", "xarray", "zarr", "toml"], capture_output=True, text=True, check=True)
Packages are installed using pip as:
/home/user/envs/default/bin/python -m pip install pandas polars --prefix /home/user/envs/default --force-reinstall --progress-bar off --retries 1
import subprocess; subprocess.run(["/home/user/envs/default/bin/python", "-m", "pip", "install", "foundry-dev-tools-transforms", "foundry-dev-tools[full]", "pandas", "polars", "pyarrow", "s3fs", "xarray", "zarr", "--prefix", "/home/user/envs/default", "--force-reinstall", "--progress-bar", "off", "--retries", "1"], capture_output=True, text=True, check=True)
Add a dataset using the sidebar on the left. This adds a file called aliases.yml under .foundry, which looks like:
sample_day_01_part_1:
rid: "ri.foundry.main.dataset.ID"
You can write this manually by creating a Python script such as the one below and running it with %run import_datasets.py
import os
import yaml
# The data was generated using
# from foundry_dev_tools import FoundryContext
# ctx = FoundryContext()
# folder = "ri.compass.main.folder.ID"
# children = list(ctx.compass.get_child_objects_of_folder(folder))
# data = {f["name"]: {"rid": f["rid"]} for f in children}
data = {...}
data["raw_text_files"] = {"rid": "ri.foundry.main.dataset.ID"}
user = os.environ["GIT_AUTHOR_NAME"].lower().replace(" ", "_")
filter_terms = [user, "sample", "raw_text_files"]
filtered_data = {k: v for k, v in data.items() if any(term in k.lower() for term in filter_terms)}
original_dir = os.getcwd()
try:
os.chdir('..')
os.makedirs('.foundry', exist_ok=True)
with open('.foundry/aliases.yml', 'w') as file:
yaml.dump(filtered_data, file, default_flow_style=False)
print("File .foundry/aliases.yml has been created successfully in the parent directory.")
print("Wait a few seconds and your datasets should be viewable on the Datasets tab to the left")
finally:
os.chdir(original_dir)
Read data
from foundry.transforms import Dataset
table = Dataset.get("titantic").read_table(format="arrow")
pandas_df = Dataset.get("titantic").read_table(format="pandas")
try:
polars_df = Dataset.get("titantic").read_table(format="polars")
except ModuleNotFoundError:
polars_df = None
_ds = Dataset("raw_text_files")
puzzle_input = "sample_day_01_part_1"  # file to download, without the .txt extension
local_file = _ds.files().filter(lambda f: f.path == f"{puzzle_input}.txt").download()
with open(local_file[f"{puzzle_input}.txt"], "r") as f:
    lines = f.readlines()
lines
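To get the same one-row-per-line table shape as the transform output earlier:
import polars as pl

# One "input" row per line of the downloaded puzzle input
df = pl.DataFrame({"input": [line.strip() for line in lines]})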
https://www.palantir.com/docs/foundry/code-workspaces/jupyterlab/#streamlit-applications
Jupyter Workspace -> Applications -> Streamlit -> Click install -> Publish. It will create a file called app.py.
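A minimal app.py sketch (the dataset alias "titantic" is just the one used earlier):
# app.py
import streamlit as st
from foundry.transforms import Dataset

st.title("Dataset preview")

df = Dataset.get("titantic").read_table(format="pandas")
st.dataframe(df)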
When you click on the Jupyter workspace project it'll open in the repo view. Click on the ... in the top right to open it back up in JupyterLab.
Document (.pdf). You can upload a .txt and it will be converted to a .pdf.
https://www.palantir.com/docs/foundry/agent-studio/overview/
You can pass it a media set
Create a folder, e.g. Learning, and create a folder per course inside it. Open the folder, then go to the applications page (the 3 x 3 dots on the left). Find the demo and hit install again. Install it to your folder under Learning. When importing the resources, add a prefix to the objects, like "raybell".
REMOVED - How to bring an open source model into Palantir
https://learn.palantir.com/speedrun-data-connection - s3 and REST API
https://learn.palantir.com/speedrun-your-e2e-aip-workflow - AI Agent
https://learn.palantir.com/deep-dive-creating-your-first-ontology - Ontology (https://www.youtube.com/watch?v=SOW0IA_I0bk)
https://learn.palantir.com/deep-dive-building-your-first-application - Workshop application
https://build.palantir.com/platform/e99a7898-6dc2-4394-a5ec-a583a2d87568 - Cross-validate Images and Documents using AIP
https://build.palantir.com/platform/f5f350c4-e5e1-4e81-a3e7-141902bac29e - Advanced Document Parsing: Semantic Chunking
https://build.palantir.com/platform/37b17993-ba4d-4345-8ebd-0d634327a5f5 - Advanced Document Parsing: Semantic Chunking - Building Block
https://aip.palantir.com/?industry=Government+%26+Security - Government use cases with Palantir.