Go to the Account menu on the bottom left in Foundry -> Settings -> Tokens
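The API snippet below reads the token and the instance subdomain from environment variables. A minimal way to set them in a session (the variable names are just the ones this doc uses):
import os
from getpass import getpass

# Set these however you prefer (shell profile, secrets manager, ...);
# prompting here is just for convenience.
os.environ["PALANTIR_API_TOKEN"] = getpass("Paste the token created above: ")
os.environ["FOUNDRY_INSTANCE"] = "your-instance"  # the subdomain in <instance>.palantirfoundry.com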
List datasets in a folder and get their RIDs
import os
import requests

FOLDER_RID = "ri.compass.main.folder.ID"
headers = {
    'Authorization': f'Bearer {os.environ["PALANTIR_API_TOKEN"]}',
    'Content-Type': 'application/json',
}
url = f"https://{os.environ['FOUNDRY_INSTANCE']}.palantirfoundry.com/foundry-catalog/api/catalog/resources/{FOLDER_RID}/children"
response = requests.get(url, headers=headers)
print(response.json())  # the folder's children, including their RIDs
Details -> Files to delete individual files
Preview -> Schedules to build a dataset on a cron schedule or trigger on an event (such as a file upload).
transforms-python/src/myproject/datasets/python-transform.py
"""
# Once you dropped your input file into data/raw/raw_text_files with naming convention
# FISRTNAME_LASTNAME_day_DD_input.txt
# Add your name to the USERS below and hit build.
# If there isn't a sample text for the day please into data/raw/raw_text_files as
# sample_day_DD_part_N
"""
import polars as pl
from transforms.api import (
transform,
lightweight,
Input,
LightweightInput,
Output,
LightweightOutput,
)
USERS = [
"ray_bell",
"santa_clause",
]
def transform_generator(file_names: list):
transforms = []
for file_name in file_names:
@lightweight
@transform(
my_input=Input(
"ri.foundry.main.dataset.ID"
),
my_output=Output(
f"/FOLDER/{file_name}"
),
)
def text_file_to_table(
my_input: LightweightInput,
my_output: LightweightOutput,
file_name: str = file_name,
):
try:
_file = next(my_input.filesystem().ls(glob=f"{file_name}.txt"), None)
with my_input.filesystem().open(_file.path, "rb") as f:
lines = f.readlines()
my_output.write_table(
pl.DataFrame({"input": [line.strip().decode() for line in lines]})
)
except Exception as e:
print(f"Error processing {file_name}: {str(e)}")
transforms.append(text_file_to_table)
return transforms
DAYS = [f"{day:02d}" for day in range(1, 32)]
user_files = [f"{user}_day_{day}_input" for user in USERS for day in DAYS]
sample_files = [f"sample_day_{day}_part_{part}" for day in DAYS for part in ["1", "2"]]
TRANSFORMS = transform_generator(user_files + sample_files)
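The generated transforms in the TRANSFORMS list are registered through the repository's pipeline.py. The default template already discovers everything in the datasets package, roughly like the sketch below (the myproject module name follows the project layout above and is otherwise a placeholder):
from transforms.api import Pipeline

from myproject import datasets

my_pipeline = Pipeline()
my_pipeline.discover_transforms(datasets)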
If a text file is in the repository you can do the following. Ensure these files are packaged by editing setup.py to include package_data={'': ['*.txt']}.
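A minimal setup.py sketch with that change (the name and other arguments are placeholders; keep whatever your repository already has):
from setuptools import find_packages, setup

setup(
    name="myproject",
    packages=find_packages(exclude=["*_test*"]),
    package_data={"": ["*.txt"]},  # ship .txt files alongside the code
)
With the files packaged, a transform can read them via pkg_resources: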
from transforms.api import transform_df, Output
from pkg_resources import resource_stream
@transform_df(
output=Output("/FOLDER/tables/day_01_part_1")
)
def create_dataset_from_text_file(ctx):
with resource_stream(__name__, "text_files/day_01_part_1.txt") as file:
lines = file.read().decode('utf-8').splitlines()
return ctx.spark_session.createDataFrame([(line,) for line in lines], ['text'])
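On newer Python versions, importlib.resources is an alternative to pkg_resources (which is deprecated upstream); a sketch of the same read, assuming the files are packaged as above:
from importlib.resources import files

# __package__ resolves to the transform's package (e.g. myproject.datasets in the layout above)
text = (files(__package__) / "text_files" / "day_01_part_1.txt").read_text(encoding="utf-8")
lines = text.splitlines()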
Import data -> Transform, e.g.:
Get media references (datasets)
uuid
-> Add Output -> Dataset
Dataset of text files to table
def input_tables(inputs_text, text_file="sample_day_01_part_1.txt"):
    print(f"processing {text_file=}")
    # from pyspark.sql import functions as F
    from pyspark.sql.types import StringType, StructType, StructField
    schema = StructType([StructField("input", StringType(), True)])
    text_files = [_file for _file in inputs_text.filesystem().ls() if _file.path == text_file]
    text_file = text_files[0]
    # `spark` is expected to be available globally in the environment this runs in
    df = spark.read.text(inputs_text.filesystem().hadoop_path + "/" + text_file.path)
    df = df.withColumnRenamed("value", "input")
    # Remove any leading/trailing whitespace
    # df = df.withColumn("input", F.trim(F.col("input")))
    return df
https://www.palantir.com/docs/foundry/functions/foo-getting-started
https://www.palantir.com/docs/foundry/functions/api-attachments
https://github.com/palantir/osdk-ts/tree/main/examples
import { Function, Integer } from "@foundry/functions-api";
import { Objects } from "@foundry/ontology-api";
export class MyFunctions {
@Function()
myFunc(name: string = "ray_bell"): Integer {
const puzzleInputs = Objects.search()
.puzzleInputsWithUuid()
.filter((puzzleInput) =>
puzzleInput.fileName.exactMatch(`${name}_day_01_input.txt`)
)
.all();
const combinedInput = puzzleInputs
.map((puzzleInput) => puzzleInput.input)
.join("\n");
const leftList: number[] = [];
const rightList: number[] = [];
const lines = combinedInput.trim().split("\n");
for (const line of lines) {
const [left, right] = line.trim().split(/\s+/).map(Number);
leftList.push(left);
rightList.push(right);
}
leftList.sort((a, b) => a - b);
rightList.sort((a, b) => a - b);
let totalDistance = 0;
for (let i = 0; i < leftList.length; i++) {
totalDistance += Math.abs(leftList[i] - rightList[i]);
}
return totalDistance;
}
}
https://www.palantir.com/docs/foundry/functions/python-getting-started
https://www.palantir.com/docs/foundry/ontology-sdk/python-osdk
from functions.api import function
from ontology_sdk import FoundryClient
from ontology_sdk.ontology.objects import (
PuzzleInputsWithUuid,
)
import polars as pl
@function
def day_1_part_1_solver(name: str = "ray_bell") -> int:
client = FoundryClient()
filtered_data = client.ontology.objects.PuzzleInputsWithUuid.where(
PuzzleInputsWithUuid.object_type.file_name == f"{name}_day_01_input.txt"
)
df = pl.DataFrame(filtered_data.to_dataframe()[["input"]])
answer = (
df.with_columns(pl.col("input").str.split_exact(" ", n=2))
.unnest("input")
.cast(pl.Int64)
.select(abs(pl.col("field_0").sort() - pl.col("field_1").sort()))["field_0"]
.sum()
)
return answer
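As a sanity check, the polars chain above can be run standalone on the small example pairs (this assumes the two numbers on each line are separated by a single space, which depends on how the input was stored):
import polars as pl

df = pl.DataFrame({"input": ["3 4", "4 3", "2 5", "1 3", "3 9", "3 3"]})
answer = (
    df.with_columns(pl.col("input").str.split_exact(" ", n=2))
    .unnest("input")
    .cast(pl.Int64)
    .select(abs(pl.col("field_0").sort() - pl.col("field_1").sort()))["field_0"]
    .sum()
)
print(answer)  # 11 for this example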
First we have to create an ontology backed by the processed dataset. Then we can write a Python function to do the same as above.
It comes with the environment variables FOUNDRY_EXTERNAL_HOST=https://dtn-training.palantirfoundry.com and FOUNDRY_TOKEN.
When you first open a notebook it'll be backed by a repository. First, install pyyaml. You can either use the library installer on the left or edit .envs/maestro/meta.yaml to look like:
package:
name: '{{ PACKAGE_NAME }}'
version: '{{ PACKAGE_VERSION }}'
source:
path: ../src
requirements:
run:
# - foundry-platform-sdk
- foundry-dev-tools
- toml
- pyyaml
- ipykernel
- pip
- foundry-transforms-lib-python
- pandas
Packages are installed using conda as:
maestro env conda install pyyaml
import subprocess; subprocess.run(["maestro", "env", "conda", "install", "foundry-dev-tools", "pandas", "polars", "pyarrow", "s3fs", "xarray", "zarr", "toml"], capture_output=True, text=True, check=True)
Packages are installed using pip as:
/home/user/envs/default/bin/python -m pip install pandas polars --prefix /home/user/envs/default --force-reinstall --progress-bar off --retries 1
import subprocess; subprocess.run(["/home/user/envs/default/bin/python", "-m", "pip", "install", "foundry-dev-tools-transforms", "foundry-dev-tools[full]", "pandas", "polars", "pyarrow", "s3fs", "xarray", "zarr", "--prefix", "/home/user/envs/default", "--force-reinstall", "--progress-bar", "off", "--retries", "1"], capture_output=True, text=True, check=True)
Add a dataset using the sidebar on the left. This adds a file called aliases.yml under .foundry, which looks like:
sample_day_01_part_1:
rid: "ri.foundry.main.dataset.ID"
You can write this manually by creating a Python script such as the one below and running it with %run import_datasets.py
import os
import yaml
# The data was generated using
# from foundry_dev_tools import FoundryContext
# ctx = FoundryContext()
# folder = "ri.compass.main.folder.ID"
# children = list(ctx.compass.get_child_objects_of_folder(folder))
# data = {f["name"]: {"rid": f["rid"]} for f in children}
data = {...}
data["raw_text_files"] = {"rid": "ri.foundry.main.dataset.ID"}
user = os.environ["GIT_AUTHOR_NAME"].lower().replace(" ", "_")
filter_terms = [user, "sample", "raw_text_files"]
filtered_data = {k: v for k, v in data.items() if any(term in k.lower() for term in filter_terms)}
original_dir = os.getcwd()
try:
os.chdir('..')
os.makedirs('.foundry', exist_ok=True)
with open('.foundry/aliases.yml', 'w') as file:
yaml.dump(filtered_data, file, default_flow_style=False)
print("File .foundry/aliases.yml has been created successfully in the parent directory.")
print("Wait a few seconds and your datasets should be viewable on the Datasets tab to the left")
finally:
os.chdir(original_dir)
Read data
from foundry.transforms import Dataset
table = Dataset.get("titantic").read_table(format="arrow")
pandas_df = Dataset.get("titantic").read_table(format="pandas")
try:
polars_df = Dataset.get("titantic").read_table(format="polars")
except ModuleNotFoundError:
polars_df = None
_ds = Dataset("raw_text_files")
puzzle_input = "sample_day_01_part_1"  # file to download, without the .txt extension
local_file = _ds.files().filter(lambda f: f.path == f"{puzzle_input}.txt").download()
with open(local_file[f"{puzzle_input}.txt"], "r") as f:
    lines = f.readlines()
lines
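To get the same one-row-per-line table shape as the transform output earlier:
import polars as pl

# One "input" row per line of the downloaded puzzle input
df = pl.DataFrame({"input": [line.strip() for line in lines]})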
https://www.palantir.com/docs/foundry/code-workspaces/jupyterlab/#streamlit-applications
Jupyter Workspace -> Applications -> Streamlit -> Click install -> Publish. It will create a file called app.py.
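A minimal app.py sketch (the dataset alias "titantic" is just the one used earlier):
# app.py
import streamlit as st
from foundry.transforms import Dataset

st.title("Dataset preview")

df = Dataset.get("titantic").read_table(format="pandas")
st.dataframe(df)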
When you click on the Jupyter workspace project it'll open in the repo view. Click on the ... in the top right to open it back up in JupyterLab.
Document (.pdf). You can upload a .txt and it will be converted to a .pdf.
https://www.palantir.com/docs/foundry/agent-studio/overview/
You can pass it a media set
Create a folder, e.g. Learning, and create a folder per course inside it. Open the folder, then go to the applications page (the 3 x 3 dots on the left). Find the demo and hit install again. Install it to your folder under Learning. When importing the resources, add a prefix to the objects, like "raybell".
REMOVED - How to bring an open source model into Palantir
https://learn.palantir.com/speedrun-data-connection - s3 and REST API
https://learn.palantir.com/speedrun-your-e2e-aip-workflow - AI Agent
https://learn.palantir.com/deep-dive-creating-your-first-ontology - Ontology (https://www.youtube.com/watch?v=SOW0IA_I0bk)
https://learn.palantir.com/deep-dive-building-your-first-application - Workshop application
https://build.palantir.com/platform/e99a7898-6dc2-4394-a5ec-a583a2d87568 - Cross-validate Images and Documents using AIP
https://build.palantir.com/platform/f5f350c4-e5e1-4e81-a3e7-141902bac29e - Advanced Document Parsing: Semantic Chunking
https://build.palantir.com/platform/37b17993-ba4d-4345-8ebd-0d634327a5f5 - Advanced Document Parsing: Semantic Chunking - Building Block
https://aip.palantir.com/?industry=Government+%26+Security - Government use cases with Palantir.