Disclaimer: The content on this site comprises the author's personal study notes. These notes may contain errors. Users are advised to independently verify information.
I acknowledge and appreciate the contributions of the following educators whose material I used in preparing these notes.
Stephane Maarek - Udemy - SAA-C03 course - https://www.udemy.com/course/aws-certified-solutions-architect-associate-saa-c03/
Stephane Maarek & Abhishek Singh - Udemy - SAA-C03 Practice exams - https://www.udemy.com/course/practice-exams-aws-certified-solutions-architect-associate/
tutorialsdojo.com - Practice tests, Cheat sheets - https://portal.tutorialsdojo.com/courses/aws-certified-solutions-architect-associate-practice-exams/
AWS documentation - RDS, EC2 Reserved Instances, NAT, IGW, DynamoDB, S3 storage classes, etc.
---------------------------General----------------
Tasks that require root user credentials:
- Change your account settings. ...
- Restore IAM user permissions. ...
- Activate IAM access to the Billing and Cost Management console.
- View certain tax invoices. ...
- Close your AWS account. ...
- Register as a seller in the Reserved Instance Marketplace.
- Configure an Amazon S3 bucket to enable MFA (multi-factor authentication).
CloudWatch - resource performance monitoring, events, and alerts;
CloudTrail - account-specific activity and audit;
AWS Config - resource-specific history, audit, and compliance;
EventBridge integrates directly with third-party SaaS partners - SQS does not
S3: strong read-after-write consistency
KMS keys that AWS services create in your AWS account are AWS managed keys.
KMS keys that AWS services create in a service account are AWS owned keys.
S3 Glacier Deep Archive - standard retrieval within 12 hours, bulk retrieval within 48 hours
Route 53 weighted routing - routes traffic for a single domain or subdomain to multiple resources in proportions you specify
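To make the proportions concrete, here is a minimal Python sketch of weighted selection. It is a simplified model of the idea, not Route 53's implementation. The record values are hypothetical documentation IP addresses. The all-weights-zero branch mirrors Route 53's documented behaviour of routing equally in that case.

```python
import random


def pick_record(records, rng=random):
    """Pick a value from (value, weight) pairs with probability
    weight / total weight. Simplified model of weighted routing."""
    total = sum(weight for _, weight in records)
    if total == 0:
        # All weights 0: route to every record with equal probability.
        return rng.choice(records)[0]
    point = rng.random() * total  # uniform in [0, total)
    running = 0
    for value, weight in records:
        running += weight
        # A weight-0 record adds nothing to `running`, so it is never picked.
        if point < running:
            return value
    # Float-rounding guard: fall back to the last record with a non-zero weight.
    return [value for value, weight in records if weight > 0][-1]


rng = random.Random(42)
# Hypothetical records: 70/30 split plus a drained (weight 0) endpoint.
records = [("203.0.113.10", 70), ("203.0.113.20", 30), ("203.0.113.30", 0)]
picks = [pick_record(records, rng) for _ in range(10_000)]
print(picks.count("203.0.113.10") / len(picks))  # close to 0.7
```

Setting a record's weight to 0 stops traffic to it without deleting the record, which is handy for draining an endpoint.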
EBS IOPS: io1 (high performance, databases) > gp2 (dev/test, boot volumes, transactional) > st1 (throughput-optimised HDD, large datasets, ETL) > sc1 (cold HDD, infrequently accessed workloads)
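For gp2, baseline IOPS grow with volume size. That is why a bigger gp2 volume is sometimes provisioned purely for performance. The sketch below uses the commonly documented formula: 3 IOPS per GiB, minimum 100, maximum 16,000. Volumes below 1,000 GiB can also burst to 3,000 IOPS. Verify these limits against the current EBS documentation.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, never below 100 or above 16,000."""
    return max(100, min(16_000, 3 * size_gib))


def gp2_can_burst(size_gib: int) -> bool:
    """Volumes whose baseline is under 3,000 IOPS (< 1,000 GiB) can burst to 3,000."""
    return gp2_baseline_iops(size_gib) < 3_000


for size in (10, 100, 1_000, 6_000):
    print(size, gp2_baseline_iops(size), gp2_can_burst(size))
# 10 100 True
# 100 300 True
# 1000 3000 False
# 6000 16000 False
```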
-----------------------------------------------
WAF (Layer 7)
- protects against common web exploits
- monitors HTTP and HTTPS requests
- protects CloudFront, API Gateway, ALB, AppSync, Cognito user pools
- control access by IP match (an IP set holds up to 10,000 IP addresses/ranges)
- define a Web ACL (size constraints, rate-based rules for DDoS mitigation, geo-match); Web ACLs are regional except for CloudFront
- WAF doesn't work with NLB, so use Global Accelerator for a fixed IP and WAF on the ALB
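Rate-based rules count requests per source IP over a rolling time window and act when the count exceeds a limit. The toy sliding-window counter below only illustrates that idea. It is not AWS WAF's algorithm, and the limit and window values are hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Toy per-IP sliding-window counter (illustration of a rate-based rule)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: a real rule would block (or count)
        q.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=300)  # hypothetical values
print([limiter.allow("198.51.100.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow("198.51.100.7", 301))  # True: requests at t=0 and t=1 have expired
print(limiter.allow("192.0.2.1", 3))       # True: counted separately per IP
```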
Rekognition
- Content moderation
- A2I (Amazon Augmented AI) -> manual (human) review
Transcribe
- uses ASR - automatic speech recognition
- PII redaction
- customer service calls, captioning, metadata for media assets in archives
Polly
- text to speech; pronunciation lexicon files can be uploaded
Translate
- to localise content
Lex & Connect
- Lex: ASR (speech to text) + NLU (Natural Language Understanding)
- chatbots, call center bots
- Connect: contact flows, virtual contact center, integration with CRM
- call center workflows
Comprehend
- NLP (Natural Language Processing)
- uses ML to determine whether sentiment is positive or negative
- to analyse customer interactions for positive or negative experiences
- to group articles by topic
- Comprehend Medical - extract insights from clinical notes
SageMaker
- to build ML models (for developers and data scientists)
- label -> build -> train and tune -> apply model
Forecast
- use ML to forecast
- use historical time series data
- upload data to S3 -> Forecast -> forecast model
Kendra
- document search service (text, PDF, PowerPoint, MS Word)
Personalise
- same technology as used by Amazon.com - product recommendations
- retail stores, media and entertainment recommendations; reads data from S3
Textract
- Optical Character Recognition (OCR): extracts text, handwriting, and data from scanned documents
- from forms, tables, PDFs, images
Glue
- managed ETL service
- convert data into Parquet format (columnar format, efficient to query with Athena)
- Glue Data Catalog: catalog of datasets, populated by running Glue data crawlers
- RDS/S3/DynamoDB -> Glue Data Crawler -> Glue Data Catalog -> Glue jobs / Athena / EMR
- Glue Job Bookmarks: prevent re-processing old data
- Glue Elastic Views: monitors for changes in source data, materialised views
- Glue DataBrew: clean and normalise data using pre-built transformations
- Glue Studio: GUI to create and monitor ETL jobs in Glue
- Glue Streaming ETL -> compatible with Kinesis Data Streams, Kafka, MSK
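Job bookmarks let a Glue job remember what it has already processed, so a rerun only picks up new data. They are enabled with the job parameter `--job-bookmark-option job-bookmark-enable`. The sketch below models that idea with a last-modified watermark over a hypothetical list of S3 objects. Glue's real bookmark state is managed by the service and is more involved.

```python
def select_new_objects(objects, bookmark):
    """objects: list of (key, last_modified) tuples.
    bookmark: highest last_modified processed by the previous run (None = first run).
    Returns (keys to process in timestamp order, updated bookmark)."""
    new = [(key, ts) for key, ts in objects if bookmark is None or ts > bookmark]
    new.sort(key=lambda item: item[1])
    new_bookmark = max((ts for _, ts in new), default=bookmark)
    return [key for key, _ in new], new_bookmark


# Hypothetical object listing; timestamps are plain integers for brevity.
run1 = [("sales/a.csv", 100), ("sales/b.csv", 200)]
keys, bm = select_new_objects(run1, None)
print(keys, bm)   # ['sales/a.csv', 'sales/b.csv'] 200

run2 = run1 + [("sales/c.csv", 300)]
keys, bm = select_new_objects(run2, bm)
print(keys, bm)   # ['sales/c.csv'] 300
```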
EMR
- Elastic MapReduce, Hadoop/big data clusters (long-running or transient)
- bundled with Apache Spark, HBase, Presto, Flink
- use cases: data processing, machine learning, web indexing, big data
- Master node: manages the cluster - long-running (can be Reserved)
- Core node: runs tasks and stores data - long-running (can be Reserved)
- Task node: runs tasks only - good fit for Spot
MSK (Managed Streaming for Apache Kafka)
- creates and manages Kafka clusters
- serverless option available
Control Tower
- ongoing governance using Control Tower guardrails
- preventive guardrails - SCPs
- detective guardrails - AWS Config
AWS Config
- per-region service, but data can be aggregated across regions and accounts
- store configuration data in S3 and analyse it with Athena
- AWS managed rules (75+)
- custom rules - defined in an AWS Lambda function
- evaluates compliance only - does not prevent actions (no deny)
- automate remediation using SSM Automation documents (AWS managed or custom)
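The core of a custom rule is a function that maps a resource's configuration to a compliance result. The minimal sketch below assumes a hypothetical policy that only allows certain EC2 instance types. A real rule's Lambda handler would parse the configuration item from the invoking event and report the result back to Config, for example with boto3's `put_evaluations`. Note that the function only reports compliance; it blocks nothing.

```python
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}  # hypothetical policy


def evaluate_compliance(configuration_item: dict) -> str:
    """Return a Config compliance type for one configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Instance":
        return "NOT_APPLICABLE"
    instance_type = configuration_item.get("configuration", {}).get("instanceType")
    if instance_type in ALLOWED_INSTANCE_TYPES:
        return "COMPLIANT"
    return "NON_COMPLIANT"  # reported only; remediation is a separate step


print(evaluate_compliance({"resourceType": "AWS::EC2::Instance",
                           "configuration": {"instanceType": "m5.24xlarge"}}))
# NON_COMPLIANT
```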
Storage gateway
- S3 File Gateway, FSx File Gateway, Volume Gateway, Tape Gateway
GuardDuty
- intelligent threat detection; findings can trigger Lambda (e.g. via EventBridge) for remediation
- covers S3, EC2, ECS, EKS, RDS, Lambda; malware protection
- compromised credentials, suspicious logins
- sends findings to AWS Security Hub and EventBridge
- think SecOps
- data sources: VPC Flow Logs, DNS logs, CloudTrail events + optional sources
- intrusion detection (not prevention) using ML
- detects crypto-mining attacks
Inspector
- software vulnerabilities & network exposure
- agent-based and agentless scanning
- zero-day vulnerabilities, patch remediation, compliance
- think of dev workflow
- reduce MTTR ( mean time to remediate)
CloudFront
- origins: S3, ELB, and custom HTTP(S) origins, but not Auto Scaling groups directly
- edge locations and regional edge caches
- customisation at the edge using Lambda@Edge
Global Accelerator
- good fit for non-HTTP use cases (gaming over UDP), IoT, voice over IP
- used with S3 Multi-Region Access Points
- instant regional failover
Kinesis Data Streams ('ingest')
- rapidly move data off producers and process it continuously
- ingest data from sources, process with Lambda, Glue Streaming ETL, or custom apps, or deliver to S3/Redshift via Firehose
- manual scaling (provisioned mode) - scaling up requires planning ahead
- retention: 24 hours (default), up to 365 days
- multiple consumers
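In provisioned mode, "planning ahead" mostly means choosing a shard count. The rough sizing sketch below uses the per-shard limits commonly documented for Kinesis Data Streams. Writes are capped at 1 MB/s or 1,000 records/s. Reads are capped at 2 MB/s, shared by all standard consumers (that is, without enhanced fan-out). The traffic numbers are hypothetical.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # MB/s ingest per shard
WRITE_RECORDS_PER_SHARD = 1000  # records/s ingest per shard
READ_MB_PER_SHARD = 2.0         # MB/s egress per shard, shared by standard consumers


def shards_needed(write_mb_s: float, records_s: float, consumers: int = 1) -> int:
    """Smallest shard count satisfying write and (shared) read throughput."""
    by_bytes = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
    by_records = math.ceil(records_s / WRITE_RECORDS_PER_SHARD)
    # Each standard consumer reads the full stream, sharing 2 MB/s per shard.
    by_reads = math.ceil(write_mb_s * consumers / READ_MB_PER_SHARD)
    return max(1, by_bytes, by_records, by_reads)


print(shards_needed(5, 3_000))               # 5: bound by 1 MB/s writes
print(shards_needed(0.5, 4_500))             # 5: bound by 1,000 records/s
print(shards_needed(3, 1_000, consumers=4))  # 6: bound by shared 2 MB/s reads
```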
Kinesis Data Firehose ('storing')
- near real time
- auto-scaling
- can transform records with Lambda
- no data storage (no replay)
- third-party destinations - Splunk, etc.
- for record ordering, multiple apps consuming the same stream, or replay, use Kinesis Data Streams instead
- streaming ETL solution
- capture, transform (Lambda/built-in), and load data into destinations
- if a Kinesis data stream is the source, Firehose's PutRecord and PutRecordBatch operations are disabled;
  producers use the data stream's PutRecord and PutRecords instead
- destinations: S3, or Redshift (via S3)
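"Near real time" comes from buffering. Firehose collects records and delivers a batch when either the buffer size or the buffer interval is reached, whichever comes first. Below is a toy model of that rule. The sizes and intervals are hypothetical, and the real service's buffering is more nuanced.

```python
class DeliveryBuffer:
    """Toy size-or-interval buffer: flush when either threshold is reached."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.started = None   # time the first record entered the current buffer
        self.deliveries = []  # batches "delivered" to the destination

    def put(self, data: bytes, now: float) -> None:
        if self.started is None:
            self.started = now
        self.records.append(data)
        self.size += len(data)
        self.tick(now)

    def tick(self, now: float) -> None:
        """Check the thresholds; called on each put and periodically."""
        if not self.records:
            return
        if self.size >= self.max_bytes or now - self.started >= self.max_seconds:
            self.deliveries.append(b"".join(self.records))
            self.records, self.size, self.started = [], 0, None


buf = DeliveryBuffer(max_bytes=10, max_seconds=60)
buf.put(b"abcd", now=0)
buf.put(b"efghij", now=1)   # 10 bytes reached -> size-triggered flush
buf.put(b"k", now=5)
buf.tick(now=30)            # nothing: 25 s elapsed, 1 byte buffered
buf.tick(now=65)            # 60 s elapsed since t=5 -> time-triggered flush
print(buf.deliveries)       # [b'abcdefghij', b'k']
```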
Kinesis Data Analytics (analyse)
- cannot ingest data directly from producers; reads from Kinesis Data Streams or Firehose
SQS
- message retention: 1 minute to 14 days (default 4 days)
- delay queues (message is hidden when first added to the queue) vs visibility timeout (message is hidden after a consumer receives it)
- < 10 ms latency, up to 256 KB per message
- unlimited messages; in-flight limit 120k (standard), 20k (FIFO) - in-flight = received by a consumer but not yet deleted
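The difference between a delay queue and the visibility timeout is easiest to see on a timeline. The example below is a toy in-memory queue, not the SQS API. Its names and timings are hypothetical.

```python
class ToyQueue:
    """In-memory model of two SQS timers (illustration only)."""

    def __init__(self, delay_seconds: int = 0, visibility_timeout: int = 30):
        self.delay = delay_seconds
        self.visibility = visibility_timeout
        self.messages = {}  # message id -> [body, time it becomes visible]
        self.next_id = 0

    def send(self, body: str, now: float) -> int:
        msg_id = self.next_id
        self.next_id += 1
        # Delay queue: the message is hidden right after it is added.
        self.messages[msg_id] = [body, now + self.delay]
        return msg_id

    def receive(self, now: float):
        for msg_id, msg in self.messages.items():
            if msg[1] <= now:
                # Visibility timeout: hidden again after a consumer receives it.
                msg[1] = now + self.visibility
                return msg_id, msg[0]
        return None

    def delete(self, msg_id: int) -> None:
        """The consumer acknowledges success by deleting the message."""
        self.messages.pop(msg_id, None)


q = ToyQueue(delay_seconds=10, visibility_timeout=30)
q.send("order-1", now=0)
print(q.receive(now=5))    # None: still delayed
print(q.receive(now=10))   # (0, 'order-1'): now in flight
print(q.receive(now=20))   # None: hidden by the visibility timeout
print(q.receive(now=40))   # (0, 'order-1'): not deleted in time, so redelivered
```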
SNS
- an event producer sends to only one topic; consumers subscribe to the topic
- message filtering (filter policies on subscriptions)
- up to 12.5 million subscriptions per topic, 100k topics
- subscribers: email, SMS, HTTP(S), SQS, Lambda, Kinesis Data Firehose
- publishers: CloudWatch, S3, etc.
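The sketch below shows how subscription filter policies decide who receives a message. Every key in the policy must match (AND), and each attribute value must equal one of the listed values (OR). Only exact string matching is modelled; real SNS filter policies also support prefix, numeric, and other operators. The subscription names and attributes are hypothetical.

```python
def matches(filter_policy: dict, attributes: dict) -> bool:
    """AND across policy keys, OR across the allowed values of each key."""
    return all(
        key in attributes and attributes[key] in allowed
        for key, allowed in filter_policy.items()
    )


def fan_out(attributes: dict, subscriptions: dict) -> list:
    """Return subscriptions that would receive the message.
    A policy of None means no filter policy: the subscriber gets everything."""
    return [name for name, policy in subscriptions.items()
            if policy is None or matches(policy, attributes)]


subs = {
    "orders-queue": {"event_type": ["order_placed", "order_cancelled"]},
    "fraud-lambda": {"event_type": ["order_placed"], "amount_band": ["high"]},
    "audit-firehose": None,
}
print(fan_out({"event_type": "order_placed", "amount_band": "low"}, subs))
# ['orders-queue', 'audit-firehose']
```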
EMR
- big data platform
- manage your own infra
OpenSearch
- CloudWatch Logs -> Subscription Filter -> Lambda function or Firehose -> OpenSearch
- DynamoDB -> DynamoDB Streams -> Lambda -> OpenSearch
- Kinesis Data Streams -> Firehose (or a Lambda function, or both) -> OpenSearch
Redshift
- OLAP, not OLTP
- 10x better performance than other data warehouses, scales to PBs of data
- columnar data storage, parallel query engine, SQL interface
- leader node + compute nodes; provision node size in advance; Reserved Instances available
- Multi AZ:
- snapshots (automated/scheduled/manual) stored in S3, incremental -> restore into a new cluster
- can automatically copy snapshots to another region
- Firehose -> Redshift (data is staged in S3, then loaded with a Redshift COPY command)
- Enhanced VPC Routing - keeps COPY/UNLOAD traffic within the VPC (all data stays private)
- inserting data from an EC2 instance via JDBC - write in large batches
- Redshift Spectrum: query data already in S3 without loading it; Spectrum nodes read the S3 data and send results back to the compute nodes (saves cost)
Storage gw
- Hybrid cloud storage service
- seamless way to connect to AWS storage via NFS/SMB
- file gw - access cloud storage
- vol gw - migrate data to aws
- tape gw - back up to cloud
Transfer family
- SFTP, FTPS, FTP
- into and out of S3 and EFS
SnowBall Edge
- offline data transfer when bandwidth is constrained; 80 TB x 10
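Whether "bandwidth constrained" applies comes down to simple arithmetic: compare the network transfer time with the option of shipping devices. The back-of-the-envelope sketch below makes two assumptions: 80% link utilisation and 80 TB of usable capacity per device. Check both against current device specs.

```python
import math


def network_transfer_days(data_tb: float, link_mbps: float, utilisation: float = 0.8) -> float:
    """Days to push data_tb (decimal TB) over a link at the given utilisation."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilisation)
    return seconds / 86_400


def devices_needed(data_tb: float, per_device_tb: float = 80) -> int:
    """Number of Snowball Edge devices, assuming ~80 TB usable each."""
    return math.ceil(data_tb / per_device_tb)


print(round(network_transfer_days(800, 1_000), 1))  # 92.6 days on a 1 Gbps link
print(devices_needed(800))                          # 10
```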
SQS
- use cases: distributed apps, ack/fail semantics, individual message delays, dynamically increasing concurrency
- for many consumers, use a unique attribute
Snowball
- for regional data transfer, prefer S3
- for datasets under 10 PB, or data distributed across multiple locations
Snowmobile
- for 10 PB or more in a single location (up to 100 PB per Snowmobile)
S3:
- retention settings on an individual object version override the bucket's default Object Lock retention settings
- for audit, use SSE-KMS (key usage is logged in CloudTrail), not SSE-S3
- gateway endpoints do not allow access from on-premises networks, other regions, or other VPCs
- interface endpoints - for access from on-premises, other regions, or outside the VPC
- CORS
- event notification destinations: SNS/SQS/Lambda/EventBridge - only 1 type
* 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix
* no limits to the number of prefixes in a bucket.
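The limits apply per prefix and the number of prefixes is unlimited, so higher request rates are reached by spreading keys across prefixes. The sketch below estimates the prefix count from the limits above and derives a stable prefix from a hash of each key. The hashing scheme and the `p<n>/` naming are hypothetical. S3 also scales partitions gradually, so treat this as a planning aid rather than a guarantee.

```python
import hashlib
import math

PUT_LIMIT_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE per second
GET_LIMIT_PER_PREFIX = 5_500   # GET/HEAD per second


def prefixes_needed(put_rps: float, get_rps: float) -> int:
    """Minimum prefixes so neither per-prefix limit is exceeded."""
    return max(1,
               math.ceil(put_rps / PUT_LIMIT_PER_PREFIX),
               math.ceil(get_rps / GET_LIMIT_PER_PREFIX))


def spread_key(key: str, n_prefixes: int) -> str:
    """Hypothetical helper: derive a stable prefix from a hash of the key."""
    bucket = int(hashlib.sha256(key.encode()).hexdigest(), 16) % n_prefixes
    return f"p{bucket}/{key}"


print(prefixes_needed(10_000, 20_000))   # 4: GETs need ceil(20000 / 5500) = 4
print(spread_key("images/cat.jpg", 4))   # one of p0/ .. p3/ followed by the key
```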
ASG
- Use standby state to temporarily remove instance from ASG e.g. troubleshoot or patching
- termination policy: default or custom. Default policy: 'selects the Availability Zone with two instances, and terminates the instance that was launched from the oldest launch template or launch configuration. If the instances were launched from the same launch template or launch configuration, Amazon EC2 Auto Scaling selects the instance that is closest to the next billing hour and terminates it.' (i.e. the AZ with the most instances is chosen first)
- a launch template can be used to create a mixed instances group (On-Demand and Spot)
- launch configurations are being deprecated
- scaling policies:
- target tracking - best for handling sudden spikes (e.g. keep average CPU at 40%), or use RequestCountPerTarget
- simple scaling - must wait for the cooldown period, so not good for spikes; adjust by capacity units or % of group - add, remove, set to
- step scaling - like a combination of multiple 'simple' scaling steps
- scaling cooldown - 5 mins (after adding or removing instances, lets metrics stabilise) - no new launches or terminations
- predictive scaling - not good for a heterogeneous setup, as any instance type can be launched
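The default termination policy quoted above is essentially a chain of tie-breakers. The simplified sketch below assumes a hypothetical `Instance` record. It skips the On-Demand/Spot allocation-strategy step that the real policy also considers. Breaking AZ ties alphabetically is an assumption made only to keep the example deterministic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    az: str
    template_age: int          # hypothetical: larger = older launch template/config
    secs_to_billing_hour: int  # seconds until the next billing hour


def pick_instance_to_terminate(instances):
    # 1. Availability Zone with the most instances (ties: alphabetical, an assumption).
    counts = Counter(i.az for i in instances)
    most = max(counts.values())
    az = min(a for a, c in counts.items() if c == most)
    candidates = [i for i in instances if i.az == az]
    # 2. Instances launched from the oldest launch template / configuration.
    oldest = max(i.template_age for i in candidates)
    candidates = [i for i in candidates if i.template_age == oldest]
    # 3. Closest to the next billing hour (instance id as a final tie-break).
    return min(candidates, key=lambda i: (i.secs_to_billing_hour, i.instance_id))


fleet = [
    Instance("i-aaa", "us-east-1a", template_age=1, secs_to_billing_hour=100),
    Instance("i-bbb", "us-east-1a", template_age=2, secs_to_billing_hour=900),
    Instance("i-ccc", "us-east-1b", template_age=5, secs_to_billing_hour=10),
]
print(pick_instance_to_terminate(fleet).instance_id)  # i-bbb
```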
ELB
- ALB: default rule forwards to a target group; add more rules (host header, path, etc.) to forward, redirect, or return a fixed response
- NLB -> TCP/UDP, millions of requests per second, 1 static IP per AZ, narrow down source IPs
- NLB -> ALB (fixed IP addresses from the NLB plus all HTTP rules of the ALB)
- NLB health checks - TCP, HTTP, or HTTPS
- Gateway LB - layer 3 - to manage a fleet of third-party virtual appliances
- sticky sessions (session affinity) - NLB, ALB, Classic LB
- cross-zone load balancing - distributes evenly across all registered EC2 instances in all AZs
- multiple certificates via SNI - only ALB, NLB, and CloudFront
- connection draining (Classic LB), called deregistration delay for the others - lets in-flight requests complete while an instance is de-registering or unhealthy (default 300 secs, range 1-3600 secs, can be disabled, reduce for short requests)
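ALB listener rules are evaluated in priority order (lowest number first), and the default rule applies only when nothing else matches. The minimal sketch below handles `*`/`?` wildcards via `fnmatchcase`. The rules and action strings are hypothetical, and real ALB conditions offer more options (headers, query strings, source IP).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Rule:
    priority: int
    host: Optional[str]   # host-header pattern, None = any host
    path: Optional[str]   # path pattern, None = any path
    action: str           # hypothetical label, e.g. "forward:api-tg"


def route(rules, default_action, host, path):
    for rule in sorted(rules, key=lambda r: r.priority):
        # Host names compare case-insensitively; paths case-sensitively.
        if rule.host is not None and not fnmatchcase(host.lower(), rule.host.lower()):
            continue
        if rule.path is not None and not fnmatchcase(path, rule.path):
            continue
        return rule.action
    return default_action


rules = [
    Rule(20, "*.example.com", None, "forward:web-tg"),
    Rule(10, None, "/api/*", "forward:api-tg"),
    Rule(5, None, "/old/*", "redirect:/new"),
]
print(route(rules, "fixed-response:404", "shop.example.com", "/api/orders"))  # forward:api-tg
print(route(rules, "fixed-response:404", "shop.example.com", "/home"))        # forward:web-tg
print(route(rules, "fixed-response:404", "other.example.org", "/home"))       # fixed-response:404
```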
RDS
- Single AZ,
- Multi-AZ DB instance (1 standby) - standby doesn't serve read traffic - synchronous replication
- Multi-AZ DB cluster - readable standbys serve read traffic - semisynchronous replication - lower write latency - no cross-region automated backups
- moving a DB instance from inside a VPC to outside a VPC is not supported
- Aurora Multi-AZ standby - only for increasing availability
- storage auto-scaling
- Enhanced Monitoring - OS-level metrics, processes
- read replicas (async replication); can be in the same AZ, cross-AZ, or cross-region
Aurora Serverless - for infrequent, intermittent, or unpredictable workloads - MySQL, PostgreSQL
Billing
- CloudWatch alarms for billing thresholds
- forecast
- Check if reserved instances in use
Budget
- tracks spending and alerts; doesn't enforce a limit by itself (Budget Actions can apply IAM policies/SCPs or stop EC2/RDS instances)
- 4 types - Cost / Usage / Reservation / Savings Plans
- create alerts on Utilization or Coverage (Reservation and Savings Plans budgets)
- annual or monthly (filter by service or linked account, apply credits, etc.)
- alerts to an SNS topic or email; thresholds based on actual or forecasted costs
Cost
- Cost Explorer - helps choose an optimal Savings Plan
- preconfigured reports - view incurred costs by service/region
- usage + forecast
- unblended - the actual cash cost as incurred
- amortized - accrual view: upfront RI/Savings Plans fees are spread evenly across the term and shown as an effective hourly cost
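Here is a worked example of the two views for a hypothetical one-year, all-upfront reservation costing $8,760 (figures invented for illustration). Unblended cost shows the whole payment on the purchase day. Amortized cost spreads it out as $1/hour.

```python
HOURS_PER_YEAR = 8_760  # 365 days x 24 hours


def unblended_daily(upfront: float, hourly_fee: float, day: int) -> float:
    """Cash view: the upfront fee lands on day 0, the recurring fee daily."""
    return (upfront if day == 0 else 0.0) + hourly_fee * 24


def amortized_daily(upfront: float, hourly_fee: float, term_hours: int = HOURS_PER_YEAR) -> float:
    """Accrual view: the upfront fee is spread evenly over every hour of the term."""
    return (upfront / term_hours + hourly_fee) * 24


upfront, hourly = 8_760.0, 0.0  # hypothetical all-upfront reservation
print(unblended_daily(upfront, hourly, 0), unblended_daily(upfront, hourly, 1))  # 8760.0 0.0
print(amortized_daily(upfront, hourly))                                          # 24.0
```

Over the full term both views add up to the same total (365 x $24 = $8,760); only the timing differs.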
Trusted Advisor
- reduce cost, increase performance/security/fault tolerance /service limits
DynamoDB
- eventually consistent reads (default); strongly consistent reads available
- ACID transactions within a single AWS account and region
- Standard vs Standard-IA table class (to save $) - can be changed
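The consistency choice also changes provisioned capacity. The small calculator below uses the documented unit sizes: 1 RCU covers one strongly consistent read per second of up to 4 KB, or two eventually consistent reads. 1 WCU covers one write per second of up to 1 KB. Transactional reads and writes cost double. The workload numbers are hypothetical.

```python
import math


def rcu(reads_per_sec: int, item_kb: float, consistency: str = "eventual") -> int:
    """Read capacity units; a strongly consistent read uses ceil(item_kb / 4) units."""
    units = reads_per_sec * math.ceil(item_kb / 4)
    if consistency == "strong":
        return units
    if consistency == "eventual":
        return math.ceil(units / 2)  # eventually consistent reads cost half
    if consistency == "transactional":
        return units * 2
    raise ValueError(f"unknown consistency: {consistency}")


def wcu(writes_per_sec: int, item_kb: float, transactional: bool = False) -> int:
    """Write capacity units; each write uses ceil(item_kb / 1) units."""
    units = writes_per_sec * math.ceil(item_kb)
    return units * 2 if transactional else units


print(rcu(10, 6, "strong"))  # 20: 6 KB rounds up to 2 x 4 KB units
print(rcu(10, 6))            # 10: eventual consistency halves it
print(wcu(10, 2.5))          # 30: 2.5 KB rounds up to 3 x 1 KB units
```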
EC2
- Spot Fleet allocation strategies: price-capacity-optimized (recommended - lowest price among pools with the best availability)
- capacity-optimized, lowest-price
- request types - persistent/one-time
- persistent request - the Spot request remains active until it expires or is cancelled, even after it is fulfilled
- one-time request - the Spot request remains active until the instance is launched, the request expires, or it is cancelled
- capacity rebalancing and interruption (termination) notice
- placement groups: partition - big data; spread - HA; cluster - HPC
- hibernation - on start, the EBS root volume is restored, RAM is reloaded, previously running processes resume, EBS data volumes are reattached
- Dedicated Hosts - for licenses tied to sockets/vCores, etc.
- hibernation - must be enabled at launch time; while hibernated you pay only for EBS volumes and attached Elastic IP addresses
- hibernation - the 'stopping' state while hibernating is also billed
- ENI & EIP stay with the instance even after stopping
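Below is a toy comparison of the three allocation strategies across hypothetical Spot pools. The `capacity_score` and the "within one point of the best" shortlist are invented stand-ins for AWS's internal capacity data. The real strategies are computed by EC2, not by a formula like this.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str            # hypothetical instance type @ AZ label
    price: float         # hypothetical $/hour
    capacity_score: int  # invented 1-10 stand-in for spare capacity


def choose_pool(pools, strategy: str) -> str:
    if strategy == "lowest-price":
        return min(pools, key=lambda p: p.price).name
    if strategy == "capacity-optimized":
        return max(pools, key=lambda p: p.capacity_score).name
    if strategy == "price-capacity-optimized":
        best = max(p.capacity_score for p in pools)
        # Toy shortlist of "high availability" pools, then pick the cheapest.
        shortlist = [p for p in pools if p.capacity_score >= best - 1]
        return min(shortlist, key=lambda p: p.price).name
    raise ValueError(f"unknown strategy: {strategy}")


pools = [
    Pool("c5.large@1a", 0.020, 3),
    Pool("m5.large@1b", 0.030, 9),
    Pool("m6i.large@1c", 0.035, 10),
]
for s in ("lowest-price", "capacity-optimized", "price-capacity-optimized"):
    print(s, choose_pool(pools, s))
# lowest-price c5.large@1a
# capacity-optimized m6i.large@1c
# price-capacity-optimized m5.large@1b
```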
EFS
- file system types - Regional & One Zone; storage classes - Standard, Infrequent Access (accessed a few times each quarter, up to 92% cheaper), Archive
- performance mode: General Purpose (latency-sensitive) or Max I/O
- throughput mode: Elastic (scales throughput automatically), Provisioned, or Bursting
- can receive files from DataSync directly (EFS is a supported DataSync location)
- doesn't support SMB - only NFS
RAM (Resource Access Manager)
- free
- share subnets within a VPC using RAM - share with an organization or organizational unit (OU) in AWS Organizations, an AWS account, IAM role, or IAM user
ElastiCache
- cache - avoid for write-intensive apps
- social networks, gaming, media sharing, Q&A, compute-intensive workloads such as recommendation engines
- combine with a DB for leaderboards, counting, session storage and tracking (not achievable with a DB alone in a cost-effective way)
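Combining a cache with a database usually follows a cache-aside (lazy-loading) pattern. The app reads from the cache first; on a miss it loads from the DB and populates the cache with a TTL. The sketch below is dictionary-backed: with a real ElastiCache cluster the dict would be a Redis/Memcached client, and the loader and TTL here are hypothetical. It also hints at why write-heavy apps benefit less, since every write must invalidate or update the cached entry.

```python
import time


class CacheAside:
    """Lazy-loading cache in front of a slower data store (in-memory sketch)."""

    def __init__(self, load_from_db, ttl_seconds=300, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)
        self.misses = 0

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit
        self.misses += 1     # miss or expired: go to the database
        value = self.load_from_db(key)
        self.entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Call after writing to the database so readers don't see stale data."""
        self.entries.pop(key, None)


fake_now = [0.0]
db = {"user:1": "Alice"}  # hypothetical backing store
cache = CacheAside(db.__getitem__, ttl_seconds=300, clock=lambda: fake_now[0])
cache.get("user:1")
cache.get("user:1")
print(cache.misses)  # 1: the second read was a hit
fake_now[0] = 301.0
cache.get("user:1")
print(cache.misses)  # 2: the entry expired
```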
DataSync
- AWS: S3, EFS, FSx (Windows, Lustre, OpenZFS, NetApp ONTAP), Snowball
- on-prem: NFS, SMB, HDFS (Hadoop), object storage
- third-party clouds - Google/Azure, etc.
CloudWatch
- metric streams -> Kinesis Data Firehose -> S3, AWS partners
- log stream - sequence of log events
- log group- group of log streams with same settings
CloudTrail
- log, continuously monitor and retain account activity e.g. who made API call
- does not maintain a history of resource configuration changes (use AWS Config)
AWS Elastic Beanstalk
- pre-built environment templates, e.g. web server and worker tiers
- deployment policies - all at once, rolling, rolling with additional batch, immutable, traffic splitting
NAT gateway
- create one in each AZ (a NAT gateway is zonal)
- default connectivity type: public - create and assign an Elastic IP address
- routes traffic to the IGW
- private NAT gateway: cannot associate an Elastic IP; traffic routed from a private NAT gateway to an IGW is dropped
- uses its EIP only in conjunction with an IGW
- can't be used as a jump box (bastion) because it's managed by AWS, unlike NAT instances
IGW
- Attach to VPC
- a subnet with a route to the IGW is a public subnet
- egress-only internet gateway - for IPv6
Route53
- zone apex (e.g. example.com) - cannot have a CNAME record, but can have an alias record
- alias - free queries; cannot point to an EC2 DNS name; points a domain or subdomain to an AWS resource (ELB, CloudFront, S3 website, etc.); can't map to an arbitrary external domain name
- CNAME - charged per query; can point to any DNS name; important: maps one domain name to another
- you can create a CNAME record that redirects queries from acme.example.com to zenith.example.com
- You can't create a CNAME record that has the same name as the hosted zone (the zone apex).
- You cannot create a CNAME record for example.com, but you can create CNAME records for www.example.com, newproduct.example.com, and so on.
- if you create a CNAME for www.example.com, you cannot create any other records for which the value of the Name field is www.example.com.
- health checkers live outside the VPC, so they can't reach private resources or private endpoints
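The CNAME rules above can be expressed as a small validation function. The sketch below works over a hypothetical in-memory list of (name, type) records and does not call the Route 53 API. Alias records are modelled as ordinary A records, which is the record type they carry in a hosted zone.

```python
def can_add_record(zone_apex: str, existing, name: str, rtype: str) -> bool:
    """existing: iterable of (name, type) pairs already in the hosted zone."""
    def norm(n: str) -> str:
        return n.rstrip(".").lower()

    name = norm(name)
    same_name = [t for n, t in existing if norm(n) == name]
    if rtype == "CNAME":
        if name == norm(zone_apex):
            return False           # no CNAME at the zone apex
        return not same_name       # a CNAME can't share its name with any record
    return "CNAME" not in same_name  # nothing else can share a name with a CNAME


zone = [("www.example.com", "CNAME")]
print(can_add_record("example.com", zone, "example.com", "CNAME"))       # False
print(can_add_record("example.com", zone, "shop.example.com", "CNAME"))  # True
print(can_add_record("example.com", zone, "www.example.com", "A"))       # False
print(can_add_record("example.com", zone, "example.com", "A"))           # True (e.g. an alias)
```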
DMS
- discover using DMS Fleet Advisor
- migrate using DMS Schema Conversion, which converts the source schema to a new target engine (like the Schema Conversion Tool, SCT)
- use AWS DMS to perform a one-time migration or replicate ongoing changes - CDC (Change Data Capture)
- replication: Multi-AZ option recommended, although it can slow things down; only limited DMS copied
- can be used between S3 and Kinesis Data Streams
Cognito
- user pool - user directory that stores user profile attributes after sign-up
- identity pool - containers that Cognito Identity uses to keep federated identities organised; doesn't store user profiles
- one identity pool can be used by two apps, but the same user will have a different unique identifier in each identity pool
- OpenID connect-based user profile attributes
- User pool - authn - pool of users - hosted UI
- Identity pool - authz - pool of identity providers