Application System Design Architecture Outline

Mindwatering Incorporated

Author: Tripp W Black

Created: 03/21 at 06:24 PM

Category:
General Web Tips
Other

Task:
Assemble a working outline or template from which design decisions can be made.

Core Summary (5 mins - hrs)
- High-level
- What is it for? What need does it meet?
- Where is it needed?
- Why is it needed?
Feature Expectations (5 mins - 1 hr)
- Use cases
- Use cases/scenarios not covered by this app
- Who will use it?
- How many will use it?
- Usage Patterns (times and methods)
- User Types and Roles (readers/consumers, content creators, approvers, etc.)
Estimations (15 mins - hrs)
- Network (NIC) Throughput and Receive/Send Ratio
  - Throughput in queries per second (QPS)
  - Throughput in data volume per second for those queries, upload and download (MB/s or GB/s)
- Throughput Read/Write Ratio
  - Write (QPS, volume of data)
  - Read (QPS, volume of data)
- Latency
  - Expected latency for read/write queries
- Storage Estimates
- Memory Estimates
  - If caching, how much data is held in memory?
  - Is horizontal scaling (spreading the load across nodes) an option?
  - If using a disk cache, why, and how much should it store?
  - Types of disks/SSDs needed based on the answers above
- CPU Estimates
  - Number of CPUs per node/VM
  - Number of CPUs total across nodes/VMs (horizontal scaling)
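The estimates above can be roughed out with back-of-the-envelope arithmetic before any measurements exist. The Python sketch below is illustrative only: the `Assumptions` fields, the 3x peak factor, the 20% hot-cache fraction, and the 200 QPS-per-core figure are hypothetical placeholders to be replaced with real numbers from the Feature Expectations answers and from load tests.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Assumptions:
    """Hypothetical inputs; replace with figures gathered for the real app."""
    daily_active_users: int
    reads_per_user_per_day: float
    writes_per_user_per_day: float
    avg_read_kb: float               # average payload returned per read
    avg_write_kb: float              # average payload stored per write
    peak_factor: float = 3.0         # assumed peak-to-average traffic multiplier
    retention_days: int = 5 * 365
    cache_hot_fraction: float = 0.2  # assumed share of daily read volume worth caching
    qps_per_core: float = 200.0      # assumed; measure with a load test


def estimate(a: Assumptions) -> dict:
    daily_reads = a.daily_active_users * a.reads_per_user_per_day
    daily_writes = a.daily_active_users * a.writes_per_user_per_day
    read_qps = daily_reads / SECONDS_PER_DAY
    write_qps = daily_writes / SECONDS_PER_DAY
    peak_qps = (read_qps + write_qps) * a.peak_factor
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "peak_qps": peak_qps,
        "read_write_ratio": daily_reads / daily_writes if daily_writes else math.inf,
        # Decimal units throughout: 1 MB = 1,000 KB and 1 GB = 1,000,000 KB
        "download_mb_s": read_qps * a.avg_read_kb / 1_000,
        "upload_mb_s": write_qps * a.avg_write_kb / 1_000,
        # Raw data only; replication, indexes, and backups multiply this figure
        "storage_gb": daily_writes * a.retention_days * a.avg_write_kb / 1_000_000,
        "cache_gb": daily_reads * a.avg_read_kb * a.cache_hot_fraction / 1_000_000,
        "cpu_cores_at_peak": math.ceil(peak_qps / a.qps_per_core),
    }


if __name__ == "__main__":
    result = estimate(Assumptions(1_000_000, 20, 2, avg_read_kb=50, avg_write_kb=10))
    for key, value in result.items():
        print(f"{key}: {value:,.2f}")
```

With 1,000,000 daily users making 20 reads (50 KB) and 2 writes (10 KB) each, this gives roughly 231 read QPS, a 10:1 read/write ratio, about 11.6 MB/s of download traffic, 36,500 GB of raw storage over five years, 200 GB of hot cache, and 4 cores at peak, before any replication or headroom is added.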
Design Performance Goals (15 mins)
- Latency and Throughput minimum/maximum requirements for scaling
- Consistency vs. Availability
  - Consistency patterns: weak, eventual, strong
  - Availability patterns: failover, replication
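One concrete way to place a design on the consistency/availability spectrum is quorum replication as used by Dynamo-style stores: each item lives on N replicas, a write waits for W acknowledgements, and a read consults R replicas. If R + W > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write; smaller R and W tolerate more replica failures but only give eventual consistency. This Python sketch assumes strict quorums (sloppy quorums and concurrent writes weaken the guarantee), and `QuorumConfig` and `describe` are hypothetical names.

```python
from typing import NamedTuple


class QuorumConfig(NamedTuple):
    n: int  # replicas holding each item
    w: int  # replicas that must acknowledge a write
    r: int  # replicas consulted on a read


def describe(cfg: QuorumConfig) -> dict:
    if not (1 <= cfg.w <= cfg.n and 1 <= cfg.r <= cfg.n):
        raise ValueError("w and r must be between 1 and n")
    return {
        # Overlapping read and write sets mean a read reaches at least one
        # replica that holds the latest acknowledged write.
        "reads_see_latest_write": cfg.r + cfg.w > cfg.n,
        # Replicas that can be unavailable while the operation still succeeds.
        "write_failures_tolerated": cfg.n - cfg.w,
        "read_failures_tolerated": cfg.n - cfg.r,
    }
```

For N = 3, W = 2, R = 2, reads see the latest write and one replica can fail; dropping to W = 1, R = 1 tolerates two failures but allows stale reads.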
High-Level Design (1 hr - hrs)
- APIs for CRUD scenarios (create/write, read, update, delete) for the main design elements (e.g., doc forms, views, reports, metrics, static vs. dynamic content)
- Containerization Platform Considerations
- Database Schema
- If performing data normalization or machine learning, design the algorithms:
  - Divide and Conquer (split into independent subproblems, solve each, then combine) or Dynamic Programming (solve overlapping subproblems once and reuse the results)
  - Streaming Algorithms when the data is too big to fit in memory or arrives as a continuous feed
  - Hashing and Indexing Algorithms for large data lookups and insertions
- High-level design (workflow process) for the primary read scenario
- High-level design (workflow process) for the primary read/write scenario
- High-level design (workflow process) for data management roles (e.g., approvers, data librarians, analysts searching for discoveries)
- Data archiving policies and archive locations
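As one example of the streaming-algorithms item, Welford's online algorithm computes the mean and variance of a feed in a single pass with constant memory, so the data set never has to fit in RAM. A minimal Python sketch; `RunningStats` and `summarize` are hypothetical names.

```python
import math
from typing import Iterable


class RunningStats:
    """One-pass mean/variance (Welford's algorithm) using O(1) memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)


def summarize(stream: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for value in stream:  # works for lists, generators, file lines, or socket feeds
        stats.update(value)
    return stats
```

Because each value is folded in and then discarded, the same code handles a bounded file and an unbounded feed; only the running count, mean, and squared-deviation total are kept.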
Deeper-Dive Design (1 hr - hrs)
- Scaling Code
  - Algorithms
  - App code base(s) ensuring they can scale horizontally/asymmetrically and vertically
- Scaling App Components for Code
  - Availability and consistency for each App Component
    - Retries, Observability, and Reliability
  - Patterns for the above, applied across all App Components or to subsets of them
- App Components to Cover
  - DNS (internal and external)
  - CDN (Push vs. Pull)
  - Load Balancers (Active-Passive, Active-Active, Layer 4, Layer 7)
  - Reverse Proxy
  - Application Layer Scaling (Microservices, Service Discovery)
  - Database options:
    - RDBMS: ACID Properties, Primary-Secondary, Primary-Primary, Federation, Sharding, Denormalization, SQL Tuning - Postgres
      - Use-cases: Structured data with relationships
      - Index Scaling
    - NoSQL: Key-Value, Wide-Column, Document - Domino NSF, MongoDB, DynamoDB
      - Use-cases: Unstructured or semi-structured data
    - Graph: Neo4j, Amazon Neptune
      - Use-cases: Social networks, knowledge graphs, recommendation systems, and bioinformatics
    - NewSQL: Distributed SQL with ACID properties and horizontal scaling - CockroachDB, Google Spanner, VoltDB
      - Use-cases: Transaction processing, real-time analytics and IoT device data
    - Time Series: Time-stamped data points - InfluxDB, TimescaleDB, Prometheus
      - Use-cases: IoT sensor data, financial market data, system metrics, and logs
    - Vector: High-dimensional vector data - Pinecone, Weaviate, KDB.AI
      - Use-cases: Machine learning, similarity search, and recommendation systems
    - Fast lookups:
      - RAM (Bounded size) => Redis, Memcached
      - AP (Unbounded size) => Cassandra, Riak, Voldemort, DynamoDB (default mode)
      - CP (Unbounded size) => HBase, MongoDB, Couchbase, DynamoDB (consistent read setting)
  - Caches:
    - Cache Types:
      - Client caching
      - CDN caching
      - Web server caching
      - Database caching
      - Application caching
      - Query level caching
      - Object level caching
    - Cache Update Strategies:
      - Cache-aside
      - Write-through
      - Write-behind (write-back)
      - Refresh-ahead
  - Asynchronism:
    - Message queues
    - Task queues
    - Back pressure
  - Network Communication:
    - TCP
    - UDP
  - Client to Server, Server to Server Communication Protocols:
    - TCP - REST APIs over HTTP
    - TCP - RPC
    - TCP - WebSockets
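The cache-aside strategy listed under Caches can be sketched as follows: the application checks the cache first, on a miss loads the value from the database and stores it in the cache with a TTL, and on a write updates the database and then invalidates the cached entry. In this minimal Python sketch, plain dictionaries stand in for Redis/Memcached and the database; the `CacheAside` class, its TTL handling, and the `db_reads` counter are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside: the application, not the cache, talks to the database."""

    def __init__(self, db: Dict[str, Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.db = db  # stand-in for the system of record
        self.cache: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry time)
        self.ttl = ttl_seconds
        self.clock = clock
        self.db_reads = 0  # counts cache misses that hit the database

    def get(self, key: str) -> Optional[Any]:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # cache hit
        # Miss or expired entry: load from the database, then populate the cache.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key: str, value: Any) -> None:
        # Write to the database first, then invalidate so the next read reloads.
        self.db[key] = value
        self.cache.pop(key, None)
```

Invalidating on write, rather than writing the new value into the cache, is a common cache-aside choice because it narrows the window in which concurrent writers can leave the cache and database disagreeing; write-through would instead update both on every write.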
Justify (15 mins)
- Throughput of Each Layer
- Latency Caused Between Each Layer
- Overall Latency Justification
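For a request that passes through each layer in sequence, overall latency is roughly the sum of each layer's processing time plus the network hop to the next layer, and sustained end-to-end throughput is capped by the slowest layer. A minimal Python sketch of that budget; the `Layer` fields, layer names, and figures are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    processing_ms: float  # average time spent inside the layer
    hop_ms: float         # average network time to reach the next layer
    max_rps: float        # measured or estimated capacity of the layer


def latency_budget(layers: List[Layer]) -> dict:
    total_ms = sum(layer.processing_ms + layer.hop_ms for layer in layers)
    bottleneck = min(layers, key=lambda layer: layer.max_rps)
    return {
        "total_latency_ms": total_ms,
        # Serial pipeline: capacity cannot exceed the slowest stage.
        "max_throughput_rps": bottleneck.max_rps,
        "bottleneck": bottleneck.name,
    }


if __name__ == "__main__":
    path = [
        Layer("load balancer", 0.5, 0.5, 50_000),
        Layer("reverse proxy", 1.0, 0.5, 20_000),
        Layer("app service", 12.0, 1.0, 4_000),
        Layer("database", 6.0, 0.0, 2_500),
    ]
    print(latency_budget(path))
```

The sum uses averages; tail percentiles (p95, p99) do not add up this simply across layers, so they are worth measuring end to end as well.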
Key Metrics to Measure (15 mins - 1 hr)
- Identify Key Metrics relevant to your system's design:
  - Availability
  - Latency
  - Throttling
  - Request Patterns/Volume
  - Customer Experience
  - App Component/Feature Specific Metrics
    - Search: which keyword searches return no results (a failure from the user/customer perspective)
- Define metrics for infrastructure and tools/resources:
  - Grafana with Prometheus
  - AppDynamics
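The metrics above are normally computed by the monitoring stack, but their definitions can be pinned down in code first. The sketch below uses a hypothetical `Request` record and one common set of definitions (availability as the share of requests without a 5xx error, HTTP 429 as throttling, nearest-rank percentiles, empty searches as user-facing failures); the field names and definitions are assumptions, not what Grafana, Prometheus, or AppDynamics compute internally.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    status: int                        # HTTP status code returned
    latency_ms: float
    search_hits: Optional[int] = None  # set only for search requests


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, pct in the range 0-100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def key_metrics(requests: List[Request]) -> dict:
    if not requests:
        raise ValueError("no requests to measure")
    total = len(requests)
    server_errors = sum(1 for r in requests if r.status >= 500)
    throttled = sum(1 for r in requests if r.status == 429)
    searches = [r for r in requests if r.search_hits is not None]
    empty = sum(1 for r in searches if r.search_hits == 0)
    latencies = [r.latency_ms for r in requests]
    return {
        "availability_pct": 100 * (total - server_errors) / total,
        "throttle_rate_pct": 100 * throttled / total,
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        # Empty searches count as failures from the user's point of view.
        "empty_search_rate_pct": 100 * empty / len(searches) if searches else 0.0,
    }
```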
System Health Monitoring (15 mins - 1 hr)
- Measure the application performance index (Apdex) and latency of microservices:
  - New Relic
  - AppDynamics
- Monitoring health and performance:
  - Grafana with Prometheus
  - AppDynamics
- Simulate Customer Experience:
  - Canaries: proactive detection of service degradation
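The Apdex (Application Performance Index) score reported by tools such as New Relic summarizes response times against a target threshold T: samples at or under T are satisfied, samples up to 4T are tolerating, slower ones are frustrated, and the score is (satisfied + tolerating / 2) / total, ranging from 0 (worst) to 1 (best). A minimal Python sketch of that formula; the function name is hypothetical, and some tools also count errored requests as frustrated, which this version does not.

```python
from typing import Iterable


def apdex(latencies_ms: Iterable[float], threshold_ms: float) -> float:
    """Apdex_T = (satisfied + tolerating / 2) / total samples."""
    satisfied = tolerating = total = 0
    for latency in latencies_ms:
        total += 1
        if latency <= threshold_ms:
            satisfied += 1
        elif latency <= 4 * threshold_ms:
            tolerating += 1
        # Anything slower than 4T is frustrated and contributes 0.
    if total == 0:
        raise ValueError("no samples")
    return (satisfied + tolerating / 2) / total
```

With T = 500 ms, samples of 100, 200, 400, 600, and 2,500 ms give three satisfied, one tolerating, and one frustrated request, for a score of 0.7.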
Log Systems (15 mins - 1 hr)
- Implement metrics gathering and visualization dashboards
- Implement Log Collection and Analysis:
  - Elasticsearch, Logstash, Kibana (ELK)
  - Splunk
  - Logtail
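Log collection and analysis tools such as ELK, Splunk, and Logtail are easier to query when applications emit structured logs. A minimal sketch using Python's standard `logging` module to write one JSON object per line; the `JsonFormatter` class, the `context` field convention, and the logger name are hypothetical, and how the output is shipped (file, stderr, container log driver) depends on the deployment.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so log shippers can parse the fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured context passed via logging's `extra=` argument.
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = make_logger("orders")
    log.info("order created", extra={"context": {"order_id": "A-1001", "latency_ms": 42}})
```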
Security (15 mins - 1 hr)
- Firewall
- TLS encryption for data in transit
- Data encryption at rest
- Authentication / Authorization
- Restrictive Egress/Ingress Rules
- Least-Privilege Access for Roles
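Least privilege for roles usually comes down to a deny-by-default authorization check: a request is allowed only if one of the caller's roles explicitly grants the permission. The Python sketch below maps the reader, content creator, and approver roles from the Feature Expectations section to permissions; the role names, the `Permission` values, and `is_allowed` are hypothetical.

```python
from enum import Enum
from typing import Dict, FrozenSet, Iterable


class Permission(Enum):
    READ = "read"
    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"
    APPROVE = "approve"


# Hypothetical role map: each role gets only what its job requires.
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "reader": frozenset({Permission.READ}),
    "author": frozenset({Permission.READ, Permission.CREATE, Permission.UPDATE}),
    "approver": frozenset({Permission.READ, Permission.APPROVE}),
}


def is_allowed(roles: Iterable[str], permission: Permission) -> bool:
    """Deny by default: unknown roles and unlisted permissions are refused."""
    return any(permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```

No role here holds DELETE, so deletions are refused until a role is deliberately granted that permission.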