Do not re-distribute this resume without obtaining my consent.
Authoritative source for this resume:  http://www.rage.net/~greg/resume.txt

Greg Retkowski / 408-544-0437 / greg@rage.net

Principal Engineer, Leader, Network & Systems Architect, Cloud, DevOps

# Executive Summary:

Demonstrated history of adding organizational value through the successful
execution of innovative technological projects - delivering within schedule
and budget constraints while exceeding expectations - both as an individual
contributor and as a leader of teams.  Successful outcomes in organizations
ranging from fast-paced startups to large public companies.

Adept at all aspects of network, server, & service management, with a focus
on cloud and operations automation projects. Strong problem solving,
communications, technical, and leadership skills.

Delivered valuable results for Internet startups such as Nebula, OnLive,
Avvenu, Mediaplex, & SpeedEra - and established brands such as Silver Spring
Networks, Upwork, & 8x8.

# Values:

  Commit and Deliver, Candor, Hands-On Technical Leadership, Adaptability,
Innovative, Customer Focused, Metrics Driven

# Technical Skills:

Customer Experience Monitoring: Anomaly Detection, Real User Monitoring,
  Synthetics
Logging: Logstash, OpenSearch, Syslog, Splunk, fluentd, GELF, CloudWatch Logs
Observability: Grafana, Prometheus/Thanos, Graphite, Icinga, SNMP
Observability SaaS: PagerDuty, DataDog, DynaTrace
Amazon/AWS: Athena/Glue, CloudFormation, CloudWatch, EC2, ECS, EKS,
  ElastiCache, IAM, OpenSearch/ElasticSearch, OpsWorks, RDS, Route53, S3, SNS,
  SQS, VPC
OpenStack: Nova (compute), Cinder (volume), Ceph, Neutron (network),
  Trove (DBaaS), LBaaS, Heat (orchestration), Keystone (Identity)
Python: flask, numpy, jupyter, pip, venv, pytest
Configuration Management: Chef, Puppet, Ansible, Terraform
Security: Firewalls, SAML/SSO, MFA/2FA, Vault, SSH, Sudo, SSL/PKI, VPN, ZTA
Containers: Docker, K8S/Kubernetes, ECS, EKS, compose, microservice arch
Methodologies: Kanban, TPM, CI/CD, Atlassian/Jira, DevOps, Agile
Web: Apache, haproxy, nginx, HTTP & HTTPS, REST, SOAP
Other Languages: Java, Perl, PHP, HTML, CSS, Shell, SQL, Ruby
Dev Tools: Git, GitHub, GitLab, Bitbucket; APT/Dpkg & RPM/Yum, Jenkins
LLM/GenAI: Falcon, Mistral, Vicuna, LangChain, ChromaDB, OpenAI ChatGPT
Big Data & Analytics: Retool, Redash, Athena/Glue
HW/OS: Intel/Linux, Ubuntu, Red Hat

# Experience:

2022-Current Principal Engineer, Manager EHM, Upwork Inc; Remote
  Founded and managed the Experience Health Management team, focused on
Customer Experience Monitoring to address gaps in incident detection. The team
made outsized contributions to improving incident detection. Responsible for
the launch of three new initiatives: Synthetic Monitoring, Anomaly Detection
on critical metrics, and Real User Monitoring. These initiatives doubled the
number of incidents detected within the 10-minute SLO. A very low percentage
of incidents now fall outside of this SLO due to these efforts. In addition
to hands-on engineering work, responsibilities included all TPM effort
and bootstrapping/management for the team.

  Track record of engineering mentorship: team members showed constant
improvement, and two are now among the top 10 most productive engineers in
the company.

  'On-prem' (in-VPC) deployment of a Generative AI Large Language
Model (LLM); compared and evaluated several models including Mistral, Vicuna,
Llama, and Falcon. Integration work used Python/LangChain.


2018-2022 Team Lead, Automation (DevOps) Team, Upwork Inc; Remote
  Diverse responsibilities including architecture guidance and delivery of
many DevOps projects using tools including Chef, Python, Jenkins, and Terraform.

  One of many projects during this period: tasked with remediation of the
logging pipeline, which was corrupting a large percentage of log entries and
impacting the overall availability of the customer-facing service. Rapidly
triaged several failing portions of the system, and applied fixes to address
the scaling problems that were the root cause.

  Became the steward of the logging system and made several improvements
which enabled the log volume to increase 4x to dozens of TB per day. Launched
a project to increase the retention and query capabilities by 26x. The logging
system now provides the ability to query petabytes of data with a query
response time measured in seconds.


2016-2018 Systems Architect, Silver Spring Networks; Remote
  Primary architect and implementer of the project to convert the company
from Perforce to Git/GitLab. Provided engineering management a cost/benefit
analysis and project plan for converting to Git/GitLab and migrating
existing Perforce code. Deployed multiple GitLab servers and integrated
with existing systems and workflows for user authentication, continuous
integration, and ticket tracking.

  Tasked with evaluating suitable options for a private cloud within the
organization. Deployed a 22-node pilot. Spec'ed, configured, and managed an
OpenStack deployment which included compute, network, volume, orchestration,
DBaaS, and LBaaS services, and supported multi-region and IPv6 features. This
included deploying monitoring, metrics, & logging aggregation. Provided
demonstrations, training, and documentation for end-user use of the private
cloud.

  Created a configuration management workflow for the Technical 
Operations department. Designed a workflow based on Ansible & GitLab 
with development & testing using test-kitchen & Docker containers and 
secret management. Wrote a Python library to integrate Ansible with a
Remedy asset database.

  Developed a training curriculum, including hands-on classes, training 
videos, documentation, and sandbox environments for the OpenStack, Ansible
and Git/GitLab projects.


2014-2015 DevOps, Upwork (Elance); Mountain View, CA
  Created a front/back-end microservices monitoring system, built on
Ruby and leveraging Icinga as an execution engine. It queried metadata
from each microservice to generate monitors and thresholds, and to route
email and pager alerts to the appropriate team. It monitored hundreds of
microservices and automatically adjusted to additions or removals of
services as the topology evolved.

  Managed AWS costs using resource tags. Costs were tracked by
team, department, and tier. Wrote a Ruby library and several tools
to manage tags for resources. Deployed and modified Netflix ICE for
on-demand and scheduled reporting.

  Designed and built the disaster recovery environment. Performed the
system archaeology on the undocumented/cloned server image and ported the
configuration into Chef recipes. Created a tool, 'cloudmanager', which
allowed operators to define the entire configuration for an environment 
in YAML and then launch/provision/manage/teardown all or subsets of 
hosts in an environment. The hosts would do an unattended configuration 
on boot via Chef.

2012-2013 DevOps, Nebula, Inc; Palo Alto, CA

2008-2011 OpsEng Team Lead, OnLive, Inc; Palo Alto, CA
  Led the five-member senior systems administration team at OnLive.
The team was responsible for all server & network automation projects
surrounding the launch of the OnLive game service, and built out the
service from fewer than 100 nodes to more than two orders of magnitude
larger. Implemented a team workflow using Kanban, which allowed the team
to simultaneously complete large projects, increase team productivity,
and react to changing priorities. This made it possible for the team to
deploy the large-scale production service while meeting an aggressive
schedule.

  Introduced the operations software-quality initiative, which ensured
systems-configuration code was thoroughly vetted via phased release,
continuous integration, and unit tests. Used VMs to run CI and unit tests
on system configuration code. Defined and created the Release-to-Customer
processes, and deployed all production releases for a period of about six
months.

  Performed all of these team-leading activities while also performing my
responsibilities as a primary individual contributor.

  Key designer/implementer of the service software deployment mechanism,
which managed over half a terabyte of data spread across thousands of
build-server-produced RPMs. Designed and implemented a multicast
software distribution system that enabled software to be deployed across
the network quickly with little server impact. Primary maintainer of the
Puppet automation system, which automatically provisioned and maintained the
UNIX/Linux servers. Primary maintainer/contributor to the service
configuration database written in Ruby on Rails. The software was
responsible for maintaining & updating configurations of the hosts,
monitoring, trending, logging, reporting, and install-automation, and needed
to work in a mixed (Windows & Linux) environment.


2004-2007 Senior Network Architect, Avvenu, Inc; Palo Alto, CA
  Was the senior-most member of the technical operations staff. Designed
and deployed lights-off resilient network infrastructure for the roll-out of
a remote access service. Created a self-healing ability in the network
utilizing NAGIOS and cfengine. Automated all levels of management of the
network, from host installation through application upgrade via cfengine
and other tools. Evaluated network hardware, colocation facilities, operating
systems, and management tools based on requirements of service and software.

  Wrote and deployed a Ruby on Rails application to manage shared music
for subscribed users. Supported Avvenu's Facebook application by deploying
development and production environments and working with development
team. Deployed a new corporate website, which included re-coding CSS/HTML,
deploying and customizing WordPress, and writing a database-backed management
system for press releases. Set up openads (OpenX) to manage rotating content
internal to the Avvenu application.

2002-2004 Project Implementer, Chip Express Corp; Santa Clara, CA

2004-2005 Infrastructure Consultant, Integrated Devices; San Diego, CA

2004 Infrastructure Consultant, TruSonic Inc; San Diego, CA

2000-2001 IT Manager, Netergy Networks (8x8); Santa Clara, CA

2000 IT Manager, SpeedEra Networks (now Akamai); Santa Clara, CA

1999 Move Coordinator, Bamboo.Com (now IPIX); Palo Alto, CA

1999 Network Architect, MediaPlex; Cupertino, CA

1998-1999 Senior SysAdmin, SaveSmart Inc; Mountain View, CA

1997 Operations Manager, PocketScience Inc; Santa Clara, CA

1995-1997 Network Administrator, Safari Internet; Ft Lauderdale, FL
  First employee - grew network from inception to a 1,500-user mixed
(consumer & business) Internet Service Provider. Responsible for all aspects
of network administration in addition to Internet-related programming and
consulting for company customers.

1991-1995 LWV (HMMWV) Mechanic, US Army; Schofield Barracks, HI


# Personal Achievements:

  Co-designed, tested, and implemented a homebuilt wireless router with a
1 megabit capacity at 13+ miles in 1998. Prior art to patent #7035281.

  Responsible in whole or in part for several Open Source software
initiatives: ported LDAP nameservices to Linux, wrote an authentication
hash library for Tcl/Tk, wrote several HOWTOs including Wireless-Router
and LDAP-authentication, and worked on several web-based projects
including a search-engine submittal tool and a re-work of the Gnats system
web frontend. Ported the 'Pygame Learning Environment' Reinforcement Learning
environment to Python 3.

  Deployed an 8-node, 2-segment 100BASE-T enterprise-class network to
provide remote management and monitoring facilities for the house lava
lamps. (1998)

  Published an article with O'Reilly, titled "Self Healing Networks
with NAGIOS & Cfengine".

  Presented to the 'Large Scale Production Engineering' group on operational
lessons learned building large-scale production services.

  Published "Migrating servers into OpenStack" for Sysadvent 2012

  Eagle Scout, Army Veteran

  Code samples available at GitHub: https://github.com/gregretkowski

  FAA Certs: Private Pilot (SEL/Glider), Repairman, Remote Pilot
  
  If you google me, I'm the technologist / sailor / pilot. The German Elvis
  impersonator is someone else. :)
  

###

Email (don't call) when establishing initial contact with me. Thank you.