Knowledge Management (Data Scientist) SME - Military veterans preferred

Raytheon is seeking experienced data scientist/applications developer SMEs to create, leverage, and apply data science approaches to successfully implement information management solutions.


  • Identify, implement, and improve methods for duplicate detection, document categorization, and entity and information extraction using natural language processing, machine learning, data mining, and statistical algorithms.
  • Assist with the design and implementation of visualizations and reports for business intelligence metrics.
  • Propose, implement, and evaluate content analytic strategies for characterizing and categorizing large data sets of unstructured files and messages using COTS/GOTS/Open Source tools.
  • Develop custom software as required by Sponsor to characterize and categorize large datasets of unstructured files and messages.
  • Serve as a Subject Matter Expert (SME) in discussions with analytic tool developers and enterprise IT management.
  • Partner with information management SMEs to define and refine the framework, strategies, and actions for collecting and analyzing unstructured file metadata and content stored in Sponsor's automated systems (e.g., email repositories, databases, shared drives).
  • Implement collection and analysis actions, such as ingesting, indexing, normalizing, and structuring file content and metadata in preparation for analysis using tools in the big data environment (GOTS, COTS, and open source tools including but not limited to Hadoop, Hive, Tableau, Spark, Visual Studio, TensorFlow, and other emerging technologies).
  • Partner with information management SMEs to determine baselines, analyze patterns and characteristics in file content and metadata, and construct visualizations to share lessons learned and provide recommendations based on analytic results.
  • Lead and/or contribute to discussions with Sponsor and Sponsor partners on collection and analysis framework, strategies, processes, and methodologies.
  • Build relationships with stakeholders to negotiate access, security, and storage needs for the unstructured file objects and the features created during the collection and analysis process.
  • Provide recommendations and training to Sponsor and Sponsor partners on techniques and tools in the big data environment.
  • Write MapReduce jobs, Hive queries, and Python, Java, Scala, and R programs as appropriate to perform various tasks related to machine learning and data science activities, including data cleanup, data transformation, data mashing, data searching, and algorithm parallelization.
  • Implement algorithms from various sources (academia, federal labs, or other government agencies) as parallelized MapReduce jobs.
  • Analyze and correlate large amounts of data.
  • Run machine learning workflows on various platforms (such as Python, Spark, and TensorFlow) against large amounts of data.
  • Administer, configure, and optimize a distributed cluster ecosystem such as Hadoop or Spark.
  • Write and deploy web-based applications using HTML, JavaScript, Java, graph, and similar technologies that expose data and allow end users to view and interact with it.

Mandatory Skills/Experience:
  • Demonstrated on-the-job experience integrating and analyzing large data sets using big-data technologies such as Hadoop and Spark
  • Demonstrated on-the-job experience indexing data sets using Solr and/or Elasticsearch
  • Demonstrated on-the-job experience with Java, Python, and Bash scripting
  • Demonstrated on-the-job experience proposing, implementing, and evaluating strategies for characterizing and categorizing large data sets.
  • Demonstrated on-the-job experience performing statistical analysis on large data sets
  • Ability to communicate technical concepts to a non-technical audience.

Desired Skills/Experience:
  • Ability to understand and implement methodologies that are consistent with standard techniques in the data science field.
  • Familiarity with Linux/Windows
  • Systems administration of AWS
  • Systems administration of distributed systems such as Hadoop and Spark
  • Experience using databases such as Oracle and MySQL
  • Experience creating applications and visualizations with JavaScript frameworks such as AngularJS or ReactJS
  • Familiarity with scikit-learn, Gensim, NLTK, spaCy, and the application of these tools to Natural Language Processing
  • Familiarity with Theano, TensorFlow, Torch, Keras, MXNet, Deeplearning4j, and the application of these tools to Natural Language Processing
  • Familiarity with classification, clustering, and dimensionality-reduction algorithms such as LightGBM, XGBoost, Random Forest, Support Vector Machine, K-means, and t-SNE

